Problem: A critical node stopped responding. A standby node in a Patroni-managed PostgreSQL cluster suffered a replica failure and repeatedly logged “incorrect resource manager data checksum” errors. The replica stopped working because corrupted Write-Ahead Log (WAL) segments broke the replication stream. A dangerous shortcut would involve running continuous base backups to force […]

Enabling WAL archiving on a DR Patroni standby to allow backups from the replica
Database, 8 May 2026
Problem: A customer running Patroni-managed PostgreSQL v15.17 (Patroni 3.3.2) asked whether Commvault backups could be taken from the DR site’s Standby Leader. The DR cluster is a replicating standby of Production. Specifically, the customer asked whether WAL file generation could be started on the DR Standby Leader and whether Commvault’s option to delete WALs after […]

PostgreSQL cluster recovery after unclean shutdown
Database, 25 Apr 2026
Problem: In a Patroni-managed PostgreSQL cluster (PostgreSQL 15.17, Patroni 3.3.2) running asynchronous replication, the former primary intermittently failed to rejoin the cluster after a physical host reboot performed as part of high-availability testing. Test pattern: hard shutdown of the former primary for ~5 minutes, then restart. Symptom observed on restart: the node failed to rejoin with the error “requested […]

Troubleshooting inconsistent SELECT performance across identical Patroni PostgreSQL clusters
Database, 10 Apr 2026
Problem: Seven identical Patroni-managed PostgreSQL 15.15 clusters (Patroni 3.3.2) showed divergent behaviour for the same SELECT query: clusters labelled Cust 1–5 returned results quickly, while Cust 6–7 ran the same query much more slowly. Dataset sizes, indexes, constraints and foreign keys were reported as equivalent across clusters. One temporary mitigation (set enable_mergejoin = false) had been […]

Pruning oversized Cassandra ‘backups’ folders while preserving incremental retention
Database, 3 Apr 2026
Problem: The data mount was approaching full capacity. Investigation showed that the keyspace-level backups directories inside the Cassandra data directories were consuming the majority of space (≈150 GB). No active snapshots were present on the cluster when the issue was reported. The cluster configuration had incremental_backups: true in cassandra.yaml because the customer wanted to retain […]
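Before pruning anything, it helps to see which backups directories actually hold the space. The following is a minimal sketch in Python, not the procedure from this case study. It assumes a hypothetical data root (the real location comes from data_file_directories in cassandra.yaml) and treats every directory named backups under that root as incremental-backup output. It only reports sizes and deletes nothing.

```python
# Sketch: report how much space Cassandra "backups" directories consume.
# Assumptions (not from the case study): data lives under DATA_ROOT, and any
# directory named "backups" beneath it holds incremental-backup SSTables.
# Read-only: nothing is deleted.
import os
import sys
from pathlib import Path

# Hypothetical default; use the data_file_directories value from cassandra.yaml.
DATA_ROOT = Path("/var/lib/cassandra/data")


def backups_usage(root: Path) -> dict[Path, int]:
    """Map each 'backups' directory under root to its total size in bytes."""
    usage: dict[Path, int] = {}
    for dirpath, dirnames, _ in os.walk(root):
        if "backups" in dirnames:
            backup_dir = Path(dirpath) / "backups"
            total = 0
            for sub, _, files in os.walk(backup_dir):
                for name in files:
                    try:
                        total += (Path(sub) / name).stat().st_size
                    except FileNotFoundError:
                        pass  # file removed while scanning
            usage[backup_dir] = total
            dirnames.remove("backups")  # already counted; don't walk it again
    return usage


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else DATA_ROOT
    usage = backups_usage(root)
    for path, size in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1024**3:8.2f} GiB  {path}")
    print(f"total: {sum(usage.values()) / 1024**3:.2f} GiB")
```

Sorting largest first makes it easy to see whether a few tables dominate the usage before deciding what retention to keep.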

Resolving high RAM utilization in a Patroni PostgreSQL cluster
Database, 3 Apr 2026
Problem: A Patroni-managed PostgreSQL cluster (PostgreSQL 15.8, Patroni 2.1.4) was reporting sustained high memory consumption on a host with ~275 GB of physical RAM and only ~8% free. The environment had HugePages enabled at the OS level. Patroni configuration showed shared_buffers=68GB, work_mem=20MB, max_parallel_workers_per_gather=4 and max_connections=2000. Observed behavior included consistently ~90% system memory utilization across a […]
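The settings quoted above can be combined into a rough upper bound. The sketch below is a simplification, not the analysis from the case study. It assumes each backend holds a fixed number of work_mem-sized allocations at the same time and ignores per-backend overhead, maintenance_work_mem, hash_mem_multiplier and parallel workers. PostgreSQL can allocate work_mem per sort or hash node, and per parallel worker, so one query may use several multiples.

```python
# Rough PostgreSQL memory upper bound from the Patroni settings quoted above.
# Simplifying assumption (not from the case study): every connection holds
# `ops_per_backend` work_mem allocations at once. Real usage also includes
# per-backend overhead, maintenance_work_mem, hash_mem_multiplier and parallel
# workers, so treat the result as illustrative only.

MB_PER_GB = 1024  # PostgreSQL memory units: 1GB = 1024MB


def worst_case_mb(shared_buffers_mb: int, work_mem_mb: int,
                  max_connections: int, ops_per_backend: int = 1) -> int:
    """Return shared_buffers plus work_mem * connections * ops, in MB."""
    return shared_buffers_mb + work_mem_mb * max_connections * ops_per_backend


shared_buffers_mb = 68 * MB_PER_GB  # shared_buffers=68GB
work_mem_mb = 20                    # work_mem=20MB
max_connections = 2000              # max_connections=2000

for ops in (1, 2, 4):
    total = worst_case_mb(shared_buffers_mb, work_mem_mb, max_connections, ops)
    print(f"{ops} work_mem allocation(s) per backend: ~{total / MB_PER_GB:.0f} GB")
```

Under these assumptions, one allocation per backend gives about 107 GB and four per backend about 224 GB, against ~275 GB of physical RAM. This is why work_mem is normally sized together with max_connections rather than in isolation.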

Resolving Partitioned Table Update Delays and Space Retention in PostgreSQL
Database, 3 Mar 2026
Problem: An application UPDATE statement against a partitioned PostgreSQL table (UPDATE tab1 SET sys_update_date = $1, agg_status = $2, output_filename = $3, merge_type = $4 WHERE period_key = $5 and record_id = $6) experienced consistent slowdowns in a nightly early-morning window. The customer reported that partition file sizes were normal and small for partitions p1–p12, […]

Resolving Redis timeouts caused by Redisson MapCache eviction and Lua blocking
Database, 25 Feb 2026
Problem: Applications using Redisson MapCache began throwing RedisResponseTimeoutException errors (client timeout = 3000 ms) during eviction activity. The errors were raised while executing EVALSHA, with context pointing to org.redisson.eviction.MapCacheEvictionTask; application logs showed “Unable to evict elements” messages correlated with redisson-timer-* threads. Environment details: a Redis Cluster (client traffic on shard port 7000) served multiple integration […]

PostgreSQL Predicate Pushdown Optimization
Database, 30 Jan 2026
Problem: The customer reported degraded performance in a PostgreSQL query joining multiple views and tables, including V_CDD_PRF_PARTY_PARTY_R_DRF. Although the outer query applied a highly selective filter (C.FIRST_PARTY_KEY = 'CDD000248470'), the PostgreSQL optimizer did not push this predicate into the view. As a result, the view was fully evaluated, leading to increased CPU usage, higher I/O, […]