Problem:

An OLTP system using Patroni-managed PostgreSQL 15 experienced global slowdowns under load: CPU climbed to ~80%, large numbers of concurrent sessions, and application timeouts. The client reported bursts of traffic (hundreds of workers in parallel) routed through HAProxy to the leader, and provided PostgreSQL and Patroni logs, auto_explain output, and query samples. Key database configuration: shared_buffers=82GB, work_mem=2MB, max_connections=3000, default_statistics_target=100. The application’s heavy queries targeted daily range‑partitioned tables (hundreds of partitions) and produced long-running operations and widespread LWLock wait events attributed to LockManager contention.

Process:

Step 1: Verify symptom scope and collect runtime artifacts

Activity logs and client telemetry showed two high-load windows with thousands of log lines per hour and nearly two thousand distinct backend PIDs in the hour. The uploaded artifacts included PostgreSQL server logs with auto_explain, Patroni logs, query samples, and an extracted schema for the partitioned CDR tables. Confirming the scale mattered because it established whether the issue was isolated slow queries or system‑wide contention.

Step 2: Inspect wait events and session state

pg_stat_activity samples and auto_explain output were examined next. A large fraction of active sessions reported waiting on LWLock:LockManager or were CPU bound rather than waiting on I/O. This indicated a serialization point inside PostgreSQL lock management rather than disk saturation or a single bad plan, directing investigation toward relation/lock counts and partition behavior.

Step 3: Review schema, partitioning and index definitions

The schema dump for the CDR tables showed RANGE partitioning by a timestamp column with ~306 daily partitions and no index leading on the subscriber identifier used by the application queries. This discovery mattered because every partition is a separate relation: queries that cannot prune partitions will open and lock many relations, multiplying lock acquisition work and lock-manager pressure.

Step 4: Correlate query shape with lock volume

Auto_explain plans and logged statements showed the problematic queries filtered by subscriber/account fields but lacked a constant predicate on the partition key, forcing an Append across hundreds of partitions and sequential scans inside each. Measured buffer reads and per‑query durations demonstrated that single queries touched large numbers of partitions and buffers, explaining long-lived relation locks that exceeded PostgreSQL’s per-backend fast-path slot limit (16) and spilled locks into the shared LockManager partitions.

Step 5: Measure lock-manager overflow

pg_locks aggregation during the incident window showed many non-fastpath locks (lock entries beyond the first 16 per backend) and confirmed that lock-manager partitions (protected by a small number of LWLocks) were saturated. This clarified the exact contention mechanism and justified changes that reduce the number of relation locks per query and the number of concurrent lock holders.

Step 6: Identify configuration contributors and HA risk

Configuration review highlighted work_mem=2MB and max_connections=3000. Excessive backend concurrency increased the number of concurrent lock holders; low work_mem produced on-disk spills that prolonged lock hold times. Patroni health-check queries were observed to time out under saturation, proving a near-miss for unintended failover—so protecting the HA monitoring connection became a priority.

Step 7: Implemented remediation and transition to solution

Based on the evidence, remediation focused on reducing per-query lock count and per-partition scan cost while limiting concurrency for long-running workloads. Changes were applied to queries, schema, and role settings to remove the contention driver and stabilize HA behavior; these actions are summarized in the Solution below.

Solution:

Implemented changes: application queries were rewritten to include a date (partition key) predicate so PostgreSQL can prune partitions; a subscr_no‑leading index was created on the CDR partitions to avoid per-partition sequential scans; the Patroni role was altered to set statement_timeout = 0 to prevent HA-monitoring queries from being canceled under load; connection concurrency was capped using a connection pooler pattern and work_mem was increased after capping real backend concurrency.

Why this works: partition pruning reduces the number of relations each backend must open and lock, keeping relation locks within PostgreSQL’s fast‑path slots and avoiding LockManager LWLock serialization. A subscr_no‑leading index reduces per‑partition scan time and lock-hold duration. Capping concurrency reduces the number of concurrent lock holders; increasing work_mem removes expensive spill I/O that lengthened lock durations. Together these architectural and configuration changes eliminate the LockManager contention and stabilize the leader under load.

Conclusion:

After the changes, CPU and LWLock wait events returned to baseline during comparable traffic patterns, application timeouts stopped, and Patroni monitoring operations no longer timed out under peak load. The combined fixes reduced global contention risk, improved query latency for the CDR workload, and mitigated the failure domain for HA monitoring and failover stability.