Problem:

The client encountered a persistent issue related to file descriptor exhaustion on their PostgreSQL version 15 database, running on a Patroni High Availability cluster with RHEL 8.10. The PostgreSQL logs frequently reported the error: “out of file descriptors: Too many open files; release and retry” during database operations.

Although the client had significantly increased the OS-level file descriptor limits, the problem continued to occur. The client considered decreasing the PostgreSQL parameter max_files_per_process from 4096 to 400 in an attempt to address the issue and sought expert guidance on this approach.

Process:

Step 1 – Initial Analysis

The expert reviewed the system logs and PostgreSQL configuration shared by the client. It was confirmed that the PostgreSQL system was hitting file descriptor limits under high load, despite the high OS and user-level limits configured:

  • fs.file-max = 15,000,000
  • fs.nr_open = 10,000,000
  • Per-user limits: soft 8,000,000, hard 10,000,000

The PostgreSQL parameters in question were:

  • max_connections = 5000
  • max_files_per_process = 4096

The expert calculated the theoretical maximum file descriptor usage:
5000 connections × 4096 files per process = 20,480,000 file descriptors, which significantly exceeds the available system limits, explaining why the error persisted.

Step 2 – Evaluation of Risk and Proposed Parameter Changes

The client initially considered drastically reducing max_files_per_process to 400, but the expert warned against this change. Reducing the setting without also reducing the number of connections would not effectively solve the issue and could instead cause query latency or simply postpone the system failure.

The expert emphasized that the core problem was the excessive number of allowed connections. A PostgreSQL configuration supporting 5000 simultaneous connections is extremely difficult to scale and typically requires external connection management.

Step 3 – Detailed Recommendations

The expert provided a step-by-step action plan:

  • Reduce max_connections to 500–800 (High Priority)
    The expert strongly recommended using a connection pooler, such as PgBouncer in transaction pooling mode, to handle a large number of application requests while significantly reducing the direct load on PostgreSQL.
  • Maintain or Slightly Reduce max_files_per_process
    Keeping max_files_per_process at 4096 would be acceptable if the number of connections is reduced. For example, with 800 connections and 1024 files per process, the worst-case file descriptor usage would be approximately 819,200, which is within safe system limits.
  • Verify OS-Level Limits Are Properly Applied
    The expert instructed the client to check the actual applied limits using the /proc/<pid>/limits file for the PostgreSQL process. If the limits were not applied as expected, adjustments were recommended using PAM configurations and systemd overrides to ensure PostgreSQL can utilize the configured high file descriptor limits.
  • Monitor for File Descriptor Leaks
    The expert advised using system tools like lsof to track active file descriptors and identify potential leaks caused by long-lived idle connections, unclosed cursors, or problematic client-side code.
  • Enable Additional Logging
    PostgreSQL logging configurations should be updated to track temporary file creation and connection activity to support ongoing monitoring and issue diagnosis.

Solution:

The expert provided the following solution:

  • Reduce max_connections to the range of 500–800.
  • Use PgBouncer for connection pooling to efficiently manage a large number of incoming requests without overwhelming PostgreSQL.
  • Keep max_files_per_process at 4096 or reduce it slightly if connection limits are lowered.
  • Confirm that OS-level file descriptor limits are correctly enforced for PostgreSQL through PAM and systemd configuration adjustments.
  • Implement regular monitoring using lsof and PostgreSQL logs to detect file descriptor leaks and track system behavior under load.
  • Avoid reducing max_files_per_process to 400 without significantly reducing the number of concurrent connections, as this could create additional performance bottlenecks.

Conclusion:

The root cause of the file descriptor exhaustion was the excessive number of allowed PostgreSQL connections combined with the file descriptor usage per process. Simply increasing OS limits was not sufficient to prevent the issue due to PostgreSQL’s configuration and the high system load.

The expert’s recommended solution focused on reducing the number of direct PostgreSQL connections through the use of a connection pooler, which would significantly reduce the demand on file descriptors and stabilize the system. Proper OS-level configuration, along with proactive monitoring and logging, were essential to ensure the reliability of the PostgreSQL cluster moving forward.

By following the expert’s guidance, the client would achieve a more sustainable PostgreSQL setup capable of supporting high-demand applications while preventing file descriptor exhaustion.