Problem:

The client has master and standby configuration and connections are managed by pgpool (version 4.1.5) in the PostgreSQL (version 11.5) production environment. Recently the client faced a switchover/failover when an “idle_in_transaction” session was killed using “pg_terminate_backend” after connecting via pgpoolvip.

Process:

Step 1 – Initial investigation:

The expert team gathered the information and started the troubleshooting:

  • Configuration files: For both pgpool and PostgreSQL. This helped to identify any misconfigurations or incorrect settings that could be causing the issue.
  • Logs: The logs of pgpool and logs of the PostgreSQL master during the issue period when the switchover happened.
  • System resources: Requested information about the system resources, such as CPU, memory, and disk space usage. This helped to identify any resource-related issues that could be causing the issue.
  • Versions: Asked to provide the versions of PostgreSQL and pgpool being used.
  • Current State of the Client’s PostgreSQL Environment.
  • Step 2 – Further investigation:

    The expert team found that:

  • The described issue highlights a problematic behavior in pgpool where terminating sessions directly on the master server, rather than through pgpool, leads to unintended consequences.
  • When pgpool detects abrupt termination of connections, it misinterprets this as a master server failure, triggering a failover process.
  • To avoid this problem, sessions should always be terminated through pgpool, ensuring proper handling of connections and preventing unnecessary failovers. This emphasizes the importance of following recommended procedures for session management in systems utilizing pgpool to maintain stability and reliability.
  • Solution:

    After a detailed investigation of the information, the expert team provided to the client the next instruction:

    Connect to the database via pgpool and terminate the session. As for the reason when pgpool sees that its connections are suddenly killed, it concludes that the master server is failing and performs a failover. (more information in the link: https://postgrespro.com/list/thread-id/2559842)

    Conclusion:

    The switchover/failover in the client’s PostgreSQL production environment occurred because terminating “idle_in_transaction” sessions directly through the master server caused pgpool to misinterpret these actions as server failure. The issue was resolved by adhering to recommended procedures for session management, ensuring all session terminations are handled through pgpool. Following this approach, as detailed in the solution provided by the expert team, prevented unnecessary failovers and stabilized the system.