Problem:
The client experienced connection issues on the PostgreSQL database server, with an abnormally high number of connections peaking at around 20,000 at a time. The client asked the expert team to help identify the likely causes. According to the client’s own analysis, there were multiple wait events on HAProxy, whose maximum connection limit was set to approximately 10,000.
Process:
The expert team began by reviewing the logs and then worked through the troubleshooting process.
Step 1:
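The team’s investigation and recommendations included:
1. Querying the HAProxy version in use.
2. Checking the application’s connection management.
3. Examining OS specifics.
4. Using netstat to inspect open connections.
5. Tuning TCP parameters.
6. Considering PgBouncer for connection pooling.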
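One of the team’s recommendations was to use netstat to inspect open connections. A minimal sketch of that check, assuming the output of `netstat -ant` on Linux and PostgreSQL’s default port 5432 (the port actually used in the client’s setup is not stated), tallies TCP connections by state so that a build-up of ESTABLISHED or TIME_WAIT sockets stands out. The function name `tally_tcp_states` is hypothetical.

```python
from collections import Counter


def tally_tcp_states(netstat_output: str, port: int = 5432) -> Counter:
    """Count TCP connections per state whose local or foreign address uses `port`.

    Assumes Linux `netstat -ant` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts: Counter = Counter()
    suffix = f":{port}"  # the leading colon keeps :15432 from matching :5432
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines; data rows start with "tcp" or "tcp6".
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            counts[state] += 1
    return counts
```

The input could come from `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout`. On hosts without netstat, `ss -ant` prints a different column layout, so the field indexes would need adjusting.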
Step 2:
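During a live session with the client, our team, working alongside an expert, addressed several critical issues, discussed the findings and possible solutions with the client, and planned follow-up meetings.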
Step 3:
1. Modified HAProxy Logging Level
2. Reloaded HAProxy
3. Evaluated Impact on HAProxy Performance
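The first step, modifying the logging level, amounts to a text change in haproxy.cfg. This is a minimal sketch, assuming `log` lines of the form `log <target> [len|format|sample <value>] <facility> [<level> [<minlevel>]]` with no inline comments; `set_log_level` is a hypothetical helper, not part of HAProxy. Keeping the original text makes the later revert to the default level trivial.

```python
# Syslog level names accepted by HAProxy's "log" directive.
LEVELS = ("emerg", "alert", "crit", "err", "warning", "notice", "info", "debug")
OPTION_KEYWORDS = ("len", "format", "sample")  # each is followed by one value


def set_log_level(cfg_text: str, level: str) -> str:
    """Return cfg_text with the level of every explicit "log <target> ..." line set to `level`."""
    if level not in LEVELS:
        raise ValueError(f"unknown syslog level: {level}")
    out = []
    for line in cfg_text.splitlines(keepends=True):
        tokens = line.split()
        # "log global" and "no log" carry no level of their own; skip them.
        if len(tokens) >= 3 and tokens[0] == "log" and tokens[1] != "global":
            i = 2  # tokens[1] is the target (address, socket, stdout, ...)
            while i < len(tokens) and tokens[i] in OPTION_KEYWORDS:
                i += 2  # skip "len 1024", "format raw", etc.
            if i < len(tokens):  # tokens[i] is the facility, e.g. local0
                if i + 1 < len(tokens):
                    tokens[i + 1] = level  # overwrite the existing level
                else:
                    tokens.append(level)  # no level given yet: add one
                indent = line[: len(line) - len(line.lstrip())]
                ending = "\n" if line.endswith("\n") else ""
                line = indent + " ".join(tokens) + ending
        out.append(line)
    return "".join(out)
```

In use, one would read the file, write `set_log_level(original, "debug")` back, and restore `original` once troubleshooting is finished. Lines that only say `log global` are left alone because they inherit their settings from the `global` section.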
Solution:
After the investigation, the expert team connected to the HAProxy server, edited the haproxy.cfg file, and raised the logging level to debug. They then checked the configuration for syntax errors and reloaded HAProxy gracefully, restarting it only if necessary. Because debug logging produces considerably more output, they monitored system performance throughout and reverted to the default logging level after troubleshooting.
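The verify-then-reload sequence described above can be scripted. This sketch assumes a systemd-managed service named `haproxy` and a configuration file at `/etc/haproxy/haproxy.cfg` (neither is stated in the case). `haproxy -c -f` is HAProxy’s configuration check mode; the `run` parameter is a hypothetical hook that lets the sequence be exercised without touching a live server.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_cmd(argv: Sequence[str]) -> int:
    """Execute a command and return its exit status (no exception on failure)."""
    return subprocess.run(list(argv), check=False).returncode


def check_and_reload(
    cfg_path: str = "/etc/haproxy/haproxy.cfg",  # assumed path
    unit: str = "haproxy",  # assumed systemd unit name
    run: Runner = run_cmd,
) -> str:
    """Validate the config, then reload; restart only if the reload fails."""
    # 1. Syntax check: "haproxy -c" parses the file and exits without serving traffic.
    if run(["haproxy", "-c", "-f", cfg_path]) != 0:
        return "invalid-config"  # leave the running instance untouched
    # 2. Graceful reload: the old process is allowed to finish its existing sessions.
    if run(["systemctl", "reload", unit]) == 0:
        return "reloaded"
    # 3. Fallback: a full restart drops open sessions, so it is the last resort.
    if run(["systemctl", "restart", unit]) == 0:
        return "restarted"
    return "failed"
```

Returning a status string instead of raising keeps the decision about what to do after a failed check (for example, restoring the previous haproxy.cfg) with the caller.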
Conclusion:
The client faced connection issues on the PostgreSQL database server, with connections surging to about 20,000 at a time. The expert team investigated, focusing on HAProxy settings, possible application faults, and system configuration. They recommended querying the HAProxy version, checking the application’s connection management, examining OS specifics, using netstat, tuning TCP parameters, and considering PgBouncer. The team raised the HAProxy logging level to debug, reloaded the HAProxy configuration, and assessed the performance impact. They discussed the findings and solutions with the client, planned follow-up meetings, and aimed to return to the standard logging level after troubleshooting.
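The recommendations to check the application’s connection management and to consider PgBouncer both aim at capping how many server connections are open at once. The class below is only an in-application illustration of that pooling principle, not PgBouncer’s implementation; `BoundedPool` and the connection factory passed to it are hypothetical. Callers that find the pool exhausted wait for a free connection instead of opening a new one.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class BoundedPool(Generic[T]):
    """Caps open connections at `max_size`; callers wait instead of opening more."""

    def __init__(self, factory: Callable[[], T], max_size: int) -> None:
        self._factory = factory
        self._max_size = max_size
        self._idle: "queue.LifoQueue[T]" = queue.LifoQueue()
        self._created = 0
        self._lock = threading.Lock()

    @property
    def created(self) -> int:
        """Number of connections opened so far (never exceeds max_size)."""
        return self._created

    def _acquire(self, timeout: float) -> T:
        try:
            return self._idle.get_nowait()  # reuse an idle connection first
        except queue.Empty:
            pass
        with self._lock:  # reserve a slot before opening a new connection
            can_create = self._created < self._max_size
            if can_create:
                self._created += 1
        if can_create:
            try:
                return self._factory()
            except Exception:
                with self._lock:
                    self._created -= 1  # give the slot back if opening failed
                raise
        # Pool exhausted: wait for a release; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[T]:
        conn = self._acquire(timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it, even if the caller raised
```

With a real driver, `factory` would open a PostgreSQL connection. PgBouncer moves the same idea into a separate process, so that many application instances share a bounded set of server connections.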