Problem: The client had a 6-node multi-DC setup for Cassandra (3 nodes in PROD – East US2 and 3 nodes in DR – West US2) and needed to alter a keyspace. The keyspace was initially defined as follows: `CREATE KEYSPACE bulk_api_management WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;` The client wanted […]
Database 28 Aug 2024 Optimizing Cassandra Storage with RAID0 Array
Problem: The client managed a Cassandra cluster spanning two data centers (DC1 and DR1), each containing 5 nodes. The `data_file_directories` were distributed across multiple mount points. On one node, the mount point /cassandra/data2 was nearly full due to a large table in the “jesi” keyspace, specifically the “service_monitoring_payload” table. This resulted in significant storage […]
Database 26 Aug 2024 Resolving SSL Connection Issues in PostgreSQL
Problem: The application fails to establish an SSL connection to the database, displaying the error “could not accept SSL connection: Success” in the database logs. Despite this, manual `psql` connections using SSL work fine. Process: Step 1 – Verify Connection String: Confirm that the application’s connection string includes the necessary SSL parameters. Example: `postgresql://user:password@hostname:port/dbname?sslmode=require` Step […]
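The connection-string check in Step 1 could be scripted roughly as follows. This is only a minimal sketch, not the procedure used in the post: the helper names `sslmode_of` and `enforces_ssl` and the host `db.example.com` are hypothetical, and it assumes the application uses a URI-style connection string like the example above.

```python
from typing import Optional
from urllib.parse import urlsplit, parse_qs

# libpq sslmode values that refuse to fall back to an unencrypted connection.
# (The other documented values are "disable", "allow" and "prefer".)
SSL_ENFORCING_MODES = {"require", "verify-ca", "verify-full"}


def sslmode_of(conn_uri: str) -> Optional[str]:
    """Return the sslmode query parameter of a PostgreSQL URI, or None if absent."""
    query = parse_qs(urlsplit(conn_uri).query)
    values = query.get("sslmode")
    # If the parameter is repeated, report the last occurrence.
    return values[-1] if values else None


def enforces_ssl(conn_uri: str) -> bool:
    """True only when the URI explicitly asks for an SSL-only connection."""
    return sslmode_of(conn_uri) in SSL_ENFORCING_MODES


if __name__ == "__main__":
    # Hypothetical URI for illustration; substitute the application's real one.
    uri = "postgresql://user:password@db.example.com:5432/dbname?sslmode=require"
    print(sslmode_of(uri), enforces_ssl(uri))
```

A URI with no `sslmode` parameter returns `None` here, which is worth flagging because the client library then applies its own default.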
Database 22 Aug 2024 High connection issues on the PostgreSQL database server
Problem: The client experienced connection issues on the PostgreSQL database server, with an abnormally high number of connections reaching around 20,000 at a given time. The client asked for assistance from the expert team in identifying the possible reasons behind this issue. Based on the client’s analysis, there were multiple wait events on HAProxy, and […]
Database 19 Aug 2024 Ensuring Successful Data Restoration from Cassandra 3.11 to 4.0.6
Problem: Restoring a Cassandra 3.11 snapshot to a 4.0.6 cluster using the `nodetool refresh` command results in an empty table, indicating a potential compatibility issue. This affects the DR environment, which needs to accurately replicate the PROD environment’s data. Solution: Step 1. Verify Snapshot Content: Use `nodetool listsnapshots` and a test environment to ensure the […]
Database 16 Aug 2024 Cassandra superuser password is getting reset after server restart
Problem: The client ran Cassandra v4.0.6 in non-production environments and noticed that the “Cassandra” superuser’s password, which had been changed two months earlier, had been reset to its old value (the default password “Cassandra”). This was observed after OS patches were applied and the server rebooted (a monthly activity). The client didn’t see any evidence of someone […]
Database 16 Aug 2024 Exhaustion of Data Storage Space Due to Incremental Backup Location
Problem: The client reported that their data storage space was being rapidly exhausted because incremental backups were stored in the same location as the primary data storage. This resulted in the storage becoming full, forcing them to delete incremental backups to make room for new data. The client requested guidance on configuring incremental backups to […]
Database 15 Aug 2024 Postgres Database Cluster Crash Investigation
Problem: Three instances of the production Postgres Database cluster experienced crashes with “segmentation fault” errors within 30 days. Despite no recent changes in the system, the issue persisted, prompting the need for investigation to identify the root cause. Process: Upon receiving the initial report of the issue, the experts engaged with the client to gather […]
Database 12 Aug 2024 Failover Investigation and Resolution for PostgreSQL Cluster
Problem: PG Prod Node 1 failed over to Node 2, accompanied by high WAL (Write-Ahead Logging) generation. The client requested an investigation into the cause of the failover on Node 1. Process: To conduct a thorough investigation, the following data and logs were requested: PostgreSQL logs, cluster logs, database logs, WAL logs, cluster configuration, monitoring […]
Database