Problem: The client had a 10-node cluster across two data centers, with 5 nodes in each. They ran nodetool repair on one keyspace, but after 6 hours it was stuck at 99% completion. They requested guidance on how to proceed, noting that their product version was “ReleaseVersion: 2.2.5”. Process: Step 1 – […]
Database 8 Aug 2024 Cassandra Superuser Password Reset Issue
Problem: On Cassandra v4.0.6 in non-production environments, the password of the “cassandra” superuser, which had been changed two months earlier, reverted to its default (“cassandra”) after OS patches were applied and the server was rebooted. The audit logs showed no manual password changes. Solution: Initial Steps: The superuser “cassandra” initially had the default […]
Database 2 Aug 2024 Changing Passwords in a Cassandra Cluster
Problem: The client needed to change the passwords of all users in the Cassandra cluster. They specifically asked whether the default password for the “cassandra” superuser had to be changed, and requested a step-by-step guide along with precautions to prevent any impact on the application. Process: The expert provided a detailed response with the […]
Database 1 Aug 2024 Intermittent Table Update Issue in Cassandra DB
Problem: A table in Cassandra DB is not always updated immediately. The reference id table is sometimes not updated, and the application reads old values. Process: Step 1: During the initial investigation and troubleshooting of the DB, our experts asked the client the following questions: Cluster Configuration: How many nodes are there in the cluster? What […]
Database 26 Jul 2024 Cassandra Memory Allocation Issue and Resolution
Problem: The client saw frequent warnings on their Cassandra nodes, such as this log entry: INFO [CompactionExecutor:37] 2023-02-23 12:00:12,268 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576. The client requested an investigation into the potential impact of increasing file_cache_size_in_mb in the cassandra.yaml file, whether a restart (bounce) would be necessary, […]
Database 11 Jul 2024 WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM
Problem: The client encountered a Cassandra exception: “WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM”. The issue had been occurring since March 9, 2023, and involved an INSERT INTO query. The client sought troubleshooting assistance because the exception had not occurred before that date. Process: Step 1 – Initial Investigation and Troubleshooting: The expert team initiated […]
Database 10 Jul 2024 Network Instability Causing Keepalived Crashes and Application Errors
Problem: The client reported Keepalived crashes that disrupted high availability and caused application errors, particularly connection timeouts with the PostgreSQL server. Initial investigation suggested that network instability and outdated software versions were contributing to the problem. Process: Requesting initial information for further investigation of the problem: The number of servers in the HAProxy […]
Database 8 Jul 2024 Database in the Cassandra cluster generates a large number of commitlogs
Problem: In the Cassandra cluster, the database generated a large number of commit logs and did not delete them. As a result, the commit log filesystem fills up and the database crashes. All nodes are affected. Process: Step 1: Initial investigation and information gathering from the client. Initial troubleshooting and information gathering […]
Database 5 Jul 2024 Diagnosing and Resolving SSL SYSCALL Errors in PostgreSQL with Patroni
Problem: The client reported an intermittent issue with their PostgreSQL database managed by Patroni. The error message was “SSL SYSCALL error: EOF detected”. No corresponding errors were found in the PostgreSQL logs or the HAProxy logs. The client changed the idle_in_transaction_session_timeout parameter from 1 hour to unlimited, but the error persisted. Solution: […]