Problem: The client is experiencing daily automatic restarts of their Cassandra database, potentially causing issues with application connectivity. Additionally, they’re encountering errors related to JVM memory, degraded mode, connection resets by peer, and null pointer exceptions on indexes. Solution: After a thorough analysis of the provided logs, the following findings and recommendations were made: Identify […]
Database 20 Jun 2024 Cassandra Database Unavailability and Performance Issues
Problem: The client reported instances of Cassandra databases experiencing unavailability and performance degradation, leading to application connection errors. Specifically, nodes were found to be unresponsive, and memory and CPU utilization spiked unexpectedly. Additionally, there were concerns about data inconsistencies between data centers and excessive tombstones affecting performance. Process: Initial Investigation: Upon initial investigation, it was […]
Database 17 Jun 2024 Postgres – query performance degraded after bulk delete
Problem: After deleting 52 million rows from a table, the client performed a VACUUM FULL and REINDEX on the table. Following these maintenance operations, the performance of large joins involving this table has significantly deteriorated, with query execution times increasing from minutes to hours. The task is to identify the cause of this performance regression […]
Database 16 Jun 2024 Application crashes with “could not send data to client: Connection reset by peer”
Problem: The client had 10 monthly bill cycles, and during such bill cycle days, the billing process was connected to the DB cluster in 20 streams. After the process had started the replication the above-mentioned billing process failed with the “could not send data to client: Connection reset by peer” error. The client reduced […]
Database 14 Jun 2024 Troubleshooting Apache Cassandra Query Timeout Issue
Problem: The client experienced timeout errors when running ad hoc queries in an Apache Cassandra cluster. Specifically, a query with multiple conditions was timing out due to the coordinator node not receiving responses from replica nodes. Process: Here are the steps taken by the expert in the process of resolving the problem: Assessment of Query Efficiency: […]
Database 13 Jun 2024 Resolving Connection Error in cqlsh with SSL
Problem: A client encountered connection errors while attempting to connect to a Cassandra database cluster using cqlsh. The error messages indicated issues related to protocol version compatibility (“This version of the driver does not support protocol version 21”) and SSL certificate verification failures. Process: Initial Diagnosis: The support team analyzed the error messages provided by […]
Database 7 Jun 2024 Application crashes with ‘could not send data to the client: Connection reset by peer’
Problem: The client’s billing system experienced around 10 billing cycles each month. During each billing cycle, the billing process was initiated and connected to the database cluster with 20 concurrent streams. However, upon starting the process, both the replication and the billing process failed, displaying the following error: “Could not send data to the client: […]
Database 7 Jun 2024 Implementing SSL Communication in a Patroni/etcd/Postgres Cluster
Problem: The client seeks to configure SSL communication within an existing Patroni/etcd/Postgres cluster, specifically aiming to switch to HTTPS in the Patroni configuration file to secure communication between components. Solution: After a thorough analysis, the following recommendations were made: Certificate Generation: Utilize OpenSSL or obtain a certificate from a trusted Certificate Authority (CA). For self-signed […]
Database 18 May 2024 Resolving High Read Latency in Production Cluster: A Comprehensive Troubleshooting Approach
Problem: The client is experiencing high read latency in their production cluster monitoring. They are seeking assistance in identifying the cause of this latency and resolving it to prevent potential outages. Process: Steps and measures undertaken to investigate the issue: Initial Assessment: Requested logs/config from all nodes. Observed server overload or potential network issues. Configuration […]
Database