Problem: The client planned to migrate a 5-node Cassandra cluster from an on-premises environment (version 3.11.8) to AWS (target version 4.1.5). The client requested guidance on the best migration strategy that ensures no downtime. Additionally, the client requested information on backup and restore procedures for the migration. Solution: The expert recommended a step-by-step approach. First […]
Database 23 Dec 2024 Optimizing Performance in a Cassandra Cluster Experiencing High CPU Usage
Problem: The client experienced issues with uneven data distribution across nodes after adding new nodes to an existing Cassandra cluster. Upon reviewing the “nodetool status” output, it was observed that new nodes were not receiving equal data compared to existing ones, resulting in significant data discrepancies between nodes. The client sought assistance in understanding why […]
Database 18 Dec 2024 Data Synchronization Issue in Cassandra Cluster After Adding a New Data Center
Problem: The client reported a critical issue with the Cassandra cluster after adding a new data center and a rack containing three nodes. Despite bringing the new data center online, no data was being transferred from the source data center. Additionally, attempts to run a repair operation on the nodes were unsuccessful, which prevented the […]
Database 16 Dec 2024 Resolving Data Consistency Issues in Cassandra When Adding a New Data Center
Problem: The client needed to add a new data center to their existing Cassandra DB cluster for a critical project. However, upon starting Cassandra on the new server, it encountered a shutdown error due to a required node being offline. The error message, “A node required to move the data consistently is down,” indicated an […]
Database 11 Dec 2024 Addressing SSTable Corruption and Data Migration Challenges in Cassandra Environments
Problem: The client is encountering an “SSTable corruption” issue when starting Cassandra in a new PLAB environment created using a CloudFormation template. After copying EBS volumes from a disaster recovery (DR) environment and making necessary adjustments in the cassandra.yaml file, they receive a series of NullPointerExceptions related to the SSTableReader while attempting to open SSTables. […]
Database 9 Dec 2024 Troubleshooting Authentication Failures and Node Reattachment in Pgpool-II Setup
Problem: The client experienced an authentication failure during health checks in their Pgpool-II setup, which led to a failover event. Despite updating the password in pool_passwd and pgpool.conf using the pg_md5 utility, the client continued to face the same issue. They observed that after failing over the node due to the authentication issue, they successfully […]
Database 6 Dec 2024 Proactive Monitoring and Support for Apache Cassandra During iPhone Launch Event
Problem: The client is preparing for an iPhone launch event, anticipating traffic spikes up to 200%. They require proactive monitoring of their Apache Cassandra production system during specified timeframes, with an upgrade to Severity 1 for immediate response during those periods. Process: Ticket Acknowledgment: We confirmed availability for the requested support dates and asked for […]
Database 2 Dec 2024 Resolving Cassandra Query Timeout Issues: Optimizing Performance and Ensuring Stability
Problem: The client reported encountering a request timeout error when querying the PLDT Cassandra database in a production environment. The specific query involved selecting records from the jesi.service_monitoring table, which was attached along with a screenshot for further context. Process: Upon receiving the issue, the support team initiated an investigation. They first inquired about the […]
Database 29 Nov 2024 Rolling Upgrade of ETCD and Patroni Nodes in a Multi-Node PostgreSQL Cluster
Problem: The client wanted to perform a rolling upgrade of the underlying operating system from RHEL 7 to RHEL 9 for their ETCD nodes in a Patroni-managed PostgreSQL cluster. The cluster contained three ETCD nodes and three Patroni-managed PostgreSQL instances (one primary and two standby). With a Recovery Point Objective (RPO) and Recovery Time Objective […]