Problem: The client reported encountering memory-related issues and tombstone cell messages in the system.log file of their Apache Cassandra deployment. Notable log entries included warnings about maximum memory usage, tombstone cells, and concerns about a hanging repair process. The issue seemed to have improved temporarily after increasing the heap size but resurfaced after two days. […]
Database 9 Apr 2024 Compatibility Issue between OpenJPA 3.2.2 and Spring Boot 3.1.x
Problem: The challenge arises when attempting to upgrade from Spring Boot (SB) version 2.7.x to SB 3.1.x due to compatibility issues with OpenJPA. No compatible version of OpenJPA has been found for Spring Boot 3.1.x. Upgrading to Spring Boot 3.1.x presents hurdles in terms of OpenJPA compatibility. The current version of OpenJPA lacks support for […]
Application Development 9 Apr 2024 Troubleshooting Excessive Commit Log Accumulation Leading to Cassandra Cluster Failures
Problem: Commit logs in the production Cassandra cluster are accumulating excessively without being deleted, leading to a full filesystem and subsequent database crashes. Process: Step 1: Hardware Specifications and Disk Space: Requested hardware specifications for each Cassandra node. Checked disk space on all nodes using the df -h command. Step 2: System […]
Database 8 Apr 2024 Resolving a Critical PostgreSQL Database Locking Issue
Problem: The client encountered a critical issue where lightweight lock (LWLock) contention at the PostgreSQL database level, specifically on the MultiXactOffsetControlLock, led to a complete outage of the production system. This resulted in 255 sessions being locked from 18:30 GMT to 22:30 GMT, causing a significant impact on the system’s availability. Solution: PostgreSQL Version and Configuration: The PostgreSQL […]
Database 8 Apr 2024 Troubleshooting Connectivity Issues in a Cassandra Cluster
Problem: The client encountered challenges connecting to cqlsh on several nodes within their Cassandra cluster. Additionally, discrepancies were noted in the output of “nodetool status” across different nodes, with certain nodes appearing as down. Seeking assistance, the client provided output files for analysis, prompting intervention to rectify the connectivity issues. Process: Check Network Connectivity: […]
Database 7 Mar 2024 Resolving NGINX Ingress Error During Helm Chart Installation
Problem: The client encountered an error during the installation of a Helm chart, specifically related to the NGINX admission controller. The installation failed due to a validation error in NGINX Ingress, indicating a synchronization issue with the ingress event handlers. This problem arose when multiple ingresses used the same secret, causing complications with secret refreshing […]
Application Development 5 Mar 2024 Scaling Airflow and Spark on Kubernetes with Hossted
Introduction: The installation of Airflow and Apache Spark can be fine-tuned for optimal performance by adjusting over 150 environment variables, thereby maximizing the number of DAGs running and fully utilizing allocated resources. During recent hosted support sessions with an ISV that develops software for telecommunications companies, we encountered multiple challenges in effectively scaling their Airflow deployment. Specifically, we focused on optimizing […]
Case Studies 4 Mar 2024 Resolving Grafana Alerting Issues
Problem: The client faced a critical issue with their Grafana setup: Grafana alerts were failing to trigger when configured thresholds were breached, and the “TEST ALERT” feature consistently resulted in a “NO DATA” message. Process: Step 1: Initial Investigation: To address this issue, a multi-step approach was taken. In the initial investigation, a meeting was […]
Case Studies 15 Nov 2023 Resolving Memory Consumption Issues in PostgreSQL Cluster
Problem: The client reported high memory consumption on both leader (Node 1) and replica (Node 2) nodes in PostgreSQL version 13. Memory utilization on both nodes was observed to be significantly elevated. On Node 1, high memory usage was associated with PostgreSQL processes such as checkpoint and background writer operations, while Node 2 was undergoing […]
Database