Case Studies Archives - Page 25 of 26 - Proactive Insights and Support For Open-Source Applications

8 Apr 2024 Troubleshooting Connectivity Issues in a Cassandra Cluster

Problem: The client had trouble connecting to cqlsh on several nodes in their Cassandra cluster. In addition, the output of “nodetool status” differed across nodes, with certain nodes appearing as down. The client provided output files for analysis so that the connectivity issues could be diagnosed and fixed. Process: Check Network Connectivity: […]

Database

7 Mar 2024 Resolving NGINX Ingress Error During Helm Chart Installation

Problem: The client encountered an error during the installation of a Helm chart, specifically related to the NGINX admission controller. The installation failed due to a validation error in NGINX Ingress, indicating a synchronization issue with the ingress event handlers. This problem arose when multiple ingresses used the same secret, causing complications with secret refreshing […]

Application Development

5 Mar 2024 Scaling Airflow and Spark on Kubernetes with Hossted

Introduction The installation of Airflow and Apache Spark can be fine-tuned for optimal performance by adjusting over 150 environment variables, thereby maximizing the number of DAGs running and fully utilizing allocated resources. During recent hosted support sessions with an ISV that develops software for telecommunications companies, we encountered multiple challenges in effectively scaling their Airflow deployment. Specifically, we focused on optimizing […]

Case Studies

4 Mar 2024 Resolving Grafana Alerting Issues

Problem The client faced a critical issue with their Grafana setup: Grafana alerts were failing to trigger when configured thresholds were breached, and the “TEST ALERT” feature consistently resulted in a “NO DATA” message. Process Step 1: Initial Investigation To address this issue, a multi-step approach was taken. In the initial investigation, a meeting was […]

Case Studies

15 Nov 2023 Resolving Memory Consumption Issues in PostgreSQL Cluster

Problem: The client reported high memory consumption on both leader (Node 1) and replica (Node 2) nodes in PostgreSQL version 13. Memory utilization on both nodes was observed to be significantly elevated. On Node 1, high memory usage was associated with PostgreSQL processes such as checkpoint and background writer operations, while Node 2 was undergoing […]

Database

13 Nov 2023 Resolving Cassandra CorruptSSTableException Issue

Problem: The client reported that Cassandra 2.2.5, running in a single-node configuration, crashed and failed to start. The error logs pointed to a “CorruptSSTableException” in the “system_traces.events” table, indicating corruption in an SSTable file. Since this was a critical system table, the client needed a way to bypass the corrupted data and restart the Cassandra […]

Database

11 Nov 2023 Resolution of Cassandra Cluster Pending Tasks Issue

Problem: The client reported a critical increase in pending tasks on one of the nodes within their Cassandra cluster. The client was concerned and sought assistance in understanding the root cause and implementing a resolution. Process: The client initially executed the nodetool compactionstats -H command on the affected node and restarted it, […]

Database

8 Nov 2023 Stabilizing Applications Amidst DNS Challenges: Tackling Intermittent Connection Issues in Kubernetes

Problem: The customer is facing intermittent connection issues in their application, resulting in “connection refused” errors in the logs. These errors are linked to DNS resolution failures and connection timeouts. As a temporary fix, the customer is restarting the CoreDNS pod every hour, […]

Developer Tools

6 Nov 2023 Resolving Cassandra Node Crashes Due to Heavy Server Activity and Interface Instability

Problem: The client reported recurring crashes of a Cassandra node with errors related to “too many open files”. Despite increasing the maximum open files limit, the issue persisted. The problem was observed primarily during high server load, with regular crashes around 01:15 AM. The client suspected that network instability or heavy operations, such as running […]

Database