Problem: The client faced intermittent downtime in their PostgreSQL cluster, which is managed by Patroni for high availability. The downtime was most pronounced during failover events, when the system failed to transition smoothly between nodes during leader elections. As a result, PostgreSQL could not maintain continuity of service, which degraded application performance. Logs from […]
Database 21 Feb 2025 Resolving Nexus Image Deletion Issue
Problem: The client experienced a problem where one of the images in their Nexus Repository was deleted unexpectedly without any trace. The client needed assistance in answering the following questions: How was the image deleted and is it possible to recover it? How can future abrupt deletions of images be prevented? How can Nexus logging […]
Developer Tools 17 Feb 2025 Inconsistency in Search Results of Elasticsearch with Reserved Characters
Problem: The client observed inconsistent behavior in Elasticsearch search results when searching for strings containing reserved characters, such as colons, slashes, parentheses, and curly braces. These inconsistencies were most notable when the query string included special characters without proper escaping or when using quotes around the search values. This caused mismatches in expected results, with […]
Database 14 Feb 2025 Improving Cassandra Performance by Adjusting Consistency Levels and Resource Configuration
Problem: The customer experienced write failures and slow performance in their Cassandra database during nodetool repair operations. These issues were affecting the application’s ability to interact with the database, resulting in delays and failed writes. The Cassandra cluster, consisting of 3 nodes in each of two data centers (US East […]
Database 7 Feb 2025 Diagnosing and Resolving SSLPeerUnverifiedException in Apache Cassandra
Problem: The client reported an issue where Apache Cassandra nodes in their multi-datacenter cluster were logging frequent errors related to SSL certificate validation, with entries like:
DEBUG [Native-Transport-Requests-1] 2025-01-23 08:54:27,794 ServerConnection.java:140 - Failed to get peer certificates for peer /10.110.151.78:36376
javax.net.ssl.SSLPeerUnverifiedException: peer not verified
Despite these log entries, the cluster continued to function normally, but […]
Database 24 Jan 2025 Apache Cassandra: Resolving High Memory Usage Issue
Problem: The client reported high memory usage on a production Apache Cassandra node, accompanied by frequent errors related to the ThreadPoolExecutor shutting down. This led to instability in the Cassandra service, including errors like java.util.concurrent.RejectedExecutionException, and resulted in a failure to execute repairs.
Process:
Step 1: Initial Identification
The error logs provided by the client […]
Database 19 Jan 2025 Apache Spark: Resolving Airflow Scheduler Heartbeat Issues in Production Environment
Problem: The client reported continuous heartbeat issues in the Airflow scheduler, causing failure to generate controller DAGs in a production environment. This critical issue impacted job execution, especially when multiple jobs were triggered simultaneously, leading to timeouts and job failures.
Process:
Step 1: Initial Identification
The error message displayed in the logs indicated that the […]
Data Analytics 17 Jan 2025 Resolving Special Character Search Issues in Elasticsearch
Problem: The client encountered an issue in their Elasticsearch setup where search results did not return exact matches when the search phrase included special characters, such as “:” (colon). This problem persisted despite using a custom indexing configuration with the `index_word_delimiter_graph_filter`. The client needed a solution to preserve special characters for exact matches while maintaining […]
Data Analytics 15 Jan 2025 Resolving Kubelet Certificate Expiration Issues in Rancher Clusters
Problem: The client’s production environment includes Rancher installed on two clusters: a Rancher cluster and an application cluster. During the cluster setup, the kubelet certificate was generated with a validity of one year, which recently expired. According to the Rancher RKE documentation, additional configuration is needed to manage certificate validity. The client observed inconsistencies: Some […]
Developer Tools