Problem: The client observed inconsistent behavior in Elasticsearch search results when searching for strings containing reserved characters, such as colons, slashes, parentheses, and curly braces. These inconsistencies were most notable when the query string included special characters without proper escaping or when using quotes around the search values. This caused mismatches in expected results, with […]
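Escaping is the usual first step for this class of problem. The sketch below is a hypothetical helper, `escape_query_string`, that backslash-escapes the characters the Elasticsearch `query_string` syntax treats as reserved. It also drops `<` and `>`, which that syntax cannot escape at all. It assumes the value is sent through a `query_string` query and is not necessarily the fix applied in this case.

```python
# Characters the query_string syntax treats as operators, per the Elasticsearch
# query_string documentation. "&" and "|" only matter when doubled ("&&", "||"),
# but escaping each one individually is harmless.
_RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(text: str) -> str:
    """Backslash-escape reserved characters and remove '<' and '>'.

    '<' and '>' cannot be escaped in query_string syntax, so removing them
    is the only way to stop them from being parsed as range operators.
    """
    out = []
    for ch in text:
        if ch in "<>":
            continue  # not escapable: drop it
        if ch in _RESERVED:
            out.append("\\")  # prefix with a single backslash
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_query_string("path:/var/log (app)"))
```

When the escaped string is placed in a JSON request body, let the JSON serializer add its own layer of escaping rather than doubling backslashes by hand.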
Database 14 Feb 2025 Improving Cassandra Performance by Adjusting Consistency Levels and Resource Configuration
Problem: The customer experienced issues with their Cassandra database, specifically write failures and slow performance during nodetool repair operations. These issues affected the application’s ability to interact with the database, causing delays and failed writes. The Cassandra cluster, consisting of 3 nodes in each of two data centers (US East […]
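The quorum arithmetic behind consistency-level tuning is small enough to check directly. This sketch assumes a replication factor of 3 in each of two data centers (the excerpt gives node counts, not the keyspace replication factor), and the data center names `dc1` and `dc2` are placeholders. QUORUM counts replicas across all data centers, while LOCAL_QUORUM counts only those in the coordinator's local data center.

```python
from typing import Dict

LOCAL_LEVELS = {"LOCAL_ONE", "LOCAL_QUORUM"}


def replicas_required(level: str, rf_per_dc: Dict[str, int], local_dc: str) -> int:
    """Replica acknowledgements a write needs at the given consistency level."""
    total_rf = sum(rf_per_dc.values())
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level == "QUORUM":
        return total_rf // 2 + 1  # majority of all replicas, every DC included
    if level == "LOCAL_QUORUM":
        return rf_per_dc[local_dc] // 2 + 1  # majority of the local DC only
    if level == "ALL":
        return total_rf
    raise ValueError(f"level not modelled in this sketch: {level}")


def tolerated_failures(level: str, rf_per_dc: Dict[str, int], local_dc: str) -> int:
    """Replicas that may be unavailable while writes at this level still succeed."""
    pool = rf_per_dc[local_dc] if level in LOCAL_LEVELS else sum(rf_per_dc.values())
    return pool - replicas_required(level, rf_per_dc, local_dc)


if __name__ == "__main__":
    rf = {"dc1": 3, "dc2": 3}  # hypothetical: RF 3 in each data center
    for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
        print(level, replicas_required(level, rf, "dc1"), tolerated_failures(level, rf, "dc1"))
```

With RF 3 per data center, QUORUM needs 4 of 6 replicas, so at least one acknowledgement must come from the remote data center; LOCAL_QUORUM needs only 2 of the 3 local replicas and avoids that cross-datacenter wait.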
Database 7 Feb 2025 Diagnosing and Resolving SSLPeerUnverifiedException in Apache Cassandra
Problem: The client reported an issue where Apache Cassandra nodes in their multi-datacenter cluster were logging frequent errors related to SSL certificate validation, with entries like:
DEBUG [Native-Transport-Requests-1] 2025-01-23 08:54:27,794 ServerConnection.java:140 - Failed to get peer certificates for peer /10.110.151.78:36376
javax.net.ssl.SSLPeerUnverifiedException: peer not verified
Despite these log entries, the cluster continued to function normally, but […]
Database 24 Jan 2025 Apache Cassandra: Resolving High Memory Usage Issue
Problem: The client reported high memory usage on a production Apache Cassandra node, accompanied by frequent errors related to the ThreadPoolExecutor shutting down. This led to instability in the Cassandra service, including errors like java.util.concurrent.RejectedExecutionException, and caused repairs to fail.
Process:
Step 1: Initial Identification
The error logs provided by the client […]
Database 19 Jan 2025 Apache Spark: Resolving Airflow Scheduler Heartbeat Issues in Production Environment
Problem: The client reported continuous heartbeat issues in the Airflow scheduler, which prevented controller DAGs from being generated in a production environment. This critical issue affected job execution, especially when multiple jobs were triggered simultaneously, leading to timeouts and job failures.
Process:
Step 1: Initial Identification
The error message displayed in the logs indicated that the […]
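One way to reason about scheduler heartbeat problems is to compare the age of the latest heartbeat with the configured thresholds. The sketch below is a simplified model, not Airflow's own code: it assumes the defaults `scheduler_heartbeat_sec = 5` and `scheduler_health_check_threshold = 30` from the `[scheduler]` section of `airflow.cfg`, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_SEC = 5          # assumed default of [scheduler] scheduler_heartbeat_sec
HEALTH_THRESHOLD_SEC = 30  # assumed default of [scheduler] scheduler_health_check_threshold


def scheduler_is_healthy(last_heartbeat: datetime, now: datetime,
                         threshold_sec: int = HEALTH_THRESHOLD_SEC) -> bool:
    """Simplified rule: healthy while the last heartbeat is younger than the threshold."""
    return now - last_heartbeat < timedelta(seconds=threshold_sec)


def missed_heartbeats(last_heartbeat: datetime, now: datetime,
                      heartbeat_sec: int = HEARTBEAT_SEC) -> int:
    """Whole heartbeat intervals that have elapsed without a new heartbeat."""
    return int((now - last_heartbeat).total_seconds() // heartbeat_sec)


if __name__ == "__main__":
    now = datetime(2025, 1, 19, 12, 0, 0, tzinfo=timezone.utc)
    last = now - timedelta(seconds=42)  # hypothetical: no heartbeat for 42 s
    print(scheduler_is_healthy(last, now), missed_heartbeats(last, now))
```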
Data Analytics 17 Jan 2025 Resolving Special Character Search Issues in Elasticsearch
Problem: The client encountered an issue in their Elasticsearch setup where search results did not return exact matches when the search phrase included special characters, such as “:” (colon). This problem persisted despite using a custom indexing configuration with the `index_word_delimiter_graph_filter`. The client needed a solution to preserve special characters for exact matches while maintaining […]
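A common pattern for this requirement is to keep a delimiter-based analyzer for general search and add a `keyword` sub-field for exact matches. The sketch below builds such an index body as a Python dict. `index_word_delimiter_graph_filter` is the filter name from the excerpt; the analyzer name `delimiter_analyzer`, the field `title`, and its sub-field `raw` are hypothetical. Setting `preserve_original` keeps the unsplit token (for example `foo:bar`) alongside its parts. This is one possible configuration, not necessarily the one the client adopted.

```python
from typing import Any, Dict

FILTER_NAME = "index_word_delimiter_graph_filter"  # name taken from the excerpt


def build_index_body() -> Dict[str, Any]:
    """Index settings and mappings: analyzed text field plus an exact-match keyword sub-field."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    FILTER_NAME: {
                        "type": "word_delimiter_graph",
                        "preserve_original": True,  # keep "foo:bar" as well as "foo", "bar"
                    }
                },
                "analyzer": {
                    "delimiter_analyzer": {  # hypothetical analyzer name
                        "type": "custom",
                        # whitespace tokenizer hands punctuation to the filter intact
                        "tokenizer": "whitespace",
                        "filter": [FILTER_NAME, "lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {  # hypothetical field name
                    "type": "text",
                    "analyzer": "delimiter_analyzer",
                    "fields": {"raw": {"type": "keyword"}},  # unanalyzed copy for exact matches
                }
            }
        },
    }


def exact_match_query(value: str, field: str = "title.raw") -> Dict[str, Any]:
    """Term query against the keyword sub-field; the value is not analyzed."""
    return {"query": {"term": {field: value}}}
```

Term queries against a `keyword` field bypass analysis and are case-sensitive, so the query value must match the stored value exactly, colon included.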
Data Analytics 15 Jan 2025 Resolving Kubelet Certificate Expiration Issues in Rancher Clusters
Problem: The client’s production environment includes Rancher installed on two clusters: a Rancher cluster and an application cluster. During the cluster setup, the kubelet certificate was generated with a validity of one year, which recently expired. According to the Rancher RKE documentation, additional configuration is needed to manage certificate validity. The client observed inconsistencies: Some […]
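Because the certificate was issued with a one-year validity, a periodic expiry check flags the next expiry ahead of time. A minimal sketch, assuming the end date has already been read with `openssl x509 -noout -enddate -in <cert>` and the `notAfter=` prefix stripped; the helper name `days_until_expiry` is hypothetical.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Days left before a certificate expires (negative if already expired).

    not_after uses the format openssl prints, e.g. "Jan 15 10:00:00 2026 GMT";
    strptime treats the format's spaces as one-or-more whitespace, so
    openssl's space-padded days ("Feb  5") also parse.
    """
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)  # openssl reports GMT
    return (expires - now).days


if __name__ == "__main__":
    now = datetime(2025, 1, 15, tzinfo=timezone.utc)
    print(days_until_expiry("Jan 15 10:00:00 2026 GMT", now))
```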
Developer Tools 13 Jan 2025 Ensuring Zero Downtime: Upgrading Apache Cassandra from 3.11.7 to 4.1.0
Problem: The client needed to upgrade their production Apache Cassandra system from version 3.11.7 to 4.1.0 to take advantage of new features and improvements. They requested guidance on upgrade procedures and a reliable source for downloading the required RPM. Zero downtime was a critical requirement to ensure uninterrupted operations during the upgrade process.
Process:
Step […]
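Zero-downtime Cassandra upgrades are normally done as a rolling upgrade, one node at a time, so the remaining replicas keep serving requests; this only avoids downtime when the replication factor and consistency levels tolerate one node being offline. The sketch below only generates an ordered checklist. Node and data center names are hypothetical, and the steps are the generic sequence (`nodetool drain`, package upgrade, restart, confirm the node is back as `UN` in `nodetool status`, then `nodetool upgradesstables` once every node runs the new version), not the exact procedure from this case.

```python
from typing import Dict, List


def rolling_upgrade_plan(nodes_by_dc: Dict[str, List[str]]) -> List[str]:
    """Ordered checklist: upgrade one node at a time, rewrite SSTables at the end."""
    steps: List[str] = []
    for dc, nodes in nodes_by_dc.items():
        for node in nodes:
            tag = f"[{dc}/{node}]"
            steps += [
                f"{tag} nodetool drain",  # flush memtables, stop accepting writes
                f"{tag} stop cassandra service",
                f"{tag} install 4.1.0 package and merge cassandra.yaml changes",
                f"{tag} start cassandra service",
                f"{tag} wait until nodetool status reports UN",  # Up/Normal before moving on
            ]
    # Only after every node runs 4.1.0: rewrite SSTables in the new on-disk format.
    for dc, nodes in nodes_by_dc.items():
        for node in nodes:
            steps.append(f"[{dc}/{node}] nodetool upgradesstables")
    return steps


if __name__ == "__main__":
    for step in rolling_upgrade_plan({"dc1": ["node1", "node2"], "dc2": ["node3"]}):
        print(step)
```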
Database 10 Jan 2025 Mitigating Frequent Docker Swarm Re-Elections by Adjusting Timeout Parameters
Problem: The client faced frequent leader re-elections in a Docker Swarm cluster whenever brief server-level disruptions occurred. They sought guidance on modifying the swarm election timeout to stabilize the cluster and prevent unnecessary re-elections. Additionally, they wanted to understand the relationship between the election timeout, heartbeat, and dispatcher-heartbeat settings.
Process:
Step 1: Initial Investigation […]
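Docker Swarm managers use the Raft consensus protocol, and the Docker Engine API exposes its timing as `ElectionTick` (default 10) and `HeartbeatTick` (default 1) in the swarm spec's Raft settings; Docker's documentation notes that a tick currently corresponds to about one second but that this is not guaranteed. The dispatcher heartbeat (`docker swarm update --dispatcher-heartbeat`, default 5s) is separate: it governs how often nodes report to the managers' dispatcher, not leader elections. The sketch below models these settings under the one-second-tick assumption; the class and method names are hypothetical.

```python
from dataclasses import dataclass

TICK_SECONDS = 1.0  # assumption: Docker docs say a tick is currently ~1 s, not guaranteed


@dataclass
class SwarmRaftTimings:
    election_tick: int = 10                # ElectionTick default
    heartbeat_tick: int = 1                # HeartbeatTick default
    dispatcher_heartbeat_sec: float = 5.0  # --dispatcher-heartbeat default; not part of Raft

    def validate(self) -> None:
        # Docker requires ElectionTick to be greater than HeartbeatTick.
        if self.election_tick <= self.heartbeat_tick:
            raise ValueError("election_tick must be greater than heartbeat_tick")

    def election_timeout_sec(self) -> float:
        # Time a follower waits without hearing from the leader before starting an election.
        return self.election_tick * TICK_SECONDS

    def raft_heartbeat_sec(self) -> float:
        # Interval at which the leader sends Raft heartbeats to followers.
        return self.heartbeat_tick * TICK_SECONDS

    def outage_triggers_election(self, outage_sec: float) -> bool:
        """True if a leader silence of outage_sec reaches the election timeout."""
        return outage_sec >= self.election_timeout_sec()


if __name__ == "__main__":
    default = SwarmRaftTimings()
    tuned = SwarmRaftTimings(election_tick=30)  # hypothetical tuned value
    for timings in (default, tuned):
        timings.validate()
        print(timings.election_timeout_sec(), timings.outage_triggers_election(15))
```

Raising the election tick trades responsiveness for stability: brief disruptions no longer cause elections, but a leader that has genuinely failed is also replaced more slowly.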
Developer Tools