Problem: The client implemented a 4-node OpenSearch cluster to ensure high availability for their application. When all four nodes were operational, both indexing and searching worked seamlessly. However, during a high availability test where two nodes were intentionally turned off, the indexing process stalled, and no documents were processed. Indexing resumed only after the two […]
Data Analytics 1 Jan 2025
High Resource Utilization from Istio Sidecar Containers
Problem: The client, a FinTech company managing thousands of microservices with Istio in sidecar proxy mode, faced high CPU and memory utilization. This was caused by the overhead of the Istio sidecars, which were handling: traffic encryption and decryption with mTLS; traffic routing (rate limiting, retries) and policy management; telemetry generation for monitoring and tracing tools. […]
Communication 30 Dec 2024
Resolving Timeout Issues for Internal Services in Istio-Managed EKS Clusters
Problem: The client used Istio to manage service communication in a distributed microservices architecture. Centralized services, including GitLab, Keycloak, Vault, and others, were hosted in an Amazon EKS cluster and accessed via a WireGuard-based VPN mesh (NetBird) from 10 external Kubernetes clusters. Although all services were exposed through Istio ingress gateways, external clusters experienced frequent […]
Communication 27 Dec 2024
Optimizing DNS Resolution and Resolving Readiness Delays in Kubernetes with Istio and Crossplane
Problem: The client reported delays in the readiness of ingress virtual services and difficulty accessing services through DNS names. Despite the use of Istio for service-to-service communication and for centralized services such as Keycloak, GitLab, Vault, and others, the setup was taking too long, especially when resolving DNS names for these services. The delay was primarily due to Crossplane […]
Communication 25 Dec 2024
Migration and Upgrade of Cassandra Cluster from On-Premises to AWS
Problem: The client planned to migrate a 5-node Cassandra cluster from an on-premises environment (version 3.11.8) to AWS (target version 4.1.5). The client requested guidance on the best migration strategy to ensure no downtime, as well as information on backup and restore procedures for the migration. Solution: The expert recommended a step-by-step approach. First […]
Database 23 Dec 2024
Optimizing Performance in a Cassandra Cluster Experiencing High CPU Usage
Problem: The client experienced uneven data distribution across nodes after adding new nodes to an existing Cassandra cluster. The “nodetool status” output showed that the new nodes were not receiving as much data as the existing ones, resulting in significant data discrepancies between nodes. The client sought assistance in understanding why […]
Database 20 Dec 2024
Resolving Airflow DAG Triggering Issues
Problem: The client’s operations team reported issues with triggering jobs in Apache Airflow, specifically through a custom solution, the dag_factory. Jobs triggered outside of the dag_factory worked without problems, but those initiated through it were not being processed as expected. Attempts to gather logs in the Airflow UI yielded no entries, as the DAG triggering […]
Data Analytics 18 Dec 2024
Data Synchronization Issue in Cassandra Cluster After Adding a New Data Center
Problem: The client reported a critical issue with the Cassandra cluster after adding a new data center and a rack containing three nodes. Despite bringing the new data center online, no data was being transferred from the source data center. Additionally, attempts to run a repair operation on the nodes were unsuccessful, which prevented the […]
Database 16 Dec 2024
Resolving Data Consistency Issues in Cassandra When Adding a New Data Center
Problem: The client needed to add a new data center to their existing Cassandra cluster for a critical project. However, when Cassandra was started on the new server, it shut down with an error because a required node was offline. The error message, “A node required to move the data consistently is down,” indicated an […]
Database