Problem: In the production environment of a multi-node OpenSearch cluster, the nodes frequently crashed due to Out-of-Memory (OOM) errors. Initially, the heap size was increased from 16 GB to 30 GB based on IBM’s recommendations, but the problem persisted. IBM further suggested increasing the number of shards from 16 to 64 to mitigate memory overload. […]
Data Analytics
20 Nov 2024 PostgreSQL: Replication Failure in Patroni Cluster
Problem: The client reported a replication issue in their A1 BG Production environment, consisting of a Patroni cluster with two PostgreSQL instances (Leader and Replica). Replication stopped, causing the leader’s /pgcluster file system to fill up with pg_wal files, leading to a full disk. The client requested help to identify the root cause of the […]
Database
18 Nov 2024 Apache Cassandra: Addressing High CPU Utilization After Upgrade
Problem: Following an upgrade from Cassandra 4.0.9 to 4.1.3, the client reported a noticeable increase in CPU utilization. The average CPU usage on their systems jumped from around 20% to approximately 37%. This escalation in CPU usage adversely impacted system performance and stability. The issue was notably more severe on servers running Red Hat Enterprise […]
Database
15 Nov 2024 Resolving HBase Region Transition and Hadoop File System Permission Issues in a PROD Environment
Problem: The client encountered a critical issue in their production environment involving HBase regions stuck in a transition state. This problem resulted in service disruptions within their Hadoop cluster. The issue was exacerbated by file system permission changes following a cold restart of the cluster, leading to difficulties in accessing data and managing HBase operations. […]
Data Analytics
13 Nov 2024 Resolving PostgreSQL Failover and Transaction File Access Issue
Problem: After performing a manual failover in PostgreSQL, the client encountered the following error when running a query on a partitioned table ‘ac1_control’: ERROR: could not access the status of transaction 613182547; DETAIL: Could not open file ‘pg_xact/0248’: No such file or directory. Despite restarting the PostgreSQL instance, the issue persisted. The client was operating […]
Database
8 Nov 2024 Upgrade of Elasticsearch from Version 7.15 to 7.17
Problem: The client requested assistance with upgrading their Elasticsearch installation from version 7.15 to 7.17. The client sought a detailed step-by-step guide and expressed the need for a meeting to clarify the upgrade process. Process: Upon receiving the request, the expert requested additional details about the client’s current Elasticsearch setup, including information on the cluster, […]
Data Analytics
6 Nov 2024 Resolving SSL Configuration Issues for a Multi-Node OpenSearch Cluster
Problem: The client experienced issues setting up a second node for high availability in their OpenSearch cluster. While the first node worked properly, the second node encountered SSL handshake errors when both nodes were used simultaneously. Process: The expert reviewed the client’s YML configurations and SSL certificate setups. They provided detailed feedback on correcting SSL […]
Data Analytics
4 Nov 2024 Enhancing Airflow Security
Problem: The client reported an issue and requested support to address potential vulnerabilities. A security issue was identified in an Apache Airflow instance, version 2.5.0, involving the absence of an account lockout mechanism. Solution: Our expert identified two key measures to enhance Airflow security and responded to the client’s request with the following recommendations: Implementing […]
Data Analytics
2 Nov 2024 Resolving ConfigMap Storage Limit in Helm
Problem: The client reported an issue with the hard limit of 1MB for ConfigMap storage in Helm, which was causing problems with their deployment process. This limitation hindered their ability to store large configurations, necessitating a solution that could accommodate their growing data needs. Solution: To address the issue, the expert initiated an in-depth investigation. […]
Developer Tools
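As a rough illustration of the ConfigMap size limit mentioned in the Helm case above, the sketch below estimates whether a set of configuration entries would fit under a 1 MiB cap before deployment. It is not taken from the case itself: the function names are hypothetical, and counting both keys and values is a conservative assumption that may differ from the API server's exact accounting.

```python
# Minimal sketch (assumption, not the expert's actual solution):
# estimate whether configuration data fits in a single ConfigMap.

# Kubernetes caps the data stored in one ConfigMap at 1 MiB.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def configmap_payload_size(data: dict) -> int:
    """Return the combined UTF-8 size of all keys and values, in bytes.

    Counting keys as well as values is a conservative estimate.
    """
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in data.items()
    )


def fits_in_configmap(data: dict, limit: int = CONFIGMAP_LIMIT_BYTES) -> bool:
    """Return True if the estimated payload is within the size limit."""
    return configmap_payload_size(data) <= limit


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    small = {"app.yaml": "replicas: 3\n"}
    large = {"big.json": "x" * (1024 * 1024)}
    print(fits_in_configmap(small))
    print(fits_in_configmap(large))
```

A check like this could run in a pre-deployment step so that oversized configuration is caught before the chart is installed.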