Problem: The client had a 6-node multi-DC setup for Cassandra (3 nodes in PROD – East US2 and 3 nodes in DR – West US2) and needed to alter a keyspace. The keyspace was initially defined as follows: CREATE KEYSPACE bulk_api_management WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true; The client wanted […]
Database 30 Aug 2024
Resolving Certificate Issues with Podman and Local Nexus Registry
Problem: A client faced difficulties downloading images from their local Nexus repository using Podman. Despite several troubleshooting attempts, including adding the registry to insecure registries and adding the certificate locally, the issue persisted. The specific error encountered was related to certificate validation. Process: Initial Troubleshooting: The client added the Nexus registry to the list of […]
Developer Tools 30 Aug 2024
Proper Shutdown Procedures for Cassandra to Ensure Data Integrity
Problem: A client was experiencing data inconsistencies with the Lucene index on Cassandra and suspected improper shutdown procedures as the root cause. The `kill -9` command was used to shut down Cassandra, which led to concerns that data was not being written to disk properly. The client sought guidance on the best way to shut […]
Database 28 Aug 2024
Optimizing Elasticsearch Query Performance for Large Documents
Problem: The client faced significant delays in executing Elasticsearch queries within their production environment. A particular query, which involved a simple numeric account identifier, took an alarming 68 seconds to execute, despite returning only six hits. The total size of the query output was 583KB, yet the Elasticsearch profiler indicated that 67 seconds of this […]
Data Analytics 28 Aug 2024
Seamlessly Transition from PodPreset to Admission Webhooks: Overcoming Kubernetes Upgrade Hurdles
Problem: The customer upgraded their Kubernetes cluster from version 1.19 to 1.24.8. Following this upgrade, they lost access to the PodPreset feature, which was removed in Kubernetes version 1.20. The customer needed a replacement for this functionality and identified Admission Webhooks as a potential solution. However, despite following Red Hat’s procedure for implementing Admission Webhooks, the […]
Developer Tools 28 Aug 2024
Optimizing Cassandra Storage with RAID0 Array
Problem: The client managed a Cassandra cluster across two data centers (DC1 and DR1), each containing 5 nodes. The data_file_directories were distributed across multiple mount points. On one node, the mount point /cassandra/data2 was nearly full due to a large table in the “jesi” keyspace, specifically the “service_monitoring_payload” table. This resulted in significant storage […]
Database 26 Aug 2024
Resolving SSL Connection Issues in PostgreSQL
Problem: The application fails to establish an SSL connection to the database, displaying the error “could not accept SSL connection: Success” in the database logs. Despite this, manual `psql` connections using SSL work fine. Process: Step 1 – Verify Connection String: Confirm that the application’s connection string includes the necessary SSL parameters. Example: postgresql://user:password@hostname:port/dbname?sslmode=require Step […]
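The connection-string check in Step 1 of the PostgreSQL SSL excerpt can be automated. The sketch below is one possible approach, not the procedure from the original post: it parses a PostgreSQL URI with Python's standard library and reports whether `sslmode` is set to a value that enforces encryption. The function names `sslmode_of` and `enforces_ssl` and the port `5432` in the sample URI are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# libpq sslmode values under which the client insists on an encrypted connection.
SSL_ENFORCING_MODES = {"require", "verify-ca", "verify-full"}


def sslmode_of(conn_uri: str) -> Optional[str]:
    """Return the sslmode query parameter of a PostgreSQL URI, or None if absent."""
    query = parse_qs(urlsplit(conn_uri).query)
    values = query.get("sslmode")
    return values[0] if values else None


def enforces_ssl(conn_uri: str) -> bool:
    """True when the URI explicitly requests an SSL-enforcing sslmode."""
    return sslmode_of(conn_uri) in SSL_ENFORCING_MODES


if __name__ == "__main__":
    # Hypothetical URI modeled on the example in the excerpt.
    uri = "postgresql://user:password@hostname:5432/dbname?sslmode=require"
    print(sslmode_of(uri), enforces_ssl(uri))
```

A URI without an `sslmode` parameter returns `None` here; libpq-based clients then fall back to their default mode, so an explicit value is easier to audit.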
Database 23 Aug 2024
Compatibility Issue Between Zipkin 2.24.2 and Elasticsearch 8.x
Problem: Zipkin version 2.24.2 does not support Elasticsearch 8.x, which poses a significant obstacle as many of our clusters are already upgraded to Elasticsearch 8.x. This compatibility issue needs to be addressed to ensure seamless operation of our monitoring and tracing functionalities. Solution: Assessment and Documentation: Conducted thorough analysis and documented the current compatibility status […]
Developer Tools 22 Aug 2024
High connection issues on the PostgreSQL database server
Problem: The client experienced connection issues on the PostgreSQL database server, with an abnormally high number of connections reaching around 20,000 at a given time. The client asked for assistance from the expert team in identifying the possible reasons behind this issue. Based on the client’s analysis, there were multiple wait events on HAProxy, and […]
Database