Problem: A client faced difficulties downloading images from their local Nexus repository using Podman. Despite several troubleshooting attempts, including adding the registry to insecure registries and adding the certificate locally, the issue persisted. The specific error encountered was related to certificate validation. Process: Initial Troubleshooting: The client added the Nexus registry to the list of […]
Developer Tools 28 Aug 2024
Optimizing Cassandra Storage with RAID0 Array
Problem: The client managed a 5-node Cassandra cluster across two data centers (DC1 and DR1), each containing 5 nodes. The data_file_directories were distributed across multiple mount points. On one node, the mount point /cassandra/data2 was nearly full due to a large table in the “jesi” keyspace, specifically the “service_monitoring_payload” table. This resulted in significant storage […]
Database 28 Aug 2024
Optimizing Elasticsearch Query Performance for Large Documents
Problem: The client faced significant delays in executing Elasticsearch queries within their production environment. A particular query, which involved a simple numeric account identifier, took an alarming 68 seconds to execute, despite returning only six hits. The total size of the query output was 583KB, yet the Elasticsearch profiler indicated that 67 seconds of this […]
Data Analytics 28 Aug 2024
Seamlessly Transition from PodPreset to Admission Webhooks: Overcoming Kubernetes Upgrade Hurdles
Problem: The customer upgraded their Kubernetes cluster from version 1.19 to 1.24.8. Following this upgrade, they lost access to the PodPreset feature, which was removed in Kubernetes version 1.20. The customer needed a replacement for this functionality and identified Admission Webhooks as a potential solution. However, despite following Red Hat’s procedure for implementing Admission Webhooks, the […]
Developer Tools 26 Aug 2024
Resolving SSL Connection Issues in PostgreSQL
Problem: The application fails to establish an SSL connection to the database, displaying the error “could not accept SSL connection: Success” in the database logs. Despite this, manual `psql` connections using SSL work fine.
Process:
Step 1 – Verify Connection String: Confirm that the application’s connection string includes the necessary SSL parameters. Example: postgresql://user:password@hostname:port/dbname?sslmode=require
Step […]
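The Step 1 check in the PostgreSQL SSL excerpt above (confirming that the connection string carries `sslmode=require`) could be automated roughly as follows. This is a minimal sketch, not the procedure from the original post: the helper names `sslmode_of` and `enforces_ssl` are hypothetical, the port 5432 in the usage example is only a placeholder, and it assumes the application uses a libpq-style URI. Other drivers may spell their SSL options differently.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# libpq sslmode values that refuse to fall back to an unencrypted connection.
ENFORCING_SSL_MODES = {"require", "verify-ca", "verify-full"}


def sslmode_of(conn_uri: str) -> Optional[str]:
    """Return the sslmode query parameter of a PostgreSQL URI, or None if absent.

    Hypothetical helper for illustration; it only inspects the URI text and
    never opens a database connection.
    """
    query = parse_qs(urlsplit(conn_uri).query)
    values = query.get("sslmode")
    # If the parameter is repeated, report the last occurrence.
    return values[-1] if values else None


def enforces_ssl(conn_uri: str) -> bool:
    """True when the URI explicitly requests an SSL-only connection."""
    return sslmode_of(conn_uri) in ENFORCING_SSL_MODES


if __name__ == "__main__":
    uri = "postgresql://user:password@hostname:5432/dbname?sslmode=require"
    print(sslmode_of(uri), enforces_ssl(uri))
```

A URI with no `sslmode` parameter is reported as not enforcing SSL here, since the client's default would then decide whether encryption is attempted.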
Database 23 Aug 2024
Compatibility Issue Between Zipkin 2.24.2 and Elasticsearch 8.x
Problem: Zipkin version 2.24.2 does not support Elasticsearch 8.x, which poses a significant obstacle as many of our clusters are already upgraded to Elasticsearch 8.x. This compatibility issue needs to be addressed to ensure seamless operation of our monitoring and tracing functionalities.
Solution: Assessment and Documentation: Conducted thorough analysis and documented the current compatibility status […]
Developer Tools 22 Aug 2024
High connection issues on the PostgreSQL database server
Problem: The client experienced connection issues on the PostgreSQL database server, with an abnormally high number of connections reaching around 20,000 at a given time. The client asked for assistance from the expert team in identifying the possible reasons behind this issue. Based on the client’s analysis, there were multiple wait events on HAProxy, and […]
Database 21 Aug 2024
Resolving Prometheus Pod Crashing Issue in Production Environment
Problem: The client reported an issue where the Prometheus pod was crashing in the production environment. The error logs indicated a variety of issues including “Terminated Reason: Error” and messages about unhealthy blocks and existing lock files. The specific error message highlighted was:
Last State: Terminated Reason: Error Message: und healthy block” mint=1680069600000 maxt=1680091200000 ulid=01GWPY8332RWJAPSFZ8KNAJQ9H […]
Data Analytics 19 Aug 2024
Ensuring Successful Data Restoration from Cassandra 3.11 to 4.0.6
Problem: Restoring a Cassandra 3.11 snapshot to a 4.0.6 cluster using the nodetool refresh command results in an empty table, indicating a potential compatibility issue. This affects the DR environment, which needs to accurately replicate the PROD environment’s data.
Solution: Step 1. Verify Snapshot Content: Use nodetool listsnapshots and a test environment to ensure the […]
Database