Problem: The client requested assistance with an issue affecting the `connect-eoc-data-summary-to-grid-sink-httpfile-connector`. The connector is lagging: it is not reading any records and offsets are not being committed, so the lag keeps increasing. The client indicated that the grid connector appears to be stuck and has failed. The following […]
Data Analytics 19 Jul 2024 Resolving Elasticsearch Query Timeouts Problem
Problem: Certain Elasticsearch queries timed out after 30 seconds. Details: The customer used Elasticsearch (version 7.17.0 or slightly newer) to query documents created by the Actimize application. The Elastic index contained approximately 80 million documents, amounting to several terabytes. Typically, queries were executed within a few seconds, but some queries consistently took 30 seconds or […]
Data Analytics 12 Jul 2024 Prometheus’ node exporter failing on ARM64 machines
Problem: The customer encounters an “exec format error” when using Prometheus node exporter versions 1.5.0 and 1.6.0 on ARM64 machines, particularly Graviton-type instances in an AWS environment. The error is observed in the node exporter pods running as a DaemonSet in a Kubernetes cluster whose nodes use the ARM64 architecture. Process: The experts requested […]
Data Analytics 11 Jul 2024 WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM
Problem: The client encountered a Cassandra exception: “WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM.” This issue, occurring since March 9, 2023, revolves around an INSERT INTO query. They seek troubleshooting assistance as this exception had not occurred before that date. Process: Step 1 – Initial Investigation and Troubleshooting: The expert team initiated […]
Database 10 Jul 2024 Network Instability Causing Keepalived Crashes and Application Errors
Problem: The client reported Keepalived crashes that disrupted high availability and caused application errors, particularly connection timeouts with the PostgreSQL server. Initial investigation raised suspicions that network instability and outdated software versions were contributing to the problem. Process: Requesting initial information for further investigation of the problem: The number of servers in the HAProxy […]
Database 8 Jul 2024 Database in the Cassandra cluster generates a large number of commitlogs
Problem: In the Cassandra cluster, the database generates a large number of commit logs and does not delete them. As a result, the commit log filesystem fills up and the database crashes. This affects all nodes. Process: Step 1: Initial investigation and information gathering from the client. Initial troubleshooting and information gathering […]
Database 6 Jul 2024 Risks in Airflow Version 2.5.2 – Unauthenticated Page Vulnerability
Problem: The user was unable to reach the application page and received the error ‘Unauthenticated Page’. Process: Step 1: Initial Investigation. The security issue concerns an unauthenticated page within the Airflow version 2.5.2 instance. Because the page can be accessed without proper authentication, it poses a security risk, potentially exposing sensitive information […]
Data Analytics 5 Jul 2024 Diagnosing and Resolving SSL SYSCALL Errors in PostgreSQL with Patroni
Problem: The client reported an intermittent issue with their PostgreSQL database managed by Patroni. The error message encountered was “SSL SYSCALL error: EOF detected”. Despite checking the PostgreSQL logs and HAProxy logs, no corresponding errors were found. The client attempted to change the idle_in_transaction_session_timeout parameter from 1 hour to unlimited, but the error persisted. Solution: […]
Database 5 Jul 2024 Resolution of Cassandra Nodetool Repair Failure Due to Data Corruption
Problem: The client has a two-datacenter (DC1 and DR1) Cassandra cluster. They encountered a failure while running nodetool repair on a node in DC1, which was traced to data corruption on a node in DR1. The logs indicated a corruption error in a specific SSTable file. Solution: Step 1. Initial Diagnosis: Ran nodetool repair in […]
Database