Problem: The client reported that the Prometheus directory inside /var/lib had grown to 23GB, leading to high disk utilization on /var and potentially impacting other services. The /var directory has a total capacity of 200GB, which is shared by other service libraries and log files. Currently, the utilization on /var is at 80%, and the […]
Data Analytics
29 Jul 2024 Kafka Streams Application: Efficient Management of Changelog Topics
Problem: The client, utilizing Kafka Streams application version 3.3.1, encountered issues with managing changelog topics. Despite configuring the application for automatic cleanup of records within these topics, unchecked growth was noticed, posing potential risks to system performance and stability. Process: Initial Assessment: The client reported the issue, highlighting the design of their Kafka Streams application […]
Data Analytics
27 Jul 2024 Resolving FreeIPA Password Expiration Issue for Admin-Reset Passwords
Problem: FreeIPA prompts regular users to change their passwords immediately after an admin resets them, which is undesired for certain admin-managed accounts like ‘admpass’. Process: The expert first reviewed the client’s IPA password policy and proposed using the krbPasswordExpiration attribute to control password expiration. However, attempts to set this attribute during user modification did not […]
Security
26 Jul 2024 Cassandra Memory Allocation Issue and Resolution
Problem: The client experienced frequent warnings in their Cassandra nodes, indicated by the log entry: `INFO [CompactionExecutor:37] 2023-02-23 12:00:12,268 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576`. The client requested an investigation into the potential impact of increasing the file_cache_size_in_mb in the cassandra.yaml file, whether a restart (bounce) would be necessary, […]
Database
20 Jul 2024 Grid Connector Stuck and Failing: Offsets Not Committed, Leading to Increasing Lag
Problem: The client has requested assistance with the following issue regarding the `connect-eoc-data-summary-to-grid-sink-httpfile-connector`. The connector is experiencing a lag where it is not reading any records, and the offset is not being committed, causing the lag to keep increasing. The client indicated that the grid connector appears to be stuck and has failed. The following […]
Data Analytics
19 Jul 2024 Resolving Elasticsearch Query Timeouts Problem
Problem: Certain Elasticsearch queries timed out after 30 seconds. Details: The customer used Elasticsearch (version 7.17.0 or slightly newer) to query documents created by the Actimize application. The Elastic index contained approximately 80 million documents, amounting to several terabytes. Typically, queries were executed within a few seconds, but some queries consistently took 30 seconds or […]
Data Analytics
12 Jul 2024 Prometheus’ node exporter failing on ARM64 machines
Problem: The customer is experiencing the “exec format error” issue when using Prometheus node exporter versions 1.5.0 and 1.6.0 on ARM64 machines, particularly Graviton-type instances in an AWS environment. This error is observed in the node exporter pods running as a DaemonSet in a Kubernetes cluster with nodes having ARM64 architecture. Process: The experts requested […]
Data Analytics
11 Jul 2024 WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM
Problem: The client encountered a Cassandra exception: “WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM.” This issue, occurring since March 9, 2023, revolves around an INSERT INTO query. They seek troubleshooting assistance as this exception had not occurred before that date. Process: Step 1 – Initial Investigation and Troubleshooting: The expert team initiated […]
Database
10 Jul 2024 Network Instability Causing Keepalived Crashes and Application Errors
Problem: The client reported issues with Keepalived crashes leading to high availability disruptions and application errors, particularly connection timeouts with the PostgreSQL server. Initial investigations revealed suspicions of network instability and outdated software versions contributing to the problem. Process: Requesting initial information for further investigation of the problem: The number of servers in the HAProxy […]
Database