Problem: The client encountered issues with the data-explorer view functionality in their CKAN environment. While resources could be downloaded manually, the data-explorer view was unable to load. During the initial investigation, it was found that while the “datastore” plugin was enabled in the ckan.ini file, the ckan.datastore.write_url and ckan.datastore.read_url were not configured. The client was […]
Data Analytics
11 Dec 2024 Addressing SSTable Corruption and Data Migration Challenges in Cassandra Environments
Problem: The client is encountering an “SSTable corruption” issue when starting Cassandra in a new PLAB environment created using a CloudFormation template. After copying EBS volumes from a disaster recovery (DR) environment and making necessary adjustments in the cassandra.yaml file, they receive a series of NullPointerExceptions related to the SSTableReader while attempting to open SSTables. […]
Database
9 Dec 2024 Troubleshooting Authentication Failures and Node Reattachment in Pgpool-II Setup
Problem: The client experienced an authentication failure during health checks in their Pgpool-II setup, which led to a failover event. Despite updating the password in pool_passwd and pgpool.conf using the pg_md5 utility, the client continued to face the same issue. They observed that after failing over the node due to the authentication issue, they successfully […]
Database
6 Dec 2024 Proactive Monitoring and Support for Apache Cassandra During iPhone Launch Event
Problem: The client is preparing for an iPhone launch event, anticipating traffic spikes up to 200%. They require proactive monitoring of their Apache Cassandra production system during specified timeframes, with an upgrade to Severity 1 for immediate response during those periods.
Process: Ticket Acknowledgment: We confirmed availability for the requested support dates and asked for […]
Database
4 Dec 2024 Kubernetes Upgrade and Node Restoration for Customer’s Onsite Environment
Problem: The client reported two main issues: one of the Kubernetes master nodes was in a “not ready” state, and they needed to upgrade their Kubernetes version from 1.26 to 1.29. The client requested support to address these concerns. They had already shut down the master node and were awaiting further instructions for troubleshooting.
Process: […]
Developer Tools
2 Dec 2024 Resolving Cassandra Query Timeout Issues: Optimizing Performance and Ensuring Stability
Problem: The client reported encountering a request timeout error when querying the PLDT Cassandra database in a production environment. The specific query involved selecting records from the jesi.service_monitoring table, which was attached along with a screenshot for further context.
Process: Upon receiving the issue, the support team initiated an investigation. They first inquired about the […]
Database
29 Nov 2024 Rolling Upgrade of ETCD and Patroni Nodes in a Multi-Node PostgreSQL Cluster
Problem: The client wanted to perform a rolling upgrade of the underlying operating system from RHEL 7 to RHEL 9 for their ETCD nodes in a Patroni-managed PostgreSQL cluster. The cluster contained three ETCD nodes and three Patroni-managed PostgreSQL instances (one primary and two standby). With a Recovery Point Objective (RPO) and Recovery Time Objective […]
Database
27 Nov 2024 Mitigating Frequent Docker Swarm Re-elections: Adjusting Election Timeout for Improved Stability
Problem: The customer is facing frequent Docker Swarm re-elections, triggered even by brief server issues lasting just a few seconds. They are seeking guidance on how to modify the Swarm election timeout and whether adjusting this value will have any impact on the system.
Process: Step 1: Initial Investigation. The customer reported frequent leader re-elections […]
Developer Tools
25 Nov 2024 Resolving PostgreSQL Filesystem Bloat and Replication Slot Stuck Issue
Problem: The client encountered a significant issue with their PostgreSQL database (PGDB). They reported that the filesystem (FS) utilization suddenly increased from 74% to 94% without any new objects being created. Despite their efforts to recreate the replication slot and restart PGPool, the filesystem remained at 94%. Logs revealed a termination error related to another […]
Database