Problem:
Following an upgrade from Cassandra 4.0.9 to 4.1.3, the client reported a noticeable increase in CPU utilization.
The average CPU usage on their systems jumped from around 20% to approximately 37%. This escalation in CPU usage adversely
impacted system performance and stability. The issue was notably more severe on servers running Red Hat Enterprise Linux (RHEL) 8.8
than on those running RHEL 7.9, where CPU utilization remained relatively stable.
Process:
Step 1 – Initial Assessment
Upon receiving the client’s complaint, our expert team began by reviewing the provided logs and system configurations.
The initial analysis did not indicate any immediate issues directly related to Cassandra’s configuration. The team requested
additional data, including:
- System logs before and after the upgrade.
- CPU utilization metrics over time.
- RAM usage statistics.
- Process-specific CPU utilization data.
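One way to gather CPU utilization metrics like those requested above is to sample /proc/stat twice and compare the counters. The following Python sketch is only an illustration, not the tooling the team used. The function names are hypothetical, and it reports system-wide utilization only; per-process figures would come from /proc/<pid>/stat or a tool such as top or pidstat.

```python
import time
from typing import Tuple


def parse_cpu_line(stat_text: str) -> Tuple[int, int]:
    """Return (busy_jiffies, total_jiffies) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Field order: user nice system idle iowait irq softirq steal.
            # guest/guest_nice are skipped because they are already counted in user/nice.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]  # idle + iowait
            total = sum(values)
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before: str, after: str) -> float:
    """Percentage of CPU time spent busy between two /proc/stat snapshots."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in order, some time apart")
    return 100.0 * (busy1 - busy0) / elapsed


def sample_live(interval_s: float = 5.0, path: str = "/proc/stat") -> float:
    """Take two snapshots on a live Linux host and return utilization in percent."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return cpu_utilization(before, after)
```

Running sample_live() at regular intervals before and after an upgrade, and on both OS versions, gives a comparable time series without depending on any particular monitoring stack.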
Step 2 – Investigation and Identification of Potential Issues
Our experts undertook a comprehensive investigation, focusing on the following areas:
Operating System Differences:
- Kernel and System Libraries: The upgrade to RHEL 8.8 introduced new kernel versions and system libraries.
These changes could impact how Cassandra interacts with the operating system, potentially affecting CPU usage.
- Tuned Service: RHEL 8.8 uses tuned, a service for performance optimization that could be misconfigured or less compatible
with Cassandra’s needs compared to the configuration in RHEL 7.9.
Cassandra Configuration Changes:
- Upgrade Impact: Changes between Cassandra 4.0.9 and 4.1.3, such as new features or modifications to data handling and query processing,
could be more resource-intensive. The experts reviewed release notes and documentation to identify any new settings or default changes
that might affect performance.
- Resource Allocation: The team investigated whether the new version of Cassandra changed resource allocation or introduced
additional overhead.
Step 3 – Implementation of Recommendations and Testing
Tuned Profile Adjustments:
- Profile Selection: Experts recommended adjusting the tuned profiles to ones optimized for database workloads, such as postgresql
or cpu-partitioning. This was done to ensure that the tuned service was not introducing unnecessary CPU overhead.
- Temporary Disabling: Disabling tuned was suggested to observe if it had any impact on CPU utilization. This step was crucial in
identifying if tuned was the source of the increased CPU usage.
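The tuned checks described above map onto three tuned-adm subcommands: active (show the current profile), profile <name> (switch profile), and off (disable all tunings). The Python wrapper below is a hedged sketch of how those steps could be scripted. The helper names are hypothetical, and it assumes tuned-adm is installed and that the script runs with sufficient privileges. The postgresql profile may need to be installed separately; tuned-adm list shows what is available on a host.

```python
import subprocess
from typing import List


def tuned_adm_command(action: str, profile: str = "") -> List[str]:
    """Build the tuned-adm argument list for one of the steps used in the investigation."""
    if action == "active":
        return ["tuned-adm", "active"]
    if action == "off":
        return ["tuned-adm", "off"]
    if action == "profile":
        if not profile:
            raise ValueError("a profile name is required")
        return ["tuned-adm", "profile", profile]
    raise ValueError(f"unsupported action: {action}")


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on the Python 3.6 that ships with RHEL 8
        check=True,
    )
    return result.stdout.strip()


def switch_profile_and_report(profile: str) -> str:
    """Switch to the given profile, then return what tuned reports as active."""
    run(tuned_adm_command("profile", profile))
    return run(tuned_adm_command("active"))
```

For example, switch_profile_and_report("postgresql") applies the profile and returns tuned's own confirmation. Recording CPU utilization for a fixed window after each change (new profile, then off) keeps the comparison between runs fair.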
Operating System Tuning:
- Kernel Parameters: Adjusted kernel parameters related to process scheduling and I/O operations to see if these changes impacted
Cassandra’s performance.
- OS Version Testing: Tested Cassandra 4.1.3 on RHEL 7.9 to confirm if the higher CPU utilization was specific to RHEL 8.8.
This also included evaluating if upgrading to RHEL 9 might offer improvements.
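When comparing kernel parameters across OS versions, it helps to snapshot the same set of sysctl values on each host and diff them. The case notes do not name the parameters that were adjusted, so the watch list in this Python sketch is purely an assumption chosen for illustration (one scheduler setting and two I/O-related ones), and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple

# Hypothetical watch list; the actual parameters tuned in this case are not documented.
WATCHED = (
    "kernel.sched_migration_cost_ns",
    "vm.swappiness",
    "vm.dirty_ratio",
)


def read_sysctls(names: Iterable[str], root: str = "/proc/sys") -> Dict[str, str]:
    """Read sysctl values from /proc/sys; parameters missing on this kernel are flagged."""
    values = {}
    for name in names:
        # Simple mapping: dots become path separators (adequate for the names above).
        path = Path(root) / name.replace(".", "/")
        try:
            values[name] = path.read_text().strip()
        except OSError:
            values[name] = "<unavailable>"
    return values


def diff_settings(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (old_value, new_value)} for every setting that differs."""
    keys = sorted(set(old) | set(new))
    return {
        k: (old.get(k, "<missing>"), new.get(k, "<missing>"))
        for k in keys
        if old.get(k) != new.get(k)
    }
```

Running read_sysctls(WATCHED) on a RHEL 7.9 host and on a RHEL 8.8 host, then passing both results to diff_settings, yields a short list of differences to investigate. A parameter reported as unavailable on one side is itself a useful finding, since some scheduler tunables are not exposed on every kernel version.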
Cassandra Configuration Review:
- Configuration Files: Reviewed and adjusted Cassandra configuration files (cassandra.yaml, cassandra-env.sh) for any settings
that might contribute to higher CPU usage.
- Heap Size and JVM Tuning: Adjusted JVM heap sizes and garbage collection settings to optimize performance.
Solution:
The following solutions were implemented based on the recommendations:
- Tuned Adjustments: Switched tuned to the postgresql profile and, as a separate test, disabled it temporarily to gauge its effect
on CPU utilization. This resulted in a reduction in CPU usage, but the issue was not entirely resolved.
- OS Tuning: Applied kernel parameter adjustments and verified their impact through performance testing. Although some improvements
were noted, RHEL 8.8 still showed higher CPU usage than RHEL 7.9.
- Alternative Testing: The client tested Cassandra 4.1.3 on RHEL 7.9, which confirmed that CPU usage was more stable on this version.
Conclusion:
The investigation highlighted that the increased CPU utilization was influenced by a combination of factors, including operating system
changes introduced with RHEL 8.8 and potential configuration adjustments needed for Cassandra 4.1.3. While adjusting the tuned profile
and OS parameters helped mitigate some of the issues, RHEL 8.8 continued to exhibit higher CPU utilization than RHEL 7.9. The client
was advised to consider further optimization of their environment and explore alternative operating system versions if higher performance
stability was required. This case underscores the importance of evaluating the interplay between software upgrades and operating system
changes to maintain optimal performance and stability.