Problem:

The client reported a critical increase in pending tasks on one of the nodes within their Cassandra cluster. This issue was causing concern, and the client sought assistance in understanding the root cause and implementing a resolution.

Process:

The client first ran nodetool compactionstats -H on the affected node and then restarted the node, which temporarily cleared the pending tasks. However, the issue recurred, and a critical alert related to MutationStage and PendingTasks was received. The expert then analyzed the situation, reviewed logs and configurations, and ran tests in a non-production environment to identify the underlying cause.
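As an illustration of how the MutationStage pending count behind such an alert could be checked by hand, the following is a minimal Python sketch that reads the Pending column from nodetool tpstats output. It assumes the usual column order (Pool Name, Active, Pending, Completed, ...), which may vary between Cassandra versions; the helper names pending_for_pool and read_tpstats are hypothetical and not part of the case.

```python
import subprocess
from typing import Optional


def pending_for_pool(tpstats_output: str, pool: str = "MutationStage") -> Optional[int]:
    """Return the Pending value for `pool`, or None if the pool is not listed.

    Assumption: each pool row looks like
    "<PoolName> <Active> <Pending> <Completed> ...".
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        # fields[0] is the pool name, fields[2] the Pending column (assumed layout)
        if len(fields) >= 3 and fields[0] == pool and fields[2].isdigit():
            return int(fields[2])
    return None


def read_tpstats() -> str:
    """Run `nodetool tpstats` on the local node (needs nodetool on PATH)."""
    result = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a node, pending_for_pool(read_tpstats()) would give the current MutationStage backlog; sampling it a few times shows whether the count is draining or still growing.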

Solution:

The expert recommended several actions to address the issue:

  • Restart of Affected Node: Because restarting the Cassandra process also restarts all pending and halted tasks, this was advised as an immediate fix.
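  • Execution of nodetool compactionstats -H: The expert advised running this command on all nodes to check for any remaining pending compaction tasks. The command reports compaction status (in human-readable units with -H) and does not clear tasks by itself.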
  • Manual Major Compaction: To address potential data fragmentation indicated by tombstone warnings, the expert recommended executing nodetool compact followed by nodetool repair on all nodes.
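
The command sequence recommended above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the procedure actually run in this case: it assumes nodetool is on the PATH, that each node accepts remote JMX connections through nodetool -h, and that the output of compactionstats contains a "pending tasks: N" line. The function names and the host list are hypothetical.

```python
import re
import subprocess
from typing import List, Optional


def pending_compactions(compactionstats_output: str) -> Optional[int]:
    """Extract N from a 'pending tasks: N' line, or None if no such line exists."""
    match = re.search(r"^pending tasks:\s*(\d+)", compactionstats_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def maintenance_commands(host: str) -> List[List[str]]:
    """Build the per-node commands in the recommended order."""
    return [
        ["nodetool", "-h", host, "compactionstats", "-H"],  # report pending compactions
        ["nodetool", "-h", host, "compact"],                # manual major compaction
        ["nodetool", "-h", host, "repair"],                 # repair after compaction
    ]


def run_maintenance(hosts: List[str]) -> None:
    """Run the sequence one node at a time; stops at the first failing command."""
    for host in hosts:
        for cmd in maintenance_commands(host):
            subprocess.run(cmd, check=True)
```

Working through the nodes one at a time is a choice made for this sketch: major compaction and repair are I/O-intensive, so running them on every node at once could add load to a cluster that is already backlogged.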

Conclusion:

The proposed solution targets both the immediate symptom and the likely underlying cause of the critical increase in pending tasks. By restarting the affected node and running the recommended nodetool commands across the cluster, the client can clear the pending tasks and improve system stability. Manual major compaction, together with ongoing monitoring, is intended to reduce the data fragmentation indicated by the tombstone warnings, which should lower the likelihood of similar issues in the future.