Problem:

Commit log segments in the production Cassandra cluster accumulate without being deleted. The filesystem eventually fills up, and the database crashes.

Process:

Step 1: Hardware Specifications and Disk Space:

  • Requested hardware specifications for each Cassandra node.
  • Checked disk space on all nodes using the df -h command.
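
The disk space check above can also be scripted so it runs the same way on every node. The following is a minimal sketch, not the method used in the investigation: the function names and the 80 percent threshold are assumptions chosen for illustration, and the figure can differ slightly from the Use% column of df -h, which accounts for reserved blocks.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of used space on the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report a total of zero; avoid dividing by it.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold_percent=80.0):
    """Return True when usage is at or above threshold_percent (assumed limit)."""
    return disk_usage_percent(path) >= threshold_percent
```

Calling is_nearly_full with the mount point that holds the commitlog directory gives a quick yes or no answer per node.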

Step 2: System Logs and Configuration:

  • Examined system logs on each node using tail -n 100 /path/to/cassandra/logs/system.log.
  • Collected a zip of the Cassandra config folder.
  • Verified configuration file consistency across all nodes.
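
One way to verify configuration consistency is to hash every file in each node's collected config folder and compare the hashes. The sketch below assumes the config folders have already been unpacked into local directories, one per node; the function names and the node labels are hypothetical.

```python
import hashlib
from pathlib import Path


def config_fingerprints(config_dir):
    """Map each file's path (relative to config_dir) to the SHA-256 of its contents."""
    root = Path(config_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(fingerprints_by_node):
    """Return the files whose hash is not identical on every node.

    A file that is missing on some node counts as differing.
    """
    all_names = set()
    for fingerprints in fingerprints_by_node.values():
        all_names.update(fingerprints)
    return sorted(
        name
        for name in all_names
        if len({fp.get(name) for fp in fingerprints_by_node.values()}) > 1
    )
```

Some differences are expected, for example per-node addresses in cassandra.yaml, so the result is a list of files to review rather than a list of faults.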

Step 3: Commitlog Directory Size and Write Activity:

  • Investigated the size of the commitlog directory.
  • Assessed the write activity on the database in terms of size and frequency.
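
The size of the commitlog directory can be summarized as a segment count and a byte total. This is a minimal sketch that assumes segment files follow the CommitLog-*.log naming pattern; the function name is hypothetical and the directory path must be supplied by the caller.

```python
from pathlib import Path


def commitlog_summary(commitlog_dir):
    """Return (segment_count, total_bytes) for CommitLog-*.log files in commitlog_dir."""
    segments = [
        p for p in Path(commitlog_dir).glob("CommitLog-*.log") if p.is_file()
    ]
    return len(segments), sum(p.stat().st_size for p in segments)
```

Recording the two numbers at intervals shows whether the segment count keeps growing or levels off after flushes.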

Step 4: Performance Metrics:

  • Gathered other performance-related metrics, including memory usage, for all Cassandra nodes.

Step 5: Log Purge Process:

  • Checked for any log purge process running on the system.
  • Investigated the possibility of a manually created process causing issues.

Solution:

Analysis of the log files revealed CommitLogArchiver errors. The archive command was configured to hard link each commitlog segment to a file of the same name in the same directory. Because the link target was the segment itself, the command failed, archiving never completed, and the unused segments were left on disk.
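
A search for these errors can be scripted along the following lines. The exact wording of the messages is not reproduced in this write-up, so the sketch matches only on the class name CommitLogArchiver and on the log level, both of which are assumptions about how the lines appear in system.log; the function name is hypothetical.

```python
def find_archiver_errors(log_lines):
    """Return lines that mention CommitLogArchiver at ERROR or WARN level."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if "CommitLogArchiver" in line and ("ERROR" in line or "WARN" in line)
    ]
```

The function accepts any iterable of lines, so an open file handle for /path/to/cassandra/logs/system.log can be passed in directly.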

Steps Taken:

  1. Opened the commitlog_archiving.properties file for editing.
  2. Removed the command “ln %path /cassandra/commitlog/%name” from line 30.
  3. Set archive_command= (empty, with no arguments) to disable archiving, so that Cassandra deletes segments once they have been processed.
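
Because the change has to be repeated on every node, the edit can be expressed as a small text transformation. The sketch below assumes the ln command appears as the value of the archive_command key; it blanks that value and leaves comments and all other keys untouched. The function name is hypothetical, and reading and writing the file is left to the caller.

```python
def disable_archive_command(properties_text):
    """Return properties_text with the archive_command value emptied."""
    result = []
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key == "archive_command":
            # Empty value: archiving is disabled.
            result.append("archive_command=")
        else:
            # Comments (including commented-out examples) and other keys pass through.
            result.append(line)
    return "\n".join(result) + "\n"
```

Keeping a backup copy of the original file before writing the result back makes the change easy to revert.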

Post-Modification Steps:

  1. Restarted Cassandra after changing the setting.
  2. Ran nodetool repair after a successful restart.
  3. Noted that the old segment files left over from before the change must be deleted manually.
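
Before deleting anything by hand, it helps to list the candidate files first. The following dry-run sketch only lists segments whose modification time is older than a chosen age and deletes nothing. The age cutoff is an assumption used as a rough heuristic: this write-up does not establish which segments are safe to remove, so the list should be reviewed before any file is deleted. The function name and the CommitLog-*.log pattern are also assumptions.

```python
import time
from pathlib import Path


def list_old_segments(commitlog_dir, min_age_seconds, now=None):
    """Return CommitLog-*.log files last modified more than min_age_seconds ago.

    Results are sorted oldest first. No file is deleted or changed.
    """
    now = time.time() if now is None else now
    candidates = [
        p
        for p in Path(commitlog_dir).glob("CommitLog-*.log")
        if p.is_file() and now - p.stat().st_mtime > min_age_seconds
    ]
    return sorted(candidates, key=lambda p: p.stat().st_mtime)
```

The optional now parameter exists only so the cutoff can be fixed when comparing results across nodes.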

The procedure applies per node, so the same changes must be made on every node in the cluster.

Conclusion:

The root cause was identified and resolved by adjusting the commitlog archiving configuration. With archiving disabled, Cassandra deletes processed segments, which prevents unused commitlog files from accumulating and mitigates the risk of the filesystem filling up.