Problem:
The client managed a Cassandra cluster spanning two data centers (DC1 and DR1), each containing 5 nodes. The data_file_directories were distributed across multiple mount points. On one node, the mount point /cassandra/data2 was nearly full due to a large table in the "jesi" keyspace, specifically the "service_monitoring_payload" table. This created a significant storage imbalance that posed a risk to the cluster's stability.
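Such an imbalance typically shows up in per-mount disk usage and per-table statistics. A minimal diagnostic sketch, assuming the mount points and the jesi.service_monitoring_payload table named above (mount-point names other than /cassandra/data2 are illustrative):

    # Usage of each Cassandra data mount point on the affected node
    df -h /cassandra/data*

    # Per-table disk usage for the offending table
    nodetool tablestats jesi.service_monitoring_payload

    # Overall cluster view: load and ownership per node
    nodetool status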
Process:
Consolidating the multiple mount points into a single RAID0 array was chosen as the remedy. Striping data across multiple disks at the kernel level provides better performance and an even distribution of data. Below is a detailed description of the process that was undertaken:
Node Decommissioning:
The first step was to decommission the affected node using the command nodetool decommission. This safely removed the node from the cluster before making any changes to the storage configuration. The status of the node was verified to ensure it was successfully decommissioned.
After decommissioning, the Cassandra service on this node was stopped to prepare for disk reconfiguration.
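A representative command sequence for this step, run on the affected node (the service name is the distribution default and is an assumption):

    # Stream this node's data to the rest of the cluster and leave the ring
    nodetool decommission

    # Verify the node no longer appears in the ring (run from another node)
    nodetool status

    # Stop the Cassandra service before touching the disks
    sudo systemctl stop cassandra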
Disk Preparation and RAID0 Array Creation:
Data Erasure: The start of each disk was zeroed using the dd command (dd if=/dev/zero of=/dev/sdX bs=1M count=256) to wipe the old partition tables and file system metadata. This was necessary to prepare the disks for reconfiguration into a RAID0 array.
Partitioning: New disk labels and partitions were created using the parted command (parted --script -- /dev/sdX mklabel gpt and parted /dev/sdX mkpart primary 2048s 100%) on each disk to ensure they were ready for inclusion in the RAID0 array.
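A sketch of the erase and partitioning steps applied to each member disk; the device names /dev/sdb, /dev/sdc and /dev/sdd are placeholders for the actual disks:

    for disk in /dev/sdb /dev/sdc /dev/sdd; do
        # Zero the start of the disk to clear old partition tables and metadata
        dd if=/dev/zero of="$disk" bs=1M count=256
        # Create a fresh GPT label and a single partition spanning the disk
        parted --script -- "$disk" mklabel gpt
        parted --script -- "$disk" mkpart primary 2048s 100%
    done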
RAID0 Array Creation: A RAID0 array was created by combining multiple disks into a single striped array using mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=3 /dev/sdX1 /dev/sdY1 /dev/sdZ1. This configuration allowed data to be evenly distributed across the disks, improving performance and simplifying storage management.
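A sketch of the array creation using the same placeholder partitions; persisting the array definition to the mdadm configuration file (path varies by distribution) is an added precaution not in the original commands:

    # Build a 3-disk striped (RAID0) array
    mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=3 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1

    # Record the array so it is assembled automatically at boot
    # (on some distributions the file is /etc/mdadm/mdadm.conf)
    mdadm --detail --scan >> /etc/mdadm.conf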
File System Creation: The newly created RAID0 array was formatted with the XFS file system using the mkfs.xfs /dev/md0 command. XFS was recommended due to its high performance, especially when handling large datasets.
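Formatting and verifying the new array, as a brief sketch:

    # Create the XFS file system on the RAID0 device
    mkfs.xfs /dev/md0

    # Confirm the file system type and UUID
    blkid /dev/md0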
Mounting and Configuring Cassandra:
Mounting the RAID0 Array: The RAID0 array was mounted, and the system was configured to ensure that the mount would persist across reboots by updating /etc/fstab with the entry /dev/md0 /opt/cassandra/storage xfs noatime,logbufs=8 0 0 and then running the mount /opt/cassandra/storage command.
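A sketch of the mount configuration, using the mount point and fstab entry from the text; mounting by the /dev/md0 device name matches the original entry, though mounting by UUID is also common:

    mkdir -p /opt/cassandra/storage

    # Persist the mount across reboots
    echo '/dev/md0 /opt/cassandra/storage xfs noatime,logbufs=8 0 0' >> /etc/fstab

    # Mount via the fstab entry and confirm
    mount /opt/cassandra/storage
    df -h /opt/cassandra/storage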
Cassandra Reconfiguration: New directories for Cassandra data were created within the RAID0 array using mkdir -p /opt/cassandra/storage/data. Cassandra’s configuration files were updated to reflect the new data directory (data_file_directories: /opt/cassandra/storage/data), ensuring that the database would utilize the RAID0 array for storage.
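The corresponding directory setup and cassandra.yaml fragment might look as follows; the cassandra:cassandra ownership and the list form of data_file_directories are assumptions based on a typical package installation:

    mkdir -p /opt/cassandra/storage/data
    # Ownership assumes the package-default 'cassandra' service user
    chown -R cassandra:cassandra /opt/cassandra/storage

    # In cassandra.yaml, point the data directories at the RAID0 array:
    # data_file_directories:
    #     - /opt/cassandra/storage/data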
Restarting Cassandra: The Cassandra service was restarted on the node using sudo systemctl start cassandra, and the rejoining process was monitored to ensure that the node achieved a stable and healthy state within the cluster.
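Restart and monitoring, assuming the default service name and log location:

    sudo systemctl start cassandra

    # Watch the node join and settle (UN = Up/Normal)
    nodetool status

    # Check for active streams while the node rejoins
    nodetool netstats

    # Follow the system log for errors (path may differ per installation)
    tail -f /var/log/cassandra/system.log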
Data Distribution and Cleanup:
Running Repairs: A repair operation was executed on the node using nodetool repair to ensure that data was consistent across the cluster.
Cleanup: A cleanup operation was run on all member nodes using nodetool cleanup. This removed any obsolete data and optimized disk space usage, further stabilizing the storage system.
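A sketch of this post-rejoin maintenance; running cleanup one node at a time to limit I/O impact is an operational assumption, not from the original text:

    # On the rejoined node: make its data consistent with the rest of the cluster
    nodetool repair

    # On every member node: drop data the node no longer owns and reclaim space
    nodetool cleanup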
Detailed Recommendations for Future Deployments:
RAID0 Configuration: It was strongly recommended that future deployments use a single RAID0 array per node for Cassandra data storage. This setup would provide performance benefits through improved data striping and simplify storage management, reducing the risk of future imbalances.
Data Migration: The client was advised on a methodical approach for increasing disk capacity if required (a consolidated command sketch follows this list):
- Stop Cassandra using sudo systemctl stop cassandra.
- Copy the data from the old mount point to the new mount point using cp -vrfp /old_mount_point/* /new_mount_point/.
- After copying, unmount the old filesystem with umount /old_mount_point, then mount the new filesystem at the same path as the old one (unmount it from /new_mount_point and mount its device at /old_mount_point, updating /etc/fstab accordingly) so that Cassandra's configured paths remain unchanged.
- Restart Cassandra using sudo systemctl start cassandra, and run maintenance commands like nodetool repair, nodetool compact, and nodetool cleanup to ensure data consistency and optimal performance.
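Put together, the advised migration might look like the following sketch; /old_mount_point, /new_mount_point and /dev/<new_device> are placeholders, and the fstab update is an added step to make the remount permanent:

    # Stop Cassandra while the data is moved
    sudo systemctl stop cassandra

    # Copy data, preserving permissions and ownership
    cp -vrfp /old_mount_point/* /new_mount_point/

    # Unmount both file systems, then mount the new device at the old path
    umount /old_mount_point
    umount /new_mount_point
    mount /dev/<new_device> /old_mount_point
    # Update /etc/fstab so the new device mounts at the old path on reboot

    # Restart and run maintenance
    sudo systemctl start cassandra
    nodetool repair
    nodetool compact
    nodetool cleanup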
Regular Maintenance: The client was advised to establish a regular schedule for running Cassandra maintenance commands (a cron sketch follows this list):
- Repair: Running nodetool repair weekly is recommended to maintain data consistency across nodes, especially after decommissioning or removing a node.
- Compaction: Monthly compactions using nodetool compact were suggested to optimize file structures, particularly when there are performance issues or indications of excessive tombstones in logs.
- Cleanup: Cleanup should be performed after adding new nodes using nodetool cleanup to remove unused data and free up disk space. This step is not required periodically but should be executed after significant cluster changes.
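For the recurring tasks, a cron-based sketch; the schedule, nodetool path and log locations are illustrative, and cleanup is deliberately omitted because it is tied to cluster changes rather than a calendar:

    # Weekly repair (Sunday 02:00) - times and paths are assumptions
    0 2 * * 0  /usr/bin/nodetool repair  >> /var/log/cassandra/repair.log 2>&1

    # Monthly major compaction (first day of the month, 03:00)
    0 3 1 * *  /usr/bin/nodetool compact >> /var/log/cassandra/compact.log 2>&1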
Handling Cassandra Data Safely: The client was advised against manually deleting files as a method for freeing up space, as this could result in data loss and potential cluster instability. Instead, they were encouraged to follow the recommended procedures, such as decommissioning nodes or using approved tools and commands for managing data safely.
Solution:
By consolidating multiple mount points into a single RAID0 array, the client achieved better performance and easier management of Cassandra’s data storage. The RAID0 array allowed data to be striped across multiple disks, which enhanced I/O performance and ensured even data distribution, addressing the initial storage imbalance issue.
Conclusion:
The transition to a single mount point configured as a Linux MD RAID0 array successfully resolved the disk management issues in the Cassandra cluster. This approach not only improved performance but also simplified the maintenance and monitoring of the database. For future deployments, it is recommended to use RAID0 arrays for Cassandra data storage to avoid similar issues with uneven data distribution and disk management. Regular maintenance routines should also be established to ensure the ongoing health and performance of the cluster.