Altering Cassandra Keyspace in a Multi-DC Setup: Best Practices and Key Steps

Problem:

The client had a 6-node multi-DC setup for Cassandra (3 nodes in PROD – East US2 and 3 nodes in DR – West US2) and needed to alter a keyspace. The keyspace was initially defined as follows:

CREATE KEYSPACE bulk_api_management WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;

The client wanted to change the strategy to “NetworkTopologyStrategy” and include both data centers, DC1 and DC2, with the following statement:

ALTER KEYSPACE bulk_api_management WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': 3, 'DC1': 3};
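Data center names in a NetworkTopologyStrategy definition are case-sensitive and must match the names the cluster itself reports (for example, in the "Datacenter:" lines of nodetool status). The sketch below is a hypothetical helper, not part of the expert's advice: it assembles the statement above and refuses any data center name that is not in a list of known names. The function name and the known-name list are illustrative assumptions.

```python
def build_alter_keyspace(keyspace, dc_rf, known_dcs):
    """Return an ALTER KEYSPACE statement, rejecting unknown data center names."""
    # Names are compared exactly, because 'dc1' and 'DC1' are different to Cassandra.
    unknown = sorted(set(dc_rf) - set(known_dcs))
    if unknown:
        raise ValueError(f"unknown data center name(s): {', '.join(unknown)}")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(dc_rf.items()))
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Assumed to be copied by hand from the output of `nodetool status`.
known = ["DC1", "DC2"]
statement = build_alter_keyspace("bulk_api_management", {"DC1": 3, "DC2": 3}, known)
print(statement)
```

The printed statement can then be reviewed and run in cqlsh. The check only guards against typos in the data center names; it does not contact the cluster.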

All of the client's other keyspaces already used “NetworkTopologyStrategy” with both DC1 and DC2; “bulk_api_management” was the only exception. The client wanted to understand why it would be necessary to decommission a node and run “nodetool cleanup” when the only goal was to alter one keyspace to use “NetworkTopologyStrategy” across DC1 and DC2. They asked whether running nodetool repair <keyspace> on all 6 nodes after altering the keyspace would be sufficient.
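As background to that question, it helps to see how much data each node is expected to hold before and after the change. ALTER KEYSPACE only updates the keyspace metadata; it does not stream existing data to the nodes that become replicas. The arithmetic below is a rough sketch that assumes evenly balanced tokens; the function name is illustrative.

```python
def effective_ownership(replication_factor, node_count):
    """Fraction of the keyspace's data each node holds, assuming balanced tokens."""
    return min(1.0, replication_factor / node_count)


# SimpleStrategy with RF 3: replicas are placed around the whole 6-node ring.
before = effective_ownership(3, 6)
# NetworkTopologyStrategy with RF 3 per data center and 3 nodes per data center.
after = effective_ownership(3, 3)

print(f"before: {before:.0%} per node, after: {after:.0%} per node")
# prints "before: 50% per node, after: 100% per node"
```

Under these assumptions every node goes from holding roughly half of the keyspace to holding all of it, so a large amount of data has to be moved after the statement runs.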

The client asked our expert whether the statement was correct, which post-steps were needed (such as running “nodetool repair”), whether any pre-steps were required, and whether the ALTER KEYSPACE command could be executed while all nodes were up and the application was connected to the Cassandra DB.

Solution:

The expert provided a step-by-step approach for altering the keyspace to use “NetworkTopologyStrategy” with both DC1 and DC2.

The expert shared a comprehensive tutorial for adding a new data center to Cassandra, which can be found here: Adding a New Data Center to Cassandra.
The expert reviewed the ALTER KEYSPACE statement supplied by the client and validated its “NetworkTopologyStrategy” settings for data centers DC1 and DC2.
The expert recommended that, after altering the keyspace, the client run nodetool rebuild -- name_of_existing_data_center on all nodes of the new data center. This step streams the existing data to the new data center and is crucial for integrating it into the cluster.
The expert also provided the necessary commands and explained the importance of running “nodetool cleanup.” Specifically, once the data is populated, the client needs to execute the following commands on all nodes in both data centers:
- nodetool repair
- nodetool cleanup

The expert emphasized that simply running nodetool repair on all nodes is not sufficient when a new data center is added to the setup. The full sequence, with rebuild run on the nodes of the new data center and repair and cleanup run on every node, is:

nodetool rebuild -- name_of_existing_data_center
nodetool repair
nodetool cleanup

The final nodetool cleanup command is important because it removes data that a node no longer owns under the new replication settings, which frees up disk space on each node.
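The sequence above can be written out as a per-node checklist. The sketch below is one reading of the expert's steps, not a definitive runbook: it lists rebuild for the nodes outside the source data center, then repair and cleanup for every node, scoped to the altered keyspace. The host names and the choice of DC1 as the existing (source) data center are assumptions made for illustration.

```python
def plan_commands(nodes_by_dc, source_dc, keyspace):
    """Return (host, command) pairs: rebuild outside source_dc, then repair and cleanup everywhere."""
    plan = []
    # Step 1: stream existing data from the source data center to the other nodes.
    for dc, hosts in nodes_by_dc.items():
        if dc != source_dc:
            for host in hosts:
                plan.append((host, f"nodetool rebuild -- {source_dc}"))
    # Steps 2 and 3: repair on every node, then cleanup on every node.
    for step in ("repair", "cleanup"):
        for hosts in nodes_by_dc.values():
            for host in hosts:
                plan.append((host, f"nodetool {step} {keyspace}"))
    return plan


# Hypothetical host names; replace with the real node addresses.
nodes = {
    "DC1": ["dc1-node1", "dc1-node2", "dc1-node3"],
    "DC2": ["dc2-node1", "dc2-node2", "dc2-node3"],
}
plan = plan_commands(nodes, source_dc="DC1", keyspace="bulk_api_management")
for host, command in plan:
    print(f"{host}: {command}")
```

The script only prints the commands, so the list can be reviewed before anything is run on a node. Running the steps one node at a time limits the streaming and compaction load on the cluster.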

This approach ensures that the keyspace alteration is done correctly and that the new data center is fully integrated into the Cassandra cluster.

Conclusion:

By following the expert's recommendations, the client successfully updated their Cassandra database to work effectively across two data centers. This involved changing the keyspace to replicate its data equally to both locations, synchronizing the data with rebuild and repair, and reclaiming storage with cleanup. The solution improved data reliability and system efficiency and reduced the risk of downtime or data loss. The client now has a more robust and stable database environment, along with the knowledge and resources to maintain this setup efficiently.