Problem:

The client wanted to perform a rolling upgrade of the underlying operating system, from RHEL 7 to RHEL 9, on the ETCD and Patroni nodes of a Patroni-managed PostgreSQL cluster. The cluster consisted of three ETCD nodes and three Patroni-managed PostgreSQL instances (one primary and two standbys). With a Recovery Point Objective (RPO) and a Recovery Time Objective (RTO) of 5 minutes each, the client needed an upgrade method that kept downtime minimal while maintaining cluster integrity, preventing quorum loss in ETCD, and keeping PostgreSQL replication intact.

The client asked whether ETCD and Patroni nodes could run on different OS versions (RHEL 7 and RHEL 9) during the upgrade, provided the PostgreSQL version stayed the same.

Process:

Step 1: Gathering Information

The client provided details on their current setup:

  • ETCD Version: 3.3.25
  • Patroni Version: 1.6.5
  • PostgreSQL Version: 12.17
  • OS Version: RHEL 7.8
  • Key Dependency: Python

The expert reviewed the setup and confirmed that an incremental OS upgrade would be feasible, allowing nodes to run on different OS versions (RHEL 7 and RHEL 9) as long as ETCD and Patroni versions remained compatible across both operating systems.

Step 2: Compatibility Analysis

The expert recommended verifying, before any node was touched, that the installed versions of ETCD (3.3.25), Patroni (1.6.5), and PostgreSQL (12.17) were compatible with RHEL 9. Python deserved particular attention because Patroni runs on it and the system Python differs between the two releases (Python 2.7 on RHEL 7, Python 3.9 on RHEL 9).
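The rule from Step 1 (the OS may differ between nodes, the component versions may not) can be checked mechanically. The following is a minimal sketch, assuming the versions have already been collected into a dictionary. The node names and the inventory layout are hypothetical; only the version numbers come from the client's setup.

```python
from collections import defaultdict

# Hypothetical inventory: node names are illustrative only.
# Version numbers are the ones reported by the client.
INVENTORY = {
    "pg-node-1": {"os": "RHEL 7.8", "patroni": "1.6.5", "postgresql": "12.17"},
    "pg-node-2": {"os": "RHEL 9.3", "patroni": "1.6.5", "postgresql": "12.17"},
    "etcd-node-1": {"os": "RHEL 7.8", "etcd": "3.3.25"},
    "etcd-node-2": {"os": "RHEL 9.3", "etcd": "3.3.25"},
}


def find_version_drift(inventory, ignore=("os",)):
    """Return {component: {version: [nodes]}} for every component that
    runs more than one version. Keys listed in `ignore` may differ freely."""
    seen = defaultdict(lambda: defaultdict(list))
    for node, components in inventory.items():
        for component, version in components.items():
            if component not in ignore:
                seen[component][version].append(node)
    return {
        component: {version: sorted(nodes) for version, nodes in versions.items()}
        for component, versions in seen.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    drift = find_version_drift(INVENTORY)
    print("version drift:", drift if drift else "none")
```

An empty result means the mixed-OS cluster still runs a single version of each component, which is the condition the expert attached to the incremental upgrade.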

Step 3: Rolling Upgrade Plan

A detailed upgrade plan was devised to minimize disruption. Before any node was changed, recent and consistent backups of both ETCD and PostgreSQL were confirmed, and an ETCD snapshot was taken before each node was upgraded.

The upgrade began with the standby PostgreSQL instances: the OS on the standby Patroni nodes was upgraded to RHEL 9.3 using the Leapp utility. Leapp upgrades in place one major release at a time, so a node moving from RHEL 7 to RHEL 9 passes through RHEL 8 on the way. Each upgraded standby was then validated to confirm that it had rejoined the cluster and was synchronized with the primary.

Next, the ETCD nodes were moved to RHEL 9.3 one at a time. A new RHEL 9-based ETCD node was added to the cluster before an older RHEL 7 node was removed, so quorum was maintained throughout the process.
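The quorum reasoning behind adding a node before removing one can be worked through with a few lines of arithmetic. The sketch below is illustrative only: it uses the standard majority rule (quorum is more than half of the members), and the function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


def simulate_replacement(initial: int, replacements: int):
    """Walk through add-one-then-remove-one replacement rounds.

    Returns a list of (step, member_count, quorum, tolerated_failures).
    """
    size = initial
    steps = [("start", size)]
    for i in range(1, replacements + 1):
        size += 1
        steps.append((f"add new node {i}", size))
        size -= 1
        steps.append((f"remove old node {i}", size))
    return [(label, n, quorum(n), tolerated_failures(n)) for label, n in steps]


if __name__ == "__main__":
    for label, n, q, f in simulate_replacement(initial=3, replacements=3):
        print(f"{label:<18} members={n} quorum={q} tolerated failures={f}")
```

For the three-node cluster described here, the member count alternates between 3 and 4, and one failure is tolerated at every step. Note that a newly added member counts toward the cluster size as soon as it is registered, even before its process has started, which is why nodes are added one at a time and brought up promptly.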

Step 4: Expert Recommendations

The expert proposed the following steps for the OS upgrade:

  • Backup and Validation: Ensure recent and consistent backups of ETCD and PostgreSQL. Take snapshots of all ETCD nodes and confirm the health of the Patroni cluster.
  • ETCD Upgrade:
    • Prepare a New RHEL 9 Machine: Set up a new machine with RHEL 9 and install the same ETCD version (3.3.25).
    • Add New ETCD Nodes to the Cluster: Add the new RHEL 9 nodes to the ETCD cluster one at a time, and update the Patroni configuration on every Patroni node to list the new ETCD endpoints. Check ETCD cluster health after each addition to confirm that quorum is intact.
    • Remove Old ETCD Nodes: Remove the old RHEL 7 nodes one at a time, after verifying that the new RHEL 9 nodes are functioning correctly.
  • Patroni Upgrade:
    • Prepare a New RHEL 9 Machine for Patroni: Set up a new machine with RHEL 9, Patroni 1.6.5, and PostgreSQL 12.17.
    • Add New Patroni Nodes to the Cluster: Add the new Patroni nodes to the cluster and ensure they join as standbys and catch up with the primary.
    • Switchover and Remove Old Patroni Nodes: Perform a controlled switchover to promote one of the new RHEL 9 Patroni nodes to primary. Remove the old RHEL 7 nodes one at a time after verifying cluster health.
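One round of the ETCD steps above could be sketched as an ordered list of etcdctl invocations. This is a dry-run sketch, not the exact procedure the client ran: the endpoint addresses, member name, and member ID are hypothetical placeholders, and the commands use the v3 etcdctl syntax, which on etcd 3.3 requires ETCDCTL_API=3 in the environment.

```python
import shlex

# On etcd 3.3, etcdctl defaults to the v2 command set; this selects v3 syntax.
ETCDCTL_ENV = {"ETCDCTL_API": "3"}


def etcd_swap_plan(endpoints, new_name, new_peer_url, old_member_id):
    """Build the etcdctl commands for replacing one member.

    `endpoints` should be client URLs of members that stay in the cluster
    for the whole round, so that the health checks remain meaningful.
    """
    base = ["etcdctl", "--endpoints=" + ",".join(endpoints)]
    return [
        base + ["endpoint", "health"],  # confirm a healthy starting point
        base + ["member", "add", new_name, "--peer-urls=" + new_peer_url],
        # Manual step here: start etcd on the new RHEL 9 machine with
        # --initial-cluster-state existing, then continue.
        base + ["endpoint", "health"],
        base + ["member", "list"],  # look up the ID of the old member
        base + ["member", "remove", old_member_id],
        base + ["endpoint", "health"],  # confirm quorum after removal
    ]


def render(plan):
    """Render the plan as shell-safe command lines for review."""
    prefix = " ".join(f"{k}={v}" for k, v in ETCDCTL_ENV.items())
    return [prefix + " " + " ".join(shlex.quote(a) for a in argv) for argv in plan]


if __name__ == "__main__":
    plan = etcd_swap_plan(
        endpoints=["http://10.0.0.2:2379", "http://10.0.0.3:2379"],  # hypothetical
        new_name="etcd-rhel9-1",                                      # hypothetical
        new_peer_url="http://10.0.0.11:2380",                         # hypothetical
        old_member_id="8e9e05c52164694d",                             # hypothetical
    )
    for line in render(plan):
        print(line)
```

The plan is only printed, never executed, so it can be reviewed before each round. After each round, the ETCD host list in every Patroni configuration file (the hosts setting in the etcd section) would also need to reflect the new endpoints, as the recommendation states.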

Step 5: Testing and Verification

The expert suggested testing the process in a non-production environment to verify behavior, specifically ensuring that Patroni, PostgreSQL, and ETCD would function as expected with mixed OS versions (RHEL 7, 8, and 9).
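A verification pass in the test environment could reduce to a few checks on the cluster state. The sketch below assumes the member list has already been gathered, for example by reading the output of patronictl list, into dictionaries with the keys name, role, state, and lag_mb. That dictionary shape, the normalized role values, and the node names are assumptions made for this illustration.

```python
def cluster_problems(members, max_lag_mb=0):
    """Return a list of human-readable problems; an empty list means healthy.

    Each member is a dict with hypothetical keys:
    name, role ("leader" or "replica"), state, lag_mb.
    """
    problems = []
    leaders = [m for m in members if m["role"] == "leader"]
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 leader, found {len(leaders)}")
    for m in members:
        if m["state"] != "running":
            problems.append(f"{m['name']} is in state {m['state']!r}")
        elif m["role"] != "leader" and m["lag_mb"] > max_lag_mb:
            problems.append(f"{m['name']} lags by {m['lag_mb']} MB")
    return problems


if __name__ == "__main__":
    # Hypothetical mixed-OS cluster midway through the upgrade.
    members = [
        {"name": "pg-rhel7-1", "role": "leader", "state": "running", "lag_mb": 0},
        {"name": "pg-rhel9-1", "role": "replica", "state": "running", "lag_mb": 0},
        {"name": "pg-rhel9-2", "role": "replica", "state": "running", "lag_mb": 0},
    ]
    print(cluster_problems(members) or "cluster looks healthy")
```

Running the same check after every node change gives a consistent pass/fail signal for the mixed-OS test, independent of which node currently runs which RHEL release.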

Step 6: Expert Confirmation

The expert confirmed that the client could proceed with their plan to upgrade ETCD nodes first by adding new RHEL 9 machines and then safely removing the old nodes. The same approach could be followed for Patroni nodes. The expert also provided steps to perform a controlled switchover in Patroni to avoid data loss during the upgrade.
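The source does not reproduce the switchover steps the expert provided. As an illustration only, a controlled switchover with patronictl could be prepared as follows. The configuration path, cluster name, and node names are hypothetical, and the --master option matches the Patroni 1.6.x command line; newer Patroni releases renamed it, so patronictl switchover --help on the installed version is the authority.

```python
import shlex


def switchover_command(config_path, cluster_name, current_primary, candidate):
    """Build a patronictl switchover invocation (Patroni 1.6.x option names).

    --force is deliberately omitted, so patronictl still asks for
    confirmation before acting.
    """
    if current_primary == candidate:
        raise ValueError("candidate must differ from the current primary")
    return [
        "patronictl", "-c", config_path,
        "switchover",
        "--master", current_primary,
        "--candidate", candidate,
        cluster_name,
    ]


if __name__ == "__main__":
    argv = switchover_command(
        config_path="/etc/patroni/patroni.yml",  # hypothetical path
        cluster_name="pg-cluster",               # hypothetical cluster name
        current_primary="pg-rhel7-1",            # hypothetical node name
        candidate="pg-rhel9-1",                  # hypothetical node name
    )
    print(" ".join(shlex.quote(a) for a in argv))
```

Naming the candidate explicitly keeps the promotion target under the operator's control, which fits the plan of moving the primary role onto a RHEL 9 node before the RHEL 7 nodes are removed.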

Solution:

The client successfully performed a rolling OS upgrade of the ETCD and Patroni nodes from RHEL 7 to RHEL 9. The expert guided them through adding new RHEL 9 nodes to the ETCD and Patroni clusters while keeping the ETCD, Patroni, and PostgreSQL versions unchanged. The client first tested the setup in a non-production environment to confirm compatibility, then proceeded with a controlled switchover and removal of the old nodes.

Conclusion:

The proposed solution upgraded the ETCD and Patroni nodes without losing quorum or breaking replication. By adding new RHEL 9 nodes before removing the older RHEL 7 nodes, the client kept the PostgreSQL and ETCD clusters available throughout the upgrade and stayed within the 5-minute RPO/RTO requirements.