Problem:

The client reported two main issues:

  • One of the Kubernetes master nodes was in a NotReady state.
  • They needed to upgrade their Kubernetes version from 1.26 to 1.29.

The client requested support to address these concerns. The client had already shut down the master node and was awaiting further instructions for troubleshooting.

Process:

The expert requested detailed information, including logs and the specific commands run, to understand the root cause of the node failure. The expert also asked which steps the client had taken before encountering the issue, emphasizing the importance of a well-planned upgrade process, since the jump from Kubernetes 1.26 to 1.29 crosses three minor versions and involves potentially breaking API changes. The expert therefore recommended splitting the upgrade into three separate phases, as Kubernetes supports upgrading only one minor version at a time.
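The phased upgrade described above could be run, for example, as three successive Kubespray invocations. This is a hypothetical sketch: the inventory path mirrors the placeholder used later in this report, and the patch versions are illustrative, not the ones the client actually used.

```shell
# Hypothetical phased upgrade via Kubespray's upgrade-cluster.yml playbook.
# Kubernetes supports upgrading only one minor version at a time, so the
# 1.26 -> 1.29 jump is split into three runs. Patch versions are examples.
ansible-playbook -i inventory/<path to inventory file> upgrade-cluster.yml \
  --become -e kube_version=v1.27.0   # phase 1: 1.26 -> 1.27
ansible-playbook -i inventory/<path to inventory file> upgrade-cluster.yml \
  --become -e kube_version=v1.28.0   # phase 2: 1.27 -> 1.28
ansible-playbook -i inventory/<path to inventory file> upgrade-cluster.yml \
  --become -e kube_version=v1.29.0   # phase 3: 1.28 -> 1.29
```

Validating cluster health (node readiness, workload status) between phases keeps each step independently verifiable.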

The client powered on the master node, collected the logs, and provided the requested information. The expert suggested a series of steps to try to restore the node, including draining it and deleting it from the cluster, then re-adding it using Kubespray.
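The restore sequence above can be sketched with the following commands. The node name and inventory path are placeholders; the extra drain flags are commonly required for drain to complete on a control-plane node, but the exact flags the client needed are an assumption.

```shell
# Evict workloads from the failed master node (DaemonSet pods cannot be
# evicted, and emptyDir data is lost, hence the two flags):
kubectl drain <failed-master-node> --ignore-daemonsets --delete-emptydir-data

# Remove the non-responsive node object from the cluster:
kubectl delete node <failed-master-node>

# Re-add the node by running the Kubespray cluster playbook limited to it:
ansible-playbook -i inventory/<path to inventory file> cluster.yml \
  --become --limit <failed-master-node>
```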

Solution:

The expert provided a structured plan to address the client’s issues with the Kubernetes master node and facilitate the version upgrade, which included the following steps:

  • Log Collection and Analysis: The expert requested essential logs from the client to diagnose the state of the master node, including outputs from kubectl commands and journalctl logs for Kubernetes components.
  • Node Removal: The client powered on the problematic master node and collected logs. The expert advised running kubectl drain to evict pods, followed by kubectl delete node to remove the non-responsive master node from the cluster.
  • Inventory Update: The client updated the Kubespray inventory file to remove references to the old master node and added the new master node under the appropriate sections for proper configuration.
  • Running Kubespray Playbooks: The expert guided the client to run the Kubespray playbooks specifically targeting the new master node with the command:
    ansible-playbook -i inventory/<path to inventory file> cluster.yml --become --limit <new-master-node>
  • Certificate Management: To resolve issues with missing kubeadm certificates, the expert recommended regenerating them using the command:
    kubeadm init phase upload-certs --upload-certs

    The necessary PKI files were transferred from an operational master node to the new one to ensure proper setup.

  • Handling kube-dns Issues: The expert identified a known issue with the kube-dns service and advised the client to delete the existing service to prevent conflicts during the upgrade process.
  • Upgrade Execution: After restoring the master node, the client proceeded with the upgrade in phases, moving from 1.26 to 1.27, then to 1.28, and finally to 1.29. The expert remained available for support throughout the process.
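The certificate-recovery step above can be sketched as follows. Hostnames and file names are illustrative; the exact PKI files that had to be copied in this case are not recorded, so the CA material shown is an assumption.

```shell
# On a healthy control-plane node: re-upload the control-plane certificates
# to the cluster and print the certificate key used to decrypt them:
kubeadm init phase upload-certs --upload-certs

# If files under /etc/kubernetes/pki are still missing on the new node,
# copy them over from the healthy node (CA cert/key shown as an example):
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
  <new-master-node>:/etc/kubernetes/pki/
```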

Conclusion:

The solution offered by the expert was effective because it provided a structured and thorough approach to both restoring the node and handling the version upgrade. Breaking the upgrade into smaller phases minimized the risk of further disruptions, and the manual intervention with certificates ensured the master node could rejoin the cluster successfully. The collaborative troubleshooting between the client and the expert reduced downtime, making this a robust and reliable solution.