Problem:
The client’s production environment runs Rancher across two clusters: a Rancher cluster and an application cluster. During cluster setup, the kubelet certificate was generated with a one-year validity, and it recently expired. According to the Rancher RKE documentation, additional configuration is needed to manage certificate validity. The client observed the following inconsistencies:
- Some certificates were generated with a validity of 1 year, while others had 10 years.
- Certificates with 1-year validity, like the scheduler and controller manager, were automatically rotated, but the kubelet certificate was not.
The client resolved the expired certificate by rotating it manually, which restored functionality. However, they asked for clarification of the logic behind the differing validity periods: why only certain certificates auto-rotate while others do not, and whether these discrepancies stem from Rancher’s default certificate management settings.
Process:
Step 1 – Initial Analysis
The experts reviewed the client’s issue and verified that the certificates in question were Rancher’s default certificates, not custom ones. The investigation highlighted:
- Kubelet and API server certificates are typically configured with a 10-year validity.
- Other components, like the scheduler and controller manager, have 1-year certificates that auto-rotate.
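The validity differences described above can be checked directly from each certificate's notBefore and notAfter dates. The following is a minimal sketch, assuming the dates have already been read from the certificates (for example with openssl) in the usual "Mon DD HH:MM:SS YYYY GMT" form. The function names and the sample dates are hypothetical and are not taken from the client's cluster.

```python
import ssl

SECONDS_PER_DAY = 86400


def validity_days(not_before: str, not_after: str) -> int:
    """Return the certificate lifetime in whole days."""
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return (end - start) // SECONDS_PER_DAY


def approx_years(days: int) -> int:
    """Round a lifetime in days to the nearest whole year."""
    return round(days / 365.25)


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    samples = {
        "short-lived": ("Jan  1 00:00:00 2024 GMT", "Dec 31 00:00:00 2024 GMT"),
        "long-lived": ("Jan  1 00:00:00 2024 GMT", "Jan  1 00:00:00 2034 GMT"),
    }
    for name, (start, end) in samples.items():
        days = validity_days(start, end)
        print(f"{name}: {days} days (about {approx_years(days)} year(s))")
```

Running a check like this against every certificate on a node would show which ones fall into the 1-year group and which into the 10-year group.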
Step 2 – Recommendations from Expert 1
The first expert requested additional details to provide tailored guidance. Key points included:
- Confirmation of whether the client used self-signed or custom certificates.
- Clarification of whether the objective was to extend certificate validity from 1 year to 10 years.
- Instructions on how to manually rotate certificates and adjust default settings if needed.
Step 3 – Advanced Insights from Expert 2
The second expert explained Rancher’s certificate management logic in more detail:
- Kubelet and API server certificates, with 10-year validity, do not auto-rotate unless explicitly configured.
- Scheduler and controller manager certificates, with 1-year validity, are set to auto-rotate.
- Manual intervention is required for rotating 10-year certificates in RKE1 environments.
- RKE2 automates the certificate rotation process, addressing many of these discrepancies.
- The root CA certificate defaults to a 10-year duration, ensuring long-term trust for other certificates.
They referenced the Rancher documentation for manual certificate rotation.
The expert hypothesized that the client might have manually rotated only the kubelet and API server certificates, leaving other components unchanged.
Step 4 – Client Follow-Up
The experts provided actionable next steps:
- Verify the configuration for certificate management in the Rancher RKE cluster.
- Explicitly configure kubelet to rotate certificates if desired.
- Upgrade to RKE2 for automated certificate management, reducing manual effort.
- Regularly monitor certificate expiration dates to prevent similar issues.
Solution:
The client successfully rotated the expired kubelet certificate and restored normal cluster functionality. The experts clarified the differences in certificate validity and rotation policies, helping the client understand Rancher’s default behavior. They recommended transitioning to RKE2 for improved automation and reduced manual intervention.
Conclusion:
This case highlights the importance of understanding default certificate management policies in Rancher RKE. With the experts’ guidance, the client resolved the immediate issue and learned how to prevent a recurrence. Moving to RKE2, which automates certificate rotation, reduces the manual effort involved and the risk of similar expirations.