Problem:

The client reported an issue where Apache Cassandra nodes in their multi-datacenter cluster were logging frequent errors related to SSL certificate validation, with entries like:

    DEBUG [Native-Transport-Requests-1] 2025-01-23 08:54:27,794 ServerConnection.java:140 - Failed to get peer certificates for peer /10.110.151.78:36376
    javax.net.ssl.SSLPeerUnverifiedException: peer not verified
    

Despite these log entries, the cluster continued to function normally, but the client was concerned about the error logs and potential issues related to peer certificate verification.

Process:

Step 1: Initial Identification

The error logs provided by the client showed repeated instances of SSLPeerUnverifiedException, indicating that Cassandra had tried to read the SSL certificate of the remote end of a connection (the "peer" in the log line) and found none. The Native-Transport-Requests thread name pointed to client connections on the native transport rather than internode traffic. The client confirmed that the cluster’s operations were unaffected and that it appeared to be functioning properly.
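As an illustration of this identification step, the following is a minimal sketch of how such entries could be tallied per remote address to see how widespread they were and at which level they were logged. The regular expression is an assumption modeled on the single sample line quoted in the Problem section, and the function name summarize_peer_cert_failures is hypothetical.

```python
import re
from collections import Counter

# Assumption: every entry follows the layout of the one sample line in the report:
# LEVEL [thread] yyyy-MM-dd HH:mm:ss,SSS ServerConnection.java:NNN - Failed to get ... /ip:port
PEER_CERT_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"ServerConnection\.java:\d+ - Failed to get peer certificates "
    r"for peer /(?P<ip>[^/\s]+):(?P<port>\d+)\s*$"
)


def summarize_peer_cert_failures(lines):
    """Count 'Failed to get peer certificates' entries per (level, remote IP)."""
    counts = Counter()
    for line in lines:
        match = PEER_CERT_LINE.match(line.strip())
        if match:
            counts[(match["level"], match["ip"])] += 1
    return counts


# The two lines quoted in the report; the stack-trace line is ignored by the pattern.
sample = [
    "DEBUG [Native-Transport-Requests-1] 2025-01-23 08:54:27,794 "
    "ServerConnection.java:140 - Failed to get peer certificates for peer /10.110.151.78:36376",
    "javax.net.ssl.SSLPeerUnverifiedException: peer not verified",
]

for (level, ip), count in summarize_peer_cert_failures(sample).items():
    print(level, ip, count)
```

If every match comes back at DEBUG level, that supports reading the entries as diagnostic output rather than failures.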

The client’s configuration file revealed that client certificate authentication was disabled (require_client_auth = false), meaning that connecting clients were not required to present SSL certificates for normal operations.

Step 2: Analysis by the Expert

The expert reviewed the provided logs, configuration files, and the client’s setup, noting the following:

  • The SSL errors were DEBUG-level log entries, not actual connection failures.
  • The configuration was correct for an environment where certificate-based authentication of the connecting peer was not necessary.
  • There was no indication of network communication failure or performance degradation in the logs.

Key questions were raised regarding:

  • Whether the client had noticed any instability or service degradation despite the error logs.
  • The client’s specific needs for SSL certificate validation in this setup.
  • Whether the logs were cluttering system log files or causing unnecessary concerns.

Step 3: Client Environment Overview

The client provided details from the nodetool status command, showing the nodes across two data centers: PROD-East US2 and DR-West US2. All nodes were up, with healthy disk and memory metrics. However, the SSL errors persisted, even though the cluster operated normally.

Step 4: Root Cause Analysis and Solution Proposal

The expert identified the following:

  • SSLPeerUnverifiedException: This exception was raised when Cassandra requested the peer’s certificates on a connection where the peer had not presented any. Since require_client_auth was set to false, certificates weren’t required, so this was expected behavior rather than a misconfiguration. The connection proceeded normally, and the exception was only written to the debug output.
  • Logging Issue: The entries were being logged at DEBUG level, which was unnecessary for the client’s use case. They did not indicate a functional issue and only cluttered the logs.

Solution:

The expert recommended the following actions to resolve the issue:

  • Disable Debug Logging for SSL Connections: Modify the logback.xml file to set the log level for the ServerConnection logger to INFO, which removes the DEBUG-level entries related to SSL validation errors:

        <logger name="org.apache.cassandra.transport.ServerConnection" level="INFO"/>

  • Review SSL Configuration: Although the client did not require SSL client authentication, the expert suggested ensuring the SSL configuration was aligned with the client’s needs. If SSL certificates weren’t required, the client could leave the default settings.
  • Monitor and Optimize Logging: The client was advised to monitor the logs periodically and adjust logging levels as needed to avoid performance issues caused by excessive logging.
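
As a follow-up to the logging change, the sketch below shows one way to confirm that a logback.xml file actually carries an override for the ServerConnection logger. It is an illustration only: the function name configured_level is hypothetical, the XML string is a minimal stand-in for a real logback.xml (which also defines appenders and a root logger), and the check looks only for an exact logger name, so it does not account for levels inherited from parent loggers.

```python
import xml.etree.ElementTree as ET

# Logger name taken from the recommended logback.xml entry.
SERVER_CONNECTION = "org.apache.cassandra.transport.ServerConnection"


def configured_level(logback_xml, logger_name=SERVER_CONNECTION):
    """Return the level set on <logger name=logger_name>, or None if absent."""
    root = ET.fromstring(logback_xml.strip())
    for logger in root.iter("logger"):
        if logger.get("name") == logger_name:
            return logger.get("level")
    return None


# Minimal stand-in for logback.xml (assumption: the real file has more content).
example = """
<configuration>
  <logger name="org.apache.cassandra.transport.ServerConnection" level="INFO"/>
</configuration>
"""

print(configured_level(example))  # prints INFO
```

A result of None would mean the file has no explicit entry for that logger, in which case the level is inherited from a parent or the root logger.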

Conclusion:

The SSLPeerUnverifiedException entries were a product of debug-level logging in Apache Cassandra, not an actual communication failure. Adjusting the logging configuration to match the client’s requirements suppressed the entries without affecting the system’s behavior or performance. This reduced unnecessary log output and reassured the client that their system was functioning as expected.