Problem:
The client reported an issue where Apache Cassandra nodes in their multi-datacenter cluster were logging frequent errors related to SSL certificate validation, with entries like:
DEBUG [Native-Transport-Requests-1] 2025-01-23 08:54:27,794 ServerConnection.java:140 - Failed to get peer certificates for peer /10.110.151.78:36376 javax.net.ssl.SSLPeerUnverifiedException: peer not verified
Despite these log entries, the cluster continued to function normally, but the client was concerned about the error logs and potential issues related to peer certificate verification.
Process:
Step 1: Initial Identification
The log entries provided by the client showed repeated SSLPeerUnverifiedException messages. They were written by ServerConnection on a Native-Transport-Requests thread, which means Cassandra was trying to read the certificate of the remote end of a native-transport (CQL) connection and no verified certificate was available. The client confirmed that the cluster’s operations showed no significant impact despite the messages and that the cluster appeared to be functioning properly.
The client’s configuration file (cassandra.yaml) showed that client certificate authentication was disabled (require_client_auth set to false), so connecting clients were not required to present SSL certificates.
Step 2: Analysis by the Expert
The expert reviewed the provided logs, configuration files, and the client’s setup, noting the following:
- The SSL messages were DEBUG-level log entries, not evidence of failed connections.
- The configuration was correct for an environment where peer authentication was not necessary.
- There was no indication of network communication failure or performance degradation in the logs.
Key questions were raised regarding:
- Whether the client had noticed any instability or service degradation despite the error logs.
- The client’s specific needs for SSL certificate validation in this setup.
- Whether the logs were cluttering system log files or causing unnecessary concerns.
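One way to answer the clutter question with numbers is to count the entries per remote address. The sketch below is a minimal Python example, assuming every entry follows the format of the log line quoted under Problem (level, thread in brackets, date, time, source file and line, then the message). The function name count_peer_cert_messages is hypothetical and not part of Cassandra’s tooling.

```python
import re
from collections import Counter

# Assumed line format, taken from the entry quoted under Problem:
# DEBUG [thread] YYYY-MM-DD HH:MM:SS,mmm ServerConnection.java:NNN - Failed to get peer certificates for peer /IP:port ...
PEER_CERT_RE = re.compile(
    r"^DEBUG \[(?P<thread>[^\]]+)\] \S+ \S+ ServerConnection\.java:\d+ - "
    r"Failed to get peer certificates for peer /(?P<addr>[\d.]+):\d+"
)


def count_peer_cert_messages(lines):
    """Count 'Failed to get peer certificates' DEBUG entries per remote IP.

    `lines` is any iterable of log lines, such as an open file object.
    Lines that do not match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = PEER_CERT_RE.match(line)
        if match:
            counts[match.group("addr")] += 1
    return counts
```

To use it, open the log file and pass the file object in, for example `with open(path, encoding="utf-8", errors="replace") as f: count_peer_cert_messages(f).most_common(10)`. Only the first line of each entry is matched, so any stack-trace lines that follow it are not double-counted.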
Step 3: Client Environment Overview
The client provided output from the nodetool status command, showing the nodes across two data centers: PROD-East US2 and DR-West US2. All nodes were up, with healthy disk and memory metrics. However, the SSL errors persisted even though the cluster operated normally.
Step 4: Root Cause Analysis and Solution Proposal
The expert identified the following:
- SSLPeerUnverifiedException: This message was expected behavior rather than the result of a misconfiguration, and it did not affect normal cluster operations. Because require_client_auth was set to false, connecting clients were not required to present certificates, so Cassandra had no verified peer certificate to retrieve and recorded the exception in its debug output.
- Logging issue: The messages were logged at DEBUG level, which was unnecessary for the client’s use case. They did not indicate a functional problem; they only cluttered the logs.
Solution:
The expert recommended the following actions to resolve the issue:
- Raise the log level for ServerConnection: Add a logger entry to the logback.xml file that sets org.apache.cassandra.transport.ServerConnection to INFO. This suppresses the DEBUG-level entries about unverified peer certificates, along with any other DEBUG output from that class.
<logger name="org.apache.cassandra.transport.ServerConnection" level="INFO"/>
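Applying this override by hand on every node is error-prone, so it can help to script it. The following is a minimal Python sketch, assuming the <logger> elements sit directly under the <configuration> root of logback.xml. The function name ensure_logger_level is hypothetical. Because ElementTree rewrites the document, only comments inside <configuration> survive, and the XML declaration or any comment before the root element is dropped. For that reason, the sketch returns the new text instead of overwriting the file.

```python
import xml.etree.ElementTree as ET

LOGGER_NAME = "org.apache.cassandra.transport.ServerConnection"


def ensure_logger_level(xml_text, name=LOGGER_NAME, level="INFO"):
    """Return logback XML that contains <logger name=name level=level/>.

    If a <logger> with that name already exists under <configuration>, only
    its level is updated, so running this twice yields a single entry.
    Comments inside the root element are kept (Python 3.8+); content outside
    the root, such as a leading license comment, is not.
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    if root.tag != "configuration":
        raise ValueError("expected a logback <configuration> root element")
    for logger in root.findall("logger"):
        if logger.get("name") == name:
            logger.set("level", level)
            break
    else:
        # No existing entry: append a new, more specific logger. Logback applies
        # the most specific logger name, so this wins over a broader
        # org.apache.cassandra DEBUG logger.
        ET.SubElement(root, "logger", {"name": name, "level": level})
    return ET.tostring(root, encoding="unicode")
```

A cautious workflow is to write the result to a new file, diff it against the original logback.xml, and replace the original only after reviewing the change.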
Conclusion:
The SSLPeerUnverifiedException entries were a product of debug-level logging in Apache Cassandra, not an actual connection failure. Adjusting the logging configuration to match the client’s requirements suppressed the messages without affecting the system’s performance. This reduced unnecessary log entries and reassured the client that their system was functioning as expected.