Problem:

A customer reported a critical production issue involving incorrect timestamp values in a data pipeline built on Debezium CDC. Timestamps originating from an MSSQL source system appeared 5 hours 30 minutes ahead of the expected values (an offset of +05:30) in the downstream system.

The issue affected multiple tables whose timestamp columns use MSSQL date and time data types that do not store timezone information. The initial findings were:

  • MSSQL date/time data types do not persist timezone metadata
  • Debezium interprets these values as UTC internally
  • A custom conversion layer attempted to normalize timestamps to a local timezone
  • An additional, unintended offset was observed in one environment

Although the same configuration and codebase were used across environments, timestamp behavior differed, making the issue difficult to diagnose.

Process:

Step 1: Data Pipeline Review

The expert reviewed the full data flow to understand where timezone handling might be introduced:

  • MSSQL as the source database
  • Debezium for change data capture
  • Kafka as the streaming platform
  • A cloud-based analytical database as the target system

It was confirmed that the target system consistently uses UTC and does not apply implicit timezone conversions. Source database timezone settings were also verified and found to be consistent across environments.

Step 2: Datatype and Timezone Behavior Analysis

The expert explained that several MSSQL date/time data types, such as datetime, datetime2, and smalldatetime, do not carry timezone information (datetimeoffset is the exception). As a result:

  • Debezium assumes UTC when serializing these values
  • Kafka topics receive epoch-based timestamps derived from this assumption
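The effect of this assumption can be illustrated with a minimal Java sketch. The class and method names are hypothetical, and the Asia/Kolkata zone is an assumption chosen only because it matches the reported +05:30 offset; the case itself does not name the customer's timezone. The sketch compares the epoch value produced when a zone-less wall-clock reading is treated as UTC with the value produced when the same reading is interpreted in the local zone.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class UtcAssumptionDemo {

    // Mirrors the assumption described above: a zone-less wall-clock value
    // is treated as if it had been recorded in UTC.
    static long toEpochMillisAssumingUtc(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // The epoch value for the same wall-clock reading when it is interpreted
    // in the zone where it was actually written.
    static long toEpochMillisInZone(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2024, 1, 15, 10, 0, 0);
        ZoneId assumedLocalZone = ZoneId.of("Asia/Kolkata"); // assumption, see lead-in

        long asUtc = toEpochMillisAssumingUtc(wallClock);
        long asLocal = toEpochMillisInZone(wallClock, assumedLocalZone);

        System.out.println("epoch (assumed UTC): " + asUtc);
        System.out.println("epoch (local zone):  " + asLocal);
        System.out.println("difference (min):    " + (asUtc - asLocal) / 60_000);
    }
}
```

The two interpretations differ by exactly 330 minutes, which is the size of the shift reported by the customer.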

A custom conversion layer had been introduced to normalize all timestamps to a local timezone. While effective in some cases, this approach introduced ambiguity when combined with Debezium’s internal handling.

Step 3: Converter Logic Evaluation

The custom converter applied different logic depending on the Java object type produced by Debezium:

  • Numeric epoch values: applied a fixed offset adjustment
  • Timestamp objects: converted using a configured regional timezone
  • Date objects: applied a millisecond-based offset

Further analysis showed that Debezium may represent the same logical timestamp using different Java types depending on context, leading to inconsistent timezone adjustments.
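The following Java sketch is a hypothetical reconstruction of such type-dependent branching, not the customer's actual converter. The class name, the fixed +05:30 adjustment, and the Asia/Kolkata zone are all assumptions made for illustration. It feeds one logical timestamp through three representations to show how the result can depend on the Java type alone.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.ZoneId;
import java.util.Date;

final class TypeDependentConverterDemo {

    // Assumption: the fixed adjustment equals the observed +05:30 shift.
    static final long ASSUMED_OFFSET_MS = (5 * 60 + 30) * 60_000L;
    // Assumption: placeholder for the "configured regional timezone".
    static final ZoneId ASSUMED_REGION = ZoneId.of("Asia/Kolkata");

    // Returns epoch milliseconds after the type-specific adjustment.
    static long convert(Object value) {
        if (value instanceof Long) {
            // Numeric epoch: a fixed offset is added, which moves the instant.
            return (Long) value + ASSUMED_OFFSET_MS;
        }
        if (value instanceof Timestamp) {
            // Zone conversion changes the wall-clock view, not the instant.
            return ((Timestamp) value).toInstant()
                    .atZone(ASSUMED_REGION)
                    .toInstant()
                    .toEpochMilli();
        }
        if (value instanceof Date) {
            // Date object: millisecond-based offset, which moves the instant.
            return ((Date) value).getTime() + ASSUMED_OFFSET_MS;
        }
        throw new IllegalArgumentException("Unsupported type: " + value);
    }

    // Feeds one logical timestamp through all three representations.
    static long[] convertAllRepresentations(long epochMillis) {
        return new long[] {
            convert(epochMillis), // boxed to Long
            convert(Timestamp.from(Instant.ofEpochMilli(epochMillis))),
            convert(new Date(epochMillis))
        };
    }

    public static void main(String[] args) {
        long epoch = 1_705_312_800_000L; // 2024-01-15T10:00:00Z
        for (long result : convertAllRepresentations(epoch)) {
            System.out.println("shift in ms: " + (result - epoch));
        }
    }
}
```

Under these assumptions the numeric and Date branches shift the instant by 19,800,000 ms while the Timestamp branch leaves it unchanged, so the outcome depends on which type happens to reach the converter.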

Step 4: Log-Based Validation

Application logs confirmed that the values initially received by the custom converter were correct and timezone-aware. However, the corresponding Kafka messages carried an unexpected offset, which indicated that the shift was introduced after the converter received the value and that some records may not have followed the intended conversion branch.

Taken together, these findings indicated that the offset was being applied more than once in some scenarios, depending on how the timestamp was internally represented.
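One way to make this kind of comparison systematic is to express the difference between a source value and the value seen in Kafka as a multiple of the suspected offset. The helper below is a hypothetical sketch: the class name and the +05:30 constant are assumptions, and both inputs are taken to be epoch milliseconds.

```java
import java.time.Duration;

final class OffsetDeltaCheck {

    // Assumption: the suspected offset is the +05:30 shift reported in this case.
    static final Duration ASSUMED_OFFSET = Duration.ofHours(5).plusMinutes(30);

    // Describes how far the observed value is from the source value,
    // expressed in whole multiples of the suspected offset where possible.
    static String describeDelta(long sourceEpochMillis, long observedEpochMillis) {
        long deltaMs = observedEpochMillis - sourceEpochMillis;
        long offsetMs = ASSUMED_OFFSET.toMillis();
        if (deltaMs == 0) {
            return "no shift";
        }
        if (deltaMs % offsetMs == 0) {
            return "shifted by " + (deltaMs / offsetMs) + " x +05:30";
        }
        return "shifted by " + Duration.ofMillis(deltaMs)
                + " (not a multiple of the offset)";
    }

    public static void main(String[] args) {
        long source = 1_705_312_800_000L; // 2024-01-15T10:00:00Z
        System.out.println(describeDelta(source, source));
        System.out.println(describeDelta(source, source + 19_800_000L));
        System.out.println(describeDelta(source, source + 39_600_000L));
    }
}
```

A result of "1 x" would point to a single conversion, while "2 x" would be consistent with the offset having been applied twice.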

Step 5: Kafka and Runtime Verification

The expert clarified that Kafka brokers do not alter message timestamps. To further isolate the issue, the following verification steps were recommended:

  • Validate system time at the JVM level for all producers
  • Inspect Kafka messages with timestamp printing enabled (for example, the console consumer's print.timestamp=true property)
  • Explicitly configure Debezium to operate in UTC

These steps were intended to determine whether the offset was introduced before or during message production.
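The JVM-level check can be sketched with a small diagnostic class. The class name is hypothetical, and the sketch assumes it is run with the same JVM options as the process that hosts Debezium. Setting the standard user.timezone system property (for example, -Duser.timezone=UTC) is one common way to pin a JVM's default zone; how that option is passed depends on the deployment and is not described in this case.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class JvmTimeReport {

    // Summarizes the timezone settings the current JVM is actually using.
    static String describe(Instant now) {
        ZoneId zone = ZoneId.systemDefault();
        ZoneOffset offset = zone.getRules().getOffset(now);
        return "user.timezone=" + System.getProperty("user.timezone")
                + " defaultZone=" + zone
                + " offsetNow=" + offset
                + " utcNow=" + now;
    }

    public static void main(String[] args) {
        System.out.println(describe(Instant.now()));
    }
}
```

Comparing this output across environments would show whether one JVM resolves to a non-UTC default zone even though the configuration files are identical.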

Solution:

Rather than relying on fixed offset adjustments, the expert recommended a more robust and future-proof approach:

  • Add detailed logging to capture incoming timestamp values and object types
  • Dynamically detect whether timezone context is already present
  • Apply conditional conversion logic instead of blanket offsets
  • Standardize Debezium configuration to explicitly use UTC
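A minimal sketch of this recommendation, assuming a Java converter, is shown below. The class and method names are hypothetical, and the set of handled types is an assumption about what might reach the converter. Instead of adding an offset per type, every representation is normalized to a UTC instant, and values that already carry timezone context are converted using that context.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;
import java.util.logging.Logger;

final class TimestampNormalizer {

    private static final Logger LOG =
            Logger.getLogger(TimestampNormalizer.class.getName());

    // Normalizes any supported representation to a UTC instant.
    // No branch adds a fixed offset, so the same logical timestamp
    // yields the same result regardless of its Java type.
    static Instant toUtcInstant(Object value) {
        LOG.fine(() -> "incoming type="
                + (value == null ? "null" : value.getClass().getName())
                + " value=" + value);
        if (value == null) {
            return null;
        }
        // Timezone context already present: use it as-is.
        if (value instanceof Instant) {
            return (Instant) value;
        }
        if (value instanceof OffsetDateTime) {
            return ((OffsetDateTime) value).toInstant();
        }
        if (value instanceof ZonedDateTime) {
            return ((ZonedDateTime) value).toInstant();
        }
        // No timezone context: follow the UTC assumption explicitly.
        if (value instanceof LocalDateTime) {
            return ((LocalDateTime) value).toInstant(ZoneOffset.UTC);
        }
        // Checked before Date because Timestamp is a subclass of Date.
        if (value instanceof Timestamp) {
            return ((Timestamp) value).toInstant();
        }
        if (value instanceof Date) {
            return Instant.ofEpochMilli(((Date) value).getTime());
        }
        // Assumption: numeric values are epoch milliseconds; microsecond or
        // nanosecond precision would need scaling before this call.
        if (value instanceof Long) {
            return Instant.ofEpochMilli((Long) value);
        }
        throw new IllegalArgumentException(
                "Unsupported timestamp type: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        long epoch = 1_705_312_800_000L; // 2024-01-15T10:00:00Z
        System.out.println(toUtcInstant(epoch));
        System.out.println(toUtcInstant(new Date(epoch)));
        System.out.println(toUtcInstant(LocalDateTime.of(2024, 1, 15, 10, 0)));
        System.out.println(toUtcInstant(OffsetDateTime.parse("2024-01-15T15:30:00+05:30")));
    }
}
```

Logging the incoming type alongside the value is what makes it possible to see, per record, which branch was taken.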

The expert also reinforced widely accepted best practices for distributed systems:

  • Store all timestamps in UTC
  • Perform timezone conversion only at the application or presentation layer
  • Avoid storing fixed offsets without full timezone context
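The UTC-first practice can be illustrated with a short Java sketch in which the stored value stays a UTC instant and localization happens only when the value is rendered. The class name is hypothetical, and the Asia/Kolkata zone is an assumption used only for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class PresentationLayerDemo {

    // Renders a stored UTC instant in the viewer's zone.
    // The stored value itself is never modified.
    static String render(Instant storedUtc, ZoneId displayZone) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME
                .format(storedUtc.atZone(displayZone));
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2024-01-15T10:00:00Z");
        System.out.println(render(stored, ZoneId.of("UTC")));
        System.out.println(render(stored, ZoneId.of("Asia/Kolkata"))); // assumed zone
    }
}
```

Because a named zone is passed in at render time, the same stored value can be shown correctly for any audience without rewriting data in the pipeline.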

Conclusion:

This case demonstrates how subtle differences in timestamp handling can lead to significant production issues in CDC-based architectures. When source systems do not store timezone information, assumptions made by connectors and custom logic must be carefully aligned.

Key takeaways include:

  • Not all database date/time types are timezone-aware
  • CDC tools may represent timestamps using multiple internal data types
  • Fixed-offset conversions are fragile across environments
  • UTC-first storage with late-stage localization is the most reliable strategy

By introducing dynamic conversion logic, enhanced observability, and explicit timezone configuration, the customer gained a clear and consistent approach to timestamp handling across all environments.