Problem:
The client reported recurring kernel messages in the system logs of a virtualized Linux server running OpenZFS. Using dmesg -T, the client observed approximately five identical stack traces per second, which raised concerns about production stability and data integrity.
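The trace rate reported by the client can be measured directly from dmesg -T output. The following is a minimal sketch, assuming that each stack trace begins with a "Call Trace:" line (the usual Linux format) and that dmesg -T timestamps have one-second resolution. The function names traces_per_second and mean_rate are hypothetical helpers, not part of any ZFS tooling.

```python
import re
from collections import Counter

# dmesg -T prefixes each line with a bracketed, human-readable timestamp,
# e.g. "[Tue Mar  4 10:15:32 2025]", which has one-second resolution.
TS_RE = re.compile(r"^\[([^\]]+)\]\s?(.*)$")


def traces_per_second(lines, marker="Call Trace:"):
    """Count lines containing `marker`, bucketed by their dmesg -T timestamp."""
    counts = Counter()
    for line in lines:
        m = TS_RE.match(line)
        if m and marker in m.group(2):
            counts[m.group(1)] += 1
    return counts


def mean_rate(counts):
    """Average traces per second over the seconds that had at least one trace."""
    return sum(counts.values()) / len(counts) if counts else 0.0
```

On the affected host, the output of dmesg -T can be fed in line by line (for example, via sys.stdin). Because mean_rate only averages over seconds that contain at least one trace, it overstates the rate when traces arrive in bursts with quiet gaps.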
The log messages consistently referenced the same memory allocation path through the ZFS and SPL modules, including the following frames (innermost call first):
- spl_kmem_zalloc.cold.3
- multilist_create
- dmu_objset_sync
- dsl_pool_sync
- spa_sync
- txg_sync_thread
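To confirm that a given trace follows the TXG sync path in the frames listed above, the frame names can be extracted and matched in order. The following is a minimal sketch, assuming the usual Linux frame format symbol+0xOFFSET/0xSIZE [module]. Compiler suffixes such as .cold.3 (from hot/cold code splitting) are stripped before matching. The names frame_names, is_txg_sync_trace, and TXG_SYNC_CHAIN are hypothetical.

```python
import re

# A kernel stack frame looks like "spl_kmem_zalloc.cold.3+0x1c/0x20 [spl]".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames from the reported traces, innermost call first.
TXG_SYNC_CHAIN = [
    "spl_kmem_zalloc", "multilist_create", "dmu_objset_sync",
    "dsl_pool_sync", "spa_sync", "txg_sync_thread",
]


def frame_names(trace_lines):
    """Extract base symbol names, dropping suffixes such as '.cold.3'."""
    names = []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m:
            names.append(m.group(1).split(".")[0])
    return names


def is_txg_sync_trace(names, chain=TXG_SYNC_CHAIN):
    """True if every symbol in `chain` appears in `names`, in the same order."""
    remaining = iter(names)
    # `c in remaining` consumes the iterator, so the order is enforced.
    return all(c in remaining for c in chain)
```

Matching in order (with gaps allowed) tolerates extra frames such as kthread at the bottom of a trace, or unreliable "?" entries, without accepting a trace whose frames appear in a different order.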
The environment was identified as a VMware-based virtual machine running:
- OS: RHEL 8.10 (kernel 4.18.0-553.el8_10)
- ZFS version: OpenZFS 2.2.8
The client’s primary concern was whether these frequent kernel warnings indicated a serious fault that could affect production workloads.
Process:
Step 1: Initial Assessment and Log Review
The expert reviewed the provided dmesg output and determined that the messages were warning-level stack traces, not kernel panics or other fatal errors. The traces originated during ZFS transaction group (TXG) synchronization, a normal background operation in OpenZFS.
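The distinction between warning-level traces and fatal events can be checked mechanically by scanning the same log for markers that accompany panics and oopses. The following is a minimal sketch. The marker list is an illustrative assumption, not an exhaustive list, and it is deliberately broad: "BUG:" also matches soft-lockup reports, which merit review anyway.

```python
# Markers that indicate a fatal or near-fatal kernel event. This is a coarse,
# illustrative list (an assumption), not an exhaustive one.
FATAL_MARKERS = (
    "Kernel panic",
    "Oops:",
    "BUG:",
    "general protection fault",
)


def fatal_lines(lines):
    """Return the log lines that contain any fatal marker."""
    return [line for line in lines if any(m in line for m in FATAL_MARKERS)]


def classify(lines):
    """'fatal' if any fatal marker is present, otherwise 'warning-only'."""
    return "fatal" if fatal_lines(lines) else "warning-only"
```

Here, "warning-only" means only that none of these markers was found. It does not by itself prove that the traces are harmless.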
Step 2: Environment and Configuration Analysis
Additional system information was collected, including ZFS pool status, disk layout, CPU and memory statistics, and I/O metrics. Key observations included:
- All ZFS pools were ONLINE with no read, write, or checksum errors.
- Recent ZFS scrubs completed successfully with 0B repaired.
- Disk I/O activity was high but consistent with a heavily utilized system.
- The server showed a high load average and a large number of runnable processes, indicating sustained system load.
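The pool-health observations above can be checked by parsing zpool status output. The following is a minimal sketch, assuming the default human-readable layout: state: and scan: header lines, a NAME STATE READ WRITE CKSUM table, and a closing errors: line. The function zpool_problems is hypothetical, not an OpenZFS tool. Abbreviated counters such as 1.2K are simply treated as nonzero.

```python
import re


def zpool_problems(status_text):
    """List problems found in `zpool status` text; an empty list means healthy."""
    problems = []
    in_table = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("state:"):
            state = line.split()[1]
            if state != "ONLINE":
                problems.append(f"pool state is {state}")
        elif line.startswith("scan:"):
            m = re.search(r"scrub repaired (\S+)", line)
            if m and m.group(1) != "0B":
                problems.append(f"last scrub repaired {m.group(1)}")
        elif line.startswith("errors:"):
            if "No known data errors" not in line:
                problems.append(line)
        elif line.startswith("NAME") and line.endswith("CKSUM"):
            in_table = True
        elif in_table:
            if not line:
                in_table = False  # a blank line ends the config table
                continue
            parts = line.split()
            if len(parts) < 5:
                continue  # section labels ("logs") or short rows (spares)
            name, state, read, write, cksum = parts[:5]
            if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
                problems.append(f"{name}: {state} R={read} W={write} C={cksum}")
    return problems
```

For a quick manual check, zpool status -x prints "all pools are healthy" when there is nothing to report. A parser like this is mainly useful when the result needs to feed a monitoring script.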
Step 3: Memory Pressure Evaluation
The stack traces pointed to temporary memory allocation failures inside ZFS. The expert explained that:
- ZFS aggressively uses available RAM for its Adaptive Replacement Cache (ARC).
- Under high memory pressure, allocation attempts may fail momentarily.
- When this occurs, ZFS logs a warning and retries the allocation.
Because the allocations eventually succeeded, no follow-up error messages or crashes occurred.
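How much RAM the ARC currently holds can be read from its kstat counters in /proc/spl/kstat/zfs/arcstats, which has rows of the form "name type data". The following is a minimal sketch that reads the size, c (current target), and c_max (ceiling) rows. A target well below c_max may indicate that the ARC has been shrinking in response to memory pressure. The helper names parse_arcstats and arc_usage are hypothetical. OpenZFS also ships the arc_summary and arcstat utilities, which report the same counters.

```python
ARCSTATS = "/proc/spl/kstat/zfs/arcstats"
GIB = 1024 ** 3


def parse_arcstats(text):
    """Parse kstat 'name type data' rows into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip the kstat header line and blank lines
        try:
            stats[parts[0]] = int(parts[2])
        except ValueError:
            continue  # skip the "name type data" column header
    return stats


def arc_usage(stats):
    """Current ARC size, target (c), and ceiling (c_max) in GiB, plus fill %."""
    return {
        "size_gib": stats["size"] / GIB,
        "target_gib": stats["c"] / GIB,
        "max_gib": stats["c_max"] / GIB,
        "fill_pct": 100.0 * stats["size"] / stats["c_max"],
    }
```

On a live system, the input would be open(ARCSTATS).read(). Sampling it alongside the warning timestamps shows whether traces coincide with the ARC target dropping.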
Step 4: Validation Through System Metrics
System metrics (vmstat, iostat, uptime) confirmed that:
- CPU utilization remained moderate.
- Total memory was ample, but most of it was actively in use by caches and applications.
- I/O throughput was high, consistent with active ZFS workloads.
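The vmstat checks above can be scripted by mapping each sample row to the column header. The following is a minimal sketch, assuming the default procps-ng vmstat layout (r b swpd free buff cache si so bi bo in cs us sy id wa st) produced by a command such as vmstat 1 5. The first sample row reports averages since boot, so it is skipped. The ZFS ARC is not counted in the cache column, because it is not part of the Linux page cache, so ARC size should be taken from arcstats instead. The functions parse_vmstat and summarize are hypothetical.

```python
def parse_vmstat(output):
    """Turn `vmstat` output into one dict per sample row, keyed by column name."""
    header, rows = None, []
    for line in output.splitlines():
        parts = line.split()
        if parts[:2] == ["r", "b"]:
            header = parts
        elif header and len(parts) == len(header) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(header, map(int, parts))))
    return rows


def summarize(rows):
    """Skip the since-boot first row, then average CPU busy % and runnable count."""
    if not rows:
        raise ValueError("no vmstat samples found")
    samples = rows[1:] if len(rows) > 1 else rows
    n = len(samples)
    return {
        "cpu_busy_pct": sum(r["us"] + r["sy"] for r in samples) / n,
        "runnable_avg": sum(r["r"] for r in samples) / n,
        "free_kib_min": min(r["free"] for r in samples),
    }
```

Comparing runnable_avg with os.cpu_count() gives a rough sense of whether the high load average reflects CPU contention or processes waiting on I/O.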
Solution:
The expert concluded that the observed messages were harmless ZFS warning traces caused by temporary memory pressure during synchronization tasks.
Key findings included:
- No ZFS pool corruption or disk failures were detected.
- No kernel panics, crashes, or service interruptions occurred.
- ZFS recovered through allocation retries without any observable impact.
Recommendations provided to the client were:
- No immediate action required: The warnings can safely be ignored in their current form.
- Monitoring: Continue monitoring system load and memory usage.
- Preventive tuning (optional): Review the ARC size limits (zfs_arc_min, zfs_arc_max) if memory pressure increases.
- Log review: Periodically review /var/log/messages* to ensure the warnings do not escalate into persistent allocation failures.
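If the ARC limits are ever tuned, the values are given in bytes. The following minimal sketch derives them from total RAM. The fractions (50% for zfs_arc_max, 6.25% for zfs_arc_min) are an illustrative, hypothetical policy, not a recommendation from this case. The functions arc_limits and modprobe_options are also hypothetical.

```python
GIB = 1024 ** 3


def arc_limits(total_ram_bytes, max_fraction=0.5, min_fraction=0.0625):
    """Hypothetical policy: size zfs_arc_min/zfs_arc_max as fractions of RAM."""
    if not 0 < min_fraction <= max_fraction < 1:
        raise ValueError("require 0 < min_fraction <= max_fraction < 1")
    return int(total_ram_bytes * min_fraction), int(total_ram_bytes * max_fraction)


def modprobe_options(arc_min, arc_max):
    """Line for a file such as /etc/modprobe.d/zfs.conf (values in bytes)."""
    return f"options zfs zfs_arc_min={arc_min} zfs_arc_max={arc_max}"
```

The options line takes effect the next time the zfs module is loaded. The same byte value can also be written to /sys/module/zfs/parameters/zfs_arc_max to adjust the ceiling at runtime.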
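The periodic log review can be reduced to a per-day count of the allocation traces, so that escalation shows up as a rising number. The following is a minimal sketch, assuming the traditional rsyslog line prefix used in /var/log/messages on RHEL ("Mar  4 10:15:32 host kernel: ..."). It counts one line per trace by matching the spl_kmem_zalloc frame. The threshold passed to days_over is a site-specific, hypothetical value. The function names are also hypothetical.

```python
import glob
import gzip
import re
from collections import Counter

# Traditional rsyslog prefix, e.g. "Mar  4 10:15:32 host kernel: ...".
DAY_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) \d{2}:\d{2}:\d{2} ")


def daily_counts(lines, marker="spl_kmem_zalloc"):
    """Count lines containing `marker` per calendar day ("Mon D")."""
    counts = Counter()
    for line in lines:
        m = DAY_RE.match(line)
        if m and marker in line:
            counts[f"{m.group(1)} {int(m.group(2))}"] += 1
    return counts


def read_messages(pattern="/var/log/messages*"):
    """Yield lines from the current and rotated logs, reading .gz files too."""
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as fh:
            yield from fh


def days_over(counts, limit):
    """Days whose count exceeds `limit`, in order of first appearance."""
    return [day for day, n in counts.items() if n > limit]
```

Reading /var/log/messages normally requires root. Because syslog timestamps carry no year, the day keys are only unambiguous within the retention window of the rotated files.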
Conclusion:
The frequent ZFS-related stack traces observed in the kernel logs were warning-only messages triggered by transient memory allocation retries under high system load. All ZFS pools remained healthy, and no evidence indicated a risk to production data or system stability.
The case confirmed that the environment was operating within acceptable parameters. With continued monitoring and optional memory tuning, the client can safely continue running production workloads.