Problem:

On a primary Sonatype Nexus instance running version 3.29.2-02, container images stored in Docker repositories were disappearing even though no user-initiated DELETE calls had been recorded. The operator reported the unexpected removals, pointed to release notes describing an earlier cleanup-policy UI bug (where day values were misinterpreted as seconds), and observed that repository artifacts had been cleaned automatically. The environment contains many Docker repositories that share a single blob store.

Process:

Step 1: Triage the initial symptom set

Reviewed the customer report, which described the missing images, Nexus version 3.29.2-02, and the earlier cleanup-policy UI bug. This suggested two immediate hypotheses: corrupted cleanup-policy values had persisted from an earlier version, or a scheduled task was performing the deletions. Those hypotheses drove the next diagnostics: check cleanup policies, scheduled tasks, and recent task execution history.

Step 2: Verify cleanup policies and scheduled cleanup task state

Reviewed the requested admin screenshots and instructions. The customer confirmed that no cleanup policies were configured under Administration → Repository → Cleanup Policies, so the policy-driven cleanup task had no policies to apply. This ruled out corrupted policy values as the proximate cause and required a pivot to task and log analysis.

Step 3: Request system diagnostics and logs

Requested a support bundle and targeted log excerpts (system info, tasks list, nexus.log, request.log) and asked for a full Tasks page screenshot. Collecting these artifacts enabled correlation between scheduled task runs and deletion events instead of relying on configuration inspection alone.
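
The tasks list gathered in this step can also be screened programmatically. The sketch below is illustrative only: it assumes the list is available as JSON in the shape returned by the Nexus tasks REST endpoint (an "items" array with "name", "type", "lastRun" and "nextRun" fields) and that the two task type identifiers shown are the ones used by the instance. Both assumptions should be checked against the actual support bundle before relying on the output.

```python
import json

# Assumed task type identifiers; confirm them against the instance's own task list.
RISKY_TASK_TYPES = {
    "repository.docker.gc": "Docker GC (delete unused manifests and images)",
    "blobstore.compact": "Compact blob store (permanently purges soft-deleted blobs)",
}


def find_risky_tasks(tasks_json):
    """Return the tasks whose type can delete or permanently purge content."""
    payload = json.loads(tasks_json)
    risky = []
    for task in payload.get("items", []):
        task_type = task.get("type")
        if task_type in RISKY_TASK_TYPES:
            risky.append({
                "name": task.get("name"),
                "type": task_type,
                "why": RISKY_TASK_TYPES[task_type],
                "last_run": task.get("lastRun"),
                "next_run": task.get("nextRun"),
            })
    return risky


if __name__ == "__main__":
    # Hypothetical payload for illustration; not taken from the case.
    sample = json.dumps({"items": [
        {"name": "docker-gc", "type": "repository.docker.gc",
         "lastRun": "2021-01-10T02:00:00.000+00:00", "nextRun": None},
        {"name": "nightly-backup", "type": "db.backup"},
    ]})
    for task in find_risky_tasks(sample):
        print(task)
```

The function only reads a saved payload, so it can be run against an exported tasks list without touching the live instance.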

Step 4: Correlate scheduled tasks with deletion events

Analyzed the Tasks list and found an active Docker garbage-collection (GC) task (“Delete unused manifests and images”) configured with an advanced (cron-style) schedule. Its execution timestamps aligned with the period in which images disappeared. This finding shifted the focus from cleanup policies to the behavior of the Docker GC task.
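
The timestamp alignment described here can be expressed as a small check. This is a minimal sketch, not the procedure used in the case: the one-hour window is an arbitrary assumption, and the timestamps in the demo are hypothetical.

```python
from datetime import datetime, timedelta


def correlate(task_runs, deletions, window=timedelta(hours=1)):
    """Pair each deletion time with the latest task run that started
    at most `window` before it, or None if no run qualifies."""
    runs = sorted(task_runs)
    pairs = []
    for deleted_at in sorted(deletions):
        match = None
        for run in runs:
            if run <= deleted_at <= run + window:
                match = run  # later qualifying runs overwrite earlier ones
        pairs.append((deleted_at, match))
    return pairs


if __name__ == "__main__":
    # Hypothetical timestamps for illustration only.
    runs = [datetime(2021, 1, 9, 2, 0), datetime(2021, 1, 10, 2, 0)]
    deletions = [datetime(2021, 1, 10, 2, 5), datetime(2021, 1, 10, 14, 0)]
    for deleted_at, run in correlate(runs, deletions):
        print(deleted_at, "<-", run)
```

Deletions that pair with no run are the interesting ones: they would point to a different actor than the scheduled task.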

Step 5: Log analysis to identify the deleting actor

Reviewed nexus.log and request.log entries around the deletion window. Request logs showed no external DELETE requests, while system logs indicated deletions performed by an internal scheduled process (actor shown as system). This confirmed deletions were performed by an internal task rather than an API client, and therefore implicated the Docker GC task. Additional inspection revealed multiple compact-blob-store tasks running on schedule, which would permanently purge soft-deleted blobs if left enabled.
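
The request-log check for external DELETE calls can be approximated with a short script. The sketch assumes an NCSA-style line layout (a bracketed timestamp followed by the quoted request line); the actual request.log pattern is configurable, so the regular expression and timestamp format are assumptions to adapt, and the sample lines are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout: ... [10/Jan/2021:02:03:04 +0000] "METHOD /path HTTP/1.1" ...
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\]\s+"(?P<method>[A-Z]+)\s+(?P<path>\S+)')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_deletes(lines, start, end):
    """Return (timestamp, path) for every DELETE request between start and end."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group("method") != "DELETE":
            continue
        try:
            ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        except ValueError:
            continue  # skip lines whose timestamp does not fit the assumed format
        if start <= ts <= end:
            hits.append((ts, match.group("path")))
    return hits


if __name__ == "__main__":
    # Hypothetical log lines for illustration only.
    lines = [
        '10.0.0.5 - ci [10/Jan/2021:02:03:04 +0000] "GET /v2/app/manifests/1.0 HTTP/1.1" 200 - 512 9',
        '10.0.0.5 - ci [10/Jan/2021:02:04:00 +0000] "DELETE /v2/app/manifests/1.0 HTTP/1.1" 202 - 0 15',
    ]
    start = datetime(2021, 1, 10, 2, 0, tzinfo=timezone.utc)
    end = datetime(2021, 1, 10, 3, 0, tzinfo=timezone.utc)
    print(find_deletes(lines, start, end))
```

An empty result for the deletion window is consistent with the finding above that no API client issued the deletes.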

Step 6: Environment risk factors and decision to act

Observed the environment had a large number of Docker repositories sharing a single blob store, increasing the risk that a GC pass in one repository could remove shared layers referenced elsewhere. Based on that risk and log correlation, the mitigation decision was to stop scheduled internal deletions and protect soft-deleted blobs, then monitor to confirm cessation of data loss prior to any recovery or corrective upgrade work.
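
To make the shared-blob-store risk concrete, the grouping below lists which Docker repositories sit on the same blob store. It is a sketch over a hand-built inventory: the tuple layout and the repository and blob store names are hypothetical, and the real inventory would come from the repository configuration in the support bundle.

```python
from collections import defaultdict


def shared_docker_blob_stores(repos):
    """Given (repo_name, format, blob_store) tuples, return the blob stores
    used by more than one Docker repository, with the repositories on each."""
    groups = defaultdict(list)
    for name, fmt, blob_store in repos:
        if fmt == "docker":
            groups[blob_store].append(name)
    return {store: sorted(names) for store, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = [
        ("docker-team-a", "docker", "default"),
        ("docker-team-b", "docker", "default"),
        ("docker-release", "docker", "release-store"),
        ("maven-central", "maven2", "default"),
    ]
    print(shared_docker_blob_stores(inventory))
```

Every blob store that appears in the result is one where a GC or compaction pass can affect more than one Docker repository.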

Step 7: Implement immediate mitigation and prepare remediation path

Disabled the Docker GC scheduled task and disabled all compact blob store tasks to prevent permanent purging. After disabling them, monitored the repository browser and logs for 24–48 hours; deletions ceased. With the immediate data loss stopped, prepared an upgrade and configuration-hardening plan to remove the underlying bug risk and reduce the blast radius going forward.
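
The monitoring period can be backed by a simple before-and-after comparison of image listings. The sketch assumes two snapshots, each a mapping from repository name to the image references seen in it; how the snapshots are captured (repository browser export, API listing) is left open, and the names in the demo are hypothetical.

```python
def missing_images(before, after):
    """Return {repository: sorted image references} for images present in
    the `before` snapshot but absent from the `after` snapshot."""
    report = {}
    for repo, images in before.items():
        gone = set(images) - set(after.get(repo, ()))
        if gone:
            report[repo] = sorted(gone)
    return report


if __name__ == "__main__":
    # Hypothetical snapshots for illustration only.
    day_0 = {"docker-team-a": ["app:1.0", "app:1.1"], "docker-team-b": ["svc:2.0"]}
    day_2 = {"docker-team-a": ["app:1.0", "app:1.1", "app:1.2"], "docker-team-b": ["svc:2.0"]}
    print(missing_images(day_0, day_2) or "no images disappeared")
```

Newly pushed images are ignored on purpose; only references that vanish between snapshots are reported.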

Solution:

Immediate remediation: disabled the Sonatype Nexus “Delete unused manifests and images” Docker GC task and disabled all compact blob store tasks. This prevented further scheduled internal deletions and stopped compaction that would permanently remove soft-deleted blobs, allowing investigation and potential recovery from backups.

Longer-term remediation: plan an upgrade from 3.29.2-02 to a release containing Docker GC fixes, and move high-value repositories onto dedicated blob stores. The upgrade addresses the underlying GC correctness bugs seen in older releases; separating blob stores reduces the risk that a GC pass in one repository removes layers still referenced by others.

Conclusion:

After disabling the Docker GC and compaction tasks, image deletions stopped and the immediate data-loss risk was mitigated. The changes restored operational control while an upgrade and a reorganisation of repositories and blob stores were planned to eliminate the root cause and reduce future risk.