Problem:

The client was experiencing issues with their Wiki.js application, which ran on an AWS EC2 instance with PostgreSQL as the database and Docker for deployment. After a system crash, attempts to restart the application via the Docker Compose file resulted in errors. The client asked for help investigating the problem and reported that, although the application could connect to the database, it remained unreachable. They also noted that switching to a different volume let PostgreSQL initialize a new database instance, after which the application appeared accessible, but problems persisted.

Process:

Step 1 – Initial Investigation

  • The client provided SSH access to the EC2 instance and outlined the problem, including errors encountered during the Docker Compose startup.
  • Environment Setup: The expert accessed the EC2 instance using the provided SSH key and navigated to the specified directory to set up the environment.

Step 2 – Analyze Logs and Database Health

  • Log Review: The expert checked the logs and observed that the Wiki.js application was attempting to connect to the PostgreSQL database, but the application itself never became reachable.
  • Testing Connections: The expert executed a curl command to test connectivity on port 3000 and confirmed that the application was not responding.
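
The connectivity test above can be approximated in a few lines of Python. This is a minimal sketch, not the command the expert ran: the function name is_reachable is hypothetical, and the URL in the usage note assumes the application is published on localhost port 3000 of the instance.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (4xx/5xx), so it is reachable.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed/empty reply.
        return False
```

Calling is_reachable("http://localhost:3000/") on the instance would return False in the state described here, since nothing was answering on that port. Treating an HTTP error status as "reachable" is a deliberate choice: the question at this stage is whether the server process responds at all, not whether the page renders.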

Step 3 – Identify Root Cause

  • Database Inspection: The expert investigated potential issues within the PostgreSQL database, focusing on the application’s migration process and database integrity.
  • Corruption Discovery: The expert determined that corruption in the pageTree table was causing the migration process to hang, preventing the application from starting correctly.
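
The case notes do not record how the corrupted table was located. One possible approach, sketched below under stated assumptions, is to force a full read of each table with a time limit and flag any table whose read fails or hangs. The function names, the database name wiki, and the user wikijs are hypothetical, and the sketch assumes psql can be run directly against the database (in a Docker setup it would typically be run inside the database container).

```python
import subprocess
from typing import Callable, Iterable, List


def probe_table(table: str, db: str = "wiki", user: str = "wikijs",
                timeout_s: float = 30.0) -> bool:
    """Return True if every row of `table` can be read within the time limit."""
    # COPY ... TO STDOUT reads each row, so damaged pages tend to surface
    # as an error or a hang. Embedded double quotes are escaped by doubling.
    quoted = '"' + table.replace('"', '""') + '"'
    cmd = ["psql", "-U", user, "-d", db, "-c", f"COPY {quoted} TO STDOUT"]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard the row data itself
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A read that never finishes is treated the same as a failed read.
        return False
    return result.returncode == 0


def find_unreadable_tables(tables: Iterable[str],
                           probe: Callable[[str], bool] = probe_table) -> List[str]:
    """Return the tables for which the probe reports a failed or hung read."""
    return [t for t in tables if not probe(t)]
```

The probe is passed in as a parameter so the scan logic can be exercised without a live database. A table flagged this way is only a candidate: a slow but healthy table can also exceed the time limit, so the result would still need to be confirmed by hand.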

Step 4 – Implementation of Solutions

  • Data Recovery Strategy: The expert devised a plan to rebuild the database, excluding the problematic table. This involved creating scripts to dump the schema and data while bypassing the corrupted table.
  • Fresh Database Setup: The expert set up a fresh PostgreSQL data folder and imported the cleaned data to restore functionality.
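
The expert's actual scripts are not reproduced in this write-up. As an illustration only, the dump-and-reimport step could be built on pg_dump's --exclude-table option and a psql import, as in the sketch below. The function names, database name, user name, and output file name are all hypothetical. The double quotes around the table name are there because pg_dump folds unquoted patterns to lower case, which would not match a mixed-case name such as pageTree.

```python
import shlex
from typing import List


def build_dump_command(db: str, user: str, skip_table: str, out_file: str) -> List[str]:
    """Build a pg_dump call that dumps schema and data but skips one table."""
    return [
        "pg_dump",
        "--username", user,
        "--dbname", db,
        "--format", "plain",
        # Double quotes preserve the case of the table name in the pattern.
        f'--exclude-table="{skip_table}"',
        "--file", out_file,
    ]


def build_restore_command(db: str, user: str, in_file: str) -> List[str]:
    """Build a psql call that loads a plain-format dump into a fresh database."""
    return [
        "psql",
        "--username", user,
        "--dbname", db,
        # Stop at the first error instead of continuing with a partial import.
        "--set", "ON_ERROR_STOP=1",
        "--file", in_file,
    ]


def as_shell(cmd: List[str]) -> str:
    """Render an argument list as a safely quoted shell command line."""
    return shlex.join(cmd)
```

For example, as_shell(build_dump_command("wiki", "wikijs", "pageTree", "wiki_clean.sql")) yields a command line that can be reviewed before it is run against the damaged database. Building the commands as argument lists keeps the quoting of the table name intact when they are later passed to subprocess.run without a shell.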

Solution:

  • Database Reconstruction: The expert successfully rebuilt the database by creating a fresh instance and importing data while excluding the corrupted pageTree table. This resolved the hanging migration issue and allowed the application to start properly.
  • Application Accessibility: After completing the restoration, the application became accessible. The expert noted that the pageTree table was a cache layer that could be automatically rebuilt by the application, mitigating concerns about data loss.

Conclusion:

The PostgreSQL database corruption affecting the Wiki.js application was resolved through systematic investigation and restoration. To prevent similar incidents, the expert recommended a robust backup strategy, including frequent backups and data integrity checks. The client expressed gratitude for the support provided, and the expert suggested revisiting data management practices for uploaded files and re-enabling the relevant sections in the Docker Compose configuration.

Overall, the case highlighted the importance of proactive database management and effective communication between support teams and clients in troubleshooting complex application issues.