Problem:
The client reported that the Prometheus directory inside /var/lib had grown to 23GB, leading to high disk utilization on /var and potentially impacting other services. The /var directory has a total capacity of 200GB, which is shared with other services' libraries and log files. Utilization on /var currently stands at 80%, and the client needed to free up space to keep all services running smoothly.
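As a rough illustration of how these figures can be checked on a host, the sketch below reports utilization of the filesystem holding /var, the size of the Prometheus directory, and the share of the 200GB volume that 23GB represents. The helper name percent_of is hypothetical, and the paths assume the layout described above.

```shell
#!/usr/bin/env bash
# Sketch only: paths follow the case description; percent_of is a hypothetical helper.

# Integer percentage of PART ($1) relative to TOTAL ($2), rounded down.
percent_of() {
  echo $(( $1 * 100 / $2 ))
}

# Overall utilization of the filesystem that holds /var.
df -h /var

# Size of the Prometheus data directory, if it exists on this host.
if [ -d /var/lib/prometheus ]; then
  du -sh /var/lib/prometheus
fi

# Share of the 200GB volume taken by the 23GB Prometheus directory.
echo "Prometheus share of /var: $(percent_of 23 200)%"
```

23GB out of 200GB is 11.5% of capacity; the integer arithmetic here rounds down.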
Process:
- Initial Assessment: On receiving the client’s report, our team assessed the setup and the impact of the growing Prometheus directory. They confirmed that the growth was confined to the Prometheus time-series database (TSDB) located at /var/lib/prometheus/.
- Identification of Solution: To reduce disk usage, our team proposed adjusting the retention time for Prometheus data. With a retention period configured, Prometheus automatically deletes data older than that period.
- Implementation: For virtual machine setups, our experts recommended editing the Prometheus service file (prometheus.service), typically located at /etc/systemd/system/prometheus.service, to include a retention policy. Specifically, we advised adding --storage.tsdb.retention.time=7d to the Prometheus start command to limit data retention to 7 days. The change takes effect once systemd reloads the unit file and Prometheus is restarted.
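The implementation step can be sketched roughly as follows. This is a minimal illustration, not the client's actual unit file: the ExecStart line (binary path, --config.file and --storage.tsdb.path values) is an assumed example, and add_retention_flag is a hypothetical helper. The edit is demonstrated on a scratch copy so that it can be tried without touching the live service.

```shell
#!/usr/bin/env bash
# Sketch only. The ExecStart line below is an assumed example,
# not the client's actual unit file.

# Insert the retention flag right after the binary path on the ExecStart
# line, unless a retention.time flag is already present in the file.
add_retention_flag() {
  unit_file=$1
  retention=$2
  if grep -q -- '--storage.tsdb.retention.time=' "$unit_file"; then
    return 0
  fi
  sed "s|^\(ExecStart=[^ ]*\)|\1 --storage.tsdb.retention.time=${retention}|" \
    "$unit_file" > "${unit_file}.tmp" && mv "${unit_file}.tmp" "$unit_file"
}

# Demonstrate on a scratch copy rather than the live unit file.
scratch_unit=$(mktemp)
cat > "$scratch_unit" <<'EOF'
[Service]
ExecStart=/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/
EOF

add_retention_flag "$scratch_unit" 7d
cat "$scratch_unit"

# On the real host (requires root), the same edit to
# /etc/systemd/system/prometheus.service would be followed by:
#   systemctl daemon-reload
#   systemctl restart prometheus
```

Placing the flag directly after the binary path keeps the edit valid whether the ExecStart command sits on one line or is split with backslash continuations.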
Solution:
With the retention policy in place (--storage.tsdb.retention.time=7d), Prometheus automatically removes data older than the retention period. This prevented the /var directory from filling up with Prometheus data, stabilizing disk usage and allowing other services to operate without disruption.
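One way to confirm that the setting is active is to ask the running server which flags it was started with, then watch the directory size over the following days. The sketch below assumes Prometheus listens on localhost:9090 (its default port) and that curl is available; extract_flag is a hypothetical helper that uses plain text matching rather than a full JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Prometheus on localhost:9090 and curl installed;
# extract_flag is a hypothetical helper.

# Read a /api/v1/status/flags JSON response on stdin and print one flag's value.
extract_flag() {
  flag_name=$1
  tr -d ' \n' | sed -n "s/.*\"${flag_name}\":\"\([^\"]*\)\".*/\1/p"
}

# Ask the running server which retention period it was started with.
curl -s --max-time 5 http://localhost:9090/api/v1/status/flags \
  | extract_flag 'storage.tsdb.retention.time'

# Re-run over the following days to confirm the directory size levels off.
if [ -d /var/lib/prometheus ]; then
  du -sh /var/lib/prometheus
fi
```

Expired data is removed in the background rather than immediately, so the directory may not shrink right after the restart.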
Conclusion:
Managing Prometheus data retention resolved the directory growth and high disk utilization on the client’s system. A targeted configuration change optimized resource usage and maintained system stability, ensuring continued reliable operation of Prometheus and the other services that rely on /var disk space.