
Fixing Kubernetes Disk Pressure: A Guide to Effective Garbage Collection

Kubernetes, the popular container orchestration platform, offers a robust and scalable environment for deploying, managing, and scaling containerized applications. As with any complex system, however, challenges can arise. One common issue Kubernetes administrators encounter is disk pressure: a node runs critically low on disk space, the kubelet sets the node's DiskPressure condition, and pod scheduling and operation are disrupted. In this blog post, we'll explore the concept of Kubernetes disk pressure, its implications, and how to address it using garbage collection, including the --image-gc-high-threshold and --image-gc-low-threshold parameters, along with specific steps for MicroK8s users.

Understanding Kubernetes Disk Pressure:

Disk pressure in Kubernetes refers to a situation where a node’s available disk space becomes critically low, potentially leading to disruptions in application deployment and operation. Several factors contribute to disk pressure, including:

  1. Pod Logs and Events: Pods generate logs and events that are stored on the node’s filesystem. If not managed properly, these logs can accumulate and consume significant disk space.
  2. Container Images: Docker images, which are the building blocks of containers, can accumulate on nodes over time. If the nodes are not configured to periodically clean up unused images, it can lead to disk space exhaustion.
  3. Unused Volumes: Persistent volumes that are no longer in use by any pod can accumulate and consume valuable disk space.

Addressing Disk Pressure with Garbage Collection:

Garbage collection is the process of identifying and removing unused or unnecessary resources, helping to free up disk space and optimize the cluster’s performance. Here’s how you can leverage garbage collection to mitigate disk pressure in your Kubernetes cluster, including the use of --image-gc-high-threshold and --image-gc-low-threshold parameters:

  1. Configure Log Rotation:
    Implement log rotation mechanisms for your application and system logs. Kubernetes supports log rotation, and you can configure log retention policies to ensure that logs do not accumulate indefinitely.
  2. Image Pruning:
    The kubelet garbage-collects unused container images automatically; tune its thresholds so images are removed before the disk fills up.
    --image-gc-high-threshold: the percentage of disk usage that triggers image garbage collection. Default is 85%.
    --image-gc-low-threshold: the percentage of disk usage that image garbage collection attempts to free down to. Default is 80%.
    (In newer Kubernetes releases these flags are deprecated in favor of the imageGCHighThresholdPercent and imageGCLowThresholdPercent fields in the kubelet configuration file.)
  3. Automate Volume Cleanup:
    Regularly inspect and clean up unused persistent volumes. You can use tools like Velero or custom scripts to identify and delete volumes that are no longer associated with any running pods.
  4. Tune Node Capacity:
    Adjust the resource requests and limits for your pods, including ephemeral-storage requests and limits, to prevent them from consuming excessive disk space. Properly sizing your nodes and pods helps avoid resource contention and disk pressure issues.
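The high/low threshold pair in step 2 behaves like a hysteresis loop: collection starts only above the high mark and then frees space down to the low mark, so the kubelet isn't constantly churning. The sketch below illustrates that logic with hypothetical numbers; it is not the actual kubelet source.

```shell
# Illustrative sketch of the kubelet's image GC trigger logic:
# GC starts once disk usage exceeds the high threshold, then deletes
# unused images until usage falls to the low threshold.
usage=88   # hypothetical current disk usage, in percent
high=85    # --image-gc-high-threshold (default)
low=80     # --image-gc-low-threshold (default)

if [ "$usage" -gt "$high" ]; then
  to_free=$((usage - low))
  echo "image GC triggered: free ${to_free}% of disk to reach ${low}%"
else
  echo "image GC idle: usage ${usage}% is at or below ${high}%"
fi
```

Note the gap between the two thresholds: the wider it is, the less often garbage collection runs, at the cost of freeing more at once.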
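For step 3, a quick starting point is filtering `kubectl get pv` output for volumes in the Released phase, i.e. volumes whose claim was deleted but whose data still occupies disk. The sample below inlines example output so the filter is visible without a cluster; in practice you would pipe `kubectl get pv --no-headers` directly into the awk command.

```shell
# Sketch: find PersistentVolumes in the Released phase. Column 5 of
# `kubectl get pv --no-headers` output is the STATUS field. The sample
# data here is hypothetical, standing in for real kubectl output.
pv_list='pv-data-1 10Gi RWO Retain Bound default/claim-a standard 30d
pv-data-2 20Gi RWO Retain Released default/claim-b standard 90d'

released=$(echo "$pv_list" | awk '$5 == "Released" {print $1}')
echo "$released"
```

Released volumes with a Retain reclaim policy are never cleaned up automatically; review each one before deleting it with `kubectl delete pv <name>`.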

MicroK8s-Specific Steps:

For MicroK8s users, follow these steps to adjust the --image-gc-high-threshold and --image-gc-low-threshold parameters (the values below, 70% and 65%, are example choices stricter than the defaults):

  1. Edit the file /var/snap/microk8s/current/args/kubelet and add the following lines:
--image-gc-high-threshold=70
--image-gc-low-threshold=65
  2. Save the file.
  3. Restart the MicroK8s kubelet service or MicroK8s itself:
sudo systemctl restart snap.microk8s.daemon-kubelite
or
microk8s.stop && microk8s.start
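The edit in step 1 can also be scripted so it is safe to re-run. This is a sketch assuming the standard MicroK8s snap path shown above; run it as root, and adjust the threshold values to taste.

```shell
# Sketch: append the image GC flags to the MicroK8s kubelet args file
# only if they are not already present, then restart the kubelet.
# Path assumes the standard MicroK8s snap layout.
KUBELET_ARGS=/var/snap/microk8s/current/args/kubelet

for flag in --image-gc-high-threshold=70 --image-gc-low-threshold=65; do
  grep -qxF "$flag" "$KUBELET_ARGS" || echo "$flag" >> "$KUBELET_ARGS"
done

systemctl restart snap.microk8s.daemon-kubelite
```

The `grep -qxF` guard keeps the file free of duplicate flags if the script runs more than once.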

Kubernetes disk pressure is a common challenge in containerized environments, but with proactive management and garbage collection strategies tailored to your distribution, such as MicroK8s, administrators can keep their clusters stable and efficient. Regularly cleaning up unused resources, tuning settings such as --image-gc-high-threshold and --image-gc-low-threshold, and monitoring disk usage will prevent disk pressure issues and maintain a healthy Kubernetes deployment. As Kubernetes evolves, staying informed about resource management best practices will remain essential for administrators running containerized workloads.

Published in Automation, Kubernetes, Linux, Monitoring