Longhorn is a robust distributed block storage solution for Kubernetes. It offers built‑in capabilities like disaster recovery (DR) volumes, snapshots, and backups. But occasionally, these helpful mechanisms can turn into blockers – especially when outdated references leave resources in a broken state.
In this article, I’ll walk through a real scenario where several Longhorn DR volumes got stuck in a deleting state for days because they referenced backups that no longer existed. We’ll explore what caused the issue, what didn’t work, and how disabling admission webhooks allowed us to clean up the mess safely.
The Problem
I discovered four DR volumes that stayed stuck in the deleting state, consuming resources and refusing to budge. Their Robustness was listed as faulted and attempts to delete them failed.
NAME                                       STATE      ROBUSTNESS
pvc-1111dede-aaaa-bbbb-cccc-111122223333   deleting   faulted
pvc-2222dede-aaaa-bbbb-cccc-111122223333   deleting   faulted
pvc-3333dede-aaaa-bbbb-cccc-111122223333   deleting   faulted
pvc-4444dede-aaaa-bbbb-cccc-111122223333   deleting   faulted
What Caused It
Each volume had a spec.fromBackup field with a reference to a deleted backup:
spec:
  fromBackup: cifs://192.168.50.10/longhorn-backups?backup=backup-deadbeef&volume=pvc-1111dede-aaaa-bbbb-cccc-111122223333
Longhorn’s admission webhook validates resources at the Kubernetes API layer. So when the referenced backup no longer exists, every action (delete, patch, edit) gets rejected.
Example error:
The request is invalid: spec.fromBackup: failed to inspect the backup config
cifs://192.168.50.10/longhorn-backups?...: backup.longhorn.io "backup-deadbeef" not found
This means that as long as the field exists, the API server will refuse the request.
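To confirm which backup a stuck volume points at, the backup name can be pulled out of the fromBackup URL and looked up directly. The following is a minimal sketch: backup_name_from_url is a hypothetical helper (not part of Longhorn), and it assumes the URL carries a backup= query parameter as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Longhorn tool): print the value of the
# "backup" query parameter from a fromBackup URL.
# Returns non-zero if the URL has no such parameter.
backup_name_from_url() {
  local query="${1#*\?}"   # drop everything up to the first '?'
  local pair
  local IFS='&'            # split the query string on '&'
  for pair in $query; do
    case "$pair" in
      backup=*) printf '%s\n' "${pair#backup=}"; return 0 ;;
    esac
  done
  return 1
}

# Possible usage (assumes $url holds the volume's spec.fromBackup value):
#   name=$(backup_name_from_url "$url") &&
#     kubectl get backups.longhorn.io "$name" -n longhorn-system
```

If the lookup reports the backup as not found, the volume is in the state described here.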
What Didn’t Work
I tried several approaches that normally work for stuck Kubernetes resources.
Force Delete
kubectl delete volume pvc-1111... -n longhorn-system --force --grace-period=0
Still rejected – webhook validation runs before deletion.
Remove Finalizers
kubectl patch volume <name> -n longhorn-system \
  --type json -p '[{"op":"remove","path":"/metadata/finalizers"}]'
Rejected – same validation.
Remove fromBackup
kubectl patch volume <name> -n longhorn-system \
  --type json -p '[{"op":"remove","path":"/spec/fromBackup"}]'
Rejected – the field is the source of the validation failure.
Raw K8s API Calls
Even direct API calls were blocked, for example:
kubectl replace --raw ...
Restart Controllers
Restarting the controllers made no difference. Longhorn kept enforcing validation.
The Fix
The admission webhooks are what block the requests. They sit at the Kubernetes API layer, so a request is rejected before it ever reaches the Longhorn controllers.
So, to fix the stuck state, temporarily remove the webhooks, clean the resources, and restore them.
Step 1: Remove Webhooks
kubectl delete validatingwebhookconfiguration longhorn-webhook-validator
kubectl delete mutatingwebhookconfiguration longhorn-webhook-mutator
Immediately after this, the API server stops enforcing backup validation.
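Before running the delete commands above, it may be worth saving a copy of both webhook configurations for reference, in case they are not recreated as expected. A minimal sketch, assuming the configuration names used in this article; save_longhorn_webhooks and the default directory name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save both Longhorn webhook configurations as YAML
# into the given directory. Run it BEFORE deleting the webhooks.
save_longhorn_webhooks() {
  local dir="${1:-./longhorn-webhook-backup}"   # assumed default location
  mkdir -p "$dir" || return 1
  kubectl get validatingwebhookconfiguration longhorn-webhook-validator \
    -o yaml > "$dir/validating.yaml" || return 1
  kubectl get mutatingwebhookconfiguration longhorn-webhook-mutator \
    -o yaml > "$dir/mutating.yaml" || return 1
}

# Possible usage:
#   save_longhorn_webhooks ./webhook-backup
```

The saved files contain server-populated metadata (such as resourceVersion and uid), which would likely need to be removed before re-creating the objects from them.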
Step 2: Strip Broken Fields & Finalizers
Example:
kubectl patch volume pvc-1111dede-aaaa-bbbb-cccc-111122223333 -n longhorn-system \
  --type json -p '[
    {"op":"remove","path":"/spec/fromBackup"},
    {"op":"remove","path":"/metadata/finalizers"}
  ]'
The same patch can be scripted to process several volumes in one pass.
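A batch version of the patch could look like the sketch below. It assumes the webhooks have already been removed and that the volume names are passed explicitly. strip_stuck_volumes and the DRY_RUN switch are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper: remove spec.fromBackup and the finalizers from
# every volume name passed as an argument.
# Set DRY_RUN=1 to print what would be patched without calling kubectl.
strip_stuck_volumes() {
  local ns="longhorn-system"
  local patch='[{"op":"remove","path":"/spec/fromBackup"},{"op":"remove","path":"/metadata/finalizers"}]'
  local vol failed=0
  for vol in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would patch $vol"
      continue
    fi
    # Keep going on failure so one bad volume does not block the rest.
    kubectl patch volumes.longhorn.io "$vol" -n "$ns" --type json -p "$patch" \
      || { echo "patch failed for $vol" >&2; failed=1; }
  done
  return "$failed"
}

# Possible usage:
#   DRY_RUN=1 strip_stuck_volumes pvc-1111dede-aaaa-bbbb-cccc-111122223333
#   strip_stuck_volumes pvc-1111dede-aaaa-bbbb-cccc-111122223333
```

Passing the names explicitly, rather than selecting volumes automatically, keeps the script from touching anything that was not reviewed first.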
Step 3: Verify
kubectl get volume <name> -n longhorn-system
Should return NotFound.
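For several volumes, the same check can be looped. This is a sketch only; remaining_volumes is a hypothetical helper, and it relies on kubectl's --ignore-not-found flag printing nothing when the object is gone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any of the given volumes that still exist.
# Exit status is 0 only when none of them remain.
remaining_volumes() {
  local vol found remaining=0
  for vol in "$@"; do
    # --ignore-not-found makes kubectl print nothing for a missing object
    found=$(kubectl get volumes.longhorn.io "$vol" -n longhorn-system \
      --ignore-not-found -o name)
    if [ -n "$found" ]; then
      echo "$vol still exists"
      remaining=1
    fi
  done
  return "$remaining"
}

# Possible usage:
#   remaining_volumes pvc-1111dede-aaaa-bbbb-cccc-111122223333 && echo "all gone"
```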
Step 4: Restore Webhooks
Restart Longhorn manager:
kubectl rollout restart daemonset longhorn-manager -n longhorn-system
The webhooks will be recreated automatically.
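It seems prudent to confirm that both webhook configurations are back before calling the cleanup done. A minimal sketch, assuming the names used in Step 1; confirm_webhooks_restored and the 300s default timeout are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the longhorn-manager rollout to finish,
# then confirm both webhook configurations exist again.
confirm_webhooks_restored() {
  kubectl rollout status daemonset/longhorn-manager -n longhorn-system \
    --timeout="${1:-300s}" || return 1
  kubectl get validatingwebhookconfiguration longhorn-webhook-validator \
    >/dev/null || return 1
  kubectl get mutatingwebhookconfiguration longhorn-webhook-mutator \
    >/dev/null || return 1
  echo "webhooks restored"
}

# Possible usage:
#   confirm_webhooks_restored 120s
```

If the check fails right after the rollout completes, the webhooks may simply not be registered yet, so retrying after a short wait is reasonable.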
Why This Works
- Webhooks validate before controllers run
- Removing them temporarily disables validation
- Once fields are removed, the resource is valid again
- Webhooks get recreated automatically
Prevention
A few easy wins:
- Delete DR volumes before deleting their backups
- Enable retention policies
- Audit backup references periodically:
kubectl get volumes -n longhorn-system -o json | \
  jq '.items[] | select((.spec.fromBackup // "") != "") | {name:.metadata.name, fromBackup:.spec.fromBackup}'
Takeaways
- Longhorn admission webhooks can block deletion of invalid resources
- Force deletion doesn’t work when validation rejects requests
- Temporarily disabling the webhooks was the only approach that worked here
- Longhorn restores them automatically on restart
