Fixing Stuck Longhorn DR Volumes

Longhorn is a robust distributed block storage solution for Kubernetes. It offers built‑in capabilities like disaster recovery (DR) volumes, snapshots, and backups. But occasionally, these helpful mechanisms can turn into blockers – especially when outdated references leave resources in a broken state.

In this article, I’ll walk through a real scenario where several Longhorn DR volumes got stuck in a deleting state for days because they referenced backups that no longer existed. We’ll explore what caused the issue, what didn’t work, and how disabling admission webhooks allowed us to clean up the mess safely.

The Problem

I discovered four DR volumes that stayed stuck in the deleting state, consuming resources and refusing to budge. Their Robustness was listed as faulted and attempts to delete them failed.

NAME                                     STATE    ROBUSTNESS
pvc-1111dede-aaaa-bbbb-cccc-111122223333 deleting faulted
pvc-2222dede-aaaa-bbbb-cccc-111122223333 deleting faulted
pvc-3333dede-aaaa-bbbb-cccc-111122223333 deleting faulted
pvc-4444dede-aaaa-bbbb-cccc-111122223333 deleting faulted

What Caused It

Each volume had a spec.fromBackup field with a reference to a deleted backup:

spec:
  fromBackup: cifs://192.168.50.10/longhorn-backups?backup=backup-deadbeef&volume=pvc-1111dede-aaaa-bbbb-cccc-111122223333

Longhorn’s admission webhook validates requests for Longhorn resources at the Kubernetes API layer. When the backup named in spec.fromBackup no longer exists, that validation fails, so every action on the volume (delete, patch, edit) gets rejected.

Example error:

The request is invalid: spec.fromBackup: failed to inspect the backup config
cifs://192.168.50.10/longhorn-backups?...: backup.longhorn.io "backup-deadbeef" not found

In other words, as long as the field points at a missing backup, the API server will refuse any request that touches the volume.
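To confirm that a volume really points at a missing backup, the backup name can be pulled out of the fromBackup URL and looked up directly. Here is a minimal sketch of that check. The helper names backup_name_from_url and backup_exists are my own, and it assumes backups are stored as backups.longhorn.io objects in longhorn-system under the same name that appears in the URL, which is what the error message above suggests.

```shell
# Print the value of the "backup" query parameter from a fromBackup URL.
# Prints nothing if the URL has no such parameter.
# (Helper names here are illustrative, not part of Longhorn.)
backup_name_from_url() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]backup=\([^&]*\).*/\1/p'
}

# Succeed only if a Backup object with the given name exists.
# Assumption: backups live in the longhorn-system namespace.
backup_exists() {
  kubectl get backups.longhorn.io "$1" -n longhorn-system >/dev/null 2>&1
}
```

For the volume above, backup_name_from_url yields backup-deadbeef, and backup_exists backup-deadbeef would fail because that backup is gone.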

What Didn’t Work

I tried several approaches that normally work for stuck Kubernetes resources.

Force Delete

kubectl delete volume pvc-1111... -n longhorn-system --force --grace-period=0

Still rejected – webhook validation runs before deletion.

Remove Finalizers

kubectl patch volume <name> -n longhorn-system \
  --type json -p '[{"op":"remove","path":"/metadata/finalizers"}]'

Rejected – same validation.

Remove fromBackup

kubectl patch volume <name> -n longhorn-system \
  --type json -p '[{"op":"remove","path":"/spec/fromBackup"}]'

Rejected – the field is the source of the validation failure.

Raw K8s API Calls

Even direct API calls like:

kubectl replace --raw ...

were blocked.

Restart Controllers

Restarting the Longhorn controllers made no difference. The webhook kept enforcing validation no matter what.

The Fix

The admission webhooks are what block the requests. They sit at the Kubernetes API layer – before anything hits the Longhorn controllers.

So, to fix the stuck state, temporarily remove the webhooks, clean up the volumes, and then restore the webhooks.
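Before removing anything, it helps to confirm what the webhook configurations are actually called on your cluster, since names can differ between Longhorn versions. A small sketch, using a hypothetical helper name list_longhorn_webhooks:

```shell
# List validating and mutating webhook configurations whose name mentions
# Longhorn. The function name is illustrative; the kubectl call is standard.
list_longhorn_webhooks() {
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o name |
    grep -i longhorn
}
```

On my cluster this would list longhorn-webhook-validator and longhorn-webhook-mutator, the two names used in the steps that follow.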

Step 1: Remove Webhooks

kubectl delete validatingwebhookconfiguration longhorn-webhook-validator
kubectl delete mutatingwebhookconfiguration longhorn-webhook-mutator

Immediately after this, the API server stops enforcing backup validation.
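Before running the two delete commands above, it may be worth saving a reference copy of both configurations. This is a sketch under my own assumptions: the helper name backup_longhorn_webhooks and the output file names are made up for illustration.

```shell
# Dump both Longhorn webhook configurations to YAML files in the given
# directory (default: current directory) as a record of what was removed.
# Note: the dumps include server-populated fields such as resourceVersion
# and uid, which would need stripping before re-creating objects from them.
backup_longhorn_webhooks() {
  local dir="${1:-.}"
  kubectl get validatingwebhookconfiguration longhorn-webhook-validator -o yaml \
    > "$dir/longhorn-webhook-validator.yaml" || return 1
  kubectl get mutatingwebhookconfiguration longhorn-webhook-mutator -o yaml \
    > "$dir/longhorn-webhook-mutator.yaml" || return 1
}
```

Calling backup_longhorn_webhooks /tmp writes the two files to /tmp. Longhorn is expected to recreate the webhooks on its own, so the copies are only a fallback.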

Step 2: Strip Broken Fields & Finalizers

Example:

kubectl patch volume pvc-1111dede-aaaa-bbbb-cccc-111122223333 -n longhorn-system \
  --type json -p '[
    {"op":"remove","path":"/spec/fromBackup"},
    {"op":"remove","path":"/metadata/finalizers"}
  ]'

With several stuck volumes, the same patch can be run in a loop over the volume names.
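A batch version of the patch could look roughly like this. It is a sketch, not a polished tool: the function name strip_stuck_volumes is hypothetical, and it applies the same JSON patch as the single-volume example to every name passed in.

```shell
# Apply the cleanup patch to every Longhorn volume named on the command line:
# remove the stale spec.fromBackup reference and the finalizers in one request.
# Returns non-zero if any patch failed, but keeps going through the list.
strip_stuck_volumes() {
  local patch='[{"op":"remove","path":"/spec/fromBackup"},{"op":"remove","path":"/metadata/finalizers"}]'
  local vol rc=0
  for vol in "$@"; do
    if kubectl patch volume "$vol" -n longhorn-system --type json -p "$patch"; then
      echo "patched $vol"
    else
      echo "failed to patch $vol" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For the four volumes in this article, that would be called as strip_stuck_volumes followed by the four pvc-... names from the listing at the top.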

Step 3: Verify

kubectl get volume <name> -n longhorn-system

Should return NotFound.
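To check several volumes in one go, the same lookup can be wrapped in a loop. A minimal sketch with a hypothetical helper name, check_volumes_gone:

```shell
# Report, for each given volume name, whether the object still exists.
# Returns non-zero if at least one volume is still present.
# Caveat: any kubectl failure (including an unreachable API server) is
# reported as "gone", so run this only against a healthy cluster.
check_volumes_gone() {
  local vol remaining=0
  for vol in "$@"; do
    if kubectl get volume "$vol" -n longhorn-system >/dev/null 2>&1; then
      echo "still present: $vol"
      remaining=1
    else
      echo "gone: $vol"
    fi
  done
  return "$remaining"
}
```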

Step 4: Restore Webhooks

Restart Longhorn manager:

kubectl rollout restart daemonset longhorn-manager -n longhorn-system

The webhooks will be recreated automatically.
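It is worth confirming that the webhooks really are back before calling it done. The sketch below waits for the rollout and then looks up both configurations. The function name wait_for_longhorn_webhooks and the 300-second timeout are my own choices, and the webhooks may appear a few moments after the rollout completes, so a retry may be needed.

```shell
# Wait for the longhorn-manager rollout to finish, then verify that both
# webhook configurations exist again. Fails if either lookup fails.
wait_for_longhorn_webhooks() {
  kubectl rollout status daemonset longhorn-manager -n longhorn-system --timeout=300s || return 1
  kubectl get validatingwebhookconfiguration longhorn-webhook-validator >/dev/null || return 1
  kubectl get mutatingwebhookconfiguration longhorn-webhook-mutator >/dev/null || return 1
  echo "webhooks restored"
}
```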

Why This Works

  • Webhooks validate before controllers run
  • Removing them temporarily disables validation
  • Once fields are removed, the resource is valid again
  • Webhooks get recreated automatically

Prevention

A few easy wins:

  • Delete DR volumes before deleting their backups
  • Enable retention policies
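  • Audit backup references periodically (ordinary volumes typically carry an empty fromBackup string rather than no field at all, so filter out empty values as well as null):
kubectl get volumes -n longhorn-system -o json | \
  jq '.items[] | select((.spec.fromBackup // "") != "") | {name:.metadata.name, fromBackup:.spec.fromBackup}'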
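The audit can go one step further and flag only the references that are already broken. This is a rough sketch: the function name find_dangling_backup_refs is hypothetical, and it assumes the backup name in the URL matches the name of a backups.longhorn.io object in longhorn-system, as it did in the error message earlier.

```shell
# Print "<volume>\t<backup>" for every volume whose spec.fromBackup names a
# backup that cannot be found. Volumes with an empty fromBackup are skipped.
find_dangling_backup_refs() {
  kubectl get volumes.longhorn.io -n longhorn-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.fromBackup}{"\n"}{end}' |
  while IFS="$(printf '\t')" read -r name url; do
    [ -n "$url" ] || continue
    # Extract the value of the "backup" query parameter from the URL.
    backup=$(printf '%s\n' "$url" | sed -n 's/.*[?&]backup=\([^&]*\).*/\1/p')
    [ -n "$backup" ] || continue
    # </dev/null keeps the lookup from consuming the loop's input.
    if ! kubectl get backups.longhorn.io "$backup" -n longhorn-system >/dev/null 2>&1 </dev/null; then
      printf '%s\t%s\n' "$name" "$backup"
    fi
  done
}
```

An empty result means every DR volume still has its backup. Any line printed is a volume to delete (or re-point) before it ends up stuck.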

Takeaways

  • Longhorn admission webhooks can block deletion of invalid resources
  • Force deletion doesn’t work when validation rejects requests
  • Temporarily disabling webhooks is the cleanest fix
  • Longhorn restores them automatically on restart
Published in Kubernetes, Linux