In this blog post, I’ll walk you through the process of recovering a Kubernetes cluster after losing two out of three control plane nodes. This scenario can be daunting, as it results in the loss of etcd quorum, rendering the API server inaccessible. However, with the right steps, you can restore your cluster to a working state, as I recently did with a cluster running Kubernetes v1.32.3, deployed with kubeadm, and using Keepalived for high availability (HA). Here’s how I did it.
Background
My cluster originally had three control plane nodes (cp1, cp2, cp3) and two worker nodes (node0, node1). The control plane nodes were configured with a Keepalived virtual IP (VIP) 10.0.0.101 for HA. Unfortunately, cp2 and cp3 were permanently lost, leaving only cp1 (IP: 10.0.0.1). This caused etcd to lose quorum, making the cluster’s API unavailable, even though the worker nodes continued running existing pods.
The goal was to restore the cluster with cp1 as the sole control plane node and then optionally rebuild HA by adding new nodes.
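Because the VIP 10.0.0.101 was managed by Keepalived across all three control plane nodes, it is worth confirming that it has failed over to the surviving node before starting. A quick check on cp1 (treat this as a sketch; the interface and priority depend on your keepalived.conf):
ip -br addr | grep 10.0.0.101
systemctl status keepalived --no-pager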
Prerequisites
- Access to the remaining control plane node (cp1 in my case).
- SSH access with root privileges.
- Basic understanding of Kubernetes components (etcd, kube-apiserver, kubelet).
- kubectl, kubeadm, and a container runtime (e.g., containerd) installed.
- Backup of etcd data (optional but highly recommended; see the snapshot sketch after this list).
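If etcd is still able to respond at all, taking a snapshot before touching anything is cheap insurance on top of the file-level copy made later in Step 4. A minimal sketch, assuming the default kubeadm certificate paths (it may fail if etcd is already unresponsive):
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /root/etcd-snapshot-$(date +%F).db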
Step-by-Step Recovery Process
Step 1: Assess the Situation
First, I confirmed the state of the cluster:
kubectl get nodes
Output: Error from server (Forbidden) or Timeout, indicating the API server was down.
Check etcd health:
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
Output: https://127.0.0.1:2379 is unhealthy: failed to commit proposal: context deadline exceeded.
Check etcd members:
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
Output showed three members (cp1, cp2, cp3), but only cp1 was available.
Since etcd requires a majority (2 out of 3) for quorum, the cluster was stuck.
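For a more detailed picture (leader, raft term, errors), you can also query the member's status; while quorum is lost this may time out as well, but when it answers it makes the missing leader obvious:
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --write-out=table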
Step 2: Install etcdctl (if not available)
On cp1, etcdctl wasn’t installed by default. I downloaded it to match the etcd version (3.5.16):
wget https://github.com/etcd-io/etcd/releases/download/v3.5.16/etcd-v3.5.16-linux-amd64.tar.gz
tar -xvf etcd-v3.5.16-linux-amd64.tar.gz
mv etcd-v3.5.16-linux-amd64/etcdctl /usr/local/bin/
mv etcd-v3.5.16-linux-amd64/etcd /usr/local/bin/ # Optional, for manual etcd launch
chmod +x /usr/local/bin/etcdctl /usr/local/bin/etcd
etcdctl version
Step 3: Stop etcd
Since etcd runs as a static pod managed by kubelet, I stopped it by moving its manifest:
mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml.backup
Verify it stopped:
crictl ps | grep etcd
Output should be empty.
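If the container still shows up, kubelet may simply not have finished tearing the static pod down yet; listing all containers, including exited ones, makes that easy to see:
crictl ps -a | grep etcd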
Step 4: Backup etcd Data
Make a backup of the etcd data directory:
cp -r /var/lib/etcd /var/lib/etcd-backup
Step 5: Force etcd to Run as a Single Node
Since removing cp2 and cp3 via member remove failed due to lack of quorum, force etcd to rebuild as a single-node cluster:
etcd --name cp1 \
--listen-client-urls https://127.0.0.1:2379,https://10.0.0.1:2379 \
--advertise-client-urls https://10.0.0.1:2379 \
--listen-peer-urls https://10.0.0.1:2380 \
--initial-advertise-peer-urls https://10.0.0.1:2380 \
--initial-cluster cp1=https://10.0.0.1:2380 \
--initial-cluster-state existing \
--data-dir /var/lib/etcd \
--cert-file=/etc/kubernetes/pki/etcd/server.crt \
--key-file=/etc/kubernetes/pki/etcd/server.key \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--peer-client-cert-auth=true \
--client-cert-auth=true \
--force-new-cluster &
If you see "address already in use", stop any conflicting processes:
pkill etcd
crictl stop <etcd-container-id>
Then rerun the command.
Check health:
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
Output: https://127.0.0.1:2379 is healthy.
Verify members:
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
Output: Only cp1 remained.
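If stale entries for cp2 or cp3 somehow survive the --force-new-cluster step, they can be removed now that the single-node cluster has quorum (the member ID below is a placeholder taken from your own member list output):
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member remove <member-id>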
Stop the manual instance:
pkill etcd
Step 6: Update and Restart etcd Pod
Edit the etcd manifest to match the single-node configuration:
nano /tmp/etcd.yaml.backup
Ensure it contains:
- --initial-cluster=cp1=https://10.0.0.1:2380
- --initial-cluster-state=existing
Move it back:
mv /tmp/etcd.yaml.backup /etc/kubernetes/manifests/etcd.yaml
Restart kubelet:
systemctl restart kubelet
Verify etcd (it can take some time to start, so be patient):
crictl ps | grep etcd
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
Step 7: Restore API Server
Verify kube-apiserver manifest:
cat /etc/kubernetes/manifests/kube-apiserver.yaml
Confirm --etcd-servers=https://127.0.0.1:2379, then restart kubelet:
systemctl restart kubelet
Check:
crictl ps | grep kube-apiserver
kubectl get nodes
Step 8: Clean Up Lost Nodes
Remove cp2 and cp3 from the cluster:
kubectl delete node cp2
kubectl delete node cp3
Step 9: Verify Cluster
Final check:
kubectl get nodes
kubectl get pods -A
The cluster was back online with cp1 as the sole control plane node.
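As an extra sanity check, the API server's readiness endpoint reports each internal check (including etcd) individually:
kubectl get --raw='/readyz?verbose'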
Optional: Restoring High Availability
To rebuild HA:
1. Generate a certificate key:
kubeadm init phase upload-certs --upload-certs
1.1. Generate a join token with:
kubeadm token create --print-join-command
2. Join new control plane nodes (e.g., cp4, cp5):
kubeadm join 10.0.0.101:6443 --token <token> --discovery-token-ca-cert-hash <hash> --control-plane --certificate-key <key>
3. Verify nodes:
kubectl get node
Update --initial-cluster in etcd.yaml on all nodes and restart kubelet
The --initial-cluster parameter in the etcd configuration defines the members of the etcd cluster. Initially, we forced etcd to run as a single node (cp1) with:
--initial-cluster=cp1=https://10.0.0.1:2380
Now that cp2 and cp3 are back, you need to include them in the --initial-cluster list so etcd operates as a three-node cluster again, ensuring quorum and HA. Each control plane node runs its own etcd instance, and they must all agree on the cluster membership.
Assumptions
- cp1: IP 10.0.0.1
- cp2: IP 10.0.0.2 (based on the earlier member list output)
- cp3: IP 10.0.0.3 (based on the earlier member list output)
- Each node's etcd listens on its own IP at port 2380 for peer communication.
Adjust these IPs if they’ve changed in your environment.
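If in doubt, kubectl get nodes -o wide lists each node's InternalIP, which should match the addresses above (assuming the API server is reachable again at this point):
kubectl get nodes -o wide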
Step 1: Verify Current etcd Members
On cp1, check the current etcd membership:
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
Expected output should now include cp1, cp2, and cp3 (since they're back). If it still shows only cp1, we'll need to add the others manually after updating the manifests.
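For example, adding cp2 manually would look like this (the name and peer URL follow the assumptions above; adjust to your environment):
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member add cp2 --peer-urls=https://10.0.0.2:2380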
Step 2: Update etcd.yaml on Each Node
You need to edit /etc/kubernetes/manifests/etcd.yaml on cp1, cp2, and cp3 to include all three nodes in --initial-cluster.
Edit the etcd.yaml manifest on all three control plane nodes:
nano /etc/kubernetes/manifests/etcd.yaml
Find the command section and update --initial-cluster to:
- --initial-cluster=cp1=https://10.0.0.1:2380,cp2=https://10.0.0.2:2380,cp3=https://10.0.0.3:2380
- --initial-cluster-state=existing
Save and exit.
Notes:
- Ensure --name matches the node's name (e.g., --name=cp1 on cp1, --name=cp2 on cp2, etc.).
- Verify that --listen-peer-urls and --initial-advertise-peer-urls use the correct IP for each node (e.g., https://10.0.0.2:2380 on cp2); a quick grep check is shown below.
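A quick grep of the manifest on each node helps confirm the flags before restarting anything (the path is the kubeadm default):
grep -E 'name=|initial-cluster|peer-urls' /etc/kubernetes/manifests/etcd.yaml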
Step 3: Restart kubelet on Each Node
Restart kubelet to apply the changes. Do this one node at a time to maintain etcd quorum:
systemctl restart kubelet
Wait a few seconds (sometimes it can take longer), then check:
crictl ps | grep etcd
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
Should return healthy.
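Before restarting kubelet on the next node, it is worth confirming that the member has rejoined and the cluster still has a leader; the --cluster flag queries every member discovered from the local endpoint:
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster --write-out=table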
Step 4: Verify etcd Cluster
On any control plane node, check the cluster:
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
Expected output:
<member-id-1>, started, cp1, https://10.0.0.1:2380, https://10.0.0.1:2379, false
<member-id-2>, started, cp2, https://10.0.0.2:2380, https://10.0.0.2:2379, false
<member-id-3>, started, cp3, https://10.0.0.3:2380, https://10.0.0.3:2379, false
Check overall health:
etcdctl --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
Should show all endpoints as healthy.
Step 5: Verify Kubernetes Cluster
kubectl get nodes
Ensure all nodes are Ready. Check pods:
kubectl get pods -A
Conclusion
Recovering a Kubernetes cluster after losing control plane nodes is challenging but doable. The key was using --force-new-cluster to rebuild etcd as a single node, followed by aligning the manifests and restarting components. Regular etcd backups and HA planning can prevent such scenarios, but this guide proves recovery is possible even in dire situations.
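For the "regular etcd backups" part, even a small daily snapshot job on a control plane node goes a long way. A minimal sketch, assuming the default kubeadm certificate paths and a /var/backups/etcd directory (the script name and schedule are just examples):
#!/bin/sh
# /usr/local/bin/etcd-snapshot.sh - take a dated etcd snapshot (sketch)
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd/snapshot-$(date +%F).db
Scheduled, for example, from root's crontab with: 0 2 * * * /usr/local/bin/etcd-snapshot.sh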
