Repair broken etcd node

If an etcd node is no longer a member of the remaining etcd cluster or fails to connect, you need to remove it from the cluster and then add it again:

  1. Stop etcd on the broken node: sudo service etcd stop
  2. Delete the data on the broken node: sudo rm -r /var/lib/etcd/data/*
  3. Delete the WAL data on the broken node: sudo rm -r /var/lib/etcd/wal/*
  4. Follow the instructions for etcd runtime-configuration: remove the broken node from the cluster, re-add it, and update the etcd config on the broken node with the parameters printed by the add command.
  5. Start etcd again: sudo service etcd start
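
The remove and re-add in step 4 can be sketched as follows, run from a healthy node. This is a minimal sketch, assuming etcdctl speaks the v3 API (on etcd 3.3 and older you may need to export ETCDCTL_API=3; the v2 API takes the peer URL as a positional argument instead of --peer-urls). The values of MEMBER_ID, MEMBER_NAME and PEER_URL are hypothetical placeholders, and the run and readd_member helpers are made up for this example. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with the values of your cluster.
MEMBER_ID="${MEMBER_ID:-ID_FROM_MEMBER_LIST}"       # hex ID shown by "member list"
MEMBER_NAME="${MEMBER_NAME:-etcd3}"                 # name of the broken node
PEER_URL="${PEER_URL:-http://10.0.0.3:2380}"        # peer URL of the broken node
DRY_RUN="${DRY_RUN:-1}"                             # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

readd_member() {
    # Look up the ID of the broken member.
    run etcdctl member list
    # Remove the broken member from the cluster.
    run etcdctl member remove "$MEMBER_ID"
    # Re-add it; when executed, this prints the ETCD_NAME, ETCD_INITIAL_CLUSTER
    # and ETCD_INITIAL_CLUSTER_STATE values for the config on the broken node.
    run etcdctl member add "$MEMBER_NAME" --peer-urls="$PEER_URL"
}

readd_member
```

The member ID is only known after the list command has run, so in practice you would run the list step first, then set MEMBER_ID and run the rest.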

Even if etcd logging is configured to write to /var/log/etcd/etcd.log, on new hosts (Ubuntu focal) it can happen that StandardOutput goes only to the systemd journal (see systemctl status etcd).
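
If the log file stays empty, the journal can be read directly. A small sketch, assuming the systemd unit is named etcd as in the systemctl command above; the etcd_journal helper is a made-up name for this example:

```shell
# Show the last N lines of the etcd unit's journal (default: 100).
etcd_journal() {
    journalctl -u etcd --no-pager -n "${1:-100}"
}
```

Call it as etcd_journal 200, or use journalctl -u etcd -f to follow the output while etcd starts.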

Claus-Theodor Riegg Over 6 years ago