If an etcd node is no longer a member of the remaining etcd cluster, or fails to connect to it, you need to remove it from the cluster and add it again.
Upstream documentation
Make sure to read and understand the detailed instructions for etcd runtime reconfiguration.
Re-adding a faulty node
On the faulty node
sudo systemctl stop etcd
sudo mv /var/lib/etcd /tmp/etcd.old
sudo mkdir -p /var/lib/etcd/{data,wal}
sudo chown -R etcd: /var/lib/etcd
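The data-directory reset above can be wrapped in a small function. This is a minimal sketch, assuming the data/ and wal/ layout from the commands above; the function name reset_etcd_data_dir and the timestamped backup name are illustrative choices, not part of any etcd tooling. It leaves stopping etcd and the chown step to the commands shown, so it can also be tried on a scratch directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: move an etcd data directory aside and recreate an
# empty layout with data/ and wal/ subdirectories.
# Usage: reset_etcd_data_dir DATA_DIR BACKUP_PARENT
# Run it only after etcd has been stopped (sudo systemctl stop etcd).
reset_etcd_data_dir() {
  local data_dir="$1"
  local backup_parent="$2"
  # A timestamped name avoids moving the directory *into* an older backup,
  # which is what mv does when the target directory already exists.
  local backup="$backup_parent/etcd.old.$(date +%Y%m%d%H%M%S)"

  if [ ! -d "$data_dir" ]; then
    echo "no such directory: $data_dir" >&2
    return 1
  fi
  if [ -e "$backup" ]; then
    echo "backup target already exists: $backup" >&2
    return 1
  fi

  # Explicit "|| return 1" because set -e is not reliable inside $(...)
  mv "$data_dir" "$backup" || return 1
  mkdir -p "$data_dir/data" "$data_dir/wal" || return 1
  # Print the backup path so the caller knows where the old data went
  printf '%s\n' "$backup"
}
```

On the node you would run this as root, for example as reset_etcd_data_dir /var/lib/etcd /tmp, and then apply the chown -R etcd: /var/lib/etcd step as before.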
On a working node
Warning
Double-check that you are using the correct hostname before removing a member.
Bash
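# list the members and note the ID, name and peer URLs of the faulty node
etcdctl member list -w table
# remove the faulty member by its ID
etcdctl member remove "$ID_OF_FAULTY_NODE"
# add it again with the same name and peer URLs
etcdctl member add "$NAME_OF_FAULTY_NODE" --peer-urls="$PEER_ADDRS_OF_FAULTY_NODE"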
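Because removing the wrong member is hard to undo, you can look the ID up by name and refuse to continue unless exactly one member matches. This is a minimal sketch, assuming the default (simple) output of etcdctl member list, where fields are separated by ", " and the name is the third field; the function name member_id_by_name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read "etcdctl member list" output on stdin and print
# the ID (1st field) of the member whose name (3rd field) equals $1.
# Exits non-zero unless exactly one member matches.
member_id_by_name() {
  awk -F ', ' -v name="$1" '
    $3 == name { id = $1; n++ }
    END {
      if (n != 1) {
        printf "expected exactly one member named %s, found %d\n", name, n > "/dev/stderr"
        exit 1
      }
      print id
    }'
}

# Example (on a working node):
#   id=$(etcdctl member list | member_id_by_name "$NAME_OF_FAULTY_NODE")
#   etcdctl member remove "$id"
```

In a script running with set -e, a failed lookup aborts before etcdctl member remove is reached.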
AWK
# define the hostname you wish to re-add as a variable
member=FQDN_OF_FAULTY_NODE
# re-add the host using an awk script
# (fields of etcdctl member list: $1 = ID, $2 = status, $3 = name, $4 = peer URLs)
etcdctl member list | awk -F ', ' -v member="$member" '{
  if ( member == $3 ){
    system("etcdctl member remove " $1);
    system("etcdctl member add " $3 " --peer-urls=" $4);
  }
}'
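The awk script above runs etcdctl member remove immediately. A dry-run variant that only prints the commands lets you review them first. This is a sketch under the same assumption about the etcdctl member list output format; the function name print_readd_commands is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read "etcdctl member list" output on stdin and print
# (but do not run) the remove/add commands for the member named $1.
print_readd_commands() {
  awk -F ', ' -v member="$1" '
    member == $3 {
      print "etcdctl member remove " $1
      print "etcdctl member add " $3 " --peer-urls=" $4
    }'
}

# Example:
#   etcdctl member list | print_readd_commands "$member"
# After checking the output, execute it with:
#   etcdctl member list | print_readd_commands "$member" | sh
```

If nothing is printed, no member with that name exists, which usually means the hostname is wrong.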
On the faulty node again
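# the re-added member joins a running cluster, so switch its initial cluster state from "new" to "existing"
sudo sed -i -e 's/new/existing/g' /etc/etcd/etcd.cfg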
sudo systemctl restart etcd
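The expression s/new/existing/g replaces every occurrence of "new" in the file, including inside hostnames or paths (a host called newton would become existington). A more targeted sketch follows. It assumes GNU sed and that the setting appears either as a YAML key (initial-cluster-state: new, optionally quoted) or as an environment-style line (ETCD_INITIAL_CLUSTER_STATE=new, optionally quoted), so check which form /etc/etcd/etcd.cfg actually uses. The function name set_cluster_state_existing is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: change only the value of the initial cluster state
# setting from "new" to "existing", leaving every other line untouched.
# The optional quote before "new" covers values written as 'new' or "new".
set_cluster_state_existing() {
  local cfg="$1"
  sed -i -E "s/^([[:space:]]*(initial-cluster-state:[[:space:]]*|ETCD_INITIAL_CLUSTER_STATE=)[\"']?)new/\1existing/" "$cfg"
}

# Example (as root on the faulty node):
#   set_cluster_state_existing /etc/etcd/etcd.cfg
#   grep -n existing /etc/etcd/etcd.cfg
```

Checking the file with grep before restarting etcd confirms that exactly the intended line changed.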
Logging
Even if etcd is configured to write to /var/log/etcd/etcd.log, on new hosts (Ubuntu 20.04 "focal") its STDOUT may end up in the systemd journal instead. You can check this with sudo journalctl -efu etcd (-u selects the etcd unit, -f follows new entries, -e jumps to the end of the journal).
Help, my terminal gets spammed with error messages
You are probably connected to a Patroni cluster node. When patronictl cannot connect to etcd, it retries over and over until it succeeds. Since the etcd cluster is broken, this won't succeed. To stop it, run the following:
killall patronictl