Removing nodes from Kubernetes clusters


#1

In an HA Kubernetes cluster the REK operator will automatically purge failed nodes that have been unreachable for more than an hour. Purging a node involves the following steps (the last two apply only to masters):

  1. Delete the Deployment resource for the OSD from the rook-ceph namespace
  2. Exec into the Rook operator pod and run the command ceph osd purge <id> --yes-i-really-mean-it
  3. Delete the Node resource
  4. Remove the node from the CephCluster resource named rook-ceph in the rook-ceph namespace unless storage is managed automatically with useAllNodes: true
  5. (Masters only) Connect to the etcd cluster and remove the peer
  6. (Masters only) Remove the apiEndpoint for the node from the kubeadm-config ConfigMap in the kube-system namespace
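The storage-related steps above can be performed with kubectl roughly as follows. This is a sketch, not the operator's exact implementation: the node name, OSD id, and the index in the JSON patch are placeholders you must adapt to your cluster.

```shell
# Placeholders -- substitute the failed node's name and its OSD id
NODE=node-k7d4
OSD_ID=3

# 1. Delete the OSD Deployment from the rook-ceph namespace
kubectl -n rook-ceph delete deployment "rook-ceph-osd-${OSD_ID}"

# 2. Purge the OSD from the Ceph cluster via the operator pod
kubectl -n rook-ceph exec deploy/rook-ceph-operator -- \
    ceph osd purge "${OSD_ID}" --yes-i-really-mean-it

# 3. Delete the Node resource
kubectl delete node "${NODE}"

# 4. Drop the node from the CephCluster spec (skip when useAllNodes: true);
#    the index 0 is a placeholder for the node's position in spec.storage.nodes
kubectl -n rook-ceph patch cephcluster rook-ceph --type json \
    -p '[{"op": "remove", "path": "/spec/storage/nodes/0"}]'
```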

All of these steps can be performed manually if needed. For removing etcd peers, exec into one of the remaining etcd pods in the kube-system namespace. You can use the etcdctl CLI there with the certificates mounted in /etc/kubernetes/pki/etcd:

$ cd /etc/kubernetes/pki/etcd
$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=ca.crt --cert=healthcheck-client.crt --key=healthcheck-client.key member list

a1316b56d7099abf, started, node-k7d4, https://10.128.0.124:2380, https://10.128.0.124:2379
ab67f9f870c32907, started, node-wbf1, https://10.128.0.125:2380, https://10.128.0.125:2379
d9228c5ac755a5c6, started, node-hrrm, https://10.128.0.123:2380, https://10.128.0.123:2379

$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=ca.crt --cert=healthcheck-client.crt --key=healthcheck-client.key member remove a1316b56d7099abf
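The remaining master-only step is removing the dead node's apiEndpoint from the kubeadm-config ConfigMap. On clusters whose kubeadm version still records a ClusterStatus there (this was dropped in newer kubeadm releases), the simplest manual approach is to edit the ConfigMap directly; the node name and address below are placeholders:

```shell
kubectl -n kube-system edit configmap kubeadm-config
# In the editor, delete the removed node's entry under
# ClusterStatus -> apiEndpoints, e.g.:
#
#   apiEndpoints:
#     node-k7d4:                       # <- remove this whole block
#       advertiseAddress: 10.128.0.124
#       bindPort: 6443
```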