Rolling the nodes in a Replicated Kubernetes cluster



This is an example of how to replace, one at a time, all 4 nodes of an HA cluster on GCP that has 3 masters and 1 worker.
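
Before starting, it can help to confirm that the cluster has the layout this example assumes. The following is a minimal bash sketch, not part of the original procedure. It assumes kubectl is configured wherever it runs and that masters carry the node-role.kubernetes.io/master label, which is the same label the airgap step below patches into the replicated deployment. The function names count_nodes and check_topology are hypothetical.

```bash
#!/usr/bin/env bash
# Pre-flight sketch (hypothetical helpers, not from the runbook): confirm the
# cluster has the expected number of masters and workers before rolling nodes.
# Assumes kubectl is configured and masters carry node-role.kubernetes.io/master.
set -euo pipefail

# count_nodes prints how many nodes match an optional label selector.
count_nodes() {
  local selector="${1:-}"
  local args=(get nodes --no-headers)
  if [[ -n "$selector" ]]; then
    args+=(-l "$selector")
  fi
  kubectl "${args[@]}" | wc -l | tr -d ' '
}

# check_topology prints the counts and succeeds only if both match.
check_topology() {
  local want_masters="${1:-3}" want_workers="${2:-1}"
  local total masters workers
  total=$(count_nodes)
  masters=$(count_nodes 'node-role.kubernetes.io/master')
  workers=$(( total - masters ))
  echo "masters=${masters} workers=${workers}"
  [[ "$masters" -eq "$want_masters" && "$workers" -eq "$want_workers" ]]
}
```

Running check_topology 3 1 prints the counts. It returns non-zero if they differ from 3 masters and 1 worker.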

Worker

  1. On worker-1, run
    sudo /opt/replicated/shutdown.sh

  2. On master-1, run
    kubectl drain worker-1 --delete-local-data --ignore-daemonsets

  3. From a machine with gcloud configured for the project, run
    gcloud compute instances delete worker-1

  4. On master-1, run
    replicatedctl cluster delete-node worker-1

  5. Wait for worker-1 to come back up. Then take the worker join script from the /cluster page of the console and run it on worker-1.
  6. Exec into the rook-ceph-operator pod in the rook-ceph-system namespace. Wait until ceph status shows 4 OSDs and no degraded data redundancy.
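
Worker steps 1-4 can be scripted. The following bash sketch is hypothetical and is not the author's tooling. It assumes it runs on master-1 and that gcloud there is authenticated, has a default zone set, and can reach the worker with gcloud compute ssh. The function name remove_worker is made up for illustration. Step 5 stays manual because the join script comes from the console, and step 6 is not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper covering worker steps 1-4; run on master-1.
# Assumes gcloud is authenticated, has a default zone, and can SSH to the node.
set -euo pipefail

remove_worker() {
  local node="$1"
  # Step 1: stop Replicated on the worker itself.
  gcloud compute ssh "$node" --command='sudo /opt/replicated/shutdown.sh'
  # Step 2: evict pods; DaemonSet pods are skipped and emptyDir data is lost.
  kubectl drain "$node" --delete-local-data --ignore-daemonsets
  # Step 3: delete the VM; --quiet skips gcloud's confirmation prompt.
  gcloud compute instances delete "$node" --quiet
  # Step 4: remove the node from Replicated's cluster membership.
  replicatedctl cluster delete-node "$node"
  echo "${node} removed; run the worker join script once it is back up (step 5)."
}
```

Usage: remove_worker worker-1. Newer kubectl releases deprecate --delete-local-data in favour of --delete-emptydir-data, so use whichever flag your cluster's kubectl accepts.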

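Worker step 6 and master step 8 both wait on Ceph. The following polling sketch makes several assumptions. It assumes the operator pod carries the label app=rook-ceph-operator and that its ceph status output contains a line such as "osd: 4 osds: 4 up, 4 in". It treats any line mentioning "degraded" as unhealthy, which is a deliberately conservative reading of "no degraded data redundancy". The helper names ceph_is_settled and wait_for_ceph are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for worker step 6 / master step 8.
# Assumes the operator pod is labeled app=rook-ceph-operator and that
# `ceph status` prints a line like "osd: 4 osds: 4 up, 4 in".
set -euo pipefail

# ceph_is_settled reads `ceph status` text on stdin and succeeds only when
# exactly $1 OSDs are up and nothing is reported as degraded.
ceph_is_settled() {
  local want="$1" status up
  status=$(cat)
  up=$(sed -n 's/.* osds: *\([0-9][0-9]*\) up.*/\1/p' <<<"$status")
  [[ "$up" == "$want" ]] && ! grep -qi 'degraded' <<<"$status"
}

# wait_for_ceph polls the operator pod every $2 seconds until settled.
wait_for_ceph() {
  local want="${1:-4}" interval="${2:-30}" pod
  pod=$(kubectl -n rook-ceph-system get pods -l app=rook-ceph-operator \
    -o jsonpath='{.items[0].metadata.name}')
  until kubectl -n rook-ceph-system exec "$pod" -- ceph status \
      | ceph_is_settled "$want"; do
    echo "Ceph not settled yet; retrying in ${interval}s"
    sleep "$interval"
  done
  echo "Ceph reports ${want} OSDs up and nothing degraded"
}
```

Usage: wait_for_ceph 4. This call blocks until the operator's ceph status shows 4 OSDs up and no degraded data.
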
Master

  1. (Airgap only) On master-3, run
    kubectl patch deploy replicated --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/nodeSelector", "value":{"node-role.kubernetes.io/master":""}}]'
    
  2. On master-3 run
    sudo /opt/replicated/shutdown.sh
    
  3. On another master run
    kubectl drain master-3 --delete-local-data --ignore-daemonsets
    
  4. From a machine with gcloud configured for the project, run
    gcloud compute instances delete master-3
    
  5. On another master run
    replicatedctl cluster delete-node master-3
    
  6. Wait for master-3 to come back up, then run the master join script on it.
  7. Update the load balancer with the new IP address of master-3.
  8. Exec into the rook-ceph-operator pod in the rook-ceph-system namespace. Wait until ceph status shows 4 OSDs and no degraded data redundancy.
  9. (Airgap only) Wait for the application .airgap bundles and the license .rli file to be replicated to the new master. Their locations can be found with
    replicatedctl params export | grep AirgapLicensePath
    replicatedctl params export | grep AirgapPackagePath
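
Master steps 1-5 can be sketched in a similar way. This bash function is hypothetical and is not part of Replicated's tooling. It assumes the following:
- It runs on a master other than the one being replaced.
- The local hostname matches the Kubernetes node name.
- gcloud there is authenticated, has a default zone, and has SSH access.
- AIRGAP=1 is exported only for airgapped installs.

Unlike the runbook, it issues the step 1 patch from the current master. The Deployment is a cluster-wide object, so the node kubectl runs on should not matter as long as it targets the same cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper covering master steps 1-5 (e.g. `remove_master master-3`).
# Assumptions: runs on another master, hostname equals the node name, gcloud
# is authenticated with a default zone, AIRGAP=1 only for airgapped installs.
set -euo pipefail

remove_master() {
  local node="$1" masters
  # Guard: drain and delete-node must be run from a different master.
  if [[ "$(hostname)" == "$node" ]]; then
    echo "refusing: run this from a master other than ${node}" >&2
    return 1
  fi
  # Guard: only continue if the target is labeled as a master.
  masters=$(kubectl get nodes -l node-role.kubernetes.io/master -o name)
  if ! grep -qx "node/${node}" <<<"$masters"; then
    echo "refusing: ${node} is not a master node" >&2
    return 1
  fi
  if [[ "${AIRGAP:-0}" == "1" ]]; then
    # Step 1 (airgap only): pin the replicated deployment to master nodes.
    kubectl patch deploy replicated --type='json' \
      -p='[{"op": "replace", "path": "/spec/template/spec/nodeSelector", "value":{"node-role.kubernetes.io/master":""}}]'
  fi
  # Step 2: stop Replicated on the master being replaced.
  gcloud compute ssh "$node" --command='sudo /opt/replicated/shutdown.sh'
  # Step 3: evict its pods.
  kubectl drain "$node" --delete-local-data --ignore-daemonsets
  # Step 4: delete the VM without an interactive prompt.
  gcloud compute instances delete "$node" --quiet
  # Step 5: remove it from Replicated's cluster membership.
  replicatedctl cluster delete-node "$node"
  echo "${node} removed; continue with steps 6-9 (join, load balancer, Ceph, airgap files)."
}
```

The two guards make a mistyped node name fail early rather than draining the wrong machine. Steps 6 and 7 are left manual because they depend on the console and on the load balancer in use.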

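For step 9, the path lookup and the wait can be combined. This sketch assumes that replicatedctl params export prints JSON with one "Key": "value" pair per parameter, which is the same line the grep commands in step 9 match. It also assumes it runs on the new master and that the paths only need to exist there. The names param_value, check_airgap_files and wait_for_airgap_files are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for master step 9 (airgap only); run on the new master.
# Assumes `replicatedctl params export` prints JSON with "Key": "value" pairs.
set -euo pipefail

# param_value prints the string value of one exported Replicated parameter.
param_value() {
  local key="$1" params
  params=$(replicatedctl params export)
  sed -n "s/.*\"${key}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" <<<"$params"
}

# check_airgap_files succeeds once both paths exist on this machine.
check_airgap_files() {
  local key path missing=0
  for key in AirgapLicensePath AirgapPackagePath; do
    path=$(param_value "$key")
    if [[ -n "$path" && -e "$path" ]]; then
      echo "present: ${key} -> ${path}"
    else
      echo "waiting: ${key} -> ${path:-<unset>}"
      missing=1
    fi
  done
  return "$missing"
}

# wait_for_airgap_files polls every $1 seconds (default 30).
wait_for_airgap_files() {
  local interval="${1:-30}"
  until check_airgap_files; do
    sleep "$interval"
  done
}
```

Existence is only a weak signal that replication has finished. Comparing file sizes against another master would be stricter, but that comparison is not covered here.
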
Repeat master steps 1-8 for master-2 and then for master-1, substituting each node's name for master-3.