Replicated Pod stuck in ContainerCreating state


Symptom

The Replicated Kubernetes install script can hang with a spinner at the Await Replicated Ready step. This may indicate that the Replicated pod's PersistentVolume cannot be mounted, leaving the pod in the ContainerCreating state. This happens when the kubelet service has failed to detect the FlexVolume plugins that were added to the /usr/libexec/kubernetes/kubelet-plugins/volume/exec directory.
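To confirm the symptom from the command line, one option is to list the pods that are still in ContainerCreating. The following is a minimal sketch, not part of the install script: the function name pods_container_creating is made up for illustration, and it assumes the default column layout of kubectl get pods --all-namespaces, where STATUS is the fourth column.

```shell
# Print "namespace/name" for every pod whose STATUS is ContainerCreating.
# Reads the output of `kubectl get pods --all-namespaces` on stdin.
# Assumption: default columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
pods_container_creating() {
    awk '$4 == "ContainerCreating" { print $1 "/" $2 }'
}
```

Usage: kubectl get pods --all-namespaces | pods_container_creating. Listing the plugin directory with ls /usr/libexec/kubernetes/kubelet-plugins/volume/exec shows whether the FlexVolume plugins are present on disk.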

Fix

To fix this, run systemctl restart kubelet on the affected node. When kubelet restarts, it probes the volume plugin directory again and can then mount the PersistentVolume to the Replicated pod.
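The restart can be wrapped in a small helper that also verifies the service came back up. This is a sketch only: the function name restart_kubelet is hypothetical, and it assumes a systemd host and a shell running as root (or via sudo) on the affected node.

```shell
# Restart kubelet so it re-probes the volume plugin directory,
# then report whether the unit is running again.
restart_kubelet() {
    systemctl restart kubelet || return 1
    # is-active exits 0 only when the unit is active
    systemctl is-active --quiet kubelet
}
```

Usage: restart_kubelet && echo "kubelet is active". The pod may take a short while to leave ContainerCreating after kubelet is back.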

Investigation

There are a couple of ways to confirm that the problem stems from kubelet’s dynamic volume plugin discovery mechanism.

  1. Run journalctl -u kubelet | grep desired_state_of_world_populator. You should see error logs containing the message Failed to add volume "replicated-persistent".

  2. Use kubectl to get the logs of the Rook agent pod running in the rook-ceph-system namespace. The last line of the logs should be agent-cluster: start watching cluster resources, indicating that the agent has never been called by the FlexVolume binary to mount a PersistentVolume to a Pod. Note that you will have to find the agent running on the node with the failed mount. During install there will be only one node and therefore only one agent.
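The two checks above can be sketched as shell helpers. The function names are hypothetical, and the agent pod name is passed in as an argument rather than guessed; kubectl -n rook-ceph-system get pods -o wide lists the pods along with the node each one runs on.

```shell
# Check 1: keep only kubelet log lines that report the failed volume.
# Reads log text on stdin, for example from journalctl.
find_volume_errors() {
    grep desired_state_of_world_populator |
        grep -F 'Failed to add volume "replicated-persistent"'
}

# Check 2: print the last log line of a Rook agent pod.
# $1 is the agent pod name (look it up first; it is not assumed here).
agent_last_log_line() {
    kubectl -n rook-ceph-system logs "$1" --tail=1
}

# Succeeds when the agent's last log line shows it is still only watching
# cluster resources, i.e. it was never asked to mount a volume.
agent_is_idle() {
    agent_last_log_line "$1" |
        grep -qF 'agent-cluster: start watching cluster resources'
}
```

Usage: journalctl -u kubelet --no-pager | find_volume_errors for the first check, and agent_is_idle followed by the agent pod name for the second. If both succeed, the symptoms match the plugin discovery problem described here.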