FlexVolume causes a deadlock and the Deployment enters CrashLoopBackOff on node reboot


#1

Hi there,
We have a Deployment with an init container that checks whether the FlexVolume is mounted and ready to use. Here are the details:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test
  namespace: replicated-namespace
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: xyz
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: xyz
        tier: backend
    spec:
      initContainers:
      - command:
        - /bin/sh
        - -c
        # Succeeds only if $MOUNT_PATH is a CephFS mount (Ceph monitor port 6789 appears in the df source)
        - df $MOUNT_PATH | grep ":6789"
        env:
        - name: MOUNT_PATH
          value: /sharedfs
        image: docker.io/replicated/replicated-operator:stable-2.46.2
        imagePullPolicy: IfNotPresent
        name: check-mount
        resources: {}
        securityContext:
          seLinuxOptions:
            type: spc_t
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /sharedfs
          name: shared-mount
          readOnly: true
      containers:
      ...
      ...
      ...
      restartPolicy: Always
      volumes:
      - flexVolume:
          driver: ceph.rook.io/rook
          fsType: ceph
          options:
            clusterNamespace: rook-ceph
            fsName: rook-shared-fs
        name: shared-mount

Now, whenever we restart the node, the Deployment's pods go into CrashLoopBackOff. Even though Kubernetes keeps restarting them because of the restart policy, they never recover.
It works only if we manually delete and recreate the Deployment after the node restarts. Is there any way to get this working?


#2

After a reboot, there is a race condition in which mounting the Ceph shared filesystem usually fails. The initContainer is meant to protect against running with an unsuccessful mount. You can force-delete the individual pod and let Kubernetes re-create it.

https://help.replicated.com/docs/kubernetes/packaging-an-application/volumes/#shared-filesystem-initcontainer
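The force-delete workaround can be scripted. The following is a minimal sketch, not a Replicated-provided tool: it assumes kubectl access to the cluster and reuses the app=xyz label and the replicated-namespace namespace from the Deployment above, and the function names find_stuck_pods and delete_stuck_pods are hypothetical. It deletes the pod rather than waiting for another container restart, presumably because a restart reuses the same pod and therefore the same failed mount.

```shell
#!/bin/sh
# Minimal sketch of the manual workaround: find pods whose check-mount init
# container keeps failing and force-delete them, so the ReplicaSet creates
# fresh pods that attempt the CephFS mount again.
# Assumptions: kubectl is configured for the cluster; the pods carry the
# app=xyz label in replicated-namespace (taken from the Deployment above).

NAMESPACE="${NAMESPACE:-replicated-namespace}"
SELECTOR="${SELECTOR:-app=xyz}"

# Reads "kubectl get pods --no-headers" output on stdin and prints the names
# of pods whose STATUS column (3rd field) shows a failing init container.
find_stuck_pods() {
  awk '$3 ~ /^Init:(CrashLoopBackOff|Error)$/ { print $1 }'
}

# Force-deletes every stuck pod; the Deployment's ReplicaSet replaces them.
delete_stuck_pods() {
  kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" --no-headers |
    find_stuck_pods |
    while read -r pod; do
      kubectl delete pod "$pod" -n "$NAMESPACE" --grace-period=0 --force
    done
}
```

Running delete_stuck_pods once the node is back should let new pods retry the mount. Note that --grace-period=0 --force skips graceful termination; that is acceptable for pods stuck in their init phase, but it should not be used casually on healthy workloads.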


#3

How can this be avoided in production? Manual deletion and re-deployment may not be feasible.


#4

If you’re running a clustered setup, those pods would be rescheduled onto other nodes when one node goes down for a reboot. They would not be automatically scheduled back onto the rebooted node (unless they belong to a DaemonSet), so there would be no race condition.

We’re also looking into ways to prevent or automatically fix the issue for upcoming releases.


#5

Thanks, @areed, for the quick response. Yes, we are facing this issue in our dev environment because we are running a single-node cluster. In production it should not be a problem, as it is a multi-node cluster.