Autoscaling in Kubernetes?


#1

In Kubernetes, what’s the best way to increase the number of service instances as the number of nodes in the cluster increases? For example, I might run a Deployment with 1 or 2 replicas to start, but if several nodes are added I’d like to scale this up to match the number of nodes in the cluster.


#2

If you want to run with #replicas = #nodes, there’s a simple solution: a DaemonSet. A DaemonSet runs exactly one pod on each eligible node, no matter how many nodes the cluster has.
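For example, a minimal DaemonSet might look like this (the name and image are placeholders for your own workload):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-service
  labels:
    app: my-service
spec:
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: my-service
        image: nginx:1.17   # placeholder image

As nodes join or leave the cluster, Kubernetes creates or removes pods automatically, so the pod count tracks the node count (subject to taints and node selectors).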

I’ll address scaling based on load in a future comment.


#3

Scaling pods based on load is not quite as easy as using a DaemonSet to run one pod on each node, but it is doable. The Horizontal Pod Autoscaler (HPA) scales the number of replicas of a workload based on the consumption of a resource, as shown here:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

That would autoscale a Deployment named ‘php-apache’ from 1 to 10 replicas, targeting an average CPU utilization of 50%. Be sure to set CPU requests on the container: the HPA computes utilization as a percentage of the requested CPU, so without a request it has nothing to measure against.
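Concretely, the container spec in the target Deployment would carry a CPU request, along the lines of (the values here are illustrative):

containers:
- name: php-apache
  image: php:5-apache
  resources:
    requests:
      cpu: 200m
    limits:
      cpu: 500m

With a 200m request and a 50% utilization target, the HPA starts adding replicas once average usage exceeds 100m per pod.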

When designing a deployment to be autoscaled, it can also be worthwhile to set up pod anti-affinity, which reduces the chance of multiple copies of the same pod competing for a limited resource on one node while no copies run on another. In general, affinities are extremely powerful tools. The following php-apache deployment prefers to schedule pods on nodes that do not already have a php-apache pod running, while also attempting to be colocated in the same failure domain as a database pod.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  labels:
    app: php-apache
spec:
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 5
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - database
              topologyKey: failure-domain.beta.kubernetes.io/zone
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 10
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - php-apache
              topologyKey: kubernetes.io/hostname
      containers:
      - name: php-apache
        image: php:5-apache

However powerful it is, this autoscaler has requirements of its own. In order to function, the Kubernetes metrics server must be installed on the cluster. As of Replicated 2.31.1, it is not included in installations by default and will need to be included in your app YAML, preferably running within your app’s namespace. Adding resources to the kube-system namespace is not supported.

Autoscaling is more complicated, but also more powerful. Choose the right tool for the job, and make sure to scale based on the limiting resource!
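For instance, if memory rather than CPU is the limiting resource, the metrics list in the autoscaling/v2beta2 example above can target memory utilization instead (a sketch; the 70% target is illustrative):

  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70

As with CPU, utilization is measured against the container’s memory request, so the target Deployment must declare one.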


#4

Here is another example, similar to the one above, except that it also includes the Kubernetes metrics server. The metrics-server resources are deployed into the Replicated namespace, except for the metrics-server-auth-reader RoleBinding, which must be created against the kube-system namespace.

---
# kind: scheduler-kubernetes

apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: '{{repl Namespace}}'

---
# kind: scheduler-kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: '{{repl Namespace}}'
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-insecure-tls
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      nodeSelector:
        beta.kubernetes.io/os: linux
---
# kind: scheduler-kubernetes

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods", "nodes"]
  verbs: ["get", "list", "watch"]
---
# kind: scheduler-kubernetes

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: '{{repl Namespace}}'

---
# kind: scheduler-kubernetes

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: '{{repl Namespace}}'

---
# kind: scheduler-kubernetes

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: '{{repl Namespace}}'
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

---
# kind: scheduler-kubernetes

apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: '{{repl Namespace}}'
  labels:
    kubernetes.io/name: "Metrics-server"
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: main-port
---
# kind: scheduler-kubernetes

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
---
# kind: scheduler-kubernetes

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: '{{repl Namespace}}'

---
# kind: scheduler-kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: php-apache
  name: php-apache
  namespace: '{{repl Namespace}}'
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - image: k8s.gcr.io/hpa-example
        imagePullPolicy: Always
        name: php-apache
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m

---
# kind: scheduler-kubernetes

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: '{{repl Namespace}}'
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  targetCPUUtilizationPercentage: 50