In Kubernetes, what’s the best way to increase the number of service instances as the number of nodes in the cluster increases? For example, I might run a Deployment with 1 or 2 replicas to start, but if several nodes are added I’d like to scale this up to match the number of nodes in the cluster.
If you want to run with #replicas = #nodes, there's a simple solution: DaemonSets. A DaemonSet runs one pod on each node, no matter how many nodes there are.
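As a sketch (the `node-agent` name and the image are placeholders, not from the question above), a minimal DaemonSet looks like this:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: node-agent
        image: nginx:stable  # placeholder image
```

As nodes join the cluster, the DaemonSet controller automatically schedules one new pod per node; when a node is removed, its pod goes with it.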
I’ll address scaling based on load in a future comment.
Scaling pods based on load is not quite as easy as using a DaemonSet to run one pod on each node, but it is doable. The Horizontal Pod Autoscaler (HPA) scales the number of replicas based on the consumption of a resource, as shown here:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
That would autoscale a deployment named `php-apache` from 1 to 10 replicas, targeting an average CPU utilization of 50%. Be sure to set CPU requests on the container: the utilization target is calculated as a percentage of the pod's requested CPU, so without requests the autoscaler has nothing to measure against.
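The `metrics` list can also hold more than one resource. As a sketch (assuming your pods set memory requests; the 70% figure is an arbitrary example), CPU and memory can be targeted together, and the HPA scales to the largest replica count computed across the metrics:

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
```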
When designing a deployment to be autoscaled, it can also be worthwhile to set up pod anti-affinity, which reduces the chance of multiple copies of the same pod competing for a limited resource on one node while no copies run on another. In general, affinities are extremely powerful tools. The following php-apache deployment prefers to schedule pods on nodes that do not already have a php-apache pod running, but also attempts to colocate them in the same failure domain as a database pod.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  labels:
    app: php-apache
spec:
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 5
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - database
              topologyKey: failure-domain.beta.kubernetes.io/zone
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 10
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - php-apache
              topologyKey: kubernetes.io/hostname
      containers:
      - name: php-apache
        image: php:5-apache
```
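If one copy per node needs to be a hard rule rather than a preference, the anti-affinity term can be made required instead of preferred. A sketch (note the required form takes the term directly, with no weight, and the scheduler will leave surplus replicas Pending rather than co-locate them):

```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - php-apache
    topologyKey: kubernetes.io/hostname
```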
However powerful it is, this autoscaler has requirements of its own. In order to function, the Kubernetes metrics server must be installed on your cluster. As of 2.31.1, this is not included in Replicated installations by default and will need to be included in your app YAML, preferably running within your app's namespace. Adding resources to the `kube-system` namespace is not supported.
Autoscaling is more complicated, but also more powerful. Choose the right tool for the job, and make sure to scale based on the limiting resource!
Here is another example, similar to the one above, except it also includes the Kubernetes metrics server. The metrics server resources are deployed into the Replicated namespace, except for the `metrics-server-auth-reader` RoleBinding, which must be created in the `kube-system` namespace.
```yaml
---
# kind: scheduler-kubernetes
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: '{{repl Namespace}}'
---
# kind: scheduler-kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: '{{repl Namespace}}'
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      nodeSelector:
        beta.kubernetes.io/os: linux
---
# kind: scheduler-kubernetes
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods", "nodes"]
  verbs: ["get", "list", "watch"]
---
# kind: scheduler-kubernetes
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: '{{repl Namespace}}'
---
# kind: scheduler-kubernetes
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: '{{repl Namespace}}'
---
# kind: scheduler-kubernetes
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: '{{repl Namespace}}'
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
# kind: scheduler-kubernetes
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: '{{repl Namespace}}'
  labels:
    kubernetes.io/name: "Metrics-server"
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: main-port
---
# kind: scheduler-kubernetes
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
---
# kind: scheduler-kubernetes
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: '{{repl Namespace}}'
---
# kind: scheduler-kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: php-apache
  name: php-apache
  namespace: '{{repl Namespace}}'
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - image: k8s.gcr.io/hpa-example
        imagePullPolicy: Always
        name: php-apache
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
# kind: scheduler-kubernetes
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: '{{repl Namespace}}'
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  targetCPUUtilizationPercentage: 50
```
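Once deployed, a few kubectl commands can confirm the pipeline is working end to end (a sketch; run these against the cluster with an appropriate `--namespace`):

```shell
# The APIService should report Available=True once metrics-server is serving
kubectl get apiservice v1beta1.metrics.k8s.io

# If metrics are flowing, these print per-node and per-pod usage
kubectl top nodes
kubectl top pods

# The HPA should show current vs. target CPU instead of <unknown>
kubectl get hpa php-apache
```

If `kubectl top` errors or the HPA shows `<unknown>` for current utilization, check the metrics-server pod logs before debugging the autoscaler itself.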