Node affinity, node anti-affinity, node selectors, and pod priority
These features control where pods are placed during scheduling. Fine-tuning placement helps ensure each pod lands on the node best suited to it.
For example, you might run memory-intensive pods on nodes with more memory and disk-intensive pods on nodes with faster storage.
Node selectors allow a pod to be scheduled only on nodes that carry matching labels. Scheduling pods with specific hardware requirements on designated nodes frees up resources on other nodes for the remaining pods.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
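For the selector to match, the target node must actually carry that label. Assuming a node named worker-1 (a placeholder name), it can be labeled like this:
kubectl label nodes worker-1 disktype=ssd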
Node affinity is more flexible than node selectors. It can prefer or require that a pod be scheduled on nodes with specific labels. For example, requiredDuringSchedulingIgnoredDuringExecution means the pod requires matching labels at scheduling time, but if the node's labels change while the pod is running, Kubernetes does not evict or reschedule it. (A requiredDuringSchedulingRequiredDuringExecution variant, which would also react to label changes on running pods, has been proposed but is not yet implemented.) The softer preferredDuringSchedulingIgnoredDuringExecution form expresses a preference rather than a hard requirement.
In contrast, node anti-affinity prevents pods from being placed on nodes with specific labels; it is expressed with the same node affinity syntax using operators such as NotIn or DoesNotExist.
In the example below, the scheduler places the nginx pod only on nodes that have the label disktype: ssd (node affinity) and places the nginx-slow pod only on nodes that do not have disktype: ssd set (anti-affinity).
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-slow
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: NotIn
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
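If placement should be a preference rather than a hard rule, preferredDuringSchedulingIgnoredDuringExecution takes a weight from 1 to 100 that the scheduler adds to a node's score when the expression matches. A minimal sketch reusing the same disktype label (the pod name nginx-preferred is just for illustration):
apiVersion: v1
kind: Pod
metadata:
  name: nginx-preferred
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent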
The pod priority feature allows you to define a relative priority for pods; the PriorityClass object itself is cluster-scoped rather than namespaced. Higher-priority pods get scheduled first when competing resource demands exist, while lower-priority pods can utilize leftover resources without impacting critical workloads, maximizing overall cluster efficiency.
This requires first creating a PriorityClass and then referencing it in a pod spec. Below we see a PriorityClass with its value set to 1000000, indicating a relatively high priority. User-defined values can range from -2147483648 to 1000000000; higher values are reserved for system-critical pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "This priority class will not cause other pods to be preempted."
The following example shows how to reference a priority class in a pod spec.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    prio: high
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  priorityClassName: high-priority-nonpreempting
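The class above opts out of preemption with preemptionPolicy: Never. For contrast, here is a minimal sketch of a preempting class (the name high-priority-preempting is just illustrative); PreemptLowerPriority is the default policy, so pods using such a class may evict lower-priority pods to make room:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-preempting
value: 1000000
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "Pods with this class may preempt lower-priority pods."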
Quality of service (QoS)
Quality of Service (QoS) classes in Kubernetes play a significant role in resource optimization by governing how pods access CPU and memory on a node. They also determine the order in which pods are evicted should resources run low.
Here’s how the three QoS classes contribute to efficient resource utilization:
- Guaranteed (for critical tasks): Guaranteed pods have CPU and memory requests and limits defined and set to the same values (see the sketch after this list). Kubernetes reserves those resources for these pods, preventing other pods from consuming them, and guarantees predictable performance for critical workloads even during peak cluster utilization. These pods are also the last to be affected by a shortage of resources.
- Burstable (for controlled usage): These pods have resource requests set lower than their limits and can use additional resources beyond their requests on demand. However, Kubernetes throttles CPU usage at the limit and terminates containers that exceed their memory limit, so they can burst for short periods without impacting Guaranteed pods or hogging resources indefinitely. These pods are evicted only after BestEffort pods have been evicted.
- Best effort (for flexible tasks): These pods set no requests or limits, so they have no guaranteed resources and run on whatever capacity remains on the node after the requests of Guaranteed and Burstable pods are satisfied. They are suitable for non-critical tasks that can tolerate fluctuations in performance or even temporary pauses during periods of high resource utilization. These are the first pods to be evicted when a node needs to conserve or reclaim resources.
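To illustrate how the class follows from the resource settings, here is a minimal sketch of a Guaranteed pod (requests equal to limits) and a Burstable pod (requests lower than limits); the names qos-guaranteed and qos-burstable are placeholders. A pod that sets no resources at all would be classified as BestEffort.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
---
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "250m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"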