Kubernetes Nodepool Scheduling

Summary: This wiki page shows how I configure my AKS nodepools and migrate pods between nodepools if needed.
Date: 2 January 2026

I would like to start by explaining what nodepools are, especially in Azure Kubernetes Service (AKS). Sometimes, however, the official documentation already explains it very well:

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. Node pools contain the underlying VMs that run your applications. System node pools and user node pools are two different node pool modes for your AKS clusters. System node pools serve the primary purpose of hosting critical system pods such as CoreDNS and metrics-server. User node pools serve the primary purpose of hosting your application pods.

Nodepool Pod Scheduling Management

Pod scheduling in Kubernetes is managed (among other mechanisms) using taints and tolerations. Taints are applied to nodes and allow a node to repel a set of pods unless those pods have a matching toleration. Tolerations are applied to pods and allow (but do not require) the pods to schedule onto nodes with matching taints. On AKS, labels are also an important part of the scheduling process.
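
For example, the node side and the pod side of this pairing look like the sketch below; the pool=mobile taint is simply the one used later on this page:

# Taint as it appears in the node's spec (in AKS it is set on the nodepool);
# it repels pods that do not have a matching toleration
spec:
  taints:
    - key: "pool"
      value: "mobile"
      effect: "NoSchedule"

# Matching toleration on the pod; allows, but does not force, scheduling onto such nodes
tolerations:
  - key: "pool"
    operator: "Equal"
    value: "mobile"
    effect: "NoSchedule"
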
I usually try to keep it simple, by using these directives for the nodepools:

- The system nodepool keeps a CriticalAddonsOnly taint, so it is reserved for system pods.
- The default user nodepool gets no taint, so it can accept any pod.
- Each additional user nodepool gets its own NoSchedule taint, so only pods that explicitly tolerate it land there.

Additionally, I use the following application (pod) scheduling directives:

- A nodeAffinity rule that selects system or user nodepools via their labels.
- A toleration that matches the taint of the target nodepool.

Note: The number of nodepools should be kept low, because each nodepool will have at least one node that is not used to its maximum capacity, adding costs (and complexity).

Nodepool Setup

The list below shows an example of such a setup; the nodepool names and taints are the ones used throughout the rest of this page:

- A system nodepool (mode System) with the CriticalAddonsOnly=true:NoSchedule taint, hosting system pods like CoreDNS.
- npusrdefault (mode User), without a taint, as the default nodepool for application pods.
- npmobileapp (mode User), with the taint pool=mobile:NoSchedule.
- nprisk (mode User), with the taint pool=risk:NoSchedule.

Note that the name of a node pool can only contain lowercase alphanumeric characters and must begin with a lowercase letter. For Linux node pools, the length must be between 1-12 characters. For Windows node pools, the length must be between 1-6 characters.

With the nodepools above, the setup is as follows: system pods (like CoreDNS) get scheduled on the system nodepool, and all other pods get scheduled on the npusrdefault nodepool unless they have a toleration for the npmobileapp or nprisk nodepool.

System

So, to make sure a pod gets scheduled on the system nodepool, we set the following nodeAffinity rule:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.azure.com/mode
            operator: In
            values:
              - system

This rule makes sure the pod only gets scheduled on nodes that have the label `kubernetes.azure.com/mode=system`, which is only true for the system nodepool. But we also need to set a toleration, because the system nodepool has a taint:

tolerations:
  - key: "CriticalAddonsOnly"
    operator: Exists

Combined, these settings make sure the pod gets scheduled on the system nodepool.
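
Putting the two together, a minimal sketch of a pod manifest that targets the system nodepool looks like this (the pod name and image are placeholders, not part of my actual setup):

apiVersion: v1
kind: Pod
metadata:
  name: example-system-pod     # placeholder name
spec:
  containers:
    - name: app
      image: nginx             # placeholder image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: In
                values:
                  - system
  tolerations:
    - key: "CriticalAddonsOnly"
      operator: Exists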

User Nodepools

The user nodepools require the same setup, but obviously with different values. First, we need a nodeAffinity rule that makes sure the pod only gets scheduled on user nodepools. Depending on your preference you can use a 'NotIn' or an 'In' operator:

# Option 1: exclude the system nodepool
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.azure.com/mode
              operator: NotIn
              values:
                - system

# Option 2: explicitly select user nodepools
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.azure.com/mode
              operator: In
              values:
                - user

Either of these rules will make sure the pods are only scheduled on user nodepools. However, depending on the nodepool you want the pod to be scheduled on, you also need to set a toleration:

tolerations:
  - key: "pool"
    operator: "Equal"
    value: "mobile"
    effect: "NoSchedule"

Note: Change the value to risk for the nprisk nodepool.
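
As a complete sketch, a pod that should run on the npmobileapp nodepool combines the user affinity rule with the matching toleration (again, the pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: example-mobile-pod     # placeholder name
spec:
  containers:
    - name: app
      image: nginx             # placeholder image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: In
                values:
                  - user
  tolerations:
    - key: "pool"
      operator: "Equal"
      value: "mobile"
      effect: "NoSchedule"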

Default User Nodepool

The default user nodepool does not have a taint, so any pod can always be scheduled on it. I prefer this because I favor uptime over control. This is, however, a personal preference, and different use cases might require a different setup. Note that this means that even when a pod targets one of the additional user nodepools, it can still be scheduled on the default user nodepool, for example when the additional user nodepool is full or unavailable.

Additional Affinity

If you need to prevent pods from being scheduled on the default user nodepool, additional affinity rules are required. In AKS, the agentpool label can be used for this purpose:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.azure.com/agentpool
            operator: In
            values:
              - npmobileapp

This, however, reduces flexibility in case of migrations, upgrades, or new naming conventions. That can be dealt with by listing multiple values:

# Migrating from the npmobileapp1 nodepool to the npmobileapp2 nodepool
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.azure.com/agentpool
            operator: In
            values:
              - npmobileapp1
              - npmobileapp2
