Run Workloads with Topology-Aware Scheduling
This page shows you how to run workloads that use Topology-Aware Scheduling (TAS) in a Kubernetes cluster with Kueue enabled. The examples use a batch Job, but the same annotations work with any workload type that Kueue supports.
This page is intended for batch users.
For conceptual details about how Kueue models cluster topology and how the TAS scheduling algorithm works, see Topology-Aware Scheduling concepts.
Before you begin
Make sure the following conditions are met:
- A Kubernetes cluster is running.
- The kubectl command-line tool has communication with your cluster.
- Kueue is installed.
- The TopologyAwareScheduling feature gate is enabled (beta, on by default since Kueue v0.14).
- Your administrator has configured TAS by creating a Topology object, a ResourceFlavor that references it via spec.topologyName, and a ClusterQueue that uses that flavor.
If you do not already have a Kubernetes cluster, you can create a TAS-ready
kind cluster. The config creates worker nodes labeled with
cloud.provider.com/node-group, cloud.provider.com/topology-block, and
cloud.provider.com/topology-rack, matching the sample topology used on this
page:
curl -L https://kueue.sigs.k8s.io/examples/tas/kind-cluster.yaml -o kind-cluster-tas.yaml
kind create cluster --name kueue-tas --config kind-cluster-tas.yaml
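The downloaded file is authoritative. To give a rough picture of what it contains, a kind config that produces such labels could look like the sketch below. The node count and every label value (tas-node-group, b1, r1, and so on) are hypothetical, and the sketch assumes a kind release that supports per-node labels in the cluster config. The kubernetes.io/hostname label needs no entry because the kubelet sets it on every node.

```yaml
# Hypothetical sketch of a TAS-ready kind config; all label values are illustrative.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  labels:
    cloud.provider.com/node-group: tas-node-group   # assumed value, selected by the ResourceFlavor
    cloud.provider.com/topology-block: b1
    cloud.provider.com/topology-rack: r1
- role: worker
  labels:
    cloud.provider.com/node-group: tas-node-group
    cloud.provider.com/topology-block: b1
    cloud.provider.com/topology-rack: r1            # same rack as the previous worker
- role: worker
  labels:
    cloud.provider.com/node-group: tas-node-group
    cloud.provider.com/topology-block: b2
    cloud.provider.com/topology-rack: r2
```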
Then install Kueue. For example, to install the latest development version:
kubectl apply --server-side -k "github.com/kubernetes-sigs/kueue/config/default?ref=main"
Enable the TAS feature gate if needed
The TopologyAwareScheduling feature gate is enabled by default since Kueue
v0.14. If your Kueue installation manages feature gates explicitly, enable it
using one of the following methods.
For a manifests-based installation, edit the kueue-manager-config ConfigMap:
kubectl -n kueue-system edit configmap kueue-manager-config
Set featureGates.TopologyAwareScheduling to true in the
controller_manager_config.yaml data entry. Keep the rest of the existing
configuration unchanged:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kueue-manager-config
  namespace: kueue-system
data:
  controller_manager_config.yaml: |
    apiVersion: config.kueue.x-k8s.io/v1beta2
    kind: Configuration
    featureGates:
      TopologyAwareScheduling: true
Restart the controller manager after updating the ConfigMap:
kubectl -n kueue-system rollout restart deployment/kueue-controller-manager
Alternatively, add the feature gate to the manager container arguments in the
kueue-controller-manager Deployment:
args:
- --config=/controller_manager_config.yaml
- --feature-gates=TopologyAwareScheduling=true
If the container already has a --feature-gates argument, append
,TopologyAwareScheduling=true to the existing value instead of adding a
second --feature-gates argument.
Use either the ConfigMap field or the --feature-gates argument, not both.
For more details, see
Change the feature gates configuration.
Create the sample TAS setup
If your cluster does not already have TAS queues configured, apply the sample setup:
kubectl apply -f https://kueue.sigs.k8s.io/examples/tas/sample-queues.yaml
This setup creates:
- a Topology with block, rack, and hostname levels;
- a ResourceFlavor named tas-flavor that selects the labeled kind nodes and references that topology;
- a ClusterQueue named tas-cluster-queue with CPU and memory quota for the flavor;
- a LocalQueue named tas-user-queue in the default namespace.
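The following sketch shows roughly what those four objects look like. It is written against the kueue.x-k8s.io/v1beta1 API; the Topology name default, the node-group label value, and the quota numbers are assumptions, and sample-queues.yaml remains the authoritative version. Run kubectl api-resources --api-group=kueue.x-k8s.io to check which API version your installation exposes for each kind.

```yaml
# Sketch of the sample TAS setup; names, label values, and quotas are assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: Topology
metadata:
  name: default
spec:
  levels:                                   # broadest domain first, narrowest last
  - nodeLabel: cloud.provider.com/topology-block
  - nodeLabel: cloud.provider.com/topology-rack
  - nodeLabel: kubernetes.io/hostname
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: tas-flavor
spec:
  nodeLabels:
    cloud.provider.com/node-group: tas-node-group   # assumed label value
  topologyName: default                             # binds the flavor to the Topology above
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: tas-cluster-queue
spec:
  namespaceSelector: {}                     # accept workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: tas-flavor
      resources:
      - name: cpu
        nominalQuota: 100                   # illustrative quota
      - name: memory
        nominalQuota: 100Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: tas-user-queue
  namespace: default
spec:
  clusterQueue: tas-cluster-queue
```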
1. Identify the queues available in your namespace
Run the following command to list the LocalQueues available in your namespace.
kubectl -n default get localqueues
The output is similar to the following:
NAME             CLUSTERQUEUE        PENDING WORKLOADS
tas-user-queue   tas-cluster-queue   0
The ClusterQueue defines the quotas for the LocalQueue, and the ClusterQueue's ResourceFlavor is what binds the queue to a Topology.
2. Choose a topology scheduling mode
TAS exposes three Pod-template annotations, each expressing a different placement intent for the PodSet. Pick the one that matches your workload:
- Use required when pods must communicate tightly and any cross-domain placement would degrade performance unacceptably.
- Use preferred when same-domain placement is desirable but the workload can still make progress when distributed.
- Use unconstrained when you do not care about topology but you want Kueue’s TAS bookkeeping to fill small gaps on existing nodes and reduce fragmentation.
You set exactly one of these annotations on the Pod template’s
metadata.annotations (not on the Job’s top-level metadata).
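The fragment below contrasts the two metadata sections of a Job so the placement rule is easy to check against your own manifests. Only the fields relevant to annotation placement are shown; the rest of the Job is omitted.

```yaml
# Fragment of a Job manifest (other fields omitted).
metadata:
  labels:
    kueue.x-k8s.io/queue-name: tas-user-queue   # the queue label belongs on the Job's top-level metadata
  # Do not put a podset-*-topology annotation here; TAS reads it from the Pod template.
spec:
  template:
    metadata:
      annotations:
        # Exactly one TAS placement annotation, on the Pod template's metadata.
        kueue.x-k8s.io/podset-required-topology: "cloud.provider.com/topology-rack"
```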
Required: same-domain placement
The kueue.x-k8s.io/podset-required-topology annotation requires Kueue to
schedule all pods of the PodSet within a single topology domain at the
indicated level. If no single domain has enough capacity, the workload waits.
metadata:
  annotations:
    kueue.x-k8s.io/podset-required-topology: "cloud.provider.com/topology-rack"
A full example:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: tas-sample-required-
  labels:
    kueue.x-k8s.io/queue-name: tas-user-queue
spec:
  parallelism: 10
  completions: 10
  completionMode: Indexed
  template:
    metadata:
      annotations:
        kueue.x-k8s.io/podset-required-topology: "cloud.provider.com/topology-rack"
    spec:
      containers:
      - name: dummy-job
        image: registry.k8s.io/e2e-test-images/agnhost:2.53
        args: ["pause"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
      restartPolicy: Never
Preferred: best-effort same-domain placement
The kueue.x-k8s.io/podset-preferred-topology annotation asks Kueue to fit
the PodSet within a single topology domain at the indicated level. If that
fails, Kueue evaluates levels above the indicated one, one by one. If the
PodSet does not fit even at the highest level, it is admitted distributed
across multiple domains.
metadata:
  annotations:
    kueue.x-k8s.io/podset-preferred-topology: "cloud.provider.com/topology-block"
A full example:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: tas-sample-preferred-
  labels:
    kueue.x-k8s.io/queue-name: tas-user-queue
spec:
  parallelism: 40
  completions: 40
  completionMode: Indexed
  template:
    metadata:
      annotations:
        kueue.x-k8s.io/podset-preferred-topology: "cloud.provider.com/topology-block"
    spec:
      containers:
      - name: dummy-job
        image: registry.k8s.io/e2e-test-images/agnhost:2.53
        args: ["pause"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
      restartPolicy: Never
Unconstrained: minimize fragmentation
The kueue.x-k8s.io/podset-unconstrained-topology annotation tells Kueue to
schedule pods on any nodes without topology considerations. Kueue still
tracks placement so it can pack pods into existing partially-used nodes,
which helps minimize fragmentation across the cluster.
The annotation value is the literal string "true", not a topology label:
metadata:
  annotations:
    kueue.x-k8s.io/podset-unconstrained-topology: "true"
A full example:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: tas-sample-unconstrained-
  labels:
    kueue.x-k8s.io/queue-name: tas-user-queue
spec:
  parallelism: 10
  completions: 10
  completionMode: Indexed
  template:
    metadata:
      annotations:
        kueue.x-k8s.io/podset-unconstrained-topology: "true"
    spec:
      containers:
      - name: dummy-job
        image: registry.k8s.io/e2e-test-images/agnhost:2.53
        args: ["pause"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
      restartPolicy: Never
3. Run the workload
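Submit any of the examples above with kubectl create. Use create rather than kubectl apply, because the manifests set metadata.generateName, which kubectl apply does not support: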
kubectl create -f https://kueue.sigs.k8s.io/examples/tas/sample-job-required.yaml
kubectl create -f https://kueue.sigs.k8s.io/examples/tas/sample-job-preferred.yaml
kubectl create -f https://kueue.sigs.k8s.io/examples/tas/sample-job-unconstrained.yaml
Internally, Kueue creates a corresponding Workload
for the Job and runs the TAS scheduling algorithm against the matching
ResourceFlavor’s Topology.
4. (Optional) Monitor the topology assignment
List Workloads in the namespace:
kubectl -n default get workloads.kueue.x-k8s.io
Check whether the Workload was admitted:
kubectl -n default describe workload <workload-name>
Look for the Admitted condition in the Conditions section and any
scheduling events in Events.
To see the concrete topology placement Kueue chose, inspect the
status.admission.podSetAssignments[].topologyAssignment field:
kubectl -n default get workloads.kueue.x-k8s.io <workload-name> -o yaml
That field lists the topology levels and domain values into which each PodSet was placed. It is the authoritative signal that TAS was applied.
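For orientation, the Workload of an admitted batch Job might contain something like the excerpt below. The excerpt assumes the v1beta1 field shape (a list of levels plus a list of domains with pod counts); newer API versions may encode the same information differently, the levels listed can vary, and the node names and counts are hypothetical. A batch Job has a single PodSet, which Kueue names main.

```yaml
# Illustrative excerpt only; field shape assumed from the v1beta1 API.
status:
  admission:
    clusterQueue: tas-cluster-queue
    podSetAssignments:
    - name: main                      # the single PodSet of a batch Job
      count: 10
      topologyAssignment:
        levels:
        - kubernetes.io/hostname      # level(s) the domain values below refer to
        domains:
        - values: ["kueue-tas-worker"]
          count: 6                    # pods placed on this node
        - values: ["kueue-tas-worker2"]
          count: 4
```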
Advanced topics
This page covers the three basic TAS annotations. Kueue also supports additional placement behavior:
- Placement strategies: TAS uses greedy packing strategies to choose domains. BestFit is the base strategy. When the beta TASProfileMixed feature gate is enabled (the default since Kueue v0.15), TAS uses LeastFreeCapacity for unconstrained placement and BestFit for required and preferred placement.
- Balanced placement: the alpha TASBalancedPlacement feature gate makes preferred placement distribute pods or slices more evenly across the selected domains. This is useful for workloads with all-to-all communication patterns, where a placement such as 6 pods in one rack and 6 pods in another rack is better than 10 pods in one rack and 2 pods in another. See Balanced Placement.
- PodSet groups: co-locate multiple PodSets of a single workload in the same topology domain using kueue.x-k8s.io/podset-group-name. See Configure Topology Aware Scheduling for LeaderWorkerSet for a working example.
- PodSet slices: split a PodSet into fixed-size slices, each pinned to a single domain, using kueue.x-k8s.io/podset-slice-required-topology and kueue.x-k8s.io/podset-slice-size. See the Topology-Aware Scheduling concepts page.
- Multi-layer topology: express slice constraints at up to three topology layers in one annotation using kueue.x-k8s.io/podset-slice-required-topology-constraints. This is controlled by the alpha TASMultiLayerTopology feature gate; see the Multi-Layer Topology section of the concepts page.
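As an illustration of PodSet slices, the following hypothetical Job asks for 8 pods within one block, split into slices of 4 pods that each stay within a single rack. The slice annotations come in addition to the single placement-mode annotation, not instead of it. The sketch assumes that the slice annotations are combined with a PodSet-level annotation and that the PodSet size is a multiple of kueue.x-k8s.io/podset-slice-size; check the concepts page for the exact validation rules in your Kueue release.

```yaml
# Hypothetical slices example: 8 pods in one block, 2 slices of 4 pods, each slice in one rack.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: tas-sample-slices-
  labels:
    kueue.x-k8s.io/queue-name: tas-user-queue
spec:
  parallelism: 8
  completions: 8
  completionMode: Indexed
  template:
    metadata:
      annotations:
        kueue.x-k8s.io/podset-required-topology: "cloud.provider.com/topology-block"      # whole PodSet
        kueue.x-k8s.io/podset-slice-required-topology: "cloud.provider.com/topology-rack" # each slice
        kueue.x-k8s.io/podset-slice-size: "4"                                             # pods per slice
    spec:
      containers:
      - name: dummy-job
        image: registry.k8s.io/e2e-test-images/agnhost:2.53
        args: ["pause"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
      restartPolicy: Never
```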
Troubleshooting
Workload not admitted
If the Workload stays in Pending state:
- Run kubectl -n default describe workload <workload-name> and look at the Events section for admission rejection reasons.
- Verify that the ResourceFlavor selected for the Workload references the expected Topology in spec.topologyName.
- Verify that the referenced Topology exists, and that every spec.levels[].nodeLabel value exists as a label key on the cluster's nodes.
- Verify that the topology label in your same-domain placement annotation, such as kueue.x-k8s.io/podset-required-topology or kueue.x-k8s.io/podset-preferred-topology, matches one of the referenced Topology object's spec.levels[].nodeLabel values.
- For required placement, verify that at least one domain at the requested level has enough free capacity to fit the entire PodSet.
You can inspect node labels with:
kubectl get nodes --show-labels
Annotation has no effect
If the topology annotation appears to be ignored entirely (pods are admitted
as if the annotation were absent), the most common cause is that the
ResourceFlavor selected for the workload does not have spec.topologyName
set. Without it, Kueue does not run TAS for that flavor.
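A minimal sketch of a TAS-enabled ResourceFlavor follows. The flavor name, the node selector label and value, and the Topology name default are assumptions; substitute the values used in your cluster.

```yaml
# Minimal TAS-enabled ResourceFlavor; names and label values are assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: tas-flavor
spec:
  nodeLabels:
    cloud.provider.com/node-group: tas-node-group
  topologyName: default   # without this field, Kueue does not run TAS for the flavor
```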
For general troubleshooting, see the troubleshooting guide.