Cluster Queue
A ClusterQueue is a cluster-scoped object that governs a pool of resources such as CPU, memory, and hardware accelerators. A ClusterQueue defines:
- The quotas for the resource flavors that the ClusterQueue manages, with usage limits and order of consumption.
- Fair sharing rules across the multiple ClusterQueues in the cluster.
Only cluster administrators should create ClusterQueue objects.
A sample ClusterQueue looks like the following:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
This ClusterQueue admits Workloads if and only if:
- The sum of the CPU requests is less than or equal to 9.
- The sum of the memory requests is less than or equal to 36Gi.
You can specify the quota as a quantity.
Resources
In a ClusterQueue, you can define quotas for multiple compute resources (CPU, memory, GPUs, etc.).
For each resource, you can define quotas for multiple flavors. Flavors represent different variations of a resource (for example, different GPU models). You can define a flavor using a ResourceFlavor object.
In a process called admission, Kueue assigns a flavor to each resource that a Workload’s pod sets request. For each resource, Kueue assigns the first flavor in the ClusterQueue’s .spec.resourceGroups[*].flavors list that has enough unused nominalQuota in the ClusterQueue or the ClusterQueue’s cohort.
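As a rough sketch of what the flavor names refer to, the manifest below defines two ResourceFlavor objects. The first is an empty flavor, which is typically enough when all nodes are interchangeable; the second ties a flavor to a subset of nodes through a label. The node label key and value (instance-type: spot) are illustrative assumptions, not something this page specifies.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"   # matches the flavor name in the sample ClusterQueue
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "spot"
spec:
  nodeLabels:
    instance-type: spot    # assumed label; use the labels your nodes actually carry
```

A ClusterQueue refers to these objects only by name, in the name field of each entry under flavors.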
Resource Groups
It is possible that multiple resources in a ClusterQueue have the same flavors. This is typical for cpu and memory, where the flavors are generally tied to a machine family or VM availability policies. To tie two or more resources to the same set of flavors, you can list them in the same resource group.
An example of a ClusterQueue with multiple resource groups looks like the following:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "spot"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
    - name: "on-demand"
      resources:
      - name: "cpu"
        nominalQuota: 18
      - name: "memory"
        nominalQuota: 72Gi
  - coveredResources: ["gpu"]
    flavors:
    - name: "vendor1"
      resources:
      - name: "gpu"
        nominalQuota: 10
    - name: "vendor2"
      resources:
      - name: "gpu"
        nominalQuota: 10
In the example above, cpu and memory belong to one resourceGroup, while gpu belongs to another.
A resource flavor must belong to at most one resource group.
Namespace selector
You can limit which namespaces can have workloads admitted in the ClusterQueue by setting a label selector in the .spec.namespaceSelector field.
To allow workloads from all namespaces, set the .spec.namespaceSelector field to the empty selector {}.
A sample namespaceSelector looks like the following:
namespaceSelector:
  matchExpressions:
  - key: team
    operator: In
    values:
    - team-a
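For illustration, a namespace that this selector would match might look like the sketch below. The namespace name is hypothetical; only the team: team-a label matters for the match.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a-jobs   # hypothetical namespace name
  labels:
    team: team-a      # satisfies key "team", operator In, values [team-a]
```

Workloads from namespaces without a matching team label would not be admitted through a ClusterQueue that uses this selector.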
Queueing strategy
You can set different queueing strategies in a ClusterQueue using the .spec.queueingStrategy field. The queueing strategy determines how workloads are ordered in the ClusterQueue and how they are re-queued after an unsuccessful admission attempt.
The following are the supported queueing strategies:
- StrictFIFO: Workloads are ordered first by priority and then by .metadata.creationTimestamp. Older workloads that can’t be admitted will block newer workloads, even if the newer workloads fit in the available quota.
- BestEffortFIFO: Workloads are ordered the same way as StrictFIFO. However, older Workloads that can’t be admitted will not block newer Workloads that fit in the available quota.
The default queueing strategy is BestEffortFIFO.
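As a minimal sketch, a ClusterQueue that opts into strict ordering sets the field explicitly. The object name strict-queue is made up for this example; the quota values are copied from the first sample on this page.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "strict-queue"          # hypothetical name
spec:
  namespaceSelector: {}         # match all.
  queueingStrategy: StrictFIFO  # omit this field to get the default, BestEffortFIFO
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
```

With this setting, a large Workload at the head of the queue that does not fit holds back smaller Workloads behind it until it can be admitted.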
Cohort
ClusterQueues can be grouped in cohorts. ClusterQueues that belong to the same cohort can borrow unused quota from each other.
To add a ClusterQueue to a cohort, specify the name of the cohort in the .spec.cohort field. All ClusterQueues that have a matching .spec.cohort are part of the same cohort. If the .spec.cohort field is empty, the ClusterQueue doesn’t belong to any cohort, and thus it cannot borrow quota from any other ClusterQueue.
Flavors and borrowing semantics
When a ClusterQueue is part of a cohort, Kueue satisfies the following admission semantics:
- When assigning flavors, Kueue goes through the list of flavors in the relevant ResourceGroup inside the ClusterQueue’s .spec.resourceGroups[*].flavors. For each flavor, Kueue attempts to fit a Workload’s pod set according to the quota defined in the ClusterQueue for the flavor and the unused quota in the cohort. If the Workload doesn’t fit, Kueue evaluates the next flavor in the list.
- A Workload’s pod set resource fits in a flavor defined for a ClusterQueue resource if the sum of requests for the resource:
  1. Is less than or equal to the unused nominalQuota for the flavor in the ClusterQueue; or
  2. Is less than or equal to the sum of unused nominalQuota for the flavor in the ClusterQueues in the cohort, and
  3. Is less than or equal to the unused nominalQuota + borrowingLimit for the flavor in the ClusterQueue.
  In Kueue, when (2) and (3) are satisfied, but not (1), this is called borrowing quota.
- A ClusterQueue can only borrow quota for flavors that the ClusterQueue defines.
- For each pod set resource in a Workload, a ClusterQueue can only borrow quota for one flavor.
Borrowing example
Assume you created the following two ClusterQueues:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 12
      - name: "memory"
        nominalQuota: 48Gi
ClusterQueue team-a-cq can admit Workloads depending on the following scenarios:
- If ClusterQueue team-b-cq has no admitted Workloads, then ClusterQueue team-a-cq can admit Workloads with resources adding up to 12+9=21 CPUs and 48+36=84Gi of memory.
- If ClusterQueue team-b-cq has pending Workloads and ClusterQueue team-a-cq has all its nominalQuota used, Kueue will admit Workloads in ClusterQueue team-b-cq before admitting any new Workloads in team-a-cq. Therefore, Kueue ensures the nominalQuota for team-b-cq is met.
BorrowingLimit
To limit the amount of resources that a ClusterQueue can borrow from others, you can set the .spec.resourceGroups[*].flavors[*].resources[*].borrowingLimit quantity field.
As an example, assume you created the following two ClusterQueues:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu"] # only cpu is given a quota below
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
        borrowingLimit: 1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-b-cq"
spec:
  namespaceSelector: {} # match all.
  cohort: "team-ab"
  resourceGroups:
  - coveredResources: ["cpu"] # only cpu is given a quota below
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 12
In this case, because we set borrowingLimit in ClusterQueue team-a-cq, if ClusterQueue team-b-cq has no admitted Workloads, then ClusterQueue team-a-cq can admit Workloads with resources adding up to 9+1=10 CPUs.
If, for a given flavor/resource, the borrowingLimit field is empty or null, a ClusterQueue can borrow up to the sum of nominal quotas from all the other ClusterQueues in the cohort. So for the YAML manifests listed above, team-b-cq can use up to 12+9=21 CPUs.
Preemption
When there is not enough quota left in a ClusterQueue or its cohort, an incoming Workload can trigger preemption of previously admitted Workloads, based on policies for the ClusterQueue.
A configuration for a ClusterQueue that enables preemption looks like the following:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  preemption:
    reclaimWithinCohort: Any
    withinClusterQueue: LowerPriority
The fields above have the following semantics:
- reclaimWithinCohort determines whether a pending Workload can preempt Workloads from other ClusterQueues in the cohort that are using more than their nominal quota. The possible values are:
  - Never (default): do not preempt Workloads in the cohort.
  - LowerPriority: if the pending Workload fits within the nominal quota of its ClusterQueue, only preempt Workloads in the cohort that have lower priority than the pending Workload.
  - Any: if the pending Workload fits within the nominal quota of its ClusterQueue, preempt any Workload in the cohort, irrespective of priority.
- withinClusterQueue determines whether a pending Workload that doesn’t fit within the nominal quota for its ClusterQueue can preempt active Workloads in the ClusterQueue. The possible values are:
  - Never (default): do not preempt Workloads in the ClusterQueue.
  - LowerPriority: only preempt Workloads in the ClusterQueue that have lower priority than the pending Workload.
Note that an incoming Workload can preempt Workloads both within the ClusterQueue and the cohort. Kueue implements heuristics to preempt as few Workloads as possible, preferring Workloads with these characteristics:
- Workloads belonging to ClusterQueues that are borrowing quota.
- Workloads with the lowest priority.
- Workloads that have been admitted more recently.
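To show how the settings on this page combine, here is a sketch of a single ClusterQueue that joins a cohort, caps its borrowing, and enables preemption. It reuses the team-a-cq and team-ab names and the quota values from the earlier examples; the particular combination of policy values is an assumption chosen for illustration, not a recommended configuration.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-cq"
spec:
  namespaceSelector: {}              # match all.
  cohort: "team-ab"                  # share unused quota with other team-ab queues
  queueingStrategy: BestEffortFIFO   # the default, written out for clarity
  preemption:
    reclaimWithinCohort: Any             # take back nominal quota lent to the cohort
    withinClusterQueue: LowerPriority    # preempt only lower-priority Workloads here
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
        borrowingLimit: 1            # at most 1 CPU borrowed from the cohort
```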
What’s next?
- Create local queues
- Create resource flavors if you haven’t already done so.
- Learn how to administer cluster quotas.