Overview

Why Kueue?

Kueue is a Kubernetes-native system that manages quotas and how jobs consume them. Kueue decides when a job should wait, when a job should be admitted to start (that is, when its pods can be created), and when a job should be preempted (that is, when its active pods should be deleted).
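The three outcomes described above (wait, admit, preempt) can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not Kueue's actual algorithm: it tracks a single CPU quota, and the `Job` class, the `decide` function, and the rule "evict strictly lower-priority jobs, lowest priority first" are all hypothetical names and policies chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu: int       # requested CPU units (hypothetical single resource)
    priority: int  # higher value means more important


def decide(pending, admitted, quota):
    """Return ("admit", []), ("preempt", victims) or ("wait", [])."""
    used = sum(job.cpu for job in admitted)
    if used + pending.cpu <= quota:
        return "admit", []
    # Assumed policy: only strictly lower-priority jobs may be evicted,
    # starting with the lowest priority.
    victims, freed = [], 0
    for job in sorted(admitted, key=lambda j: j.priority):
        if job.priority >= pending.priority:
            break
        victims.append(job)
        freed += job.cpu
        if used - freed + pending.cpu <= quota:
            return "preempt", victims
    # Not enough quota even after evicting every eligible job.
    return "wait", []
```

In this model, a job that fits is admitted immediately, a job that fits only after evicting less important work triggers preemption, and everything else waits for quota to be released.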

Why use Kueue

You can install Kueue on top of a vanilla Kubernetes cluster. Kueue does not replace any existing Kubernetes components. Kueue is compatible with cloud environments where:

Compute resources are elastic and can be scaled up and down.
Compute resources are heterogeneous (in architecture, availability, price, etc.).

Kueue APIs allow you to express:

Quotas and policies for Fair Sharing among tenants.
Resource fungibility: if a resource flavor is fully utilized, Kueue can admit the job using a different flavor.
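Resource fungibility can be pictured as trying flavors in order until one has room. The sketch below is an illustration only, not Kueue's implementation: the `pick_flavor` function, the tuple layout, and the flavor names `on-demand` and `spot` are assumptions made for the example.

```python
def pick_flavor(request, flavors):
    """Return the name of the first flavor with enough free quota, else None.

    flavors is an ordered list of (name, quota, used) tuples; the order
    expresses preference, so earlier flavors are tried first.
    """
    for name, quota, used in flavors:
        if used + request <= quota:
            return name
    return None


# Hypothetical flavors: the preferred one is nearly full.
flavors = [("on-demand", 8, 6), ("spot", 16, 2)]
small = pick_flavor(2, flavors)   # fits in the preferred flavor
large = pick_flavor(4, flavors)   # falls back to the next flavor
```

Here the smaller request lands on the preferred flavor, while the larger one is admitted using the second flavor because the first is fully utilized for its size.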

A core design principle for Kueue is to avoid duplicating mature functionality in Kubernetes components and well-established third-party controllers. Autoscaling, pod-to-node scheduling and job lifecycle management are the responsibility of cluster-autoscaler, kube-scheduler and kube-controller-manager, respectively. Advanced admission control can be delegated to controllers such as gatekeeper.

Features overview

Job management: Supports priority-based job queueing with different strategies: StrictFIFO and BestEffortFIFO.
Advanced resource management: Includes resource flavor fungibility, Fair Sharing, Cohorts, and preemption with a variety of policies between different tenants.
Integrations: Built-in support for popular job types, e.g. BatchJob, Kubeflow training jobs, RayJob, RayCluster, JobSet, AppWrappers, plain Pod and Pod Groups.
System insight: Built-in Prometheus metrics to help monitor the state of the system, and an on-demand visibility endpoint for monitoring pending workloads.
AdmissionChecks: A mechanism for internal or external components to influence whether a workload can be admitted.
Advanced autoscaling support: Integration with cluster-autoscaler’s ProvisioningRequest via AdmissionChecks.
All-or-nothing with ready Pods: A timeout-based implementation of All-or-nothing scheduling.
Partial admission and dynamic reclaim: Mechanisms to run a job with reduced parallelism, based on available quota, and to release quota as the job's pods complete.
Mixing training and inference: Simultaneous management of batch workloads along with serving workloads (such as Deployments or StatefulSets).
Multi-cluster job dispatching: Called MultiKueue, it lets you search for capacity and offload the main cluster.
Topology-Aware Scheduling: Optimizes pod-to-pod communication throughput by making scheduling aware of the data-center topology.
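The difference between the two queueing strategies named in the list above can be sketched in a few lines. This is a simplified model under stated assumptions, not Kueue's code: workloads are ordered by priority and then by creation time, a single CPU quota stands in for all resources, and the `admit_pass` function and tuple layout are hypothetical.

```python
def admit_pass(queue, free, strategy):
    """Return the names admitted in one pass over the queue.

    queue holds (name, priority, created, cpu) tuples. Workloads are
    ordered by descending priority, then by ascending creation time.
    """
    ordered = sorted(queue, key=lambda w: (-w[1], w[2]))
    admitted = []
    for name, _priority, _created, cpu in ordered:
        if cpu <= free:
            free -= cpu
            admitted.append(name)
        elif strategy == "StrictFIFO":
            # The workload at the head blocks everything behind it.
            break
        # BestEffortFIFO: skip the workload that does not fit and keep going.
    return admitted


queue = [("big", 0, 1, 8), ("small", 0, 2, 2)]
strict = admit_pass(queue, free=4, strategy="StrictFIFO")
best_effort = admit_pass(queue, free=4, strategy="BestEffortFIFO")
```

With 4 free units, the older `big` workload does not fit. Under StrictFIFO it blocks `small`; under BestEffortFIFO the newer `small` workload is admitted anyway.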

Job-integrated features

Feature	Batch Job	JobSet	PaddleJob	PytorchJob	TFJob	XGBoostJob	MPIJob	JAXJob	Pod	RayCluster	RayJob	AppWrapper	Deployment	StatefulSet	LeaderWorkerSet
Dynamic Reclaim	+	+							+
MultiKueue	+	+	+	+	+	+	+	+		+	+	+
MultiKueueBatchJobWithManagedBy	+
PartialAdmission	+
Workload Priority Class	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
FlavorFungibility	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
ProvisioningACC	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
QueueVisibility	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
VisibilityOnDemand	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
PrioritySortingWithinCohort	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
LendingLimit	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
All-or-nothing with ready Pods	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
Fair Sharing	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+
Topology Aware Scheduling	+	+	+	+	+	+	+	+	+	+	+	+	+	+	+

High-level Kueue operation

[Figure: High-level Kueue operation]

To learn more about Kueue concepts, see the concepts section.

To learn about different Kueue personas and what you can do with Kueue, see the tasks section.


Last modified May 15, 2025: Support JAX in Kueue using training-operator 1.9 (#4613) (f9cb97fd)