Why Kueue?

Kueue is a Kubernetes-native system that manages quotas and how jobs consume them. Kueue decides when a job should wait, when a job should be admitted to start (that is, when its pods can be created), and when a job should be preempted (that is, when its active pods should be deleted).

Why use Kueue

You can install Kueue on top of a vanilla Kubernetes cluster. Kueue does not replace any existing Kubernetes components. Kueue is compatible with cloud environments where:

  • Compute resources are elastic and can be scaled up and down.
  • Compute resources are heterogeneous (in architecture, availability, price, etc.).

Kueue APIs allow you to express:

  • Quotas and policies for fair sharing among tenants.
  • Resource fungibility: if a resource flavor is fully utilized, Kueue can admit the job using a different flavor.
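
The fungibility idea above can be illustrated with a toy model. This is a minimal sketch, not Kueue's implementation: the `Flavor` class, the `pick_flavor` function, and the flavor names "on-demand" and "spot" are hypothetical, and the model assumes flavors are tried in their listed order and ignores borrowing and preemption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flavor:
    """Hypothetical stand-in for a resource flavor with a CPU quota."""
    name: str
    quota_cpu: int      # total CPU quota for this flavor
    used_cpu: int = 0   # quota already consumed by admitted jobs

    def free_cpu(self) -> int:
        return self.quota_cpu - self.used_cpu


def pick_flavor(flavors: list[Flavor], request_cpu: int) -> Optional[Flavor]:
    """Return the first flavor, in listed order, that can fit the request."""
    for flavor in flavors:
        if flavor.free_cpu() >= request_cpu:
            flavor.used_cpu += request_cpu  # the job now consumes this quota
            return flavor
    return None  # no flavor fits, so the job keeps waiting


flavors = [Flavor("on-demand", quota_cpu=8), Flavor("spot", quota_cpu=16)]
first = pick_flavor(flavors, 6)    # fits in "on-demand"
second = pick_flavor(flavors, 6)   # "on-demand" has 2 CPUs left, so "spot" is used
third = pick_flavor(flavors, 12)   # "spot" has 10 CPUs left, so nothing fits
print(first.name, second.name, third)  # on-demand spot None
```

The second job is admitted with a different flavor once the first one is fully utilized, which is the behavior the bullet describes. The third job stays queued because no flavor has enough free quota.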

A core design principle for Kueue is to avoid duplicating mature functionality in Kubernetes components and well-established third-party controllers. Autoscaling, pod-to-node scheduling, and job lifecycle management are the responsibility of cluster-autoscaler, kube-scheduler, and kube-controller-manager, respectively. Advanced admission control can be delegated to controllers such as Gatekeeper.

Features overview

  • Job management: Supports priority-based job queueing with two strategies: StrictFIFO and BestEffortFIFO.
  • Resource management: Supports fair sharing of resources and preemption between tenants, with a variety of policies.
  • Dynamic resource reclaim: A mechanism to release quota as the pods of a Job complete.
  • Resource flavor fungibility: Quota borrowing or preemption in ClusterQueue and Cohort.
  • Integrations: Built-in support for popular jobs, e.g. BatchJob, Kubeflow training jobs, RayJob, RayCluster, JobSet, plain Pod.
  • System insight: Built-in Prometheus metrics, as well as Conditions, to help monitor the state of the system.
  • AdmissionChecks: A mechanism for internal or external components to influence whether a workload can be admitted.
  • Advanced autoscaling support: Integration with cluster-autoscaler’s ProvisioningRequest via AdmissionChecks.
  • Sequential admission: A simple implementation of all-or-nothing scheduling.
  • Partial admission: Allows jobs to run with a smaller parallelism, based on available quota, if the application supports it.
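
To make the difference between the two queueing strategies concrete, here is a simplified model of a single admission pass. It is a sketch under stated assumptions, not Kueue's code: the `Job` class and `admit` function are hypothetical, quota is reduced to a single CPU count, and the model assumes that under StrictFIFO a job that does not fit blocks the jobs behind it, while under BestEffortFIFO it is skipped.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical queued job; not a Kueue API type."""
    name: str
    priority: int   # higher value is served first
    created: int    # smaller value means created earlier
    cpu: int        # requested CPUs


def admit(jobs: list[Job], free_cpu: int, strategy: str) -> list[str]:
    """Return the names of jobs admitted in one pass over the queue."""
    # Order by priority (descending), then by creation time (ascending).
    queue = sorted(jobs, key=lambda j: (-j.priority, j.created))
    admitted = []
    for job in queue:
        if job.cpu <= free_cpu:
            free_cpu -= job.cpu
            admitted.append(job.name)
        elif strategy == "StrictFIFO":
            break  # the job that does not fit blocks everything behind it
        # BestEffortFIFO: skip the job that does not fit and keep going
    return admitted


jobs = [
    Job("small-a", priority=0, created=1, cpu=2),
    Job("large", priority=0, created=2, cpu=8),
    Job("small-b", priority=0, created=3, cpu=2),
]
strict = admit(jobs, free_cpu=4, strategy="StrictFIFO")
best_effort = admit(jobs, free_cpu=4, strategy="BestEffortFIFO")
print(strict)       # ['small-a']
print(best_effort)  # ['small-a', 'small-b']
```

With 4 free CPUs, the strict pass stops at "large", while the best-effort pass lets "small-b" use the remaining quota. The trade-off in this model is ordering guarantees versus utilization.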

High-level Kueue operation

