Elastic Workloads (Workload Slices)
Elastic Workloads extend the core Workload abstraction in Kueue to support dynamic scaling of admitted jobs, without requiring suspension or requeueing. This is achieved through the use of Workload Slices, which track partial allocations of a parent job's scale-up and scale-down operations.
This feature enables more responsive and efficient scheduling for jobs that can adapt to changing cluster capacity, particularly in environments with fluctuating workloads or constrained resources.
Dynamic Scaling
Traditionally, a Workload in Kueue represents a single atomic unit of admission. Once admitted, it reflects a fixed set of pod replicas and consumes a defined amount of quota. If a job needs to scale up or down, the Workload must be suspended, removed, or replaced entirely.
While scaling down a workload is relatively straightforward and does not require additional capacity or a new workload slice, scaling up is more involved: it requires additional capacity that must be explicitly requested and admitted by Kueue through a new Workload Slice.
Note: While scaling down is conceptually similar to Dynamic Reclaim, it is an orthogonal concept: it neither intersects with nor depends on Dynamic Reclaim functionality.
Use Cases
- Dynamically adjusting throughput for embarrassingly parallel jobs.
- Using AI/ML frameworks that support elasticity (e.g., Torch Distributed Elastic).
Lifecycle
- Initial Admission: A job is submitted and its first Workload is created and admitted.
- Scaling Up: If the job requests more parallelism, a new slice is created with the increased replica count. Once admitted, the new slice replaces the original workload by marking the old one as Finished.
- Scaling Down: If the job reduces its parallelism, the updated pod count is recorded directly in the existing workload.
- Preemption: Follows the existing workload preemption mechanism.
- Completion: Follows the existing workload completion behavior.
Example
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-elastic-job
  namespace: default
  annotations:
    kueue.x-k8s.io/elastic-job: "true"
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  parallelism: 3
  completions: 100
  suspend: true
  template:
    spec:
      containers:
        - name: dummy-job
          image: registry.k8s.io/e2e-test-images/agnhost:2.53
          command: [ "/bin/sh" ]
          args: [ "-c", "sleep 60" ]
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
      restartPolicy: Never
The example above will result in an admitted workload and 3 running pods. The parallelism can be adjusted (increased or decreased) as long as the job remains in an “Active” state (i.e., not yet completed).
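For example, once the job above is admitted, its parallelism could be raised from 3 to 5 with a strategic merge patch like the sketch below (the patch file name is illustrative). Because the job carries the elastic-job annotation, Kueue requests the additional capacity through a new Workload Slice instead of suspending and requeueing the job:

# parallelism-patch.yaml (illustrative file name)
# Apply with, for example:
#   kubectl patch job sample-elastic-job -n default --patch-file parallelism-patch.yaml
spec:
  parallelism: 5   # increased from 3; the extra pods are admitted via a new Workload Slice

Scaling back down works the same way from the user's perspective: lowering spec.parallelism updates the existing workload in place, with no new slice required.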
Feature Gate
Elastic Workloads via Workload Slices are gated by the following feature flag:
ElasticJobsViaWorkloadSlices: true
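How the gate is enabled depends on how Kueue was installed. As a minimal sketch for a manifest-based installation, the flag is added to the kueue-controller-manager container arguments; the container name and the pre-existing --config argument shown below are assumptions based on a default installation and may differ in yours:

# Sketch: fragment of the kueue-controller-manager Deployment spec.
# Only the --feature-gates flag is specific to this feature; the container
# name and other args are assumed from a default installation.
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            - --config=/controller_manager_config.yaml
            - --feature-gates=ElasticJobsViaWorkloadSlices=true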
Additionally, Elastic Job behavior must be explicitly enabled on a per-job basis via annotation:
metadata:
  annotations:
    kueue.x-k8s.io/elastic-job: "true"
Limitations
- Currently available only for batch/v1.Job workloads.
- Elastic workloads are not supported for jobs with partial admission enabled. Attempting to scale a job with partial admission enabled results in an admission validation error similar to the following:
  Error from server (Forbidden): error when applying patch: error when patching "job.yaml": admission webhook "vjob.kb.io" denied the request: spec.parallelism: Forbidden: cannot change when partial admission is enabled and the job is not suspended
- When scaling up a previously admitted job, the new workload must reuse the originally assigned flavor, even if other eligible flavors have available capacity.
- No MultiKueue support.
- No Topology-Aware Scheduling (TAS) support.