Run A Kubernetes Job
This page shows you how to run a Job in a Kubernetes cluster with Kueue enabled.
The intended audience for this page is batch users.
Before you begin
Make sure the following conditions are met:
- A Kubernetes cluster is running.
- The kubectl command-line tool can communicate with your cluster.
- Kueue is installed.
- The cluster has quotas configured.
The following picture shows all the concepts you will interact with in this tutorial:
0. Identify the queues available in your namespace
Run the following command to list the LocalQueues available in your namespace:
kubectl -n default get localqueues
# Or use the 'queues' alias.
kubectl -n default get queues
The output is similar to the following:
NAME         CLUSTERQUEUE    PENDING WORKLOADS
user-queue   cluster-queue   3
The ClusterQueue defines the quotas for the Queue.
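If you want the pending count in a script rather than a table, a minimal sketch follows. The helper name `pending_workloads` is hypothetical, and the sketch assumes the LocalQueue status exposes a `pendingWorkloads` field that backs the PENDING WORKLOADS column; check this against your Kueue version.

```shell
# Print the number of pending Workloads for one LocalQueue.
# $1 = namespace, $2 = LocalQueue name.
# Assumption: .status.pendingWorkloads exists on the LocalQueue object.
pending_workloads() {
  kubectl -n "$1" get localqueue "$2" \
    -o jsonpath='{.status.pendingWorkloads}'
}
```

For the queue shown above, you would call it as `pending_workloads default user-queue`.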
1. Define the Job
Running a Job in Kueue is similar to running a Job in a Kubernetes cluster without Kueue. However, you must consider the following differences:
- You should create the Job in a suspended state, as Kueue will decide when it’s the best time to start the Job.
- You have to set the Queue you want to submit the Job to. Use the kueue.x-k8s.io/queue-name label.
- You should include the resource requests for each Job Pod.
Here is a sample Job with three Pods that just sleep for a few seconds.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["30s"]
        resources:
          requests:
            cpu: 1
            memory: "200Mi"
      restartPolicy: Never
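Before submitting, it can help to confirm that a manifest covers the three differences listed above. The following is a rough sketch only: `check_kueue_job` is a hypothetical helper that greps for the expected text and does not parse YAML, so it can be fooled by comments or unusual formatting.

```shell
# Naive pre-submission check for a Job manifest. $1 = path to the manifest.
# Prints one line per missing item and returns non-zero if any is missing.
check_kueue_job() {
  manifest="$1"
  rc=0
  grep -Fq 'kueue.x-k8s.io/queue-name:' "$manifest" \
    || { echo "missing kueue.x-k8s.io/queue-name label"; rc=1; }
  grep -Eq '^[[:space:]]*suspend:[[:space:]]*true' "$manifest" \
    || { echo "missing suspend: true"; rc=1; }
  grep -Eq '^[[:space:]]*requests:' "$manifest" \
    || { echo "missing resource requests"; rc=1; }
  return "$rc"
}
```

Running `check_kueue_job sample-job.yaml` on the sample manifest should print nothing and succeed.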
2. Run the Job
You can run the Job with the following command:
kubectl create -f sample-job.yaml
Internally, Kueue will create a corresponding Workload for this Job with a matching name.
kubectl -n default get workloads
The output will be similar to the following:
NAME               QUEUE         RESERVED IN   ADMITTED   AGE
sample-job-sl4bm   user-queue                             1s
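Because the sample manifest uses generateName, the Job name is only known after creation. The sketch below captures it for later commands; `create_job` is a hypothetical helper that relies on `kubectl create -o name` printing a reference of the form `job.batch/<name>`.

```shell
# Create the Job from a manifest and print only its generated name.
# $1 = path to the manifest.
create_job() {
  # -o name prints a reference such as job.batch/sample-job-sl4bm
  ref="$(kubectl create -f "$1" -o name)" || return 1
  # Strip everything up to and including the slash.
  echo "${ref#*/}"
}
```

For example, `JOB_NAME="$(create_job sample-job.yaml)"` stores the name so you can pass it to the `describe` commands in the next step.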
3. (Optional) Monitor the status of the workload
You can see the Workload status with the following command:
kubectl -n default describe workload sample-job-sl4bm
If the ClusterQueue doesn’t have enough quota to run the Workload, the output will be similar to the following:
Name:         sample-job-sl4bm
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  kueue.x-k8s.io/v1beta1
Kind:         Workload
Metadata:
  ...
Spec:
  ...
Status:
  Conditions:
    Last Probe Time:       2022-03-28T19:43:03Z
    Last Transition Time:  2022-03-28T19:43:03Z
    Message:               workload didn't fit
    Reason:                Pending
    Status:                False
    Type:                  Admitted
Events:                    <none>
When the ClusterQueue has enough quota to run the Workload, it will admit the Workload. To see if the Workload was admitted, run the following command:
kubectl -n default get workloads
The output is similar to the following:
NAME               QUEUE         RESERVED IN     ADMITTED   AGE
sample-job-sl4bm   user-queue    cluster-queue   True       1s
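Instead of polling `get workloads` by hand, you can block until the Workload is admitted. This is a sketch under the assumption that the Workload reports an `Admitted` condition, as in the describe output above; `wait_for_admission` is a hypothetical wrapper around `kubectl wait`, and the default timeout of 120s is arbitrary.

```shell
# Block until the Workload's Admitted condition is True, or time out.
# $1 = namespace, $2 = Workload name, $3 = optional timeout (default 120s).
wait_for_admission() {
  kubectl -n "$1" wait --for=condition=Admitted "workload/$2" \
    --timeout="${3:-120s}"
}
```

For example, `wait_for_admission default sample-job-sl4bm 5m` returns a non-zero status if the Workload is still pending after five minutes.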
To view the event for the Workload admission, run the following command:
kubectl -n default describe workload sample-job-sl4bm
The output is similar to the following:
...
Events:
  Type    Reason    Age   From           Message
  ----    ------    ----  ----           -------
  Normal  Admitted  50s   kueue-manager  Admitted by ClusterQueue cluster-queue
To continue monitoring the Workload progress, you can run the following command:
kubectl -n default describe workload sample-job-sl4bm
Once the Workload has finished running, the output is similar to the following:
...
Status:
  Conditions:
    ...
    Last Probe Time:       2022-03-28T19:43:37Z
    Last Transition Time:  2022-03-28T19:43:37Z
    Message:               Job finished successfully
    Reason:                JobFinished
    Status:                True
    Type:                  Finished
...
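To check for completion in a script, you can read the `Finished` condition directly instead of scanning the describe output. The helper below is a hypothetical sketch; it assumes the condition type and status values shown in the output above and uses a kubectl JSONPath filter to select them.

```shell
# Succeed (exit status 0) only if the Workload's Finished condition is True.
# $1 = namespace, $2 = Workload name.
workload_finished() {
  state="$(kubectl -n "$1" get workload "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Finished")].status}')"
  [ "$state" = "True" ]
}
```

For example, `workload_finished default sample-job-sl4bm && echo done` prints `done` once the Workload has finished.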
To review more details about the Job status, run the following command:
kubectl -n default describe job sample-job-sl4bm
The output is similar to the following:
Name:             sample-job-sl4bm
Namespace:        default
...
Start Time:       Mon, 28 Mar 2022 15:45:17 -0400
Completed At:     Mon, 28 Mar 2022 15:45:49 -0400
Duration:         32s
Pods Statuses:    0 Active / 3 Succeeded / 0 Failed
Pod Template:
  ...
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  Suspended         22m   job-controller        Job suspended
  Normal  CreatedWorkload   22m   kueue-job-controller  Created Workload: default/sample-job-sl4bm
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-7bqld
  Normal  Started           19m   kueue-job-controller  Admitted by clusterQueue cluster-queue
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-7jw4z
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-m7wgm
  Normal  Resumed           19m   job-controller        Job resumed
  Normal  Completed         18m   job-controller        Job completed
Since event timestamps have a resolution of seconds, the events might be listed in a slightly different order from the one in which they actually occurred.
Partial admission
Kueue lets a batch user create Jobs that ideally run with a parallelism P0 but can accept a smaller parallelism, Pn, if the Job does not fit within the available quota.
Kueue only attempts to decrease the parallelism after both borrowing and preemption have been considered in the admission process and neither is feasible.
To allow partial admission, provide the minimum acceptable parallelism, Pmin, in the kueue.x-k8s.io/job-min-parallelism annotation of the Job. Pmin should be greater than 0 and less than P0. When a Job is partially admitted, its parallelism is set to Pn, the largest value between Pmin and P0 that can be admitted. The Job’s completions count is not changed.
For example, a Job defined by the following manifest:
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job-partial-admission
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
  annotations:
    kueue.x-k8s.io/job-min-parallelism: "5"
spec:
  parallelism: 20
  completions: 20
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["30s"]
        resources:
          requests:
            cpu: 1
            memory: "200Mi"
      restartPolicy: Never
When queued in a ClusterQueue with only 9 CPUs available, it will be admitted with parallelism=9. Note that the number of completions doesn’t change.
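The arithmetic behind this example can be sketched as a small function. This is a simplified model for illustration, not Kueue's actual admission algorithm: the hypothetical `admitted_parallelism` helper takes the number of Pods that fit in the available quota as a single input and ignores borrowing, preemption, and any resource other than the one you counted.

```shell
# Print the admitted parallelism Pn under a simplified model.
# $1 = P0 (requested parallelism), $2 = Pmin (minimum acceptable),
# $3 = number of Pods that fit in the available quota.
# Returns 1 without printing anything if even Pmin does not fit.
admitted_parallelism() {
  p0="$1"; pmin="$2"; fits="$3"
  if [ "$fits" -ge "$p0" ]; then
    echo "$p0"      # Full admission: no reduction needed.
  elif [ "$fits" -ge "$pmin" ]; then
    echo "$fits"    # Partial admission: largest value that fits.
  else
    return 1        # Not admitted: the Job stays pending.
  fi
}
```

With the manifest above (P0=20, Pmin=5, 1 CPU per Pod) and 9 CPUs available, `admitted_parallelism 20 5 9` prints `9`, matching the result described in the text.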