Run A Job

This page shows you how to run a Job in a Kubernetes cluster with Kueue enabled.

The intended audience for this page is batch users.

Before you begin

Make sure the following conditions are met:

The following picture shows all the concepts you will interact with in this tutorial:

Figure: Kueue components

0. Identify the queues available in your namespace

Run the following command to list the LocalQueues available in your namespace.

kubectl -n default get localqueues
# Or use the 'queues' alias.
kubectl -n default get queues

The output is similar to the following:

NAME         CLUSTERQUEUE    PENDING WORKLOADS
user-queue   cluster-queue   3

The ClusterQueue defines the quotas for the LocalQueue.
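If several LocalQueues are listed, the PENDING WORKLOADS column gives a rough idea of how busy each one is. The following shell sketch reads the table printed by the command above and prints the queue with the fewest pending Workloads. The function name least_pending_queue is hypothetical, and the parsing assumes the three-column layout shown in the sample output.

```shell
# Hypothetical helper: print the LocalQueue with the fewest pending Workloads.
# Reads the table printed by 'kubectl get localqueues' on standard input and
# assumes the columns NAME, CLUSTERQUEUE, PENDING WORKLOADS shown above.
least_pending_queue() {
  awk 'NR > 1 && NF >= 3 {
         # Keep the first row seen with the smallest pending count.
         if (best == "" || $3 + 0 < min) { min = $3 + 0; best = $1 }
       }
       END { if (best != "") print best }'
}

# Usage against a live cluster (not run here):
#   kubectl -n default get localqueues | least_pending_queue
```

A short pending list does not guarantee quick admission, because admission depends on the quota of the backing ClusterQueue.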

1. Define the Job

Running a Job in Kueue is similar to running a Job in a Kubernetes cluster without Kueue. However, you must consider the following differences:

  • Create the Job in a suspended state, because Kueue decides when to start it.
  • Set the queue you want to submit the Job to with the kueue.x-k8s.io/queue-name label.
  • Include resource requests for each Pod of the Job.

Here is a sample Job with three Pods that each sleep for 30 seconds. This sample is also available on GitHub at github.com/kubernetes-sigs/kueue/blob/main/config/samples/sample-job.yaml.

# sample-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["30s"]
        resources:
          requests:
            cpu: 1
            memory: "200Mi"
      restartPolicy: Never
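
The three differences listed in this step can be checked before submitting a manifest. The following shell sketch is one way to do that, assuming the file holds a single Job. The function name check_kueue_job is hypothetical, and the check uses plain text matching rather than a YAML parser, so treat it as a rough sanity check only.

```shell
# Hypothetical pre-submission check for a single-Job manifest.
# Prints one line per missing item and returns non-zero if any is missing.
check_kueue_job() {
  manifest="$1"
  status=0
  # The Job must be created suspended so Kueue can decide when to start it.
  grep -Eq '^[[:space:]]*suspend:[[:space:]]*true[[:space:]]*$' "$manifest" ||
    { echo "missing 'suspend: true'"; status=1; }
  # The queue is selected with the kueue.x-k8s.io/queue-name label.
  grep -Eq '^[[:space:]]*kueue\.x-k8s\.io/queue-name:[[:space:]]*[^[:space:]]+' "$manifest" ||
    { echo "missing kueue.x-k8s.io/queue-name label"; status=1; }
  # Each Pod should declare resource requests.
  grep -Eq '^[[:space:]]*requests:[[:space:]]*$' "$manifest" ||
    { echo "missing resource requests"; status=1; }
  return "$status"
}

# Usage (the kubectl call is not run here):
#   check_kueue_job sample-job.yaml && kubectl create -f sample-job.yaml
```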

2. Run the Job

You can run the Job with the following command:

kubectl create -f sample-job.yaml

Internally, Kueue creates a corresponding Workload for this Job with a matching name. To list the Workloads in the namespace, run the following command:

kubectl -n default get workloads

The output will be similar to the following:

NAME               QUEUE         ADMITTED BY     AGE
sample-job-sl4bm   user-queue                    1s
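
Because the sample uses generateName, the Job name is only known after creation. One way to capture it in a script is to ask kubectl to print the created resource with -o name, which yields a reference such as job.batch/sample-job-sl4bm, and then strip the prefix. The sketch below assumes, as described above, that the Workload name matches the Job name. The helper name_from_ref is hypothetical.

```shell
# Hypothetical helper: strip the "kind.group/" prefix from a resource
# reference such as "job.batch/sample-job-sl4bm".
name_from_ref() {
  printf '%s\n' "${1#*/}"
}

# Usage against a live cluster (not run here):
#   ref=$(kubectl -n default create -f sample-job.yaml -o name)
#   kubectl -n default describe workload "$(name_from_ref "$ref")"
```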

3. (Optional) Monitor the status of the workload

You can see the Workload status with the following command:

kubectl -n default describe workload sample-job-sl4bm

If the ClusterQueue doesn’t have enough quota to run the Workload, the output will be similar to the following:

Name:         sample-job-sl4bm
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  kueue.x-k8s.io/v1beta1
Kind:         Workload
Metadata:
  ...
Spec:
  ...
Status:
  Conditions:
    Last Probe Time:       2022-03-28T19:43:03Z
    Last Transition Time:  2022-03-28T19:43:03Z
    Message:               workload didn't fit
    Reason:                Pending
    Status:                False
    Type:                  Admitted
Events:               <none>

When the ClusterQueue has enough quota to run the Workload, it will admit the Workload. To see if the Workload was admitted, run the following command:

kubectl -n default get workloads

The output is similar to the following:

NAME               QUEUE         ADMITTED BY     AGE
sample-job-sl4bm   user-queue    cluster-queue   45s
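
In a script, the same table can be used to wait for admission. The following shell sketch treats a Workload as admitted when its row has a value in the ADMITTED BY column. The function name is_admitted is hypothetical, and the parsing assumes the four-column layout shown above, in which a pending row has only three whitespace-separated fields.

```shell
# Hypothetical helper: succeed if the named Workload has a value in the
# ADMITTED BY column. Reads 'kubectl get workloads' output on stdin.
is_admitted() {
  awk -v name="$1" '
    # A pending row has NAME, QUEUE, AGE; an admitted row has four fields.
    NR > 1 && $1 == name { found = (NF >= 4) }
    END { exit (found ? 0 : 1) }
  '
}

# Usage against a live cluster (not run here):
#   until kubectl -n default get workloads | is_admitted sample-job-sl4bm; do
#     sleep 5
#   done
```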

To view the event for the Workload admission, run the following command:

kubectl -n default describe workload sample-job-sl4bm

The output is similar to the following:

...
Events:
  Type    Reason    Age   From           Message
  ----    ------    ----  ----           -------
  Normal  Admitted  50s   kueue-manager  Admitted by ClusterQueue cluster-queue

To continue monitoring the Workload progress, you can run the following command:

kubectl -n default describe workload sample-job-sl4bm

Once the Workload has finished running, the output is similar to the following:

...
Status:
  Conditions:
    ...
    Last Probe Time:       2022-03-28T19:43:37Z
    Last Transition Time:  2022-03-28T19:43:37Z
    Message:               Job finished successfully
    Reason:                JobFinished
    Status:                True
    Type:                  Finished
...
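
To read a single condition without scanning the whole description, the output can be filtered. The following shell sketch prints the Status of a given condition type. The function name condition_status is hypothetical, and the parsing assumes that each condition lists its Status line before its Type line, as in the sample output above.

```shell
# Hypothetical helper: print the Status of a given condition type from
# 'kubectl describe workload' output read on stdin.
condition_status() {
  awk -v type="$1" '
    # Remember the most recent condition Status; the top-level "Status:"
    # header has no value and is skipped by the field-count check.
    $1 == "Status:" && NF == 2 { status = $2 }
    $1 == "Type:" && $2 == type { print status; exit }
  '
}

# Usage against a live cluster (not run here):
#   kubectl -n default describe workload sample-job-sl4bm | condition_status Finished
```

The function prints nothing if the condition type is not present, which a script can treat as "not finished yet".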

To review more details about the Job status, run the following command:

kubectl -n default describe job sample-job-sl4bm

The output is similar to the following:

Name:             sample-job-sl4bm
Namespace:        default
...
Start Time:       Mon, 28 Mar 2022 15:45:17 -0400
Completed At:     Mon, 28 Mar 2022 15:45:49 -0400
Duration:         32s
Pods Statuses:    0 Active / 3 Succeeded / 0 Failed
Pod Template:
  ...
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  Suspended         22m   job-controller        Job suspended
  Normal  CreatedWorkload   22m   kueue-job-controller  Created Workload: default/sample-job-sl4bm
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-7bqld
  Normal  Started           19m   kueue-job-controller  Admitted by clusterQueue cluster-queue
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-7jw4z
  Normal  SuccessfulCreate  19m   job-controller        Created pod: sample-job-sl4bm-m7wgm
  Normal  Resumed           19m   job-controller        Job resumed
  Normal  Completed         18m   job-controller        Job completed

Because event timestamps have a resolution of one second, events might be listed in a slightly different order from the one in which they actually occurred.

Last modified March 23, 2023: add diagrams of kueue (#643) (1027290)