Run Plain Pods

Run a single Pod, or a group of Pods, as a Kueue-managed job.

This page shows how to use Kueue’s scheduling and resource management capabilities when running plain Pods. Kueue supports management of both individual Pods and Pod groups.

This guide is for batch users who have a basic understanding of Kueue. For more information, see Kueue’s overview.

Before you begin

  1. By default, the integration for v1/pod is not enabled. Learn how to install Kueue with a custom manager configuration and enable the pod integration.

    To allow Kubernetes system pods to be successfully scheduled, you must limit the scope of the pod integration. The recommended mechanism for doing this is the managedJobsNamespaceSelector.

    One approach is to enable management only for specific namespaces:

    apiVersion: config.kueue.x-k8s.io/v1beta1
    kind: Configuration
    managedJobsNamespaceSelector:
      matchLabels:
        kueue-managed: "true"
    integrations:
      frameworks:
      - "pod"
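
    With this selector, Kueue manages Pods only in namespaces that carry the matching label. As a minimal sketch, a namespace opted in to management could look like the following. The namespace name batch-jobs is a hypothetical placeholder; only the kueue-managed label comes from the configuration above.

    ```yaml
    # Hypothetical namespace opted in to Kueue management.
    apiVersion: v1
    kind: Namespace
    metadata:
      name: batch-jobs          # placeholder name, not from this guide
      labels:
        kueue-managed: "true"   # must match managedJobsNamespaceSelector.matchLabels
    ```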
    

    An alternative approach is to exempt system namespaces from management:

    apiVersion: config.kueue.x-k8s.io/v1beta1
    kind: Configuration
    managedJobsNamespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: [ kube-system, kueue-system ]
    integrations:
      frameworks:
      - "pod"
    

Feature state beta since Kueue v0.10

Prior to Kueue v0.10, the Configuration fields integrations.podOptions.namespaceSelector and integrations.podOptions.podSelector were used instead. Although podOptions is still supported in Kueue v0.10, it is expected to be deprecated in a future release.
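
For reference, a legacy configuration using podOptions might look roughly like the sketch below. It assumes that namespaceSelector accepts a standard Kubernetes label selector, and it mirrors the system-namespace exemption shown earlier. On Kueue v0.10 and later, prefer managedJobsNamespaceSelector.

```yaml
# Sketch of the pre-v0.10 style; the field layout is assumed from the
# field names integrations.podOptions.namespaceSelector mentioned above.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "pod"
  podOptions:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: [ kube-system, kueue-system ]
```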

  2. Kueue runs webhooks for all created pods if the pod integration is enabled. The webhook namespaceSelector can be used to filter the pods to reconcile. The default webhook namespaceSelector is:

    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: [ kube-system, kueue-system ]

    When you install Kueue via Helm, the webhook namespace selector will match the integrations.podOptions.namespaceSelector in the values.yaml.

    Make sure that namespaceSelector never matches the namespace where Kueue runs (kueue-system in the default selector above), otherwise the Kueue deployment won’t be able to create Pods.

  3. Pods that belong to other API resources managed by Kueue are excluded from being queued by the pod integration. For example, pods managed by batch/v1.Job won’t be managed by the pod integration.

  4. Check Administer cluster quotas for details on the initial Kueue setup.

Running a single Pod admitted by Kueue

When running Pods on Kueue, take into consideration the following aspects:

a. Queue selection

The target local queue should be specified in the metadata.labels section of the Pod configuration.

metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
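
The label value must name a LocalQueue that exists in the Pod's namespace. As a hedged sketch, a matching LocalQueue might be defined as follows. The default namespace and the ClusterQueue name cluster-queue are assumptions for illustration, so substitute the values from your own setup.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default             # assumed; must be the namespace the Pod is created in
  name: user-queue               # referenced by the kueue.x-k8s.io/queue-name label
spec:
  clusterQueue: cluster-queue    # assumed ClusterQueue name
```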

b. Configure the resource needs

Configure the resource needs of the workload in the resources field of each container under spec.containers.

    - resources:
        requests:
          cpu: 3

c. The “managed” label

Kueue will inject the kueue.x-k8s.io/managed=true label to indicate which pods are managed by it.
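
As an illustration, the labels of a Pod after Kueue has processed it would include both the queue name you set and the injected label. This fragment is a sketch of the relevant metadata only; any other fields that Kueue or Kubernetes may add are omitted.

```yaml
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # set by the user
    kueue.x-k8s.io/managed: "true"          # injected by Kueue
```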

d. Limitations

  • A Kueue-managed Pod cannot be created in the kube-system or kueue-system namespaces.
  • In case of preemption, the Pod will be terminated and deleted.

Example Pod

Here is a sample Pod that just sleeps for a few seconds:

apiVersion: v1
kind: Pod
metadata:
  generateName: kueue-sleep-
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  containers:
    - name: sleep
      image: busybox
      command:
        - sleep
      args:
        - 3s
      resources:
        requests:
          cpu: 3
  restartPolicy: OnFailure

You can create the Pod using the following command:

# Create the pod
kubectl create -f kueue-pod.yaml

Running a group of Pods to be admitted together

To run a set of Pods as a single unit, called a Pod group, add the kueue.x-k8s.io/pod-group-name label and the kueue.x-k8s.io/pod-group-total-count annotation to every member of the group, using the same values on each Pod:

metadata:
  labels:
    kueue.x-k8s.io/pod-group-name: "group-name"
  annotations:
    kueue.x-k8s.io/pod-group-total-count: "2"

Feature limitations

Kueue provides only the minimal functionality required to run Pod groups. It is intended for environments where the Pods are managed directly by external controllers, without a Job-level CRD.

As a consequence of this design decision, Kueue does not re-implement core functionalities that are available in the Kubernetes Job API, such as advanced retry policies. In particular, Kueue does not re-create failed Pods.

This design choice affects preemption. When Kueue needs to preempt a workload that represents a Pod group, Kueue sends delete requests for all of the Pods in the group. It is the responsibility of the user or controller that created the original Pods to create replacement Pods.

Termination

Kueue considers a Pod group successful, and marks the associated Workload as finished, when the number of succeeded Pods equals the Pod group size.

If a Pod group is not successful, you can terminate its execution and free the reserved resources in one of two ways:

  1. Issue a Delete request for the Workload object. Kueue will terminate all remaining Pods.
  2. Set the kueue.x-k8s.io/retriable-in-group: false annotation on at least one Pod in the group (can be a replacement Pod). Kueue will mark the workload as finished once all Pods are terminated.
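
For the second option, the annotation goes in the Pod's metadata. The sketch below shows a replacement Pod for a group named group-name with a total count of 2, matching the label and annotation fragment shown earlier. The generateName prefix and the container are hypothetical placeholders. Annotation values are strings, so false is quoted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: replacement-    # hypothetical name prefix
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/pod-group-name: "group-name"
  annotations:
    kueue.x-k8s.io/pod-group-total-count: "2"
    kueue.x-k8s.io/retriable-in-group: "false"   # marks the group as not retriable
spec:
  restartPolicy: Never
  containers:
  - name: sleep                 # placeholder container
    image: busybox
    command: ["sh", "-c", "sleep 3"]
    resources:
      requests:
        cpu: 3
```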

Example Pod group

Here is a sample Pod group that just sleeps for a few seconds:

---
apiVersion: v1
kind: Pod
metadata:
  generateName: sample-leader-
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/pod-group-name: "sample-group"
  annotations:
    kueue.x-k8s.io/pod-group-total-count: "2"
spec:
  restartPolicy: Never
  containers:
  - name: sleep
    image: busybox
    command: ["sh", "-c", 'echo "hello world from the leader pod" && sleep 3']
    resources:
      requests:
        cpu: 3
---
apiVersion: v1
kind: Pod
metadata:
  generateName: sample-worker-
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/pod-group-name: "sample-group"
  annotations:
    kueue.x-k8s.io/pod-group-total-count: "2"
spec:
  restartPolicy: Never
  containers:
  - name: sleep
    image: busybox
    command: ["sh", "-c", 'echo "hello world from the worker pod" && sleep 2']
    resources:
      requests:
        cpu: 3

You can create the Pod group using the following command:

kubectl create -f kueue-pod-group.yaml

The name of the associated Workload created by Kueue equals the name of the Pod group. In this example it is sample-group. You can inspect the Workload using:

kubectl describe workload/sample-group