Set Up Dynamic Resource Allocation

Configure Kueue to manage quota for workloads using Kubernetes Dynamic Resource Allocation (DRA).

This page shows you how to configure Kueue to account for DRA devices in quota management.

This page is intended for batch administrators.

For conceptual details, see Dynamic Resource Allocation concepts. For instructions on submitting workloads with DRA devices, see Run Workloads With DRA Devices.

Before you begin

Make sure the following conditions are met:

- A Kubernetes cluster is running with the DRA APIs (resource.k8s.io) enabled and a DRA driver installed, so that at least one DeviceClass exists.
- The kubectl command-line tool can communicate with your cluster.
- Kueue is installed.

Choose a quota accounting path

Kueue supports two paths for accounting DRA devices in quota. Choose the one that matches how your users submit workloads:

| Path | User's Pod spec | Kueue feature gate | Admin configuration |
|---|---|---|---|
| ResourceClaimTemplate | References a ResourceClaimTemplate | DynamicResourceAllocation | deviceClassMappings required |
| Extended resource | Uses resources.requests (e.g., nvidia.com/gpu: 1) | DynamicResourceAllocation + DRAExtendedResources | No mapping needed |
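In Pod spec terms, the two paths differ as follows (a minimal sketch; names such as gpu-template and example.com/gpu are illustrative):

```yaml
# ResourceClaimTemplate path: the Pod references a claim template explicitly.
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: gpu-template   # illustrative template name
  containers:
  - name: app
    resources:
      claims:
      - name: gpu
---
# Extended resource path: the Pod uses plain resource requests.
spec:
  containers:
  - name: app
    resources:
      requests:
        example.com/gpu: 1
      limits:
        example.com/gpu: 1
```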

Set up the ResourceClaimTemplate path

Feature state: alpha since Kueue v0.14

Use this path when your users submit workloads that explicitly reference ResourceClaimTemplate objects.

1. Enable the feature gate

Install or reconfigure Kueue with the DynamicResourceAllocation feature gate enabled. Follow the custom configuration installation instructions.

2. Configure deviceClassMappings

Add a deviceClassMappings entry to the Kueue Configuration that maps each DeviceClass to a logical resource name for quota:

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta2
kind: Configuration
featureGates:
  DynamicResourceAllocation: true
resources:
  deviceClassMappings:
  - name: example.com/gpu           # Logical resource name for quota
    deviceClassNames:
    - gpu.example.com               # DeviceClass name(s)
```

- name: the resource name used in ClusterQueue quotas and Workload status.
- deviceClassNames: one or more DeviceClass names that map to this resource.

Multiple device classes can map to the same logical resource name. For example, if you have separate device classes for different GPU models but want a single quota pool:

```yaml
resources:
  deviceClassMappings:
  - name: example.com/gpu
    deviceClassNames:
    - gpu-a100.example.com
    - gpu-h100.example.com
```

3. Add the DRA resource to your ClusterQueue

Include the logical resource name from deviceClassMappings in the coveredResources of your ClusterQueue:

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "example.com/gpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 40
      - name: "memory"
        nominalQuota: 200Gi
      - name: "example.com/gpu"
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
```

To apply the equivalent sample manifests directly:

```shell
kubectl apply -f https://kueue.sigs.k8s.io/examples/dra/sample-dra-queues.yaml
```

The example.com/gpu resource in the ClusterQueue corresponds to the name field in deviceClassMappings. Each device request that references a mapped DeviceClass consumes count units of this quota (1 per request when count is omitted).
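As a sketch of how count is charged, the following ResourceClaimTemplate would consume two units of example.com/gpu quota per Pod (names are illustrative; the exactly wrapper reflects the resource.k8s.io/v1 schema and differs in older API versions):

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gpu-template        # illustrative name
  namespace: default
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com   # mapped to example.com/gpu above
          count: 2                           # charged as 2 units of quota
```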

Set up the extended resource path

Feature state: alpha since Kueue v0.17

Use this path when your users submit workloads using the standard resources.requests syntax (e.g., nvidia.com/gpu: 1) and a DeviceClass with spec.extendedResourceName exists in the cluster.
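For example, a user's Job on this path might look like the following sketch (the queue name and image are illustrative placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # illustrative LocalQueue name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.k8s.io/pause:3.9    # placeholder image
        resources:
          requests:
            cpu: "1"
            example.com/gpu: 1
          limits:
            example.com/gpu: 1
```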

1. Enable the feature gates

Install or reconfigure Kueue with both feature gates enabled:

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta2
kind: Configuration
featureGates:
  DynamicResourceAllocation: true
  DRAExtendedResources: true
```

The Kubernetes cluster also needs the DRAExtendedResource feature gate enabled on kube-apiserver and kube-scheduler. This is alpha (disabled by default) in Kubernetes 1.34.

2. Verify the DeviceClass

Ensure the DeviceClass has spec.extendedResourceName set. This is typically configured by the DRA driver or cluster administrator:

```shell
kubectl get deviceclass gpu.example.com -o jsonpath='{.spec.extendedResourceName}'
```

If you need to create or update the DeviceClass:

```yaml
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  extendedResourceName: example.com/gpu
  selectors:
  - cel:
      expression: device.driver == "gpu.example.com"
```

No deviceClassMappings configuration is needed for this path. Kueue auto-discovers the mapping by indexing DeviceClass objects.

3. Add the extended resource to your ClusterQueue

The coveredResources must include the extended resource name that matches spec.extendedResourceName on the DeviceClass:

```yaml
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "example.com/gpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 40
      - name: "memory"
        nominalQuota: 200Gi
      - name: "example.com/gpu"
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
```

To apply the equivalent sample manifests directly:

```shell
kubectl apply -f https://kueue.sigs.k8s.io/examples/dra/sample-dra-queues.yaml
```

Why this path exists

Without the DRAExtendedResources feature gate, Kueue charges quota for both the resources.requests entry and the auto-created ResourceClaim, double counting the same device. With the feature gate enabled, Kueue detects the matching DeviceClass and charges quota only for the extended resource.

Path separation

The two paths are independent. Do not configure the same DeviceClass in both paths for the same workload. If overlap occurs, Kueue merges the resources using the deviceClassMappings logical name as the quota key, which may result in incorrect quota accounting.
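As a concrete illustration of the overlap to avoid, the following pair wires the same DeviceClass into both paths (a sketch; do not deploy it as-is):

```yaml
# Kueue Configuration fragment: ResourceClaimTemplate path mapping.
resources:
  deviceClassMappings:
  - name: example.com/gpu
    deviceClassNames:
    - gpu.example.com
---
# DeviceClass: extended resource path for the same class. Avoid combining
# this with the mapping above for the same workloads.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  extendedResourceName: example.com/gpu
```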

There is a timing gap between Kueue admitting a workload and the kube-scheduler allocating the actual device. If the cluster state changes between these two steps, the scheduler may fail to allocate. Enabling WaitForPodsReady provides a safety net by evicting workloads that fail to become ready within a configured timeout, allowing them to be re-queued and retried.
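A minimal waitForPodsReady fragment for the Kueue Configuration might look like this (the timeout value is illustrative; pick one that fits your device allocation latency):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta2
kind: Configuration
waitForPodsReady:
  enable: true
  timeout: 10m          # evict and requeue workloads whose pods are not ready in time
  blockAdmission: false # do not block other admissions while waiting
```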

MultiKueue considerations

DRA workloads are supported with MultiKueue. MultiKueue syncs the workload and its owning job to worker clusters, but it does not sync ResourceClaimTemplate and DeviceClass objects; you must create them on each worker cluster yourself.
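As a sketch, assuming kubectl contexts named worker1 and worker2 and a manifest file dra-objects.yaml containing your DeviceClass and ResourceClaimTemplate objects:

```shell
# MultiKueue does not sync DRA objects; create them on every worker cluster.
for ctx in worker1 worker2; do
  kubectl --context "$ctx" apply -f dra-objects.yaml
done
```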