Provisioning Admission Check Controller

An admission check controller that integrates Kueue with cluster-autoscaler.

The Provisioning Admission Check Controller is an Admission Check Controller designed to integrate Kueue with the Kubernetes cluster-autoscaler. Its primary function is to create ProvisioningRequests for the workloads holding a Quota Reservation and to keep the AdmissionCheckState in sync.

The controller is part of Kueue. You can enable it by setting the ProvisioningACC feature gate. Check the Installation guide for details on feature gate configuration.
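As a rough illustration of what enabling the gate can look like, the excerpt below shows the relevant part of the Kueue manager Deployment. The Deployment name, namespace, and container name assume a default installation and may differ in your cluster; the Installation guide remains the authoritative reference.

```yaml
# Excerpt only, not a complete manifest.
# Assumed names: kueue-controller-manager, kueue-system, manager.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kueue-controller-manager
  namespace: kueue-system
spec:
  template:
    spec:
      containers:
      - name: manager
        args:
        # Keep the arguments that are already present and add the gate.
        - --feature-gates=ProvisioningACC=true
```

Because args is a plain list, edit the existing Deployment rather than applying this excerpt as a patch, which could replace the other arguments.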

The Provisioning Admission Check Controller is supported on Kubernetes cluster-autoscaler versions 1.29 and later. However, some cloud providers may not have an implementation for it.

Parameters

This controller takes its parameters from a ProvisioningRequestConfig, for example:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: prov-test-config
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu

Where:

  • provisioningClassName - selects the mode used to provision the resources. Check autoscaling.x-k8s.io ProvisioningRequestSpec.provisioningClassName for details.
  • managedResources - contains the list of resources managed by the autoscaler.

Check the API definition for more details.
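To illustrate how provisioningClassName selects a different mode, here is a hedged variant of the config above. It assumes your cluster-autoscaler supports the check-capacity.autoscaling.x-k8s.io class; the object name and the managed resources are hypothetical and chosen only for the example.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: prov-check-capacity-config   # hypothetical name
spec:
  # Assumption: this provisioning class is available in your autoscaler.
  provisioningClassName: check-capacity.autoscaling.x-k8s.io
  managedResources:                  # hypothetical choice of resources
  - cpu
  - memory
```

An AdmissionCheck would point at this object through its parameters.name field, in the same way the Setup example references prov-test-config.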

Example

Setup

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 9
  admissionChecks:
  - sample-prov
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: prov-test-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: prov-test-config
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu

Job using a ProvisioningRequest

apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    spec:
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["120s"]
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
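
For orientation, the following is a rough sketch of the kind of ProvisioningRequest the controller might create for the Job above once its workload holds a Quota Reservation. You do not create this object yourself. The object names are hypothetical placeholders, and the API version is an assumption that may differ between cluster-autoscaler releases.

```yaml
# Illustrative sketch only: created by the controller, not by the user.
apiVersion: autoscaling.x-k8s.io/v1beta1   # assumed API version
kind: ProvisioningRequest
metadata:
  name: sample-job-example-sample-prov-1   # hypothetical name
  namespace: default
spec:
  # Taken from the ProvisioningRequestConfig named prov-test-config.
  provisioningClassName: queued-provisioning.gke.io
  podSets:
  - count: 3                               # matches the Job's parallelism
    podTemplateRef:
      name: sample-job-example-main        # hypothetical PodTemplate name
```

The controller then reflects the state of this request in the workload's AdmissionCheckState for sample-prov.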

Last modified March 15, 2024: fix website broken links (#1859) (82e1c07)