Installation

Installing Kueue to a Kubernetes Cluster

Before you begin

Make sure the following conditions are met:

  • A Kubernetes cluster with version 1.21 or newer is running. Learn how to install the Kubernetes tools.
  • The SuspendJob feature gate is enabled. In Kubernetes 1.22 or newer, the feature gate is enabled by default.
  • (Optional) The JobMutableNodeSchedulingDirectives feature gate (available in Kubernetes 1.22 or newer) is enabled. In Kubernetes 1.23 or newer, the feature gate is enabled by default.
  • The kubectl command-line tool can communicate with your cluster.
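
The version requirement above can be checked with a small shell helper. The sketch below is illustrative only: k8s_version_at_least is a hypothetical function name, not part of kubectl or Kueue, and it compares only the major and minor components of a version string such as v1.22.3.

```shell
# Hypothetical helper: succeed (exit status 0) when a Kubernetes version
# string such as "v1.22.3" is at least the given major and minor version.
k8s_version_at_least() {
  version=${1#v}            # strip the leading "v", if any
  want_major=$2
  want_minor=$3
  major=${version%%.*}      # text before the first dot
  rest=${version#*.}        # text after the first dot
  minor=${rest%%.*}         # text before the next dot
  minor=${minor%%[!0-9]*}   # drop non-numeric suffixes such as "+"
  [ "$major" -gt "$want_major" ] && return 0
  [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Example: k8s_version_at_least v1.22.3 1 21 && echo "cluster is new enough"
```

The server's version string is reported as serverVersion.gitVersion in the output of kubectl version -o json.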

Kueue publishes metrics to monitor its operators. You can scrape these metrics with Prometheus. Use kube-prometheus if you don’t have your own monitoring system.

The webhook server in Kueue uses internal certificate management to provision certificates. To use a third-party tool instead, such as cert-manager, follow these steps:

  1. Set internalCertManagement.enable to false in the Kueue configuration file.
  2. Comment out the internalcert folder in config/default/kustomization.yaml.
  3. Enable cert-manager in config/default/kustomization.yaml and uncomment all sections with ‘CERTMANAGER’.
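
Step 1 can be scripted rather than edited by hand. The following is a rough sketch, assuming the configuration file contains an internalCertManagement block with an enable: true line directly beneath it, as in the sample configuration on this page. disable_internal_cert_management is a hypothetical helper name, and the file path is whatever configuration file you are editing.

```shell
# Hypothetical helper: print FILE with internalCertManagement.enable set to
# false. Only the first "enable:" line following "internalCertManagement:"
# is changed; other "enable:" entries (e.g. waitForPodsReady) are untouched.
disable_internal_cert_management() {
  sed '/internalCertManagement:/,/enable:/ s/enable: true/enable: false/' "$1"
}

# Example (writes the result to a new file instead of editing in place):
#   disable_internal_cert_management config.yaml > config.new.yaml
```

Review the output before replacing the original file, since the substitution relies on the layout described above.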

Install a released version

To install a released version of Kueue in your cluster, run the following command:

VERSION=v0.6.2
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
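
After applying the manifests, you may want to wait until the controller is ready before creating Kueue resources. A minimal sketch, assuming the Deployment is named kueue-controller-manager and runs in the kueue-system namespace (the names used elsewhere on this page); wait_for_kueue is a hypothetical helper name:

```shell
# Hypothetical helper: block until the Kueue controller Deployment reports
# the Available condition, or until the timeout (default 5m) expires.
wait_for_kueue() {
  timeout=${1:-5m}
  kubectl wait deployment/kueue-controller-manager \
    --namespace kueue-system \
    --for=condition=Available \
    --timeout="$timeout"
}

# Example: wait_for_kueue 2m
```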

Add metrics scraping for prometheus-operator

Available in Kueue v0.2.1 and later

To allow prometheus-operator to scrape metrics from Kueue components, run the following command:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml

Add visibility API to monitor pending workloads

Available in Kueue v0.6.0 and later

To add the visibility API, which enables monitoring of pending workloads, change the feature gates configuration to set VisibilityOnDemand=true, then run the following command:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/visibility-api.yaml

See the visibility API for more details.

Uninstall

To uninstall a released version of Kueue from your cluster, run the following command:

VERSION=v0.6.2
kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml

Install a custom-configured released version

To install a custom-configured released version of Kueue in your cluster, execute the following steps:

  1. Download the release’s manifests.yaml file:

    VERSION=v0.6.2
    wget https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
    
  2. With an editor of your preference, open manifests.yaml.

  3. In the kueue-manager-config ConfigMap manifest, edit the controller_manager_config.yaml data entry. The entry represents the default Kueue Configuration struct (v1beta1@main). The contents of the ConfigMap are similar to the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kueue-manager-config
  namespace: kueue-system
data:
  controller_manager_config.yaml: |
    apiVersion: config.kueue.x-k8s.io/v1beta1
    kind: Configuration
    namespace: kueue-system
    health:
      healthProbeBindAddress: :8081
    metrics:
      bindAddress: :8080
      # enableClusterQueueResources: true
    webhook:
      port: 9443
    manageJobsWithoutQueueName: true
    internalCertManagement:
      enable: true
      webhookServiceName: kueue-webhook-service
      webhookSecretName: kueue-webhook-server-cert
    waitForPodsReady:
      enable: true
      timeout: 10m
    # pprofBindAddress: :8082
    integrations:
      frameworks:
      - "batch/job"
    # - "kubeflow.org/mpijob"
    # - "ray.io/rayjob"    

The namespace, waitForPodsReady, and internalCertManagement fields are available in Kueue v0.3.0 and later

Note: See Sequential Admission with Ready Pods to learn more about using waitForPodsReady for Kueue.

  4. Apply the customized manifests to the cluster:

    kubectl apply --server-side -f manifests.yaml
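
To confirm which configuration ended up in the cluster, you can read the ConfigMap back. A minimal sketch, assuming the kueue-system namespace and the kueue-manager-config ConfigMap shown above; show_kueue_config is a hypothetical helper name:

```shell
# Hypothetical helper: print the controller_manager_config.yaml entry stored
# in the kueue-manager-config ConfigMap. The backslash escapes the dot in the
# key name so that jsonpath treats it as part of the key.
show_kueue_config() {
  kubectl --namespace kueue-system get configmap kueue-manager-config \
    -o jsonpath='{.data.controller_manager_config\.yaml}'
}

# Example: show_kueue_config
```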

Install the latest development version

To install the latest development version of Kueue in your cluster, run the following command:

kubectl apply --server-side -k "github.com/kubernetes-sigs/kueue/config/default?ref=main"

The controller runs in the kueue-system namespace.

Uninstall

To uninstall Kueue, run the following command:

kubectl delete -k "github.com/kubernetes-sigs/kueue/config/default?ref=main"

Build and install from source

To build Kueue from source and install Kueue in your cluster, run the following commands:

git clone https://github.com/kubernetes-sigs/kueue.git
cd kueue
IMAGE_REGISTRY=registry.example.com/my-user make image-local-push deploy

Add metrics scraping for prometheus-operator

Available in Kueue v0.2.0 and later

To allow prometheus-operator to scrape metrics from Kueue components, run the following command:

make prometheus

Uninstall

To uninstall Kueue, run the following command:

make undeploy

Install via Helm

To install and configure Kueue with Helm, follow the instructions.

Change the feature gates configuration

Kueue configures features with a mechanism similar to the one described in Kubernetes Feature Gates.

To change the default of a feature, edit the kueue-controller-manager deployment in the Kueue installation namespace and change the manager container arguments to include:

--feature-gates=...,<FeatureName>=<true|false>

For example, to enable PartialAdmission, you should change the manager deployment as follows:

kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: manager
        args:
        - --config=/controller_manager_config.yaml
        - --zap-log-level=2
+       - --feature-gates=PartialAdmission=true
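
When several gates need changing, the flag value is a comma-separated list of Name=value pairs, following the pattern shown earlier. The helper below is a hypothetical convenience (feature_gates_arg is not part of Kueue) that assembles the argument so it can be pasted into the manager container's args:

```shell
# Hypothetical helper: join Name=value pairs into one --feature-gates flag.
feature_gates_arg() {
  gates=""
  for gate in "$@"; do
    # Prefix a comma only when the list already has an entry.
    gates="${gates:+$gates,}$gate"
  done
  printf '%s\n' "--feature-gates=$gates"
}

# Example: feature_gates_arg PartialAdmission=true VisibilityOnDemand=true
```

One way to apply the resulting flag is to open the deployment with kubectl edit deployment kueue-controller-manager in the Kueue installation namespace and add it to the manager container's args.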

The currently supported features are:

Feature                      Default  Stage  Since  Until
FlavorFungibility            true     Beta   0.5
MultiKueue                   false    Alpha  0.6
PartialAdmission             false    Alpha  0.4    0.4
PartialAdmission             true     Beta   0.5
ProvisioningACC              false    Alpha  0.5
QueueVisibility              false    Alpha  0.5
VisibilityOnDemand           false    Alpha  0.6
PrioritySortingWithinCohort  true     Beta   0.6
LendingLimit                 false    Alpha  0.6

What’s next