Installation

Installing Kueue to a Kubernetes Cluster

Before you begin

Make sure the following conditions are met:

  • A Kubernetes cluster with version 1.25 or newer is running. Learn how to install the Kubernetes tools.
  • The SuspendJob feature gate is enabled. In Kubernetes 1.22 or newer, the feature gate is enabled by default.
  • (Optional) The JobMutableNodeSchedulingDirectives feature gate (available in Kubernetes 1.22 or newer) is enabled. In Kubernetes 1.23 or newer, the feature gate is enabled by default.
  • The kubectl command-line tool can communicate with your cluster.
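
The version requirement above can also be checked from a script. The following is a minimal sketch, not part of Kueue: k8s_version_ok is a hypothetical helper name, and the usage comment assumes that jq is installed.

```shell
# Hypothetical helper: succeeds if a Kubernetes version string such as
# "v1.29.3" is at least 1.25, the minimum listed above.
k8s_version_ok() {
  local version="${1#v}"        # strip the leading "v", if any
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"
  local minor="${rest%%.*}"     # text between the first and second dot
  minor="${minor%%[!0-9]*}"     # drop suffixes such as "+" in "29+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 25 ]; }
}

# Example use against a live cluster (ASSUMPTION: jq is installed):
#   server="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
#   k8s_version_ok "$server" || echo "Kubernetes 1.25 or newer is required" >&2
```

The function only parses the version string, so it can be tried without a cluster. The commented lines show one way to feed it the server version.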

Kueue publishes metrics to monitor its operators. You can scrape these metrics with Prometheus. Use kube-prometheus if you don’t have your own monitoring system.

The webhook server in Kueue uses internal certificate management to provision certificates. To use a third-party tool instead, such as cert-manager, follow these steps:

  1. Set internalCertManagement.enable to false in the config file.
  2. Comment out the internalcert folder in config/default/kustomization.yaml.
  3. Enable cert-manager in config/default/kustomization.yaml and uncomment all sections with ‘CERTMANAGER’.
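
Step 1 above can be scripted instead of edited by hand. The sketch below is an illustration under assumptions, not an official tool: disable_internal_certs is a hypothetical helper name, the path to the config file is passed as an argument, and the file is assumed to use space indentation with enable: nested under internalCertManagement:.

```shell
# Hypothetical helper: print the config file given as $1 with
# internalCertManagement.enable set to false. Other "enable:" keys
# (for example under waitForPodsReady) are left untouched.
disable_internal_certs() {
  awk '
    { match($0, /^ */); indent = RLENGTH }               # leading spaces on this line
    /^ *internalCertManagement:/ { in_block = 1; block_indent = indent; print; next }
    in_block && indent <= block_indent { in_block = 0 }  # indentation dropped: block ended
    in_block && /^ *enable:/ { sub(/enable:.*/, "enable: false") }
    { print }
  ' "$1"
}
```

The helper writes the result to standard output and leaves the input file unchanged, so the output can be reviewed or diffed before it replaces the original.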

Install a released version

To install a released version of Kueue in your cluster, run the following command:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/manifests.yaml

To wait for Kueue to be fully available, run:

kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=5m
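
When the release version changes often, the download URL can be built from a variable rather than hard-coded. This is a minimal sketch: kueue_release_url is a hypothetical helper name, and the URL pattern is inferred from the commands on this page.

```shell
# Hypothetical helper: print the download URL of a Kueue release asset.
#   $1: release tag, for example v0.10.1
#   $2: asset name, defaults to manifests.yaml
kueue_release_url() {
  local version="$1"
  local asset="${2:-manifests.yaml}"
  printf 'https://github.com/kubernetes-sigs/kueue/releases/download/%s/%s\n' "$version" "$asset"
}

# Example use against a live cluster:
#   kubectl apply --server-side -f "$(kueue_release_url v0.10.1)"
#   kubectl wait deploy/kueue-controller-manager -n kueue-system \
#     --for=condition=available --timeout=5m
```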

Add metrics scraping for prometheus-operator

To allow prometheus-operator to scrape metrics from Kueue components, run the following command:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/prometheus.yaml

Add API Priority and Fairness configuration for the visibility API

See Configure API Priority and Fairness for more details.

Uninstall

To uninstall a released version of Kueue from your cluster, run the following command:

kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/manifests.yaml

Install a custom-configured released version

To install a custom-configured released version of Kueue in your cluster, execute the following steps:

  1. Download the release’s manifests.yaml file:
wget https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/manifests.yaml
  2. With an editor of your preference, open manifests.yaml.
  3. In the kueue-manager-config ConfigMap manifest, edit the controller_manager_config.yaml data entry. The entry represents the default KueueConfiguration. The contents of the ConfigMap are similar to the following:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kueue-manager-config
  namespace: kueue-system
data:
  controller_manager_config.yaml: |
    apiVersion: config.kueue.x-k8s.io/v1beta1
    kind: Configuration
    namespace: kueue-system
    health:
      healthProbeBindAddress: :8081
    metrics:
      bindAddress: :8443
      # enableClusterQueueResources: true
    webhook:
      port: 9443
    manageJobsWithoutQueueName: true
    internalCertManagement:
      enable: true
      webhookServiceName: kueue-webhook-service
      webhookSecretName: kueue-webhook-server-cert
    waitForPodsReady:
      enable: true
      timeout: 10m
    integrations:
      frameworks:
      - "batch/job"    

The integrations.externalFrameworks field is available in Kueue v0.7.0 and later.

  4. Apply the customized manifests to the cluster:
kubectl apply --server-side -f manifests.yaml
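
Before applying, it can help to review the configuration that is actually embedded in the edited file. The following is a minimal sketch that assumes the ConfigMap layout shown above: a controller_manager_config.yaml: | key indented by two spaces, with a body indented by four. show_kueue_config is a hypothetical helper name.

```shell
# Hypothetical helper: print the embedded controller_manager_config.yaml
# from the manifests file given as $1, with its indentation removed.
show_kueue_config() {
  awk '
    /^  controller_manager_config\.yaml: [|]/ { in_cfg = 1; next }  # start of the literal block
    in_cfg && /^    / { print substr($0, 5); next }                 # body line: strip 4 spaces
    in_cfg && /^ *$/  { print ""; next }                            # blank line inside the body
    in_cfg            { exit }                                      # less-indented line ends the block
  ' "$1"
}

# Example use:
#   show_kueue_config manifests.yaml
#   # Validate against the API server without persisting anything:
#   kubectl apply --server-side --dry-run=server -f manifests.yaml
```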

Install the latest development version

To install the latest development version of Kueue in your cluster, run the following command:

kubectl apply --server-side -k "github.com/kubernetes-sigs/kueue/config/default?ref=main"

The controller runs in the kueue-system namespace.

Uninstall

To uninstall Kueue, run the following command:

kubectl delete -k "github.com/kubernetes-sigs/kueue/config/default?ref=main"

Build and install from source

To build Kueue from source and install Kueue in your cluster, run the following commands:

git clone https://github.com/kubernetes-sigs/kueue.git
cd kueue
IMAGE_REGISTRY=registry.example.com/my-user make image-local-push deploy

Add metrics scraping for prometheus-operator

To allow prometheus-operator to scrape metrics from Kueue components, run the following command:

make prometheus

Uninstall

To uninstall Kueue, run the following command:

make undeploy

Install via Helm

To install and configure Kueue with Helm, follow the instructions.

Change the feature gates configuration

Kueue configures features through a mechanism similar to the one described in Kubernetes Feature Gates.

To change the default of a feature, edit the kueue-controller-manager deployment in the Kueue installation namespace and add the following to the manager container arguments:

--feature-gates=...,<FeatureName>=<true|false>

For example, to enable PartialAdmission, you should change the manager deployment as follows:

kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: manager
        args:
        - --config=/controller_manager_config.yaml
        - --zap-log-level=2
+       - --feature-gates=PartialAdmission=true
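
When several gates are set at once, the comma-separated argument can be assembled by a small helper. This is a sketch under stated assumptions: feature_gates_arg is a hypothetical name, and the commented patch command assumes that manager is the first container (index 0) in the deployment and that no --feature-gates argument is present yet.

```shell
# Hypothetical helper: join Name=value pairs into one --feature-gates argument.
feature_gates_arg() {
  local IFS=','   # "$*" joins the arguments with the first character of IFS
  printf '%s=%s\n' '--feature-gates' "$*"
}

# Example use against a live cluster.
# ASSUMPTION: "manager" is container index 0 and has no --feature-gates argument yet.
#   arg="$(feature_gates_arg PartialAdmission=true)"
#   kubectl -n kueue-system patch deployment kueue-controller-manager --type=json \
#     -p "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/args/-\",\"value\":\"$arg\"}]"
```

If the deployment already carries a --feature-gates argument, appending a second one is not what you want; edit the existing argument instead.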

The currently supported features are:

Feature                              Default  Stage       Since  Until
FlavorFungibility                    true     Beta        0.5
MultiKueue                           false    Alpha       0.6    0.8
MultiKueue                           true     Beta        0.9
MultiKueueBatchJobWithManagedBy      false    Alpha       0.8
PartialAdmission                     false    Alpha       0.4    0.4
PartialAdmission                     true     Beta        0.5
ProvisioningACC                      false    Alpha       0.5    0.6
ProvisioningACC                      true     Beta        0.7
QueueVisibility                      false    Alpha       0.5    0.9
QueueVisibility                      false    Deprecated  0.9
VisibilityOnDemand                   false    Alpha       0.6    0.8
VisibilityOnDemand                   true     Beta        0.9
PrioritySortingWithinCohort          true     Beta        0.6
LendingLimit                         false    Alpha       0.6    0.8
LendingLimit                         true     Beta        0.9
MultiplePreemptions                  false    Alpha       0.8    0.8
MultiplePreemptions                  true     Beta        0.9    0.9
TopologyAwareScheduling              false    Alpha       0.9
ConfigurableResourceTransformations  false    Alpha       0.9    0.9
ConfigurableResourceTransformations  true     Beta        0.10
WorkloadResourceRequestsSummary      false    Alpha       0.9    0.9
WorkloadResourceRequestsSummary      true     Beta        0.10
AdmissionCheckValidationRules        false    Deprecated  0.9    0.9
KeepQuotaForProvReqRetry             false    Deprecated  0.9    0.9
ManagedJobsNamespaceSelector         true     Beta        0.10
LocalQueueDefaulting                 false    Alpha       0.10
LocalQueueMetrics                    false    Alpha       0.10


Last modified January 16, 2025: Update metrics docs. (#3993) (9ef44a3f)