Set up Prometheus

Enable Prometheus metrics scraping for Kueue

This page shows how to set up Prometheus to scrape Kueue metrics. For TLS-secured metrics endpoints, see Configure Prometheus with TLS.

This page is intended for a batch administrator.

Before you begin

Make sure the following conditions are met:

  • A Kubernetes cluster is running.
  • The kubectl command-line tool is configured to communicate with your cluster.
  • Kueue is installed.
  • Prometheus Operator is installed.
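
One way to confirm the last prerequisite from the command line is to look for the ServiceMonitor CustomResourceDefinition that Prometheus Operator registers. The sketch below assumes the CRD uses the standard monitoring.coreos.com group; the function name is illustrative and not part of Kueue or Prometheus.

```shell
# Hypothetical helper: succeeds when the ServiceMonitor CRD is registered,
# which the Kueue ServiceMonitor in this guide depends on.
prometheus_operator_crd_present() {
  kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1
}

# Example usage (requires access to the cluster):
#   prometheus_operator_crd_present && echo "ServiceMonitor CRD found"
```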

1. Setup

Choose the setup method that matches your Kueue installation.

Option A: Helm

If you installed Kueue using Helm, enable Prometheus scraping in your values.yaml:

enablePrometheus: true

Then upgrade your Helm release:

helm upgrade kueue oci://registry.k8s.io/kueue/charts/kueue \
  --namespace kueue-system \
  -f values.yaml

Option B: Manifests

If you installed Kueue using kubectl with the release manifests, apply the Prometheus ServiceMonitor:

VERSION=v0.16.1
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/${VERSION}/prometheus.yaml

2. Verify metrics

  1. Check that the ServiceMonitor was created:

    kubectl get servicemonitor -n kueue-system
    

    You should see kueue-controller-manager-metrics-monitor listed.

  2. In the Prometheus UI, go to Status > Target health (or navigate to /targets) and verify that kueue-system/kueue-controller-manager-metrics-monitor shows as UP.

  3. Run a test query in the Prometheus UI:

    kueue_admitted_workloads_total
    

    If Kueue has processed workloads, you should see data points for your ClusterQueues.
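
The test query can also be run from a terminal through the Prometheus HTTP API. This is a minimal sketch, assuming Prometheus is reachable at the address in PROM_URL, for example through a port-forward. The monitoring namespace and prometheus-operated Service named in the comment are assumptions about your installation, and the helper names prom_query and count_series are illustrative.

```shell
# Assumption: Prometheus is reachable at PROM_URL, for example after running
#   kubectl port-forward -n monitoring svc/prometheus-operated 9090
# Adjust the namespace and Service name to match your Prometheus installation.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant query against the Prometheus HTTP API and print the raw JSON.
prom_query() {
  curl -sS --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Roughly count the series in a query response read from stdin.
# Each series in an instant-vector result carries one "metric" object.
count_series() {
  grep -o '"metric":' | wc -l | tr -d ' '
}

# Example usage (requires a reachable Prometheus):
#   prom_query kueue_admitted_workloads_total | count_series
```

A count of 0 usually means that either the target is not being scraped yet or Kueue has not admitted any workloads.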

3. Enable optional metrics

By default, Kueue does not export resource-level metrics for ClusterQueues. To enable metrics such as kueue_cluster_queue_resource_usage and kueue_cluster_queue_nominal_quota, set enableClusterQueueResources: true in the metrics section of the Kueue configuration.

Edit the kueue-manager-config ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kueue-manager-config
  namespace: kueue-system
data:
  controller_manager_config.yaml: |
    apiVersion: config.kueue.x-k8s.io/v1beta2
    kind: Configuration
    metrics:
      bindAddress: :8443
      enableClusterQueueResources: true
    # ... other configuration

Restart the controller to apply the changes:

kubectl rollout restart deployment/kueue-controller-manager -n kueue-system
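
To confirm that the change took effect, you can wait for the rollout to finish and read the configuration back from the ConfigMap. This is a minimal sketch: the helper names are illustrative, the 120-second timeout is an arbitrary choice, and the grep pattern assumes the setting sits on its own line as in the ConfigMap example above.

```shell
# Block until the restarted controller Deployment is fully rolled out.
# The timeout value is an arbitrary choice for this sketch.
wait_for_kueue() {
  kubectl rollout status deployment/kueue-controller-manager \
    -n kueue-system --timeout=120s
}

# Print the Kueue configuration embedded in the ConfigMap.
# The backslash escapes the dot in the key name for kubectl's JSONPath.
kueue_config() {
  kubectl get configmap kueue-manager-config -n kueue-system \
    -o jsonpath='{.data.controller_manager_config\.yaml}'
}

# Succeeds if the configuration on stdin sets enableClusterQueueResources: true.
cq_resources_enabled() {
  grep -Eq '^[[:space:]]*enableClusterQueueResources:[[:space:]]*true[[:space:]]*$'
}

# Example usage (requires access to the cluster):
#   wait_for_kueue && kueue_config | cq_resources_enabled && echo "enabled"
```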

See Prometheus Metrics for the full list of optional metrics.

What’s next