Set Up Dev Monitoring

Set up Prometheus for development and debugging

This page shows how to set up Prometheus for development, debugging, and testing Kueue metrics.

This page is intended for platform developers.

Before you begin

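Make sure that a Kubernetes cluster is running, that kubectl is configured to communicate with it, and that Kueue is installed. The commands on this page also use git, curl, and jq.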

1. Install kube-prometheus

From a scratch directory outside the Kueue repository, install kube-prometheus:

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl apply --server-side -f manifests/setup
kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring
kubectl apply -f manifests/
kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=300s

2. Enable Kueue metrics scraping

Apply the Kueue ServiceMonitor:

VERSION=v0.16.1
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/${VERSION}/prometheus.yaml

Alternatively, if you’re working from a Kueue source checkout, use:

make prometheus
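
When switching between Kueue releases, it can help to wrap the release-manifest command in a small function. The following is a minimal sketch and not part of Kueue: the function names kueue_prometheus_url and apply_kueue_prometheus and the KUBECTL override variable are hypothetical, while the URL pattern is the one used in the command above.

```shell
# Hypothetical helpers; the names and the KUBECTL override are assumptions for illustration.

# kueue_prometheus_url VERSION: print the release URL of prometheus.yaml.
kueue_prometheus_url() {
  case "$1" in
    v[0-9]*) ;;  # accept release tags such as v0.16.1
    *) echo "expected a release tag like v0.16.1, got '$1'" >&2; return 1 ;;
  esac
  printf 'https://github.com/kubernetes-sigs/kueue/releases/download/%s/prometheus.yaml\n' "$1"
}

# apply_kueue_prometheus VERSION: apply the ServiceMonitor manifest for that release.
apply_kueue_prometheus() {
  url=$(kueue_prometheus_url "$1") || return 1
  "${KUBECTL:-kubectl}" apply --server-side -f "$url"
}
```

Calling apply_kueue_prometheus v0.16.1 runs the same command as above. Setting KUBECTL=echo first prints the command instead of running it, which is a cheap way to check the URL.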

3. Generate test data

Create a ClusterQueue and LocalQueue:

kubectl apply -f https://kueue.sigs.k8s.io/examples/admin/single-clusterqueue-setup.yaml

Submit test jobs:

for i in {1..5}; do
  kubectl create -f https://kueue.sigs.k8s.io/examples/jobs/sample-job.yaml
done
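
To vary the amount of test data, the loop above can be turned into a function that takes the number of jobs as an argument. This is a sketch under assumptions: the function name submit_sample_jobs and the KUBECTL override variable are hypothetical, and the manifest URL is the one used above.

```shell
# Hypothetical helper; the function name and the KUBECTL override are assumptions.

# submit_sample_jobs COUNT: create COUNT copies of the sample job (default 5).
submit_sample_jobs() {
  count="${1:-5}"
  i=1
  # A POSIX while loop is used so the function does not depend on bash brace expansion.
  while [ "$i" -le "$count" ]; do
    "${KUBECTL:-kubectl}" create -f https://kueue.sigs.k8s.io/examples/jobs/sample-job.yaml || return 1
    i=$((i + 1))
  done
}
```

For example, submit_sample_jobs 20 submits twenty jobs, and the function stops at the first failed kubectl call.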

4. Verify metrics

Port-forward to the Prometheus service:

kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090

Check that Prometheus is scraping Kueue:

curl -s 'http://localhost:9090/api/v1/targets' | jq '.data.activeTargets[] | select(.labels.job | contains("kueue"))'

You should see output like:

{
  "labels": {
    "job": "kueue-controller-manager-metrics-service",
    ...
  },
  "health": "up",
  ...
}

Open http://localhost:9090 in your browser and try a query:

kueue_admitted_workloads_total
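
The same query can also be run from the command line through the Prometheus HTTP API, which is convenient in scripts. The following is a minimal sketch, assuming the port-forward from this step is still running; the function name prom_query and the PROM_URL and CURL variables are hypothetical.

```shell
# Assumption: Prometheus is reachable at the port-forwarded address from this step.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_query EXPR: run an instant query and print the raw JSON response.
# --get with --data-urlencode sends EXPR as a URL-encoded query parameter.
prom_query() {
  "${CURL:-curl}" --silent --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

For example, prom_query kueue_admitted_workloads_total | jq '.data.result' prints only the returned series.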

For Grafana access, see the kube-prometheus documentation.

5. Enable optional metrics

To enable resource-level metrics such as kueue_cluster_queue_resource_usage, edit the Kueue configuration:

kubectl edit configmap kueue-manager-config -n kueue-system

Add enableClusterQueueResources: true under the metrics section:

metrics:
  bindAddress: :8443
  enableClusterQueueResources: true

Restart Kueue:

kubectl rollout restart deployment/kueue-controller-manager -n kueue-system
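
The restart returns before the new pods are ready, so it may help to wait for the rollout to finish before querying the new metrics. This is a minimal sketch: the function name restart_kueue_and_wait, the default timeout, and the KUBECTL override variable are assumptions for illustration.

```shell
# Hypothetical helper; the function name, default timeout, and KUBECTL override are assumptions.

# restart_kueue_and_wait [TIMEOUT]: restart Kueue and wait for the rollout (default 120s).
restart_kueue_and_wait() {
  timeout="${1:-120s}"
  "${KUBECTL:-kubectl}" rollout restart deployment/kueue-controller-manager -n kueue-system || return 1
  # rollout status blocks until the rollout completes or the timeout expires.
  "${KUBECTL:-kubectl}" rollout status deployment/kueue-controller-manager -n kueue-system --timeout="$timeout"
}
```

A non-zero exit status from restart_kueue_and_wait means the restart failed or did not complete within the timeout.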

Verify that the optional metrics are available by running a query in the Prometheus UI, for example:

kueue_cluster_queue_nominal_quota

See Prometheus Metrics for the full list of optional metrics.

What’s next