Common Grafana Queries

Common PromQL queries for monitoring Kueue in Grafana.

This page shows you how to use common PromQL queries to monitor Kueue metrics in Grafana.

The intended audience for this page is batch administrators.

Before you begin

Make sure the following conditions are met:

A Kubernetes cluster is running.
The kubectl command-line tool is configured to communicate with your cluster.
Kueue is installed.
kube-prometheus is installed.
Kueue Prometheus metrics are enabled (see Setup Prometheus).

Quota utilization

Note

The queries in this section require metrics.enableClusterQueueResources: true in the Kueue configuration. See Installation for details.

To monitor the percentage of nominal CPU quota in use for each ClusterQueue:

(sum by (cluster_queue) (kueue_cluster_queue_resource_usage{resource="cpu"}))
/
(sum by (cluster_queue) (kueue_cluster_queue_nominal_quota{resource="cpu"}))
* 100
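Because the result is a plain percentage, a comparison can be appended to keep only the ClusterQueues above a chosen level. The following is a minimal sketch; the 80 threshold is an arbitrary assumption, not a value recommended by Kueue:

```promql
# Keep only ClusterQueues whose CPU usage exceeds 80% of nominal quota.
# The threshold (80) is illustrative; pick one that fits your cluster.
(sum by (cluster_queue) (kueue_cluster_queue_resource_usage{resource="cpu"}))
/
(sum by (cluster_queue) (kueue_cluster_queue_nominal_quota{resource="cpu"}))
* 100
> 80
```

In PromQL, arithmetic operators bind more tightly than comparison operators, so the comparison applies to the computed percentage. ClusterQueues at or below the threshold are dropped from the result.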

To see utilization broken down by resource in a ClusterQueue:

(sum by (cluster_queue, resource) (kueue_cluster_queue_resource_usage))
/
(sum by (cluster_queue, resource) (kueue_cluster_queue_nominal_quota))
* 100

To see the average CPU quota utilization over the last week in a ClusterQueue, sampled at one-hour resolution:

avg_over_time(
  (
    (sum by (cluster_queue) (kueue_cluster_queue_resource_usage{resource="cpu"}))
    /
    (sum by (cluster_queue) (kueue_cluster_queue_nominal_quota{resource="cpu"}))
    * 100
  )[1w:1h]
)
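The same subquery form works with other `*_over_time` functions. As a sketch, swapping in `max_over_time` gives the peak utilization in the window instead of the average. The window and resolution below simply mirror the query above:

```promql
# Peak CPU quota utilization per ClusterQueue over the last week,
# evaluated at one-hour steps. Spikes shorter than the step can be missed.
max_over_time(
  (
    (sum by (cluster_queue) (kueue_cluster_queue_resource_usage{resource="cpu"}))
    /
    (sum by (cluster_queue) (kueue_cluster_queue_nominal_quota{resource="cpu"}))
    * 100
  )[1w:1h]
)
```

A smaller step (for example `[1w:5m]`) catches shorter peaks at the cost of a more expensive query.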

To find the top 5 ClusterQueues by CPU utilization:

topk(5,
  (sum by (cluster_queue) (kueue_cluster_queue_resource_usage{resource="cpu"}))
  /
  (sum by (cluster_queue) (kueue_cluster_queue_nominal_quota{resource="cpu"}))
  * 100
)

Pending workloads

To monitor the number of active pending workloads per ClusterQueue:

sum by (cluster_queue) (kueue_pending_workloads{status="active"})

To see both active and inadmissible pending workloads per ClusterQueue:

sum by (cluster_queue, status) (kueue_pending_workloads)
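To focus on workloads that cannot currently be admitted, the same metric can be filtered on the inadmissible status. This sketch assumes the label value is spelled `inadmissible`, matching the wording above; check the values your cluster reports before relying on it:

```promql
# ClusterQueues that currently hold at least one inadmissible workload.
# Assumes the status label value is "inadmissible".
sum by (cluster_queue) (kueue_pending_workloads{status="inadmissible"}) > 0
```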

Admission wait time

To monitor how long workloads wait before admission, use histogram percentile queries.

For the 95th percentile (P95) admission wait time:

histogram_quantile(0.95,
  sum by (le, cluster_queue) (
    rate(kueue_admission_wait_time_seconds_bucket[5m])
  )
)

For the 50th percentile (median):

histogram_quantile(0.50,
  sum by (le, cluster_queue) (
    rate(kueue_admission_wait_time_seconds_bucket[5m])
  )
)

For the 99th percentile (P99):

histogram_quantile(0.99,
  sum by (le, cluster_queue) (
    rate(kueue_admission_wait_time_seconds_bucket[5m])
  )
)
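Percentiles are estimated from bucket boundaries, so it can help to plot the mean wait time alongside them. Prometheus histograms also expose `_sum` and `_count` series, and the sketch below divides their rates, using the same 5m window as the queries above:

```promql
# Mean admission wait time (seconds) per ClusterQueue over the last 5 minutes.
# Returns NaN for a ClusterQueue with no admissions in the window.
sum by (cluster_queue) (rate(kueue_admission_wait_time_seconds_sum[5m]))
/
sum by (cluster_queue) (rate(kueue_admission_wait_time_seconds_count[5m]))
```

A mean that sits well above the median usually indicates a small number of workloads waiting much longer than the rest.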

Workload throughput

To monitor how many workloads are being admitted per hour:

sum by (cluster_queue) (
  increase(kueue_admitted_workloads_total[1h])
)

To monitor finished workloads per hour:

sum by (cluster_queue) (
  increase(kueue_finished_workloads_total[1h])
)

To see the admission rate over time (workloads per minute):

sum by (cluster_queue) (
  rate(kueue_admitted_workloads_total[5m])
) * 60
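For a single cluster-wide number (for example in a Grafana stat panel), the grouping clause can be dropped. A minimal sketch:

```promql
# Total workloads admitted across all ClusterQueues in the last hour.
sum(increase(kueue_admitted_workloads_total[1h]))
```

Note that `increase()` extrapolates to the edges of the window, so the result can be fractional even though the underlying counter only takes integer values.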

Eviction rate

To monitor evictions per hour by reason:

sum by (cluster_queue, reason) (
  increase(kueue_evicted_workloads_total[1h])
)

See Prometheus Metrics for the full list of reason label values.
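Raw eviction counts are easier to interpret relative to admission volume. The following sketch divides hourly evictions by hourly admissions per ClusterQueue. Treating this ratio as a health signal is an assumption of this example, not a Kueue recommendation:

```promql
# Evictions per admission over the last hour, per ClusterQueue.
# A ClusterQueue with no admissions in the window yields NaN or +Inf.
sum by (cluster_queue) (increase(kueue_evicted_workloads_total[1h]))
/
sum by (cluster_queue) (increase(kueue_admitted_workloads_total[1h]))
```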

ClusterQueue status

To see which ClusterQueues are active:

kueue_cluster_queue_status{status="active"} == 1

To see ClusterQueues that are not active (pending or terminating):

kueue_cluster_queue_status{status!="active"} == 1
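To turn the list into a single value, wrap the selector in `count()`. The sketch below adds `or vector(0)` so the panel shows 0 instead of "No data" when every ClusterQueue is active:

```promql
# Number of ClusterQueues that are not active.
# count() over an empty result returns nothing, so fall back to 0.
count(kueue_cluster_queue_status{status!="active"} == 1) or vector(0)
```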

What’s next

See Pending Workloads in Grafana for visibility dashboards using the on-demand API.

Last modified February 12, 2026: Move Grafana queries under observability and align index page format (#9168) (6848365a1)