Prometheus Metrics

Prometheus metrics exported by Kueue

Kueue exposes Prometheus metrics to monitor the health of the system and the status of ClusterQueues and LocalQueues.

Kueue health

Use the following metrics to monitor the health of the Kueue controllers:

Metric name	Type	Description	Labels
`kueue_admission_attempts_total`	Counter	The total number of attempts to admit workloads. Each admission attempt might try to admit more than one workload.	`result`: possible values are `success` or `inadmissible`
`kueue_admission_attempt_duration_seconds`	Histogram	The latency of an admission attempt.	`result`: possible values are `success` or `inadmissible`
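
As a rough illustration of how the `result` label can be used, the sketch below computes the fraction of successful admission attempts from text in the Prometheus exposition format. The sample values are made up, and the helper names `parse_samples` and `admission_success_ratio` are hypothetical; the parser is deliberately minimal and only handles labeled samples without escaped quotes.

```python
import re

# Hypothetical scrape output; the values are made up for illustration.
SAMPLE = """\
# TYPE kueue_admission_attempts_total counter
kueue_admission_attempts_total{result="success"} 90
kueue_admission_attempts_total{result="inadmissible"} 10
"""

# Minimal patterns: one labeled sample per line, no escaped quotes in values.
SAMPLE_LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric_name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if match:  # blank lines and "# HELP" / "# TYPE" comments do not match
            name, raw_labels, value = match.groups()
            samples.append((name, dict(LABEL.findall(raw_labels)), float(value)))
    return samples


def admission_success_ratio(text):
    """Fraction of admission attempts whose `result` label is `success`."""
    counts = {}
    for name, labels, value in parse_samples(text):
        if name == "kueue_admission_attempts_total":
            result = labels.get("result")
            counts[result] = counts.get(result, 0.0) + value
    total = sum(counts.values())
    return counts.get("success", 0.0) / total if total else None


print(admission_success_ratio(SAMPLE))  # 0.9
```

Because the metric is a counter, this ratio covers the whole lifetime of the controller process. For a recent window, compare two scrapes and divide the differences instead.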

ClusterQueue status

Use the following metrics to monitor the status of your ClusterQueues:

Metric name	Type	Description	Labels
`kueue_pending_workloads`	Gauge	The number of pending workloads.	`cluster_queue`: the name of the ClusterQueue `status`: possible values are `active` or `inadmissible`
`kueue_quota_reserved_workloads_total`	Counter	The total number of quota reserved workloads.	`cluster_queue`: the name of the ClusterQueue
`kueue_quota_reserved_wait_time_seconds`	Histogram	The time from when a workload was created or requeued until it got quota reservation.	`cluster_queue`: the name of the ClusterQueue
`kueue_admitted_workloads_total`	Counter	The total number of admitted workloads.	`cluster_queue`: the name of the ClusterQueue
`kueue_evicted_workloads_total`	Counter	The total number of evicted workloads.	`cluster_queue`: the name of the ClusterQueue `reason`: Possible values are `Preempted`, `PodsReadyTimeout`, `AdmissionCheck`, `ClusterQueueStopped` or `Deactivated`
`kueue_evicted_workloads_once_total`	Counter	The number of unique workload evictions per `cluster_queue`	`cluster_queue`: the name of the ClusterQueue `reason`: Possible values are `Preempted`, `PodsReadyTimeout`, `AdmissionCheck`, `ClusterQueueStopped` or `Deactivated` `detailedReason`: specifies a finer-grained explanation that complements the eviction cause
`kueue_admission_wait_time_seconds`	Histogram	The time from when a workload was created or requeued until admission.	`cluster_queue`: the name of the ClusterQueue
`kueue_admission_checks_wait_time_seconds`	Histogram	The time from when a workload got the quota reservation until admission.	`cluster_queue`: the name of the ClusterQueue
`kueue_admitted_active_workloads`	Gauge	The number of admitted Workloads that are active (unsuspended and not finished)	`cluster_queue`: the name of the ClusterQueue
`kueue_cluster_queue_status`	Gauge	Reports the status of the ClusterQueue	`cluster_queue`: The name of the ClusterQueue `status`: Possible values are `pending`, `active` or `terminated`. For a ClusterQueue, the metric only reports a value of 1 for one of the statuses.
`kueue_reserving_active_workloads`	Gauge	The number of Workloads that are reserving quota, per `cluster_queue`.	`cluster_queue`: the name of the ClusterQueue
`kueue_admission_cycle_preemption_skips`	Gauge	The number of Workloads in the ClusterQueue that got preemption candidates but had to be skipped because other ClusterQueues needed the same resources in the same cycle	`cluster_queue`: the name of the ClusterQueue
`kueue_preempted_workloads_total`	Counter	The number of preempted workloads per `preempting_cluster_queue`	`preempting_cluster_queue`: the name of the ClusterQueue `reason`: possible values are `InClusterQueue` (the workload was preempted by a workload in the same ClusterQueue); `InCohortReclamation` (the workload was preempted by a workload in the same cohort due to reclamation of nominal quota); `InCohortFairSharing` (the workload was preempted by a workload in the same cohort due to Fair Sharing); `InCohortReclaimWhileBorrowing` (the workload was preempted by a workload in the same cohort due to reclamation of nominal quota while borrowing)
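
Prometheus histograms such as `kueue_admission_wait_time_seconds` are exposed as `_bucket`, `_sum` and `_count` series. As a minimal sketch, the mean wait time per ClusterQueue can be estimated by dividing `_sum` by `_count`. The queue names and values below are made up, and `mean_wait_seconds` is a hypothetical helper, not part of Kueue.

```python
import re

# Hypothetical scrape output; queue names and values are made up.
SAMPLE = """\
kueue_admission_wait_time_seconds_sum{cluster_queue="team-a"} 120
kueue_admission_wait_time_seconds_count{cluster_queue="team-a"} 8
kueue_admission_wait_time_seconds_sum{cluster_queue="team-b"} 0
kueue_admission_wait_time_seconds_count{cluster_queue="team-b"} 0
"""

# Minimal patterns: one labeled sample per line, no escaped quotes in values.
SAMPLE_LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def mean_wait_seconds(text, metric="kueue_admission_wait_time_seconds"):
    """Map each cluster_queue to sum/count, or None when nothing was observed."""
    sums, counts = {}, {}
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match:
            continue
        name, raw_labels, value = match.groups()
        queue = dict(LABEL.findall(raw_labels)).get("cluster_queue")
        if name == metric + "_sum":
            sums[queue] = float(value)
        elif name == metric + "_count":
            counts[queue] = float(value)
    return {
        queue: (sums[queue] / count if count else None)
        for queue, count in counts.items()
        if queue in sums
    }


print(mean_wait_seconds(SAMPLE))  # {'team-a': 15.0, 'team-b': None}
```

The same helper can be pointed at the other histograms in this table by passing a different `metric` name, for example `kueue_quota_reserved_wait_time_seconds`.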

LocalQueue Status (alpha)

The following metrics are available only if the `LocalQueueMetrics` feature gate is enabled. For details, see the "Change the feature gates configuration" section of the Installation documentation.

Metric name	Type	Description	Labels
`kueue_local_queue_pending_workloads`	Gauge	The number of pending workloads, per `local_queue` and `status`.	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in `status`: can be either `active` for the number of active pending workloads or `inadmissible`
`kueue_local_queue_quota_reserved_workloads_total`	Counter	The number of workloads with quota reserved in a LocalQueue	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_quota_reserved_wait_time_seconds`	Histogram	The time from when a workload was created or requeued until it got quota reservation, per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_admitted_workloads_total`	Counter	The total number of admitted workloads per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_admission_checks_wait_time_seconds`	Histogram	The time from when a workload got the quota reservation until admission, per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_admission_wait_time_seconds`	Histogram	The time from when a workload was created or requeued until admission, per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_evicted_workloads_total`	Counter	The number of evicted workloads per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in `reason`: the reason the workload was evicted. Possible values are `Preempted`, `PodsReadyTimeout`, `AdmissionCheck`, `ClusterQueueStopped` or `Deactivated`
`kueue_local_queue_reserving_active_workloads`	Gauge	The number of Workloads that are reserving quota, per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_admitted_active_workloads`	Gauge	The number of admitted Workloads that are active (unsuspended and not finished), per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_status`	Gauge	Reports a LocalQueue’s `active` status (ability to schedule workloads)	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in `active`: one of [`True`, `False`, `Unknown`] and exclusively one is positive at any given time
`kueue_local_queue_resource_reservation`	Gauge	Reports the LocalQueue’s total resource reservation within all the `flavors`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in `flavor`: the name of the flavor which resources are being consumed from `resource`: the resource which is being consumed
`kueue_local_queue_resource_usage`	Gauge	Reports the LocalQueue’s total resource usage within all the `flavors`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in `flavor`: the name of the flavor which resources are being consumed from `resource`: the resource which is being consumed

Cohort Status

Metric name	Type	Description	Labels
`kueue_cohort_weighted_share`	Gauge	Reports a value representing the maximum of the ratios of usage above nominal quota to the lendable resources in the Cohort, among all the resources provided by the Cohort, divided by the weight. A value of zero means that the usage of the Cohort is below the nominal quota. If the Cohort has a weight of zero, the metric reports 9223372036854775807, the maximum possible share value.	`cohort`: The name of the Cohort
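
The weighted share description can be read as the following calculation. This is only a sketch of the wording above, not Kueue's implementation: the function name, the per-resource dictionaries and the example numbers are hypothetical, and it assumes no scaling or rounding, which the real controller may apply before reporting the value.

```python
# Value the metric is documented to report for a weight of zero (2**63 - 1).
MAX_SHARE = 9223372036854775807


def weighted_share(usage, nominal_quota, lendable, weight):
    """Largest (usage above nominal quota) / lendable ratio, divided by weight.

    All three mappings go from resource name to an amount in the same unit.
    Assumption: the zero-weight check comes first, as the description reads.
    """
    if weight == 0:
        return MAX_SHARE
    ratios = []
    for resource, lendable_amount in lendable.items():
        if lendable_amount <= 0:
            continue  # nothing to lend for this resource, so no ratio
        above_nominal = max(0, usage.get(resource, 0) - nominal_quota.get(resource, 0))
        ratios.append(above_nominal / lendable_amount)
    return max(ratios, default=0) / weight


# Made-up numbers: cpu is 2 above nominal (2/20 = 0.1),
# memory is 8 above nominal (8/32 = 0.25), so memory dominates.
share = weighted_share(
    usage={"cpu": 12, "memory": 40},
    nominal_quota={"cpu": 10, "memory": 32},
    lendable={"cpu": 20, "memory": 32},
    weight=2,
)
print(share)  # 0.125
```

With usage at or below nominal quota for every resource, the sketch returns zero, which matches the documented meaning of a zero value.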

Optional metrics

The following metrics are available only if `metrics.enableClusterQueueResources` is enabled in the manager’s configuration.

Metric name	Type	Description	Labels
`kueue_cluster_queue_resource_reservation`	Gauge	Reports the ClusterQueue’s total resource reservation within all the flavors	`cohort`: The cohort in which the queue belongs `cluster_queue`: The name of the ClusterQueue `flavor`: referenced flavor `resource`: The resource name
`kueue_cluster_queue_resource_usage`	Gauge	Reports the ClusterQueue’s total resource usage	`cohort`: The cohort in which the queue belongs `cluster_queue`: The name of the ClusterQueue `flavor`: referenced flavor `resource`: The resource name
`kueue_cluster_queue_nominal_quota`	Gauge	Reports the ClusterQueue’s resource quota	`cohort`: The cohort in which the queue belongs `cluster_queue`: The name of the ClusterQueue `flavor`: referenced flavor `resource`: The resource name
`kueue_cluster_queue_borrowing_limit`	Gauge	Reports the ClusterQueue’s resource borrowing limit	`cohort`: The cohort in which the queue belongs `cluster_queue`: The name of the ClusterQueue `flavor`: referenced flavor `resource`: The resource name
`kueue_cluster_queue_lending_limit`	Gauge	Reports the ClusterQueue’s resource lending limit within all the flavors	`cohort`: The cohort in which the queue belongs `cluster_queue`: The name of the ClusterQueue `flavor`: referenced flavor `resource`: The resource name
`kueue_cluster_queue_weighted_share`	Gauge	Reports a value representing the maximum of the ratios of usage above nominal quota to the lendable resources in the cohort, among all the resources provided by the ClusterQueue.	`cluster_queue`: The name of the ClusterQueue
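
One possible use of these gauges is to compare `kueue_cluster_queue_resource_usage` with `kueue_cluster_queue_nominal_quota` for each ClusterQueue, flavor and resource. The sketch below does this on made-up sample text; the cohort, queue and flavor names are hypothetical, as is the helper `quota_utilization`.

```python
import re

# Hypothetical scrape output; names and values are made up.
SAMPLE = """\
kueue_cluster_queue_resource_usage{cohort="research",cluster_queue="team-a",flavor="default",resource="cpu"} 6
kueue_cluster_queue_nominal_quota{cohort="research",cluster_queue="team-a",flavor="default",resource="cpu"} 8
kueue_cluster_queue_resource_usage{cohort="research",cluster_queue="team-a",flavor="default",resource="memory"} 20
kueue_cluster_queue_nominal_quota{cohort="research",cluster_queue="team-a",flavor="default",resource="memory"} 16
"""

# Minimal patterns: one labeled sample per line, no escaped quotes in values.
SAMPLE_LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def quota_utilization(text):
    """Map (cluster_queue, flavor, resource) to usage / nominal quota."""
    usage, quota = {}, {}
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels))
        key = (labels.get("cluster_queue"), labels.get("flavor"), labels.get("resource"))
        if name == "kueue_cluster_queue_resource_usage":
            usage[key] = float(value)
        elif name == "kueue_cluster_queue_nominal_quota":
            quota[key] = float(value)
    # Skip entries whose quota is missing or zero to avoid dividing by zero.
    return {key: usage[key] / quota[key] for key in usage if quota.get(key)}


for key, ratio in quota_utilization(SAMPLE).items():
    print(key, ratio)
```

A ratio above 1 means the ClusterQueue is using more than its nominal quota for that flavor and resource, as with the memory entry in the sample.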

The following metrics are available only if `waitForPodsReady` is enabled in the manager’s configuration.

Metric name	Type	Description	Labels
`kueue_ready_wait_time_seconds`	Histogram	The time from when a workload was created or requeued until it became ready.	`cluster_queue`: the name of the ClusterQueue
`kueue_admitted_until_ready_wait_time_seconds`	Histogram	The time from when a workload was admitted until it became ready.	`cluster_queue`: the name of the ClusterQueue
`kueue_local_queue_ready_wait_time_seconds`	Histogram	The time from when a workload was created or requeued until it became ready, per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in
`kueue_local_queue_admitted_until_ready_wait_time_seconds`	Histogram	The time from when a workload was admitted until it became ready, per `local_queue`	`name`: the name of the LocalQueue `namespace`: the namespace that the LocalQueue resides in

Last modified May 20, 2025: Create a metric to track the number of workloads (#5259) (ad5f00a9)