Kueue Configuration API

Generated API reference documentation for Kueue Configuration.

Resource Types

  • Configuration

ClientConnection

Appears in: Configuration

Field | Description
qps [Required]
float32

QPS controls the number of queries per second allowed for the Kubernetes API server connection.

burst [Required]
int32

Burst allows extra queries to accumulate when a client is exceeding its rate.
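One way to read qps and burst together is as a token bucket, the model used by client-go's default client-side rate limiter: tokens refill at qps per second up to a capacity of burst, and each request spends one token. The sketch below only illustrates that model; the class TokenBucket and its method names are hypothetical and are not Kueue or client-go code.

```python
class TokenBucket:
    """Hypothetical token bucket: refills at `qps` tokens/second, holds at most `burst`."""

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def try_accept(self, now: float) -> bool:
        """Spend one token and return True if a request may be sent at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=2.0, burst=3)
# Four back-to-back requests at t=0: only the first three fit in the burst.
immediate = [bucket.try_accept(now=0.0) for _ in range(4)]
# One second later, qps=2 has refilled two tokens.
later = [bucket.try_accept(now=1.0) for _ in range(3)]
```

In this reading, a larger burst tolerates short spikes of requests, while qps bounds the sustained rate.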

ClusterQueueVisibility

Appears in: QueueVisibility

Field | Description
maxCount [Required]
int32

MaxCount indicates the maximum number of pending workloads exposed in the cluster queue status. When the value is set to 0, ClusterQueue visibility updates are disabled. The maximum value is 4000. Defaults to 10.

Configuration

Configuration is the Schema for the kueueconfigurations API.

Field | Description
namespace [Required]
string

Namespace is the namespace in which Kueue is deployed. It is used as part of the DNSName of the webhook Service. If not set, the value is read from the file /var/run/secrets/kubernetes.io/serviceaccount/namespace. If the file doesn't exist, the default value is kueue-system.
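The lookup order described for namespace can be sketched as follows. This is a minimal illustration, not Kueue's implementation: resolve_namespace is a hypothetical helper, and treating an empty file like a missing one is an assumption made for the sketch.

```python
import os
import tempfile
from typing import Optional

SERVICE_ACCOUNT_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
DEFAULT_NAMESPACE = "kueue-system"


def resolve_namespace(configured: Optional[str],
                      namespace_file: str = SERVICE_ACCOUNT_NAMESPACE_FILE) -> str:
    """Hypothetical helper: explicit value, then the service account file, then the default."""
    if configured:
        return configured
    try:
        with open(namespace_file, encoding="utf-8") as f:
            content = f.read().strip()
    except FileNotFoundError:
        return DEFAULT_NAMESPACE
    # Assumption: an empty file is handled like a missing file.
    return content or DEFAULT_NAMESPACE


# Exercise the three cases against a temporary file instead of the real path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "namespace")
    missing = resolve_namespace(None, path)        # file does not exist yet
    with open(path, "w", encoding="utf-8") as f:
        f.write("batch-system\n")
    from_file = resolve_namespace(None, path)      # value read from the file
    explicit = resolve_namespace("custom-ns", path)  # configured value wins
```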

ControllerManager [Required]
ControllerManager
(Members of ControllerManager are embedded into this type.)

ControllerManager holds the configuration for the controllers.

manageJobsWithoutQueueName [Required]
bool

ManageJobsWithoutQueueName controls whether or not Kueue reconciles jobs that don't set the annotation kueue.x-k8s.io/queue-name. If set to true, then those jobs will be suspended and never started unless they are assigned a queue and eventually admitted. This also applies to jobs created before starting the kueue controller. Defaults to false; therefore, those jobs are not managed and if they are created unsuspended, they will start immediately.

internalCertManagement [Required]
InternalCertManagement

InternalCertManagement is the configuration for internal certificate management.

waitForPodsReady [Required]
WaitForPodsReady

WaitForPodsReady is the configuration that provides time-based all-or-nothing scheduling semantics for Jobs, by ensuring all pods are ready (running and passing the readiness probe) within the specified time. If the timeout is exceeded, then the workload is evicted.

clientConnection [Required]
ClientConnection

ClientConnection provides additional configuration options for the Kubernetes API server client.

integrations [Required]
Integrations

Integrations provides configuration options for AI/ML/Batch framework integrations (including the Kubernetes Job).

queueVisibility [Required]
QueueVisibility

QueueVisibility is the configuration to expose information about the top pending workloads. Deprecated: This field will be removed in v1beta2; use VisibilityOnDemand (https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand/) instead.

multiKueue [Required]
MultiKueue

MultiKueue controls the behaviour of the MultiKueue AdmissionCheck Controller.

fairSharing [Required]
FairSharing

FairSharing controls the fair sharing semantics across the cluster.

resources [Required]
Resources

Resources provides additional configuration options for handling the resources.

ControllerConfigurationSpec

Appears in: ControllerManager

ControllerConfigurationSpec defines the global configuration for controllers registered with the manager.

Field | Description
groupKindConcurrency
map[string]int

GroupKindConcurrency is a map from a Kind to the number of concurrent reconciliations allowed for that controller.

When a controller is registered within this manager using the builder utilities, users have to specify the type the controller reconciles in the For(...) call. If the object's kind passed matches one of the keys in this map, the concurrency for that controller is set to the number specified.

The key is expected to be consistent in form with GroupKind.String(), e.g. ReplicaSet in apps group (regardless of version) would be ReplicaSet.apps.
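The key format can be illustrated with a small helper that mimics the shape of GroupKind.String(): "Kind.group", or just "Kind" when the group is empty (the core API group). The helper name group_kind_key and the concurrency values are hypothetical and only serve the example.

```python
def group_kind_key(kind: str, group: str = "") -> str:
    """Build a map key shaped like GroupKind.String()."""
    return f"{kind}.{group}" if group else kind


# Illustrative groupKindConcurrency map; the numbers are made up.
group_kind_concurrency = {
    group_kind_key("ReplicaSet", "apps"): 4,
    group_kind_key("Job", "batch"): 5,
    group_kind_key("Pod"): 5,  # core group: no ".group" suffix
}
```

The API version never appears in the key, which is why ReplicaSet in the apps group maps to ReplicaSet.apps regardless of version.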

cacheSyncTimeout
time.Duration

CacheSyncTimeout refers to the time limit set to wait for syncing caches. Defaults to 2 minutes if not set.

ControllerHealth

Appears in: ControllerManager

ControllerHealth defines the health configs.

Field | Description
healthProbeBindAddress
string

HealthProbeBindAddress is the TCP address that the controller should bind to for serving health probes. It can be set to "0" or "" to disable serving the health probe.

readinessEndpointName
string

ReadinessEndpointName is the name of the readiness endpoint. Defaults to "readyz".

livenessEndpointName
string

LivenessEndpointName is the name of the liveness endpoint. Defaults to "healthz".

ControllerManager

Appears in: Configuration

Field | Description
webhook
ControllerWebhook

Webhook contains the controller's webhook configuration.

leaderElection
k8s.io/component-base/config/v1alpha1.LeaderElectionConfiguration

LeaderElection is the LeaderElection config to be used when configuring the manager.Manager leader election.

metrics
ControllerMetrics

Metrics contains the controller metrics configuration.

health
ControllerHealth

Health contains the controller health configuration.

pprofBindAddress
string

PprofBindAddress is the TCP address that the controller should bind to for serving pprof. It can be set to "" or "0" to disable pprof serving. Since pprof may contain sensitive information, make sure to protect it before exposing it to the public.

controller
ControllerConfigurationSpec

Controller contains global configuration options for controllers registered within this manager.

ControllerMetrics

Appears in: ControllerManager

ControllerMetrics defines the metrics configs.

Field | Description
bindAddress
string

BindAddress is the TCP address that the controller should bind to for serving Prometheus metrics. It can be set to "0" to disable metrics serving.

enableClusterQueueResources
bool

EnableClusterQueueResources, if true, causes the cluster queue resource usage and quota metrics to be reported.

ControllerWebhook

Appears in: ControllerManager

ControllerWebhook defines the webhook server for the controller.

Field | Description
port
int

Port is the port that the webhook server serves at. It is used to set webhook.Server.Port.

host
string

Host is the hostname that the webhook server binds to. It is used to set webhook.Server.Host.

certDir
string

CertDir is the directory that contains the server key and certificate. If not set, the webhook server looks up the server key and certificate in {TempDir}/k8s-webhook-server/serving-certs. The server key and certificate must be named tls.key and tls.crt, respectively.

FairSharing

Appears in: Configuration

Field | Description
enable [Required]
bool

enable indicates whether to enable fair sharing for all cohorts. Defaults to false.

preemptionStrategies [Required]
[]PreemptionStrategy

preemptionStrategies indicates which constraints a preemption should satisfy. The preemption algorithm will only use the next strategy in the list if the incoming workload (preemptor) doesn't fit after using the previous strategies. Possible values are:

  • LessThanOrEqualToFinalShare: Only preempt a workload if the share of the preemptor CQ with the preemptor workload is less than or equal to the share of the preemptee CQ without the workload to be preempted. This strategy might favor preemption of smaller workloads in the preemptee CQ, regardless of priority or start time, in an effort to keep the share of the CQ as high as possible.
  • LessThanInitialShare: Only preempt a workload if the share of the preemptor CQ with the incoming workload is strictly less than the share of the preemptee CQ. This strategy doesn't depend on the share usage of the workload being preempted. As a result, the strategy chooses to preempt workloads with the lowest priority and newest start time first.

The default strategy is ["LessThanOrEqualToFinalShare", "LessThanInitialShare"].
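The two comparisons can be written out as a short sketch. It is a simplification: shares are plain numbers supplied by the caller, only a single preemption candidate is checked, and the function names are hypothetical. How Kueue actually computes shares and selects candidates is not covered here.

```python
DEFAULT_STRATEGIES = ["LessThanOrEqualToFinalShare", "LessThanInitialShare"]


def strategy_allows(strategy, preemptor_share_with_workload,
                    preemptee_share, preemptee_share_without_workload):
    """Check one candidate preemption against one strategy (illustrative only)."""
    if strategy == "LessThanOrEqualToFinalShare":
        # Compare against the preemptee CQ share after removing the candidate.
        return preemptor_share_with_workload <= preemptee_share_without_workload
    if strategy == "LessThanInitialShare":
        # Compare against the preemptee CQ share as it is now.
        return preemptor_share_with_workload < preemptee_share
    raise ValueError(f"unknown preemption strategy: {strategy}")


def first_allowing_strategy(strategies, **shares):
    """Return the first strategy in list order that permits the preemption, or None."""
    for strategy in strategies:
        if strategy_allows(strategy, **shares):
            return strategy
    return None


# Made-up shares: the preemptor CQ would reach 300, the preemptee CQ is at 400
# and would drop to 250 if the candidate workload were preempted.
chosen = first_allowing_strategy(
    DEFAULT_STRATEGIES,
    preemptor_share_with_workload=300,
    preemptee_share=400,
    preemptee_share_without_workload=250,
)
```

With these numbers the first strategy rejects the preemption (300 is greater than 250) and the second one allows it (300 is less than 400).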

Integrations

Appears in: Configuration

Field | Description
frameworks [Required]
[]string

List of framework names to be enabled. Possible options:

  • "batch/job"
  • "kubeflow.org/mpijob"
  • "ray.io/rayjob"
  • "ray.io/raycluster"
  • "jobset.x-k8s.io/jobset"
  • "kubeflow.org/mxjob"
  • "kubeflow.org/paddlejob"
  • "kubeflow.org/pytorchjob"
  • "kubeflow.org/tfjob"
  • "kubeflow.org/xgboostjob"
  • "pod"
  • "deployment" (requires enabling pod integration)
  • "statefulset" (requires enabling pod integration)

externalFrameworks [Required]
[]string

List of GroupVersionKinds that are managed for Kueue by external controllers; the expected format is Kind.version.group.com.
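To make the Kind.version.group.com format concrete, the sketch below splits such a name on its first two dots. The type GroupVersionKind, the function parse_external_framework, and the name Foo.v1.example.com are all hypothetical and used only for illustration.

```python
from typing import NamedTuple


class GroupVersionKind(NamedTuple):
    group: str
    version: str
    kind: str


def parse_external_framework(name: str) -> GroupVersionKind:
    """Split "Kind.version.group" on the first two dots; the group keeps its own dots."""
    parts = name.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Kind.version.group, got {name!r}")
    kind, version, group = parts
    return GroupVersionKind(group=group, version=version, kind=kind)


gvk = parse_external_framework("Foo.v1.example.com")
```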

podOptions [Required]
PodIntegrationOptions

PodOptions defines the Kueue controller behaviour for pod objects.

labelKeysToCopy [Required]
[]string

labelKeysToCopy is a list of label keys that should be copied from the job into the workload object. The job is not required to have all the labels from this list. If a job does not have a label with a given key from this list, the constructed workload object is created without that label. When a workload is created from a composable job (pod group) and multiple objects have labels with the same key from the list, the values of these labels must match; otherwise the workload creation fails. The labels are copied only during workload creation and are not updated even if the labels of the underlying job change.
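The copy rules for labelKeysToCopy can be sketched as below. The function labels_for_workload is a hypothetical name, and the sketch only models the documented behaviour at workload creation: missing keys are skipped and conflicting values across the objects of a pod group are an error.

```python
from typing import Dict, List


def labels_for_workload(objects_labels: List[Dict[str, str]],
                        label_keys_to_copy: List[str]) -> Dict[str, str]:
    """Collect the configured label keys from one job, or from every object of a pod group."""
    copied: Dict[str, str] = {}
    for labels in objects_labels:
        for key in label_keys_to_copy:
            if key not in labels:
                continue  # a missing label is simply not copied
            if key in copied and copied[key] != labels[key]:
                raise ValueError(f"conflicting values for label {key!r}")
            copied[key] = labels[key]
    return copied


# A single job that has only one of the two configured keys.
copied = labels_for_workload(
    [{"team": "ml", "env": "prod"}],
    ["team", "cost-center"],
)
```

Here copied contains only the team label; env is not in the list and cost-center is absent from the job.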

InternalCertManagement

Appears in: Configuration

Field | Description
enable [Required]
bool

Enable controls whether to enable internal cert management or not. Defaults to true. If you want to use third-party certificate management, e.g. cert-manager, set it to false. See the user guide for more information.

webhookServiceName [Required]
string

WebhookServiceName is the name of the Service used as part of the DNSName. Defaults to kueue-webhook-service.

webhookSecretName [Required]
string

WebhookSecretName is the name of the Secret used to store CA and server certs. Defaults to kueue-webhook-server-cert.

MultiKueue

Appears in: Configuration

Field | Description
gcInterval
k8s.io/apimachinery/pkg/apis/meta/v1.Duration

GCInterval defines the time interval between two consecutive garbage collection runs. Defaults to 1min. If 0, the garbage collection is disabled.

origin
string

Origin defines a label value used to track the creator of workloads in the worker clusters. MultiKueue uses it in components such as its garbage collector to identify remote objects that were created by this MultiKueue manager cluster and to delete them if their local counterpart no longer exists.

workerLostTimeout
k8s.io/apimachinery/pkg/apis/meta/v1.Duration

WorkerLostTimeout defines the time a local workload's multikueue admission check state is kept Ready if the connection with its reserving worker cluster is lost.

Defaults to 15 minutes.

PodIntegrationOptions

Appears in: Integrations

Field | Description
namespaceSelector [Required]
k8s.io/apimachinery/pkg/apis/meta/v1.LabelSelector

NamespaceSelector can be used to omit some namespaces from pod reconciliation.

podSelector [Required]
k8s.io/apimachinery/pkg/apis/meta/v1.LabelSelector

PodSelector can be used to choose which pods to reconcile.

PreemptionStrategy

(Alias of string)

Appears in: FairSharing

QueueVisibility

Appears in: Configuration

Field | Description
clusterQueues [Required]
ClusterQueueVisibility

ClusterQueues is configuration to expose the information about the top pending workloads in the cluster queue.

updateIntervalSeconds [Required]
int32

UpdateIntervalSeconds specifies the time interval for updates to the structure of the top pending workloads in the queues. The minimum value is 1. Defaults to 5.

RequeuingStrategy

Appears in: WaitForPodsReady

Field | Description
timestamp
RequeuingTimestamp

Timestamp defines the timestamp used for re-queuing a Workload that was evicted due to Pod readiness. The possible values are:

  • Eviction (default): the timestamp is taken from the Workload's Evicted condition with the PodsReadyTimeout reason.
  • Creation: the timestamp is taken from the Workload's .metadata.creationTimestamp.

backoffLimitCount
int32

BackoffLimitCount defines the maximum number of re-queuing retries. Once the number is reached, the workload is deactivated (.spec.active=false). When it is null, the workload is re-queued repeatedly without limit.

Every backoff duration is about "b*2^(n-1)+Rand" where:

  • "b" represents the base set by the "BackoffBaseSeconds" parameter,
  • "n" represents the "workloadStatus.requeueState.count",
  • "Rand" represents the random jitter.

During this time, the workload is treated as inadmissible and other workloads have a chance to be admitted. By default, the consecutive requeue delays are around: (60s, 120s, 240s, ...).

Defaults to null.

backoffBaseSeconds
int32

BackoffBaseSeconds defines the base for the exponential backoff for re-queuing an evicted workload.

Defaults to 60.

backoffMaxSeconds
int32

BackoffMaxSeconds defines the maximum backoff time to re-queue an evicted workload.

Defaults to 3600.
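The backoff formula b*2^(n-1)+Rand and the two defaults can be combined into a short worked example. This is a sketch under assumptions: the function name requeue_delay_seconds is hypothetical, jitter is left at zero so the numbers are reproducible, and capping the result at backoffMaxSeconds after adding the jitter is an assumed reading of "maximum backoff time".

```python
def requeue_delay_seconds(count: int, base_seconds: int = 60,
                          max_seconds: int = 3600, jitter: float = 0) -> float:
    """Approximate delay before the count-th requeue: b*2^(n-1)+Rand, capped at max_seconds."""
    if count < 1:
        raise ValueError("count starts at 1")
    # Assumption: the cap applies to the sum of the exponential term and the jitter.
    return min(base_seconds * 2 ** (count - 1) + jitter, max_seconds)


# Delays for the first seven requeues with the documented defaults and no jitter.
delays = [requeue_delay_seconds(n) for n in range(1, 8)]
```

With the defaults, the delay doubles from 60s until the seventh requeue, where 60*2^6 = 3840s exceeds the 3600s maximum and is capped.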

RequeuingTimestamp

(Alias of string)

Appears in: RequeuingStrategy

ResourceTransformation

Appears in: Resources

Field | Description
input [Required]
k8s.io/api/core/v1.ResourceName

Input is the name of the input resource.

strategy [Required]
ResourceTransformationStrategy

Strategy specifies whether the input resource should be replaced or retained. Defaults to Retain.

outputs [Required]
k8s.io/api/core/v1.ResourceList

Outputs specifies the output resources and quantities per unit of input resource. An empty Outputs combined with a Replace Strategy causes the Input resource to be ignored by Kueue.
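How input, strategy, and outputs interact can be sketched as follows. The function apply_transformations, the resource names example.com/gpu and example.com/credits, and the use of plain integers instead of Kubernetes quantities are all assumptions made for the illustration.

```python
from typing import Dict


def apply_transformations(requests: Dict[str, int],
                          transformations: Dict[str, dict]) -> Dict[str, int]:
    """Map PodSpec-style requests to workload requests; transformations are keyed by input."""
    result: Dict[str, int] = {}
    for name, quantity in requests.items():
        transformation = transformations.get(name)
        # Untransformed resources, and inputs with the Retain strategy, are kept.
        if transformation is None or transformation.get("strategy", "Retain") == "Retain":
            result[name] = result.get(name, 0) + quantity
        if transformation is not None:
            # Outputs are quantities per unit of the input resource.
            for out_name, per_unit in transformation.get("outputs", {}).items():
                result[out_name] = result.get(out_name, 0) + per_unit * quantity
    return result


requests = {"cpu": 2, "example.com/gpu": 3}
replaced = apply_transformations(
    requests,
    {"example.com/gpu": {"strategy": "Replace", "outputs": {"example.com/credits": 10}}},
)
retained = apply_transformations(
    requests,
    {"example.com/gpu": {"strategy": "Retain", "outputs": {"example.com/credits": 10}}},
)
ignored = apply_transformations(
    requests,
    {"example.com/gpu": {"strategy": "Replace", "outputs": {}}},
)
```

The last call shows the documented special case: Replace with empty outputs drops the input resource entirely.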

ResourceTransformationStrategy

(Alias of string)

Appears in: ResourceTransformation

Resources

Appears in: Configuration

Field | Description
excludeResourcePrefixes [Required]
[]string

ExcludeResourcePrefixes defines which resources should be ignored by Kueue.
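Prefix-based exclusion can be pictured as a simple filter over resource names. The helper drop_excluded and the resource name example.com/foo are hypothetical; the sketch assumes plain string-prefix matching, as the field name suggests.

```python
from typing import Dict, List


def drop_excluded(requests: Dict[str, str], exclude_resource_prefixes: List[str]) -> Dict[str, str]:
    """Keep only the resources whose name starts with none of the excluded prefixes."""
    return {
        name: quantity
        for name, quantity in requests.items()
        if not any(name.startswith(prefix) for prefix in exclude_resource_prefixes)
    }


kept = drop_excluded(
    {"cpu": "2", "memory": "4Gi", "example.com/foo": "1"},
    ["example.com/"],
)
```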

transformations [Required]
[]ResourceTransformation

Transformations defines how to transform PodSpec resources into Workload resource requests. This is intended to be a map with Input as the key (enforced by validation code).

WaitForPodsReady

Appears in: Configuration

WaitForPodsReady defines configuration for the Wait For Pods Ready feature, which is used to ensure that all Pods are ready within the specified time.

Field | Description
enable [Required]
bool

Enable indicates whether to enable wait for pods ready feature. Defaults to false.

timeout
k8s.io/apimachinery/pkg/apis/meta/v1.Duration

Timeout defines the time for an admitted workload to reach the PodsReady=true condition. When the timeout is exceeded, the workload is evicted and requeued in the same cluster queue. Defaults to 5min.

blockAdmission [Required]
bool

BlockAdmission, when true, makes the cluster queue block admissions for all subsequent jobs until the jobs reach the PodsReady=true condition. This setting is only honored when Enable is set to true.

requeuingStrategy
RequeuingStrategy

RequeuingStrategy defines the strategy for requeuing a Workload.


Last modified October 23, 2024: StatefulSet integration (#3001) (e8df54d8)