Run a RayCluster
This page shows how to leverage Kueue’s scheduling and resource management capabilities when running a RayCluster.
This guide is for batch users who have a basic understanding of Kueue. For more information, see Kueue’s overview.
Before you begin
Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.1.0 or newer.
Check Administer cluster quotas for details on the initial Kueue setup.
See KubeRay Installation for installation and configuration details of KubeRay.
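If the KubeRay operator is not installed yet, one common way to install it is with Helm. The commands below are a minimal sketch based on the KubeRay Helm chart; the release name and chart version are assumptions you should adjust to your environment:
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Install the KubeRay operator (this guide requires v1.1.0 or newer).
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.0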
Note
Prior to v0.8.1, in order to use RayCluster you need to restart Kueue after the installation. You can do it by running:
kubectl delete pods -l control-plane=controller-manager -n kueue-system
RayCluster definition
When running RayClusters on Kueue, take into consideration the following aspects:
a. Queue selection
The target local queue should be specified in the metadata.labels section of the RayCluster configuration.
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
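This assumes a LocalQueue named user-queue exists in the namespace where the RayCluster is created. A minimal sketch of such a queue is shown below; the ClusterQueue name cluster-queue is an assumption taken from a typical single-queue setup (see Administer cluster quotas):
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: user-queue
  namespace: default
spec:
  clusterQueue: cluster-queue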
b. Configure the resource needs
The resource needs of the workload can be configured in the spec.
spec:
  headGroupSpec:
    template:
      spec:
        containers:
          - resources:
              requests:
                cpu: "1"
  workerGroupSpecs:
    - template:
        spec:
          containers:
            - resources:
                requests:
                  cpu: "1"
Note that a RayCluster will hold resource quotas while it exists. For optimal resource management, you should delete a RayCluster that is no longer in use.
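For example, a cluster that is no longer needed can be deleted with kubectl so that its quota is released; the name below matches the example manifest later on this page:
kubectl delete raycluster raycluster-complete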
c. Limitations
- Limited Worker Groups: Because a Kueue workload can have a maximum of 8 PodSets, the maximum number of spec.workerGroupSpecs is 7.
- In-Tree Autoscaling Disabled: Kueue manages resource allocation for the RayCluster; therefore, the cluster’s internal autoscaling mechanisms need to be disabled (see the sketch after this list).
Example RayCluster
The RayCluster looks like the following:
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    controller-tools.k8s.io: "1.0"
  # A unique identifier for the head node and workers of this cluster.
  name: raycluster-complete
spec:
  rayVersion: '2.9.0'
  # Ray head pod configuration
  headGroupSpec:
    # Kubernetes Service Type. This is an optional field, and the default value is ClusterIP.
    # Refer to https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types.
    serviceType: ClusterIP
    # The `rayStartParams` are used to configure the `ray start` command.
    # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
    # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
    rayStartParams:
      dashboard-host: '0.0.0.0'
    # Pod template
    template:
      metadata:
        # Custom labels. NOTE: To avoid conflicts with the KubeRay operator, do not define custom labels that start with `raycluster`.
        # Refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
        labels: {}
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            ports:
              - containerPort: 6379
                name: gcs
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh","-c","ray stop"]
            volumeMounts:
              - mountPath: /tmp/ray
                name: ray-logs
            # The resource requests and limits in this config are too small for production!
            # For an example with more realistic resource configuration, see
            # ray-cluster.autoscaler.large.yaml.
            # It is better to use a few large Ray pods than many small ones.
            # For production, it is ideal to size each Ray pod to take up the
            # entire Kubernetes node on which it is scheduled.
            resources:
              limits:
                cpu: "1"
                memory: "2G"
              requests:
                # For production use-cases, we recommend specifying integer CPU requests and limits.
                # We also recommend setting requests equal to limits for both CPU and memory.
                cpu: "1"
                memory: "2G"
        volumes:
          - name: ray-logs
            emptyDir: {}
  workerGroupSpecs:
    # The pod replicas in this group are typed as workers.
    - replicas: 1
      minReplicas: 1
      maxReplicas: 10
      # Logical group name; in this example it is called small-group, but it can also be functional.
      groupName: small-group
      # If worker pods need to be added, we can increment the replicas.
      # If worker pods need to be removed, we decrement the replicas, and populate the workersToDelete list.
      # The operator will remove pods from the list until the desired number of replicas is satisfied.
      # If the difference between the current replica count and the desired replicas is greater than the
      # number of entries in workersToDelete, random worker pods will be deleted.
      #scaleStrategy:
      #  workersToDelete:
      #    - raycluster-complete-worker-small-group-bdtwh
      #    - raycluster-complete-worker-small-group-hv457
      #    - raycluster-complete-worker-small-group-k8tj7
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      # Pod template
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh","-c","ray stop"]
              # Optional: mount volumes into the worker container.
              # Refer to https://kubernetes.io/docs/concepts/storage/volumes/
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              # The resource requests and limits in this config are too small for production!
              # For an example with more realistic resource configuration, see
              # ray-cluster.autoscaler.large.yaml.
              # It is better to use a few large Ray pods than many small ones.
              # For production, it is ideal to size each Ray pod to take up the
              # entire Kubernetes node on which it is scheduled.
              resources:
                limits:
                  cpu: "1"
                  memory: "1G"
                # For production use-cases, we recommend specifying integer CPU requests and limits.
                # We also recommend setting requests equal to limits for both CPU and memory.
                requests:
                  cpu: "1"
                  # For production use-cases, we recommend allocating at least 8Gb memory for each Ray container.
                  memory: "1G"
          # Optional: define volumes shared by the pod's containers.
          # Refer to https://kubernetes.io/docs/concepts/storage/volumes/
          volumes:
            - name: ray-logs
              emptyDir: {}
You can submit a Ray Job using the CLI, or log into the Ray head and execute a job by following this example with a kind cluster.
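A minimal sketch of one way to do this from the command line, assuming the manifest above is saved as raycluster-complete.yaml and the Ray CLI is installed locally; the head service name follows KubeRay's <cluster-name>-head-svc convention:
# Create the RayCluster; Kueue admits it once quota is available.
kubectl apply -f raycluster-complete.yaml
# Inspect the Workload object that Kueue created for the cluster.
kubectl get workloads
# Forward the dashboard port of the head service and submit a job with the Ray CLI.
kubectl port-forward svc/raycluster-complete-head-svc 8265:8265
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"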