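Run a RayService
A guide to running a RayService on Kueue.
This page shows how to use Kueue's scheduling and resource management capabilities when running a RayService.
Kueue manages a RayService through the RayCluster that is created for it.
The RayService therefore needs the label kueue.x-k8s.io/queue-name: user-queue in metadata.labels. This label is propagated to the corresponding RayCluster, which triggers Kueue's management.
This guide is for serving users who have a basic understanding of Kueue. For more information, see the Kueue overview.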
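Before you begin
Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.3.0 or newer.
See Administer cluster quotas for details on the initial Kueue setup.
See the KubeRay installation instructions for details on installing and configuring KubeRay.
Note
RayService is managed by Kueue through RayCluster. Prior to v0.8.1, you need to restart Kueue after the installation to be able to use RayCluster. You can do so by running kubectl delete pods -l control-plane=controller-manager -n kueue-system.
RayService definition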
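When running a RayService on Kueue, take the following aspects into consideration:
a. Queue selection
The target local queue should be specified in the metadata.labels section of the RayService configuration. This label is propagated to its RayCluster.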
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
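For context, the label value must name a LocalQueue that exists in the same namespace as the RayService. The following is a minimal sketch of such a queue, not part of the original guide. The ClusterQueue name cluster-queue is a hypothetical placeholder, and the apiVersion may differ depending on your Kueue release.

```yaml
# Sketch of a LocalQueue matching the label value used above.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default        # same namespace as the RayService
  name: user-queue          # value of the kueue.x-k8s.io/queue-name label
spec:
  clusterQueue: cluster-queue   # assumption: a ClusterQueue with this name exists
```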
b. Configure the resource needs
The resource needs of the workload can be configured in spec.rayClusterConfig.
spec:
  rayClusterConfig:
    headGroupSpec:
      template:
        spec:
          containers:
            - resources:
                requests:
                  cpu: "1"
    workerGroupSpecs:
      - template:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "1"
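Kueue admits the workload only if the requests above fit into the quota of the ClusterQueue behind the local queue. As a rough illustration, and not a recommended configuration, a single-flavor quota setup might look like the sketch below. The names default-flavor and cluster-queue and the quota values are assumptions chosen for this example; see Administer cluster quotas for the authoritative setup.

```yaml
# Sketch: one ResourceFlavor and one ClusterQueue with CPU and memory quota.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor      # hypothetical flavor name
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue       # hypothetical ClusterQueue name
spec:
  namespaceSelector: {}     # accept workloads from all namespaces
  resourceGroups:
    - coveredResources: ["cpu", "memory"]
      flavors:
        - name: default-flavor
          resources:
            - name: cpu
              nominalQuota: 9       # illustrative value
            - name: memory
              nominalQuota: 36Gi    # illustrative value
```

The quota should cover every resource that the head and worker Pods request, summed over all replicas.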
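c. Limitations
- Limited worker groups: because a Kueue workload can have at most 8 PodSets, and the head group takes one of them, the maximum number of spec.rayClusterConfig.workerGroupSpecs is 7.
- In-tree autoscaling disabled: Kueue manages resource allocation for the RayService, so the cluster's internal autoscaling mechanisms need to be disabled.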
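The autoscaling limitation can be made explicit in the manifest. The fragment below is a sketch that assumes the enableInTreeAutoscaling field of the KubeRay RayCluster spec is what controls the in-tree autoscaler in your KubeRay version. Leaving the field unset has the same effect as setting it to false.

```yaml
# Sketch: keep KubeRay's in-tree autoscaler off so that Kueue
# remains the only component deciding resource allocation.
spec:
  rayClusterConfig:
    enableInTreeAutoscaling: false   # assumption: field name as in the KubeRay RayCluster API
```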
Example RayService
The RayService looks like the following:
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: test-rayservice
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  # serveConfigV2 takes a yaml multi-line scalar, which should be a Ray Serve multi-application config. See https://docs.ray.io/en/latest/serve/multi-app.html.
  serveConfigV2: |
    applications:
      - name: fruit_app
        import_path: fruit.deployment_graph
        route_prefix: /fruit
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: MangoStand
            num_replicas: 2
            max_replicas_per_node: 1
            user_config:
              price: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: OrangeStand
            num_replicas: 1
            user_config:
              price: 2
            ray_actor_options:
              num_cpus: 0.1
          - name: PearStand
            num_replicas: 1
            user_config:
              price: 1
            ray_actor_options:
              num_cpus: 0.1
          - name: FruitMarket
            num_replicas: 1
            ray_actor_options:
              num_cpus: 0.1
      - name: math_app
        import_path: conditional_dag.serve_dag
        route_prefix: /calc
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: Adder
            num_replicas: 1
            user_config:
              increment: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: Multiplier
            num_replicas: 1
            user_config:
              factor: 5
            ray_actor_options:
              num_cpus: 0.1
          - name: Router
            num_replicas: 1
  rayClusterConfig:
    rayVersion: '2.46.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.46.0
              resources:
                limits:
                  cpu: 4
                  memory: 6Gi
                requests:
                  cpu: 2
                  memory: 4Gi
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        #pod template
        template:
          spec:
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc'
                image: rayproject/ray:2.46.0
                resources:
                  limits:
                    cpu: "2"
                    memory: "4Gi"
                  requests:
                    cpu: "1"
                    memory: "2Gi"