Setting up a MultiKueue Environment

Additional steps needed to set up the MultiKueue clusters.

This tutorial explains how you can configure a manager cluster and one worker cluster to run JobSets and batch/Jobs in a MultiKueue environment.

Check the concepts section for a MultiKueue overview.

Let's assume that your manager cluster is named manager-cluster and your worker cluster is named worker1-cluster. To follow this tutorial, ensure that the credentials for all these clusters are present in the kubeconfig on your local machine. Check the kubectl documentation to learn more about how to configure access to multiple clusters.
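
Before proceeding, you can quickly confirm that both contexts are reachable (an optional check, assuming your context names match the cluster names above):

# List the contexts available in your local kubeconfig.
kubectl config get-contexts

# Query each cluster's API server to verify connectivity.
kubectl --context manager-cluster version
kubectl --context worker1-cluster version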

In the Worker Cluster

When MultiKueue dispatches a workload from the manager cluster to a worker cluster, it expects that the Job's namespace and LocalQueue also exist in the worker cluster. In other words, you should make sure that the worker cluster configuration mirrors the manager cluster's in terms of namespaces and LocalQueues.

To create a sample queue setup in the default namespace, you can apply the following manifest:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"

MultiKueue Specific Kubeconfig

In order to delegate Jobs to a worker cluster, the manager cluster needs to be able to create, delete, and watch workloads and their parent Jobs there.

While kubectl is set up to use the worker cluster, save the following script as create-multikueue-kubeconfig.sh:

#!/bin/bash

# Copyright 2024 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -o errexit
set -o nounset
set -o pipefail

KUBECONFIG_OUT=${1:-kubeconfig}
MULTIKUEUE_SA=multikueue-sa
NAMESPACE=kueue-system

# Create a restricted MultiKueue role, service account and role binding.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${MULTIKUEUE_SA}
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ${MULTIKUEUE_SA}-role
rules:
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs/status
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - jobset.x-k8s.io
  resources:
  - jobsets
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - jobset.x-k8s.io
  resources:
  - jobsets/status
  verbs:
  - get
- apiGroups:
  - kueue.x-k8s.io
  resources:
  - workloads
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - kueue.x-k8s.io
  resources:
  - workloads/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - kubeflow.org
  resources:
  - tfjobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - kubeflow.org
  resources:
  - tfjobs/status
  verbs:
  - get
- apiGroups:
  - kubeflow.org
  resources:
  - paddlejobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - kubeflow.org
  resources:
  - paddlejobs/status
  verbs:
  - get
- apiGroups:
  - kubeflow.org
  resources:
  - pytorchjobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - kubeflow.org
  resources:
  - pytorchjobs/status
  verbs:
  - get
- apiGroups:
  - kubeflow.org
  resources:
  - xgboostjobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - kubeflow.org
  resources:
  - xgboostjobs/status
  verbs:
  - get
- apiGroups:
  - kubeflow.org
  resources:
  - mpijobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - kubeflow.org
  resources:
  - mpijobs/status
  verbs:
  - get
- apiGroups:
  - ray.io
  resources:
  - rayjobs
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - ray.io
  resources:
  - rayjobs/status
  verbs:
  - get
- apiGroups:
  - ray.io
  resources:
  - rayclusters
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - ray.io
  resources:
  - rayclusters/status
  verbs:
  - get
- apiGroups:
  - workload.codeflare.dev
  resources:
  - appwrappers
  verbs:
  - create
  - delete
  - get
  - list
  - watch
- apiGroups:
  - workload.codeflare.dev
  resources:
  - appwrappers/status
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${MULTIKUEUE_SA}-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ${MULTIKUEUE_SA}-role
subjects:
- kind: ServiceAccount
  name: ${MULTIKUEUE_SA}
  namespace: ${NAMESPACE}
EOF

# Get or create a secret bound to the new service account.
SA_SECRET_NAME=$(kubectl get -n ${NAMESPACE} sa/${MULTIKUEUE_SA} -o "jsonpath={.secrets[0]..name}")
if [ -z "$SA_SECRET_NAME" ]
then
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: ${MULTIKUEUE_SA}
  namespace: ${NAMESPACE}
  annotations:
    kubernetes.io/service-account.name: "${MULTIKUEUE_SA}"
EOF

SA_SECRET_NAME=${MULTIKUEUE_SA}
fi

# Note: service account token is stored base64-encoded in the secret but must
# be plaintext in kubeconfig.
SA_TOKEN=$(kubectl get -n ${NAMESPACE} "secrets/${SA_SECRET_NAME}" -o "jsonpath={.data['token']}" | base64 -d)
CA_CERT=$(kubectl get -n ${NAMESPACE} "secrets/${SA_SECRET_NAME}" -o "jsonpath={.data['ca\.crt']}")

# Extract cluster IP from the current context
CURRENT_CONTEXT=$(kubectl config current-context)
CURRENT_CLUSTER=$(kubectl config view -o jsonpath="{.contexts[?(@.name == \"${CURRENT_CONTEXT}\")].context.cluster}")
CURRENT_CLUSTER_ADDR=$(kubectl config view -o jsonpath="{.clusters[?(@.name == \"${CURRENT_CLUSTER}\")].cluster.server}")

# Create the Kubeconfig file
echo "Writing kubeconfig in ${KUBECONFIG_OUT}"
cat > "${KUBECONFIG_OUT}" <<EOF
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${CA_CERT}
    server: ${CURRENT_CLUSTER_ADDR}
  name: ${CURRENT_CLUSTER}
contexts:
- context:
    cluster: ${CURRENT_CLUSTER}
    user: ${CURRENT_CLUSTER}-${MULTIKUEUE_SA}
  name: ${CURRENT_CONTEXT}
current-context: ${CURRENT_CONTEXT}
kind: Config
preferences: {}
users:
- name: ${CURRENT_CLUSTER}-${MULTIKUEUE_SA}
  user:
    token: ${SA_TOKEN}
EOF

Then run:

chmod +x create-multikueue-kubeconfig.sh
./create-multikueue-kubeconfig.sh worker1.kubeconfig

This creates a kubeconfig that can be used in the manager cluster to delegate Jobs to the current worker cluster.
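
As an optional sanity check (not part of the tutorial itself), you can verify that the generated kubeconfig grants the restricted service account the expected permissions; each command should print "yes":

# Can the manager-facing service account manage batch Jobs and Kueue Workloads?
kubectl --kubeconfig=worker1.kubeconfig auth can-i create jobs.batch -n default
kubectl --kubeconfig=worker1.kubeconfig auth can-i create workloads.kueue.x-k8s.io -n default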

Kubeflow Installation

Install the Kubeflow Trainer in the worker cluster (see the Kubeflow Trainer installation guide for more details). Please use version v1.7.0 or newer for MultiKueue support.
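
For reference, at the time of writing the standalone Training Operator manifests could be applied with a command along these lines; treat it as a sketch and check the Kubeflow documentation for the current instructions:

# Install the Kubeflow Training Operator v1.7.0 into the worker cluster.
kubectl --context worker1-cluster apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.7.0"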

In the Manager Cluster

CRD Installation

For the installation of CRDs compatible with MultiKueue, see the dedicated page here.

Create the Worker's Kubeconfig Secret

For the next example, with the worker1 cluster's kubeconfig stored in a file called worker1.kubeconfig, you can create the worker1-secret secret by running the following command:

kubectl create secret generic worker1-secret -n kueue-system --from-file=kubeconfig=worker1.kubeconfig

For details on kubeconfig generation, check the worker cluster section above.
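
You can confirm that the secret holds the expected key (an optional check, run against the manager cluster):

# The Data section of the output should list a "kubeconfig" key.
kubectl describe secret worker1-secret -n kueue-system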

Create a Sample Setup

Apply the following configuration to create a sample setup in which Jobs submitted to the ClusterQueue cluster-queue are delegated to the worker cluster worker1:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
  admissionChecks:
  - sample-multikueue
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-test
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: multikueue-test
spec:
  clusters:
  - multikueue-test-worker1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: multikueue-test-worker1
spec:
  kubeConfig:
    locationType: Secret
    location: worker1-secret
    # A secret called "worker1-secret" should be created in the namespace where the
    # kueue controller manager runs, holding the kubeconfig needed to connect to the
    # worker cluster under the "kubeconfig" key.
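
As with the worker setup, assuming the configuration above is saved as multikueue-setup.yaml (a hypothetical file name), apply it against the manager cluster:

# Create the MultiKueue sample setup in the manager cluster.
kubectl --context manager-cluster apply -f multikueue-setup.yaml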

Upon successful configuration, the created ClusterQueue, AdmissionCheck, and MultiKueueCluster will become active.

Run:

kubectl get clusterqueues cluster-queue -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}CQ - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"
kubectl get admissionchecks sample-multikueue -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}AC - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"
kubectl get multikueuecluster multikueue-test-worker1 -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}MC - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"

And expect an output like:

CQ - Active: True Reason: Ready Message: Can admit new workloads
AC - Active: True Reason: Active Message: The admission check is active
MC - Active: True Reason: Active Message: Connected

(Optional) Set up MultiKueue with Open Cluster Management

Open Cluster Management (OCM) is a community-driven project focused on multicluster and multicloud scenarios for Kubernetes applications. It provides a robust, modular, and extensible framework that helps other open source projects orchestrate, schedule, and manage workloads across multiple clusters.

The integration with OCM is an optional solution that enables Kueue users to streamline the MultiKueue setup process, automate the generation of MultiKueue specific kubeconfigs, and enhance multicluster scheduling capabilities. For more details on this solution, refer to this link.