Run MPIJobs in Multi-Cluster
Before you begin
Check the MultiKueue installation guide to learn how to properly set up MultiKueue clusters.
This setup requires at least Kueue v0.9.0 and at least MPI Operator v0.6.0.
Installation on the Clusters
Note
When both MPI Operator and Training Operator must run on the same cluster, special steps have to be applied to the Training Operator deployment. See Working alongside MPI Operator for more details.
See MPI Operator Installation for installation and configuration details of MPI Operator.
MultiKueue integration
Once the setup is complete, you can test it by running the MPIJob sample sample-mpijob.yaml.
Note
Kueue defaults the spec.runPolicy.managedBy field to kueue.x-k8s.io/multikueue on the management cluster for MPIJob.
This allows the MPI Operator to ignore the Jobs managed by MultiKueue on the management cluster, and in particular skip Pod creation.
The Pods are created, and the actual computation happens, on the mirror copy of the Job on the selected worker cluster. The mirror copy of the Job does not have the field set.
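As an illustration of the defaulting described above, the following fragment sketches how an MPIJob might look on the management cluster once Kueue has set the field. It is not the content of sample-mpijob.yaml: the Job name and the LocalQueue name user-queue are hypothetical, and the replica specs are left out.

```yaml
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: example-mpijob                      # hypothetical name
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # assumed LocalQueue name
spec:
  runPolicy:
    # Defaulted by Kueue on the management cluster; with this value the
    # MPI Operator there skips the Job and creates no Pods for it.
    managedBy: kueue.x-k8s.io/multikueue
  # mpiReplicaSpecs (Launcher and Worker) omitted from this sketch
```

On the worker cluster, the mirror copy would carry the same spec without the managedBy entry, so the MPI Operator running there reconciles it as usual.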