Run a SparkApplication

Run a Kueue scheduled SparkApplication
Feature state: alpha since Kueue v0.17

This page shows how to leverage Kueue’s scheduling and resource management capabilities when running a Spark Operator SparkApplication.

This guide is for batch users who have a basic understanding of Kueue. For more information, see Kueue’s overview.

Before you begin

Enable the SparkApplication integration in Kueue. You can modify the Kueue configuration of an installed release to include sparkoperator.k8s.io/sparkapplication in the list of enabled workload frameworks.
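A minimal sketch of the relevant part of the Kueue Configuration, assuming the rest of your configuration stays as installed (the other framework entries shown are illustrative; keep whichever ones your cluster already enables):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "batch/job"                                # existing frameworks (example)
  - "sparkoperator.k8s.io/sparkapplication"    # enable the SparkApplication integration
```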

Enable the SparkApplicationIntegration feature gate. Check the installation guide for details on feature gate configuration.
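If you install Kueue with Helm, the feature gate can be enabled through chart values. This is a sketch of a values.yaml fragment; the exact chart layout may differ by release, so check the installation guide:

```yaml
# values.yaml fragment for the Kueue Helm chart (assumed chart layout)
controllerManager:
  featureGates:
  - name: SparkApplicationIntegration
    enabled: true
```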

Check administer cluster quotas for details on the initial cluster setup.

Check the Spark Operator installation guide.
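For reference, a typical Helm-based installation of the Spark Operator looks like the following; the repository URL and chart name are taken from the Spark Operator project and may change, so prefer the linked installation guide:

```shell
# Install the Spark Operator via its Helm chart
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace
```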

SparkApplication definition

a. Queue selection

The target local queue should be specified in the metadata.labels section of the SparkApplication configuration.

metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue

b. (Optional) Set the suspend field in the SparkApplication

spec:
  suspend: true

By default, Kueue’s webhook sets suspend to true on creation, and unsuspends the SparkApplication once it is admitted.
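You can confirm this behavior on a live cluster; these commands assume the sample application name spark-pi used below:

```shell
# Right after creation, the webhook should have set spec.suspend to true
kubectl get sparkapplication spark-pi -o jsonpath='{.spec.suspend}'

# The corresponding Workload object shows whether Kueue has admitted it
kubectl get workloads
```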

Sample SparkApplication

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  type: Scala
  mode: cluster                 # spark-operator supports "cluster" mode only
  sparkVersion: 4.0.0
  image: spark:4.0.0
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  arguments:
  - "50000"
  memoryOverheadFactor: "0"     # spark adds extra memory on memory limits
                                # for non-JVM tasks. 0 can avoid it.
  driver:
    coreRequest: "1"
    memory: 1g                  # In Java format (e.g. 512m, 2g)
    serviceAccount: spark       # You need to create this service account beforehand,
                                # and the service account should have proper role
                                # ref: https://github.com/kubeflow/spark-operator/blob/master/config/rbac/spark-application-rbac.yaml
  executor:
    instances: 2
    coreRequest: "1"
    memory: 1g                  # In Java format (e.g. 512m, 2g)
    deleteOnTermination: false  # to keep terminated executor pods for demo purpose
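Assuming the sample above is saved as spark-pi.yaml, you can apply it and watch Kueue admit and run it (the jsonpath below reads the v1beta2 application state):

```shell
kubectl apply -f spark-pi.yaml

# Workload created by Kueue for the SparkApplication
kubectl get workloads

# Application state reported by the Spark Operator
kubectl get sparkapplication spark-pi \
  -o jsonpath='{.status.applicationState.state}'
```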