---
layout: global
title: Running Spark on Kubernetes
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
* This will become a table of contents (this text will be scraped).
{:toc}

Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the native
Kubernetes scheduler that has been added to Spark.

**The Kubernetes scheduler is currently experimental.
In future versions, there may be behavioral changes around configuration,
container images and entrypoints.**

# Security

Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
Please see [Spark Security](security.html) and the specific advice below before running Spark.

## User Identity

Images built from the project-provided Dockerfiles contain a default [`USER`](https://docs.docker.com/engine/reference/builder/#user) directive with a default UID of `185`.  This means that the resulting images will run the Spark processes as this UID inside the container. Security-conscious deployments should consider providing custom images with `USER` directives specifying their desired unprivileged UID and GID.  The resulting UID should include the root group in its supplementary groups in order to be able to run the Spark executables.  Users building their own images with the provided `docker-image-tool.sh` script can use the `-u <uid>` option to specify the desired UID, as in the sketch below.
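
For example, a minimal sketch of building and pushing images that run as a custom unprivileged UID (the repository `<repo>`, tag `my-tag`, and UID `1001` are placeholders, not project defaults):

```bash
# Build the Spark images so processes run as UID 1001 instead of the default 185
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag -u 1001 build
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
```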

Alternatively the [Pod Template](#pod-template) feature can be used to add a [Security Context](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#volumes-and-file-systems) with a `runAsUser` to the pods that Spark submits.  This can be used to override the `USER` directives in the images themselves.  Please bear in mind that this requires cooperation from your users and as such may not be a suitable solution for shared environments.  Cluster administrators should use [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups) if they wish to limit the users that pods may run as.

## Volume Mounts

As described later in this document under [Using Kubernetes Volumes](#using-kubernetes-volumes), Spark on K8S provides configuration options that allow for mounting certain volume types into the driver and executor pods.  In particular it allows for [`hostPath`](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) volumes which, as described in the Kubernetes documentation, have known security vulnerabilities.

Cluster administrators should use [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) to limit the ability to mount `hostPath` volumes appropriately for their environments.

# Prerequisites

* A runnable distribution of Spark 2.3 or above.
* A running Kubernetes cluster at version >= 1.6 with access configured to it using
[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not already have a working Kubernetes cluster,
you may set up a test cluster on your local machine using
[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
  * We recommend using the latest release of minikube with the DNS addon enabled.
  * Be aware that the default minikube configuration is not enough for running Spark applications.
  We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single
  executor.
* You must have appropriate permissions to list, create, edit and delete
[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
by running `kubectl auth can-i <list|create|edit|delete> pods`, as in the sketch after this list.
  * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster.
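
For example, a quick sketch of the permission checks referenced above:

```bash
$ kubectl auth can-i list pods
$ kubectl auth can-i create pods
$ kubectl auth can-i edit pods
$ kubectl auth can-i delete pods
```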

# How it works

<p style="text-align: center;">
  <img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
</p>

<code>spark-submit</code> can be directly used to submit a Spark application to a Kubernetes cluster.
The submission mechanism works as follows:

* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
* The driver creates executors which are also running within Kubernetes pods, connects to them, and executes application code.
* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.

Note that in the completed state, the driver pod does *not* use any computational or memory resources.

The driver and executor pod scheduling is handled by Kubernetes. Communication with the Kubernetes API is done via the fabric8 client. It is possible to schedule the
driver and executor pods on a subset of available nodes through a [node selector](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
using the `spark.kubernetes.node.selector.[labelKey]` configuration properties. It will be possible to use more advanced
scheduling hints like [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) in a future release.
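
For example, a minimal sketch (assuming a hypothetical `disktype=ssd` label on the desired nodes) that pins the driver and executor pods to those nodes:

```bash
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.node.selector.disktype=ssd \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar
```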

# Submitting Applications to Kubernetes

## Docker Images

Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
frequently used with Kubernetes. Spark (starting with version 2.3) ships with a Dockerfile that can be used for this
purpose, or customized to match an individual application's needs. It can be found in the `kubernetes/dockerfiles/`
directory.

Spark also ships with a `bin/docker-image-tool.sh` script that can be used to build and publish the Docker images to
use with the Kubernetes backend.

Example usage is:

```bash
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
```
This will build using the project's provided default `Dockerfiles`. To see more options available for customising the behaviour of this tool, including providing custom `Dockerfiles`, please run with the `-h` flag.

By default `bin/docker-image-tool.sh` builds the Docker image for running JVM jobs. You need to opt in to build additional
language binding Docker images.

Example usage is:
```bash
# To build additional PySpark docker image
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build

# To build additional SparkR docker image
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag -R ./kubernetes/dockerfiles/spark/bindings/R/Dockerfile build
```

## Cluster Mode

To launch Spark Pi in cluster mode,

```bash
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar
```

The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
`spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_host>:<api_server_port>`. The port must always be specified, even if it's the HTTPS port 443. Prefixing the
master string with `k8s://` will cause the Spark application to launch on the Kubernetes cluster, with the API server
being contacted at `api_server_host:api_server_port`. If no HTTP protocol is specified in the URL, it defaults to `https`. For example,
setting the master to `k8s://example.com:443` is equivalent to setting it to `k8s://https://example.com:443`, but to
connect without TLS on a different port, the master would be set to `k8s://http://example.com:8080`.
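
For example, a hedged sketch of these equivalences (`example.com` is a placeholder API server host; the remaining `spark-submit` arguments are the same as in the cluster-mode example above):

```bash
# Equivalent: the scheme defaults to https when omitted
$ ./bin/spark-submit --master k8s://example.com:443 --deploy-mode cluster ...
$ ./bin/spark-submit --master k8s://https://example.com:443 --deploy-mode cluster ...

# Plain HTTP on a non-standard port must be spelled out explicitly
$ ./bin/spark-submit --master k8s://http://example.com:8080 --deploy-mode cluster ...
```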

In Kubernetes mode, the Spark application name that is specified by `spark.app.name` or the `--name` argument to
`spark-submit` is used by default to name the Kubernetes resources created like drivers and executors. So, application names
must consist of lower case alphanumeric characters, `-`, and `.`  and must start and end with an alphanumeric character.

If you have a Kubernetes cluster setup, one way to discover the apiserver URL is by executing `kubectl cluster-info`.

```bash
$ kubectl cluster-info
Kubernetes master is running at http://127.0.0.1:6443
```

In the above example, the specific Kubernetes cluster can be used with <code>spark-submit</code> by specifying
`--master k8s://http://127.0.0.1:6443` as an argument to spark-submit. Additionally, it is also possible to use the
authenticating proxy, `kubectl proxy`, to communicate with the Kubernetes API.

The local proxy can be started by:

```bash
$ kubectl proxy
```

If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.1:8001` can be used as the argument to
spark-submit. Finally, notice that in the above example we specify a jar with a URI using the `local://` scheme.
This URI is the location of the example jar that is already in the Docker image.

## Client Mode

Starting with Spark 2.4.0, it is possible to run Spark applications on Kubernetes in client mode. When your application
runs in client mode, the driver can run inside a pod or on a physical host. When running an application in client mode,
it is recommended to account for the following factors:

### Client Mode Networking

Spark executors must be able to connect to the Spark driver over a hostname and a port that is routable from the Spark
executors. The specific network configuration that will be required for Spark to work in client mode will vary per
setup. If you run your driver inside a Kubernetes pod, you can use a
[headless service](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services) to allow your
driver pod to be routable from the executors by a stable hostname. When deploying your headless service, ensure that
the service's label selector will only match the driver pod and no other pods; it is recommended to assign your driver
pod a sufficiently unique label and to use that label in the label selector of the headless service. Specify the driver's
hostname via `spark.driver.host` and its port via `spark.driver.port`.
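
For example, a hedged sketch (assuming a hypothetical headless service `spark-driver-svc` in the `default` namespace that selects only the driver pod, and an arbitrary free port `7078`):

```bash
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode client \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.driver.host=spark-driver-svc.default.svc.cluster.local \
    --conf spark.driver.port=7078 \
    --class org.apache.spark.examples.SparkPi \
    /path/to/examples.jar
```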

### Client Mode Executor Pod Garbage Collection

If you run your Spark driver in a pod, it is highly recommended to set `spark.kubernetes.driver.pod.name` to the name of that pod.
When this property is set, the Spark scheduler will deploy the executor pods with an
[OwnerReference](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/), which in turn will
ensure that once the driver pod is deleted from the cluster, all of the application's executor pods will also be deleted.
The driver will look for a pod with the given name in the namespace specified by `spark.kubernetes.namespace`, and
an OwnerReference pointing to that pod will be added to each executor pod's OwnerReferences list. Be careful to avoid
setting the OwnerReference to a pod that is not actually that driver pod, or else the executors may be terminated
prematurely when the wrong pod is deleted.
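
Building on the client-mode sketch above, a hedged example would add the property as follows (assuming the driver runs in a pod named `spark-driver-pod`):

```bash
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode client \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.driver.pod.name=spark-driver-pod \
    --class org.apache.spark.examples.SparkPi \
    /path/to/examples.jar
```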

If your application is not running inside a pod, or if `spark.kubernetes.driver.pod.name` is not set when your application is
actually running in a pod, keep in mind that the executor pods may not be properly deleted from the cluster when the
application exits. The Spark scheduler attempts to delete these pods, but if the network request to the API server fails
for any reason, these pods will remain in the cluster. The executor processes should exit when they cannot reach the
driver, so the executor pods should not consume compute resources (cpu and memory) in the cluster after your application
exits.

### Authentication Parameters

Use the exact prefix `spark.kubernetes.authenticate` for Kubernetes authentication parameters in client mode.

## Dependency Management

If your application's dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to
by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images.
Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The `local://` scheme is also required when referring to
dependencies in custom-built Docker images in `spark-submit`. We support dependencies from the submission
client's local file system using the `file://` scheme or without a scheme (using a full path), where the destination should be a Hadoop-compatible filesystem.
A typical example of this, using S3, is passing the following options:

```
...
--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6
--conf spark.kubernetes.file.upload.path=s3a://<s3-bucket>/path
--conf spark.hadoop.fs.s3a.access.key=...
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
--conf spark.hadoop.fs.s3a.fast.upload=true
--conf spark.hadoop.fs.s3a.secret.key=....
--conf spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp
file:///full/path/to/app.jar
```
The app jar file will be uploaded to S3 and then, when the driver is launched, it will be downloaded
to the driver pod and added to its classpath. Spark will generate a subdir under the upload path with a random name
to avoid conflicts with Spark apps running in parallel. Users can manage the subdirs created according to their needs.

The client scheme is supported for the application jar, and dependencies specified by the properties `spark.jars` and `spark.files`.

Important: all client-side dependencies will be uploaded to the given path with a flat directory structure, so
file names must be unique otherwise files will be overwritten. Also make sure that the default Ivy directory in the
derived K8s image has the required access rights, or modify the settings as above. The latter is also important if you use `--packages` in
cluster mode.

## Secret Management
Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) can be used to provide credentials for a
Spark application to access secured services. To mount a user-specified secret into the driver container, users can use
the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=<mount path>`. Similarly, the
configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=<mount path>` can be used to mount a
user-specified secret into the executor containers. Note that it is assumed that the secret to be mounted is in the same
namespace as that of the driver and executor pods. For example, to mount a secret named `spark-secret` onto the path
`/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command:

```
--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
```

To use a secret through an environment variable use the following options to the `spark-submit` command:
```
--conf spark.kubernetes.driver.secretKeyRef.ENV_NAME=name:key
--conf spark.kubernetes.executor.secretKeyRef.ENV_NAME=name:key
```

## Pod Template
Kubernetes allows defining pods from [template files](https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/#pod-templates).
Spark users can similarly use template files to define the driver or executor pod configurations that Spark configurations do not support.
To do so, specify the Spark properties `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile`
to point to local files accessible to the `spark-submit` process. To allow the driver pod to access the executor pod template
file, the file will be automatically mounted onto a volume in the driver pod when it's created.
Spark does not do any validation after unmarshalling these template files and relies on the Kubernetes API server for validation.

It is important to note that Spark is opinionated about certain pod configurations so there are values in the
pod template that will always be overwritten by Spark. Therefore, users of this feature should note that specifying
the pod template file only lets Spark start with a template pod instead of an empty pod during the pod-building process.
For details, see the [full list](#pod-template-properties) of pod template values that will be overwritten by Spark.

Pod template files can also define multiple containers. In such cases, you can use the Spark properties
`spark.kubernetes.driver.podTemplateContainerName` and `spark.kubernetes.executor.podTemplateContainerName`
to indicate which container should be used as a basis for the driver or executor.
If not specified, or if the container name is not valid, Spark will assume that the first container in the list
will be the driver or executor container.
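
For example, a minimal sketch of pointing Spark at template files (the paths below are placeholders for files readable by the `spark-submit` process):

```bash
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml \
    --class org.apache.spark.examples.SparkPi \
    local:///path/to/examples.jar
```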

## Using Kubernetes Volumes

Starting with Spark 2.4.0, users can mount the following types of Kubernetes [volumes](https://kubernetes.io/docs/concepts/storage/volumes/) into the driver and executor pods:
* [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath): mounts a file or directory from the host node’s filesystem into a pod.
* [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir): an initially empty volume created when a pod is assigned to a node.
* [persistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim): used to mount a `PersistentVolume` into a pod.

**NB:** Please see the [Security](#security) section of this document for security issues related to volume mounts.

To mount a volume of any of the types above into the driver pod, use the following configuration properties:

```
--conf spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path=<mount path>
--conf spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly=<true|false>
--conf spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.subPath=<mount subPath>
```

Specifically, `VolumeType` can be one of the following values: `hostPath`, `emptyDir`, and `persistentVolumeClaim`. `VolumeName` is the name you want to use for the volume under the `volumes` field in the pod specification.

Each supported type of volume may have some specific configuration options, which can be specified using configuration properties of the following form:

```
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName]=<value>
```

For example, the claim name of a `persistentVolumeClaim` with volume name `checkpointpvc` can be specified using the following property:

```
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.claimName=check-point-pvc-claim
```

The configuration properties for mounting volumes into the executor pods use the prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volume, please refer to the [Spark Properties](#spark-properties) section below.

## Local Storage

Spark supports using volumes to spill data during shuffles and other operations. To use a volume as local storage, the volume's name should start with `spark-local-dir-`, for example:

```
--conf spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].mount.path=<mount path>
--conf spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].mount.readOnly=false
```

If no volume is set as local storage, Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `spark.local.dir` or the environment variable `SPARK_LOCAL_DIRS`.  If no directories are explicitly specified then a default directory is created and configured appropriately.

`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod.

### Using RAM for local storage

`emptyDir` volumes use the node's backing storage for ephemeral storage by default; this behaviour may not be appropriate for some compute environments.  For example if you have diskless nodes with remote storage mounted over a network, having lots of executors doing IO to this remote storage may actually degrade performance.

In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration, which will cause the `emptyDir` volumes to be configured as `tmpfs`, i.e. RAM-backed volumes.  When configured like this, Spark's local storage usage will count towards your pods' memory usage, therefore you may wish to increase your memory requests by increasing the value of `spark.kubernetes.memoryOverheadFactor` as appropriate.
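
As a hedged sketch, a submission using RAM-backed local storage and a larger memory overhead might look like this (the overhead factor `0.4` is purely illustrative, not a recommendation):

```bash
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.local.dirs.tmpfs=true \
    --conf spark.kubernetes.memoryOverheadFactor=0.4 \
    --class org.apache.spark.examples.SparkPi \
    local:///path/to/examples.jar
```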

## Introspection and Debugging

These are the different ways in which you can investigate a running/completed Spark application, monitor progress, and
take actions.

### Accessing Logs

Logs can be accessed using the Kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
to stream logs from the application using:

```bash
$ kubectl -n=<namespace> logs -f <driver-pod-name>
```

The same logs can also be accessed through the
[Kubernetes dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) if installed on
the cluster.

### Accessing Driver UI

The UI associated with any application can be accessed locally using
[`kubectl port-forward`](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/#forward-a-local-port-to-a-port-on-the-pod).

```bash
$ kubectl port-forward <driver-pod-name> 4040:4040
```

Then, the Spark driver UI can be accessed on `http://localhost:4040`.

### Debugging

There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
are errors during the running of the application, often, the best way to investigate may be through the Kubernetes CLI.

To get some basic information about the scheduling decisions made around the driver pod, you can run:

```bash
$ kubectl describe pod <spark-driver-pod>
```

If the pod has encountered a runtime error, the status can be probed further using:

```bash
$ kubectl logs <spark-driver-pod>
```

Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire Spark
application, including all executors, the associated service, etc. The driver pod can be thought of as the Kubernetes representation of
the Spark application.

## Kubernetes Features

### Configuration File

Your Kubernetes config file typically lives under `.kube/config` in your home directory or in a location specified by the `KUBECONFIG` environment variable.  Spark on Kubernetes will attempt to use this file to do an initial auto-configuration of the Kubernetes client used to interact with the Kubernetes cluster.  A variety of Spark configuration properties are provided that allow further customising the client configuration, e.g. using an alternative authentication method.

### Contexts

Kubernetes configuration files can contain multiple contexts that allow for switching between different clusters and/or user identities.  By default Spark on Kubernetes will use your current context (which can be checked by running `kubectl config current-context`) when doing the initial auto-configuration of the Kubernetes client.

In order to use an alternative context, users can specify the desired context via the Spark configuration property `spark.kubernetes.context`, e.g. `spark.kubernetes.context=minikube`.
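
For example, a short sketch that checks which context would be picked up by default and then submits against the `minikube` context explicitly (the remaining arguments follow the cluster-mode example above):

```bash
# Check which context would be used by default
$ kubectl config current-context

# Submit against the minikube context instead
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --conf spark.kubernetes.context=minikube \
    --conf spark.kubernetes.container.image=<spark-image> \
    --class org.apache.spark.examples.SparkPi \
    local:///path/to/examples.jar
```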

### Namespaces

Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
Namespaces are ways to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
use namespaces to launch Spark applications. This can be made use of through the `spark.kubernetes.namespace` configuration.
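
For example, a hedged sketch that creates a dedicated `spark` namespace and submits into it:

```bash
$ kubectl create namespace spark
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --conf spark.kubernetes.namespace=spark \
    --conf spark.kubernetes.container.image=<spark-image> \
    --class org.apache.spark.examples.SparkPi \
    local:///path/to/examples.jar
```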

Kubernetes allows using [ResourceQuota](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to set limits on
resources, number of objects, etc. on individual namespaces. Namespaces and ResourceQuota can be used in combination by
the administrator to control sharing and resource allocation in a Kubernetes cluster running Spark applications.

### RBAC

In Kubernetes clusters with [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/) enabled, users can configure
Kubernetes RBAC roles and service accounts used by the various Spark on Kubernetes components to access the Kubernetes
API server.

The Spark driver pod uses a Kubernetes service account to access the Kubernetes API server to create and watch executor
pods. The service account used by the driver pod must have the appropriate permission for the driver to be able to do
its work. Specifically, at minimum, the service account must be granted a
[`Role` or `ClusterRole`](https://kubernetes.io/docs/admin/authorization/rbac/#role-and-clusterrole) that allows driver
pods to create pods and services. By default, the driver pod is automatically assigned the `default` service account in
the namespace specified by `spark.kubernetes.namespace`, if no service account is specified when the pod gets created.

Depending on the version and setup of Kubernetes deployed, this `default` service account may or may not have the role
that allows driver pods to create pods and services under the default Kubernetes
[RBAC](https://kubernetes.io/docs/admin/authorization/rbac/) policies. Sometimes users may need to specify a custom
service account that has the right role granted. Spark on Kubernetes supports specifying a custom service account to
be used by the driver pod through the configuration property
`spark.kubernetes.authenticate.driver.serviceAccountName=<service account name>`. For example, to make the driver pod
use the `spark` service account, a user simply adds the following option to the `spark-submit` command:

```
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
```

To create a custom service account, a user can use the `kubectl create serviceaccount` command. For example, the
following command creates a service account named `spark`:

```bash
$ kubectl create serviceaccount spark
```

To grant a service account a `Role` or `ClusterRole`, a `RoleBinding` or `ClusterRoleBinding` is needed. To create
a `RoleBinding` or `ClusterRoleBinding`, a user can use the `kubectl create rolebinding` (or `clusterrolebinding`
for `ClusterRoleBinding`) command. For example, the following command creates a `ClusterRoleBinding` named `spark-role`
that grants the `edit` `ClusterRole` to the `spark` service account (in the `default` namespace) created above:

```bash
$ kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```

Note that a `Role` can only be used to grant access to resources (like pods) within a single namespace, whereas a
`ClusterRole` can be used to grant access to cluster-scoped resources (like nodes) as well as namespaced resources
(like pods) across all namespaces. For Spark on Kubernetes, since the driver always creates executor pods in the
same namespace, a `Role` is sufficient, although users may use a `ClusterRole` instead. For more information on
RBAC authorization and how to configure Kubernetes service accounts for pods, please refer to
[Using RBAC Authorization](https://kubernetes.io/docs/admin/authorization/rbac/) and
[Configure Service Accounts for Pods](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).

## Spark Application Management

In cluster mode, Spark provides simple application management via the spark-submit CLI tool.
Users can kill a job by providing the submission ID that is printed when submitting their job.
The submission ID follows the format ``namespace:driver-pod-name``.
If the user omits the namespace, the namespace set in the current k8s context is used.
For example, if the user has set a specific namespace as follows, `kubectl config set-context minikube --namespace=spark`,
then the `spark` namespace will be used by default. On the other hand, if there is no namespace added to the specific context
then all namespaces will be considered by default. That means operations will affect all Spark applications matching the given submission ID regardless of namespace.
Moreover, spark-submit for application management uses the same backend code that is used for submitting the driver, so the same properties,
like `spark.kubernetes.context` etc., can be re-used.

For example:
```bash
$ spark-submit --kill spark:spark-pi-1547948636094-driver --master k8s://https://192.168.2.8:8443
```
Users can also list the application status by using the `--status` flag:

```bash
$ spark-submit --status spark:spark-pi-1547948636094-driver --master k8s://https://192.168.2.8:8443
```
Both operations support glob patterns. For example, a user can run:
```bash
$ spark-submit --kill spark:spark-pi* --master k8s://https://192.168.2.8:8443
```
The above will kill all applications with the specified prefix.

Users can specify the grace period for pod termination via the `spark.kubernetes.appKillPodDeletionGracePeriod` property,
using `--conf` to provide it (the default value for all K8s pods is <a href="https://kubernetes.io/docs/concepts/workloads/pods/pod">30 seconds</a>).
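
For example, a sketch that kills the application above with a shorter, purely illustrative 5-second grace period:

```bash
$ spark-submit --kill spark:spark-pi-1547948636094-driver \
    --master k8s://https://192.168.2.8:8443 \
    --conf spark.kubernetes.appKillPodDeletionGracePeriod=5
```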

## Future Work

There are several Spark on Kubernetes features that are currently being worked on or planned to be worked on. Those features are expected to eventually make it into future versions of the spark-kubernetes integration.

Some of these include:

* Dynamic Resource Allocation and External Shuffle Service
* Job Queues and Resource Management

# Configuration

See the [configuration page](configuration.html) for information on Spark configurations.  The following configurations are specific to Spark on Kubernetes.

#### Spark Properties

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
<tr>
  <td><code>spark.kubernetes.context</code></td>
  <td><code>(none)</code></td>
  <td>
    The context from the user's Kubernetes configuration file used for the initial
    auto-configuration of the Kubernetes client library.  When not specified then
    the user's current context is used.  <strong>NB:</strong> Many of the
    auto-configured settings can be overridden by the use of other Spark
    configuration properties e.g. <code>spark.kubernetes.namespace</code>.
  </td>
  <td>3.0.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.master</code></td>
  <td><code>https://kubernetes.default.svc</code></td>
  <td>
    The internal Kubernetes master (API server) address to be used for the driver to request executors.
  </td>
  <td>3.0.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.namespace</code></td>
  <td><code>default</code></td>
  <td>
    The namespace that will be used for running the driver and executor pods.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.container.image</code></td>
  <td><code>(none)</code></td>
  <td>
    Container image to use for the Spark application.
    This is usually of the form <code>example.com/repo/spark:v1.0.0</code>.
    This configuration is required and must be provided by the user, unless explicit
    images are provided for each different container type.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.container.image</code></td>
  <td><code>(value of spark.kubernetes.container.image)</code></td>
  <td>
    Custom container image to use for the driver.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.container.image</code></td>
  <td><code>(value of spark.kubernetes.container.image)</code></td>
  <td>
    Custom container image to use for executors.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.container.image.pullPolicy</code></td>
  <td><code>IfNotPresent</code></td>
  <td>
    Container image pull policy used when pulling images within Kubernetes.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.container.image.pullSecrets</code></td>
  <td><code></code></td>
  <td>
    Comma separated list of Kubernetes secrets used to pull images from private image registries.
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.allocation.batch.size</code></td>
  <td><code>5</code></td>
  <td>
    Number of pods to launch at once in each round of executor pod allocation.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.allocation.batch.delay</code></td>
  <td><code>1s</code></td>
  <td>
    Time to wait between each round of executor pod allocation. Specifying values less than 1 second may lead to
    excessive CPU usage on the Spark driver.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.submission.caCertFile</code></td>
  <td>(none)</td>
  <td>
    Path to the CA cert file for connecting to the Kubernetes API server over TLS when starting the driver. This file
    must be located on the submitting machine's disk. Specify this as a path as opposed to a URI (i.e. do not provide
    a scheme). In client mode, use <code>spark.kubernetes.authenticate.caCertFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.submission.clientKeyFile</code></td>
  <td>(none)</td>
  <td>
    Path to the client key file for authenticating against the Kubernetes API server when starting the driver. This file
    must be located on the submitting machine's disk. Specify this as a path as opposed to a URI (i.e. do not provide
    a scheme). In client mode, use <code>spark.kubernetes.authenticate.clientKeyFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.submission.clientCertFile</code></td>
  <td>(none)</td>
  <td>
    Path to the client cert file for authenticating against the Kubernetes API server when starting the driver. This
    file must be located on the submitting machine's disk. Specify this as a path as opposed to a URI (i.e. do not
    provide a scheme). In client mode, use <code>spark.kubernetes.authenticate.clientCertFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.submission.oauthToken</code></td>
  <td>(none)</td>
  <td>
    OAuth token to use when authenticating against the Kubernetes API server when starting the driver. Note
    that unlike the other authentication options, this is expected to be the exact string value of the token to use for
    the authentication. In client mode, use <code>spark.kubernetes.authenticate.oauthToken</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.submission.oauthTokenFile</code></td>
  <td>(none)</td>
  <td>
    Path to the OAuth token file containing the token to use when authenticating against the Kubernetes API server when starting the driver.
    This file must be located on the submitting machine's disk. Specify this as a path as opposed to a URI (i.e. do not
    provide a scheme). In client mode, use <code>spark.kubernetes.authenticate.oauthTokenFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.caCertFile</code></td>
  <td>(none)</td>
  <td>
    Path to the CA cert file for connecting to the Kubernetes API server over TLS from the driver pod when requesting
    executors. This file must be located on the submitting machine's disk, and will be uploaded to the driver pod.
    Specify this as a path as opposed to a URI (i.e. do not provide a scheme). In client mode, use
    <code>spark.kubernetes.authenticate.caCertFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.clientKeyFile</code></td>
  <td>(none)</td>
  <td>
    Path to the client key file for authenticating against the Kubernetes API server from the driver pod when requesting
    executors. This file must be located on the submitting machine's disk, and will be uploaded to the driver pod as
    a Kubernetes secret. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
    In client mode, use <code>spark.kubernetes.authenticate.clientKeyFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.clientCertFile</code></td>
  <td>(none)</td>
  <td>
    Path to the client cert file for authenticating against the Kubernetes API server from the driver pod when
    requesting executors. This file must be located on the submitting machine's disk, and will be uploaded to the
    driver pod as a Kubernetes secret. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
    In client mode, use <code>spark.kubernetes.authenticate.clientCertFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.oauthToken</code></td>
  <td>(none)</td>
  <td>
    OAuth token to use when authenticating against the Kubernetes API server from the driver pod when
    requesting executors. Note that unlike the other authentication options, this must be the exact string value of
    the token to use for the authentication. This token value is uploaded to the driver pod as a Kubernetes secret.
    In client mode, use <code>spark.kubernetes.authenticate.oauthToken</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.oauthTokenFile</code></td>
  <td>(none)</td>
  <td>
    Path to the OAuth token file containing the token to use when authenticating against the Kubernetes API server from the driver pod when
    requesting executors. Note that unlike the other authentication options, this file must contain the exact string value of
    the token to use for the authentication. This token value is uploaded to the driver pod as a secret. In client mode, use
    <code>spark.kubernetes.authenticate.oauthTokenFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.mounted.caCertFile</code></td>
  <td>(none)</td>
  <td>
    Path to the CA cert file for connecting to the Kubernetes API server over TLS from the driver pod when requesting
    executors. This path must be accessible from the driver pod.
    Specify this as a path as opposed to a URI (i.e. do not provide a scheme). In client mode, use
    <code>spark.kubernetes.authenticate.caCertFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.mounted.clientKeyFile</code></td>
  <td>(none)</td>
  <td>
    Path to the client key file for authenticating against the Kubernetes API server from the driver pod when requesting
    executors. This path must be accessible from the driver pod.
    Specify this as a path as opposed to a URI (i.e. do not provide a scheme). In client mode, use
    <code>spark.kubernetes.authenticate.clientKeyFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.mounted.clientCertFile</code></td>
  <td>(none)</td>
  <td>
    Path to the client cert file for authenticating against the Kubernetes API server from the driver pod when
    requesting executors. This path must be accessible from the driver pod.
    Specify this as a path as opposed to a URI (i.e. do not provide a scheme). In client mode, use
    <code>spark.kubernetes.authenticate.clientCertFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.mounted.oauthTokenFile</code></td>
  <td>(none)</td>
  <td>
    Path to the file containing the OAuth token to use when authenticating against the Kubernetes API server from the driver pod when
    requesting executors. This path must be accessible from the driver pod.
    Note that unlike the other authentication options, this file must contain the exact string value of the token to use
    for the authentication. In client mode, use <code>spark.kubernetes.authenticate.oauthTokenFile</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.driver.serviceAccountName</code></td>
  <td><code>default</code></td>
  <td>
    Service account that is used when running the driver pod. The driver pod uses this service account when requesting
    executor pods from the API server. Note that this cannot be specified alongside a CA cert file, client key file,
    client cert file, and/or OAuth token. In client mode, use <code>spark.kubernetes.authenticate.serviceAccountName</code> instead.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.caCertFile</code></td>
  <td>(none)</td>
  <td>
    In client mode, path to the CA cert file for connecting to the Kubernetes API server over TLS when
    requesting executors. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.clientKeyFile</code></td>
  <td>(none)</td>
  <td>
    In client mode, path to the client key file for authenticating against the Kubernetes API server
    when requesting executors. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.clientCertFile</code></td>
  <td>(none)</td>
  <td>
    In client mode, path to the client cert file for authenticating against the Kubernetes API server
    when requesting executors. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.oauthToken</code></td>
  <td>(none)</td>
  <td>
    In client mode, the OAuth token to use when authenticating against the Kubernetes API server when
    requesting executors. Note that unlike the other authentication options, this must be the exact string value of
    the token to use for the authentication.
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.authenticate.oauthTokenFile</code></td>
  <td>(none)</td>
  <td>
    In client mode, path to the file containing the OAuth token to use when authenticating against the Kubernetes API
    server when requesting executors.
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.label.[LabelName]</code></td>
  <td>(none)</td>
  <td>
    Add the label specified by <code>LabelName</code> to the driver pod.
    For example, <code>spark.kubernetes.driver.label.something=true</code>.
    Note that Spark also adds its own labels to the driver pod
    for bookkeeping purposes.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.annotation.[AnnotationName]</code></td>
  <td>(none)</td>
  <td>
    Add the Kubernetes <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotation</a> specified by <code>AnnotationName</code> to the driver pod.
    For example, <code>spark.kubernetes.driver.annotation.something=true</code>.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.service.annotation.[AnnotationName]</code></td>
  <td>(none)</td>
  <td>
    Add the Kubernetes <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotation</a> specified by <code>AnnotationName</code> to the driver service.
    For example, <code>spark.kubernetes.driver.service.annotation.something=true</code>.
  </td>
  <td>3.0.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.label.[LabelName]</code></td>
  <td>(none)</td>
  <td>
    Add the label specified by <code>LabelName</code> to the executor pods.
    For example, <code>spark.kubernetes.executor.label.something=true</code>.
    Note that Spark also adds its own labels to the executor pod
    for bookkeeping purposes.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.annotation.[AnnotationName]</code></td>
  <td>(none)</td>
  <td>
    Add the Kubernetes <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotation</a> specified by <code>AnnotationName</code> to the executor pods.
    For example, <code>spark.kubernetes.executor.annotation.something=true</code>.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.pod.name</code></td>
  <td>(none)</td>
  <td>
    Name of the driver pod. In cluster mode, if this is not set, the driver pod name is set to "spark.app.name"
    suffixed by the current timestamp to avoid name conflicts. In client mode, if your application is running
    inside a pod, it is highly recommended to set this to the name of the pod your driver is running in. Setting this
    value in client mode allows the driver to become the owner of its executor pods, which in turn allows the executor
    pods to be garbage collected by the cluster.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.lostCheck.maxAttempts</code></td>
  <td><code>10</code></td>
  <td>
    Number of times that the driver will try to ascertain the loss reason for a specific executor.
    The loss reason is used to ascertain whether the executor failure is due to a framework or an application error
    which in turn decides whether the executor is removed and replaced, or placed into a failed state for debugging.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.submission.waitAppCompletion</code></td>
  <td><code>true</code></td>
  <td>
    In cluster mode, whether to wait for the application to finish before exiting the launcher process.  When changed to
    false, the launcher has a "fire-and-forget" behavior when launching the Spark job.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.report.interval</code></td>
  <td><code>1s</code></td>
  <td>
    Interval between reports of the current Spark job status in cluster mode.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.request.cores</code></td>
  <td>(none)</td>
  <td>
    Specify the cpu request for the driver pod. Values conform to the Kubernetes <a href="https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu">convention</a>.
    Example values include 0.1, 500m, 1.5, 5, etc., with the definition of cpu units documented in <a href="https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units">CPU units</a>.
    This takes precedence over <code>spark.driver.cores</code> for specifying the driver pod cpu request if set.
  </td>
  <td>3.0.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.limit.cores</code></td>
  <td>(none)</td>
  <td>
    Specify a hard cpu <a href="https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container">limit</a> for the driver pod.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.request.cores</code></td>
  <td>(none)</td>
  <td>
    Specify the cpu request for each executor pod. Values conform to the Kubernetes <a href="https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu">convention</a>.
    Example values include 0.1, 500m, 1.5, 5, etc., with the definition of cpu units documented in <a href="https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units">CPU units</a>.
    This is distinct from <code>spark.executor.cores</code>: it is only used, and takes precedence over <code>spark.executor.cores</code>, for specifying the executor pod cpu request if set. Task
    parallelism, e.g. the number of tasks an executor can run concurrently, is not affected by this.
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.limit.cores</code></td>
  <td>(none)</td>
  <td>
    Specify a hard cpu <a href="https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container">limit</a> for each executor pod launched for the Spark Application.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
  <td>(none)</td>
  <td>
    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
  <td>(none)</td>
  <td>
    Add the environment variable specified by <code>EnvironmentVariableName</code> to
    the Driver process. The user can specify multiple of these to set multiple environment variables.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
  <td>(none)</td>
  <td>
   Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the driver pod on the path specified in the value. For example,
   <code>spark.kubernetes.driver.secrets.spark-secret=/etc/secrets</code>.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
  <td>(none)</td>
  <td>
   Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the executor pod on the path specified in the value. For example,
   <code>spark.kubernetes.executor.secrets.spark-secret=/etc/secrets</code>.
  </td>
  <td>2.3.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.secretKeyRef.[EnvName]</code></td>
  <td>(none)</td>
  <td>
   Add as an environment variable to the driver container with name <code>EnvName</code> (case sensitive), the value referenced by key <code>key</code> in the data of the referenced <a href="https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables">Kubernetes Secret</a>. For example,
   <code>spark.kubernetes.driver.secretKeyRef.ENV_VAR=spark-secret:key</code>.
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.secretKeyRef.[EnvName]</code></td>
  <td>(none)</td>
  <td>
   Add as an environment variable to the executor container with name <code>EnvName</code> (case sensitive), the value referenced by key <code>key</code> in the data of the referenced <a href="https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables">Kubernetes Secret</a>. For example,
   <code>spark.kubernetes.executor.secretKeyRef.ENV_VAR=spark-secret:key</code>.
  </td>
  <td>2.4.0</td>
</tr>
0972 <tr>
0973   <td><code>spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path</code></td>
0974   <td>(none)</td>
0975   <td>
0976    Add the <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> named <code>VolumeName</code> of the <code>VolumeType</code> type to the driver pod on the path specified in the value. For example,
0977    <code>spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.path=/checkpoint</code>.
0978   </td>
0979   <td>2.4.0</td>
0980 </tr>
0981 <tr>
0982   <td><code>spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.subPath</code></td>
0983   <td>(none)</td>
0984   <td>
0985    Specifies a <a href="https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath">subpath</a> to be mounted from the volume into the driver pod. For example,
0986    <code>spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.subPath=checkpoint</code>.
0987   </td>
0988   <td>3.0.0</td>
0989 </tr>
0990 <tr>
0991   <td><code>spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly</code></td>
0992   <td>(none)</td>
0993   <td>
0994    Specify if the mounted volume is read only or not. For example,
0995    <code>spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.readOnly=false</code>.
0996   </td>
0997   <td>2.4.0</td>
0998 </tr>
0999 <tr>
1000   <td><code>spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName]</code></td>
1001   <td>(none)</td>
1002   <td>
1003    Configure <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> options passed to Kubernetes with <code>OptionName</code> as the key and the configured value as the value; option names must conform to the Kubernetes option format. For example,
1004    <code>spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.claimName=spark-pvc-claim</code>.
1005   </td>
1006   <td>2.4.0</td>
1007 </tr>
1008 <tr>
1009   <td><code>spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.path</code></td>
1010   <td>(none)</td>
1011   <td>
1012    Add the <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> named <code>VolumeName</code> of the <code>VolumeType</code> type to the executor pod on the path specified in the value. For example,
1013    <code>spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpointpvc.mount.path=/checkpoint</code>.
1014   </td>
1015   <td>2.4.0</td>
1016 </tr>
1017 <tr>
1018   <td><code>spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath</code></td>
1019   <td>(none)</td>
1020   <td>
1021    Specifies a <a href="https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath">subpath</a> to be mounted from the volume into the executor pod. For example,
1022    <code>spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpointpvc.mount.subPath=checkpoint</code>.
1023   </td>
1024   <td>3.0.0</td>
1025 </tr>
1026 <tr>
1027   <td><code>spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.readOnly</code></td>
1028   <td><code>false</code></td>
1029   <td>
1030    Specify if the mounted volume is read only or not. For example,
1031    <code>spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpointpvc.mount.readOnly=false</code>.
1032   </td>
1033   <td>2.4.0</td>
1034 </tr>
1035 <tr>
1036   <td><code>spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].options.[OptionName]</code></td>
1037   <td>(none)</td>
1038   <td>
1039    Configure <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> options passed to Kubernetes with <code>OptionName</code> as the key and the configured value as the value. For example,
1040    <code>spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpointpvc.options.claimName=spark-pvc-claim</code>.
1041   </td>
1042   <td>2.4.0</td>
1043 </tr>
1044 <tr>
1045   <td><code>spark.kubernetes.local.dirs.tmpfs</code></td>
1046   <td><code>false</code></td>
1047   <td>
1048    Configure the <code>emptyDir</code> volumes used to back <code>SPARK_LOCAL_DIRS</code> within the Spark driver and executor pods to use <code>tmpfs</code> backing, i.e. RAM. See <a href="#local-storage">Local Storage</a> earlier on this page
1049    for more discussion of this.
1050   </td>
1051   <td>3.0.0</td>
1052 </tr>
1053 <tr>
1054   <td><code>spark.kubernetes.memoryOverheadFactor</code></td>
1055   <td><code>0.1</code></td>
1056   <td>
1057     This sets the Memory Overhead Factor that will allocate memory to non-JVM memory, which includes off-heap memory allocations, non-JVM tasks, and various system processes. This value defaults to 0.10 for JVM-based jobs and to 0.40 for non-JVM jobs.
1058     This is done because non-JVM tasks need more non-JVM heap space and such tasks commonly fail with "Memory Overhead Exceeded" errors. The higher default preempts this error.
1059   </td>
1060   <td>2.4.0</td>
1061 </tr>
1062 <tr>
1063   <td><code>spark.kubernetes.pyspark.pythonVersion</code></td>
1064   <td><code>"3"</code></td>
1065   <td>
1066    This sets the major Python version of the Docker image used to run the driver and executor containers. It can be either 2 or 3.
1067   </td>
1068   <td>2.4.0</td>
1069 </tr>
1070 <tr>
1071   <td><code>spark.kubernetes.kerberos.krb5.path</code></td>
1072   <td><code>(none)</code></td>
1073   <td>
1074    Specify the local location of the krb5.conf file to be mounted on the driver and executors for Kerberos interaction.
1075    It is important to note that the KDC defined needs to be visible from inside the containers.
1076   </td>
1077   <td>3.0.0</td>
1078 </tr>
1079 <tr>
1080   <td><code>spark.kubernetes.kerberos.krb5.configMapName</code></td>
1081   <td><code>(none)</code></td>
1082   <td>
1083    Specify the name of the ConfigMap, containing the krb5.conf file, to be mounted on the driver and executors
1084    for Kerberos interaction. The KDC defined needs to be visible from inside the containers. The ConfigMap must also
1085    be in the same namespace as the driver and executor pods.
1086   </td>
1087   <td>3.0.0</td>
1088 </tr>
1089 <tr>
1090   <td><code>spark.kubernetes.hadoop.configMapName</code></td>
1091   <td><code>(none)</code></td>
1092   <td>
1093     Specify the name of the ConfigMap, containing the HADOOP_CONF_DIR files, to be mounted on the driver
1094     and executors for custom Hadoop configuration.
1095   </td>
1096   <td>3.0.0</td>
1097 </tr>
1098 <tr>
1099   <td><code>spark.kubernetes.kerberos.tokenSecret.name</code></td>
1100   <td><code>(none)</code></td>
1101   <td>
1102     Specify the name of the secret where your existing delegation tokens are stored. This removes the need for the job user
1103     to provide any kerberos credentials for launching a job.
1104   </td>
1105   <td>3.0.0</td>
1106 </tr>
1107 <tr>
1108   <td><code>spark.kubernetes.kerberos.tokenSecret.itemKey</code></td>
1109   <td><code>(none)</code></td>
1110   <td>
1111     Specify the item key of the data where your existing delegation tokens are stored. This removes the need for the job user
1112     to provide any kerberos credentials for launching a job.
1113   </td>
1114   <td>3.0.0</td>
1115 </tr>
1116 <tr>
1117   <td><code>spark.kubernetes.driver.podTemplateFile</code></td>
1118   <td>(none)</td>
1119   <td>
1120    Specify the local file that contains the driver <a href="#pod-template">pod template</a>. For example
1121    <code>spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml</code>
1122   </td>
1123   <td>3.0.0</td>
1124 </tr>
1125 <tr>
1126   <td><code>spark.kubernetes.driver.podTemplateContainerName</code></td>
1127   <td>(none)</td>
1128   <td>
1129    Specify the container name to be used as a basis for the driver in the given <a href="#pod-template">pod template</a>.
1130    For example <code>spark.kubernetes.driver.podTemplateContainerName=spark-driver</code>
1131   </td>
1132   <td>3.0.0</td>
1133 </tr>
1134 <tr>
1135   <td><code>spark.kubernetes.executor.podTemplateFile</code></td>
1136   <td>(none)</td>
1137   <td>
1138    Specify the local file that contains the executor <a href="#pod-template">pod template</a>. For example
1139    <code>spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml</code>
1140   </td>
1141   <td>3.0.0</td>
1142 </tr>
1143 <tr>
1144   <td><code>spark.kubernetes.executor.podTemplateContainerName</code></td>
1145   <td>(none)</td>
1146   <td>
1147    Specify the container name to be used as a basis for the executor in the given <a href="#pod-template">pod template</a>.
1148    For example <code>spark.kubernetes.executor.podTemplateContainerName=spark-executor</code>
1149   </td>
1150   <td>3.0.0</td>
1151 </tr>
1152 <tr>
1153   <td><code>spark.kubernetes.executor.deleteOnTermination</code></td>
1154   <td>true</td>
1155   <td>
1156   Specify whether executor pods should be deleted in case of failure or normal termination.
1157   </td>
1158   <td>3.0.0</td>
1159 </tr>
1160 <tr>
1161   <td><code>spark.kubernetes.submission.connectionTimeout</code></td>
1162   <td>10000</td>
1163   <td>
1164     Connection timeout in milliseconds for the kubernetes client to use for starting the driver.
1165   </td>
1166   <td>3.0.0</td>
1167 </tr>
1168 <tr>
1169   <td><code>spark.kubernetes.submission.requestTimeout</code></td>
1170   <td>10000</td>
1171   <td>
1172     Request timeout in milliseconds for the kubernetes client to use for starting the driver.
1173   </td>
1174   <td>3.0.0</td>
1175 </tr>
1176 <tr>
1177   <td><code>spark.kubernetes.driver.connectionTimeout</code></td>
1178   <td>10000</td>
1179   <td>
1180     Connection timeout in milliseconds for the kubernetes client in the driver to use when requesting executors.
1181   </td>
1182   <td>3.0.0</td>
1183 </tr>
1184 <tr>
1185   <td><code>spark.kubernetes.driver.requestTimeout</code></td>
1186   <td>10000</td>
1187   <td>
1188     Request timeout in milliseconds for the kubernetes client in the driver to use when requesting executors.
1189   </td>
1190   <td>3.0.0</td>
1191 </tr>
1192 <tr>  
1193   <td><code>spark.kubernetes.appKillPodDeletionGracePeriod</code></td>
1194   <td>(none)</td>
1195   <td>
1196   Specify the grace period in seconds when deleting a Spark application using spark-submit.
1197   </td>
1198   <td>3.0.0</td>
1199 </tr>
1200 <tr>
1201   <td><code>spark.kubernetes.file.upload.path</code></td>
1202   <td>(none)</td>
1203   <td>
1204     Path to store files at the spark-submit side in cluster mode. For example:
1205     <code>spark.kubernetes.file.upload.path=s3a://&lt;s3-bucket&gt;/path</code>
1206     Files should be specified as <code>file://path/to/file</code> or as an absolute path.
1207   </td>
1208   <td>3.0.0</td>
1209 </tr>
1210 </table>
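
As an illustration of how several of the properties above combine in practice, the following sketch shows a single `spark-submit` invocation that sets a node selector, a driver environment variable, secret mounts and a persistent volume claim. The API server address, image name, secret name, claim name and label value are placeholders for this example, not values required by Spark.

```bash
# Illustrative only: the API server address, image, secret, claim and label
# values below are placeholders and must be replaced with your own.
#   spark.kubernetes.node.selector.*   pins driver and executor pods to matching nodes
#   spark.kubernetes.driverEnv.*       injects an environment variable into the driver
#   spark.kubernetes.*.secrets.*       mounts the Secret "spark-secret" at /etc/secrets
#   spark.kubernetes.driver.volumes.*  mounts an existing PersistentVolumeClaim at /checkpoint
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.node.selector.identifier=myIdentifier \
    --conf spark.kubernetes.driverEnv.MY_SETTING=some-value \
    --conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets \
    --conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.path=/checkpoint \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.claimName=spark-pvc-claim \
    local:///path/to/examples.jar
```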
1211 
1212 #### Pod template properties
1213 
1214 See the tables below for the full list of pod specification values that will be overwritten by Spark.
1215 
1216 ### Pod Metadata
1217 
1218 <table class="table">
1219 <tr><th>Pod metadata key</th><th>Modified value</th><th>Description</th></tr>
1220 <tr>
1221   <td>name</td>
1222   <td>Value of <code>spark.kubernetes.driver.pod.name</code></td>
1223   <td>
1224     The driver pod name will be overwritten with either the configured or default value of
1225     <code>spark.kubernetes.driver.pod.name</code>. The executor pod names will be unaffected.
1226   </td>
1227 </tr>
1228 <tr>
1229   <td>namespace</td>
1230   <td>Value of <code>spark.kubernetes.namespace</code></td>
1231   <td>
1232     Spark makes strong assumptions about the driver and executor namespaces. Both driver and executor namespaces will
1233     be replaced by either the configured or default spark conf value.
1234   </td>
1235 </tr>
1236 <tr>
1237   <td>labels</td>
1238   <td>Adds the labels from <code>spark.kubernetes.{driver,executor}.label.*</code></td>
1239   <td>
1240     Spark will add additional labels specified by the spark configuration.
1241   </td>
1242 </tr>
1243 <tr>
1244   <td>annotations</td>
1245   <td>Adds the annotations from <code>spark.kubernetes.{driver,executor}.annotation.*</code></td>
1246   <td>
1247     Spark will add additional annotations specified by the spark configuration.
1248   </td>
1249 </tr>
1250 </table>
1251 
1252 ### Pod Spec
1253 
1254 <table class="table">
1255 <tr><th>Pod spec key</th><th>Modified value</th><th>Description</th></tr>
1256 <tr>
1257   <td>imagePullSecrets</td>
1258   <td>Adds image pull secrets from <code>spark.kubernetes.container.image.pullSecrets</code></td>
1259   <td>
1260     Additional pull secrets will be added from the spark configuration to both the driver and executor pods.
1261   </td>
1262 </tr>
1263 <tr>
1264   <td>nodeSelector</td>
1265   <td>Adds node selectors from <code>spark.kubernetes.node.selector.*</code></td>
1266   <td>
1267     Additional node selectors will be added from the spark configuration to both the driver and executor pods.
1268   </td>
1269 </tr>
1270 <tr>
1271   <td>restartPolicy</td>
1272   <td><code>"Never"</code></td>
1273   <td>
1274     Spark assumes that both drivers and executors never restart.
1275   </td>
1276 </tr>
1277 <tr>
1278   <td>serviceAccount</td>
1279   <td>Value of <code>spark.kubernetes.authenticate.driver.serviceAccountName</code></td>
1280   <td>
1281     Spark will override <code>serviceAccount</code> with the value of the spark configuration for only
1282     driver pods, and only if the spark configuration is specified. Executor pods will remain unaffected.
1283   </td>
1284 </tr>
1285 <tr>
1286   <td>serviceAccountName</td>
1287   <td>Value of <code>spark.kubernetes.authenticate.driver.serviceAccountName</code></td>
1288   <td>
1289     Spark will override <code>serviceAccountName</code> with the value of the spark configuration for only
1290     driver pods, and only if the spark configuration is specified. Executor pods will remain unaffected.
1291   </td>
1292 </tr>
1293 <tr>
1294   <td>volumes</td>
1295   <td>Adds volumes from <code>spark.kubernetes.{driver,executor}.volumes.[VolumeType].[VolumeName].mount.path</code></td>
1296   <td>
1297     Spark will add volumes as specified by the spark conf, as well as additional volumes necessary for passing
1298     spark conf and pod template files.
1299   </td>
1300 </tr>
1301 </table>
1302 
1303 ### Container spec
1304 
1305 The following affect the driver and executor containers. All other containers in the pod spec will be unaffected.
1306 
1307 <table class="table">
1308 <tr><th>Container spec key</th><th>Modified value</th><th>Description</th></tr>
1309 <tr>
1310   <td>env</td>
1311   <td>Adds env variables from <code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
1312   <td>
1313     Spark will add driver env variables from <code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code>, and
1314     executor env variables from <code>spark.executorEnv.[EnvironmentVariableName]</code>.
1315   </td>
1316 </tr>
1317 <tr>
1318   <td>image</td>
1319   <td>Value of <code>spark.kubernetes.{driver,executor}.container.image</code></td>
1320   <td>
1321     The image will be defined by the spark configurations.
1322   </td>
1323 </tr>
1324 <tr>
1325   <td>imagePullPolicy</td>
1326   <td>Value of <code>spark.kubernetes.container.image.pullPolicy</code></td>
1327   <td>
1328     Spark will override the pull policy for both driver and executors.
1329   </td>
1330 </tr>
1331 <tr>
1332   <td>name</td>
1333   <td>See description</td>
1334   <td>
1335     The container name will be assigned by spark ("spark-kubernetes-driver" for the driver container, and
1336     "executor" for each executor container) if not defined by the pod template. If the container is defined by the
1337     template, the template's name will be used.
1338   </td>
1339 </tr>
1340 <tr>
1341   <td>resources</td>
1342   <td>See description</td>
1343   <td>
1344     The cpu limits are set by <code>spark.kubernetes.{driver,executor}.limit.cores</code>. The cpu is set by
1345     <code>spark.{driver,executor}.cores</code>. The memory request and limit are set by summing the values of
1346     <code>spark.{driver,executor}.memory</code> and <code>spark.{driver,executor}.memoryOverhead</code>.
1347     Other resource limits are set by <code>spark.{driver,executor}.resources.{resourceName}.*</code> configs.
1348   </td>
1349 </tr>
1350 <tr>
1351   <td>volumeMounts</td>
1352   <td>Add volumes from <code>spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.{path,readOnly}</code></td>
1353   <td>
1354     Spark will add volumes as specified by the spark conf, as well as additional volumes necessary for passing
1355     spark conf and pod template files.
1356   </td>
1357 </tr>
1358 </table>
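
To make the tables above concrete, here is a minimal, hypothetical driver pod template; every name and value in it is illustrative. The extra label and node selector would be kept and merged with the Spark-provided ones, while the container's name is only used to locate it (via `spark.kubernetes.driver.podTemplateContainerName`) and its cpu/memory resources would be replaced by the Spark-derived values, as described above. An executor template would be handled analogously through the corresponding `spark.kubernetes.executor.podTemplate*` properties.

```bash
# Write a hypothetical driver pod template; pass its path via
# spark.kubernetes.driver.podTemplateFile and the container name via
# spark.kubernetes.driver.podTemplateContainerName.
cat > /path/to/driver-pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  labels:
    my-team: data-platform   # kept, in addition to spark.kubernetes.driver.label.* labels
spec:
  nodeSelector:
    disktype: ssd            # kept, in addition to spark.kubernetes.node.selector.* selectors
  containers:
    - name: spark-driver     # matched by spark.kubernetes.driver.podTemplateContainerName
      resources:
        limits:
          memory: 2Gi        # overwritten by the Spark-derived memory request/limit
EOF
```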
1359 
1360 ### Resource Allocation and Configuration Overview
1361 
1362 Please make sure you have read the Custom Resource Scheduling and Configuration Overview section on the [configuration page](configuration.html). This section only covers the Kubernetes-specific aspects of resource scheduling.
1363 
1364 The user is responsible for properly configuring the Kubernetes cluster to have the resources available and, ideally, for isolating each resource per container so that a resource is not shared between multiple containers. If the resource is not isolated, the user is responsible for writing a discovery script so that the resource is not shared between containers. See the Kubernetes documentation for specifics on configuring Kubernetes with [custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/).
1365 
1366 Spark automatically handles translating the Spark configs <code>spark.{driver/executor}.resource.{resourceType}</code> into the Kubernetes configs as long as the Kubernetes resource type follows the Kubernetes device plugin format of `vendor-domain/resourcetype`. The user must specify the vendor using the <code>spark.{driver/executor}.resource.{resourceType}.vendor</code> config. The user does not need to explicitly add anything when using Pod templates. For reference and an example, you can see the Kubernetes documentation for scheduling [GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/). Spark only supports setting the resource limits.
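
As a hedged sketch of what this looks like for GPUs (the amount, the `nvidia.com` vendor string, and the in-container script path are illustrative; use whatever your cluster's device plugin and images provide):

```bash
# Request one GPU per executor. Spark translates this into a Kubernetes
# resource limit of nvidia.com/gpu: 1 on each executor pod; the discovery
# script path assumes the script has been baked into the executor image.
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-gpu-app \
    --class <main-class> \
    --conf spark.kubernetes.container.image=<spark-image-with-gpu-support> \
    --conf spark.executor.resource.gpu.amount=1 \
    --conf spark.executor.resource.gpu.vendor=nvidia.com \
    --conf spark.executor.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh \
    local:///path/to/app.jar
```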
1367 
1368 Kubernetes does not tell Spark the addresses of the resources allocated to each container. For that reason, the user must specify a discovery script that gets run by the executor on startup to discover what resources are available to that executor. You can find an example script in `examples/src/main/scripts/getGpusResources.sh`. The script must have execute permissions set, and the user should set up permissions so that malicious users cannot modify it. The script should write to STDOUT a JSON string in the format of the ResourceInformation class, which has the resource name and an array of resource addresses available to just that executor.
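
For reference, a minimal discovery script of this shape could look like the sketch below. The JSON structure (a resource name plus an array of addresses) is what Spark requires; the `nvidia-smi` query used to enumerate GPU indices is just one possible way to produce it and assumes that tool is available inside the executor container.

```bash
#!/usr/bin/env bash
# Hypothetical GPU discovery script: print a single ResourceInformation-style
# JSON object to STDOUT, e.g. {"name": "gpu", "addresses": ["0","1"]}.
ADDRS=$(nvidia-smi --query-gpu=index --format=csv,noheader \
          | sed -e 's/^/"/' -e 's/$/"/' | paste -sd, -)
echo "{\"name\": \"gpu\", \"addresses\": [${ADDRS}]}"
```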
1369