0001 ---
0002 layout: global
0003 title: Running Spark on Mesos
0004 license: |
0005 Licensed to the Apache Software Foundation (ASF) under one or more
0006 contributor license agreements. See the NOTICE file distributed with
0007 this work for additional information regarding copyright ownership.
0008 The ASF licenses this file to You under the Apache License, Version 2.0
0009 (the "License"); you may not use this file except in compliance with
0010 the License. You may obtain a copy of the License at
0011
0012 http://www.apache.org/licenses/LICENSE-2.0
0013
0014 Unless required by applicable law or agreed to in writing, software
0015 distributed under the License is distributed on an "AS IS" BASIS,
0016 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0017 See the License for the specific language governing permissions and
0018 limitations under the License.
0019 ---
0020 * This will become a table of contents (this text will be scraped).
0021 {:toc}
0022
0023 Spark can run on hardware clusters managed by [Apache Mesos](http://mesos.apache.org/).
0024
0025 The advantages of deploying Spark with Mesos include:
0026
0027 - dynamic partitioning between Spark and other
0028 [frameworks](https://mesos.apache.org/documentation/latest/frameworks/)
0029 - scalable partitioning between multiple instances of Spark
0030
0031 # Security
0032
0033 Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
0034 Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
0035
0036 # How it Works
0037
0038 In a standalone cluster deployment, the cluster manager in the below diagram is a Spark master
0039 instance. When using Mesos, the Mesos master replaces the Spark master as the cluster manager.
0040
0041 <p style="text-align: center;">
0042 <img src="img/cluster-overview.png" title="Spark cluster components" alt="Spark cluster components" />
0043 </p>
0044
0045 Now when a driver creates a job and starts issuing tasks for scheduling, Mesos determines what
0046 machines handle what tasks. Because it takes into account other frameworks when scheduling these
0047 many short-lived tasks, multiple frameworks can coexist on the same cluster without resorting to a
0048 static partitioning of resources.
0049
0050 To get started, follow the steps below to install Mesos and deploy Spark jobs via Mesos.
0051
0052
0053 # Installing Mesos
0054
0055 Spark {{site.SPARK_VERSION}} is designed for use with Mesos {{site.MESOS_VERSION}} or newer and does not
0056 require any special patches of Mesos. File and environment-based secrets support requires Mesos 1.3.0 or
0057 newer.
0058
0059 If you already have a Mesos cluster running, you can skip this Mesos installation step.
0060
0061 Otherwise, installing Mesos for Spark is no different than installing Mesos for use by other
0062 frameworks. You can install Mesos either from source or using prebuilt packages.
0063
0064 ## From Source
0065
0066 To install Apache Mesos from source, follow these steps:
0067
0068 1. Download a Mesos release from a
0069 [mirror](http://www.apache.org/dyn/closer.lua/mesos/{{site.MESOS_VERSION}}/)
0070 2. Follow the Mesos [Getting Started](http://mesos.apache.org/getting-started) page for compiling and
0071 installing Mesos
0072
0073 **Note:** If you want to run Mesos without installing it into the default paths on your system
0074 (e.g., if you lack administrative privileges to install it), pass the
0075 `--prefix` option to `configure` to tell it where to install. For example, pass
0076 `--prefix=/home/me/mesos`. By default the prefix is `/usr/local`.
0077
0078 ## Third-Party Packages
0079
0080 The Apache Mesos project only publishes source releases, not binary packages. But other
0081 third party projects publish binary releases that may be helpful in setting Mesos up.
0082
0083 One of those is Mesosphere. To install Mesos using the binary releases provided by Mesosphere:
0084
1. Download the Mesos installation package from the [downloads page](https://open.mesosphere.com/downloads/mesos/)
0086 2. Follow their instructions for installation and configuration
0087
0088 The Mesosphere installation documents suggest setting up ZooKeeper to handle Mesos master failover,
0089 but Mesos can be run without ZooKeeper using a single master as well.
0090
0091 ## Verification
0092
To verify that the Mesos cluster is ready for Spark, navigate to the Mesos master web UI at port
`:5050` and confirm that all expected machines are present in the slaves tab.
0095
0096
0097 # Connecting Spark to Mesos
0098
0099 To use Mesos from Spark, you need a Spark binary package available in a place accessible by Mesos, and
0100 a Spark driver program configured to connect to Mesos.
0101
Alternatively, you can also install Spark in the same location on all the Mesos slaves, and configure
`spark.mesos.executor.home` (defaults to `SPARK_HOME`) to point to that location.
0104
0105 ## Authenticating to Mesos
0106
0107 When Mesos Framework authentication is enabled it is necessary to provide a principal and secret by which to authenticate Spark to Mesos. Each Spark job will register with Mesos as a separate framework.
0108
0109 Depending on your deployment environment you may wish to create a single set of framework credentials that are shared across all users or create framework credentials for each user. Creating and managing framework credentials should be done following the Mesos [Authentication documentation](http://mesos.apache.org/documentation/latest/authentication/).
0110
Framework credentials may be specified in a variety of ways depending on your deployment environment and security requirements. The simplest way is to specify the `spark.mesos.principal` and `spark.mesos.secret` values directly in your Spark configuration. Alternatively you may specify these values indirectly by instead setting `spark.mesos.principal.file` and `spark.mesos.secret.file`; these settings point to files containing the principal and secret. These files must be plaintext files in UTF-8 encoding. Combined with appropriate file ownership and mode/ACLs this provides a more secure way to specify these credentials.
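For the file-based approach, a minimal sketch (the paths, principal, and secret values below are illustrative, not prescribed by Spark):

```shell
# Store the framework principal and secret in files readable only by the
# submitting user. Paths and values here are illustrative.
mkdir -p "$HOME/.spark-mesos"
printf '%s' 'sparkuser' > "$HOME/.spark-mesos/principal"
printf '%s' 's3cret'    > "$HOME/.spark-mesos/secret"
chmod 600 "$HOME/.spark-mesos/principal" "$HOME/.spark-mesos/secret"

# Then point Spark at the files, e.g. in conf/spark-defaults.conf:
#   spark.mesos.principal.file  /home/me/.spark-mesos/principal
#   spark.mesos.secret.file     /home/me/.spark-mesos/secret
```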
0112
Additionally, if you prefer to use environment variables you can specify all of the above via environment variables instead; the environment variable names are simply the configuration settings uppercased, with `.` replaced by `_`, e.g. `SPARK_MESOS_PRINCIPAL`.
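The mapping is mechanical; as a sketch, the environment variable name can be derived from any of these configuration keys like so:

```shell
# Uppercase the configuration key and replace '.' with '_' to get the
# corresponding environment variable name.
cfg_key="spark.mesos.principal.file"
env_name=$(printf '%s' "$cfg_key" | tr 'a-z.' 'A-Z_')
echo "$env_name"   # SPARK_MESOS_PRINCIPAL_FILE
```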
0114
0115 ### Credential Specification Preference Order
0116
Please note that if you specify credentials in multiple ways then the following preference order applies. Spark will use the first valid value found and ignore any subsequent values:
0118
0119 - `spark.mesos.principal` configuration setting
0120 - `SPARK_MESOS_PRINCIPAL` environment variable
0121 - `spark.mesos.principal.file` configuration setting
0122 - `SPARK_MESOS_PRINCIPAL_FILE` environment variable
0123
0124 An equivalent order applies for the secret. Essentially we prefer the configuration to be specified directly rather than indirectly by files, and we prefer that configuration settings are used over environment variables.
0125
### Deploying to a Mesos Cluster Running on Secure Sockets
0127
If you want to deploy a Spark application into a Mesos cluster that is running in secure mode, there are some environment variables that need to be set.
0129
- `LIBPROCESS_SSL_ENABLED=true` enables SSL communication
- `LIBPROCESS_SSL_VERIFY_CERT=false` disables verification of the SSL certificate (set it to `true` to require verification)
- `LIBPROCESS_SSL_KEY_FILE=pathToKeyFile.key` the path to the key file
- `LIBPROCESS_SSL_CERT_FILE=pathToCRTFile.crt` the certificate file to be used
0134
All options can be found in the [Mesos SSL documentation](http://mesos.apache.org/documentation/latest/ssl/).
0136
Submission then happens as described in client mode or cluster mode below.
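Put together, a client-side environment for a secure cluster might look like the following sketch (the key and certificate paths are placeholders for your own files):

```shell
# Enable SSL for libprocess communication; file paths are placeholders.
export LIBPROCESS_SSL_ENABLED=true
export LIBPROCESS_SSL_VERIFY_CERT=false
export LIBPROCESS_SSL_KEY_FILE=/etc/ssl/private/spark-mesos.key
export LIBPROCESS_SSL_CERT_FILE=/etc/ssl/certs/spark-mesos.crt

# Submission then proceeds as usual, e.g.:
#   ./bin/spark-shell --master mesos://host:5050
```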
0138
0139 ## Uploading Spark Package
0140
0141 When Mesos runs a task on a Mesos slave for the first time, that slave must have a Spark binary
0142 package for running the Spark Mesos executor backend.
0143 The Spark package can be hosted at any Hadoop-accessible URI, including HTTP via `http://`,
0144 [Amazon Simple Storage Service](http://aws.amazon.com/s3) via `s3n://`, or HDFS via `hdfs://`.
0145
0146 To use a precompiled package:
0147
0148 1. Download a Spark binary package from the Spark [download page](https://spark.apache.org/downloads.html)
0149 2. Upload to hdfs/http/s3
0150
0151 To host on HDFS, use the Hadoop fs put command: `hadoop fs -put spark-{{site.SPARK_VERSION}}.tar.gz
0152 /path/to/spark-{{site.SPARK_VERSION}}.tar.gz`
0153
0154
0155 Or if you are using a custom-compiled version of Spark, you will need to create a package using
0156 the `dev/make-distribution.sh` script included in a Spark source tarball/checkout.
0157
0158 1. Download and build Spark using the instructions [here](index.html)
0159 2. Create a binary package using `./dev/make-distribution.sh --tgz`.
0160 3. Upload archive to http/s3/hdfs
0161
0162
0163 ## Using a Mesos Master URL
0164
0165 The Master URLs for Mesos are in the form `mesos://host:5050` for a single-master Mesos
0166 cluster, or `mesos://zk://host1:2181,host2:2181,host3:2181/mesos` for a multi-master Mesos cluster using ZooKeeper.
0167
0168 ## Client Mode
0169
0170 In client mode, a Spark Mesos framework is launched directly on the client machine and waits for the driver output.
0171
0172 The driver needs some configuration in `spark-env.sh` to interact properly with Mesos:
0173
0174 1. In `spark-env.sh` set some environment variables:
0175 * `export MESOS_NATIVE_JAVA_LIBRARY=<path to libmesos.so>`. This path is typically
0176 `<prefix>/lib/libmesos.so` where the prefix is `/usr/local` by default. See Mesos installation
0177 instructions above. On Mac OS X, the library is called `libmesos.dylib` instead of
0178 `libmesos.so`.
0179 * `export SPARK_EXECUTOR_URI=<URL of spark-{{site.SPARK_VERSION}}.tar.gz uploaded above>`.
0180 2. Also set `spark.executor.uri` to `<URL of spark-{{site.SPARK_VERSION}}.tar.gz>`.
0181
0182 Now when starting a Spark application against the cluster, pass a `mesos://`
0183 URL as the master when creating a `SparkContext`. For example:
0184
0185 {% highlight scala %}
0186 val conf = new SparkConf()
0187 .setMaster("mesos://HOST:5050")
0188 .setAppName("My app")
0189 .set("spark.executor.uri", "<path to spark-{{site.SPARK_VERSION}}.tar.gz uploaded above>")
0190 val sc = new SparkContext(conf)
0191 {% endhighlight %}
0192
0193 (You can also use [`spark-submit`](submitting-applications.html) and configure `spark.executor.uri`
0194 in the [conf/spark-defaults.conf](configuration.html#loading-default-configurations) file.)
0195
0196 When running a shell, the `spark.executor.uri` parameter is inherited from `SPARK_EXECUTOR_URI`, so
0197 it does not need to be redundantly passed in as a system property.
0198
0199 {% highlight bash %}
0200 ./bin/spark-shell --master mesos://host:5050
0201 {% endhighlight %}
0202
## Cluster Mode
0204
0205 Spark on Mesos also supports cluster mode, where the driver is launched in the cluster and the client
0206 can find the results of the driver from the Mesos Web UI.
0207
To use cluster mode, you must start the `MesosClusterDispatcher` in your cluster via the `sbin/start-mesos-dispatcher.sh` script,
passing in the Mesos master URL (e.g., `mesos://host:5050`). This starts the `MesosClusterDispatcher` as a daemon running on the host.
Note that the `MesosClusterDispatcher` does not support authentication. You should ensure that all network access to it is
protected (port 7077 by default).
0212
By setting the Mesos proxy config property `--conf spark.mesos.proxy.baseURL=http://localhost:5050` when launching the dispatcher (requires Mesos version >= 1.4), the Mesos sandbox URI for each driver is added to the Mesos dispatcher UI.
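As a sketch, starting the dispatcher with the proxy base URL might look like the following (the hostname is illustrative):

```bash
# Start the dispatcher as a daemon against the Mesos master, adding
# sandbox URI links to its UI. The hostname is illustrative.
./sbin/start-mesos-dispatcher.sh \
  --master mesos://mesos-master.example.com:5050 \
  --conf spark.mesos.proxy.baseURL=http://mesos-master.example.com:5050
```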
0214
If you would like to run the `MesosClusterDispatcher` with Marathon, you need to run the `MesosClusterDispatcher` in the foreground (i.e., `./bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher`). Note that the `MesosClusterDispatcher` does not yet support multiple instances for HA.
0216
The `MesosClusterDispatcher` also supports writing recovery state into ZooKeeper. This allows the `MesosClusterDispatcher` to recover all submitted and running containers on relaunch. In order to enable this recovery mode, you can set `SPARK_DAEMON_JAVA_OPTS` in `spark-env.sh` by configuring `spark.deploy.recoveryMode` and the related `spark.deploy.zookeeper.*` configurations.
0218 For more information about these configurations please refer to the configurations [doc](configuration.html#deploy).
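For example, recovery via ZooKeeper could be configured with a sketch like the following in `conf/spark-env.sh` (the ZooKeeper connection string is illustrative):

```shell
# Enable ZooKeeper-backed recovery for the MesosClusterDispatcher.
# The ZooKeeper connection string is illustrative.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=zk1.example.com:2181,zk2.example.com:2181"
```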
0219
You can also specify any additional jars required by the `MesosClusterDispatcher` in the classpath by setting the environment variable `SPARK_DAEMON_CLASSPATH` in `spark-env.sh`.
0221
From the client, you can submit a job to the Mesos cluster by running `spark-submit` and setting the master URL
to the URL of the `MesosClusterDispatcher` (e.g., `mesos://dispatcher:7077`). You can view driver statuses on the
Spark cluster Web UI.
0225
0226 For example:
0227 {% highlight bash %}
0228 ./bin/spark-submit \
0229 --class org.apache.spark.examples.SparkPi \
0230 --master mesos://207.184.161.138:7077 \
0231 --deploy-mode cluster \
0232 --supervise \
0233 --executor-memory 20G \
0234 --total-executor-cores 100 \
0235 http://path/to/examples.jar \
0236 1000
0237 {% endhighlight %}
0238
0239
Note that jars or python files that are passed to `spark-submit` should be URIs reachable by Mesos slaves, as the Spark driver doesn't automatically upload local jars.
0241
0242 # Mesos Run Modes
0243
0244 Spark can run over Mesos in two modes: "coarse-grained" (default) and
0245 "fine-grained" (deprecated).
0246
0247 ## Coarse-Grained
0248
0249 In "coarse-grained" mode, each Spark executor runs as a single Mesos
0250 task. Spark executors are sized according to the following
0251 configuration variables:
0252
0253 * Executor memory: `spark.executor.memory`
0254 * Executor cores: `spark.executor.cores`
0255 * Number of executors: `spark.cores.max`/`spark.executor.cores`
0256
0257 Please see the [Spark Configuration](configuration.html) page for
0258 details and default values.
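As a worked example of the resulting executor count (the values are illustrative):

```shell
# With spark.cores.max=24 and spark.executor.cores=4, coarse-grained mode
# launches 24 / 4 = 6 executors. Values here are illustrative.
cores_max=24
executor_cores=4
num_executors=$((cores_max / executor_cores))
echo "$num_executors"   # 6
```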
0259
0260 Executors are brought up eagerly when the application starts, until
0261 `spark.cores.max` is reached. If you don't set `spark.cores.max`, the
0262 Spark application will consume all resources offered to it by Mesos,
0263 so we, of course, urge you to set this variable in any sort of
0264 multi-tenant cluster, including one which runs multiple concurrent
0265 Spark applications.
0266
0267 The scheduler will start executors round-robin on the offers Mesos
0268 gives it, but there are no spread guarantees, as Mesos does not
0269 provide such guarantees on the offer stream.
0270
In this mode Spark executors will honor port allocation if it is
provided by the user. Specifically, if the user defines
`spark.blockManager.port` in the Spark configuration,
the Mesos scheduler will check the available offers for a valid port
range containing the port numbers. If no such range is available it will
not launch any task. If the user imposes no restriction on port numbers,
ephemeral ports are used as usual. This port-honouring implementation
implies one task per host if the user defines a port. In the future,
network isolation shall be supported.
0280
0281 The benefit of coarse-grained mode is much lower startup overhead, but
0282 at the cost of reserving Mesos resources for the complete duration of
0283 the application. To configure your job to dynamically adjust to its
0284 resource requirements, look into
0285 [Dynamic Allocation](#dynamic-resource-allocation-with-mesos).
0286
0287 ## Fine-Grained (deprecated)
0288
0289 **NOTE:** Fine-grained mode is deprecated as of Spark 2.0.0. Consider
0290 using [Dynamic Allocation](#dynamic-resource-allocation-with-mesos)
0291 for some of the benefits. For a full explanation see
[SPARK-11857](https://issues.apache.org/jira/browse/SPARK-11857).
0293
0294 In "fine-grained" mode, each Spark task inside the Spark executor runs
0295 as a separate Mesos task. This allows multiple instances of Spark (and
0296 other frameworks) to share cores at a very fine granularity, where
0297 each application gets more or fewer cores as it ramps up and down, but
0298 it comes with an additional overhead in launching each task. This mode
0299 may be inappropriate for low-latency requirements like interactive
0300 queries or serving web requests.
0301
Note that while Spark tasks in fine-grained mode will relinquish cores as
0303 they terminate, they will not relinquish memory, as the JVM does not
0304 give memory back to the Operating System. Neither will executors
0305 terminate when they're idle.
0306
0307 To run in fine-grained mode, set the `spark.mesos.coarse` property to false in your
0308 [SparkConf](configuration.html#spark-properties):
0309
0310 {% highlight scala %}
0311 conf.set("spark.mesos.coarse", "false")
0312 {% endhighlight %}
0313
0314 You may also make use of `spark.mesos.constraints` to set
0315 attribute-based constraints on Mesos resource offers. By default, all
0316 resource offers will be accepted.
0317
0318 {% highlight scala %}
0319 conf.set("spark.mesos.constraints", "os:centos7;us-east-1:false")
0320 {% endhighlight %}
0321
For example, suppose `spark.mesos.constraints` is set to `os:centos7;us-east-1:false`; then the resource offers will
be checked to see whether they meet both of these constraints, and only then will they be accepted to start new executors.
0324
To constrain where driver tasks are run, use `spark.mesos.driver.constraints`.
0326
0327 # Mesos Docker Support
0328
0329 Spark can make use of a Mesos Docker containerizer by setting the property `spark.mesos.executor.docker.image`
0330 in your [SparkConf](configuration.html#spark-properties).
0331
0332 The Docker image used must have an appropriate version of Spark already part of the image, or you can
0333 have Mesos download Spark via the usual methods.
0334
0335 Requires Mesos version 0.20.1 or later.
0336
0337 Note that by default Mesos agents will not pull the image if it already exists on the agent. If you use mutable image
0338 tags you can set `spark.mesos.executor.docker.forcePullImage` to `true` in order to force the agent to always pull the
0339 image before running the executor. Force pulling images is only available in Mesos version 0.22 and above.
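A minimal sketch of the relevant properties for `conf/spark-defaults.conf` (the image name is a placeholder; use an image from your own registry):

```
spark.mesos.executor.docker.image           example.com/spark:latest
spark.mesos.executor.docker.forcePullImage  true
```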
0340
0341 # Running Alongside Hadoop
0342
0343 You can run Spark and Mesos alongside your existing Hadoop cluster by just launching them as a
0344 separate service on the machines. To access Hadoop data from Spark, a full `hdfs://` URL is required
0345 (typically `hdfs://<namenode>:9000/path`, but you can find the right URL on your Hadoop Namenode web
0346 UI).
0347
0348 In addition, it is possible to also run Hadoop MapReduce on Mesos for better resource isolation and
0349 sharing between the two. In this case, Mesos will act as a unified scheduler that assigns cores to
0350 either Hadoop or Spark, as opposed to having them share resources via the Linux scheduler on each
0351 node. Please refer to [Hadoop on Mesos](https://github.com/mesos/hadoop).
0352
0353 In either case, HDFS runs separately from Hadoop MapReduce, without being scheduled through Mesos.
0354
0355 # Dynamic Resource Allocation with Mesos
0356
0357 Mesos supports dynamic allocation only with coarse-grained mode, which can resize the number of
0358 executors based on statistics of the application. For general information,
0359 see [Dynamic Resource Allocation](job-scheduling.html#dynamic-resource-allocation).
0360
The External Shuffle Service to use is the Mesos Shuffle Service. It provides shuffle data cleanup functionality
on top of the Shuffle Service, since Mesos doesn't yet support notifying other frameworks of a framework's
termination. To launch it, run `$SPARK_HOME/sbin/start-mesos-shuffle-service.sh` on all slave nodes, with `spark.shuffle.service.enabled` set to `true`.
0364
0365 This can also be achieved through Marathon, using a unique host constraint, and the following command: `./bin/spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService`.
0366
0367 # Configuration
0368
0369 See the [configuration page](configuration.html) for information on Spark configurations. The following configs are specific for Spark on Mesos.
0370
0371 #### Spark Properties
0372
0373 <table class="table">
0374 <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
0375 <tr>
0376 <td><code>spark.mesos.coarse</code></td>
0377 <td>true</td>
0378 <td>
0379 If set to <code>true</code>, runs over Mesos clusters in "coarse-grained" sharing mode, where Spark acquires one long-lived Mesos task on each machine.
0380 If set to <code>false</code>, runs over Mesos cluster in "fine-grained" sharing mode, where one Mesos task is created per Spark task.
0381 Detailed information in <a href="running-on-mesos.html#mesos-run-modes">'Mesos Run Modes'</a>.
0382 </td>
0383 <td>0.6.0</td>
0384 </tr>
0385 <tr>
0386 <td><code>spark.mesos.extra.cores</code></td>
0387 <td><code>0</code></td>
0388 <td>
0389 Set the extra number of cores for an executor to advertise. This
0390 does not result in more cores allocated. It instead means that an
0391 executor will "pretend" it has more cores, so that the driver will
0392 send it more tasks. Use this to increase parallelism. This
0393 setting is only used for Mesos coarse-grained mode.
0394 </td>
0395 <td>0.6.0</td>
0396 </tr>
0397 <tr>
0398 <td><code>spark.mesos.mesosExecutor.cores</code></td>
0399 <td><code>1.0</code></td>
0400 <td>
0401 (Fine-grained mode only) Number of cores to give each Mesos executor. This does not
0402 include the cores used to run the Spark tasks. In other words, even if no Spark task
0403 is being run, each Mesos executor will occupy the number of cores configured here.
0404 The value can be a floating point number.
0405 </td>
0406 <td>1.4.0</td>
0407 </tr>
0408 <tr>
0409 <td><code>spark.mesos.executor.docker.image</code></td>
0410 <td>(none)</td>
0411 <td>
0412 Set the name of the docker image that the Spark executors will run in. The selected
0413 image must have Spark installed, as well as a compatible version of the Mesos library.
0414 The installed path of Spark in the image can be specified with <code>spark.mesos.executor.home</code>;
0415 the installed path of the Mesos library can be specified with <code>spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY</code>.
0416 </td>
0417 <td>1.4.0</td>
0418 </tr>
0419 <tr>
0420 <td><code>spark.mesos.executor.docker.forcePullImage</code></td>
0421 <td>false</td>
0422 <td>
0423 Force Mesos agents to pull the image specified in <code>spark.mesos.executor.docker.image</code>.
0424 By default Mesos agents will not pull images they already have cached.
0425 </td>
0426 <td>2.1.0</td>
0427 </tr>
0428 <tr>
0429 <td><code>spark.mesos.executor.docker.parameters</code></td>
0430 <td>(none)</td>
0431 <td>
0432 Set the list of custom parameters which will be passed into the <code>docker run</code> command when launching the Spark executor on Mesos using the docker containerizer. The format of this property is a comma-separated list of
0433 key/value pairs. Example:
0434
0435 <pre>key1=val1,key2=val2,key3=val3</pre>
0436 </td>
0437 <td>2.2.0</td>
0438 </tr>
0439 <tr>
0440 <td><code>spark.mesos.executor.docker.volumes</code></td>
0441 <td>(none)</td>
0442 <td>
0443 Set the list of volumes which will be mounted into the Docker image, which was set using
0444 <code>spark.mesos.executor.docker.image</code>. The format of this property is a comma-separated list of
0445 mappings following the form passed to <code>docker run -v</code>. That is they take the form:
0446
0447 <pre>[host_path:]container_path[:ro|:rw]</pre>
0448 </td>
0449 <td>1.4.0</td>
0450 </tr>
0451 <tr>
0452 <td><code>spark.mesos.task.labels</code></td>
0453 <td>(none)</td>
0454 <td>
0455 Set the Mesos labels to add to each task. Labels are free-form key-value pairs.
0456 Key-value pairs should be separated by a colon, and commas used to
0457 list more than one. If your label includes a colon or comma, you
0458 can escape it with a backslash. Ex. key:value,key2:a\:b.
0459 </td>
0460 <td>2.2.0</td>
0461 </tr>
0462 <tr>
0463 <td><code>spark.mesos.executor.home</code></td>
0464 <td>driver side <code>SPARK_HOME</code></td>
0465 <td>
0466 Set the directory in which Spark is installed on the executors in Mesos. By default, the
0467 executors will simply use the driver's Spark home directory, which may not be visible to
0468 them. Note that this is only relevant if a Spark binary package is not specified through
0469 <code>spark.executor.uri</code>.
0470 </td>
0471 <td>1.1.1</td>
0472 </tr>
0473 <tr>
0474 <td><code>spark.mesos.executor.memoryOverhead</code></td>
0475 <td>executor memory * 0.10, with minimum of 384</td>
0476 <td>
The amount of additional memory, specified in MiB, to be allocated per executor. By default,
the overhead will be the larger of either 384 or 10% of <code>spark.executor.memory</code>. If set,
the final overhead will be this value.
0480 </td>
0481 <td>1.1.1</td>
0482 </tr>
0483 <tr>
0484 <td><code>spark.mesos.uris</code></td>
0485 <td>(none)</td>
0486 <td>
0487 A comma-separated list of URIs to be downloaded to the sandbox
0488 when driver or executor is launched by Mesos. This applies to
0489 both coarse-grained and fine-grained mode.
0490 </td>
0491 <td>1.5.0</td>
0492 </tr>
0493 <tr>
0494 <td><code>spark.mesos.principal</code></td>
0495 <td>(none)</td>
0496 <td>
Set the principal with which the Spark framework will authenticate with Mesos. You can also specify this via the environment variable <code>SPARK_MESOS_PRINCIPAL</code>.
0498 </td>
0499 <td>1.5.0</td>
0500 </tr>
0501 <tr>
0502 <td><code>spark.mesos.principal.file</code></td>
0503 <td>(none)</td>
0504 <td>
Set the file containing the principal with which the Spark framework will authenticate with Mesos. Allows specifying the principal indirectly in more security conscious deployments. The file must be readable by the user launching the job and be UTF-8 encoded plaintext. You can also specify this via the environment variable <code>SPARK_MESOS_PRINCIPAL_FILE</code>.
0506 </td>
0507 <td>2.4.0</td>
0508 </tr>
0509 <tr>
0510 <td><code>spark.mesos.secret</code></td>
0511 <td>(none)</td>
0512 <td>
Set the secret with which the Spark framework will authenticate with Mesos. Used, for example, when
authenticating with the registry. You can also specify this via the environment variable <code>SPARK_MESOS_SECRET</code>.
0515 </td>
0516 <td>1.5.0</td>
0517 </tr>
0518 <tr>
0519 <td><code>spark.mesos.secret.file</code></td>
0520 <td>(none)</td>
0521 <td>
Set the file containing the secret with which the Spark framework will authenticate with Mesos. Used, for example, when
authenticating with the registry. Allows for specifying the secret indirectly in more security conscious deployments. The file must be readable by the user launching the job and be UTF-8 encoded plaintext. You can also specify this via the environment variable <code>SPARK_MESOS_SECRET_FILE</code>.
0524 </td>
0525 <td>2.4.0</td>
0526 </tr>
0527 <tr>
0528 <td><code>spark.mesos.role</code></td>
0529 <td><code>*</code></td>
0530 <td>
0531 Set the role of this Spark framework for Mesos. Roles are used in Mesos for reservations
0532 and resource weight sharing.
0533 </td>
0534 <td>1.5.0</td>
0535 </tr>
0536 <tr>
0537 <td><code>spark.mesos.constraints</code></td>
0538 <td>(none)</td>
0539 <td>
0540 Attribute-based constraints on mesos resource offers. By default, all resource offers will be accepted. This setting
0541 applies only to executors. Refer to <a href="http://mesos.apache.org/documentation/attributes-resources/">Mesos
0542 Attributes & Resources</a> for more information on attributes.
0543 <ul>
0544 <li>Scalar constraints are matched with "less than equal" semantics i.e. value in the constraint must be less than or equal to the value in the resource offer.</li>
0545 <li>Range constraints are matched with "contains" semantics i.e. value in the constraint must be within the resource offer's value.</li>
0546 <li>Set constraints are matched with "subset of" semantics i.e. value in the constraint must be a subset of the resource offer's value.</li>
0547 <li>Text constraints are matched with "equality" semantics i.e. value in the constraint must be exactly equal to the resource offer's value.</li>
0548 <li>In case there is no value present as a part of the constraint any offer with the corresponding attribute will be accepted (without value check).</li>
0549 </ul>
0550 </td>
0551 <td>1.5.0</td>
0552 </tr>
0553 <tr>
0554 <td><code>spark.mesos.driver.constraints</code></td>
0555 <td>(none)</td>
0556 <td>
0557 Same as <code>spark.mesos.constraints</code> except applied to drivers when launched through the dispatcher. By default,
0558 all offers with sufficient resources will be accepted.
0559 </td>
0560 <td>2.2.1</td>
0561 </tr>
0562 <tr>
0563 <td><code>spark.mesos.containerizer</code></td>
0564 <td><code>docker</code></td>
0565 <td>
This only affects docker containers, and must be one of "docker"
or "mesos". Mesos supports two types of
containerizers for docker: the "docker" containerizer, and the preferred
"mesos" containerizer. Read more in the
<a href="http://mesos.apache.org/documentation/latest/container-image/">Mesos container image documentation</a>.
0570 </td>
0571 <td>2.1.0</td>
0572 </tr>
0573 <tr>
0574 <td><code>spark.mesos.driver.webui.url</code></td>
0575 <td><code>(none)</code></td>
0576 <td>
0577 Set the Spark Mesos driver webui_url for interacting with the framework.
0578 If unset it will point to Spark's internal web UI.
0579 </td>
0580 <td>2.0.0</td>
0581 </tr>
0582 <tr>
0583 <td><code>spark.mesos.driver.labels</code></td>
0584 <td><code>(none)</code></td>
0585 <td>
0586 Mesos labels to add to the driver. See <code>spark.mesos.task.labels</code>
0587 for formatting information.
0588 </td>
0589 <td>2.3.0</td>
0590 </tr>
0591 <tr>
0592 <td>
0593 <code>spark.mesos.driver.secret.values</code>,
0594 <code>spark.mesos.driver.secret.names</code>,
0595 <code>spark.mesos.executor.secret.values</code>,
<code>spark.mesos.executor.secret.names</code>
0597 </td>
0598 <td><code>(none)</code></td>
0599 <td>
0600 <p>
0601 A secret is specified by its contents and destination. These properties
0602 specify a secret's contents. To specify a secret's destination, see the cell below.
0603 </p>
0604 <p>
0605 You can specify a secret's contents either (1) by value or (2) by reference.
0606 </p>
0607 <p>
0608 (1) To specify a secret by value, set the
0609 <code>spark.mesos.[driver|executor].secret.values</code>
0610 property, to make the secret available in the driver or executors.
0611 For example, to make a secret password "guessme" available to the driver process, set:
0612
0613 <pre>spark.mesos.driver.secret.values=guessme</pre>
0614 </p>
0615 <p>
0616 (2) To specify a secret that has been placed in a secret store
0617 by reference, specify its name within the secret store
0618 by setting the <code>spark.mesos.[driver|executor].secret.names</code>
0619 property. For example, to make a secret password named "password" in a secret store
0620 available to the driver process, set:
0621
0622 <pre>spark.mesos.driver.secret.names=password</pre>
0623 </p>
0624 <p>
0625 Note: To use a secret store, make sure one has been integrated with Mesos via a custom
0626 <a href="http://mesos.apache.org/documentation/latest/secrets/">SecretResolver
0627 module</a>.
0628 </p>
0629 <p>
0630 To specify multiple secrets, provide a comma-separated list:
0631
0632 <pre>spark.mesos.driver.secret.values=guessme,passwd123</pre>
0633
0634 or
0635
0636 <pre>spark.mesos.driver.secret.names=password1,password2</pre>
0637 </p>
0638 </td>
0639 <td>2.3.0</td>
0640 </tr>
0641 <tr>
0642 <td>
0643 <code>spark.mesos.driver.secret.envkeys</code>,
0644 <code>spark.mesos.driver.secret.filenames</code>,
0645 <code>spark.mesos.executor.secret.envkeys</code>,
0646 <code>spark.mesos.executor.secret.filenames</code>
0647 </td>
0648 <td><code>(none)</code></td>
0649 <td>
0650 <p>
0651 A secret is specified by its contents and destination. These properties
0652 specify a secret's destination. To specify a secret's contents, see the cell above.
0653 </p>
0654 <p>
0655 You can specify a secret's destination in the driver or
0656 executors as either (1) an environment variable or (2) as a file.
0657 </p>
0658 <p>
0659 (1) To make an environment-based secret, set the
0660 <code>spark.mesos.[driver|executor].secret.envkeys</code> property.
0661 The secret will appear as an environment variable with the
0662 given name in the driver or executors. For example, to make a secret password available
0663 to the driver process as $PASSWORD, set:
0664
0665 <pre>spark.mesos.driver.secret.envkeys=PASSWORD</pre>
0666 </p>
0667 <p>
0668 (2) To make a file-based secret, set the
0669 <code>spark.mesos.[driver|executor].secret.filenames</code> property.
0670 The secret will appear in the contents of a file with the given file name in
0671 the driver or executors. For example, to make a secret password available in a
0672 file named "pwdfile" in the driver process, set:
0673
0674 <pre>spark.mesos.driver.secret.filenames=pwdfile</pre>
0675 </p>
0676 <p>
0677 Paths are relative to the container's work directory. Absolute paths must
0678 already exist. Note: File-based secrets require a custom
0679 <a href="http://mesos.apache.org/documentation/latest/secrets/">SecretResolver
0680 module</a>.
0681 </p>
0682 <p>
0683 To specify env vars or file names corresponding to multiple secrets,
0684 provide a comma-separated list:
0685
0686 <pre>spark.mesos.driver.secret.envkeys=PASSWORD1,PASSWORD2</pre>
0687
0688 or
0689
0690 <pre>spark.mesos.driver.secret.filenames=pwdfile1,pwdfile2</pre>
0691 </p>
0692 </td>
0693 <td>2.3.0</td>
0694 </tr>
0695 <tr>
0696 <td><code>spark.mesos.driverEnv.[EnvironmentVariableName]</code></td>
0697 <td><code>(none)</code></td>
0698 <td>
0699 This only affects drivers submitted in cluster mode. Adds the
0700 environment variable specified by <code>EnvironmentVariableName</code> to the
0701 driver process. The user can specify multiple of these properties to set
0702 multiple environment variables.
0703 </td>
0704 <td>2.1.0</td>
0705 </tr>
0706 <tr>
0707 <td><code>spark.mesos.dispatcher.webui.url</code></td>
0708 <td><code>(none)</code></td>
0709 <td>
0710 Set the Spark Mesos dispatcher webui_url for interacting with the framework.
0711 If unset it will point to Spark's internal web UI.
0712 </td>
0713 <td>2.0.0</td>
0714 </tr>
0715 <tr>
0716 <td><code>spark.mesos.dispatcher.driverDefault.[PropertyName]</code></td>
0717 <td><code>(none)</code></td>
0718 <td>
0719 Set default properties for drivers submitted through the
0720 dispatcher. For example,
0721 <code>spark.mesos.dispatcher.driverDefault.spark.executor.memory=32g</code>
0722 results in the executors for all drivers submitted in cluster mode
0723 running in 32g containers.
0724 </td>
0725 <td>2.1.0</td>
0726 </tr>
0727 <tr>
0728 <td><code>spark.mesos.dispatcher.historyServer.url</code></td>
0729 <td><code>(none)</code></td>
0730 <td>
0731 Set the URL of the <a href="monitoring.html#viewing-after-the-fact">history
0732 server</a>. The dispatcher will then link each driver to its entry
0733 in the history server.
0734 </td>
0735 <td>2.1.0</td>
0736 </tr>
0737 <tr>
0738 <td><code>spark.mesos.gpus.max</code></td>
0739 <td><code>0</code></td>
0740 <td>
0741 Set the maximum number of GPU resources to acquire for this job. Note that executors will still launch when no GPU resources are found,
0742 since this configuration is only an upper limit, not a guaranteed amount.
0743 </td>
0744 <td>2.1.0</td>
0745 </tr>
0746 <tr>
0747 <td><code>spark.mesos.network.name</code></td>
0748 <td><code>(none)</code></td>
0749 <td>
0750 Attach containers to the given named network. If this job is
0751 launched in cluster mode, also launch the driver in the given named
0752 network. See
0753 <a href="http://mesos.apache.org/documentation/latest/cni/">the Mesos CNI docs</a>
0754 for more details.
0755 </td>
0756 <td>2.1.0</td>
0757 </tr>
0758 <tr>
0759 <td><code>spark.mesos.network.labels</code></td>
0760 <td><code>(none)</code></td>
0761 <td>
0762 Pass network labels to CNI plugins. This is a comma-separated list
0763 of key-value pairs, where each key-value pair has the format key:value.
0764 Example:
0765
0766 <pre>key1:val1,key2:val2</pre>
0767 See
0768 <a href="http://mesos.apache.org/documentation/latest/cni/#mesos-meta-data-to-cni-plugins">the Mesos CNI docs</a>
0769 for more details.
0770 </td>
0771 <td>2.3.0</td>
0772 </tr>
0773 <tr>
0774 <td><code>spark.mesos.fetcherCache.enable</code></td>
0775 <td><code>false</code></td>
0776 <td>
0777 If set to <code>true</code>, all URIs (example: <code>spark.executor.uri</code>,
0778 <code>spark.mesos.uris</code>) will be cached by the <a
0779 href="http://mesos.apache.org/documentation/latest/fetcher/">Mesos
0780 Fetcher Cache</a>.
0781 </td>
0782 <td>2.1.0</td>
0783 </tr>
0784 <tr>
0785 <td><code>spark.mesos.driver.failoverTimeout</code></td>
0786 <td><code>0.0</code></td>
0787 <td>
0788 The amount of time (in seconds) that the master will wait for the
0789 driver to reconnect, after being temporarily disconnected, before
0790 it tears down the driver framework by killing all its
0791 executors. The default value is zero, meaning no timeout: if the
0792 driver disconnects, the master immediately tears down the framework.
0793 </td>
0794 <td>2.3.0</td>
0795 </tr>
0796 <tr>
0797 <td><code>spark.mesos.rejectOfferDuration</code></td>
0798 <td><code>120s</code></td>
0799 <td>
0800 Time to consider unused resources refused; serves as a fallback for
0801 <code>spark.mesos.rejectOfferDurationForUnmetConstraints</code> and
0802 <code>spark.mesos.rejectOfferDurationForReachedMaxCores</code>.
0803 </td>
0804 <td>2.2.0</td>
0805 </tr>
0806 <tr>
0807 <td><code>spark.mesos.rejectOfferDurationForUnmetConstraints</code></td>
0808 <td><code>spark.mesos.rejectOfferDuration</code></td>
0809 <td>
0810 Time to consider unused resources refused with unmet constraints
0811 </td>
0812 <td>1.6.0</td>
0813 </tr>
0814 <tr>
0815 <td><code>spark.mesos.rejectOfferDurationForReachedMaxCores</code></td>
0816 <td><code>spark.mesos.rejectOfferDuration</code></td>
0817 <td>
0818 Time to consider unused resources refused when the maximum number of cores
0819 <code>spark.cores.max</code> is reached
0820 </td>
0821 <td>2.0.0</td>
0822 </tr>
0823 <tr>
0824 <td><code>spark.mesos.appJar.local.resolution.mode</code></td>
0825 <td><code>host</code></td>
0826 <td>
0827 Provides support for the <code>local:///</code> scheme to reference the application jar in cluster mode.
0828 If the user uses a local resource (<code>local:///path/to/jar</code>) and this option is not set, it defaults to
0829 <code>host</code>, i.e. the Mesos fetcher tries to get the resource from the host's file system.
0830 If the value is unknown, a warning is printed in the dispatcher logs and the value defaults to <code>host</code>.
0831 If the value is <code>container</code>, spark-submit in the container will use the jar at the container's path
0832 <code>/path/to/jar</code>.
0833 </td>
0834 <td>2.4.0</td>
0835 </tr>
0836 </table>
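
To see how several of the properties above combine in practice, here is a hypothetical `spark-defaults.conf` fragment for a cluster-mode job that mounts a file-based secret and joins a CNI network. The secret name, file name, environment variable, and network values are made-up examples, not defaults; substitute names from your own secret store and CNI setup.

```
# Hypothetical example values -- adjust for your cluster.
spark.mesos.driver.secret.names      dbpassword
spark.mesos.driver.secret.filenames  dbpassword.txt
spark.mesos.driverEnv.APP_ENV        staging
spark.mesos.network.name             my-cni-network
spark.mesos.network.labels           team:data,env:staging
```

The same properties can equally be passed as individual `--conf` flags on `spark-submit`.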
0837
0838 # Troubleshooting and Debugging
0839
0840 A few places to look during debugging:
0841
0842 - Mesos master on port `:5050`
0843 - Slaves should appear in the slaves tab
0844 - Spark applications should appear in the frameworks tab
0845 - Tasks should appear in the details of a framework
0846 - Check the stdout and stderr of the sandbox of failed tasks
0847 - Mesos logs
0848 - Master and slave logs are both in `/var/log/mesos` by default
0849
0850 And common pitfalls:
0851
0852 - Spark assembly not reachable/accessible
0853 - Slaves must be able to download the Spark binary package from the `http://`, `hdfs://` or `s3n://` URL you gave
0854 - Firewall blocking communications
0855 - Check for messages about failed connections
0856 - Temporarily disable firewalls for debugging and then poke appropriate holes
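
Much of the state listed above can also be inspected over the Mesos master's HTTP endpoints instead of the web UI. A small diagnostic sketch, assuming a placeholder master address (substitute your own host and port):

```shell
# Placeholder address -- replace with your Mesos master's host:port.
MASTER=http://mesos-master.example.com:5050

# Connected slaves should appear here.
curl -s "$MASTER/master/slaves"

# Registered frameworks -- running Spark applications should be listed.
curl -s "$MASTER/master/frameworks"

# Full cluster state, including completed frameworks and tasks.
curl -s "$MASTER/master/state"
```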