---
layout: global
title: Hardware Provisioning
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

A common question received by Spark developers is how to configure hardware for it. While the right
hardware will depend on the situation, we make the following recommendations.

# Storage Systems

Because most Spark jobs will likely have to read input data from an external storage system (e.g.
the Hadoop File System, or HBase), it is important to place Spark **as close to this system as
possible**. We recommend the following:

* If at all possible, run Spark on the same nodes as HDFS. The simplest way is to set up a Spark
[standalone mode cluster](spark-standalone.html) on the same nodes, and configure Spark and
Hadoop's memory and CPU usage to avoid interference (for Hadoop, the relevant options are
`mapred.child.java.opts` for the per-task memory and `mapreduce.tasktracker.map.tasks.maximum`
and `mapreduce.tasktracker.reduce.tasks.maximum` for the number of tasks); a configuration sketch
follows this list. Alternatively, you can run Hadoop and Spark on a common cluster manager like
[Mesos](running-on-mesos.html) or [Hadoop YARN](running-on-yarn.html).

* If this is not possible, run Spark on different nodes in the same local-area network as HDFS.

* For low-latency data stores like HBase, it may be preferable to run computing jobs on different
nodes than the storage system to avoid interference.

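As a rough illustration of the co-location advice above, the sketch below caps each executor's
memory and cores so that the HDFS and Hadoop daemons on the same nodes keep some headroom. The
values are assumptions for illustration, not tuned recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sizing for a node shared with HDFS/Hadoop daemons:
// keep each executor well below the node's total memory and cores.
val spark = SparkSession.builder()
  .appName("colocated-with-hdfs")
  .config("spark.executor.memory", "8g")   // leave RAM for the DataNode and OS cache
  .config("spark.executor.cores", "4")     // leave cores for Hadoop's own tasks
  .getOrCreate()
```
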
# Local Disks

While Spark can perform a lot of its computation in memory, it still uses local disks to store
data that doesn't fit in RAM, as well as to preserve intermediate output between stages. We
recommend having **4-8 disks** per node, configured _without_ RAID (just as separate mount points).
In Linux, mount the disks with the `noatime` option to reduce unnecessary writes. In Spark,
[configure](configuration.html) the `spark.local.dir` variable to be a comma-separated list of
the local disks. If you are running HDFS, it's fine to use the same disks as HDFS.

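For example, a minimal sketch of pointing `spark.local.dir` at several separately mounted disks
(the mount paths are assumptions for illustration):

```scala
import org.apache.spark.SparkConf

// Spread shuffle and spill files across independent local disks.
// The mount points below are hypothetical examples.
val conf = new SparkConf()
  .set("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark")
```

In practice this value is often set cluster-wide in `spark-defaults.conf` rather than in
application code.
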
# Memory

In general, Spark can run well with anywhere from **8 GiB to hundreds of gigabytes** of memory per
machine. In all cases, we recommend allocating at most 75% of the memory for Spark; leave the
rest for the operating system and buffer cache.

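As a concrete instance of that rule of thumb, on an assumed 64 GiB node you would give Spark
roughly 48 GiB and leave the remainder to the OS and buffer cache; the sketch below is purely
illustrative.

```scala
import org.apache.spark.SparkConf

// Hypothetical 64 GiB node: cap Spark at about 75% of physical memory.
val conf = new SparkConf()
  .set("spark.executor.memory", "48g")
```
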
How much memory you will need depends on your application. To determine how much your
application uses for a certain dataset size, load part of your dataset in a Spark RDD and use the
Storage tab of Spark's monitoring UI (`http://<driver-node>:4040`) to see its size in memory.
Note that memory usage is greatly affected by storage level and serialization format -- see
the [tuning guide](tuning.html) for tips on how to reduce it.

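A minimal sketch of this sizing approach, assuming a hypothetical input path and sample fraction:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("memory-sizing").getOrCreate()

// Cache a sample of the dataset and materialize it, then read its
// in-memory size off the Storage tab at http://<driver-node>:4040.
// The input path and sample fraction are illustrative assumptions.
val sample = spark.read.textFile("hdfs:///data/events").rdd.sample(false, 0.1)
sample.persist(StorageLevel.MEMORY_ONLY)
sample.count()
```
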
Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you
purchase machines with more RAM than this, you can launch multiple executors on a single node. In
Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple
executors according to its available memory and cores, and each executor will be launched in a
separate Java VM.

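For instance, here is a sketch of how one might carve a large standalone worker into several
smaller executors rather than one huge JVM; the node size and per-executor values are assumptions
for illustration.

```scala
import org.apache.spark.SparkConf

// Hypothetical 256 GiB / 32-core worker: request 4-core, 28 GiB executors
// so the standalone worker launches several moderately sized JVMs
// instead of a single executor with a very large heap.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "28g")
```
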
# Network

In our experience, when the data is in memory, many Spark applications are network-bound.
Using a **10 Gigabit** or higher network is the best way to make these applications faster.
This is especially true for "distributed reduce" applications such as group-bys, reduce-bys, and
SQL joins. In any given application, you can see how much data Spark shuffles across the network
from the application's monitoring UI (`http://<driver-node>:4040`).

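As an illustration, a "distributed reduce" such as the word count below forces a shuffle, and its
volume shows up as shuffle read/write in the stage view of that UI. The input path is an
assumption for illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("shuffle-volume").getOrCreate()
val sc = spark.sparkContext

// reduceByKey repartitions data by key across the cluster, producing the
// shuffle traffic reported on the application UI. Path is hypothetical.
val counts = sc.textFile("hdfs:///data/events")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.count()
```
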
# CPU Cores

Spark scales well to tens of CPU cores per machine because it performs minimal sharing between
threads. You should likely provision at least **8-16 cores** per machine. Depending on the CPU
cost of your workload, you may also need more: once data is in memory, most applications are
either CPU- or network-bound.