0001 ---
0002 license: |
0003 Licensed to the Apache Software Foundation (ASF) under one or more
0004 contributor license agreements. See the NOTICE file distributed with
0005 this work for additional information regarding copyright ownership.
0006 The ASF licenses this file to You under the Apache License, Version 2.0
0007 (the "License"); you may not use this file except in compliance with
0008 the License. You may obtain a copy of the License at
0009
0010 http://www.apache.org/licenses/LICENSE-2.0
0011
0012 Unless required by applicable law or agreed to in writing, software
0013 distributed under the License is distributed on an "AS IS" BASIS,
0014 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0015 See the License for the specific language governing permissions and
0016 limitations under the License.
0017 ---
0018
0019 ## Building SparkR on Windows
0020
To build SparkR on Windows, the following steps are required:
0022
1. Make sure `bash` is available and in `PATH` if you already have a built-in `bash` on Windows. If you do not, install [Cygwin](https://www.cygwin.com/).
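
    A quick way to check from `cmd` whether a `bash` is already on `PATH` (a sketch; the output will vary by setup):

    ```bash
    rem Locates bash on PATH; prints nothing and exits non-zero if it is absent.
    where bash
    ```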
0024
0025 2. Install R (>= 3.1) and [Rtools](https://cloud.r-project.org/bin/windows/Rtools/). Make sure to
0026 include Rtools and R in `PATH`. Note that support for R prior to version 3.4 is deprecated as of Spark 3.0.0.
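
    To confirm that R is visible from `cmd`, a quick check (the version output will differ by install):

    ```bash
    rem Shows where R resolves from PATH and prints its version.
    where R
    R --version
    ```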
0027
0028 3. Install JDK that SparkR supports (see `R/pkg/DESCRIPTION`), and set `JAVA_HOME` in the system environment variables.
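
    For example, to set `JAVA_HOME` for the current `cmd` session (the JDK path below is only an illustration; point it at your actual install location):

    ```bash
    rem Session-only; use the System Properties dialog or setx for a persistent setting.
    set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_251
    "%JAVA_HOME%\bin\java" -version
    ```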
0029
4. Download and install [Maven](https://maven.apache.org/download.html), and include Maven's `bin` directory in `PATH`.
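
    Once installed, a quick sanity check that the expected Maven is picked up from `PATH`:

    ```bash
    rem Prints the Maven version along with the Java version it runs on.
    mvn.cmd -version
    ```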
0032
0033 5. Set `MAVEN_OPTS` as described in [Building Spark](https://spark.apache.org/docs/latest/building-spark.html).
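
    As a sketch, for the current `cmd` session (the exact values to use are the ones recommended in the linked guide and can change between Spark versions):

    ```bash
    rem Raises the thread stack size and heap size for the Maven JVM.
    set MAVEN_OPTS=-Xss64m -Xmx2g
    ```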
0034
6. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](https://spark.apache.org/docs/latest/building-spark.html#buildmvn), including the `-Psparkr` profile to build the R package. For example, to use the default Hadoop version, run:
0036
0037 ```bash
0038 mvn.cmd -DskipTests -Psparkr package
0039 ```
0040
Note that `.\build\mvn` is a shell script, so on Windows the system's `mvn.cmd` should be used directly.
Make sure your Maven version matches `maven.version` in `./pom.xml`.
0043
Note that this is a workaround for SparkR developers on Windows. Apache Spark does not yet officially support _building_ on Windows, although it does support _running_ on Windows.
0045
0046 ## Unit tests
0047
To run the SparkR unit tests on Windows, the following steps are required, assuming you are in the Spark root directory and do not already have Apache Hadoop installed:
0049
1. Create a folder to download Hadoop-related files for Windows. For example, `cd ..` and `mkdir hadoop`.
0051
0052 2. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems on the Hadoop wiki](https://wiki.apache.org/hadoop/WindowsProblems).
0053
0054 3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.
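
    At a minimum, the resulting layout should look like this:

    ```
    hadoop\
      bin\
        winutils.exe
        hadoop.dll
    ```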
0055
0056 4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.
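
    For example, in `cmd` (replace the path with the actual location of the directory you created):

    ```bash
    rem Session-only; set it in the System Properties dialog to persist.
    set HADOOP_HOME=C:\path\to\hadoop
    ```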
0057
5. Run the SparkR unit tests with the command below. Note that you need to install the required packages first, following the instructions under [Running R Tests](https://spark.apache.org/docs/latest/building-spark.html#running-r-tests):
0059
0060 ```
0061 .\bin\spark-submit2.cmd --conf spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R
0062 ```
0063