---
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

Welcome to the Spark documentation!

This readme will walk you through navigating and building the Spark documentation, which is included
here with the Spark source code. You can also find documentation specific to release versions of
Spark at https://spark.apache.org/documentation.html.

Read on to learn more about viewing documentation in plain text (i.e., Markdown) or building the
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.

## Prerequisites

The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
Python, R and SQL.

You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
installed. Also install the following libraries:

```sh
$ sudo gem install jekyll jekyll-redirect-from rouge
```

Note: If you are on a system with both Ruby 1.9 and Ruby 2.0, you may need to replace `gem` with `gem2.0`.
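As a quick sanity check before building, a small shell sketch like the following can report which prerequisite tools are missing from your `PATH`. The `missing_tools` helper is hypothetical (it is not part of Spark), and the tool list is simply the commands named in this README; adjust it to your setup.

```sh
# Hypothetical helper: print each named tool that is not on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Check the tools mentioned above; the list itself is an assumption.
missing_tools ruby gem python jekyll \
  || echo "Install the tools listed above before building." >&2
```

If nothing is printed, all of the listed commands were found.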

### R Documentation

If you'd like to generate R documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
and install these libraries:

```sh
$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
$ sudo Rscript -e 'devtools::install_version("roxygen2", version = "5.0.1", repos="https://cloud.r-project.org/")'
```

Note: Other versions of roxygen2 might work for SparkR documentation generation, but the `RoxygenNote` field in `$SPARK_HOME/R/pkg/DESCRIPTION` is set to 5.0.1 and is rewritten whenever a different roxygen2 version is used.
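To see whether your installed roxygen2 matches the version recorded in `DESCRIPTION`, a sketch like the following may help. The `roxygen_note_version` helper is hypothetical, and the sketch assumes the field appears on a single line in the form `RoxygenNote: <version>`.

```sh
# Hypothetical helper: print the value of the RoxygenNote field
# from the DESCRIPTION file given as the first argument.
roxygen_note_version() {
  sed -n 's/^RoxygenNote:[[:space:]]*//p' "$1"
}

# Assumes SPARK_HOME points at your Spark checkout (falls back to ".").
desc="${SPARK_HOME:-.}/R/pkg/DESCRIPTION"
if [ -f "$desc" ]; then
  echo "DESCRIPTION expects roxygen2 $(roxygen_note_version "$desc")"
fi

# Print the installed roxygen2 version, if R is available.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'cat("Installed roxygen2:", as.character(packageVersion("roxygen2")), "\n")' \
    || echo "roxygen2 does not appear to be installed" >&2
fi
```

If the two versions differ, expect `RoxygenNote` to change the next time the R docs are generated.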

### API Documentation

To generate API docs for any language, you'll need to install these libraries:

```sh
$ sudo pip install sphinx mkdocs numpy
```

## Generating the Documentation HTML

We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as
the GitHub wiki, as the definitive documentation) to enable the documentation to evolve along with
the source code and be captured by revision control (currently Git). This way the code automatically
includes the version of the documentation that is relevant regardless of which version or release
you have checked out or downloaded.

In this directory you will find text files formatted using Markdown, with a `.md` suffix. You can
read those text files directly if you want. Start with `index.md`.

Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with
Jekyll will create a directory called `_site` containing `index.html` as well as the rest of the
compiled files.

```sh
$ cd docs
$ jekyll build
```
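After a build, one simple way to confirm that the site was compiled is to look for `_site/index.html`. The sketch below does just that; `site_built` is a hypothetical helper, and the fallback to the current directory when `SPARK_HOME` is unset is an assumption.

```sh
# Hypothetical helper: succeed if the given docs directory
# (default: current directory) contains a compiled site.
site_built() {
  [ -f "${1:-.}/_site/index.html" ]
}

# Assumes SPARK_HOME points at your Spark checkout (falls back to ".").
if site_built "${SPARK_HOME:-.}/docs"; then
  echo "Compiled site found in docs/_site"
else
  echo "No compiled site yet; run 'jekyll build' from docs/" >&2
fi
```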

You can modify the default Jekyll build as follows:

```sh
# Skip generating API docs (which takes a while)
$ SKIP_API=1 jekyll build

# Serve content locally on port 4000
$ jekyll serve --watch

# Build the site with extra features used on the live page
$ PRODUCTION=1 jekyll build
```

## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)

You can build just the Spark scaladoc and javadoc by running `./build/sbt unidoc` from the `$SPARK_HOME` directory.

Similarly, you can build just the PySpark docs by running `make html` from the
`$SPARK_HOME/python/docs` directory. Documentation is only generated for classes that are listed as
public in `__init__.py`. The SparkR docs can be built by running `$SPARK_HOME/R/create-docs.sh`, and
the SQL docs can be built by running `$SPARK_HOME/sql/create-docs.sh` after
[building Spark](https://github.com/apache/spark#building-spark) first.

When you run `jekyll build` in the `docs` directory, it will also copy over the scaladoc and javadoc for the various
Spark subprojects into the `docs` directory (and then also into the `_site` directory). We use a
Jekyll plugin to run `./build/sbt unidoc` before building the site, so if you haven't run it recently, it
may take some time as it generates all of the scaladoc and javadoc using [Unidoc](https://github.com/sbt/sbt-unidoc).
The Jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-doc.org/), SparkR docs
using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs
using [MkDocs](https://www.mkdocs.org/).

NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run
`SKIP_API=1 jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1`
can each be used to skip the step for the corresponding language. `SKIP_SCALADOC=1` skips both the Scala and Java docs.
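The per-language variables can be combined on one command line. As an illustration, the sketch below maps language names to the `SKIP_*` variables described above and prints the resulting build command; `skip_flags` is a hypothetical helper, not something shipped with Spark.

```sh
# Hypothetical helper: turn language names into SKIP_* assignments.
skip_flags() {
  flags=""
  for lang in "$@"; do
    case "$lang" in
      scala|java) var="SKIP_SCALADOC" ;;   # one variable covers Scala and Java
      python)     var="SKIP_PYTHONDOC" ;;
      r)          var="SKIP_RDOC" ;;
      sql)        var="SKIP_SQLDOC" ;;
      *) echo "unknown language: $lang" >&2; return 1 ;;
    esac
    flags="${flags:+$flags }$var=1"
  done
  echo "$flags"
}

# Print the command that would skip the Python and R API docs.
echo "env $(skip_flags python r) jekyll build"
```

Running the printed command from the `docs/` directory builds the site without the selected API docs.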

### Automatically Rebuilding API Docs

`jekyll serve --watch` will only watch what's in `docs/`, and it won't follow symlinks. That means it won't monitor your API docs under `python/docs` or elsewhere.

To work around this limitation for Python, install [`entr`](http://eradman.com/entrproject/) and run the following in a separate shell:

```sh
cd "$SPARK_HOME/python/docs"
find .. -type f -name '*.py' \
  | entr -s 'make html && cp -r _build/html/. ../../docs/api/python'
```

Whenever there is a change to your Python code, `entr` will automatically rebuild the Python API docs and copy them to `docs/`, thus triggering a Jekyll update.