---
layout: global
title: Spark on Kubernetes Integration Tests
---

# Running the Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and
is subject to change.

The simplest way to run the integration tests is to install and run Minikube, then run the following from this
directory:

    ./dev/dev-run-integration-tests.sh

To run tests with Java 11 instead of Java 8, use `--java-image-tag` to specify the base image.

    ./dev/dev-run-integration-tests.sh --java-image-tag 11-jre-slim

To run tests with Hadoop 3.2 instead of Hadoop 2.7, use `--hadoop-profile`.

    ./dev/dev-run-integration-tests.sh --hadoop-profile hadoop-3.2

The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should
run with a minimum of 4 CPUs and 6GB of memory:

    minikube start --cpus 4 --memory 6144

You can download Minikube [here](https://github.com/kubernetes/minikube/releases).

# Integration test customization

Configuration of the integration test runtime is done by passing different arguments to the test script.
The main useful options are outlined below.

## Using a different backend

The integration test backend, i.e. the K8S cluster used for testing, is controlled by the `--deploy-mode` option. By
default this is set to `minikube`. The available backends and their prerequisites are as follows.

### `minikube`

Uses the local `minikube` cluster. This requires that `minikube` 0.23.0 or greater be installed and that it be allocated
at least 4 CPUs and 6GB memory (some users have reported success with as few as 3 CPUs and 4GB memory). The tests will
check if `minikube` is started and abort early if it isn't currently running.
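
Before running the tests you can verify the cluster is up yourself (a quick sanity check; the test framework performs an equivalent check behind the scenes):

    minikube status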

### `docker-for-desktop`

Since July 2018 Docker for Desktop provides an optional Kubernetes cluster that can be enabled as described in this
[blog post](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/). Assuming
this is enabled, this backend will auto-configure itself from the `docker-for-desktop` context that Docker creates
in your `~/.kube/config` file. If your config file is in a different location you should set the `KUBECONFIG`
environment variable appropriately.
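
For example, to run the tests against this backend (a minimal sketch; all other options keep their defaults):

    ./dev/dev-run-integration-tests.sh --deploy-mode docker-for-desktop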

### `cloud`

The cloud backend configures the tests to use an arbitrary Kubernetes cluster running in the cloud or otherwise.

The `cloud` backend auto-configures the cluster to use from your K8S config file. This is assumed to be `~/.kube/config`
unless the `KUBECONFIG` environment variable is set to override this location. By default this will use whatever your
current context is in the config file; to use an alternative context from your config file you can specify the
`--context <context>` flag with the desired context.

You can optionally use a different K8S master URL than the one your K8S config file specifies; this should be supplied
via the `--spark-master <master-url>` flag.
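
For example, to run against a specific context with an explicit master URL (a sketch; the context name and master URL below are placeholders for your own cluster's values):

    ./dev/dev-run-integration-tests.sh --deploy-mode cloud --context my-context --spark-master https://my-cluster:6443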

## Re-using Docker Images

By default, the test framework will build new Docker images on every test execution. A unique image tag is generated,
and it is written to a file at `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker
image tag that you have built by other means already, pass the tag to the test script:

    ./dev/dev-run-integration-tests.sh --image-tag <tag>

For example, to reuse the images that were built by a previous run of the test framework:

    ./dev/dev-run-integration-tests.sh --image-tag $(cat target/imageTag.txt)

### Customising the Image Names

If your image names do not follow the standard Spark naming convention - `spark`, `spark-py` and `spark-r` - then you can customise the names using several options.

If you use the same basic pattern but a different prefix for the name, e.g. `apache-spark`, then you can just set `--base-image-name <base-name>` e.g.

    ./dev/dev-run-integration-tests.sh --base-image-name apache-spark

Alternatively if you use completely custom names then you can set each individually via the `--jvm-image-name <name>`, `--python-image-name <name>` and `--r-image-name <name>` arguments e.g.

    ./dev/dev-run-integration-tests.sh --jvm-image-name jvm-spark --python-image-name pyspark --r-image-name sparkr

## Spark Distribution Under Test

The Spark code to test is handed to the integration test system via a tarball. Here is the option that is used to
specify the tarball:

* `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.

This tarball should be created by first running `dev/make-distribution.sh` passing the `--tgz` flag and `-Pkubernetes`
as one of the options to ensure that Kubernetes support is included in the distribution. For more details on building a
runnable distribution please see the
[Building Spark](https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution)
documentation.
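
For example (a sketch; the `--name` value is illustrative and the resulting tarball name depends on your Spark version):

    ./dev/make-distribution.sh --name k8s-test --tgz -Pkubernetes
    ./dev/dev-run-integration-tests.sh --spark-tgz "$(pwd)/spark-3.0.0-SNAPSHOT-bin-k8s-test.tgz"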

**TODO:** Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current
tree.

## Customizing the Namespace and Service Account

If no namespace is specified then a temporary namespace will be created and deleted during the test run. Similarly if
no service account is specified then the `default` service account for the namespace will be used.

Using the `--namespace <namespace>` flag sets `<namespace>` to the namespace in which the tests should be run. If this
is supplied then the tests assume this namespace exists in the K8S cluster and will not attempt to create it.
Additionally this namespace must have an appropriately authorized service account which can be customised via the
`--service-account` flag.

The `--service-account <service account name>` flag sets `<service account name>` to the name of the Kubernetes service
account to use in the namespace specified by the `--namespace` flag. The service account is expected to have permissions
to get, list, watch, and create pods. For clusters with RBAC turned on, it's important that the right permissions are
granted to the service account in the namespace through an appropriate role and role binding. A reference RBAC
configuration is provided in `dev/spark-rbac.yaml`.
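
For example, to run the tests in a pre-existing namespace with a dedicated service account (a sketch; the `spark` namespace and `spark-sa` account names are assumed to match those defined in the reference RBAC configuration, or equivalents you have created yourself):

    kubectl apply -f dev/spark-rbac.yaml
    ./dev/dev-run-integration-tests.sh --namespace spark --service-account spark-sa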

# Running the Test Directly

If you prefer to run just the integration tests directly, then you can customise the behaviour by passing system
properties to Maven. For example:

    mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
                            -Pkubernetes -Pkubernetes-integration-tests \
                            -Phadoop-2.7 -Dhadoop.version=2.7.4 \
                            -Dspark.kubernetes.test.sparkTgz=spark-3.0.0-SNAPSHOT-bin-example.tgz \
                            -Dspark.kubernetes.test.imageTag=sometag \
                            -Dspark.kubernetes.test.imageRepo=docker.io/somerepo \
                            -Dspark.kubernetes.test.namespace=spark-int-tests \
                            -Dspark.kubernetes.test.deployMode=docker-for-desktop \
                            -Dtest.include.tags=k8s

## Available Maven Properties

The following are the available Maven properties that can be passed. For the most part these correspond to flags passed
to the wrapper scripts, and using the wrapper scripts will simply set these appropriately behind the scenes.

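As an illustration, the wrapper script flag and the direct Maven property below select the same pre-built image tag (`sometag` is a placeholder):

    ./dev/dev-run-integration-tests.sh --image-tag sometag
    # equivalent to -Dspark.kubernetes.test.imageTag=sometag when invoking Maven directly
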
<table>
  <tr>
    <th>Property</th>
    <th>Description</th>
    <th>Default</th>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.sparkTgz</code></td>
    <td>
      A runnable Spark distribution to test.
    </td>
    <td></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.unpackSparkDir</code></td>
    <td>
      The directory where the runnable Spark distribution will be unpacked.
    </td>
    <td><code>${project.build.directory}/spark-dist-unpacked</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.deployMode</code></td>
    <td>
      The integration test backend to use. Acceptable values are <code>minikube</code>,
      <code>docker-for-desktop</code> and <code>cloud</code>.
    </td>
    <td><code>minikube</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.kubeConfigContext</code></td>
    <td>
      When using the <code>cloud</code> backend, specifies the context from the user's K8S config file that should be
      used as the target cluster for integration testing. If not set and using the <code>cloud</code> backend then your
      current context will be used.
    </td>
    <td></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.master</code></td>
    <td>
      When using the <code>cloud</code> backend, this must be specified to indicate the K8S master URL to communicate
      with.
    </td>
    <td></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.imageTag</code></td>
    <td>
      A specific image tag to use; when set, assumes images with that tag are already built and available in the
      specified image repository. When set to <code>N/A</code> (the default) fresh images will be built.
    </td>
    <td><code>N/A</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.javaImageTag</code></td>
    <td>
      A specific OpenJDK base image tag to use; when set, it is used instead of <code>8-jre-slim</code>.
    </td>
    <td><code>8-jre-slim</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.imageTagFile</code></td>
    <td>
      A file containing the image tag to use. If no specific image tag is set then fresh images will be built with a
      generated tag and that tag written to this file.
    </td>
    <td><code>${project.build.directory}/imageTag.txt</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.imageRepo</code></td>
    <td>
      The Docker image repository that contains the images to be used if a specific image tag is set, or to which the
      images will be pushed if fresh images are being built.
    </td>
    <td><code>docker.io/kubespark</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.jvmImage</code></td>
    <td>
      The image name for the JVM-based Spark image to test.
    </td>
    <td><code>spark</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.pythonImage</code></td>
    <td>
      The image name for the Python-based Spark image to test.
    </td>
    <td><code>spark-py</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.rImage</code></td>
    <td>
      The image name for the R-based Spark image to test.
    </td>
    <td><code>spark-r</code></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.namespace</code></td>
    <td>
      A specific Kubernetes namespace to run the tests in. If specified then the tests assume that this namespace
      already exists. When not specified a temporary namespace for the tests will be created and deleted as part of the
      test run.
    </td>
    <td></td>
  </tr>
  <tr>
    <td><code>spark.kubernetes.test.serviceAccountName</code></td>
    <td>
      A specific Kubernetes service account to use for running the tests. If not specified then the namespace's default
      service account will be used and that must have sufficient permissions or the tests will fail.
    </td>
    <td></td>
  </tr>
</table>
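
For example, to run the tests directly against a named context using the `cloud` backend (a sketch; the context name and tarball path are placeholders):

    mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
                            -Pkubernetes -Pkubernetes-integration-tests \
                            -Dspark.kubernetes.test.deployMode=cloud \
                            -Dspark.kubernetes.test.kubeConfigContext=my-context \
                            -Dspark.kubernetes.test.sparkTgz=<path-to-tgz> \
                            -Dtest.include.tags=k8s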