---
layout: global
title: Distributed SQL Engine
displayTitle: Distributed SQL Engine
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

* Table of contents
{:toc}
0024
Spark SQL can also act as a distributed query engine using its JDBC/ODBC or command-line interface.
In this mode, end-users or applications can interact with Spark SQL directly to run SQL queries,
without the need to write any code.

## Running the Thrift JDBC/ODBC server

The Thrift JDBC/ODBC server implemented here corresponds to [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2)
in built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or a compatible Hive distribution.

To start the JDBC/ODBC server, run the following in the Spark directory:

    ./sbin/start-thriftserver.sh
0037
This script accepts all `bin/spark-submit` command line options, plus a `--hiveconf` option to
specify Hive properties. You may run `./sbin/start-thriftserver.sh --help` for a complete list of
all available options. By default, the server listens on localhost:10000. You may override this
behaviour via environment variables, e.g.:

{% highlight bash %}
export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...
{% endhighlight %}
0050
or system properties:

{% highlight bash %}
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=<listening-port> \
  --hiveconf hive.server2.thrift.bind.host=<listening-host> \
  --master <master-uri> \
  ...
{% endhighlight %}
0060
Now you can use beeline to test the Thrift JDBC/ODBC server:

    ./bin/beeline

Connect to the JDBC/ODBC server in beeline with:

    beeline> !connect jdbc:hive2://localhost:10000

Beeline will ask you for a username and password. In non-secure mode, simply enter the username on
your machine and a blank password. For secure mode, please follow the instructions given in the
[beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).
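
As a shorthand, the connection URL and username can also be passed on the beeline command line. This is a sketch assuming the server is running in the default non-secure mode on `localhost:10000`:

{% highlight bash %}
# Connect non-interactively: -u gives the JDBC URL, -n the username,
# and -e runs a single statement (the password is irrelevant in non-secure mode).
./bin/beeline -u jdbc:hive2://localhost:10000 -n $USER -e "SHOW TABLES;"
{% endhighlight %}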
0072
Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.

You may also use the beeline script that comes with Hive.

The Thrift JDBC/ODBC server also supports sending Thrift RPC messages over HTTP transport.
Use the following settings to enable HTTP mode, either as system properties or in the `hive-site.xml` file in `conf/`:

    hive.server2.transport.mode - Set this to value: http
    hive.server2.thrift.http.port - HTTP port number to listen on; default is 10001
    hive.server2.http.endpoint - HTTP endpoint; default is cliservice

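Equivalently, these settings can be passed as `--hiveconf` system properties when starting the server; a sketch using the default values listed above:

{% highlight bash %}
# Start the Thrift JDBC/ODBC server in HTTP mode.
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.transport.mode=http \
  --hiveconf hive.server2.thrift.http.port=10001 \
  --hiveconf hive.server2.http.endpoint=cliservice
{% endhighlight %}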
To test, use beeline to connect to the JDBC/ODBC server in HTTP mode with:

    beeline> !connect jdbc:hive2://<host>:<port>/<database>?hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>

If you close a session and then run a CTAS (CREATE TABLE AS SELECT) statement, you must set `fs.%s.impl.disable.cache` to true in `hive-site.xml`,
where `%s` stands for the filesystem scheme (e.g. `fs.hdfs.impl.disable.cache`).
See [SPARK-21067](https://issues.apache.org/jira/browse/SPARK-21067) for more details.
0090
## Running the Spark SQL CLI

The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute
queries entered on the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.

To start the Spark SQL CLI, run the following in the Spark directory:

    ./bin/spark-sql

Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
You may run `./bin/spark-sql --help` for a complete list of all available options.
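
For example, a single query can be run non-interactively with the `-e` option (one of the Hive-CLI-style options that `spark-sql` accepts); a minimal sketch:

{% highlight bash %}
# Run one query and exit; the result is printed to stdout.
./bin/spark-sql -e "SELECT 1"
{% endhighlight %}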