---
layout: global
title: Distributed SQL Engine
displayTitle: Distributed SQL Engine
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

* Table of contents
{:toc}

Spark SQL can also act as a distributed query engine using its JDBC/ODBC or command-line interface.
In this mode, end-users or applications can interact with Spark SQL directly to run SQL queries,
without the need to write any code.

## Running the Thrift JDBC/ODBC server

The Thrift JDBC/ODBC server implemented here corresponds to the [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2)
in built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or a compatible Hive installation.

To start the JDBC/ODBC server, run the following in the Spark directory:

    ./sbin/start-thriftserver.sh

This script accepts all `bin/spark-submit` command line options, plus a `--hiveconf` option to
specify Hive properties. You may run `./sbin/start-thriftserver.sh --help` for a complete list of
all available options. By default, the server listens on localhost:10000. You may override this
behaviour via either environment variables, i.e.:

{% highlight bash %}
export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...
{% endhighlight %}

or system properties:

{% highlight bash %}
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=<listening-port> \
  --hiveconf hive.server2.thrift.bind.host=<listening-host> \
  --master <master-uri> \
  ...
{% endhighlight %}
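
For example, a concrete invocation might look like the following. This is only an illustrative sketch: the port, bind address, and master URI are placeholder choices, not required values.

{% highlight bash %}
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10010 \
  --hiveconf hive.server2.thrift.bind.host=0.0.0.0 \
  --master local[4]
{% endhighlight %}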

Now you can use beeline to test the Thrift JDBC/ODBC server:

    ./bin/beeline

Connect to the JDBC/ODBC server in beeline with:

    beeline> !connect jdbc:hive2://localhost:10000

Beeline will ask you for a username and password. In non-secure mode, simply enter the username on
your machine and a blank password. For secure mode, please follow the instructions given in the
[beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).
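
Instead of typing `!connect` interactively, you can also pass the JDBC URL and username directly on the beeline command line. A minimal sketch (in non-secure mode the password can be left empty; `$USER` simply stands in for your local username):

{% highlight bash %}
./bin/beeline -u jdbc:hive2://localhost:10000 -n $USER
{% endhighlight %}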

Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
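
For example, if the machine already has Hive and Hadoop client configurations installed, you might copy them into place along these lines. The source paths are purely illustrative; use wherever your distribution keeps these files:

{% highlight bash %}
# Hypothetical config locations; adjust to your installation.
cp /etc/hive/conf/hive-site.xml conf/
cp /etc/hadoop/conf/core-site.xml conf/
cp /etc/hadoop/conf/hdfs-site.xml conf/
{% endhighlight %}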

You may also use the beeline script that comes with Hive.

The Thrift JDBC server also supports sending Thrift RPC messages over HTTP transport.
Use the following settings to enable HTTP mode, either as system properties or in the `hive-site.xml` file in `conf/`:

    hive.server2.transport.mode - Set this to value: http
    hive.server2.thrift.http.port - HTTP port number to listen on; default is 10001
    hive.server2.http.endpoint - HTTP endpoint; default is cliservice
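
For instance, to enable HTTP mode by passing the settings on the command line rather than through `hive-site.xml`, a sketch along these lines should work (10001 is the default HTTP port listed above):

{% highlight bash %}
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.transport.mode=http \
  --hiveconf hive.server2.thrift.http.port=10001
{% endhighlight %}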

To test, use beeline to connect to the JDBC/ODBC server in HTTP mode with:

    beeline> !connect jdbc:hive2://<host>:<port>/<database>?hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>

If you close a session and then run a CTAS (CREATE TABLE AS SELECT) statement, you must set
`fs.%s.impl.disable.cache` to true in `hive-site.xml`, where `%s` is the filesystem scheme of the
table location (for example, `hdfs`).
See more details in [[SPARK-21067]](https://issues.apache.org/jira/browse/SPARK-21067).

## Running the Spark SQL CLI

The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute
queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.

To start the Spark SQL CLI, run the following in the Spark directory:

    ./bin/spark-sql

Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
You may run `./bin/spark-sql --help` for a complete list of all available options.
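
For example, you can run a single statement non-interactively with the `-e` option; the query and master below are only placeholders:

{% highlight bash %}
./bin/spark-sql --master local[4] -e "SHOW TABLES;"
{% endhighlight %}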