Back to home page

OSCL-LXR

 
 

    


0001 ---
0002 layout: global
0003 title: Integration with Hive UDFs/UDAFs/UDTFs
0004 displayTitle: Integration with Hive UDFs/UDAFs/UDTFs
0005 license: |
0006   Licensed to the Apache Software Foundation (ASF) under one or more
0007   contributor license agreements.  See the NOTICE file distributed with
0008   this work for additional information regarding copyright ownership.
0009   The ASF licenses this file to You under the Apache License, Version 2.0
0010   (the "License"); you may not use this file except in compliance with
0011   the License.  You may obtain a copy of the License at
0012 
0013      http://www.apache.org/licenses/LICENSE-2.0
0014 
0015   Unless required by applicable law or agreed to in writing, software
0016   distributed under the License is distributed on an "AS IS" BASIS,
0017   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018   See the License for the specific language governing permissions and
0019   limitations under the License.
0020 ---
0021 
0022 ### Description
0023 
0024 Spark SQL supports integration of Hive UDFs, UDAFs and UDTFs. Similar to Spark UDFs and UDAFs, Hive UDFs work on a single row as input and generate a single row as output, while Hive UDAFs operate on multiple rows and return a single aggregated row as a result. In addition, Hive also supports UDTFs (User Defined Tabular Functions) that act on one row as input and return multiple rows as output. To use Hive UDFs/UDAFs/UTFs, the user should register them in Spark, and then use them in Spark SQL queries.
0025 
0026 ### Examples
0027 
0028 Hive has two UDF interfaces: [UDF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDF.java) and [GenericUDF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
0029 An example below uses [GenericUDFAbs](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java) derived from `GenericUDF`.
0030 
0031 ```sql
0032 -- Register `GenericUDFAbs` and use it in Spark SQL.
0033 -- Note that, if you use your own programmed one, you need to add a JAR containing it
0034 -- into a classpath,
0035 -- e.g., ADD JAR yourHiveUDF.jar;
0036 CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';
0037 
0038 SELECT * FROM t;
0039 +-----+
0040 |value|
0041 +-----+
0042 | -1.0|
0043 |  2.0|
0044 | -3.0|
0045 +-----+
0046 
0047 SELECT testUDF(value) FROM t;
0048 +--------------+
0049 |testUDF(value)|
0050 +--------------+
0051 |           1.0|
0052 |           2.0|
0053 |           3.0|
0054 +--------------+
0055 ```
0056 
0057 
0058 An example below uses [GenericUDTFExplode](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java) derived from [GenericUDTF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
0059 
0060 ```sql
0061 -- Register `GenericUDTFExplode` and use it in Spark SQL
0062 CREATE TEMPORARY FUNCTION hiveUDTF
0063     AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode';
0064 
0065 SELECT * FROM t;
0066 +------+
0067 | value|
0068 +------+
0069 |[1, 2]|
0070 |[3, 4]|
0071 +------+
0072 
0073 SELECT hiveUDTF(value) FROM t;
0074 +---+
0075 |col|
0076 +---+
0077 |  1|
0078 |  2|
0079 |  3|
0080 |  4|
0081 +---+
0082 ```
0083 
0084 Hive has two UDAF interfaces: [UDAF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java) and [GenericUDAFResolver](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java).
0085 An example below uses [GenericUDAFSum](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java) derived from `GenericUDAFResolver`.
0086 
0087 ```sql
0088 -- Register `GenericUDAFSum` and use it in Spark SQL
0089 CREATE TEMPORARY FUNCTION hiveUDAF
0090     AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
0091 
0092 SELECT * FROM t;
0093 +---+-----+
0094 |key|value|
0095 +---+-----+
0096 |  a|    1|
0097 |  a|    2|
0098 |  b|    3|
0099 +---+-----+
0100 
0101 SELECT key, hiveUDAF(value) FROM t GROUP BY key;
0102 +---+---------------+
0103 |key|hiveUDAF(value)|
0104 +---+---------------+
0105 |  b|              3|
0106 |  a|              3|
0107 +---+---------------+
0108 ```