0001 ---
0002 layout: global
0003 title: Integration with Hive UDFs/UDAFs/UDTFs
0004 displayTitle: Integration with Hive UDFs/UDAFs/UDTFs
0005 license: |
0006 Licensed to the Apache Software Foundation (ASF) under one or more
0007 contributor license agreements. See the NOTICE file distributed with
0008 this work for additional information regarding copyright ownership.
0009 The ASF licenses this file to You under the Apache License, Version 2.0
0010 (the "License"); you may not use this file except in compliance with
0011 the License. You may obtain a copy of the License at
0012
0013 http://www.apache.org/licenses/LICENSE-2.0
0014
0015 Unless required by applicable law or agreed to in writing, software
0016 distributed under the License is distributed on an "AS IS" BASIS,
0017 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018 See the License for the specific language governing permissions and
0019 limitations under the License.
0020 ---
0021
0022 ### Description
0023
0024 Spark SQL supports integration of Hive UDFs, UDAFs and UDTFs. Similar to Spark UDFs and UDAFs, Hive UDFs work on a single row as input and generate a single row as output, while Hive UDAFs operate on multiple rows and return a single aggregated row as a result. In addition, Hive also supports UDTFs (User Defined Tabular Functions) that act on one row as input and return multiple rows as output. To use Hive UDFs/UDAFs/UTFs, the user should register them in Spark, and then use them in Spark SQL queries.
0025
0026 ### Examples
0027
0028 Hive has two UDF interfaces: [UDF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDF.java) and [GenericUDF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
0029 An example below uses [GenericUDFAbs](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java) derived from `GenericUDF`.
0030
0031 ```sql
0032 -- Register `GenericUDFAbs` and use it in Spark SQL.
0033 -- Note that, if you use your own programmed one, you need to add a JAR containing it
0034 -- into a classpath,
0035 -- e.g., ADD JAR yourHiveUDF.jar;
0036 CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';
0037
0038 SELECT * FROM t;
0039 +-----+
0040 |value|
0041 +-----+
0042 | -1.0|
0043 | 2.0|
0044 | -3.0|
0045 +-----+
0046
0047 SELECT testUDF(value) FROM t;
0048 +--------------+
0049 |testUDF(value)|
0050 +--------------+
0051 | 1.0|
0052 | 2.0|
0053 | 3.0|
0054 +--------------+
0055 ```
0056
0057
0058 An example below uses [GenericUDTFExplode](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java) derived from [GenericUDTF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
0059
0060 ```sql
0061 -- Register `GenericUDTFExplode` and use it in Spark SQL
0062 CREATE TEMPORARY FUNCTION hiveUDTF
0063 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode';
0064
0065 SELECT * FROM t;
0066 +------+
0067 | value|
0068 +------+
0069 |[1, 2]|
0070 |[3, 4]|
0071 +------+
0072
0073 SELECT hiveUDTF(value) FROM t;
0074 +---+
0075 |col|
0076 +---+
0077 | 1|
0078 | 2|
0079 | 3|
0080 | 4|
0081 +---+
0082 ```
0083
0084 Hive has two UDAF interfaces: [UDAF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java) and [GenericUDAFResolver](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java).
0085 An example below uses [GenericUDAFSum](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java) derived from `GenericUDAFResolver`.
0086
0087 ```sql
0088 -- Register `GenericUDAFSum` and use it in Spark SQL
0089 CREATE TEMPORARY FUNCTION hiveUDAF
0090 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
0091
0092 SELECT * FROM t;
0093 +---+-----+
0094 |key|value|
0095 +---+-----+
0096 | a| 1|
0097 | a| 2|
0098 | b| 3|
0099 +---+-----+
0100
0101 SELECT key, hiveUDAF(value) FROM t GROUP BY key;
0102 +---+---------------+
0103 |key|hiveUDAF(value)|
0104 +---+---------------+
0105 | b| 3|
0106 | a| 3|
0107 +---+---------------+
0108 ```