0001 ---
0002 layout: global
0003 title: GROUP BY Clause
0004 displayTitle: GROUP BY Clause
0005 license: |
0006 Licensed to the Apache Software Foundation (ASF) under one or more
0007 contributor license agreements. See the NOTICE file distributed with
0008 this work for additional information regarding copyright ownership.
0009 The ASF licenses this file to You under the Apache License, Version 2.0
0010 (the "License"); you may not use this file except in compliance with
0011 the License. You may obtain a copy of the License at
0012
0013 http://www.apache.org/licenses/LICENSE-2.0
0014
0015 Unless required by applicable law or agreed to in writing, software
0016 distributed under the License is distributed on an "AS IS" BASIS,
0017 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018 See the License for the specific language governing permissions and
0019 limitations under the License.
0020 ---
0021
0022 ### Description
0023
0024 The `GROUP BY` clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on
0025 the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple
0026 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses.
0027 When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
0028
0029 ### Syntax
0030
0031 ```sql
0032 GROUP BY group_expression [ , group_expression [ , ... ] ]
0033 [ { WITH ROLLUP | WITH CUBE | GROUPING SETS (grouping_set [ , ...]) } ]
0034
0035 GROUP BY GROUPING SETS (grouping_set [ , ...])
0036 ```
0037
0038 While aggregate functions are defined as
0039 ```sql
0040 aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_expression ) ]
0041 ```
0042
0043 ### Parameters
0044
0045 * **GROUPING SETS**
0046
0047 Groups the rows for each subset of the expressions specified in the grouping sets. For example,
0048 `GROUP BY GROUPING SETS (warehouse, product)` is semantically equivalent
0049 to union of results of `GROUP BY warehouse` and `GROUP BY product`. This clause
0050 is a shorthand for a `UNION ALL` where each leg of the `UNION ALL`
0051 operator performs aggregation of subset of the columns specified in the `GROUPING SETS` clause.
0052
0053 * **grouping_set**
0054
0055 A grouping set is specified by zero or more comma-separated expressions in parentheses.
0056
0057 **Syntax:** `( [ expression [ , ... ] ] )`
0058
0059 * **grouping_expression**
0060
0061 Specifies the critieria based on which the rows are grouped together. The grouping of rows is performed based on
0062 result values of the grouping expressions. A grouping expression may be a column alias, a column position
0063 or an expression.
0064
0065 * **ROLLUP**
0066
0067 Specifies multiple levels of aggregations in a single statement. This clause is used to compute aggregations
0068 based on multiple grouping sets. `ROLLUP` is a shorthand for `GROUPING SETS`. For example,
0069 `GROUP BY warehouse, product WITH ROLLUP` is equivalent to `GROUP BY GROUPING SETS
0070 ((warehouse, product), (warehouse), ())`.
0071 The N elements of a `ROLLUP` specification results in N+1 `GROUPING SETS`.
0072
0073 * **CUBE**
0074
0075 `CUBE` clause is used to perform aggregations based on combination of grouping columns specified in the
0076 `GROUP BY` clause. `CUBE` is a shorthand for `GROUPING SETS`. For example,
0077 `GROUP BY warehouse, product WITH CUBE` is equivalent to `GROUP BY GROUPING SETS
0078 ((warehouse, product), (warehouse), (product), ())`.
0079 The N elements of a `CUBE` specification results in 2^N `GROUPING SETS`.
0080
0081 * **aggregate_name**
0082
0083 Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
0084
0085 * **DISTINCT**
0086
0087 Removes duplicates in input rows before they are passed to aggregate functions.
0088
0089 * **FILTER**
0090
0091 Filters the input rows for which the `boolean_expression` in the `WHERE` clause evaluates
0092 to true are passed to the aggregate function; other rows are discarded.
0093
0094 ### Examples
0095
0096 ```sql
0097 CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
0098 INSERT INTO dealer VALUES
0099 (100, 'Fremont', 'Honda Civic', 10),
0100 (100, 'Fremont', 'Honda Accord', 15),
0101 (100, 'Fremont', 'Honda CRV', 7),
0102 (200, 'Dublin', 'Honda Civic', 20),
0103 (200, 'Dublin', 'Honda Accord', 10),
0104 (200, 'Dublin', 'Honda CRV', 3),
0105 (300, 'San Jose', 'Honda Civic', 5),
0106 (300, 'San Jose', 'Honda Accord', 8);
0107
0108 -- Sum of quantity per dealership. Group by `id`.
0109 SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
0110 +---+-------------+
0111 | id|sum(quantity)|
0112 +---+-------------+
0113 |100| 32|
0114 |200| 33|
0115 |300| 13|
0116 +---+-------------+
0117
0118 -- Use column position in GROUP by clause.
0119 SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
0120 +---+-------------+
0121 | id|sum(quantity)|
0122 +---+-------------+
0123 |100| 32|
0124 |200| 33|
0125 |300| 13|
0126 +---+-------------+
0127
0128 -- Multiple aggregations.
0129 -- 1. Sum of quantity per dealership.
0130 -- 2. Max quantity per dealership.
0131 SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM dealer GROUP BY id ORDER BY id;
0132 +---+---+---+
0133 | id|sum|max|
0134 +---+---+---+
0135 |100| 32| 15|
0136 |200| 33| 20|
0137 |300| 13| 8|
0138 +---+---+---+
0139
0140 -- Count the number of distinct dealer cities per car_model.
0141 SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
0142 +------------+-----+
0143 | car_model|count|
0144 +------------+-----+
0145 | Honda Civic| 3|
0146 | Honda CRV| 2|
0147 |Honda Accord| 3|
0148 +------------+-----+
0149
0150 -- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
0151 SELECT id, sum(quantity) FILTER (
0152 WHERE car_model IN ('Honda Civic', 'Honda CRV')
0153 ) AS `sum(quantity)` FROM dealer
0154 GROUP BY id ORDER BY id;
0155 +---+-------------+
0156 | id|sum(quantity)|
0157 +---+-------------+
0158 |100| 17|
0159 |200| 23|
0160 |300| 5|
0161 +---+-------------+
0162
0163 -- Aggregations using multiple sets of grouping columns in a single statement.
0164 -- Following performs aggregations based on four sets of grouping columns.
0165 -- 1. city, car_model
0166 -- 2. city
0167 -- 3. car_model
0168 -- 4. Empty grouping set. Returns quantities for all city and car models.
0169 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0170 GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
0171 ORDER BY city;
0172 +---------+------------+---+
0173 | city| car_model|sum|
0174 +---------+------------+---+
0175 | null| null| 78|
0176 | null| HondaAccord| 33|
0177 | null| HondaCRV| 10|
0178 | null| HondaCivic| 35|
0179 | Dublin| null| 33|
0180 | Dublin| HondaAccord| 10|
0181 | Dublin| HondaCRV| 3|
0182 | Dublin| HondaCivic| 20|
0183 | Fremont| null| 32|
0184 | Fremont| HondaAccord| 15|
0185 | Fremont| HondaCRV| 7|
0186 | Fremont| HondaCivic| 10|
0187 | San Jose| null| 13|
0188 | San Jose| HondaAccord| 8|
0189 | San Jose| HondaCivic| 5|
0190 +---------+------------+---+
0191
0192 -- Alternate syntax for `GROUPING SETS` in which both `GROUP BY` and `GROUPING SETS`
0193 -- specifications are present.
0194 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0195 GROUP BY city, car_model GROUPING SETS ((city, car_model), (city), (car_model), ())
0196 ORDER BY city, car_model;
0197 +---------+------------+---+
0198 | city| car_model|sum|
0199 +---------+------------+---+
0200 | null| null| 78|
0201 | null| HondaAccord| 33|
0202 | null| HondaCRV| 10|
0203 | null| HondaCivic| 35|
0204 | Dublin| null| 33|
0205 | Dublin| HondaAccord| 10|
0206 | Dublin| HondaCRV| 3|
0207 | Dublin| HondaCivic| 20|
0208 | Fremont| null| 32|
0209 | Fremont| HondaAccord| 15|
0210 | Fremont| HondaCRV| 7|
0211 | Fremont| HondaCivic| 10|
0212 | San Jose| null| 13|
0213 | San Jose| HondaAccord| 8|
0214 | San Jose| HondaCivic| 5|
0215 +---------+------------+---+
0216
0217 -- Group by processing with `ROLLUP` clause.
0218 -- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
0219 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0220 GROUP BY city, car_model WITH ROLLUP
0221 ORDER BY city, car_model;
0222 +---------+------------+---+
0223 | city| car_model|sum|
0224 +---------+------------+---+
0225 | null| null| 78|
0226 | Dublin| null| 33|
0227 | Dublin| HondaAccord| 10|
0228 | Dublin| HondaCRV| 3|
0229 | Dublin| HondaCivic| 20|
0230 | Fremont| null| 32|
0231 | Fremont| HondaAccord| 15|
0232 | Fremont| HondaCRV| 7|
0233 | Fremont| HondaCivic| 10|
0234 | San Jose| null| 13|
0235 | San Jose| HondaAccord| 8|
0236 | San Jose| HondaCivic| 5|
0237 +---------+------------+---+
0238
0239 -- Group by processing with `CUBE` clause.
0240 -- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
0241 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0242 GROUP BY city, car_model WITH CUBE
0243 ORDER BY city, car_model;
0244 +---------+------------+---+
0245 | city| car_model|sum|
0246 +---------+------------+---+
0247 | null| null| 78|
0248 | null| HondaAccord| 33|
0249 | null| HondaCRV| 10|
0250 | null| HondaCivic| 35|
0251 | Dublin| null| 33|
0252 | Dublin| HondaAccord| 10|
0253 | Dublin| HondaCRV| 3|
0254 | Dublin| HondaCivic| 20|
0255 | Fremont| null| 32|
0256 | Fremont| HondaAccord| 15|
0257 | Fremont| HondaCRV| 7|
0258 | Fremont| HondaCivic| 10|
0259 | San Jose| null| 13|
0260 | San Jose| HondaAccord| 8|
0261 | San Jose| HondaCivic| 5|
0262 +---------+------------+---+
0263 ```
0264
0265 ### Related Statements
0266
0267 * [SELECT Main](sql-ref-syntax-qry-select.html)
0268 * [WHERE Clause](sql-ref-syntax-qry-select-where.html)
0269 * [HAVING Clause](sql-ref-syntax-qry-select-having.html)
0270 * [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
0271 * [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
0272 * [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
0273 * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
0274 * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)