Back to home page

OSCL-LXR

 
 

    


0001 ---
0002 layout: global
0003 title: GROUP BY Clause
0004 displayTitle: GROUP BY Clause
0005 license: |
0006   Licensed to the Apache Software Foundation (ASF) under one or more
0007   contributor license agreements.  See the NOTICE file distributed with
0008   this work for additional information regarding copyright ownership.
0009   The ASF licenses this file to You under the Apache License, Version 2.0
0010   (the "License"); you may not use this file except in compliance with
0011   the License.  You may obtain a copy of the License at
0012 
0013      http://www.apache.org/licenses/LICENSE-2.0
0014 
0015   Unless required by applicable law or agreed to in writing, software
0016   distributed under the License is distributed on an "AS IS" BASIS,
0017   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018   See the License for the specific language governing permissions and
0019   limitations under the License.
0020 ---
0021 
0022 ### Description
0023 
0024 The `GROUP BY` clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on
0025 the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple
0026 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses.
0027 When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
0028 
0029 ### Syntax
0030 
0031 ```sql
0032 GROUP BY group_expression [ , group_expression [ , ... ] ]
0033     [ { WITH ROLLUP | WITH CUBE | GROUPING SETS (grouping_set [ , ...]) } ]
0034 
0035 GROUP BY GROUPING SETS (grouping_set [ , ...])
0036 ```
0037 
0038 While aggregate functions are defined as
0039 ```sql
0040 aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_expression ) ]
0041 ```
0042 
0043 ### Parameters
0044 
0045 * **GROUPING SETS**
0046 
0047     Groups the rows for each subset of the expressions specified in the grouping sets. For example,
0048     `GROUP BY GROUPING SETS (warehouse, product)` is semantically equivalent
0049     to union of results of `GROUP BY warehouse` and `GROUP BY product`. This clause
0050     is a shorthand for a `UNION ALL` where each leg of the `UNION ALL`
0051     operator performs aggregation of subset of the columns specified in the `GROUPING SETS` clause.
0052 
0053 * **grouping_set**
0054 
0055     A grouping set is specified by zero or more comma-separated expressions in parentheses.
0056 
0057     **Syntax:** `( [ expression [ , ... ] ] )`
0058 
0059 * **grouping_expression**
0060 
0061     Specifies the critieria based on which the rows are grouped together. The grouping of rows is performed based on
0062     result values of the grouping expressions. A grouping expression may be a column alias, a column position
0063     or an expression.
0064 
0065 * **ROLLUP**
0066 
0067     Specifies multiple levels of aggregations in a single statement. This clause is used to compute aggregations
0068     based on multiple grouping sets. `ROLLUP` is a shorthand for `GROUPING SETS`. For example,
0069     `GROUP BY warehouse, product WITH ROLLUP` is equivalent to `GROUP BY GROUPING SETS
0070     ((warehouse, product), (warehouse), ())`.
0071     The N elements of a `ROLLUP` specification results in N+1 `GROUPING SETS`.
0072 
0073 * **CUBE**
0074 
0075     `CUBE` clause is used to perform aggregations based on combination of grouping columns specified in the
0076     `GROUP BY` clause. `CUBE` is a shorthand for `GROUPING SETS`. For example,
0077     `GROUP BY warehouse, product WITH CUBE` is equivalent to `GROUP BY GROUPING SETS
0078     ((warehouse, product), (warehouse), (product), ())`.
0079     The N elements of a `CUBE` specification results in 2^N `GROUPING SETS`.
0080 
0081 * **aggregate_name**
0082 
0083     Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
0084 
0085 * **DISTINCT**
0086 
0087     Removes duplicates in input rows before they are passed to aggregate functions.
0088 
0089 * **FILTER**
0090 
0091     Filters the input rows for which the `boolean_expression` in the `WHERE` clause evaluates
0092     to true are passed to the aggregate function; other rows are discarded.
0093 
0094 ### Examples
0095 
0096 ```sql
0097 CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
0098 INSERT INTO dealer VALUES
0099     (100, 'Fremont', 'Honda Civic', 10),
0100     (100, 'Fremont', 'Honda Accord', 15),
0101     (100, 'Fremont', 'Honda CRV', 7),
0102     (200, 'Dublin', 'Honda Civic', 20),
0103     (200, 'Dublin', 'Honda Accord', 10),
0104     (200, 'Dublin', 'Honda CRV', 3),
0105     (300, 'San Jose', 'Honda Civic', 5),
0106     (300, 'San Jose', 'Honda Accord', 8);
0107 
0108 -- Sum of quantity per dealership. Group by `id`.
0109 SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
0110 +---+-------------+
0111 | id|sum(quantity)|
0112 +---+-------------+
0113 |100|           32|
0114 |200|           33|
0115 |300|           13|
0116 +---+-------------+
0117 
0118 -- Use column position in GROUP by clause.
0119 SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
0120 +---+-------------+
0121 | id|sum(quantity)|
0122 +---+-------------+
0123 |100|           32|
0124 |200|           33|
0125 |300|           13|
0126 +---+-------------+
0127 
0128 -- Multiple aggregations.
0129 -- 1. Sum of quantity per dealership.
0130 -- 2. Max quantity per dealership.
0131 SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM dealer GROUP BY id ORDER BY id;
0132 +---+---+---+
0133 | id|sum|max|
0134 +---+---+---+
0135 |100| 32| 15|
0136 |200| 33| 20|
0137 |300| 13|  8|
0138 +---+---+---+
0139 
0140 -- Count the number of distinct dealer cities per car_model.
0141 SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
0142 +------------+-----+
0143 |   car_model|count|
0144 +------------+-----+
0145 | Honda Civic|    3|
0146 |   Honda CRV|    2|
0147 |Honda Accord|    3|
0148 +------------+-----+
0149 
0150 -- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
0151 SELECT id, sum(quantity) FILTER (
0152             WHERE car_model IN ('Honda Civic', 'Honda CRV')
0153         ) AS `sum(quantity)` FROM dealer
0154     GROUP BY id ORDER BY id;
0155 +---+-------------+
0156 | id|sum(quantity)|
0157 +---+-------------+
0158 |100|           17|
0159 |200|           23|
0160 |300|            5|
0161 +---+-------------+
0162 
0163 -- Aggregations using multiple sets of grouping columns in a single statement.
0164 -- Following performs aggregations based on four sets of grouping columns.
0165 -- 1. city, car_model
0166 -- 2. city
0167 -- 3. car_model
0168 -- 4. Empty grouping set. Returns quantities for all city and car models.
0169 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0170     GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
0171     ORDER BY city;
0172 +---------+------------+---+
0173 |     city|   car_model|sum|
0174 +---------+------------+---+
0175 |     null|        null| 78|
0176 |     null| HondaAccord| 33|
0177 |     null|    HondaCRV| 10|
0178 |     null|  HondaCivic| 35|
0179 |   Dublin|        null| 33|
0180 |   Dublin| HondaAccord| 10|
0181 |   Dublin|    HondaCRV|  3|
0182 |   Dublin|  HondaCivic| 20|
0183 |  Fremont|        null| 32|
0184 |  Fremont| HondaAccord| 15|
0185 |  Fremont|    HondaCRV|  7|
0186 |  Fremont|  HondaCivic| 10|
0187 | San Jose|        null| 13|
0188 | San Jose| HondaAccord|  8|
0189 | San Jose|  HondaCivic|  5|
0190 +---------+------------+---+
0191 
0192 -- Alternate syntax for `GROUPING SETS` in which both `GROUP BY` and `GROUPING SETS`
0193 -- specifications are present.
0194 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0195     GROUP BY city, car_model GROUPING SETS ((city, car_model), (city), (car_model), ())
0196     ORDER BY city, car_model;
0197 +---------+------------+---+
0198 |     city|   car_model|sum|
0199 +---------+------------+---+
0200 |     null|        null| 78|
0201 |     null| HondaAccord| 33|
0202 |     null|    HondaCRV| 10|
0203 |     null|  HondaCivic| 35|
0204 |   Dublin|        null| 33|
0205 |   Dublin| HondaAccord| 10|
0206 |   Dublin|    HondaCRV|  3|
0207 |   Dublin|  HondaCivic| 20|
0208 |  Fremont|        null| 32|
0209 |  Fremont| HondaAccord| 15|
0210 |  Fremont|    HondaCRV|  7|
0211 |  Fremont|  HondaCivic| 10|
0212 | San Jose|        null| 13|
0213 | San Jose| HondaAccord|  8|
0214 | San Jose|  HondaCivic|  5|
0215 +---------+------------+---+
0216 
0217 -- Group by processing with `ROLLUP` clause.
0218 -- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
0219 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0220     GROUP BY city, car_model WITH ROLLUP
0221     ORDER BY city, car_model;
0222 +---------+------------+---+
0223 |     city|   car_model|sum|
0224 +---------+------------+---+
0225 |     null|        null| 78|
0226 |   Dublin|        null| 33|
0227 |   Dublin| HondaAccord| 10|
0228 |   Dublin|    HondaCRV|  3|
0229 |   Dublin|  HondaCivic| 20|
0230 |  Fremont|        null| 32|
0231 |  Fremont| HondaAccord| 15|
0232 |  Fremont|    HondaCRV|  7|
0233 |  Fremont|  HondaCivic| 10|
0234 | San Jose|        null| 13|
0235 | San Jose| HondaAccord|  8|
0236 | San Jose|  HondaCivic|  5|
0237 +---------+------------+---+
0238 
0239 -- Group by processing with `CUBE` clause.
0240 -- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
0241 SELECT city, car_model, sum(quantity) AS sum FROM dealer
0242     GROUP BY city, car_model WITH CUBE
0243     ORDER BY city, car_model;
0244 +---------+------------+---+
0245 |     city|   car_model|sum|
0246 +---------+------------+---+
0247 |     null|        null| 78|
0248 |     null| HondaAccord| 33|
0249 |     null|    HondaCRV| 10|
0250 |     null|  HondaCivic| 35|
0251 |   Dublin|        null| 33|
0252 |   Dublin| HondaAccord| 10|
0253 |   Dublin|    HondaCRV|  3|
0254 |   Dublin|  HondaCivic| 20|
0255 |  Fremont|        null| 32|
0256 |  Fremont| HondaAccord| 15|
0257 |  Fremont|    HondaCRV|  7|
0258 |  Fremont|  HondaCivic| 10|
0259 | San Jose|        null| 13|
0260 | San Jose| HondaAccord|  8|
0261 | San Jose|  HondaCivic|  5|
0262 +---------+------------+---+
0263 ```
0264 
0265 ### Related Statements
0266 
0267 * [SELECT Main](sql-ref-syntax-qry-select.html)
0268 * [WHERE Clause](sql-ref-syntax-qry-select-where.html)
0269 * [HAVING Clause](sql-ref-syntax-qry-select-having.html)
0270 * [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
0271 * [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
0272 * [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
0273 * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
0274 * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)