0001 ---
0002 layout: global
0003 title: Window Functions
0004 displayTitle: Window Functions
0005 license: |
0006 Licensed to the Apache Software Foundation (ASF) under one or more
0007 contributor license agreements. See the NOTICE file distributed with
0008 this work for additional information regarding copyright ownership.
0009 The ASF licenses this file to You under the Apache License, Version 2.0
0010 (the "License"); you may not use this file except in compliance with
0011 the License. You may obtain a copy of the License at
0012
0013 http://www.apache.org/licenses/LICENSE-2.0
0014
0015 Unless required by applicable law or agreed to in writing, software
0016 distributed under the License is distributed on an "AS IS" BASIS,
0017 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018 See the License for the specific language governing permissions and
0019 limitations under the License.
0020 ---
0021
0022 ### Description
0023
0024 Window functions operate on a group of rows, referred to as a window, and calculate a return value for each row based on the group of rows. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row.
0025
0026 ### Syntax
0027
0028 ```sql
0029 window_function OVER
0030 ( [ { PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ , ... ] ) ]
0031 { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ]
0032 [ window_frame ] )
0033 ```
0034
0035 ### Parameters
0036
0037 * **window_function**
0038
0039 * Ranking Functions
0040
0041 **Syntax:** `RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER`
0042
0043 * Analytic Functions
0044
0045 **Syntax:** `CUME_DIST | LAG | LEAD`
0046
0047 * Aggregate Functions
0048
0049 **Syntax:** `MAX | MIN | COUNT | SUM | AVG | ...`
0050
0051 Please refer to the [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) document for a complete list of Spark aggregate functions.
0052
0053 * **window_frame**
0054
0055 Specifies which row to start the window on and where to end it.
0056
0057 **Syntax:**
0058
0059 `{ RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end }`
0060
0061 * `frame_start` and `frame_end` have the following syntax:
0062
0063 **Syntax:**
0064
0065 `UNBOUNDED PRECEDING | offset PRECEDING | CURRENT ROW | offset FOLLOWING | UNBOUNDED FOLLOWING`
0066
0067 `offset:` specifies the `offset` from the position of the current row.
0068
0069 **Note:** If `frame_end` is omitted it defaults to `CURRENT ROW`.
0070
0071 ### Examples
0072
0073 ```sql
0074 CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT);
0075
0076 INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35);
0077 INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38);
0078 INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28);
0079 INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33);
0080 INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33);
0081 INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28);
0082 INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38);
0083 INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23);
0084 INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25);
0085
0086 SELECT * FROM employees;
0087 +-----+-----------+------+-----+
0088 | name| dept|salary| age|
0089 +-----+-----------+------+-----+
0090 |Chloe|Engineering| 23000| 25|
0091 | Fred|Engineering| 21000| 28|
0092 | Paul|Engineering| 29000| 23|
0093 |Helen| Marketing| 29000| 40|
0094 | Tom|Engineering| 23000| 33|
0095 | Jane| Marketing| 29000| 28|
0096 | Jeff| Marketing| 35000| 38|
0097 | Evan| Sales| 32000| 38|
0098 | Lisa| Sales| 10000| 35|
0099 | Alex| Sales| 30000| 33|
0100 +-----+-----------+------+-----+
0101
0102 SELECT name, dept, RANK() OVER (PARTITION BY dept ORDER BY salary) AS rank FROM employees;
0103 +-----+-----------+------+----+
0104 | name| dept|salary|rank|
0105 +-----+-----------+------+----+
0106 | Lisa| Sales| 10000| 1|
0107 | Alex| Sales| 30000| 2|
0108 | Evan| Sales| 32000| 3|
0109 | Fred|Engineering| 21000| 1|
0110 | Tom|Engineering| 23000| 2|
0111 |Chloe|Engineering| 23000| 2|
0112 | Paul|Engineering| 29000| 4|
0113 |Helen| Marketing| 29000| 1|
0114 | Jane| Marketing| 29000| 1|
0115 | Jeff| Marketing| 35000| 3|
0116 +-----+-----------+------+----+
0117
0118 SELECT name, dept, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary ROWS BETWEEN
0119 UNBOUNDED PRECEDING AND CURRENT ROW) AS dense_rank FROM employees;
0120 +-----+-----------+------+----------+
0121 | name| dept|salary|dense_rank|
0122 +-----+-----------+------+----------+
0123 | Lisa| Sales| 10000| 1|
0124 | Alex| Sales| 30000| 2|
0125 | Evan| Sales| 32000| 3|
0126 | Fred|Engineering| 21000| 1|
0127 | Tom|Engineering| 23000| 2|
0128 |Chloe|Engineering| 23000| 2|
0129 | Paul|Engineering| 29000| 3|
0130 |Helen| Marketing| 29000| 1|
0131 | Jane| Marketing| 29000| 1|
0132 | Jeff| Marketing| 35000| 2|
0133 +-----+-----------+------+----------+
0134
0135 SELECT name, dept, age, CUME_DIST() OVER (PARTITION BY dept ORDER BY age
0136 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cume_dist FROM employees;
0137 +-----+-----------+------+------------------+
0138 | name| dept|age | cume_dist|
0139 +-----+-----------+------+------------------+
0140 | Alex| Sales| 33|0.3333333333333333|
0141 | Lisa| Sales| 35|0.6666666666666666|
0142 | Evan| Sales| 38| 1.0|
0143 | Paul|Engineering| 23| 0.25|
0144 |Chloe|Engineering| 25| 0.75|
0145 | Fred|Engineering| 28| 0.25|
0146 | Tom|Engineering| 33| 1.0|
0147 | Jane| Marketing| 28|0.3333333333333333|
0148 | Jeff| Marketing| 38|0.6666666666666666|
0149 |Helen| Marketing| 40| 1.0|
0150 +-----+-----------+------+------------------+
0151
0152 SELECT name, dept, salary, MIN(salary) OVER (PARTITION BY dept ORDER BY salary) AS min
0153 FROM employees;
0154 +-----+-----------+------+-----+
0155 | name| dept|salary| min|
0156 +-----+-----------+------+-----+
0157 | Lisa| Sales| 10000|10000|
0158 | Alex| Sales| 30000|10000|
0159 | Evan| Sales| 32000|10000|
0160 |Helen| Marketing| 29000|29000|
0161 | Jane| Marketing| 29000|29000|
0162 | Jeff| Marketing| 35000|29000|
0163 | Fred|Engineering| 21000|21000|
0164 | Tom|Engineering| 23000|21000|
0165 |Chloe|Engineering| 23000|21000|
0166 | Paul|Engineering| 29000|21000|
0167 +-----+-----------+------+-----+
0168
0169 SELECT name, salary,
0170 LAG(salary) OVER (PARTITION BY dept ORDER BY salary) AS lag,
0171 LEAD(salary, 1, 0) OVER (PARTITION BY dept ORDER BY salary) AS lead
0172 FROM employees;
0173 +-----+-----------+------+-----+-----+
0174 | name| dept|salary| lag| lead|
0175 +-----+-----------+------+-----+-----+
0176 | Lisa| Sales| 10000|NULL |30000|
0177 | Alex| Sales| 30000|10000|32000|
0178 | Evan| Sales| 32000|30000| 0|
0179 | Fred|Engineering| 21000| NULL|23000|
0180 |Chloe|Engineering| 23000|21000|23000|
0181 | Tom|Engineering| 23000|23000|29000|
0182 | Paul|Engineering| 29000|23000| 0|
0183 |Helen| Marketing| 29000| NULL|29000|
0184 | Jane| Marketing| 29000|29000|35000|
0185 | Jeff| Marketing| 35000|29000| 0|
0186 +-----+-----------+------+-----+-----+
0187 ```
0188
0189 ### Related Statements
0190
0191 * [SELECT](sql-ref-syntax-qry-select.html)