Back to home page

OSCL-LXR

 
 

    


0001 ---
0002 layout: global
0003 title: Window Functions
0004 displayTitle: Window Functions
0005 license: |
0006   Licensed to the Apache Software Foundation (ASF) under one or more
0007   contributor license agreements.  See the NOTICE file distributed with
0008   this work for additional information regarding copyright ownership.
0009   The ASF licenses this file to You under the Apache License, Version 2.0
0010   (the "License"); you may not use this file except in compliance with
0011   the License.  You may obtain a copy of the License at
0012  
0013      http://www.apache.org/licenses/LICENSE-2.0
0014  
0015   Unless required by applicable law or agreed to in writing, software
0016   distributed under the License is distributed on an "AS IS" BASIS,
0017   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018   See the License for the specific language governing permissions and
0019   limitations under the License.
0020 ---
0021 
0022 ### Description
0023 
0024 Window functions operate on a group of rows, referred to as a window, and calculate a return value for each row based on the group of rows. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row.
0025 
0026 ### Syntax
0027 
0028 ```sql
0029 window_function OVER
0030 ( [  { PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ , ... ] ) ]
0031   { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ]
0032   [ window_frame ] )
0033 ```
0034 
0035 ### Parameters
0036 
0037 * **window_function**
0038 
0039     * Ranking Functions
0040 
0041       **Syntax:** `RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER`
0042 
0043     * Analytic Functions
0044 
0045       **Syntax:** `CUME_DIST | LAG | LEAD`
0046 
0047     * Aggregate Functions
0048 
0049       **Syntax:** `MAX | MIN | COUNT | SUM | AVG | ...`
0050 
0051       Please refer to the [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) document for a complete list of Spark aggregate functions.
0052 
0053 * **window_frame**
0054 
0055     Specifies which row to start the window on and where to end it.
0056 
0057     **Syntax:**
0058 
0059     `{ RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end }`
0060 
0061     * `frame_start` and `frame_end` have the following syntax:
0062 
0063       **Syntax:**
0064 
0065       `UNBOUNDED PRECEDING | offset PRECEDING | CURRENT ROW | offset FOLLOWING | UNBOUNDED FOLLOWING`
0066 
0067       `offset:` specifies the `offset` from the position of the current row.
0068 
0069     **Note:** If `frame_end` is omitted it defaults to `CURRENT ROW`.
0070 
0071 ### Examples
0072 
0073 ```sql
0074 CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT);
0075 
0076 INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35);
0077 INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38);
0078 INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28);
0079 INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33);
0080 INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33);
0081 INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28);
0082 INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38);
0083 INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23);
0084 INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25);
0085 
0086 SELECT * FROM employees;
0087 +-----+-----------+------+-----+
0088 | name|       dept|salary|  age|
0089 +-----+-----------+------+-----+
0090 |Chloe|Engineering| 23000|   25|
0091 | Fred|Engineering| 21000|   28|
0092 | Paul|Engineering| 29000|   23|
0093 |Helen|  Marketing| 29000|   40|
0094 |  Tom|Engineering| 23000|   33|
0095 | Jane|  Marketing| 29000|   28|
0096 | Jeff|  Marketing| 35000|   38|
0097 | Evan|      Sales| 32000|   38|
0098 | Lisa|      Sales| 10000|   35|
0099 | Alex|      Sales| 30000|   33|
0100 +-----+-----------+------+-----+
0101 
0102 SELECT name, dept, RANK() OVER (PARTITION BY dept ORDER BY salary) AS rank FROM employees;
0103 +-----+-----------+------+----+
0104 | name|       dept|salary|rank|
0105 +-----+-----------+------+----+
0106 | Lisa|      Sales| 10000|   1|
0107 | Alex|      Sales| 30000|   2|
0108 | Evan|      Sales| 32000|   3|
0109 | Fred|Engineering| 21000|   1|
0110 |  Tom|Engineering| 23000|   2|
0111 |Chloe|Engineering| 23000|   2|
0112 | Paul|Engineering| 29000|   4|
0113 |Helen|  Marketing| 29000|   1|
0114 | Jane|  Marketing| 29000|   1|
0115 | Jeff|  Marketing| 35000|   3|
0116 +-----+-----------+------+----+
0117 
0118 SELECT name, dept, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary ROWS BETWEEN
0119     UNBOUNDED PRECEDING AND CURRENT ROW) AS dense_rank FROM employees;
0120 +-----+-----------+------+----------+
0121 | name|       dept|salary|dense_rank|
0122 +-----+-----------+------+----------+
0123 | Lisa|      Sales| 10000|         1|
0124 | Alex|      Sales| 30000|         2|
0125 | Evan|      Sales| 32000|         3|
0126 | Fred|Engineering| 21000|         1|
0127 |  Tom|Engineering| 23000|         2|
0128 |Chloe|Engineering| 23000|         2|
0129 | Paul|Engineering| 29000|         3|
0130 |Helen|  Marketing| 29000|         1|
0131 | Jane|  Marketing| 29000|         1|
0132 | Jeff|  Marketing| 35000|         2|
0133 +-----+-----------+------+----------+
0134 
0135 SELECT name, dept, age, CUME_DIST() OVER (PARTITION BY dept ORDER BY age
0136     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cume_dist FROM employees;
0137 +-----+-----------+------+------------------+
0138 | name|       dept|age   |         cume_dist|
0139 +-----+-----------+------+------------------+
0140 | Alex|      Sales|    33|0.3333333333333333|
0141 | Lisa|      Sales|    35|0.6666666666666666|
0142 | Evan|      Sales|    38|               1.0|
0143 | Paul|Engineering|    23|              0.25|
0144 |Chloe|Engineering|    25|              0.75|
0145 | Fred|Engineering|    28|              0.25|
0146 |  Tom|Engineering|    33|               1.0|
0147 | Jane|  Marketing|    28|0.3333333333333333|
0148 | Jeff|  Marketing|    38|0.6666666666666666|
0149 |Helen|  Marketing|    40|               1.0|
0150 +-----+-----------+------+------------------+
0151 
0152 SELECT name, dept, salary, MIN(salary) OVER (PARTITION BY dept ORDER BY salary) AS min
0153     FROM employees;
0154 +-----+-----------+------+-----+
0155 | name|       dept|salary|  min|
0156 +-----+-----------+------+-----+
0157 | Lisa|      Sales| 10000|10000|
0158 | Alex|      Sales| 30000|10000|
0159 | Evan|      Sales| 32000|10000|
0160 |Helen|  Marketing| 29000|29000|
0161 | Jane|  Marketing| 29000|29000|
0162 | Jeff|  Marketing| 35000|29000|
0163 | Fred|Engineering| 21000|21000|
0164 |  Tom|Engineering| 23000|21000|
0165 |Chloe|Engineering| 23000|21000|
0166 | Paul|Engineering| 29000|21000|
0167 +-----+-----------+------+-----+
0168 
0169 SELECT name, salary,
0170     LAG(salary) OVER (PARTITION BY dept ORDER BY salary) AS lag,
0171     LEAD(salary, 1, 0) OVER (PARTITION BY dept ORDER BY salary) AS lead
0172     FROM employees;
0173 +-----+-----------+------+-----+-----+
0174 | name|       dept|salary|  lag| lead|
0175 +-----+-----------+------+-----+-----+
0176 | Lisa|      Sales| 10000|NULL |30000|
0177 | Alex|      Sales| 30000|10000|32000|
0178 | Evan|      Sales| 32000|30000|    0|
0179 | Fred|Engineering| 21000| NULL|23000|
0180 |Chloe|Engineering| 23000|21000|23000|
0181 |  Tom|Engineering| 23000|23000|29000|
0182 | Paul|Engineering| 29000|23000|    0|
0183 |Helen|  Marketing| 29000| NULL|29000|
0184 | Jane|  Marketing| 29000|29000|35000|
0185 | Jeff|  Marketing| 35000|29000|    0|
0186 +-----+-----------+------+-----+-----+
0187 ```
0188 
0189 ### Related Statements
0190 
0191 * [SELECT](sql-ref-syntax-qry-select.html)