0001 ---
0002 layout: global
0003 title: SELECT
0004 displayTitle: SELECT
0005 license: |
0006 Licensed to the Apache Software Foundation (ASF) under one or more
0007 contributor license agreements. See the NOTICE file distributed with
0008 this work for additional information regarding copyright ownership.
0009 The ASF licenses this file to You under the Apache License, Version 2.0
0010 (the "License"); you may not use this file except in compliance with
0011 the License. You may obtain a copy of the License at
0012
0013 http://www.apache.org/licenses/LICENSE-2.0
0014
0015 Unless required by applicable law or agreed to in writing, software
0016 distributed under the License is distributed on an "AS IS" BASIS,
0017 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018 See the License for the specific language governing permissions and
0019 limitations under the License.
0020 ---
0021
0022 ### Description
0023
0024 Spark supports a `SELECT` statement and conforms to the ANSI SQL standard. Queries are
0025 used to retrieve result sets from one or more tables. The following section
0026 describes the overall query syntax and the sub-sections cover different constructs
0027 of a query along with examples.
0028
0029 ### Syntax
0030
0031 ```sql
0032 [ WITH with_query [ , ... ] ]
0033 select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
0034 [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
0035 [ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
0036 [ CLUSTER BY { expression [ , ...] } ]
0037 [ DISTRIBUTE BY { expression [, ...] } ]
0038 [ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
0039 [ LIMIT { ALL | expression } ]
0040 ```
0041
0042 While `select_statement` is defined as
0043 ```sql
0044 SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
0045 FROM { from_item [ , ...] }
0046 [ WHERE boolean_expression ]
0047 [ GROUP BY expression [ , ...] ]
0048 [ HAVING boolean_expression ]
0049 ```
0050
0051 ### Parameters
0052
0053 * **with_query**
0054
0055 Specifies the [common table expressions (CTEs)](sql-ref-syntax-qry-select-cte.html) before the main query block.
0056 These table expressions are allowed to be referenced later in the FROM clause. This is useful to abstract
0057 out repeated subquery blocks in the FROM clause and improves readability of the query.
0058
0059 * **hints**
0060
0061 Hints can be specified to help spark optimizer make better planning decisions. Currently spark supports hints
0062 that influence selection of join strategies and repartitioning of the data.
0063
0064 * **ALL**
0065
0066 Select all matching rows from the relation and is enabled by default.
0067
0068 * **DISTINCT**
0069
0070 Select all matching rows from the relation after removing duplicates in results.
0071
0072 * **named_expression**
0073
0074 An expression with an assigned name. In general, it denotes a column expression.
0075
0076 **Syntax:** `expression [AS] [alias]`
0077
0078 * **from_item**
0079
0080 Specifies a source of input for the query. It can be one of the following:
0081 * Table relation
0082 * [Join relation](sql-ref-syntax-qry-select-join.html)
0083 * [Table-value function](sql-ref-syntax-qry-select-tvf.html)
0084 * [Inline table](sql-ref-syntax-qry-select-inline-table.html)
0085 * Subquery
0086
0087
0088 * **WHERE**
0089
0090 Filters the result of the FROM clause based on the supplied predicates.
0091
0092 * **GROUP BY**
0093
0094 Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
0095 (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group.
0096 When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
0097
0098 * **HAVING**
0099
0100 Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
0101 filter rows after the grouping is performed. If HAVING is specified without GROUP BY, it indicates a GROUP BY
0102 without grouping expressions (global aggregate).
0103
0104 * **ORDER BY**
0105
0106 Specifies an ordering of the rows of the complete result set of the query. The output rows are ordered
0107 across the partitions. This parameter is mutually exclusive with `SORT BY`,
0108 `CLUSTER BY` and `DISTRIBUTE BY` and can not be specified together.
0109
0110 * **SORT BY**
0111
0112 Specifies an ordering by which the rows are ordered within each partition. This parameter is mutually
0113 exclusive with `ORDER BY` and `CLUSTER BY` and can not be specified together.
0114
0115 * **CLUSTER BY**
0116
0117 Specifies a set of expressions that is used to repartition and sort the rows. Using this clause has
0118 the same effect of using `DISTRIBUTE BY` and `SORT BY` together.
0119
0120 * **DISTRIBUTE BY**
0121
0122 Specifies a set of expressions by which the result rows are repartitioned. This parameter is mutually
0123 exclusive with `ORDER BY` and `CLUSTER BY` and can not be specified together.
0124
0125 * **LIMIT**
0126
0127 Specifies the maximum number of rows that can be returned by a statement or subquery. This clause
0128 is mostly used in the conjunction with `ORDER BY` to produce a deterministic result.
0129
0130 * **boolean_expression**
0131
0132 Specifies an expression with a return type of boolean.
0133
0134 * **expression**
0135
0136 Specifies a combination of one or more values, operators, and SQL functions that evaluates to a value.
0137
0138 * **named_window**
0139
0140 Specifies aliases for one or more source window specifications. The source window specifications can
0141 be referenced in the widow definitions in the query.
0142
0143 ### Related Statements
0144
0145 * [WHERE Clause](sql-ref-syntax-qry-select-where.html)
0146 * [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
0147 * [HAVING Clause](sql-ref-syntax-qry-select-having.html)
0148 * [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
0149 * [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
0150 * [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
0151 * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
0152 * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
0153 * [Common Table Expression](sql-ref-syntax-qry-select-cte.html)
0154 * [Hints](sql-ref-syntax-qry-select-hints.html)
0155 * [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
0156 * [JOIN](sql-ref-syntax-qry-select-join.html)
0157 * [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
0158 * [Set Operators](sql-ref-syntax-qry-select-setops.html)
0159 * [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
0160 * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
0161 * [Window Function](sql-ref-syntax-qry-select-window.html)