Back to home page

OSCL-LXR

 
 

    


0001 ---
0002 layout: global
0003 title: Datetime patterns
0004 displayTitle: Datetime Patterns for Formatting and Parsing
0005 license: |
0006   Licensed to the Apache Software Foundation (ASF) under one or more
0007   contributor license agreements.  See the NOTICE file distributed with
0008   this work for additional information regarding copyright ownership.
0009   The ASF licenses this file to You under the Apache License, Version 2.0
0010   (the "License"); you may not use this file except in compliance with
0011   the License.  You may obtain a copy of the License at
0012 
0013      http://www.apache.org/licenses/LICENSE-2.0
0014 
0015   Unless required by applicable law or agreed to in writing, software
0016   distributed under the License is distributed on an "AS IS" BASIS,
0017   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
0018   See the License for the specific language governing permissions and
0019   limitations under the License.
0020 ---
0021 
0022 There are several common scenarios for datetime usage in Spark:
0023 
0024 - CSV/JSON datasources use the pattern string for parsing and formatting datetime content.
0025 
0026 - Datetime functions related to convert `StringType` to/from `DateType` or `TimestampType`.
0027   For example, `unix_timestamp`, `date_format`, `to_unix_timestamp`, `from_unixtime`, `to_date`, `to_timestamp`, `from_utc_timestamp`, `to_utc_timestamp`, etc.
0028 
0029 Spark uses pattern letters in the following table for date and timestamp parsing and formatting:
0030 
0031 |Symbol|Meaning|Presentation|Examples|
0032 |------|-------|------------|--------|
0033 |**G**|era|text|AD; Anno Domini|
0034 |**y**|year|year|2020; 20|
0035 |**D**|day-of-year|number(3)|189|
0036 |**M/L**|month-of-year|month|7; 07; Jul; July|
0037 |**d**|day-of-month|number(3)|28|
0038 |**Q/q**|quarter-of-year|number/text|3; 03; Q3; 3rd quarter|
0039 |**E**|day-of-week|text|Tue; Tuesday|
0040 |**F**|week-of-month|number(1)|3|
0041 |**a**|am-pm-of-day|am-pm|PM|
0042 |**h**|clock-hour-of-am-pm (1-12)|number(2)|12|
0043 |**K**|hour-of-am-pm (0-11)|number(2)|0|
0044 |**k**|clock-hour-of-day (1-24)|number(2)|0|
0045 |**H**|hour-of-day (0-23)|number(2)|0|
0046 |**m**|minute-of-hour|number(2)|30|
0047 |**s**|second-of-minute|number(2)|55|
0048 |**S**|fraction-of-second|fraction|978|
0049 |**V**|time-zone ID|zone-id|America/Los_Angeles; Z; -08:30|
0050 |**z**|time-zone name|zone-name|Pacific Standard Time; PST|
0051 |**O**|localized zone-offset|offset-O|GMT+8; GMT+08:00; UTC-08:00;|
0052 |**X**|zone-offset 'Z' for zero|offset-X|Z; -08; -0830; -08:30; -083015; -08:30:15;|
0053 |**x**|zone-offset|offset-x|+0000; -08; -0830; -08:30; -083015; -08:30:15;|
0054 |**Z**|zone-offset|offset-Z|+0000; -0800; -08:00;|
0055 |**'**|escape for text|delimiter| |
0056 |**''**|single quote|literal|'|
0057 |**[**|optional section start| | |
0058 |**]**|optional section end| | |
0059 
0060 The count of pattern letters determines the format.
0061 
0062 - Text: The text style is determined based on the number of pattern letters used. Less than 4 pattern letters will use the short text form, typically an abbreviation, e.g. day-of-week Monday might output "Mon". Exactly 4 pattern letters will use the full text form, typically the full description, e.g, day-of-week Monday might output "Monday". 5 or more letters will fail.
0063 
0064 - Number(n): The n here represents the maximum count of letters this type of datetime pattern can be used. If the count of letters is one, then the value is output using the minimum number of digits and without padding. Otherwise, the count of digits is used as the width of the output field, with the value zero-padded as necessary.
0065 
0066 - Number/Text: If the count of pattern letters is 3 or greater, use the Text rules above. Otherwise use the Number rules above.
0067 
0068 - Fraction: Use one or more (up to 9) contiguous `'S'` characters, e,g `SSSSSS`, to parse and format fraction of second.
0069   For parsing, the acceptable fraction length can be [1, the number of contiguous 'S'].
0070   For formatting, the fraction length would be padded to the number of contiguous 'S' with zeros.
0071   Spark supports datetime of micro-of-second precision, which has up to 6 significant digits, but can parse nano-of-second with exceeded part truncated.
0072 
0073 - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. 11 or more letters will fail.
0074 
0075 - Month: It follows the rule of Number/Text. The text form is depend on letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. These two forms are different only in some certain languages. For example, in Russian, 'Июль' is the stand-alone form of July, and 'Июля' is the standard form. Here are examples for all supported pattern letters:
0076   - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Month from 1 to 9 are printed without padding.
0077     ```sql
0078     spark-sql> select date_format(date '1970-01-01', "M");
0079     1
0080     spark-sql> select date_format(date '1970-12-01', "L");
0081     12
0082     ```
0083   - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for month 1-9.
0084       ```sql
0085       spark-sql> select date_format(date '1970-1-01', "LL");
0086       01
0087       spark-sql> select date_format(date '1970-09-01', "MM");
0088       09
0089       ```
0090   - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern not just a stand-alone month except locales where there is no difference between stand and stand-alone forms like in English.
0091     ```sql
0092     spark-sql> select date_format(date '1970-01-01', "d MMM");
0093     1 Jan
0094     spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU'));
0095     01 янв.
0096     ```
0097   - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields.
0098     ```sql
0099     spark-sql> select date_format(date '1970-01-01', "LLL");
0100     Jan
0101     spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU'));
0102     янв.
0103     ```
0104   - `'MMMM'`: full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
0105     ```sql
0106     spark-sql> select date_format(date '1970-01-01', "d MMMM");
0107     1 January
0108     spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
0109     1 января
0110     ```
0111   - `'LLLL'`: full textual month representation in the stand-alone form. The pattern can be used to format/parse only months.
0112     ```sql
0113     spark-sql> select date_format(date '1970-01-01', "LLLL");
0114     January
0115     spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU'));
0116     январь
0117     ```
0118 
0119 - am-pm: This outputs the am-pm-of-day. Pattern letter count must be 1.
0120 
0121 - Zone ID(V): This outputs the display the time-zone ID. Pattern letter count must be 2.
0122 
0123 - Zone names(z): This outputs the display textual name of the time-zone ID. If the count of letters is one, two or three, then the short name is output. If the count of letters is four, then the full name is output. Five or more letters will fail.
0124 
0125 - Offset X and x: This formats the offset based on the number of pattern letters. One letter outputs just the hour, such as '+01', unless the minute is non-zero in which case the minute is also output, such as '+0130'. Two letters outputs the hour and minute, without a colon, such as '+0130'. Three letters outputs the hour and minute, with a colon, such as '+01:30'. Four letters outputs the hour and minute and optional second, without a colon, such as '+013015'. Five letters outputs the hour and minute and optional second, with a colon, such as '+01:30:15'. Six or more letters will fail. Pattern letter 'X' (upper case) will output 'Z' when the offset to be output would be zero, whereas pattern letter 'x' (lower case) will output '+00', '+0000', or '+00:00'.
0126 
0127 - Offset O: This formats the localized offset based on the number of pattern letters. One letter outputs the short form of the localized offset, which is localized offset text, such as 'GMT', with hour without leading zero, optional 2-digit minute and second if non-zero, and colon, for example 'GMT+8'. Four letters outputs the full form, which is localized offset text, such as 'GMT, with 2-digit hour and minute field, optional second field if non-zero, and colon, for example 'GMT+08:00'. Any other count of letters will fail.
0128 
0129 - Offset Z: This formats the offset based on the number of pattern letters. One, two or three letters outputs the hour and minute, without a colon, such as '+0130'. The output will be '+0000' when the offset is zero. Four letters outputs the full form of localized offset, equivalent to four letters of Offset-O. The output will be the corresponding localized offset text if the offset is zero. Five letters outputs the hour, minute, with optional second if non-zero, with colon. It outputs 'Z' if the offset is zero. Six or more letters will fail.
0130 
0131 - Optional section start and end: Use `[]` to define an optional section and maybe nested.
0132   During formatting, all valid data will be output even it is in the optional section.
0133   During parsing, the whole section may be missing from the parsed string.
0134   An optional section is started by `[` and ended using `]` (or at the end of the pattern).
0135   
0136 - Symbols of 'E', 'F', 'q' and 'Q' can only be used for datetime formatting, e.g. `date_format`. They are not allowed used for datetime parsing, e.g. `to_timestamp`.