# 5 Lexical conventions [[lex]](./#lex)

## 5.13 Literals [[lex.literal]](lex.literal#lex.ccon)

### 5.13.3 Character literals [lex.ccon]

[character-literal:](#nt:character-literal "5.13.3 Character literals [lex.ccon]")<br>
[*encoding-prefix*](#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")<sub>opt</sub> ' [*c-char-sequence*](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]") '

[encoding-prefix:](#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") one of<br>
u8 u U L

[c-char-sequence:](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*c-char*](#nt:c-char "5.13.3 Character literals [lex.ccon]") [*c-char-sequence*](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]")<sub>opt</sub>

[c-char:](#nt:c-char "5.13.3 Character literals [lex.ccon]")<br>
[*basic-c-char*](#nt:basic-c-char "5.13.3 Character literals [lex.ccon]")<br>
[*escape-sequence*](#nt:escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]")

[basic-c-char:](#nt:basic-c-char "5.13.3 Character literals [lex.ccon]")<br>
any member of the translation character set except the U+0027 apostrophe, U+005c reverse solidus, or new-line character

[escape-sequence:](#nt:escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*numeric-escape-sequence*](#nt:numeric-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*conditional-escape-sequence*](#nt:conditional-escape-sequence "5.13.3 Character literals [lex.ccon]")

[simple-escape-sequence:](#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
\ [*simple-escape-sequence-char*](#nt:simple-escape-sequence-char "5.13.3 Character literals [lex.ccon]")

[simple-escape-sequence-char:](#nt:simple-escape-sequence-char "5.13.3 Character literals [lex.ccon]") one of<br>
' " ? \ a b f n r t v

[numeric-escape-sequence:](#nt:numeric-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*octal-escape-sequence*](#nt:octal-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*hexadecimal-escape-sequence*](#nt:hexadecimal-escape-sequence "5.13.3 Character literals [lex.ccon]")

[simple-octal-digit-sequence:](#nt:simple-octal-digit-sequence "5.13.3 Character literals [lex.ccon]")<br>
[*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]") [*simple-octal-digit-sequence*](#nt:simple-octal-digit-sequence "5.13.3 Character literals [lex.ccon]")<sub>opt</sub>

[octal-escape-sequence:](#nt:octal-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
\ [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]")<br>
\ [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]") [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]")<br>
\ [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]") [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]") [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]")<br>
\o{ [*simple-octal-digit-sequence*](#nt:simple-octal-digit-sequence "5.13.3 Character literals [lex.ccon]") }

[hexadecimal-escape-sequence:](#nt:hexadecimal-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
\x [*simple-hexadecimal-digit-sequence*](lex.universal.char#nt:simple-hexadecimal-digit-sequence "5.3.2 Universal character names [lex.universal.char]")<br>
\x{ [*simple-hexadecimal-digit-sequence*](lex.universal.char#nt:simple-hexadecimal-digit-sequence "5.3.2 Universal character names [lex.universal.char]") }

[conditional-escape-sequence:](#nt:conditional-escape-sequence "5.13.3 Character literals [lex.ccon]")<br>
\ [*conditional-escape-sequence-char*](#nt:conditional-escape-sequence-char "5.13.3 Character literals [lex.ccon]")

[conditional-escape-sequence-char:](#nt:conditional-escape-sequence-char "5.13.3 Character literals [lex.ccon]")<br>
any member of the basic character set that is not an [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]"), a [*simple-escape-sequence-char*](#nt:simple-escape-sequence-char "5.13.3 Character literals [lex.ccon]"), or the characters N, o, u, U, or x

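The grammar above decides which kind of escape follows a backslash by looking at the next character. As a rough illustration only, the sketch below mirrors that dispatch. `EscapeKind` and `classify_escape` are hypothetical names that do not appear in the standard. The helper inspects just the character after the backslash, so it does not validate the digits or braces that follow, nor does it check membership in the basic character set.

```cpp
#include <string_view>

// Hypothetical categories, one per grammar alternative that can follow '\'.
enum class EscapeKind {
    Simple,                  // simple-escape-sequence
    Octal,                   // octal-escape-sequence, including \o{...}
    Hexadecimal,             // hexadecimal-escape-sequence
    UniversalCharacterName,  // \u, \U, \N forms from [lex.universal.char]
    Conditional              // conditional-escape-sequence
};

// Classifies an escape by the single character right after the backslash.
// Assumption: the caller has already consumed the backslash itself.
constexpr EscapeKind classify_escape(char c) {
    constexpr std::string_view simple_chars = "'\"?\\abfnrtv";
    if (simple_chars.find(c) != std::string_view::npos) return EscapeKind::Simple;
    if (c >= '0' && c <= '7') return EscapeKind::Octal;  // \7, \77, \777
    if (c == 'o') return EscapeKind::Octal;              // \o{...}
    if (c == 'x') return EscapeKind::Hexadecimal;        // \x... or \x{...}
    if (c == 'u' || c == 'U' || c == 'N') return EscapeKind::UniversalCharacterName;
    return EscapeKind::Conditional;  // everything else is conditionally-supported
}

static_assert(classify_escape('n') == EscapeKind::Simple, "\\n is a simple escape");
static_assert(classify_escape('8') == EscapeKind::Conditional, "8 is not an octal digit");
```

Note that `\8` and `\9` fall into the conditional category because only `0` through `7` are octal digits.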
[1](#1)

A [*multicharacter literal*](#def:literal,multicharacter "5.13.3 Character literals [lex.ccon]") is a [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]") whose [*c-char-sequence*](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]") consists of more than one [*c-char*](#nt:c-char "5.13.3 Character literals [lex.ccon]")[.](#1.sentence-1)
A multicharacter literal shall not have an [*encoding-prefix*](#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")[.](#1.sentence-2)
If a multicharacter literal contains a [*c-char*](#nt:c-char "5.13.3 Character literals [lex.ccon]") that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed[.](#1.sentence-3)
Multicharacter literals are conditionally-supported[.](#1.sentence-4)

[2](#2)

The kind of a [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]"), its type, and its associated character encoding ([[lex.charset]](lex.charset "5.3.1 Character sets")) are determined by its [*encoding-prefix*](#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") and its [*c-char-sequence*](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]") as defined by Table [9](#tab:lex.ccon.literal "Table 9: Character literals")[.](#2.sentence-1)

Table [9](#tab:lex.ccon.literal) — Character literals [[tab:lex.ccon.literal]](./tab:lex.ccon.literal)

| **Encoding prefix** | **Kind** | **Type** | **Associated character encoding** | **Example** |
| --- | --- | --- | --- | --- |
| none | [*ordinary character literal*](#def:literal,character,ordinary "5.13.3 Character literals [lex.ccon]") | `char` | ordinary literal encoding | `'v'` |
| none | multicharacter literal | `int` | ordinary literal encoding | `'abcd'` |
| `L` | [*wide character literal*](#def:literal,character,wide "5.13.3 Character literals [lex.ccon]") | `wchar_t` | wide literal encoding | `L'w'` |
| `u8` | [*UTF-8 character literal*](#def:literal,character,UTF-8 "5.13.3 Character literals [lex.ccon]") | `char8_t` | UTF-8 | `u8'x'` |
| `u` | [*UTF-16 character literal*](#def:literal,character,UTF-16 "5.13.3 Character literals [lex.ccon]") | `char16_t` | UTF-16 | `u'y'` |
| `U` | [*UTF-32 character literal*](#def:literal,character,UTF-32 "5.13.3 Character literals [lex.ccon]") | `char32_t` | UTF-32 | `U'z'` |

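The prefix-to-type mapping in Table 9 can be checked at compile time. The following is a minimal sketch rather than normative text. It leaves out the multicharacter row because such literals are conditionally-supported, and it guards the `u8` case behind the `__cpp_char8_t` feature-test macro on the assumption that the translation unit might be compiled without `char8_t` support.

```cpp
#include <type_traits>

// No prefix: ordinary character literal.
static_assert(std::is_same<decltype('v'), char>::value,
              "ordinary character literal has type char");

// L prefix: wide character literal.
static_assert(std::is_same<decltype(L'w'), wchar_t>::value,
              "wide character literal has type wchar_t");

// u prefix: UTF-16 character literal.
static_assert(std::is_same<decltype(u'y'), char16_t>::value,
              "UTF-16 character literal has type char16_t");

// U prefix: UTF-32 character literal.
static_assert(std::is_same<decltype(U'z'), char32_t>::value,
              "UTF-32 character literal has type char32_t");

#if defined(__cpp_char8_t)
// u8 prefix: UTF-8 character literal (only where char8_t is available).
static_assert(std::is_same<decltype(u8'x'), char8_t>::value,
              "UTF-8 character literal has type char8_t");
#endif
```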
[3](#3)

In translation phase 4, the value of a [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]") is determined using the range of representable values of the [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]")'s type in translation phase 7[.](#3.sentence-1)
A multicharacter literal has an implementation-defined value[.](#3.sentence-2)
The value of any other kind of [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]") is determined as follows:

- [(3.1)](#3.1) A [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]") with a [*c-char-sequence*](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]") consisting of a single [*basic-c-char*](#nt:basic-c-char "5.13.3 Character literals [lex.ccon]"), [*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]"), or [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]") is the code unit value of the specified character as encoded in the literal's associated character encoding[.](#3.1.sentence-1)
  If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the program is ill-formed[.](#3.1.sentence-2)
- [(3.2)](#3.2) A [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]") with a [*c-char-sequence*](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]") consisting of a single [*numeric-escape-sequence*](#nt:numeric-escape-sequence "5.13.3 Character literals [lex.ccon]") has a value as follows:
  * [(3.2.1)](#3.2.1) Let v be the integer value represented by the octal number comprising the sequence of [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]")*s* in an [*octal-escape-sequence*](#nt:octal-escape-sequence "5.13.3 Character literals [lex.ccon]") or by the hexadecimal number comprising the sequence of [*hexadecimal-digit*](lex.icon#nt:hexadecimal-digit "5.13.2 Integer literals [lex.icon]")*s* in a [*hexadecimal-escape-sequence*](#nt:hexadecimal-escape-sequence "5.13.3 Character literals [lex.ccon]")[.](#3.2.1.sentence-1)
  * [(3.2.2)](#3.2.2) If v does not exceed the range of representable values of the [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]")'s type, then the value is v[.](#3.2.2.sentence-1)
  * [(3.2.3)](#3.2.3) Otherwise, if the [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]")'s [*encoding-prefix*](#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]")'s type, then the value is the unique value of the [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]")'s type T that is congruent to v modulo 2<sup>N</sup>, where N is the width of T[.](#3.2.3.sentence-1)
  * [(3.2.4)](#3.2.4) Otherwise, the program is ill-formed[.](#3.2.4.sentence-1)
- [(3.3)](#3.3) A [*character-literal*](#nt:character-literal "5.13.3 Character literals [lex.ccon]") with a [*c-char-sequence*](#nt:c-char-sequence "5.13.3 Character literals [lex.ccon]") consisting of a single [*conditional-escape-sequence*](#nt:conditional-escape-sequence "5.13.3 Character literals [lex.ccon]") is conditionally-supported and has an implementation-defined value[.](#3.3.sentence-1)

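The value rules in (3.1) and (3.2) can be illustrated with a few compile-time checks and a small model of the range logic. This is a sketch under stated assumptions, not the standard's own formulation. `numeric_escape_value` and its parameters are hypothetical names; the function describes the literal's type only by a width (assumed to be at most 32 bits) and a signedness flag, and it returns an empty optional where the rules make the program ill-formed.

```cpp
#include <optional>

// (3.1) One character must map to one code unit in the associated encoding.
static_assert(u'\u00e9' == 0x00E9, "U+00E9 is a single UTF-16 code unit");
static_assert(U'\U0001F600' == 0x1F600, "any code point is a single UTF-32 code unit");
// u8'\u00e9' would be ill-formed: U+00E9 needs two UTF-8 code units.

// (3.2) Octal and hexadecimal escapes give the value directly.
static_assert(U'\x41' == 65, "hexadecimal 41 is 65");
static_assert(u'\101' == 65, "octal 101 is 65");
// '\xff' is 255 or -1 depending on whether char is signed; either way the
// bit pattern is the same once viewed as unsigned char.
static_assert(static_cast<unsigned char>('\xff') == 0xFF, "same bits either way");

// Hypothetical model of (3.2.2) to (3.2.4). Assumes width <= 32.
constexpr std::optional<long long> numeric_escape_value(
    unsigned long long v, int width, bool is_signed, bool prefix_absent_or_L) {
    const unsigned long long unsigned_max = (1ULL << width) - 1;
    const unsigned long long type_max = is_signed ? (unsigned_max >> 1) : unsigned_max;
    if (v <= type_max) {
        return static_cast<long long>(v);  // (3.2.2): v fits in the type
    }
    if (prefix_absent_or_L && v <= unsigned_max) {
        // (3.2.3): the value congruent to v modulo 2^N, which is v - 2^N here
        return static_cast<long long>(v) - static_cast<long long>(unsigned_max) - 1;
    }
    return std::nullopt;  // (3.2.4): ill-formed
}

static_assert(numeric_escape_value(0xFF, 8, true, true) == -1,
              "signed 8-bit char wraps 255 to -1");
static_assert(!numeric_escape_value(0x10000, 16, false, false).has_value(),
              "u'\\x10000' is out of range for char16_t");
```

The wrap-around branch applies only to unprefixed and `L` literals, so an out-of-range escape in a `u8`, `u`, or `U` literal is always ill-formed.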
[4](#4)

The character specified by a [*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]") is specified in Table [10](#tab:lex.ccon.esc "Table 10: Simple escape sequences")[.](#4.sentence-1)

[*Note [1](#note-1)*: Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C[.](#4.sentence-2) — *end note*]

Table [10](#tab:lex.ccon.esc) — Simple escape sequences [[tab:lex.ccon.esc]](./tab:lex.ccon.esc)

| **character** | | **[*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]")** |
| --- | --- | --- |
| U+000a | line feed | `\n` |
| U+0009 | character tabulation | `\t` |
| U+000b | line tabulation | `\v` |
| U+0008 | backspace | `\b` |
| U+000d | carriage return | `\r` |
| U+000c | form feed | `\f` |
| U+0007 | alert | `\a` |
| U+005c | reverse solidus | `\\` |
| U+003f | question mark | `\?` |
| U+0027 | apostrophe | `\'` |
| U+0022 | quotation mark | `\"` |
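The rows of Table 10 can be spot-checked with UTF-32 character literals, whose value is the code point itself. The sketch below is illustrative only. `simple_escape_code_point` and its `0xFFFFFFFF` sentinel are hypothetical and not part of the standard.

```cpp
// Maps a simple-escape-sequence-char to the code point listed in Table 10.
// Returns the hypothetical sentinel 0xFFFFFFFF for any other character.
constexpr char32_t simple_escape_code_point(char c) {
    switch (c) {
        case 'n':  return U'\n';
        case 't':  return U'\t';
        case 'v':  return U'\v';
        case 'b':  return U'\b';
        case 'r':  return U'\r';
        case 'f':  return U'\f';
        case 'a':  return U'\a';
        case '\\': return U'\\';
        case '?':  return U'\?';
        case '\'': return U'\'';
        case '"':  return U'\"';
        default:   return 0xFFFFFFFF;
    }
}

static_assert(simple_escape_code_point('n') == 0x000A, "line feed");
static_assert(simple_escape_code_point('\\') == 0x005C, "reverse solidus");
static_assert(simple_escape_code_point('?') == 0x003F, "question mark");
```

UTF-32 literals are used so that the checks do not depend on the implementation's ordinary literal encoding.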