5 Lexical conventions [lex]
5.13 Literals [lex.literal]
5.13.3 Character literals [lex.ccon]
character-literal:
encoding-prefix_opt ' c-char-sequence '
encoding-prefix: one of
u8 u U L
c-char-sequence:
c-char c-char-sequence_opt
c-char:
basic-c-char
escape-sequence
universal-character-name
basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character
escape-sequence:
simple-escape-sequence
numeric-escape-sequence
conditional-escape-sequence
simple-escape-sequence:
\ simple-escape-sequence-char
simple-escape-sequence-char: one of
' " ? \ a b f n r t v
numeric-escape-sequence:
octal-escape-sequence
hexadecimal-escape-sequence
simple-octal-digit-sequence:
octal-digit simple-octal-digit-sequence_opt
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
\o{ simple-octal-digit-sequence }
hexadecimal-escape-sequence:
\x simple-hexadecimal-digit-sequence
\x{ simple-hexadecimal-digit-sequence }
conditional-escape-sequence:
\ conditional-escape-sequence-char
conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x
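As a rough illustration of the grammar above, the following C++ sketch writes one character-literal for each c-char alternative. The variable names are invented for this example. The delimited forms \o{...} and \x{...} are left out on the assumption that older compilers may not accept them.

```cpp
// One character-literal per c-char alternative; all names are illustrative.
constexpr char     basic_form    = 'a';        // basic-c-char
constexpr char     simple_escape = '\'';       // simple-escape-sequence: the apostrophe must be escaped
constexpr char     octal_escape  = '\141';     // octal-escape-sequence: octal 141 is 97
constexpr char     hex_escape    = '\x61';     // hexadecimal-escape-sequence: hex 61 is 97
constexpr char32_t ucn_form      = U'\u00e9';  // universal-character-name: U+00E9

// Different characters yield different code unit values.
static_assert(basic_form != simple_escape, "distinct characters");
// Both numeric escapes spell the same code unit value.
static_assert(octal_escape == hex_escape, "same value, different spelling");
// In UTF-32 the code unit value equals the code point.
static_assert(ucn_form == 0x00e9, "U+00E9 as a UTF-32 code unit");
```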
A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.
A multicharacter literal shall not have an encoding-prefix.
If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.
Multicharacter literals are conditionally-supported.
The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.
Table 9 — Character literals [tab:lex.ccon.literal]
| Encoding prefix | Kind | Type | Associated character encoding | Example |
|---|---|---|---|---|
| none | ordinary character literal | char | ordinary literal encoding | 'v' |
| none | multicharacter literal | int | ordinary literal encoding | 'abcd' |
| L | wide character literal | wchar_t | wide literal encoding | L'w' |
| u8 | UTF-8 character literal | char8_t | UTF-8 | u8'x' |
| u | UTF-16 character literal | char16_t | UTF-16 | u'y' |
| U | UTF-32 character literal | char32_t | UTF-32 | U'z' |
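The Type column of Table 9 can be checked at compile time. The sketch below assumes C++17 or later; has_type is a hypothetical helper written for this example, and the u8 row is only checked where the implementation reports char8_t support. The multicharacter row is omitted because such literals are conditionally-supported.

```cpp
#include <type_traits>

// Hypothetical helper: true when the argument's type is exactly Expected.
template <class Expected, class Literal>
constexpr bool has_type(Literal) {
    return std::is_same_v<Expected, Literal>;
}

static_assert(has_type<char>('v'));       // no prefix: ordinary character literal
static_assert(has_type<wchar_t>(L'w'));   // L: wide character literal
static_assert(has_type<char16_t>(u'y'));  // u: UTF-16 character literal
static_assert(has_type<char32_t>(U'z'));  // U: UTF-32 character literal
#ifdef __cpp_char8_t
static_assert(has_type<char8_t>(u8'x'));  // u8: UTF-8 character literal
#endif
```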
In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.
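One way to observe this rule is to compare a character-literal in a #if condition with the same literal in an ordinary expression. The sketch below uses invented names and assumes an implementation that follows the rule. Whether '\xff' is negative depends on whether char is signed, but both evaluations are expected to agree.

```cpp
// Phase 4: the preprocessor evaluates the literal in a #if condition.
#if '\xff' < 0
constexpr bool negative_in_preprocessor = true;
#else
constexpr bool negative_in_preprocessor = false;
#endif

// Phase 7: the same literal evaluated as an ordinary expression of type char.
constexpr bool negative_in_expression = '\xff' < 0;

// Both phases are expected to see the same value.
static_assert(negative_in_preprocessor == negative_in_expression,
              "the #if value matches the expression value");
```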
A multicharacter literal has an implementation-defined value.
The value of any other kind of character-literal is determined as follows:
- A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name has the code unit value of the specified character as encoded in the literal's associated character encoding. If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:
  - Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
  - If v does not exceed the range of representable values of the character-literal's type, then the value is v.
  - Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo 2^N, where N is the width of T.
  - Otherwise, the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.
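The numeric-escape-sequence rules can be sketched with a few literals. The names below are illustrative. The wrap-around check assumes that converting 0xff to char behaves modularly, which is what mainstream implementations do. The ill-formed cases are left as comments so that the block still compiles.

```cpp
// v fits in char, so the value is v.
constexpr char fits_octal = '\101';  // octal 101 is 65
constexpr char fits_hex   = '\x41';  // hex 41 is 65
static_assert(fits_octal == 65 && fits_hex == 65, "value is v");

// v = 255. Where char is a signed 8-bit type, 255 is out of range, the
// prefix is absent, and 255 fits in unsigned char, so the value is the char
// congruent to 255 modulo 2^8. Where char is unsigned, the value is 255.
// Either way the literal matches the converted value below.
constexpr char high = '\xff';
static_assert(high == static_cast<char>(0xff), "congruent modulo 2^N");

// char16_t and char32_t are unsigned, so v is stored as-is when it fits.
constexpr char16_t max_utf16_unit = u'\xffff';
constexpr char32_t max_code_point = U'\x10ffff';
static_assert(max_utf16_unit == 0xffff && max_code_point == 0x10ffff, "value is v");

// Ill-formed, so commented out:
// constexpr char16_t too_big = u'\x10000';  // v exceeds char16_t and the prefix is u
// constexpr auto two_units = u8'\u00e9';    // U+00E9 needs two UTF-8 code units
```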
The character specified by a simple-escape-sequence is specified in Table 10.
[Note 1:
Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.
— end note]
Table 10 — Simple escape sequences [tab:lex.ccon.esc]
| Code point | Character name | simple-escape-sequence |
|---|---|---|
| U+000a | line feed | \n |
| U+0009 | character tabulation | \t |
| U+000b | line tabulation | \v |
| U+0008 | backspace | \b |
| U+000d | carriage return | \r |
| U+000c | form feed | \f |
| U+0007 | alert | \a |
| U+005c | reverse solidus | \\ |
| U+003f | question mark | \? |
| U+0027 | apostrophe | \' |
| U+0022 | quotation mark | \" |
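The mapping in Table 10 can be checked with UTF-32 character literals, where the code unit value equals the code point regardless of the ordinary literal encoding. This is a minimal sketch assuming C++14 or later. SimpleEscape, simple_escapes, and all_match are hypothetical names introduced for the example.

```cpp
// Hypothetical pairing of each simple-escape-sequence with its code point.
struct SimpleEscape {
    char32_t literal;     // the escape written as a UTF-32 character literal
    char32_t code_point;  // the code point listed in Table 10
};

constexpr SimpleEscape simple_escapes[] = {
    {U'\n', 0x000a}, {U'\t', 0x0009}, {U'\v', 0x000b}, {U'\b', 0x0008},
    {U'\r', 0x000d}, {U'\f', 0x000c}, {U'\a', 0x0007}, {U'\\', 0x005c},
    {U'\?', 0x003f}, {U'\'', 0x0027}, {U'\"', 0x0022},
};

// True when every escape yields the code point given in the table.
constexpr bool all_match() {
    for (const SimpleEscape& e : simple_escapes)
        if (e.literal != e.code_point) return false;
    return true;
}

static_assert(all_match(), "each escape denotes its Table 10 code point");
```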