5 Lexical conventions [lex]
5.13 Literals [lex.literal]
5.13.1 Kinds of literals [lex.literal.kinds]
There are several kinds of literals.
literal:
integer-literal
character-literal
floating-point-literal
string-literal
boolean-literal
pointer-literal
user-defined-literal
[Note 1:
When appearing as an expression, a literal has a type and a value category ([expr.prim.literal]).
— end note]
The term “literal” generally designates, in this document, those tokens that are called “constants” in C.
5.13.2 Integer literals [lex.icon]
integer-literal:
binary-literal integer-suffix_opt
octal-literal integer-suffix_opt
decimal-literal integer-suffix_opt
hexadecimal-literal integer-suffix_opt
binary-literal:
0b binary-digit
0B binary-digit
binary-literal '_opt binary-digit
octal-literal:
0
octal-literal '_opt octal-digit
decimal-literal:
nonzero-digit
decimal-literal '_opt digit
hexadecimal-literal:
hexadecimal-prefix hexadecimal-digit-sequence
binary-digit: one of
0 1
octal-digit: one of
0 1 2 3 4 5 6 7
nonzero-digit: one of
1 2 3 4 5 6 7 8 9
hexadecimal-prefix: one of
0x 0X
hexadecimal-digit-sequence:
hexadecimal-digit
hexadecimal-digit-sequence '_opt hexadecimal-digit
hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
integer-suffix:
unsigned-suffix long-suffix_opt
unsigned-suffix long-long-suffix_opt
unsigned-suffix size-suffix_opt
long-suffix unsigned-suffix_opt
long-long-suffix unsigned-suffix_opt
size-suffix unsigned-suffix_opt
unsigned-suffix: one of
u U
long-suffix: one of
l L
long-long-suffix: one of
ll LL
size-suffix: one of
z Z
In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.
[Note 1:
The prefix and any optional separating single quotes are ignored when determining the value.
— end note]
Table 7 — Base of integer-literals [tab:lex.icon.base]
| Kind of integer-literal | base N |
|---|---|
| binary-literal | 2 |
| octal-literal | 8 |
| decimal-literal | 10 |
| hexadecimal-literal | 16 |
The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.
[Example 1:
The number twelve can be written 12, 014, 0XC, or 0b1100.
The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.
— end example]
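The equivalences in Example 1 can be checked at compile time. The following is a minimal sketch, assuming a C++14 or later compiler (binary literals and digit separators); the variable names are hypothetical and appear nowhere in this document.

```cpp
// Illustrative sketch; all names are hypothetical.

// Twelve written as a decimal, octal, hexadecimal, and binary literal.
constexpr int twelve_decimal = 12;
constexpr int twelve_octal = 014;
constexpr int twelve_hex = 0XC;
constexpr int twelve_binary = 0b1100;
static_assert(twelve_decimal == twelve_octal, "014 is read in base 8");
static_assert(twelve_decimal == twelve_hex, "0XC is read in base 16");
static_assert(twelve_decimal == twelve_binary, "0b1100 is read in base 2");

// The prefix and the separating single quotes do not contribute to the value.
constexpr long same_value[] = {1048576, 1'048'576, 0X100000, 0x10'0000,
                               0'004'000'000};
static_assert(same_value[0] == same_value[1], "separators are ignored");
static_assert(same_value[0] == same_value[2], "hexadecimal spelling");
static_assert(same_value[0] == same_value[3], "hexadecimal with separator");
static_assert(same_value[0] == same_value[4], "octal with separators");
```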
The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.
Table 8 — Types of integer-literals [tab:lex.icon.type]
| integer-suffix | decimal-literal | integer-literal other than decimal-literal |
|---|---|---|
| none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int |
| u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int |
| l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int |
| Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int |
| ll or LL | long long int | long long int, unsigned long long int |
| Both u or U and ll or LL | unsigned long long int | unsigned long long int |
| z or Z | the signed integer type corresponding to std::size_t ([support.types.layout]) | the signed integer type corresponding to std::size_t, std::size_t |
| Both u or U and z or Z | std::size_t | std::size_t |
Except for integer-literals containing a size-suffix, if the value of an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.
If all of the types in the list for the integer-literal are signed, the extended integer type is signed.
If all of the types in the list for the integer-literal are unsigned, the extended integer type is unsigned.
If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.
If an integer-literal cannot be represented by any of the allowed types, the program is ill-formed.
[Note 2:
An integer-literal with a z or Z suffix is ill-formed if it cannot be represented by std::size_t.
— end note]
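The "first type in the list" rule can be observed with decltype. This is a sketch assuming C++17 (for std::is_same_v and message-free static_assert); the aliases dec_t and hex_t and the flag int_is_32_bit are hypothetical names, and the width-dependent check is guarded so that it only applies where int has 32 bits.

```cpp
#include <limits>
#include <type_traits>

// With the value 1, the first type in each list always suffices.
static_assert(std::is_same_v<decltype(1), int>);
static_assert(std::is_same_v<decltype(1u), unsigned int>);
static_assert(std::is_same_v<decltype(1L), long int>);
static_assert(std::is_same_v<decltype(1uLL), unsigned long long int>);

// 2^31, spelled once as a decimal-literal and once as a hexadecimal-literal.
using dec_t = decltype(2147483648);
using hex_t = decltype(0x80000000);

// Hypothetical helper flag: true where int has 31 value bits plus a sign bit.
constexpr bool int_is_32_bit = std::numeric_limits<int>::digits == 31;

// The list for an unsuffixed decimal-literal contains only signed types.
static_assert(std::is_signed_v<dec_t>);
// The list for other literals tries unsigned int right after int.
static_assert(!int_is_32_bit || std::is_same_v<hex_t, unsigned int>);
```

On an implementation with a 32-bit int, the same value therefore has a signed type wider than int when written in decimal and the type unsigned int when written in hexadecimal.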
5.13.3 Character literals [lex.ccon]
character-literal:
encoding-prefix_opt ' c-char-sequence '
encoding-prefix: one of
u8 u U L
c-char-sequence:
c-char c-char-sequence_opt
c-char:
basic-c-char
escape-sequence
universal-character-name
basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character
escape-sequence:
simple-escape-sequence
numeric-escape-sequence
conditional-escape-sequence
simple-escape-sequence:
\ simple-escape-sequence-char
simple-escape-sequence-char: one of
' " ? \ a b f n r t v
numeric-escape-sequence:
octal-escape-sequence
hexadecimal-escape-sequence
simple-octal-digit-sequence:
octal-digit simple-octal-digit-sequence_opt
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
\o{ simple-octal-digit-sequence }
hexadecimal-escape-sequence:
\x simple-hexadecimal-digit-sequence
\x{ simple-hexadecimal-digit-sequence }
conditional-escape-sequence:
\ conditional-escape-sequence-char
conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x
A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.
A multicharacter literal shall not have an encoding-prefix.
If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.
Multicharacter literals are conditionally-supported.
The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.
Table 9 — Character literals [tab:lex.ccon.literal]
| Encoding prefix | Kind | Type | Associated character encoding | Example |
|---|---|---|---|---|
| none | ordinary character literal | char | ordinary literal encoding | 'v' |
| none | multicharacter literal | int | ordinary literal encoding | 'abcd' |
| L | wide character literal | wchar_t | wide literal encoding | L'w' |
| u8 | UTF-8 character literal | char8_t | UTF-8 | u8'x' |
| u | UTF-16 character literal | char16_t | UTF-16 | u'y' |
| U | UTF-32 character literal | char32_t | UTF-32 | U'z' |
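The type column of Table 9 can be confirmed with decltype. A minimal sketch, assuming C++17; the u8 row is deliberately left out because u8'x' only has type char8_t from C++20 onward, and multicharacter literals are left out because they are conditionally-supported. The constant names are hypothetical.

```cpp
#include <type_traits>

// The encoding-prefix selects the type of the character-literal.
static_assert(std::is_same_v<decltype('v'), char>);
static_assert(std::is_same_v<decltype(L'w'), wchar_t>);
static_assert(std::is_same_v<decltype(u'y'), char16_t>);
static_assert(std::is_same_v<decltype(U'z'), char32_t>);

// For the UTF-16 and UTF-32 rows the value is the code unit of the character.
constexpr char16_t y_unit = u'y';  // U+0079
constexpr char32_t z_unit = U'z';  // U+007A
static_assert(y_unit == 0x79);
static_assert(z_unit == 0x7A);
```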
In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.
A multicharacter literal has an implementation-defined value.
The value of any other kind of character-literal is determined as follows:
- A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding. If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:
  - Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
  - If v does not exceed the range of representable values of the character-literal's type, then the value is v.
  - Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo $2^N$, where N is the width of T.
  - Otherwise, the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.
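The numeric-escape-sequence rules above can be illustrated as follows. This is a sketch under the assumption of a C++17 compiler; the constant names are hypothetical. The last check is written so that it holds whether plain char is signed or unsigned, since converting to unsigned char recovers v in both cases.

```cpp
// An octal or hexadecimal escape names the code unit value v directly.
constexpr char32_t a_from_octal = U'\101';  // octal 101 is 65
constexpr char32_t a_from_hex = U'\x41';    // hexadecimal 41 is 65
static_assert(a_from_octal == 65);
static_assert(a_from_hex == 65);
static_assert(a_from_hex == U'A');  // U+0041 in UTF-32

// v fits the type: the value is v.
constexpr char16_t max_unit = u'\xffff';
static_assert(max_unit == 0xffff);

// No encoding-prefix: a v that only fits the unsigned counterpart of char
// yields the value of type char congruent to v modulo 2^N.
constexpr char high_byte = '\xff';
static_assert(static_cast<unsigned char>(high_byte) == 0xff);
```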
The character specified by a simple-escape-sequence is specified in Table 10.
[Note 1:
Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.
— end note]
Table 10 — Simple escape sequences [tab:lex.ccon.esc]
| character | simple-escape-sequence |
|---|---|
| U+000a line feed | \n |
| U+0009 character tabulation | \t |
| U+000b line tabulation | \v |
| U+0008 backspace | \b |
| U+000d carriage return | \r |
| U+000c form feed | \f |
| U+0007 alert | \a |
| U+005c reverse solidus | \\ |
| U+003f question mark | \? |
| U+0027 apostrophe | \' |
| U+0022 quotation mark | \" |
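Table 10 can be cross-checked by using the U prefix, so that the value of each literal is the UTF-32 code unit, which equals the code point. A minimal sketch assuming C++17; the struct and array names are hypothetical.

```cpp
// Hypothetical pairing of each simple-escape-sequence with its code point.
struct EscapeRow {
  char32_t literal;     // value of the character-literal
  char32_t code_point;  // code point listed in Table 10
};

constexpr EscapeRow escape_rows[] = {
    {U'\n', 0x000a}, {U'\t', 0x0009}, {U'\v', 0x000b}, {U'\b', 0x0008},
    {U'\r', 0x000d}, {U'\f', 0x000c}, {U'\a', 0x0007}, {U'\\', 0x005c},
    {U'\?', 0x003f}, {U'\'', 0x0027}, {U'\"', 0x0022},
};

// True if every literal has the code point the table assigns to it.
constexpr bool all_rows_match() {
  for (const EscapeRow& row : escape_rows) {
    if (row.literal != row.code_point) return false;
  }
  return true;
}

static_assert(all_rows_match());
```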
5.13.4 Floating-point literals [lex.fcon]
floating-point-literal:
decimal-floating-point-literal
hexadecimal-floating-point-literal
decimal-floating-point-literal:
fractional-constant exponent-part_opt floating-point-suffix_opt
digit-sequence exponent-part floating-point-suffix_opt
hexadecimal-floating-point-literal:
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffix_opt
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffix_opt
fractional-constant:
digit-sequence_opt . digit-sequence
digit-sequence .
hexadecimal-fractional-constant:
hexadecimal-digit-sequence_opt . hexadecimal-digit-sequence
hexadecimal-digit-sequence .
exponent-part:
e sign_opt digit-sequence
E sign_opt digit-sequence
binary-exponent-part:
p sign_opt digit-sequence
P sign_opt digit-sequence
sign: one of
+ -
digit-sequence:
digit
digit-sequence '_opt digit
floating-point-suffix: one of
f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16
The type of a floating-point-literal ([basic.fundamental], [basic.extended.fp]) is determined by its floating-point-suffix as specified in Table 11.
[Note 1:
The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported.
— end note]
Table 11 — Types of floating-point-literals [tab:lex.fcon.type]
| floating-point-suffix | type |
|---|---|
| none | double |
| f or F | float |
| l or L | long double |
| f16 or F16 | std::float16_t |
| f32 or F32 | std::float32_t |
| f64 or F64 | std::float64_t |
| f128 or F128 | std::float128_t |
| bf16 or BF16 | std::bfloat16_t |
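The first three rows of Table 11 can be checked on any C++17 compiler. The sketch below omits the conditionally-supported suffixes, since their availability depends on the implementation; the constant name quarter is hypothetical.

```cpp
#include <type_traits>

// No suffix gives double; f or F gives float; l or L gives long double.
static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0F), float>);
static_assert(std::is_same_v<decltype(1.0l), long double>);
static_assert(std::is_same_v<decltype(1e3L), long double>);

// The rule is the same for hexadecimal-floating-point-literals.
constexpr auto quarter = 0x1p-2;  // 1 * 2^-2
static_assert(std::is_same_v<std::remove_const_t<decltype(quarter)>, double>);
static_assert(std::is_same_v<decltype(0x1p-2f), float>);
```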
The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal.
In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.
[Note 2:
Any optional separating single quotes are ignored when determining the value.
— end note]
If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.
Otherwise, the exponent e is 0.
The scaled value of the literal is $s \times 10^e$ for a decimal-floating-point-literal and $s \times 2^e$ for a hexadecimal-floating-point-literal.
[Example 1:
The floating-point-literals 49.625 and 0xC.68p+2 have the same value.
The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.
— end example]
If the scaled value is not in the range of representable values for its type, the program is ill-formed.
Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
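The scaled-value computation can be traced on the literals from Example 1. A minimal sketch assuming C++17 (hexadecimal floating-point literals); the constant names are hypothetical. Every value compared with a differently spelled literal here is exactly representable in double, so the comparisons do not depend on the implementation-defined rounding choice.

```cpp
// 0xC.68 is 12 + 6/16 + 8/256 = 12.40625; scaled by 2^2 this is 49.625.
constexpr double from_decimal = 49.625;
constexpr double from_hex = 0xC.68p+2;
static_assert(from_decimal == from_hex);

// Separating single quotes are ignored when determining the value.
constexpr double charge_quoted = 1.602'176'565e-19;
constexpr double charge_plain = 1.602176565e-19;
static_assert(charge_quoted == charge_plain);

// The digits of a binary-exponent-part are read in base 10:
// p10 scales by 2^10, not by 2^16.
static_assert(0x1p10 == 1024.0);

// Without an exponent-part the exponent e is 0.
static_assert(2.5 == 25e-1);
```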
5.13.5 String literals [lex.string]
string-literal:
encoding-prefix_opt " s-char-sequence_opt "
encoding-prefix_opt R raw-string
s-char-sequence:
s-char s-char-sequence_opt
s-char:
basic-s-char
escape-sequence
universal-character-name
basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
U+005c reverse solidus, or new-line character
raw-string:
" d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "
r-char-sequence:
r-char r-char-sequence_opt
r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark
d-char-sequence:
d-char d-char-sequence_opt
d-char:
any member of the basic character set except:
U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line
The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12 where n is the number of encoded code units that would result from an evaluation of the string-literal (see below).
Table 12 — String literals [tab:lex.string.literal]
| Encoding prefix | Kind | Type | Associated character encoding | Examples |
|---|---|---|---|---|
| none | ordinary string literal | array of n const char | ordinary literal encoding | "ordinary string", R"(ordinary raw string)" |
| L | wide string literal | array of n const wchar_t | wide literal encoding | L"wide string", LR"w(wide raw string)w" |
| u8 | UTF-8 string literal | array of n const char8_t | UTF-8 | u8"UTF-8 string", u8R"x(UTF-8 raw string)x" |
| u | UTF-16 string literal | array of n const char16_t | UTF-16 | u"UTF-16 string", uR"y(UTF-16 raw string)y" |
| U | UTF-32 string literal | array of n const char32_t | UTF-32 | U"UTF-32 string", UR"z(UTF-32 raw string)z" |
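The "array of n const T" types in Table 12 can be observed through decltype, which yields a reference to the array type because a string-literal is an lvalue. A sketch assuming C++17 and an ordinary literal encoding that uses one code unit per basic character; the helper extent_of is a hypothetical name. The u8 row is skipped because its element type differs between C++17 and C++20.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: deduces n, the array bound of a string-literal.
template <class T, std::size_t N>
constexpr std::size_t extent_of(const T (&)[N]) {
  return N;
}

// Three characters plus the terminating null character give n == 4.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);

// Rawness does not change the type.
static_assert(std::is_same_v<decltype(R"(abc)"), const char (&)[4]>);
static_assert(extent_of("ordinary string") == 16);
```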
A string-literal that has an R in the prefix is a raw string literal.
The d-char-sequence serves as a delimiter.
The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.
A d-char-sequence shall consist of at most 16 characters.
[Note 1:
The characters '(' and ')' can appear in a raw-string.
Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".
— end note]
[Note 2:
A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.
Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note]
[Example 1:
The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
The raw string
R"(x = "\"y\"")"
is equivalent to "x = \"\\\"y\\\"\"".
— end example]
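The delimiter mechanism can be exercised with std::string_view, whose comparisons are usable in constant expressions. A minimal sketch assuming C++17; the variable names are hypothetical, and the line holding b must start in the first column of the source file.

```cpp
#include <string_view>

// Backslashes are ordinary characters inside a raw string.
constexpr std::string_view pattern_raw = R"(\d+\.\d+)";
constexpr std::string_view pattern_escaped = "\\d+\\.\\d+";
static_assert(pattern_raw == pattern_escaped);

// With the delimiter x, only )x" terminates the literal, so )" may appear.
constexpr std::string_view quoted = R"x(say "(hi)")x";
static_assert(quoted == "say \"(hi)\"");

// Parentheses can appear in a raw-string.
constexpr std::string_view alternation = R"delimiter((a|b))delimiter";
static_assert(alternation == "(a|b)");

// A source-file new-line becomes a new-line in the string.
constexpr std::string_view two_lines = R"(a
b)";
static_assert(two_lines == "a\nb");
```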
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.
The string-literals in any sequence of adjacent string-literals shall have at most one unique encoding-prefix among them.
The common encoding-prefix of the sequence is that encoding-prefix, if any.
[Note 3:
A string-literal's rawness has no effect on the determination of the common encoding-prefix.
— end note]
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.
The lexical structure and grouping of the contents of the individual string-literals is retained.
[Example 2:
"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').
Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).
Table 13 has some examples of valid concatenations.
— end example]
Table 13 — String literal concatenations [tab:lex.string.concat]
| Source | Means | Source | Means | Source | Means |
|---|---|---|---|---|---|
| u"a" u"b" | u"ab" | U"a" U"b" | U"ab" | L"a" L"b" | L"ab" |
| u"a" "b" | u"ab" | U"a" "b" | U"ab" | L"a" "b" | L"ab" |
| "a" u"b" | u"ab" | "a" U"b" | U"ab" | "a" L"b" | L"ab" |
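The concatenation rules, including the retained lexical grouping from Example 2, can be checked by counting code units. A sketch assuming C++17 and an ordinary literal encoding with one code unit per basic character; the helper code_units is a hypothetical name and its count includes the terminating null character.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: number of code units, including the terminating null.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) {
  return N;
}

// "\xA" "B" is the code unit 0x0A followed by 'B', not the single unit 0xAB.
static_assert(code_units("\xA" "B") == 3);
static_assert(code_units("\xAB") == 2);

// R"(\u00)" "41" keeps six characters; no universal-character-name is formed.
static_assert(code_units(R"(\u00)" "41") == 7);

// The common encoding-prefix applies to the whole concatenation.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype(L"a" L"b"), const wchar_t (&)[3]>);
```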
Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).
[Note 4:
String literal objects are potentially non-unique ([intro.object]).
Whether successive evaluations of a string-literal yield the same or a different object is unspecified.
— end note]
[Note 5:
The effect of attempting to modify a string literal object is undefined.
— end note]
String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:
- The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence using the string-literal's associated character encoding. If a character lacks representation in the associated character encoding, then the program is ill-formed. [Note 6: No character lacks representation in any Unicode encoding form. — end note] When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence. [Note 7: The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently. — end note]
- Each numeric-escape-sequence ([lex.ccon]) contributes a single code unit with a value as follows:
  - Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
  - If v does not exceed the range of representable values of the string-literal's array element type, then the value is v.
  - Otherwise, if the string-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the string-literal's array element type, then the value is the unique value of the string-literal's array element type T that is congruent to v modulo $2^N$, where N is the width of T.
  - Otherwise, the program is ill-formed.

  When encoding a stateful character encoding, these sequences should have no effect on encoding state.
- Each conditional-escape-sequence ([lex.ccon]) contributes an implementation-defined code unit sequence. When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.
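The initialization rules above determine how many code units a string literal object holds. The sketch below, assuming C++17, uses UTF-16 and UTF-32 literals so that the counts do not depend on the ordinary literal encoding, except for the two unprefixed cases, which assume one code unit per basic character. The helper unit_count is a hypothetical name.

```cpp
#include <cstddef>

// Hypothetical helper: array bound of the string literal object.
template <class T, std::size_t N>
constexpr std::size_t unit_count(const T (&)[N]) {
  return N;
}

// A terminating null character is always appended.
static_assert(unit_count("") == 1);
static_assert(unit_count(u"abc") == 4);

// U+1F600 encodes to two UTF-16 code units but one UTF-32 code unit.
static_assert(unit_count(u"\U0001F600") == 3);
static_assert(unit_count(U"\U0001F600") == 2);

// Each numeric-escape-sequence contributes exactly one code unit,
// even when that code unit is zero.
static_assert(unit_count("a\0b") == 4);
static_assert(u"\x41"[0] == u'A');
```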
5.13.6 Unevaluated strings [lex.string.uneval]
unevaluated-string:
string-literal
An unevaluated-string shall have no encoding-prefix.
Each universal-character-name and each simple-escape-sequence in an unevaluated-string is replaced by the member of the translation character set it denotes.
An unevaluated-string that contains a numeric-escape-sequence or a conditional-escape-sequence is ill-formed.
An unevaluated-string is never evaluated and its interpretation depends on the context in which it appears.
5.13.7 Boolean literals [lex.bool]
boolean-literal:
false
true
The Boolean literals are the keywords false and true.
Such literals have type bool.
5.13.8 Pointer literals [lex.nullptr]
pointer-literal:
nullptr
The pointer literal is the keyword nullptr.
It has type std::nullptr_t.
[Note 1:
std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.
See [conv.ptr] and [conv.mem].
— end note]
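The properties stated in Note 1 can be expressed with standard type traits. A minimal sketch assuming C++17; the struct Widget and the two pointer constants are hypothetical and serve only to show the conversions.

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has type std::nullptr_t, which is not itself a pointer type.
static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(std::is_null_pointer_v<decltype(nullptr)>);
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

// Hypothetical class used to form a pointer-to-member type.
struct Widget {
  int value;
};

// A null pointer constant converts to a null pointer value
// and to a null member pointer value.
constexpr int* null_object_pointer = nullptr;
constexpr int Widget::* null_member_pointer = nullptr;
static_assert(null_object_pointer == nullptr);
static_assert(null_member_pointer == nullptr);
```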
5.13.9 User-defined literals [lex.ext]
user-defined-literal:
user-defined-integer-literal
user-defined-floating-point-literal
user-defined-string-literal
user-defined-character-literal
user-defined-integer-literal:
decimal-literal ud-suffix
octal-literal ud-suffix
hexadecimal-literal ud-suffix
binary-literal ud-suffix
user-defined-floating-point-literal:
fractional-constant exponent-part_opt ud-suffix
digit-sequence exponent-part ud-suffix
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
user-defined-string-literal:
string-literal ud-suffix
user-defined-character-literal:
character-literal ud-suffix
If a token matches both user-defined-literal and another literal kind, it is treated as the latter.
[Example 1:
123_km is a user-defined-literal, but 12LL is an integer-literal.
— end example]
The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.
A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).
To determine the form of this call for a given user-defined-literal L with ud-suffix X, first let S be the set of declarations found by unqualified lookup for the literal-operator-id whose literal suffix identifier is X ([basic.lookup.unqual]).
S shall not be empty.
If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.
If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form operator ""X(nULL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("n")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where n is the source character sequence c1c2...ck.
[Note 1:
The sequence c1c2...ck can only contain characters from the basic character set.
— end note]
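The three call forms for a user-defined-integer-literal can be compared side by side. This is a sketch assuming C++17; the suffixes _cooked, _spelling_length, and _char_count are hypothetical, and each gets its own suffix so that the lookup set S contains exactly one kind of operator.

```cpp
#include <cstddef>

// Form operator ""X(nULL): receives the value of n.
constexpr unsigned long long operator""_cooked(unsigned long long n) {
  return n;
}

// Raw literal operator, form operator ""X("n"): receives the spelling of n.
constexpr std::size_t operator""_spelling_length(const char* spelling) {
  std::size_t length = 0;
  while (spelling[length] != '\0') ++length;
  return length;
}

// Numeric literal operator template: receives the characters of n
// as template arguments.
template <char... Cs>
constexpr std::size_t operator""_char_count() {
  return sizeof...(Cs);
}

static_assert(0x1F_cooked == 31);           // called with 0x1FULL
static_assert(0x1F_spelling_length == 4);   // called with "0x1F"
static_assert(0b101_char_count == 5);       // <'0', 'b', '1', '0', '1'>
```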
If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.
If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form operator ""X(fL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("f")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where f is the source character sequence c1c2...ck.
[Note 2:
The sequence c1c2...ck can only contain characters from the basic character set.
— end note]
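For a user-defined-floating-point-literal the same choice between the value and the spelling applies. A minimal sketch assuming C++17; the suffixes _percent and _has_exponent are hypothetical.

```cpp
// Form operator ""X(fL): receives the value of f as a long double.
constexpr long double operator""_percent(long double value) {
  return value / 100.0L;
}

// Raw literal operator, form operator ""X("f"): receives the spelling of f.
constexpr bool operator""_has_exponent(const char* spelling) {
  for (; *spelling != '\0'; ++spelling) {
    if (*spelling == 'e' || *spelling == 'E') return true;
  }
  return false;
}

static_assert(50.0_percent == 0.5L);     // called with 50.0L
static_assert(1e3_has_exponent);         // called with "1e3"
static_assert(!1000.0_has_exponent);     // called with "1000.0"
```

The raw form is useful when the spelling carries information that the long double value would lose, as the exponent check here suggests.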
If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).
If S contains a literal operator template with a constant template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form operator ""X<str>()
Otherwise, the literal L is treated as a call of the form operator ""X(str, len)
If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.
S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form operator ""X(ch)
[Example 2:
long double operator ""_w(long double);
std::string operator ""_w(const char16_t*, std::size_t);
unsigned operator ""_w(const char*);
int main() {
  1.2_w;      // calls operator ""_w(1.2L)
  u"one"_w;   // calls operator ""_w(u"one", 3)
  12_w;       // calls operator ""_w("12")
  "two"_w;    // error: no applicable literal operator
}
— end example]
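The string and character forms can also be checked at compile time. The sketch below assumes C++17, so it uses only the operator ""X(str, len) form and not the literal operator template with a constant template parameter; the suffixes _units and _code are hypothetical.

```cpp
#include <cstddef>

// Form operator ""X(str, len): len excludes the terminating null character.
constexpr std::size_t operator""_units(const char*, std::size_t len) {
  return len;
}
constexpr std::size_t operator""_units(const char16_t*, std::size_t len) {
  return len;
}

// Form operator ""X(ch): the parameter type must be the type of ch.
constexpr char32_t operator""_code(char32_t ch) {
  return ch;
}

static_assert("two"_units == 3);            // const char* overload
static_assert(u"\U0001F600"_units == 2);    // len counts UTF-16 code units
static_assert(U'A'_code == 0x41);           // char32_t overload
```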
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.
During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].
At the end of phase 6, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.
[Example 3:
int main() {
  L"A" "B" "C"_x;   // OK, same as L"ABC"_x
  "P"_x "Q" "R"_y;  // error: two different ud-suffixes
}
— end example]
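The well-formed cases of this rule can be observed by letting the literal operator report the length it receives. A minimal sketch assuming C++17; the suffix _n is hypothetical, and the ill-formed case with two different ud-suffixes is omitted because it does not compile.

```cpp
#include <cstddef>

// Hypothetical literal operators that return the length passed to them.
constexpr std::size_t operator""_n(const char*, std::size_t len) {
  return len;
}
constexpr std::size_t operator""_n(const wchar_t*, std::size_t len) {
  return len;
}

// The suffix is applied to the result of the concatenation.
static_assert("A" "B" "C"_n == 3);    // same as "ABC"_n
static_assert("P"_n "Q" "R"_n == 3);  // one ud-suffix, used twice
static_assert(L"A" "B" "C"_n == 3);   // same as L"ABC"_n
```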