5 Lexical conventions [lex]

5.13 Literals [lex.literal]

5.13.1 Kinds of literals [lex.literal.kinds]

1

There are several kinds of literals.14

literal:
integer-literal
character-literal
floating-point-literal
string-literal
boolean-literal
pointer-literal
user-defined-literal

[Note 1:

When appearing as an expression, a literal has a type and a value category ([expr.prim.literal]).

— end note]

14)

The term “literal” generally designates, in this document, those tokens that are called “constants” in C.

5.13.2 Integer literals [lex.icon]

integer-literal:
binary-literal integer-suffix_opt
octal-literal integer-suffix_opt
decimal-literal integer-suffix_opt
hexadecimal-literal integer-suffix_opt

binary-literal:
0b binary-digit
0B binary-digit
binary-literal '_opt binary-digit

octal-literal:
0
octal-literal '_opt octal-digit

decimal-literal:
nonzero-digit
decimal-literal '_opt digit

hexadecimal-literal:
hexadecimal-prefix hexadecimal-digit-sequence

binary-digit: one of
0 1

octal-digit: one of
0 1 2 3 4 5 6 7

nonzero-digit: one of
1 2 3 4 5 6 7 8 9

hexadecimal-prefix: one of
0x 0X

hexadecimal-digit-sequence:
hexadecimal-digit
hexadecimal-digit-sequence '_opt hexadecimal-digit

hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F

integer-suffix:
unsigned-suffix long-suffix_opt
unsigned-suffix long-long-suffix_opt
unsigned-suffix size-suffix_opt
long-suffix unsigned-suffix_opt
long-long-suffix unsigned-suffix_opt
size-suffix unsigned-suffix_opt

unsigned-suffix: one of
u U

long-suffix: one of
l L

long-long-suffix: one of
ll LL

size-suffix: one of
z Z

1

In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.

[Note 1:

The prefix and any optional separating single quotes are ignored when determining the value.

— end note]

Table 7 — Base of integer-literals [tab:lex.icon.base]

Kind of integer-literal | base N
binary-literal | 2
octal-literal | 8
decimal-literal | 10
hexadecimal-literal | 16

2

The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.

[Example 1:

The number twelve can be written 12, 014, 0XC, or 0b1100.

The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.

— end example]
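
The equalities in the example above can be checked by the compiler itself. The following is a minimal sketch, assuming a C++17 (or later) compiler; the helper name octal_spelling is hypothetical and only exists so the value can also be read at run time.

```cpp
// Compile-time checks only; assumes C++17 or later.

// The four spellings of twelve: decimal, octal, hexadecimal, binary.
static_assert(12 == 014);
static_assert(12 == 0XC);
static_assert(12 == 0b1100);

// Prefixes and digit separators are ignored when the value is determined.
static_assert(1048576 == 1'048'576);
static_assert(1048576 == 0X100000);
static_assert(1048576 == 0x10'0000);
static_assert(1048576 == 0'004'000'000);  // octal: 4 * 8^6

// Hypothetical helper (not from the text) returning the octal spelling's value.
constexpr long octal_spelling() { return 0'004'000'000; }
```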

3

The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.

Table 8 — Types of integer-literals [tab:lex.icon.type]

integer-suffix | decimal-literal | integer-literal other than decimal-literal
none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int
u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int
l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int
Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int
ll or LL | long long int | long long int, unsigned long long int
Both u or U and ll or LL | unsigned long long int | unsigned long long int
z or Z | the signed integer type corresponding to std::size_t ([support.types.layout]) | the signed integer type corresponding to std::size_t, std::size_t
Both u or U and z or Z | std::size_t | std::size_t

4

Except for integer-literals containing a size-suffix, if the value of an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.

If all of the types in the list for the integer-literal are signed, the extended integer type is signed.

If all of the types in the list for the integer-literal are unsigned, the extended integer type is unsigned.

If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.

If an integer-literal cannot be represented by any of the allowed types, the program is ill-formed.

[Note 2:

An integer-literal with a z or Z suffix is ill-formed if it cannot be represented by std::size_t.

— end note]
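
The type selection in Table 8 can be observed with decltype. This is a hedged sketch assuming C++17 or later; the constant decimal_stays_signed is a hypothetical name, the hexadecimal check is only meaningful where unsigned int has 32 value bits, and the size-suffix checks are compiled only when the implementation advertises that feature.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// The suffix selects the list of candidate types.
static_assert(std::is_same_v<decltype(1), int>);
static_assert(std::is_same_v<decltype(1u), unsigned int>);
static_assert(std::is_same_v<decltype(1L), long int>);
static_assert(std::is_same_v<decltype(1uL), unsigned long int>);
static_assert(std::is_same_v<decltype(1LL), long long int>);
static_assert(std::is_same_v<decltype(1LLu), unsigned long long int>);

// An unsuffixed decimal-literal only considers signed types.
// (decimal_stays_signed is a hypothetical name used for illustration.)
constexpr bool decimal_stays_signed = std::is_signed_v<decltype(4294967295)>;
static_assert(decimal_stays_signed);

// An unsuffixed hexadecimal-literal may pick an unsigned type.
// The check is vacuous unless unsigned int has exactly 32 value bits.
static_assert(std::numeric_limits<unsigned int>::digits != 32 ||
              std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);

#ifdef __cpp_size_t_suffix  // feature-test macro for the z / Z suffix
static_assert(std::is_same_v<decltype(1uz), std::size_t>);
static_assert(std::is_same_v<decltype(1z), std::make_signed_t<std::size_t>>);
#endif
```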

5.13.3 Character literals [lex.ccon]

character-literal:
encoding-prefix_opt ' c-char-sequence '

encoding-prefix: one of
u8 u U L

c-char-sequence:
c-char c-char-sequence_opt

c-char:
basic-c-char
escape-sequence
universal-character-name

basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character

escape-sequence:
simple-escape-sequence
numeric-escape-sequence
conditional-escape-sequence

simple-escape-sequence:
\ simple-escape-sequence-char

simple-escape-sequence-char: one of
' " ? \ a b f n r t v

numeric-escape-sequence:
octal-escape-sequence
hexadecimal-escape-sequence

simple-octal-digit-sequence:
octal-digit simple-octal-digit-sequence_opt

octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
\o{ simple-octal-digit-sequence }

hexadecimal-escape-sequence:
\x simple-hexadecimal-digit-sequence
\x{ simple-hexadecimal-digit-sequence }

conditional-escape-sequence:
\ conditional-escape-sequence-char

conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x

1

A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.

A multicharacter literal shall not have an encoding-prefix.

If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.

Multicharacter literals are conditionally-supported.

2

The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.

Table 9 — Character literals [tab:lex.ccon.literal]

Encoding prefix | Kind | Type | Associated character encoding | Example
none | ordinary character literal | char | ordinary literal encoding | 'v'
none | multicharacter literal | int | ordinary literal encoding | 'abcd'
L | wide character literal | wchar_t | wide literal encoding | L'w'
u8 | UTF-8 character literal | char8_t | UTF-8 | u8'x'
u | UTF-16 character literal | char16_t | UTF-16 | u'y'
U | UTF-32 character literal | char32_t | UTF-32 | U'z'
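
The prefix-to-type mapping of Table 9 can be confirmed at compile time. A minimal sketch, assuming C++17 or later; the char8_t check is compiled only when the implementation provides that type, the multicharacter row is left out because it is conditionally-supported, and utf32_z is a hypothetical name.

```cpp
#include <type_traits>

static_assert(std::is_same_v<decltype('v'), char>);
static_assert(std::is_same_v<decltype(L'w'), wchar_t>);
static_assert(std::is_same_v<decltype(u'y'), char16_t>);
static_assert(std::is_same_v<decltype(U'z'), char32_t>);

#ifdef __cpp_char8_t  // char8_t is not available before C++20
static_assert(std::is_same_v<decltype(u8'x'), char8_t>);
#endif

// For the UTF-16 and UTF-32 rows the code unit values are fixed by Unicode.
static_assert(u'y' == 0x79);

// Hypothetical constant used for illustration.
constexpr char32_t utf32_z = U'z';
static_assert(utf32_z == 0x7A);
```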

3

In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.

A multicharacter literal has an implementation-defined value.

The value of any other kind of character-literal is determined as follows:

4

The character specified by a simple-escape-sequence is specified in Table 10.

[Note 1:

Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.

— end note]

Table 10 — Simple escape sequences [tab:lex.ccon.esc]

character | simple-escape-sequence
U+000a line feed | \n
U+0009 character tabulation | \t
U+000b line tabulation | \v
U+0008 backspace | \b
U+000d carriage return | \r
U+000c form feed | \f
U+0007 alert | \a
U+005c reverse solidus | \\
U+003f question mark | \?
U+0027 apostrophe | \'
U+0022 quotation mark | \"
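
The mapping in Table 10 can be checked directly. The sketch below uses UTF-32 character literals so that the code unit values equal the listed code points regardless of the ordinary literal encoding; it assumes C++17 or later, and line_feed is a hypothetical name.

```cpp
// Each simple-escape-sequence denotes the character listed in Table 10.
static_assert(U'\n' == 0x000a);  // line feed
static_assert(U'\t' == 0x0009);  // character tabulation
static_assert(U'\v' == 0x000b);  // line tabulation
static_assert(U'\b' == 0x0008);  // backspace
static_assert(U'\r' == 0x000d);  // carriage return
static_assert(U'\f' == 0x000c);  // form feed
static_assert(U'\a' == 0x0007);  // alert
static_assert(U'\\' == 0x005c);  // reverse solidus
static_assert(U'\?' == 0x003f);  // question mark
static_assert(U'\'' == 0x0027);  // apostrophe
static_assert(U'\"' == 0x0022);  // quotation mark

// Hypothetical constant used for illustration.
constexpr char32_t line_feed = U'\n';
```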

5.13.4 Floating-point literals [lex.fcon]

floating-point-literal:
decimal-floating-point-literal
hexadecimal-floating-point-literal

decimal-floating-point-literal:
fractional-constant exponent-part_opt floating-point-suffix_opt
digit-sequence exponent-part floating-point-suffix_opt

hexadecimal-floating-point-literal:
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffix_opt
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffix_opt

fractional-constant:
digit-sequence_opt . digit-sequence
digit-sequence .

hexadecimal-fractional-constant:
hexadecimal-digit-sequence_opt . hexadecimal-digit-sequence
hexadecimal-digit-sequence .

exponent-part:
e sign_opt digit-sequence
E sign_opt digit-sequence

binary-exponent-part:
p sign_opt digit-sequence
P sign_opt digit-sequence

sign: one of
+ -

digit-sequence:
digit
digit-sequence '_opt digit

floating-point-suffix: one of
f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16

1

The type of a floating-point-literal ([basic.fundamental], [basic.extended.fp]) is determined by its floating-point-suffix as specified in Table 11.

[Note 1:

The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported.

See [basic.extended.fp].

— end note]

Table 11 — Types of floating-point-literals [tab:lex.fcon.type]

floating-point-suffix | type
none | double
f or F | float
l or L | long double
f16 or F16 | std::float16_t
f32 or F32 | std::float32_t
f64 or F64 | std::float64_t
f128 or F128 | std::float128_t
bf16 or BF16 | std::bfloat16_t
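
The first three rows of Table 11 apply on every implementation and can be checked with decltype. A minimal sketch, assuming C++17 or later; the conditionally-supported suffixes are deliberately left out, and unsuffixed_is_double is a hypothetical name.

```cpp
#include <type_traits>

static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0F), float>);
static_assert(std::is_same_v<decltype(1.0l), long double>);
static_assert(std::is_same_v<decltype(1.0L), long double>);

// An exponent-part alone is enough to make the literal floating-point.
// (unsuffixed_is_double is a hypothetical name used for illustration.)
constexpr bool unsuffixed_is_double = std::is_same_v<decltype(1e3), double>;
static_assert(unsuffixed_is_double);
```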

2

The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal.

In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.

[Note 2:

Any optional separating single quotes are ignored when determining the value.

— end note]

If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.

Otherwise, the exponent e is 0.

The scaled value of the literal is s × 10^e for a decimal-floating-point-literal and s × 2^e for a hexadecimal-floating-point-literal.

[Example 1:

The floating-point-literals 49.625 and 0xC.68p+2 have the same value.

The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.

— end example]
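
The scaled-value rule can be traced on the first pair: the significand 0xC.68 is 12 + 6/16 + 8/256 = 12.40625, and multiplying by 2^2 gives 49.625. The sketch below checks this at compile time, assuming C++17 or later (hexadecimal floating-point literals are not available earlier); hex_form is a hypothetical name.

```cpp
// 0xC.68p+2: significand 12.40625 (base 16), exponent 2 (base 10), scaled by 2^2.
static_assert(49.625 == 0xC.68p+2);

// Digit separators do not change the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);

// Decimal exponents scale by powers of 10, binary exponents by powers of 2.
static_assert(1e2 == 100.0);
static_assert(0x1p-1 == 0.5);

// Hypothetical constant used for illustration.
constexpr double hex_form = 0xC.68p+2;
```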

3

If the scaled value is not in the range of representable values for its type, the program is ill-formed.

Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

5.13.5 String literals [lex.string]

string-literal:
encoding-prefix_opt " s-char-sequence_opt "
encoding-prefix_opt R raw-string

s-char-sequence:
s-char s-char-sequence_opt

s-char:
basic-s-char
escape-sequence
universal-character-name

basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
U+005c reverse solidus, or new-line character

raw-string:
" d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "

r-char-sequence:
r-char r-char-sequence_opt

r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark

d-char-sequence:
d-char d-char-sequence_opt

d-char:
any member of the basic character set except:
U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

1

The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12 where n is the number of encoded code units that would result from an evaluation of the string-literal (see below).

Table 12 — String literals [tab:lex.string.literal]

Encoding prefix | Kind | Type | Associated character encoding | Examples
none | ordinary string literal | array of n const char | ordinary literal encoding | "ordinary string", R"(ordinary raw string)"
L | wide string literal | array of n const wchar_t | wide literal encoding | L"wide string", LR"w(wide raw string)w"
u8 | UTF-8 string literal | array of n const char8_t | UTF-8 | u8"UTF-8 string", u8R"x(UTF-8 raw string)x"
u | UTF-16 string literal | array of n const char16_t | UTF-16 | u"UTF-16 string", uR"y(UTF-16 raw string)y"
U | UTF-32 string literal | array of n const char32_t | UTF-32 | U"UTF-32 string", UR"z(UTF-32 raw string)z"
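
The array types in Table 12 can be observed with decltype; because a string-literal is an lvalue, decltype yields a reference to the array. A hedged sketch assuming C++17 or later; code_units is a hypothetical helper, and the char8_t check is compiled only where that type exists.

```cpp
#include <cstddef>
#include <type_traits>

// n counts every code unit, including the terminating null.
static_assert(sizeof("abc") == 4);
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(R"(ab)"), const char (&)[3]>);
static_assert(std::is_same_v<decltype(L"ab"), const wchar_t (&)[3]>);
static_assert(std::is_same_v<decltype(u"ab"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype(U"ab"), const char32_t (&)[3]>);

#ifdef __cpp_char8_t  // char8_t is not available before C++20
static_assert(std::is_same_v<decltype(u8"ab"), const char8_t (&)[3]>);
#endif

// Hypothetical helper (not from the text): reports n for any string literal.
template <class CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N; }
```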

2

A string-literal that has an R in the prefix is a raw string literal.

The d-char-sequence serves as a delimiter.

The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.

A d-char-sequence shall consist of at most 16 characters.

3

[Note 1:

The characters '(' and ')' can appear in a raw-string.

Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".

— end note]

4

[Note 2:

A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.

Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

— end note]

5

[Example 1:

The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".

The raw string
R"(x = "\"y\"")"
is equivalent to "x = \"\\\"y\\\"\"".

— end example]
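
The equivalences between raw and non-raw spellings can be checked by comparing contents at compile time. This is a minimal sketch, assuming C++17 or later for constexpr std::string_view; the names delimited and raw_backslash_n are hypothetical.

```cpp
#include <string_view>

// Parentheses may appear inside a raw-string when a delimiter is used.
constexpr std::string_view delimited = R"delimiter((a|b))delimiter";
static_assert(delimited == "(a|b)");

// Quotes and backslashes are taken literally inside a raw string.
static_assert(std::string_view(R"(x = "\"y\"")") == "x = \"\\\"y\\\"\"");

// Escape sequences are not interpreted: this is a backslash followed by 'n'.
constexpr std::string_view raw_backslash_n = R"(\n)";
static_assert(raw_backslash_n == "\\n");
static_assert(raw_backslash_n.size() == 2);
```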

6

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.

7

The string-literals in any sequence of adjacent string-literals shall have at most one unique encoding-prefix among them.

The common encoding-prefix of the sequence is that encoding-prefix, if any.

[Note 3:

A string-literal's rawness has no effect on the determination of the common encoding-prefix.

— end note]

8

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.

The lexical structure and grouping of the contents of the individual string-literals is retained.

[Example 2:

"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').

Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).

Table 13 has some examples of valid concatenations.

— end example]

Table 13 — String literal concatenations [tab:lex.string.concat]

Source | Means | Source | Means | Source | Means
u"a" u"b" | u"ab" | U"a" U"b" | U"ab" | L"a" L"b" | L"ab"
u"a" "b" | u"ab" | U"a" "b" | U"ab" | L"a" "b" | L"ab"
"a" u"b" | u"ab" | "a" U"b" | U"ab" | "a" L"b" | L"ab"
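
The concatenation rules can be checked at compile time: escape sequences stop at the boundary of each individual string-literal, and a single encoding-prefix applies to the whole sequence. A hedged sketch assuming C++17 or later; concat_units is a hypothetical name.

```cpp
#include <cstddef>
#include <type_traits>

// "\xA" "B" is two code units plus the terminating null, not the single '\xAB'.
constexpr std::size_t concat_units = sizeof("\xA" "B");
static_assert(concat_units == 3);
static_assert(("\xA" "B")[0] == '\xA');
static_assert(("\xA" "B")[1] == 'B');

// A raw piece is not reinterpreted after concatenation: six characters plus null.
static_assert(sizeof(R"(\u00)" "41") == 7);

// The common encoding-prefix determines the type of the result.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype(L"a" L"b"), const wchar_t (&)[3]>);
```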

9

Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).

[Note 4:

String literal objects are potentially non-unique ([intro.object]).

Whether successive evaluations of a string-literal yield the same or a different object is unspecified.

— end note]

[Note 5:

The effect of attempting to modify a string literal object is undefined.

— end note]
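
One practical consequence of the notes above, sketched under the assumption of a C++17 compiler: a pointer to a string literal object stays valid after the function returns, and code should compare contents rather than addresses. The function name greeting is hypothetical.

```cpp
#include <string_view>

// The string literal object has static storage duration, so returning a
// pointer to it does not dangle. (greeting is a hypothetical function.)
constexpr const char* greeting() { return "hello"; }

// Compare contents, not addresses: whether two evaluations of "hello"
// designate the same object is unspecified.
static_assert(std::string_view(greeting()) == "hello");
static_assert(std::string_view(greeting()).size() == 5);
```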

10

String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:

  • (10.1)

    The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence using the string-literal's associated character encoding. If a character lacks representation in the associated character encoding, then the program is ill-formed. [Note 6: No character lacks representation in any Unicode encoding form. — end note] When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence. [Note 7: The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently. — end note]

  • (10.2)

    Each numeric-escape-sequence ([lex.ccon]) contributes a single code unit with a value as follows:

    When encoding a stateful character encoding, these sequences should have no effect on encoding state.

  • (10.3)

    Each conditional-escape-sequence ([lex.ccon]) contributes an implementation-defined code unit sequence. When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.

5.13.6 Unevaluated strings [lex.string.uneval]

unevaluated-string:
string-literal

1

An unevaluated-string shall have no encoding-prefix.

2

Each universal-character-name and each simple-escape-sequence in an unevaluated-string is replaced by the member of the translation character set it denotes.

An unevaluated-string that contains a numeric-escape-sequence or a conditional-escape-sequence is ill-formed.

3

An unevaluated-string is never evaluated and its interpretation depends on the context in which it appears.
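
For orientation, the sketch below shows three contexts that take an unevaluated-string: a linkage-specification, a static_assert message, and an attribute argument. It is illustrative only and assumes C++17 or later; all function names are hypothetical, and none of these strings ever produces a string literal object.

```cpp
// Linkage-specification: "C" names a language linkage, it is never evaluated.
extern "C" int c_linkage_add_one(int x);  // hypothetical declaration only

// static_assert message: used in diagnostics only.
static_assert(sizeof(char) == 1, "char is one byte by definition");

// Attribute argument: the text is shown in compiler warnings.
[[deprecated("prefer add_one")]] int old_add_one(int x);  // hypothetical

// Hypothetical replacement function referred to by the message above.
constexpr int add_one(int x) { return x + 1; }
```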

5.13.7 Boolean literals [lex.bool]

boolean-literal:
false
true

1

The Boolean literals are the keywords false and true.

Such literals have type bool.

5.13.8 Pointer literals [lex.nullptr]

pointer-literal:
nullptr

1

The pointer literal is the keyword nullptr.

It has type std::nullptr_t.

[Note 1:

std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.

See [conv.ptr] and [conv.mem].

— end note]
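
The properties stated for the Boolean and pointer literals can be checked with type traits. A minimal sketch, assuming C++17 or later; Widget, p, and pm are hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same_v<decltype(true), bool>);
static_assert(std::is_same_v<decltype(false), bool>);
static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);

// std::nullptr_t is neither a pointer type nor a pointer-to-member type.
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

// nullptr converts to a null pointer value and to a null member pointer value.
struct Widget { int value; };          // hypothetical type
constexpr int* p = nullptr;
constexpr int Widget::* pm = nullptr;
static_assert(p == nullptr);
static_assert(pm == nullptr);
```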

5.13.9 User-defined literals [lex.ext]

user-defined-literal:
user-defined-integer-literal
user-defined-floating-point-literal
user-defined-string-literal
user-defined-character-literal

user-defined-integer-literal:
decimal-literal ud-suffix
octal-literal ud-suffix
hexadecimal-literal ud-suffix
binary-literal ud-suffix

user-defined-floating-point-literal:
fractional-constant exponent-part_opt ud-suffix
digit-sequence exponent-part ud-suffix
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix

user-defined-string-literal:
string-literal ud-suffix

user-defined-character-literal:
character-literal ud-suffix

ud-suffix:
identifier

1

If a token matches both user-defined-literal and another literal kind, it is treated as the latter.

[Example 1:

123_km is a user-defined-literal, but 12LL is an integer-literal.

— end example]

The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.

2

A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).

To determine the form of this call for a given user-defined-literal L with ud-suffix X, first let S be the set of declarations found by unqualified lookup for the literal-operator-id whose literal suffix identifier is X ([basic.lookup.unqual]).

S shall not be empty.

3

If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.

If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form
    operator ""X(nULL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form
    operator ""X("n")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form
    operator ""X<'c1', 'c2', ... 'ck'>()
where n is the source character sequence c1c2...ck.

[Note 1:

The sequence c1c2...ck can only contain characters from the basic character set.

— end note]
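
The three call forms for a user-defined-integer-literal can be sketched with one suffix per form. This is an illustration under the assumption of a C++17 compiler; the suffixes _km, _digits, and _count are hypothetical.

```cpp
#include <cstddef>

// Literal operator taking unsigned long long: 123_km calls operator""_km(123ULL).
constexpr unsigned long long operator""_km(unsigned long long n) { return n * 1000; }

// Raw literal operator: receives the spelling n as a null-terminated string.
constexpr std::size_t operator""_digits(const char* spelling) {
    std::size_t length = 0;
    while (spelling[length] != '\0') ++length;
    return length;
}

// Numeric literal operator template: receives the spelling as template arguments.
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }

static_assert(123_km == 123000);   // operator""_km(123ULL)
static_assert(0x1F_digits == 4);   // operator""_digits("0x1F")
static_assert(12345_count == 5);   // operator""_count<'1', '2', '3', '4', '5'>()
```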

4

If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.

If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form
    operator ""X(fL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form
    operator ""X("f")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form
    operator ""X<'c1', 'c2', ... 'ck'>()
where f is the source character sequence c1c2...ck.

[Note 2:

The sequence c1c2...ck can only contain characters from the basic character set.

— end note]

5

If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).

If S contains a literal operator template with a constant template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form
    operator ""X<str>()

Otherwise, the literal L is treated as a call of the form
    operator ""X(str, len)

6

If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.

S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form
    operator ""X(ch)
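
The string and character call forms can be sketched as follows, assuming a C++17 compiler. The suffixes _units and _code are hypothetical; the string operators simply return len so that the argument passed by the compiler becomes visible.

```cpp
#include <cstddef>

// String form: "two"_units calls operator""_units("two", 3).
// len excludes the terminating null character.
constexpr std::size_t operator""_units(const char*, std::size_t len) { return len; }
constexpr std::size_t operator""_units(const char16_t*, std::size_t len) { return len; }

// Character form: the parameter type must match the type of the literal.
constexpr int operator""_code(char32_t ch) { return static_cast<int>(ch); }

static_assert("two"_units == 3);
static_assert(u"one"_units == 3);   // selects the char16_t overload
static_assert("a\0b"_units == 3);   // embedded nulls are counted
static_assert(U'A'_code == 0x41);   // operator""_code(U'A')
```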

7

[Example 2:

long double operator ""_w(long double);
std::string operator ""_w(const char16_t*, std::size_t);
unsigned operator ""_w(const char*);
int main() {
  1.2_w;      // calls operator ""_w(1.2L)
  u"one"_w;   // calls operator ""_w(u"one", 3)
  12_w;       // calls operator ""_w("12")
  "two"_w;    // error: no applicable literal operator
}

— end example]

8

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.

During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].

At the end of phase 6, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.

9

[Example 3:

int main() {
  L"A" "B" "C"_x;     // OK, same as L"ABC"_x
  "P"_x "Q" "R"_y;    // error: two different ud-suffixes
}

— end example]
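
A compilable variant of the well-formed cases can be sketched by supplying a literal operator for the suffix. This assumes a C++17 compiler; the suffix _x here is a hypothetical operator that returns the length it receives, which shows that the suffix is applied once to the concatenated result.

```cpp
#include <cstddef>

// Hypothetical literal operators returning the received length.
constexpr std::size_t operator""_x(const char*, std::size_t len) { return len; }
constexpr std::size_t operator""_x(const wchar_t*, std::size_t len) { return len; }

static_assert("A" "B" "C"_x == 3);     // same as "ABC"_x
static_assert(L"A" "B" "C"_x == 3);    // same as L"ABC"_x
static_assert("P"_x "Q" "R"_x == 3);   // every ud-suffix present is the same
```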