2025-10-25 03:02:53 +03:00


# 5 Lexical conventions [[lex]](./#lex)
## 5.13 Literals [[lex.literal]](lex.literal#lex.string)
### 5.13.5 String literals [lex.string]
```
string-literal:
    encoding-prefix_opt " s-char-sequence_opt "
    encoding-prefix_opt R raw-string

s-char-sequence:
    s-char s-char-sequence_opt

s-char:
    basic-s-char
    escape-sequence
    universal-character-name

basic-s-char:
    any member of the translation character set except the U+0022 quotation mark,
    U+005c reverse solidus, or new-line character

raw-string:
    " d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "

r-char-sequence:
    r-char r-char-sequence_opt

r-char:
    any member of the translation character set, except a U+0029 right parenthesis followed by
    the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark

d-char-sequence:
    d-char d-char-sequence_opt

d-char:
    any member of the basic character set except:
    U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
    U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line
```

*encoding-prefix* and *escape-sequence* are defined in [[lex.ccon]](lex.ccon "5.13.3Character literals"), and *universal-character-name* is defined in [[lex.universal.char]](lex.universal.char "5.3.2Universal character names"). The suffix `_opt` marks an optional element.
[1](#1)
The kind of a [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]"),
its type, and
its associated character encoding ([[lex.charset]](lex.charset "5.3.1Character sets"))
are determined by its encoding prefix and its sequence of [*s-char*](#nt:s-char "5.13.5String literals[lex.string]")*s* or [*r-char*](#nt:r-char "5.13.5String literals[lex.string]")*s*, as defined by Table [12](#tab:lex.string.literal "Table 12: String literals"), where *n* is the number of encoded code units
that would result from an evaluation of the [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]") (see below).
Table [12](#tab:lex.string.literal): String literals [[tab:lex.string.literal]](./tab:lex.string.literal)

| Encoding prefix | Kind | Type | Associated character encoding | Examples |
| --- | --- | --- | --- | --- |
| none | [*ordinary string literal*](#def:literal,string,ordinary "5.13.5String literals[lex.string]") | array of *n* `const char` | ordinary literal encoding | `"ordinary string"`<br>`R"(ordinary raw string)"` |
| `L` | [*wide string literal*](#def:literal,string,wide "5.13.5String literals[lex.string]") | array of *n* `const wchar_t` | wide literal encoding | `L"wide string"`<br>`LR"w(wide raw string)w"` |
| `u8` | [*UTF-8 string literal*](#def:literal,string,UTF-8 "5.13.5String literals[lex.string]") | array of *n* `const char8_t` | UTF-8 | `u8"UTF-8 string"`<br>`u8R"x(UTF-8 raw string)x"` |
| `u` | [*UTF-16 string literal*](#def:literal,string,UTF-16 "5.13.5String literals[lex.string]") | array of *n* `const char16_t` | UTF-16 | `u"UTF-16 string"`<br>`uR"y(UTF-16 raw string)y"` |
| `U` | [*UTF-32 string literal*](#def:literal,string,UTF-32 "5.13.5String literals[lex.string]") | array of *n* `const char32_t` | UTF-32 | `U"UTF-32 string"`<br>`UR"z(UTF-32 raw string)z"` |
[2](#2)
A [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]") that has an `R` in the prefix is a [*raw string literal*](#def:raw_string_literal "5.13.5String literals[lex.string]").
The [*d-char-sequence*](#nt:d-char-sequence "5.13.5String literals[lex.string]") serves as a delimiter.
The terminating [*d-char-sequence*](#nt:d-char-sequence "5.13.5String literals[lex.string]") of a [*raw-string*](#nt:raw-string "5.13.5String literals[lex.string]") is the same sequence of
characters as the initial [*d-char-sequence*](#nt:d-char-sequence "5.13.5String literals[lex.string]").
A [*d-char-sequence*](#nt:d-char-sequence "5.13.5String literals[lex.string]") shall consist of at most 16 characters.
[3](#3)
[*Note [1](#note-1)*:
The characters `(` and `)` can appear in a [*raw-string*](#nt:raw-string "5.13.5String literals[lex.string]").
Thus, `R"delimiter((a|b))delimiter"` is equivalent to `"(a|b)"`.
— *end note*]
[4](#4)
[*Note [2](#note-2)*:
A source-file new-line in a raw string literal results in a new-line in the
resulting execution string literal.
Assuming no
whitespace at the beginning of lines in the following example, the assert will succeed:

```cpp
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
```
— *end note*]
[5](#5)
[*Example [1](#example-1)*:
The raw string

```cpp
R"a(
)\
a"
)a"
```

is equivalent to `"\n)\\\na\"\n"`.
The raw string `R"(x = "\"y\"")"` is equivalent to `"x = \"\\\"y\\\"\""`.
— *end example*]
[6](#6)
Ordinary string literals and UTF-8 string literals are
also referred to as [*narrow string literals*](#def:literal,string,narrow "5.13.5String literals[lex.string]").
[7](#7)
The [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")*s* in
any sequence of adjacent [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")*s* shall have at most one unique [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3Character literals[lex.ccon]") among them.
The common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3Character literals[lex.ccon]") of the sequence is
that [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3Character literals[lex.ccon]"), if any.
[*Note [3](#note-3)*:
A [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")'s rawness has
no effect on the determination of the common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3Character literals[lex.ccon]").
— *end note*]
[8](#8)
In translation phase 6 ([[lex.phases]](lex.phases "5.2Phases of translation")),
adjacent [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")*s* are concatenated.
The lexical structure and grouping of
the contents of the individual [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")*s* is retained.
[*Example [2](#example-2)*:
`"\xA" "B"` represents
the code unit `'\xA'` and the character `'B'` after concatenation
(and not the single code unit `'\xAB'`).
Similarly, `R"(\u00)" "41"` represents six characters,
starting with a backslash and ending with the digit 1 (and not the single character `'A'` specified by a [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2Universal character names[lex.universal.char]")).
Table [13](#tab:lex.string.concat "Table 13: String literal concatenations") has some examples of valid concatenations.
— *end example*]
Table [13](#tab:lex.string.concat): String literal concatenations [[tab:lex.string.concat]](./tab:lex.string.concat)

| Source | | Means | Source | | Means | Source | | Means |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
| `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
| `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
[9](#9)
Evaluating a [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]") results in a string literal object
with static storage duration ([[basic.stc]](basic.stc "6.8.6Storage duration")).
[*Note [4](#note-4)*:
String literal objects are potentially non-unique ([[intro.object]](intro.object "6.8.2Object model")).
Whether successive evaluations of a [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]") yield the same or a different object is
unspecified.
— *end note*]
[*Note [5](#note-5)*:
The effect of attempting to modify a string literal object is undefined.
— *end note*]
[10](#10)
String literal objects are initialized with
the sequence of code unit values
corresponding to the [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")'s sequence of [*s-char*](#nt:s-char "5.13.5String literals[lex.string]")*s* (originally from non-raw string literals) and [*r-char*](#nt:r-char "5.13.5String literals[lex.string]")*s* (originally from raw string literals),
plus a terminating U+0000 null character,
in order as follows:
- [(10.1)](#10.1)
The sequence of characters denoted by each contiguous sequence of [*basic-s-char*](#nt:basic-s-char "5.13.5String literals[lex.string]")*s*, [*r-char*](#nt:r-char "5.13.5String literals[lex.string]")*s*, [*simple-escape-sequence*](lex.ccon#nt:simple-escape-sequence "5.13.3Character literals[lex.ccon]")*s* ([[lex.ccon]](lex.ccon "5.13.3Character literals")), and [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2Universal character names[lex.universal.char]")*s* ([[lex.charset]](lex.charset "5.3.1Character sets"))
is encoded to a code unit sequence
using the [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")'s associated character encoding.
If a character lacks representation in the associated character encoding,
then the program is ill-formed.
[*Note [6](#note-6)*:
No character lacks representation in any Unicode encoding form.
— *end note*]
When encoding a stateful character encoding,
implementations should encode the first such sequence
beginning with the initial encoding state and
encode subsequent sequences
beginning with the final encoding state of the prior sequence.
[*Note [7](#note-7)*:
The encoded code unit sequence can differ from
the sequence of code units that would be obtained by
encoding each character independently.
— *end note*]
- [(10.2)](#10.2)
Each [*numeric-escape-sequence*](lex.ccon#nt:numeric-escape-sequence "5.13.3Character literals[lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3Character literals"))
contributes a single code unit with a value as follows:
* [(10.2.1)](#10.2.1)
Let *v* be the integer value represented by
the octal number comprising
the sequence of [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]")*s* in
an [*octal-escape-sequence*](lex.ccon#nt:octal-escape-sequence "5.13.3Character literals[lex.ccon]") or by
the hexadecimal number comprising
the sequence of [*hexadecimal-digit*](lex.icon#nt:hexadecimal-digit "5.13.2Integer literals[lex.icon]")*s* in
a [*hexadecimal-escape-sequence*](lex.ccon#nt:hexadecimal-escape-sequence "5.13.3Character literals[lex.ccon]").
* [(10.2.2)](#10.2.2)
If *v* does not exceed the range of representable values of
the [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")'s array element type,
then the value is *v*.
* [(10.2.3)](#10.2.3)
Otherwise,
if the [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")'s [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3Character literals[lex.ccon]") is absent or `L`, and *v* does not exceed the range of representable values of
the corresponding unsigned type for the underlying type of
the [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")'s array element type,
then the value is the unique value of
the [*string-literal*](#nt:string-literal "5.13.5String literals[lex.string]")'s array element type *T* that is congruent to *v* modulo 2<sup>*N*</sup>, where *N* is the width of *T*.
* [(10.2.4)](#10.2.4)
Otherwise, the program is ill-formed.
When encoding a stateful character encoding,
these sequences should have no effect on encoding state.
- [(10.3)](#10.3)
Each [*conditional-escape-sequence*](lex.ccon#nt:conditional-escape-sequence "5.13.3Character literals[lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3Character literals"))
contributes an implementation-defined
code unit sequence.
When encoding a stateful character encoding,
it is implementation-defined
what effect these sequences have on encoding state.