Init

2025-10-25 03:02:53 +03:00
commit 043225d523
3416 changed files with 681196 additions and 0 deletions
--- a/cppdraft/lex/string.md
+++ b/cppdraft/lex/string.md
@@ -0,0 +1,273 @@
+[lex.string]
+
+# 5 Lexical conventions [[lex]](./#lex)
+
+## 5.13 Literals [[lex.literal]](lex.literal#lex.string)
+
+### 5.13.5 String literals [lex.string]
+
+[string-literal:](#nt:string-literal "5.13.5 String literals [lex.string]")  
+[*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")<sub>*opt*</sub> " [*s-char-sequence*](#nt:s-char-sequence "5.13.5 String literals [lex.string]")<sub>*opt*</sub> "  
+[*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")<sub>*opt*</sub> R [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]")
+
+[s-char-sequence:](#nt:s-char-sequence "5.13.5 String literals [lex.string]")  
+[*s-char*](#nt:s-char "5.13.5 String literals [lex.string]") [*s-char-sequence*](#nt:s-char-sequence "5.13.5 String literals [lex.string]")<sub>*opt*</sub>
+
+[s-char:](#nt:s-char "5.13.5 String literals [lex.string]")  
+[*basic-s-char*](#nt:basic-s-char "5.13.5 String literals [lex.string]")  
+[*escape-sequence*](lex.ccon#nt:escape-sequence "5.13.3 Character literals [lex.ccon]")  
+[*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]")
+
+[basic-s-char:](#nt:basic-s-char "5.13.5 String literals [lex.string]")  
+any member of the translation character set except the U+0022 quotation mark,  
+ U+005c reverse solidus, or new-line character
+
+[raw-string:](#nt:raw-string "5.13.5 String literals [lex.string]")  
+" [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")<sub>*opt*</sub> ( [*r-char-sequence*](#nt:r-char-sequence "5.13.5 String literals [lex.string]")<sub>*opt*</sub> ) [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")<sub>*opt*</sub> "
+
+[r-char-sequence:](#nt:r-char-sequence "5.13.5 String literals [lex.string]")  
+[*r-char*](#nt:r-char "5.13.5 String literals [lex.string]") [*r-char-sequence*](#nt:r-char-sequence "5.13.5 String literals [lex.string]")<sub>*opt*</sub>
+
+[r-char:](#nt:r-char "5.13.5 String literals [lex.string]")  
+any member of the translation character set, except a U+0029 right parenthesis followed by  
+ the initial [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") (which may be empty) followed by a U+0022 quotation mark
+
+[d-char-sequence:](#nt:d-char-sequence "5.13.5 String literals [lex.string]")  
+[*d-char*](#nt:d-char "5.13.5 String literals [lex.string]") [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")<sub>*opt*</sub>
+
+[d-char:](#nt:d-char "5.13.5 String literals [lex.string]")  
+any member of the basic character set except:  
+ U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,  
+ U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line
+
+[1](#1)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1850)
+
+The kind of a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]"),
+its type, and
+its associated character encoding ([[lex.charset]](lex.charset "5.3.1 Character sets"))
+are determined by its encoding prefix and sequence of [*s-char*](#nt:s-char "5.13.5 String literals [lex.string]")*s* or [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s* as defined by Table [12](#tab:lex.string.literal "Table 12: String literals"), where *n* is the number of encoded code units
+that would result from an evaluation of the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") (see below)[.](#1.sentence-1)
+
+Table [12](#tab:lex.string.literal) — String literals [[tab:lex.string.literal]](./tab:lex.string.literal)  
+
+| [🔗](#tab:lex.string.literal-row-1)<br>**Encoding prefix** | **Kind** | **Type** | **Associated character encoding** | **Examples** |
+| --- | --- | --- | --- | --- |
+| [🔗](#tab:lex.string.literal-row-4)<br>none | [*ordinary string literal*](#def:literal,string,ordinary "5.13.5 String literals [lex.string]") | array of *n* const char | ordinary literal encoding | "ordinary string"<br>R"(ordinary raw string)" |
+| [🔗](#tab:lex.string.literal-row-5)<br>L | [*wide string literal*](#def:literal,string,wide "5.13.5 String literals [lex.string]") | array of *n* const wchar_t | wide literal encoding | L"wide string"<br>LR"w(wide raw string)w" |
+| [🔗](#tab:lex.string.literal-row-6)<br>u8 | [*UTF-8 string literal*](#def:literal,string,UTF-8 "5.13.5 String literals [lex.string]") | array of *n* const char8_t | UTF-8 | u8"UTF-8 string"<br>u8R"x(UTF-8 raw string)x" |
+| [🔗](#tab:lex.string.literal-row-7)<br>u | [*UTF-16 string literal*](#def:literal,string,UTF-16 "5.13.5 String literals [lex.string]") | array of *n* const char16_t | UTF-16 | u"UTF-16 string"<br>uR"y(UTF-16 raw string)y" |
+| [🔗](#tab:lex.string.literal-row-8)<br>U | [*UTF-32 string literal*](#def:literal,string,UTF-32 "5.13.5 String literals [lex.string]") | array of *n* const char32_t | UTF-32 | U"UTF-32 string"<br>UR"z(UTF-32 raw string)z" |
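+
+The rows of Table 12 can be checked at compile time. The following is a minimal sketch, not part of the normative text. It assumes C++17 or later for `std::is_same_v`, and it guards the `u8` check with the `__cpp_char8_t` feature-test macro because `char8_t` only exists from C++20 on. The names `literal_array_t` and `element_count` are hypothetical helpers introduced for illustration.
+
```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// decltype of a string literal (an lvalue) is a reference to its array type;
// removing the reference exposes the "array of n const T" from Table 12.
template <class Literal>
using literal_array_t = std::remove_reference_t<Literal>;

static_assert(std::is_same_v<literal_array_t<decltype("abc")>, const char[4]>);
static_assert(std::is_same_v<literal_array_t<decltype(L"abc")>, const wchar_t[4]>);
static_assert(std::is_same_v<literal_array_t<decltype(u"abc")>, const char16_t[4]>);
static_assert(std::is_same_v<literal_array_t<decltype(U"abc")>, const char32_t[4]>);
#ifdef __cpp_char8_t
static_assert(std::is_same_v<literal_array_t<decltype(u8"abc")>, const char8_t[4]>);
#endif

// Hypothetical helper: number of array elements, including the terminating null.
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// n counts encoded code units, not characters.
static_assert(element_count(u8"\u00e9") == 3);     // two UTF-8 code units + null
static_assert(element_count(u"\U0001F600") == 3);  // surrogate pair + null
static_assert(element_count(U"\U0001F600") == 2);  // one UTF-32 code unit + null
```
+
+If the translation unit compiles, every assertion above holds for the implementation in use.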
+
+[2](#2)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1909)
+
+A [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") that has an R in the prefix is a [*raw string literal*](#def:raw_string_literal "5.13.5 String literals [lex.string]")[.](#2.sentence-1)
+
+The [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") serves as a delimiter[.](#2.sentence-2)
+
+The terminating [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") of a [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]") is the same sequence of
+characters as the initial [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")[.](#2.sentence-3)
+
+A [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") shall consist of at most 16 characters[.](#2.sentence-4)
+
+[3](#3)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1919)
+
+[*Note [1](#note-1)*:
+
+The characters '(' and ')' can appear in a [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]")[.](#3.sentence-1)
+
+Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)"[.](#3.sentence-2)
+
+— *end note*]
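+
+The delimiter rules can be exercised with a few compile-time comparisons. This is a sketch under stated assumptions: it relies on C++17 `std::string_view` and its `sv` literal suffix so that the comparisons are constant expressions, and `alternation_pattern` is a hypothetical function name used only for illustration.
+
```cpp
#include <cassert>
#include <string_view>

using namespace std::string_view_literals;

// Parentheses in the body are ordinary r-chars; only a right parenthesis
// followed by the delimiter and a quotation mark ends the literal.
static_assert(R"delimiter((a|b))delimiter"sv == "(a|b)"sv);

// A body that itself contains a right parenthesis followed by a quotation
// mark needs a non-empty delimiter, here x.
static_assert(R"x(f(")"))x"sv == "f(\")\")"sv);

// Escape sequences are not recognized in a raw string: the body below is a
// backslash followed by the letter n, not a new-line.
static_assert(R"(\n)"sv == "\\n"sv);

// Hypothetical helper returning a pattern written as a raw string.
constexpr std::string_view alternation_pattern() {
    return R"re((a|b))re"sv;
}
```
+
+Choosing a delimiter that cannot occur in the body, such as `re` above, avoids having to inspect the body for an early terminator.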
+
+[4](#4)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1926)
+
+[*Note [2](#note-2)*:
+
+A source-file new-line in a raw string literal results in a new-line in the
+resulting execution string literal[.](#4.sentence-1)
+
+Assuming no
+whitespace at the beginning of lines in the following example, the assert will succeed:
+
+    const char* p = R"(a\
+    b
+    c)";
+    assert(std::strcmp(p, "a\\\nb\nc") == 0);
+
+— *end note*]
+
+[5](#5)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1939)
+
+[*Example [1](#example-1)*:
+
+The raw string
+
+    R"a(
+    )\
+    a"
+    )a"
+
+is equivalent to `"\n)\\\na\"\n"`[.](#5.sentence-1)
+
+The raw string `R"(x = "\"y\"")"` is equivalent to `"x = \"\\\"y\\\"\""`[.](#5.sentence-2)
+
+— *end example*]
+
+[6](#6)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1955)
+
+Ordinary string literals and UTF-8 string literals are
+also referred to as [*narrow string literals*](#def:literal,string,narrow "5.13.5 String literals [lex.string]")[.](#6.sentence-1)
+
+[7](#7)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1960)
+
+The [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* in
+any sequence of adjacent [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* shall have at most one unique [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") among them[.](#7.sentence-1)
+
+The common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") of the sequence is
+that [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]"), if any[.](#7.sentence-2)
+
+[*Note [3](#note-3)*:
+
+A [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s rawness has
+no effect on the determination of the common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")[.](#7.sentence-3)
+
+— *end note*]
+
+[8](#8)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1972)
+
+In translation phase 6 ([[lex.phases]](lex.phases "5.2 Phases of translation")),
+adjacent [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* are concatenated[.](#8.sentence-1)
+
+The lexical structure and grouping of
+the contents of the individual [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* is retained[.](#8.sentence-2)
+
+[*Example [2](#example-2)*:
+
+"\xA" "B" represents
+the code unit '\xA' and the character 'B' after concatenation
+(and not the single code unit '\xAB')[.](#8.sentence-3)
+
+Similarly, `R"(\u00)" "41"` represents six characters,
+starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]"))[.](#8.sentence-4)
+
+Table [13](#tab:lex.string.concat "Table 13: String literal concatenations") has some examples of valid concatenations[.](#8.sentence-5)
+
+— *end example*]
+
+Table [13](#tab:lex.string.concat) — String literal concatenations [[tab:lex.string.concat]](./tab:lex.string.concat)  
+
+| [🔗](#tab:lex.string.concat-row-1)<br>Source | | Means | Source | | Means | Source | | Means |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| [🔗](#tab:lex.string.concat-row-2)<br>u"a" | u"b" | u"ab" | U"a" | U"b" | U"ab" | L"a" | L"b" | L"ab" |
+| [🔗](#tab:lex.string.concat-row-3)<br>u"a" | "b" | u"ab" | U"a" | "b" | U"ab" | L"a" | "b" | L"ab" |
+| [🔗](#tab:lex.string.concat-row-4)<br>"a" | u"b" | u"ab" | "a" | U"b" | U"ab" | "a" | L"b" | L"ab" |
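+
+The concatenation rules of paragraphs 7 and 8 can be observed directly. The following sketch assumes C++17 or later; `literal_length` is a hypothetical helper that reports the number of array elements, including the terminating null.
+
```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: element count of a literal's array, including the null.
template <class T, std::size_t N>
constexpr std::size_t literal_length(const T (&)[N]) { return N; }

// An unprefixed literal adopts the common encoding-prefix of the sequence.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" L"b"), const wchar_t (&)[3]>);

// Grouping is retained: the hexadecimal escape ends at the literal boundary,
// so this is the code unit 0x0A followed by 'B', not the single unit 0xAB.
static_assert(literal_length("\xA" "B") == 3);
static_assert(("\xA" "B")[0] == '\xA' && ("\xA" "B")[1] == 'B');

// The raw piece keeps its backslash: six characters plus the null.
static_assert(literal_length(R"(\u00)" "41") == 7);
static_assert((R"(\u00)" "41")[0] == '\\' && (R"(\u00)" "41")[5] == '1');
```
+
+A sequence that mixes two different prefixes, such as a `u` literal next to a `U` literal, violates paragraph 7 and is therefore left out of the sketch.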
+
+[9](#9)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L2017)
+
+Evaluating a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") results in a string literal object
+with static storage duration ([[basic.stc]](basic.stc "6.8.6 Storage duration"))[.](#9.sentence-1)
+
+[*Note [4](#note-4)*:
+
+String literal objects are potentially non-unique ([[intro.object]](intro.object "6.8.2 Object model"))[.](#9.sentence-2)
+
+Whether successive evaluations of a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") yield the same or a different object is
+unspecified[.](#9.sentence-3)
+
+— *end note*]
+
+[*Note [5](#note-5)*:
+
+The effect of attempting to modify a string literal object is undefined[.](#9.sentence-4)
+
+— *end note*]
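+
+The practical consequences of paragraph 9 can be sketched as follows. The function names `greeting`, `same_text`, and `first_upper` are hypothetical and exist only for this illustration.
+
```cpp
#include <cassert>
#include <cstring>

// Returning a pointer to a string literal object is safe: the object has
// static storage duration and outlives the call.
const char* greeting() { return "hello"; }

// Whether two evaluations yield the same object is unspecified, so compare
// contents rather than addresses.
bool same_text(const char* a, const char* b) { return std::strcmp(a, b) == 0; }

// To obtain modifiable storage, copy the literal into an array. Writing
// through a pointer to the string literal object itself would be undefined.
char first_upper() {
    char copy[] = "hello";  // array initialized from the literal
    copy[0] = 'H';
    return copy[0];
}
```
+
+A comparison such as `greeting() == greeting()` may yield either result, which is why `same_text` compares the characters instead.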
+
+[10](#10)
+
+[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L2031)
+
+String literal objects are initialized with
+the sequence of code unit values
+corresponding to the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s sequence of [*s-char*](#nt:s-char "5.13.5 String literals [lex.string]")*s* (originally from non-raw string literals) and [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s* (originally from raw string literals),
+plus a terminating U+0000 null character,
+in order as follows:
+
+- [(10.1)](#10.1)
+
+  The sequence of characters denoted by each contiguous sequence of [*basic-s-char*](#nt:basic-s-char "5.13.5 String literals [lex.string]")*s*, [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s*, [*simple-escape-sequence*](lex.ccon#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]")*s* ([[lex.ccon]](lex.ccon "5.13.3 Character literals")), and [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]")*s* ([[lex.charset]](lex.charset "5.3.1 Character sets"))
+is encoded to a code unit sequence
+using the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s associated character encoding[.](#10.1.sentence-1)
+  If a character lacks representation in the associated character encoding,
+then the program is ill-formed[.](#10.1.sentence-2)
+  [*Note [6](#note-6)*:
+  No character lacks representation in any Unicode encoding form[.](#10.1.sentence-3)
 — *end note*]
+   When encoding a stateful character encoding,
+implementations should encode the first such sequence
+beginning with the initial encoding state and
+encode subsequent sequences
+beginning with the final encoding state of the prior sequence[.](#10.1.sentence-4)
+  [*Note [7](#note-7)*:
+  The encoded code unit sequence can differ from
+the sequence of code units that would be obtained by
+encoding each character independently[.](#10.1.sentence-5)
 — *end note*]
+
+- [(10.2)](#10.2)
+
+  Each [*numeric-escape-sequence*](lex.ccon#nt:numeric-escape-sequence "5.13.3 Character literals [lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3 Character literals"))
+contributes a single code unit with a value as follows:
+  * [(10.2.1)](#10.2.1)
+
+      Let v be the integer value represented by
+the octal number comprising
+the sequence of [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]")*s* in
+an [*octal-escape-sequence*](lex.ccon#nt:octal-escape-sequence "5.13.3 Character literals [lex.ccon]") or by
+the hexadecimal number comprising
+the sequence of [*hexadecimal-digit*](lex.icon#nt:hexadecimal-digit "5.13.2 Integer literals [lex.icon]")*s* in
+a [*hexadecimal-escape-sequence*](lex.ccon#nt:hexadecimal-escape-sequence "5.13.3 Character literals [lex.ccon]")[.](#10.2.1.sentence-1)
+
+  * [(10.2.2)](#10.2.2)
+
+      If v does not exceed the range of representable values of
+the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type,
+then the value is v[.](#10.2.2.sentence-1)
+
+  * [(10.2.3)](#10.2.3)
+
+      Otherwise,
if the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") is absent or L, and v does not exceed the range of representable values of
+the corresponding unsigned type for the underlying type of
+the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type,
+then the value is the unique value of
the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type T that is congruent to v modulo 2<sup>N</sup>, where N is the width of T[.](#10.2.3.sentence-1)
+
+  * [(10.2.4)](#10.2.4)
+
+      Otherwise, the program is ill-formed[.](#10.2.4.sentence-1)
+
+   When encoding a stateful character encoding,
+these sequences should have no effect on encoding state[.](#10.2.sentence-2)
+
+- [(10.3)](#10.3)
+
+  Each [*conditional-escape-sequence*](lex.ccon#nt:conditional-escape-sequence "5.13.3 Character literals [lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3 Character literals"))
contributes an implementation-defined
code unit sequence[.](#10.3.sentence-1)
  When encoding a stateful character encoding,
it is implementation-defined
+what effect these sequences have on encoding state[.](#10.3.sentence-2)
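+
+The value rules for a numeric-escape-sequence in (10.2.1) to (10.2.4) can be modeled in a few lines. This is a sketch under stated assumptions, not how any compiler implements them: `numeric_escape_value` and its `prefix_absent_or_L` parameter are hypothetical names, an 8-bit `char` is assumed, and the narrowing conversion used for the wrap-around case is only guaranteed to be modular from C++20 on (earlier standards leave it implementation-defined).
+
```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <type_traits>

// The array always carries a terminating null code unit.
static_assert(sizeof("abc") == 4);
// The octal escape for zero contributes one code unit with value 0.
static_assert(sizeof("a\0b") == 4);
// The hexadecimal escape below has v = 255; whether char is signed or not,
// the stored code unit is congruent to 255 modulo 2^8 (8-bit char assumed).
static_assert(static_cast<unsigned char>("\xff"[0]) == 0xff);

// Hypothetical model of (10.2.2) to (10.2.4) for array element type T.
// Returns the code unit, or nullopt where the program would be ill-formed.
template <class T>
constexpr std::optional<T> numeric_escape_value(unsigned long long v,
                                                bool prefix_absent_or_L) {
    using U = std::make_unsigned_t<T>;
    if (v <= static_cast<unsigned long long>(std::numeric_limits<T>::max()))
        return static_cast<T>(v);                  // (10.2.2) v is representable
    if (prefix_absent_or_L &&
        v <= static_cast<unsigned long long>(std::numeric_limits<U>::max()))
        return static_cast<T>(static_cast<U>(v));  // (10.2.3) wrap modulo 2^N
    return std::nullopt;                           // (10.2.4) ill-formed
}
```
+
+For example, `numeric_escape_value<signed char>(0xff, true)` takes the (10.2.3) branch, while a value of 0x10000 for `char16_t` exceeds the element type's range and falls through to the ill-formed case.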