Init
cppdraft/lex/string.md
@@ -0,0 +1,273 @@
[lex.string]

# 5 Lexical conventions [[lex]](./#lex)

## 5.13 Literals [[lex.literal]](lex.literal#lex.string)

### 5.13.5 String literals [lex.string]

[string-literal:](#nt:string-literal "5.13.5 String literals [lex.string]")
    [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")<sub>opt</sub> " [*s-char-sequence*](#nt:s-char-sequence "5.13.5 String literals [lex.string]")<sub>opt</sub> "
    [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")<sub>opt</sub> R [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]")

[s-char-sequence:](#nt:s-char-sequence "5.13.5 String literals [lex.string]")
    [*s-char*](#nt:s-char "5.13.5 String literals [lex.string]") [*s-char-sequence*](#nt:s-char-sequence "5.13.5 String literals [lex.string]")<sub>opt</sub>

[s-char:](#nt:s-char "5.13.5 String literals [lex.string]")
    [*basic-s-char*](#nt:basic-s-char "5.13.5 String literals [lex.string]")
    [*escape-sequence*](lex.ccon#nt:escape-sequence "5.13.3 Character literals [lex.ccon]")
    [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]")

[basic-s-char:](#nt:basic-s-char "5.13.5 String literals [lex.string]")
    any member of the translation character set except the U+0022 quotation mark, U+005c reverse solidus, or new-line character

[raw-string:](#nt:raw-string "5.13.5 String literals [lex.string]")
    " [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")<sub>opt</sub> ( [*r-char-sequence*](#nt:r-char-sequence "5.13.5 String literals [lex.string]")<sub>opt</sub> ) [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")<sub>opt</sub> "

[r-char-sequence:](#nt:r-char-sequence "5.13.5 String literals [lex.string]")
    [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]") [*r-char-sequence*](#nt:r-char-sequence "5.13.5 String literals [lex.string]")<sub>opt</sub>

[r-char:](#nt:r-char "5.13.5 String literals [lex.string]")
    any member of the translation character set, except a U+0029 right parenthesis followed by the initial [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") (which may be empty) followed by a U+0022 quotation mark

[d-char-sequence:](#nt:d-char-sequence "5.13.5 String literals [lex.string]")
    [*d-char*](#nt:d-char "5.13.5 String literals [lex.string]") [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")<sub>opt</sub>

[d-char:](#nt:d-char "5.13.5 String literals [lex.string]")
    any member of the basic character set except: U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus, U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

[1](#1)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1850)

The kind of a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]"),
its type, and
its associated character encoding ([[lex.charset]](lex.charset "5.3.1 Character sets"))
are determined by its encoding prefix and sequence of [*s-char*](#nt:s-char "5.13.5 String literals [lex.string]")*s* or [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s* as defined by Table [12](#tab:lex.string.literal "Table 12: String literals") where n is the number of encoded code units
that would result from an evaluation of the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") (see below)[.](#1.sentence-1)

Table [12](#tab:lex.string.literal) — String literals [[tab:lex.string.literal]](./tab:lex.string.literal)

| **Encoding prefix** | **Kind** | **Type** | **Associated character encoding** | **Examples** |
| --- | --- | --- | --- | --- |
| none | [*ordinary string literal*](#def:literal,string,ordinary "5.13.5 String literals [lex.string]") | array of n const char | ordinary literal encoding | "ordinary string"<br>R"(ordinary raw string)" |
| L | [*wide string literal*](#def:literal,string,wide "5.13.5 String literals [lex.string]") | array of n const wchar_t | wide literal encoding | L"wide string"<br>LR"w(wide raw string)w" |
| u8 | [*UTF-8 string literal*](#def:literal,string,UTF-8 "5.13.5 String literals [lex.string]") | array of n const char8_t | UTF-8 | u8"UTF-8 string"<br>u8R"x(UTF-8 raw string)x" |
| u | [*UTF-16 string literal*](#def:literal,string,UTF-16 "5.13.5 String literals [lex.string]") | array of n const char16_t | UTF-16 | u"UTF-16 string"<br>uR"y(UTF-16 raw string)y" |
| U | [*UTF-32 string literal*](#def:literal,string,UTF-32 "5.13.5 String literals [lex.string]") | array of n const char32_t | UTF-32 | U"UTF-32 string"<br>UR"z(UTF-32 raw string)z" |

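
The rows of Table 12 can be checked at compile time. The following is a minimal sketch rather than part of the specification. It assumes a C++17 or later compiler, and `array_bound` is a hypothetical helper introduced only for illustration. The `char8_t` check is guarded because that type is only available from C++20 on.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to
// "array of n const T", where n counts the terminating null as well.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);
#ifdef __cpp_char8_t
static_assert(std::is_same_v<decltype(u8"abc"), const char8_t (&)[4]>);
#endif

// The R in the prefix does not change the type, only how the contents are read.
static_assert(std::is_same_v<decltype(LR"w(abc)w"), const wchar_t (&)[4]>);

// Hypothetical helper: returns the array bound n of a string literal.
template <typename CharT, std::size_t N>
constexpr std::size_t array_bound(const CharT (&)[N]) {
    return N;
}
```

For instance, `array_bound(U"")` is 1, because even an empty literal holds the terminating null code unit.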
[2](#2)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1909)

A [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") that has an R in the prefix is a [*raw string literal*](#def:raw_string_literal "5.13.5 String literals [lex.string]")[.](#2.sentence-1)

The [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") serves as a delimiter[.](#2.sentence-2)

The terminating [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") of a [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]") is the same sequence of
characters as the initial [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")[.](#2.sentence-3)

A [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") shall consist of at most 16 characters[.](#2.sentence-4)

[3](#3)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1919)

[*Note [1](#note-1)*:

The characters '(' and ')' can appear in a [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]")[.](#3.sentence-1)

Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)"[.](#3.sentence-2)

— *end note*]

[4](#4)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1926)

[*Note [2](#note-2)*:

A source-file new-line in a raw string literal results in a new-line in the
resulting execution string literal[.](#4.sentence-1)

Assuming no
whitespace at the beginning of lines in the following example, the assert will succeed:

const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

— *end note*]

[5](#5)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1939)

[*Example [1](#example-1)*:

The raw string

R"a(
)\
a"
)a"

is equivalent to "\n)\\\na\"\n"[.](#5.sentence-1)

The raw string R"(x = "\"y\"")" is equivalent to "x = \"\\\"y\\\"\""[.](#5.sentence-2)

— *end example*]

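
The equivalences stated in Notes 1 and 2 and in Example 1 can be expressed as compile-time checks. This is an illustrative sketch, assuming C++17 (for `std::string_view` and message-less `static_assert`). The variable names are hypothetical. The lines of the multi-line raw string must start in column one, as Note 2 requires.

```cpp
#include <cassert>
#include <string_view>

// Note 1: with a delimiter, parentheses inside the raw string need no escaping.
constexpr std::string_view with_delim = R"delimiter((a|b))delimiter";
static_assert(with_delim == "(a|b)");

// Example 1: backslashes and quotation marks are taken literally.
constexpr std::string_view quoted = R"(x = "\"y\"")";
static_assert(quoted == "x = \"\\\"y\\\"\"");

// Note 2: the backslash stays in the string, and each source-file new-line
// becomes a new-line character.
constexpr std::string_view multi = R"(a\
b
c)";
static_assert(multi == "a\\\nb\nc");
```

The last check relies on the raw string containing six characters: `a`, a backslash, a new-line, `b`, a new-line, and `c`.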
[6](#6)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1955)

Ordinary string literals and UTF-8 string literals are
also referred to as [*narrow string literals*](#def:literal,string,narrow "5.13.5 String literals [lex.string]")[.](#6.sentence-1)

[7](#7)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1960)

The [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* in
any sequence of adjacent [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* shall have at most one unique [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") among them[.](#7.sentence-1)

The common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") of the sequence is
that [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]"), if any[.](#7.sentence-2)

[*Note [3](#note-3)*:

A [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s rawness has
no effect on the determination of the common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")[.](#7.sentence-3)

— *end note*]

[8](#8)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L1972)

In translation phase 6 ([[lex.phases]](lex.phases "5.2 Phases of translation")),
adjacent [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* are concatenated[.](#8.sentence-1)

The lexical structure and grouping of
the contents of the individual [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* is retained[.](#8.sentence-2)

[*Example [2](#example-2)*:

"\xA" "B" represents
the code unit '\xA' and the character 'B' after concatenation
(and not the single code unit '\xAB')[.](#8.sentence-3)

Similarly, R"(\u00)" "41" represents six characters,
starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]"))[.](#8.sentence-4)

Table [13](#tab:lex.string.concat "Table 13: String literal concatenations") has some examples of valid concatenations[.](#8.sentence-5)

— *end example*]

Table [13](#tab:lex.string.concat) — String literal concatenations [[tab:lex.string.concat]](./tab:lex.string.concat)

| Source | | Means | Source | | Means | Source | | Means |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| u"a" | u"b" | u"ab" | U"a" | U"b" | U"ab" | L"a" | L"b" | L"ab" |
| u"a" | "b" | u"ab" | U"a" | "b" | U"ab" | L"a" | "b" | L"ab" |
| "a" | u"b" | u"ab" | "a" | U"b" | U"ab" | "a" | L"b" | L"ab" |

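
The concatenation rules of Example 2 and Table 13 can be observed through the size and type of the resulting array. The sketch below is illustrative only and assumes C++17. The names `joined`, `not_ucn`, and `joined_length` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// "\xA" "B": the escape sequence ends at the literal boundary, so the result
// holds '\xA', 'B', and the terminating null (not the single unit '\xAB').
constexpr char joined[] = "\xA" "B";
static_assert(sizeof(joined) == 3);
static_assert(joined[0] == '\xA' && joined[1] == 'B' && joined[2] == '\0');

// R"(\u00)" "41": six characters plus the terminating null; no
// universal-character-name is formed across the boundary.
constexpr char not_ucn[] = R"(\u00)" "41";
static_assert(sizeof(not_ucn) == 7);
static_assert(not_ucn[0] == '\\' && not_ucn[5] == '1');

// Table 13: the common encoding-prefix determines the type of the whole sequence.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype(L"a" "b"), const wchar_t (&)[3]>);

// Number of code units in `joined`, excluding the terminating null.
constexpr std::size_t joined_length() {
    return sizeof(joined) - 1;
}
```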
[9](#9)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L2017)

Evaluating a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") results in a string literal object
with static storage duration ([[basic.stc]](basic.stc "6.8.6 Storage duration"))[.](#9.sentence-1)

[*Note [4](#note-4)*:

String literal objects are potentially non-unique ([[intro.object]](intro.object "6.8.2 Object model"))[.](#9.sentence-2)

Whether successive evaluations of a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") yield the same or a different object is
unspecified[.](#9.sentence-3)

— *end note*]

[*Note [5](#note-5)*:

The effect of attempting to modify a string literal object is undefined[.](#9.sentence-4)

— *end note*]

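
A practical consequence of static storage duration is that a pointer to a string literal object remains valid after the function that evaluated the literal returns. The sketch below illustrates this and shows one way to obtain modifiable text without touching the literal object itself. The function names are hypothetical, and the code deliberately avoids comparing addresses of literals, since Note 4 leaves that result unspecified.

```cpp
#include <cassert>
#include <cstring>

// Valid: the string literal object outlives the call, so the returned
// pointer does not dangle.
const char* greeting() {
    return "hello";
}

// Modifying the literal object would be undefined (Note 5). Initializing an
// array from the literal creates a separate, modifiable copy instead.
char capitalized_first() {
    char copy[] = "hello";
    copy[0] = 'H';
    return copy[0];
}
```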
[10](#10)

[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L2031)

String literal objects are initialized with
the sequence of code unit values
corresponding to the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s sequence of [*s-char*](#nt:s-char "5.13.5 String literals [lex.string]")*s* (originally from non-raw string literals) and [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s* (originally from raw string literals),
plus a terminating U+0000 null character,
in order as follows:

- [(10.1)](#10.1)

  The sequence of characters denoted by each contiguous sequence of [*basic-s-char*](#nt:basic-s-char "5.13.5 String literals [lex.string]")*s*, [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s*, [*simple-escape-sequence*](lex.ccon#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]")*s* ([[lex.ccon]](lex.ccon "5.13.3 Character literals")), and [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]")*s* ([[lex.charset]](lex.charset "5.3.1 Character sets"))
  is encoded to a code unit sequence
  using the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s associated character encoding[.](#10.1.sentence-1)
  If a character lacks representation in the associated character encoding,
  then the program is ill-formed[.](#10.1.sentence-2)
  [*Note [6](#note-6)*:
  No character lacks representation in any Unicode encoding form[.](#10.1.sentence-3)
  — *end note*]
  When encoding a stateful character encoding,
  implementations should encode the first such sequence
  beginning with the initial encoding state and
  encode subsequent sequences
  beginning with the final encoding state of the prior sequence[.](#10.1.sentence-4)
  [*Note [7](#note-7)*:
  The encoded code unit sequence can differ from
  the sequence of code units that would be obtained by
  encoding each character independently[.](#10.1.sentence-5)
  — *end note*]

- [(10.2)](#10.2)

  Each [*numeric-escape-sequence*](lex.ccon#nt:numeric-escape-sequence "5.13.3 Character literals [lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3 Character literals"))
  contributes a single code unit with a value as follows:

  * [(10.2.1)](#10.2.1)

    Let v be the integer value represented by
    the octal number comprising
    the sequence of [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]")*s* in
    an [*octal-escape-sequence*](lex.ccon#nt:octal-escape-sequence "5.13.3 Character literals [lex.ccon]") or by
    the hexadecimal number comprising
    the sequence of [*hexadecimal-digit*](lex.icon#nt:hexadecimal-digit "5.13.2 Integer literals [lex.icon]")*s* in
    a [*hexadecimal-escape-sequence*](lex.ccon#nt:hexadecimal-escape-sequence "5.13.3 Character literals [lex.ccon]")[.](#10.2.1.sentence-1)

  * [(10.2.2)](#10.2.2)

    If v does not exceed the range of representable values of
    the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type,
    then the value is v[.](#10.2.2.sentence-1)

  * [(10.2.3)](#10.2.3)

    Otherwise,
    if the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") is absent or L, and v does not exceed the range of representable values of
    the corresponding unsigned type for the underlying type of
    the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type,
    then the value is the unique value of
    the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type T that is congruent to v modulo 2<sup>N</sup>, where N is the width of T[.](#10.2.3.sentence-1)

  * [(10.2.4)](#10.2.4)

    Otherwise, the program is ill-formed[.](#10.2.4.sentence-1)

  When encoding a stateful character encoding,
  these sequences should have no effect on encoding state[.](#10.2.sentence-2)

- [(10.3)](#10.3)

  Each [*conditional-escape-sequence*](lex.ccon#nt:conditional-escape-sequence "5.13.3 Character literals [lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3 Character literals"))
  contributes an implementation-defined
  code unit sequence[.](#10.3.sentence-1)
  When encoding a stateful character encoding,
  it is implementation-defined
  what effect these sequences have on encoding state[.](#10.3.sentence-2)

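
The initialization rules above can be illustrated by counting code units. This is a minimal sketch assuming C++17 or later, not a normative example. `code_units` is a hypothetical helper, and the conditional-escape-sequence case (10.3) is left out because its result is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal object,
// excluding the terminating null character.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// Every string literal object ends with a null code unit.
static_assert(sizeof("abc") == 4);
static_assert("abc"[3] == '\0');

// (10.1) A universal-character-name is encoded in the associated encoding:
// U+00E9 takes two UTF-8 code units; U+1F600 takes a surrogate pair in
// UTF-16 and a single code unit in UTF-32.
static_assert(code_units(u8"\u00e9") == 2);
static_assert(code_units(u"\U0001F600") == 2);
static_assert(code_units(U"\U0001F600") == 1);

// (10.2) A numeric-escape-sequence contributes exactly one code unit.
// For "\xff", v is 255; the stored value is congruent to 255 modulo 2^N
// whether plain char is signed or unsigned.
static_assert(code_units("\xff") == 1);
static_assert(static_cast<unsigned char>("\xff"[0]) == 0xFF);
```

The cast to `unsigned char` in the last check keeps the sketch independent of whether plain `char` is signed on a given implementation.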