# 5 Lexical conventions [lex]

## 5.13 Literals [lex.literal]

### 5.13.5 String literals [lex.string]

*string-literal*:
- *encoding-prefix*opt `"` *s-char-sequence*opt `"`
- *encoding-prefix*opt `R` *raw-string*

*s-char-sequence*:
- *s-char* *s-char-sequence*opt

*s-char*:
- *basic-s-char*
- *escape-sequence*
- *universal-character-name*

*basic-s-char*:
- any member of the translation character set except the U+0022 quotation mark, U+005c reverse solidus, or new-line character

*raw-string*:
- `"` *d-char-sequence*opt `(` *r-char-sequence*opt `)` *d-char-sequence*opt `"`

*r-char-sequence*:
- *r-char* *r-char-sequence*opt

*r-char*:
- any member of the translation character set, except a U+0029 right parenthesis followed by the initial *d-char-sequence* (which may be empty) followed by a U+0022 quotation mark

*d-char-sequence*:
- *d-char* *d-char-sequence*opt

*d-char*:
- any member of the basic character set except: U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus, U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

1 The kind of a *string-literal*, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of *s-char*s or *r-char*s as defined by Table 12, where *n* is the number of encoded code units that would result from an evaluation of the *string-literal* (see below).

Table 12: String literals [tab:lex.string.literal]

| Encoding prefix | Kind | Type | Associated character encoding | Examples |
| --- | --- | --- | --- | --- |
| none | *ordinary string literal* | array of *n* `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
| `L` | *wide string literal* | array of *n* `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
| `u8` | *UTF-8 string literal* | array of *n* `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
| `u` | *UTF-16 string literal* | array of *n* `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
| `U` | *UTF-32 string literal* | array of *n* `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |

2 A *string-literal* that has an `R` in the prefix is a *raw string literal*. The *d-char-sequence* serves as a delimiter. The terminating *d-char-sequence* of a *raw-string* is the same sequence of characters as the initial *d-char-sequence*. A *d-char-sequence* shall consist of at most 16 characters.

3 [*Note 1*: The characters `(` and `)` can appear in a *raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to `"(a|b)"`. — *end note*]

4 [*Note 2*: A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

    const char* p = R"(a\
    b
    c)";
    assert(std::strcmp(p, "a\\\nb\nc") == 0);

— *end note*]

5 [*Example 1*: The raw string

    R"a(
    )\
    a"
    )a"

is equivalent to `"\n)\\\na\"\n"`. The raw string `R"(x = "\"y\"")"` is equivalent to `"x = \"\\\"y\\\"\""`. — *end example*]

6 Ordinary string literals and UTF-8 string literals are also referred to as *narrow string literals*.

7 The *string-literal*s in any sequence of adjacent *string-literal*s shall have at most one unique *encoding-prefix* among them. The common *encoding-prefix* of the sequence is that *encoding-prefix*, if any. [*Note 3*: A *string-literal*'s rawness has no effect on the determination of the common *encoding-prefix*. — *end note*]

8 In translation phase 6 ([lex.phases]), adjacent *string-literal*s are concatenated. The lexical structure and grouping of the contents of the individual *string-literal*s is retained. [*Example 2*: `"\xA" "B"` represents the code unit `'\xA'` and the character `'B'` after concatenation (and not the single code unit `'\xAB'`). Similarly, `R"(\u00)" "41"` represents six characters, starting with a backslash and ending with the digit 1 (and not the single character `'A'` specified by a *universal-character-name*). Table 13 has some examples of valid concatenations. — *end example*]

Table 13: String literal concatenations [tab:lex.string.concat]

| Source | | Means | Source | | Means | Source | | Means |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
| `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
| `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |

9 Evaluating a *string-literal* results in a string literal object with static storage duration ([basic.stc]). [*Note 4*: String literal objects are potentially non-unique ([intro.object]). Whether successive evaluations of a *string-literal* yield the same or a different object is unspecified. — *end note*] [*Note 5*: The effect of attempting to modify a string literal object is undefined. — *end note*]

10 String literal objects are initialized with the sequence of code unit values corresponding to the *string-literal*'s sequence of *s-char*s (originally from non-raw string literals) and *r-char*s (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:

- (10.1) The sequence of characters denoted by each contiguous sequence of *basic-s-char*s, *r-char*s, *simple-escape-sequence*s ([lex.ccon]), and *universal-character-name*s ([lex.charset]) is encoded to a code unit sequence using the *string-literal*'s associated character encoding. If a character lacks representation in the associated character encoding, then the program is ill-formed. [*Note 6*: No character lacks representation in any Unicode encoding form. — *end note*] When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence. [*Note 7*: The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently. — *end note*]
- (10.2) Each *numeric-escape-sequence* ([lex.ccon]) contributes a single code unit with a value as follows:
  * (10.2.1) Let *v* be the integer value represented by the octal number comprising the sequence of *octal-digit*s in an *octal-escape-sequence* or by the hexadecimal number comprising the sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
  * (10.2.2) If *v* does not exceed the range of representable values of the *string-literal*'s array element type, then the value is *v*.
  * (10.2.3) Otherwise, if the *string-literal*'s *encoding-prefix* is absent or `L`, and *v* does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the *string-literal*'s array element type, then the value is the unique value of the *string-literal*'s array element type *T* that is congruent to *v* modulo 2<sup>*N*</sup>, where *N* is the width of *T*.
  * (10.2.4) Otherwise, the program is ill-formed.

  When encoding a stateful character encoding, these sequences should have no effect on encoding state.
- (10.3) Each *conditional-escape-sequence* ([lex.ccon]) contributes an implementation-defined code unit sequence. When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.
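The kind, type, raw-string and concatenation rules of paragraphs 1 through 8 (with Tables 12 and 13) can be checked at compile time. What follows is a minimal sketch, not a normative example. It assumes a C++17 or later compiler, and `same_literal` is a hypothetical helper introduced only for this illustration. The `char8_t` check is compiled only when the implementation defines `__cpp_char8_t` (C++20). If any assertion were false, the translation unit would fail to compile.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not a standard library facility): true if two
// character arrays have the same length and the same code units, including
// the terminating null. Usable in constant expressions.
template <class CharT, std::size_t N, std::size_t M>
constexpr bool same_literal(const CharT (&a)[N], const CharT (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

// Table 12: a string literal is an lvalue of type "array of n const T", so
// decltype yields a reference to that array type (n includes the null).
static_assert(std::is_same_v<decltype("ab"), const char (&)[3]>);
static_assert(std::is_same_v<decltype(L"ab"), const wchar_t (&)[3]>);
static_assert(std::is_same_v<decltype(u"ab"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype(U"ab"), const char32_t (&)[3]>);
#if defined(__cpp_char8_t)
// Since C++20 the element type of a UTF-8 string literal is char8_t.
static_assert(std::is_same_v<decltype(u8"ab"), const char8_t (&)[3]>);
#endif

// n counts encoded code units, not characters: U+1F600 is a surrogate pair
// (two code units) in UTF-16 but a single code unit in UTF-32.
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3);
static_assert(sizeof(U"\U0001F600") / sizeof(char32_t) == 2);
static_assert(sizeof(u8"\u00E9") == 3);  // U+00E9 is two UTF-8 code units

// Paragraphs 2 and 3: the d-char-sequence acts as a delimiter, so '(', ')'
// and '"' can appear unescaped inside the raw string.
static_assert(same_literal(R"delimiter((a|b))delimiter", "(a|b)"));
static_assert(same_literal(R"x(")")x", "\")\""));  // empty delimiter would end at the first )"

// Paragraphs 7 and 8, Table 13: adjacent literals are concatenated in
// translation phase 6, and an unprefixed piece adopts the common prefix.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(same_literal("a" L"b", L"ab"));

// Escape sequences are not re-lexed across the boundary: "\xA" "B" holds
// the code units 0x0A and 'B', whereas "\xAB" is a single code unit.
constexpr char concat[] = "\xA" "B";
static_assert(sizeof concat == 3 && concat[0] == '\xA' && concat[1] == 'B');
static_assert(sizeof("\xAB") == 2);
```

The checks use `decltype` and `sizeof` instead of runtime output. This keeps them independent of the ordinary and wide literal encodings, which are implementation-defined.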
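The evaluation and initialization rules of paragraphs 9 and 10 can also be observed with compile-time assertions. This is a sketch under stated assumptions, not a normative example. It assumes a C++17 or later compiler, and `greeting` and `embedded` are hypothetical names chosen for illustration. The sketch deliberately asserts nothing about whether two evaluations yield the same object, which is unspecified.

```cpp
// Paragraph 9: the string literal object has static storage duration, so the
// returned pointer stays valid after the call. Whether two evaluations share
// one object is unspecified, so the addresses should not be compared, and
// writing through the pointer would be undefined behavior.
constexpr const char* greeting() { return "hello"; }
static_assert(greeting()[0] == 'h' && greeting()[5] == '\0');

// Paragraph 10: the object holds the encoded code units followed by a
// terminating null, even when the literal already contains "\0".
constexpr char embedded[] = "a\0b";
static_assert(sizeof embedded == 4 && embedded[1] == '\0' && embedded[3] == '\0');

// 10.2.2: a numeric-escape-sequence contributes exactly one code unit whose
// value is v when v fits the array element type.
static_assert("\101"[0] == 0101);  // octal 101 is 65
static_assert(sizeof(u"\x263A") == 2 * sizeof(char16_t));
static_assert(u"\x263A"[0] == 0x263A);

// 10.2.3: for an ordinary or wide literal, a value that only fits the
// corresponding unsigned type is reduced modulo 2^N, so "\xFF" is accepted
// even where char is signed, and it converts back to 0xFF as unsigned char.
static_assert(static_cast<unsigned char>("\xFF"[0]) == 0xFF);

// 10.2.4: the u prefix gets no modular reduction, so on the usual platforms
// where char16_t is 16 bits wide the following would be ill-formed:
//     auto bad = u"\x10000";
```

Cases that the rules make ill-formed appear only in comments, because including them would stop the sketch from compiling.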