
# 5 Lexical conventions [[lex]](./#lex)

## 5.13 Literals [[lex.literal]](lex.literal#lex.string)

### 5.13.5 String literals [lex.string]

[string-literal:](#nt:string-literal "5.13.5 String literals [lex.string]")
[*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")opt " [*s-char-sequence*](#nt:s-char-sequence "5.13.5 String literals [lex.string]")opt "
[*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")opt R [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]")

[s-char-sequence:](#nt:s-char-sequence "5.13.5 String literals [lex.string]")
[*s-char*](#nt:s-char "5.13.5 String literals [lex.string]") [*s-char-sequence*](#nt:s-char-sequence "5.13.5 String literals [lex.string]")opt

[s-char:](#nt:s-char "5.13.5 String literals [lex.string]")
[*basic-s-char*](#nt:basic-s-char "5.13.5 String literals [lex.string]")
[*escape-sequence*](lex.ccon#nt:escape-sequence "5.13.3 Character literals [lex.ccon]")
[*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]")

[basic-s-char:](#nt:basic-s-char "5.13.5 String literals [lex.string]")
any member of the translation character set except the U+0022 quotation mark,
 U+005c reverse solidus, or new-line character

[raw-string:](#nt:raw-string "5.13.5 String literals [lex.string]")
" [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")opt ( [*r-char-sequence*](#nt:r-char-sequence "5.13.5 String literals [lex.string]")opt ) [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")opt "

[r-char-sequence:](#nt:r-char-sequence "5.13.5 String literals [lex.string]")
[*r-char*](#nt:r-char "5.13.5 String literals [lex.string]") [*r-char-sequence*](#nt:r-char-sequence "5.13.5 String literals [lex.string]")opt

[r-char:](#nt:r-char "5.13.5 String literals [lex.string]")
any member of the translation character set, except a U+0029 right parenthesis followed by
 the initial [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") (which may be empty) followed by a U+0022 quotation mark

[d-char-sequence:](#nt:d-char-sequence "5.13.5 String literals [lex.string]")
[*d-char*](#nt:d-char "5.13.5 String literals [lex.string]") [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")opt

[d-char:](#nt:d-char "5.13.5 String literals [lex.string]")
any member of the basic character set except:
 U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
 U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

[1](#1)


The kind of a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]"),
its type, and
its associated character encoding ([[lex.charset]](lex.charset "5.3.1 Character sets"))
are determined by its encoding prefix and sequence of [*s-char*](#nt:s-char "5.13.5 String literals [lex.string]")*s* or [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s* as defined by Table [12](#tab:lex.string.literal "Table 12: String literals") where n is the number of encoded code units
that would result from an evaluation of the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") (see below)[.](#1.sentence-1)

Table [12](#tab:lex.string.literal) — String literals [[tab:lex.string.literal]](./tab:lex.string.literal)

| **Encoding prefix** | **Kind** | **Type** | **Associated character encoding** | **Examples** |
| --- | --- | --- | --- | --- |
| none | [*ordinary string literal*](#def:literal,string,ordinary "5.13.5 String literals [lex.string]") | array of n const char | ordinary literal encoding | `"ordinary string"`<br>`R"(ordinary raw string)"` |
| L | [*wide string literal*](#def:literal,string,wide "5.13.5 String literals [lex.string]") | array of n const wchar_t | wide literal encoding | `L"wide string"`<br>`LR"w(wide raw string)w"` |
| u8 | [*UTF-8 string literal*](#def:literal,string,UTF-8 "5.13.5 String literals [lex.string]") | array of n const char8_t | UTF-8 | `u8"UTF-8 string"`<br>`u8R"x(UTF-8 raw string)x"` |
| u | [*UTF-16 string literal*](#def:literal,string,UTF-16 "5.13.5 String literals [lex.string]") | array of n const char16_t | UTF-16 | `u"UTF-16 string"`<br>`uR"y(UTF-16 raw string)y"` |
| U | [*UTF-32 string literal*](#def:literal,string,UTF-32 "5.13.5 String literals [lex.string]") | array of n const char32_t | UTF-32 | `U"UTF-32 string"`<br>`UR"z(UTF-32 raw string)z"` |
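
As an illustration of Table 12, the following sketch checks the array type of a few literals at compile time. It assumes a C++17 or later compiler. The helper `element_count` is a hypothetical name introduced here, not part of the standard. The `char8_t` check is guarded because that type is only available where the `__cpp_char8_t` feature-test macro is defined.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue of array type, so decltype yields a reference
// to "array of n const T", where n includes the terminating null code unit.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);
#ifdef __cpp_char8_t
static_assert(std::is_same_v<decltype(u8"abc"), const char8_t (&)[4]>);
#endif

// A raw string literal has the same type as the non-raw literal with the same prefix.
static_assert(std::is_same_v<decltype(LR"w(abc)w"), const wchar_t (&)[4]>);

// Hypothetical helper (an assumption of this sketch): returns n for a literal.
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return N;
}
```

With this helper, `element_count("abc")` is 4: three code units for the characters plus the terminating null.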

[2](#2)


A [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") that has an `R` in the prefix is a [*raw string literal*](#def:raw_string_literal "5.13.5 String literals [lex.string]")[.](#2.sentence-1)

The [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") serves as a delimiter[.](#2.sentence-2)

The terminating [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") of a [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]") is the same sequence of
characters as the initial [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]")[.](#2.sentence-3)

A [*d-char-sequence*](#nt:d-char-sequence "5.13.5 String literals [lex.string]") shall consist of at most 16 characters[.](#2.sentence-4)

[3](#3)


[*Note [1](#note-1)*:

The characters '(' and ')' can appear in a [*raw-string*](#nt:raw-string "5.13.5 String literals [lex.string]")[.](#3.sentence-1)

Thus, `R"delimiter((a|b))delimiter"` is equivalent to `"(a|b)"`[.](#3.sentence-2)

— *end note*]

[4](#4)


[*Note [2](#note-2)*:

A source-file new-line in a raw string literal results in a new-line in the
resulting execution string literal[.](#4.sentence-1)

Assuming no
whitespace at the beginning of lines in the following example, the assert will succeed:

    const char* p = R"(a\
    b
    c)";
    assert(std::strcmp(p, "a\\\nb\nc") == 0);

— *end note*]

[5](#5)


[*Example [1](#example-1)*:

The raw string

    R"a(
    )\
    a"
    )a"

is equivalent to `"\n)\\\na\"\n"`[.](#5.sentence-1)

The raw string `R"(x = "\"y\"")"` is equivalent to `"x = \"\\\"y\\\"\""`[.](#5.sentence-2)

— *end example*]
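
The raw string rules above can also be checked at compile time. The sketch below assumes C++17 or later, since it relies on `constexpr` comparison of `std::string_view`. The variable names are hypothetical and chosen only for this illustration.

```cpp
#include <string_view>

// With an empty delimiter, a backslash is an ordinary r-char, not an escape.
constexpr std::string_view path = R"(C:\temp\new)";

// The content contains `)"`, which would end a raw string that has an empty
// delimiter, so a non-empty delimiter (here `sep`) is needed.
constexpr std::string_view quoted = R"sep(say "(hi)" twice)sep";

// A source-file new-line inside a raw string becomes a new-line in the result.
constexpr std::string_view lines = R"(one
two)";

static_assert(path == "C:\\temp\\new");
static_assert(quoted == "say \"(hi)\" twice");
static_assert(lines == "one\ntwo");
```

The delimiter only has to differ from whatever follows a `)` in the content, so any short sequence of up to 16 permitted characters would work in place of `sep`.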

[6](#6)


Ordinary string literals and UTF-8 string literals are
also referred to as [*narrow string literals*](#def:literal,string,narrow "5.13.5 String literals [lex.string]")[.](#6.sentence-1)

[7](#7)


The [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* in
any sequence of adjacent [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* shall have at most one unique [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") among them[.](#7.sentence-1)

The common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") of the sequence is
that [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]"), if any[.](#7.sentence-2)

[*Note [3](#note-3)*:

A [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s rawness has
no effect on the determination of the common [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")[.](#7.sentence-3)

— *end note*]
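
A minimal sketch of how the common encoding-prefix shows up in the type of a sequence of adjacent literals, assuming C++17 or later. The helper `has_element_type` is a hypothetical name introduced for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: true if the literal's array element type is `const Expected`.
template <class Expected, class T, std::size_t N>
constexpr bool has_element_type(const T (&)[N]) {
    return std::is_same_v<T, Expected>;
}

// Only one literal carries a prefix, so `u` is the common encoding-prefix.
static_assert(has_element_type<char16_t>("a" u"b" "c"));

// Repeating the same prefix still leaves one unique prefix among the literals.
static_assert(has_element_type<wchar_t>(L"a" L"b"));

// Rawness plays no part in determining the common encoding-prefix.
static_assert(has_element_type<wchar_t>(LR"(a)" "b"));
static_assert(has_element_type<char32_t>(R"(a)" U"b"));

// No literal has a prefix, so the sequence has no common encoding-prefix.
static_assert(has_element_type<char>("a" R"(b)"));

// Two different prefixes violate the rule above, so this line is left
// commented out; a conforming implementation is expected to reject it.
// auto bad = L"a" u"b";
```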

[8](#8)


In translation phase 6 ([[lex.phases]](lex.phases "5.2 Phases of translation")),
adjacent [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* are concatenated[.](#8.sentence-1)

The lexical structure and grouping of
the contents of the individual [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")*s* is retained[.](#8.sentence-2)

[*Example [2](#example-2)*:

"\xA" "B" represents
the code unit '\xA' and the character 'B' after concatenation
(and not the single code unit '\xAB')[.](#8.sentence-3)

Similarly, `R"(\u00)" "41"` represents six characters,
starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]"))[.](#8.sentence-4)

Table [13](#tab:lex.string.concat "Table 13: String literal concatenations") has some examples of valid concatenations[.](#8.sentence-5)

— *end example*]

Table [13](#tab:lex.string.concat) — String literal concatenations [[tab:lex.string.concat]](./tab:lex.string.concat)

| Source | | Means | Source | | Means | Source | | Means |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
| `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
| `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
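
The claim that lexical structure is retained across concatenation can be sketched as compile-time checks, assuming C++17 or later. The helper `code_units` is a hypothetical name for this illustration; it counts code units without the terminating null.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a literal, excluding the
// terminating null code unit.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) {
    return N - 1;
}

// "\xA" "B": the escape sequence ends with its own literal, so the result is
// the code unit '\xA' followed by 'B', not the single code unit '\xAB'.
static_assert(code_units("\xA" "B") == 2);
static_assert(("\xA" "B")[0] == '\xA');
static_assert(("\xA" "B")[1] == 'B');
static_assert(code_units("\xAB") == 1);

// R"(\u00)" "41": six characters, from the backslash through the digit 1;
// no universal-character-name is formed across the literal boundary.
static_assert(code_units(R"(\u00)" "41") == 6);
static_assert((R"(\u00)" "41")[0] == '\\');
static_assert((R"(\u00)" "41")[5] == '1');
```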

[9](#9)


Evaluating a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") results in a string literal object
with static storage duration ([[basic.stc]](basic.stc "6.8.6 Storage duration"))[.](#9.sentence-1)

[*Note [4](#note-4)*:

String literal objects are potentially non-unique ([[intro.object]](intro.object "6.8.2 Object model"))[.](#9.sentence-2)

Whether successive evaluations of a [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]") yield the same or a different object is
unspecified[.](#9.sentence-3)

— *end note*]

[*Note [5](#note-5)*:

The effect of attempting to modify a string literal object is undefined[.](#9.sentence-4)

— *end note*]
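
One practical consequence of static storage duration is that a pointer or view to a string literal object can be returned from a function. The sketch below assumes C++17 or later. The function names `greeting`, `default_name`, and `same_text` are hypothetical.

```cpp
#include <string_view>

// The string literal object has static storage duration, so the returned
// pointer remains valid after the function returns.
constexpr const char* greeting() {
    return "hello";
}

// The same reasoning applies to a view into a string literal object.
constexpr std::string_view default_name() {
    return "anonymous";
}

// Comparing contents is well defined. Comparing addresses is not a reliable
// test: whether two evaluations yield the same object is unspecified.
constexpr bool same_text(std::string_view a, std::string_view b) {
    return a == b;
}

static_assert(same_text(greeting(), "hello"));
static_assert(default_name() == "anonymous");

// Modifying a string literal object has undefined effect, so this is left
// commented out:
// const_cast<char*>(greeting())[0] = 'H';
```

Code that needs to know whether two literals denote the same text would compare contents, as `same_text` does, rather than rely on `greeting() == greeting()`.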

[10](#10)


String literal objects are initialized with
the sequence of code unit values
corresponding to the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s sequence of [*s-char*](#nt:s-char "5.13.5 String literals [lex.string]")*s* (originally from non-raw string literals) and [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s* (originally from raw string literals),
plus a terminating U+0000 null character,
in order as follows:

- [(10.1)](#10.1)

  The sequence of characters denoted by each contiguous sequence of [*basic-s-char*](#nt:basic-s-char "5.13.5 String literals [lex.string]")*s*, [*r-char*](#nt:r-char "5.13.5 String literals [lex.string]")*s*, [*simple-escape-sequence*](lex.ccon#nt:simple-escape-sequence "5.13.3 Character literals [lex.ccon]")*s* ([[lex.ccon]](lex.ccon "5.13.3 Character literals")), and [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2 Universal character names [lex.universal.char]")*s* ([[lex.charset]](lex.charset "5.3.1 Character sets"))
  is encoded to a code unit sequence
  using the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s associated character encoding[.](#10.1.sentence-1)
  If a character lacks representation in the associated character encoding,
  then the program is ill-formed[.](#10.1.sentence-2)
  [*Note [6](#note-6)*:
  No character lacks representation in any Unicode encoding form[.](#10.1.sentence-3)
  — *end note*]
  When encoding a stateful character encoding,
  implementations should encode the first such sequence
  beginning with the initial encoding state and
  encode subsequent sequences
  beginning with the final encoding state of the prior sequence[.](#10.1.sentence-4)
  [*Note [7](#note-7)*:
  The encoded code unit sequence can differ from
  the sequence of code units that would be obtained by
  encoding each character independently[.](#10.1.sentence-5)
  — *end note*]

- [(10.2)](#10.2)

  Each [*numeric-escape-sequence*](lex.ccon#nt:numeric-escape-sequence "5.13.3 Character literals [lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3 Character literals"))
contributes a single code unit with a value as follows:
  * [(10.2.1)](#10.2.1)

      Let v be the integer value represented by
the octal number comprising
the sequence of [*octal-digit*](lex.icon#nt:octal-digit "5.13.2 Integer literals [lex.icon]")*s* in
an [*octal-escape-sequence*](lex.ccon#nt:octal-escape-sequence "5.13.3 Character literals [lex.ccon]") or by
the hexadecimal number comprising
the sequence of [*hexadecimal-digit*](lex.icon#nt:hexadecimal-digit "5.13.2 Integer literals [lex.icon]")*s* in
a [*hexadecimal-escape-sequence*](lex.ccon#nt:hexadecimal-escape-sequence "5.13.3 Character literals [lex.ccon]")[.](#10.2.1.sentence-1)

  * [(10.2.2)](#10.2.2)

      If v does not exceed the range of representable values of
the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type,
then the value is v[.](#10.2.2.sentence-1)

  * [(10.2.3)](#10.2.3)

      Otherwise,
if the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]") is absent or L, and v does not exceed the range of representable values of
the corresponding unsigned type for the underlying type of
the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type,
then the value is the unique value of
the [*string-literal*](#nt:string-literal "5.13.5 String literals [lex.string]")'s array element type T that is congruent to v modulo 2<sup>N</sup>, where N is the width of T[.](#10.2.3.sentence-1)

  * [(10.2.4)](#10.2.4)

      Otherwise, the program is ill-formed[.](#10.2.4.sentence-1)

   When encoding a stateful character encoding,
these sequences should have no effect on encoding state[.](#10.2.sentence-2)

- [(10.3)](#10.3)

  Each [*conditional-escape-sequence*](lex.ccon#nt:conditional-escape-sequence "5.13.3 Character literals [lex.ccon]") ([[lex.ccon]](lex.ccon "5.13.3 Character literals"))
contributes an implementation-defined
code unit sequence[.](#10.3.sentence-1)
  When encoding a stateful character encoding,
it is implementation-defined
what effect these sequences have on encoding state[.](#10.3.sentence-2)
what effect these sequences have on encoding state[.](#10.3.sentence-2)
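
The initialization rules in (10.1) and (10.2) can be sketched with compile-time checks on the Unicode encodings, whose results do not depend on the implementation's ordinary or wide literal encoding. The sketch assumes C++17 or later. The helper `array_size` is a hypothetical name that returns n, including the terminating null code unit.

```cpp
#include <cstddef>

// Hypothetical helper: the array bound n of a literal, including the
// terminating null code unit.
template <class T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) {
    return N;
}

// (10.1) U+00E9 encodes to two UTF-8 code units, one UTF-16 code unit, and
// one UTF-32 code unit; each array also holds the terminating null.
static_assert(array_size(u8"\u00E9") == 3);
static_assert(array_size(u"\u00E9") == 2);
static_assert(array_size(U"\u00E9") == 2);

// (10.1) U+1F600 lies outside the Basic Multilingual Plane: four UTF-8 code
// units, a UTF-16 surrogate pair, and a single UTF-32 code unit.
static_assert(array_size(u8"\U0001F600") == 5);
static_assert(array_size(u"\U0001F600") == 3);
static_assert(u"\U0001F600"[0] == 0xD83D);
static_assert(u"\U0001F600"[1] == 0xDE00);
static_assert(array_size(U"\U0001F600") == 2);

// (10.2) A numeric-escape-sequence contributes exactly one code unit.
static_assert(array_size(u"\x41") == 2);
static_assert(u"\x41"[0] == u'A');

// (10.2.2)/(10.2.3) In an ordinary string literal, \xFF yields the element
// value congruent to 255 modulo 2^N, whether or not char is signed.
static_assert(static_cast<unsigned char>("\xFF"[0]) == 0xFF);

// (10.2.4) The value 0x10000 does not fit in char16_t, so this is left
// commented out; the program would be ill-formed.
// auto bad = u"\x10000";
```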