[lex.ccon]
# 5 Lexical conventions [[lex]](./#lex)
## 5.13 Literals [[lex.literal]](lex.literal#lex.ccon)
### 5.13.3 Character literals [lex.ccon]
[character-literal:](#nt:character-literal "5.13.3Character literals[lex.ccon]")
[*encoding-prefix*](#nt:encoding-prefix "5.13.3Character literals[lex.ccon]")<sub>opt</sub> ' [*c-char-sequence*](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]") '
[encoding-prefix:](#nt:encoding-prefix "5.13.3Character literals[lex.ccon]") one of
u8 u U L
[c-char-sequence:](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]")
[*c-char*](#nt:c-char "5.13.3Character literals[lex.ccon]") [*c-char-sequence*](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]")<sub>opt</sub>
[c-char:](#nt:c-char "5.13.3Character literals[lex.ccon]")
[*basic-c-char*](#nt:basic-c-char "5.13.3Character literals[lex.ccon]")
[*escape-sequence*](#nt:escape-sequence "5.13.3Character literals[lex.ccon]")
[*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2Universal character names[lex.universal.char]")
[basic-c-char:](#nt:basic-c-char "5.13.3Character literals[lex.ccon]")
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character
[escape-sequence:](#nt:escape-sequence "5.13.3Character literals[lex.ccon]")
[*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3Character literals[lex.ccon]")
[*numeric-escape-sequence*](#nt:numeric-escape-sequence "5.13.3Character literals[lex.ccon]")
[*conditional-escape-sequence*](#nt:conditional-escape-sequence "5.13.3Character literals[lex.ccon]")
[simple-escape-sequence:](#nt:simple-escape-sequence "5.13.3Character literals[lex.ccon]")
\ [*simple-escape-sequence-char*](#nt:simple-escape-sequence-char "5.13.3Character literals[lex.ccon]")
[simple-escape-sequence-char:](#nt:simple-escape-sequence-char "5.13.3Character literals[lex.ccon]") one of
' " ? \ a b f n r t v
[numeric-escape-sequence:](#nt:numeric-escape-sequence "5.13.3Character literals[lex.ccon]")
[*octal-escape-sequence*](#nt:octal-escape-sequence "5.13.3Character literals[lex.ccon]")
[*hexadecimal-escape-sequence*](#nt:hexadecimal-escape-sequence "5.13.3Character literals[lex.ccon]")
[simple-octal-digit-sequence:](#nt:simple-octal-digit-sequence "5.13.3Character literals[lex.ccon]")
[*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]") [*simple-octal-digit-sequence*](#nt:simple-octal-digit-sequence "5.13.3Character literals[lex.ccon]")<sub>opt</sub>
[octal-escape-sequence:](#nt:octal-escape-sequence "5.13.3Character literals[lex.ccon]")
\ [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]")
\ [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]") [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]")
\ [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]") [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]") [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]")
\o{ [*simple-octal-digit-sequence*](#nt:simple-octal-digit-sequence "5.13.3Character literals[lex.ccon]") }
[hexadecimal-escape-sequence:](#nt:hexadecimal-escape-sequence "5.13.3Character literals[lex.ccon]")
\x [*simple-hexadecimal-digit-sequence*](lex.universal.char#nt:simple-hexadecimal-digit-sequence "5.3.2Universal character names[lex.universal.char]")
\x{ [*simple-hexadecimal-digit-sequence*](lex.universal.char#nt:simple-hexadecimal-digit-sequence "5.3.2Universal character names[lex.universal.char]") }
[conditional-escape-sequence:](#nt:conditional-escape-sequence "5.13.3Character literals[lex.ccon]")
\ [*conditional-escape-sequence-char*](#nt:conditional-escape-sequence-char "5.13.3Character literals[lex.ccon]")
[conditional-escape-sequence-char:](#nt:conditional-escape-sequence-char "5.13.3Character literals[lex.ccon]")
any member of the basic character set that is not an [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]"), a [*simple-escape-sequence-char*](#nt:simple-escape-sequence-char "5.13.3Character literals[lex.ccon]"), or the characters N, o, u, U, or x
[1](#1)
A [*multicharacter literal*](#def:literal,multicharacter "5.13.3Character literals[lex.ccon]") is a [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]") whose [*c-char-sequence*](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]") consists of
more than one [*c-char*](#nt:c-char "5.13.3Character literals[lex.ccon]")[.](#1.sentence-1)
A multicharacter literal shall not have an [*encoding-prefix*](#nt:encoding-prefix "5.13.3Character literals[lex.ccon]")[.](#1.sentence-2)
If a multicharacter literal contains a [*c-char*](#nt:c-char "5.13.3Character literals[lex.ccon]") that is not encodable as a single code unit in the ordinary literal encoding,
the program is ill-formed[.](#1.sentence-3)
Multicharacter literals are conditionally-supported[.](#1.sentence-4)
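
As a rough illustration of the paragraph above, the following sketch checks the type that Table 9 assigns to a multicharacter literal. It assumes an implementation that supports multicharacter literals (they are only conditionally-supported, and compilers commonly emit a warning for them). The name `two_chars` is invented for the example, and no claim is made about the literal's value, which is implementation-defined.

```cpp
#include <type_traits>

// Assumption: this implementation supports multicharacter literals.
// They are conditionally-supported, and compilers commonly warn about them.
constexpr auto two_chars = 'ab';  // no encoding-prefix, two c-chars

// Table 9 gives a multicharacter literal the type int.
static_assert(std::is_same_v<decltype('ab'), int>);
static_assert(sizeof(two_chars) == sizeof(int));

// Not allowed: a multicharacter literal with an encoding-prefix, such as u'ab'.
```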
[2](#2)
The kind of a [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]"),
its type, and its associated character encoding ([[lex.charset]](lex.charset "5.3.1Character sets"))
are determined by
its [*encoding-prefix*](#nt:encoding-prefix "5.13.3Character literals[lex.ccon]") and its [*c-char-sequence*](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]") as defined by Table [9](#tab:lex.ccon.literal "Table 9: Character literals")[.](#2.sentence-1)
Table [9](#tab:lex.ccon.literal) — Character literals [[tab:lex.ccon.literal]](./tab:lex.ccon.literal)
| [🔗](#tab:lex.ccon.literal-row-1)<br>**Encoding prefix** | **Kind** | **Type** | **Associated character encoding** | **Example** |
| --- | --- | --- | --- | --- |
| [🔗](#tab:lex.ccon.literal-row-3)<br>none | [*ordinary character literal*](#def:literal,character,ordinary "5.13.3Character literals[lex.ccon]") | char | ordinary literal encoding | 'v' |
| [🔗](#tab:lex.ccon.literal-row-4)<br>none | multicharacter literal | int | ordinary literal encoding | 'abcd' |
| [🔗](#tab:lex.ccon.literal-row-5)<br>L | [*wide character literal*](#def:literal,character,wide "5.13.3Character literals[lex.ccon]") | wchar_t | wide literal encoding | L'w' |
| [🔗](#tab:lex.ccon.literal-row-7)<br>u8 | [*UTF-8 character literal*](#def:literal,character,UTF-8 "5.13.3Character literals[lex.ccon]") | char8_t | UTF-8 | u8'x' |
| [🔗](#tab:lex.ccon.literal-row-8)<br>u | [*UTF-16 character literal*](#def:literal,character,UTF-16 "5.13.3Character literals[lex.ccon]") | char16_t | UTF-16 | u'y' |
| [🔗](#tab:lex.ccon.literal-row-9)<br>U | [*UTF-32 character literal*](#def:literal,character,UTF-32 "5.13.3Character literals[lex.ccon]") | char32_t | UTF-32 | U'z' |
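
The type column of Table 9 can be checked at compile time. The sketch below is only an illustration and assumes a C++17 or later compiler. The `u8` case is guarded by the `__cpp_char8_t` feature-test macro because `char8_t` is assumed to be available only from C++20 on.

```cpp
#include <type_traits>

// Each encoding-prefix selects the type of the literal (Table 9).
static_assert(std::is_same_v<decltype('v'),  char>);      // ordinary character literal
static_assert(std::is_same_v<decltype(L'w'), wchar_t>);   // wide character literal
static_assert(std::is_same_v<decltype(u'y'), char16_t>);  // UTF-16 character literal
static_assert(std::is_same_v<decltype(U'z'), char32_t>);  // UTF-32 character literal

#if defined(__cpp_char8_t)
// Assumption: char8_t is available (C++20 or later).
static_assert(std::is_same_v<decltype(u8'x'), char8_t>);  // UTF-8 character literal
#endif
```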
[3](#3)
In translation phase 4,
the value of a [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]") is determined
using the range of representable values
of the [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]")'s type in translation phase 7[.](#3.sentence-1)
A multicharacter literal has an implementation-defined value[.](#3.sentence-2)
The value of any other kind of [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]") is determined as follows:
- [(3.1)](#3.1)
A [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]") with
a [*c-char-sequence*](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]") consisting of a single [*basic-c-char*](#nt:basic-c-char "5.13.3Character literals[lex.ccon]"), [*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3Character literals[lex.ccon]"), or [*universal-character-name*](lex.universal.char#nt:universal-character-name "5.3.2Universal character names[lex.universal.char]") is the code unit value of the specified character
as encoded in the literal's associated character encoding[.](#3.1.sentence-1)
If the specified character lacks
representation in the literal's associated character encoding or
if it cannot be encoded as a single code unit,
then the program is ill-formed[.](#3.1.sentence-2)
- [(3.2)](#3.2)
A [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]") with
a [*c-char-sequence*](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]") consisting of
a single [*numeric-escape-sequence*](#nt:numeric-escape-sequence "5.13.3Character literals[lex.ccon]") has a value as follows:
* [(3.2.1)](#3.2.1)
Let v be the integer value represented by
the octal number comprising
the sequence of [*octal-digit*](lex.icon#nt:octal-digit "5.13.2Integer literals[lex.icon]")*s* in
an [*octal-escape-sequence*](#nt:octal-escape-sequence "5.13.3Character literals[lex.ccon]") or by
the hexadecimal number comprising
the sequence of [*hexadecimal-digit*](lex.icon#nt:hexadecimal-digit "5.13.2Integer literals[lex.icon]")*s* in
a [*hexadecimal-escape-sequence*](#nt:hexadecimal-escape-sequence "5.13.3Character literals[lex.ccon]")[.](#3.2.1.sentence-1)
* [(3.2.2)](#3.2.2)
If v does not exceed
the range of representable values of the [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]")'s type,
then the value is v[.](#3.2.2.sentence-1)
* [(3.2.3)](#3.2.3)
Otherwise,
if the [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]")'s [*encoding-prefix*](#nt:encoding-prefix "5.13.3Character literals[lex.ccon]") is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]")'s type,
then the value is the unique value of the [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]")'s type T that is congruent to v modulo 2<sup>N</sup>, where N is the width of T[.](#3.2.3.sentence-1)
* [(3.2.4)](#3.2.4)
Otherwise, the program is ill-formed[.](#3.2.4.sentence-1)
- [(3.3)](#3.3)
A [*character-literal*](#nt:character-literal "5.13.3Character literals[lex.ccon]") with
a [*c-char-sequence*](#nt:c-char-sequence "5.13.3Character literals[lex.ccon]") consisting of
a single [*conditional-escape-sequence*](#nt:conditional-escape-sequence "5.13.3Character literals[lex.ccon]") is conditionally-supported and
has an implementation-defined value[.](#3.3.sentence-1)
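
The value rules in (3.1) and (3.2) can be sketched with a few compile-time checks. This is a hedged illustration, not normative text. It uses UTF-16 and UTF-32 literals where possible so the results do not depend on the ordinary literal encoding. The helper `wrap_to_signed8` is hypothetical and models rule (3.2.3) only for an 8-bit signed `char`.

```cpp
#include <climits>
#include <type_traits>

// (3.1) A single basic-c-char or universal-character-name: the value is the
// single code unit of that character in the associated encoding.
static_assert(U'A' == 0x41);              // one UTF-32 code unit
static_assert(u'\u00e9' == 0x00E9);       // U+00E9 fits in one UTF-16 code unit
static_assert(U'\U0001F600' == 0x1F600);  // fits in one UTF-32 code unit
// u'\U0001F600' would be ill-formed: U+1F600 needs two UTF-16 code units.

// (3.2.1), (3.2.2) Numeric escapes: v is the octal or hexadecimal number written,
// and it is the value whenever it fits the literal's type.
static_assert(U'\101' == 65);        // octal 101
static_assert(U'\x41' == 65);        // hexadecimal 41
static_assert(u'\xffff' == 0xFFFF);  // within the range of char16_t

// (3.2.3) Hypothetical model for an 8-bit signed char: the value congruent to
// v modulo 2^8 that lies in [-128, 127]. Only meaningful for 0 <= v <= 255.
constexpr int wrap_to_signed8(unsigned v) {
    return v < 128 ? static_cast<int>(v) : static_cast<int>(v) - 256;
}

// Checked only where char really is an 8-bit signed type.
static_assert(!std::is_signed_v<char> || CHAR_BIT != 8 ||
              '\xff' == wrap_to_signed8(0xFF));
// Whether char is signed or unsigned, the low 8 bits are the same.
static_assert(static_cast<unsigned char>('\xff') == 0xFF);
```

With a prefix other than `L`, rule (3.2.3) does not apply, so a numeric escape whose value does not fit the literal's type falls through to (3.2.4) and makes the program ill-formed.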
[4](#4)
The character specified by a [*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3Character literals[lex.ccon]") is specified in Table [10](#tab:lex.ccon.esc "Table 10: Simple escape sequences")[.](#4.sentence-1)
[*Note [1](#note-1)*:
Using an escape sequence for a question mark
is supported for compatibility with C++ 2014 and C[.](#4.sentence-2)
— *end note*]
Table [10](#tab:lex.ccon.esc) — Simple escape sequences [[tab:lex.ccon.esc]](./tab:lex.ccon.esc)
| [🔗](#tab:lex.ccon.esc-row-1)<br>**code point** | **character** | **[*simple-escape-sequence*](#nt:simple-escape-sequence "5.13.3Character literals[lex.ccon]")** |
| --- | --- | --- |
| [🔗](#tab:lex.ccon.esc-row-2)<br>U+000a | line feed | \n |
| [🔗](#tab:lex.ccon.esc-row-3)<br>U+0009 | character tabulation | \t |
| [🔗](#tab:lex.ccon.esc-row-4)<br>U+000b | line tabulation | \v |
| [🔗](#tab:lex.ccon.esc-row-5)<br>U+0008 | backspace | \b |
| [🔗](#tab:lex.ccon.esc-row-6)<br>U+000d | carriage return | \r |
| [🔗](#tab:lex.ccon.esc-row-7)<br>U+000c | form feed | \f |
| [🔗](#tab:lex.ccon.esc-row-8)<br>U+0007 | alert | \a |
| [🔗](#tab:lex.ccon.esc-row-9)<br>U+005c | reverse solidus | \\ |
| [🔗](#tab:lex.ccon.esc-row-10)<br>U+003f | question mark | \? |
| [🔗](#tab:lex.ccon.esc-row-11)<br>U+0027 | apostrophe | \' |
| [🔗](#tab:lex.ccon.esc-row-12)<br>U+0022 | quotation mark | \" |
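
Table 10 can be cross-checked against a compiler with a small sketch like the following. The function `simple_escape_code_point` is a hypothetical helper written for this example. UTF-32 literals are used so that the code unit value equals the code point regardless of the ordinary literal encoding.

```cpp
// Hypothetical helper: maps a simple-escape-sequence-char to the code point
// listed in Table 10. Returns 0xFFFFFFFF for characters not in the table.
constexpr char32_t simple_escape_code_point(char c) {
    switch (c) {
        case 'n':  return 0x000A;  // line feed
        case 't':  return 0x0009;  // character tabulation
        case 'v':  return 0x000B;  // line tabulation
        case 'b':  return 0x0008;  // backspace
        case 'r':  return 0x000D;  // carriage return
        case 'f':  return 0x000C;  // form feed
        case 'a':  return 0x0007;  // alert
        case '\\': return 0x005C;  // reverse solidus
        case '?':  return 0x003F;  // question mark
        case '\'': return 0x0027;  // apostrophe
        case '"':  return 0x0022;  // quotation mark
        default:   return 0xFFFFFFFF;
    }
}

// The compiler's own interpretation of each escape agrees with the table.
static_assert(simple_escape_code_point('n')  == U'\n');
static_assert(simple_escape_code_point('t')  == U'\t');
static_assert(simple_escape_code_point('v')  == U'\v');
static_assert(simple_escape_code_point('b')  == U'\b');
static_assert(simple_escape_code_point('r')  == U'\r');
static_assert(simple_escape_code_point('f')  == U'\f');
static_assert(simple_escape_code_point('a')  == U'\a');
static_assert(simple_escape_code_point('\\') == U'\\');
static_assert(simple_escape_code_point('?')  == U'\?');
static_assert(simple_escape_code_point('\'') == U'\'');
static_assert(simple_escape_code_point('"')  == U'\"');
```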