[lex.pptoken]
# 5 Lexical conventions [[lex]](./#lex)
## 5.5 Preprocessing tokens [lex.pptoken]
[preprocessing-token:](#nt:preprocessing-token "5.5Preprocessing tokens[lex.pptoken]")
[*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]")
*import-keyword*
*module-keyword*
*export-keyword*
[*identifier*](lex.name#nt:identifier "5.11Identifiers[lex.name]")
[*pp-number*](lex.ppnumber#nt:pp-number "5.7Preprocessing numbers[lex.ppnumber]")
[*character-literal*](lex.ccon#nt:character-literal "5.13.3Character literals[lex.ccon]")
[*user-defined-character-literal*](lex.ext#nt:user-defined-character-literal "5.13.9User-defined literals[lex.ext]")
[*string-literal*](lex.string#nt:string-literal "5.13.5String literals[lex.string]")
[*user-defined-string-literal*](lex.ext#nt:user-defined-string-literal "5.13.9User-defined literals[lex.ext]")
[*preprocessing-op-or-punc*](lex.operators#nt:preprocessing-op-or-punc "5.8Operators and punctuators[lex.operators]")
each non-whitespace character that cannot be one of the above
[1](#1)
A preprocessing token is the minimal lexical element of the language in translation
phases 3 through 6[.](#1.sentence-1)
In this document,
glyphs are used to identify
elements of the basic character set ([[lex.charset]](lex.charset "5.3.1Character sets"))[.](#1.sentence-2)
The categories of preprocessing token are: header names,
placeholder tokens produced by preprocessing import and module directives
(*import-keyword*, *module-keyword*, and *export-keyword*),
identifiers, preprocessing numbers, character literals (including user-defined character
literals), string literals (including user-defined string literals), preprocessing
operators and punctuators, and single non-whitespace characters that do not lexically
match the other preprocessing token categories[.](#1.sentence-3)
If a U+0027 apostrophe or a U+0022 quotation mark character
matches the last category, the program is ill-formed[.](#1.sentence-4)
If any character not in the basic character set matches the last category,
the program is ill-formed[.](#1.sentence-5)
Preprocessing tokens can be separated by whitespace; this consists of comments ([[lex.comment]](lex.comment "5.4 Comments")), or whitespace characters
(U+0020 space, U+0009 character tabulation,
new-line, U+000B line tabulation, and U+000C form feed), or both[.](#1.sentence-6)
As described in [[cpp]](cpp "15Preprocessing directives"), in certain
circumstances during translation phase 4, whitespace (or the absence
thereof) serves as more than preprocessing token separation[.](#1.sentence-7)
Whitespace
can appear within a preprocessing token only as part of a header name or
between the quotation characters in a character literal or
string literal[.](#1.sentence-8)
[2](#2)
Each preprocessing token that is converted to a token ([[lex.token]](lex.token "5.10Tokens"))
shall have the lexical form of a keyword, an identifier, a literal,
or an operator or punctuator[.](#2.sentence-1)
[3](#3)
The *import-keyword* is produced
by preprocessing an import directive ([[cpp.import]](cpp.import "15.6 Header unit importation")),
the *module-keyword* is produced
by preprocessing a module directive ([[cpp.module]](cpp.module "15.5Module directive")), and
the *export-keyword* is produced
by preprocessing either of the previous two directives[.](#3.sentence-1)
[*Note [1](#note-1)*:
None has any observable spelling[.](#3.sentence-2)
— *end note*]
[4](#4)
If the input stream has been parsed into preprocessing tokens up to a
given character:
- [(4.1)](#4.1)
If the next character begins a sequence of characters that could be the prefix
and initial double quote of a raw string literal, such as R", the next preprocessing
token shall be a raw string literal[.](#4.1.sentence-1)
Between the initial and final
double quote characters of the raw string, any transformations performed in phase
2 (line splicing) are reverted; this reversion
shall apply before any [*d-char*](lex.string#nt:d-char "5.13.5String literals[lex.string]"), [*r-char*](lex.string#nt:r-char "5.13.5String literals[lex.string]"), or delimiting
parenthesis is identified[.](#4.1.sentence-2)
The raw string literal is defined as the shortest sequence
of characters that matches the raw-string pattern
[*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals[lex.ccon]")<sub>opt</sub> R [*raw-string*](lex.string#nt:raw-string "5.13.5 String literals[lex.string]")
- [(4.2)](#4.2)
Otherwise, if the next three characters are <:: and the subsequent character
is neither : nor >, the < is treated as a preprocessing token by
itself and not as the first character of the alternative token <:[.](#4.2.sentence-1)
- [(4.3)](#4.3)
Otherwise, if the next three characters are [:: and
the subsequent character is not :, or
if the next three characters are [:>,
the [ is treated as a preprocessing token by itself and
not as the first character of the preprocessing token [:[.](#4.3.sentence-1)
[*Note [2](#note-2)*:
The tokens [: and :] cannot be composed from digraphs[.](#4.3.sentence-2)
— *end note*]
- [(4.4)](#4.4)
Otherwise,
the next preprocessing token is the longest sequence of
characters that could constitute a preprocessing token, even if that
would cause further lexical analysis to fail,
except that
* [(4.4.1)](#4.4.1)
a [*string-literal*](lex.string#nt:string-literal "5.13.5String literals[lex.string]") token is never formed
when a [*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]") token can be formed, and
* [(4.4.2)](#4.4.2)
a [*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]") ([[lex.header]](lex.header "5.6Header names")) is only formed
+
[(4.4.2.1)](#4.4.2.1)
immediately after the include, embed, or import preprocessing token in a #include ([[cpp.include]](cpp.include "15.3 Source file inclusion")), #embed ([[cpp.embed]](cpp.embed "15.4 Resource inclusion")), or import ([[cpp.import]](cpp.import "15.6 Header unit importation")) directive, respectively, or
+
[(4.4.2.2)](#4.4.2.2)
immediately after a preprocessing token sequence of __has_include or __has_embed immediately followed by ( in a #if, #elif, or #embed directive ([[cpp.cond]](cpp.cond "15.2Conditional inclusion"), [[cpp.embed]](cpp.embed "15.4Resource inclusion"))[.](#4.4.sentence-1)
[5](#5)
[*Example [1](#example-1)*:
```cpp
#define R "x"
const char* s = R"y";       // ill-formed raw string, not "x" "y"
```
— *end example*]
[6](#6)
[*Example [2](#example-2)*:
The program fragment 0xe+foo is parsed as a
preprocessing number token (one that is not a valid [*integer-literal*](lex.icon#nt:integer-literal "5.13.2 Integer literals[lex.icon]") or [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals[lex.fcon]") token),
even though a parse as three preprocessing tokens 0xe, +, and foo can produce a valid expression (for example,
if foo is a macro defined as 1)[.](#6.sentence-1)
Similarly, the
program fragment 1E1 is parsed as a preprocessing number (one
that is a valid [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4Floating-point literals[lex.fcon]") token),
whether or not E is a macro name[.](#6.sentence-2)
— *end example*]
[7](#7)
[*Example [3](#example-3)*:
The program fragment x+++++y is parsed as x++ ++ + y, which, if x and y have integral types,
violates a constraint on increment operators, even though the parse x ++ + ++ y can yield a correct expression[.](#7.sentence-1)
— *end example*]