[lex.pptoken]
# 5 Lexical conventions [[lex]](./#lex)
## 5.5 Preprocessing tokens [lex.pptoken]
[preprocessing-token:](#nt:preprocessing-token "5.5Preprocessing tokens[lex.pptoken]")
[*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]")
*import-keyword*
*module-keyword*
*export-keyword*
[*identifier*](lex.name#nt:identifier "5.11Identifiers[lex.name]")
[*pp-number*](lex.ppnumber#nt:pp-number "5.7Preprocessing numbers[lex.ppnumber]")
[*character-literal*](lex.ccon#nt:character-literal "5.13.3Character literals[lex.ccon]")
[*user-defined-character-literal*](lex.ext#nt:user-defined-character-literal "5.13.9User-defined literals[lex.ext]")
[*string-literal*](lex.string#nt:string-literal "5.13.5String literals[lex.string]")
[*user-defined-string-literal*](lex.ext#nt:user-defined-string-literal "5.13.9User-defined literals[lex.ext]")
[*preprocessing-op-or-punc*](lex.operators#nt:preprocessing-op-or-punc "5.8Operators and punctuators[lex.operators]")
each non-whitespace character that cannot be one of the above
[1](#1)
A preprocessing token is the minimal lexical element of the language in translation
phases 3 through 6[.](#1.sentence-1)
In this document,
glyphs are used to identify
elements of the basic character set ([[lex.charset]](lex.charset "5.3.1Character sets"))[.](#1.sentence-2)
The categories of preprocessing token are: header names,
placeholder tokens produced by preprocessing import and module directives
(*import-keyword*, *module-keyword*, and *export-keyword*),
identifiers, preprocessing numbers, character literals (including user-defined character
literals), string literals (including user-defined string literals), preprocessing
operators and punctuators, and single non-whitespace characters that do not lexically
match the other preprocessing token categories[.](#1.sentence-3)
If a U+0027 apostrophe or a U+0022 quotation mark character
matches the last category, the program is ill-formed[.](#1.sentence-4)
If any character not in the basic character set matches the last category,
the program is ill-formed[.](#1.sentence-5)
Preprocessing tokens can be separated by whitespace; this consists of comments ([[lex.comment]](lex.comment "5.4 Comments")), or whitespace characters
(U+0020 space, U+0009 character tabulation,
new-line, U+000B line tabulation, and U+000C form feed), or both[.](#1.sentence-6)
As described in [[cpp]](cpp "15Preprocessing directives"), in certain
circumstances during translation phase 4, whitespace (or the absence
thereof) serves as more than preprocessing token separation[.](#1.sentence-7)
Whitespace
can appear within a preprocessing token only as part of a header name or
between the quotation characters in a character literal or
string literal[.](#1.sentence-8)
[2](#2)
Each preprocessing token that is converted to a token ([[lex.token]](lex.token "5.10Tokens"))
shall have the lexical form of a keyword, an identifier, a literal,
or an operator or punctuator[.](#2.sentence-1)
[3](#3)
The *import-keyword* is produced
by preprocessing an import directive ([[cpp.import]](cpp.import "15.6 Header unit importation")),
the *module-keyword* is produced
by preprocessing a module directive ([[cpp.module]](cpp.module "15.5Module directive")), and
the *export-keyword* is produced
by preprocessing either of the previous two directives[.](#3.sentence-1)
[*Note [1](#note-1)*:
None has any observable spelling[.](#3.sentence-2)
— *end note*]
[4](#4)
If the input stream has been parsed into preprocessing tokens up to a
given character:
- [(4.1)](#4.1)
If the next character begins a sequence of characters that could be the prefix
and initial double quote of a raw string literal, such as R", the next preprocessing
token shall be a raw string literal[.](#4.1.sentence-1)
Between the initial and final
double quote characters of the raw string, any transformations performed in phase
2 (line splicing) are reverted; this reversion
shall apply before any [*d-char*](lex.string#nt:d-char "5.13.5String literals[lex.string]"), [*r-char*](lex.string#nt:r-char "5.13.5String literals[lex.string]"), or delimiting
parenthesis is identified[.](#4.1.sentence-2)
The raw string literal is defined as the shortest sequence
of characters that matches the raw-string pattern
[*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals[lex.ccon]")<sub>opt</sub> R [*raw-string*](lex.string#nt:raw-string "5.13.5 String literals[lex.string]")
- [(4.2)](#4.2)
Otherwise, if the next three characters are <:: and the subsequent character
is neither : nor >, the < is treated as a preprocessing token by
itself and not as the first character of the alternative token <:[.](#4.2.sentence-1)
- [(4.3)](#4.3)
Otherwise, if the next three characters are [:: and
the subsequent character is not :, or
if the next three characters are [:>,
the [ is treated as a preprocessing token by itself and
not as the first character of the preprocessing token [:[.](#4.3.sentence-1)
[*Note [2](#note-2)*:
The tokens [: and :] cannot be composed from digraphs[.](#4.3.sentence-2)
— *end note*]
- [(4.4)](#4.4)
Otherwise,
the next preprocessing token is the longest sequence of
characters that could constitute a preprocessing token, even if that
would cause further lexical analysis to fail,
except that
* [(4.4.1)](#4.4.1)
a [*string-literal*](lex.string#nt:string-literal "5.13.5String literals[lex.string]") token is never formed
when a [*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]") token can be formed, and
* [(4.4.2)](#4.4.2)
a [*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]") ([[lex.header]](lex.header "5.6Header names")) is only formed
+
[(4.4.2.1)](#4.4.2.1)
immediately after the include, embed, or import preprocessing token in a #include ([[cpp.include]](cpp.include "15.3 Source file inclusion")), #embed ([[cpp.embed]](cpp.embed "15.4 Resource inclusion")), or import ([[cpp.import]](cpp.import "15.6 Header unit importation")) directive, respectively, or
+
[(4.4.2.2)](#4.4.2.2)
immediately after a preprocessing token sequence of __has_include or __has_embed immediately followed by ( in a #if, #elif, or #embed directive ([[cpp.cond]](cpp.cond "15.2Conditional inclusion"), [[cpp.embed]](cpp.embed "15.4Resource inclusion"))[.](#4.4.sentence-1)
[5](#5)
[*Example [1](#example-1)*:
```cpp
#define R "x"
const char* s = R"y";       // ill-formed raw string, not "x" "y"
```
— *end example*]
[6](#6)
[*Example [2](#example-2)*:
The program fragment 0xe+foo is parsed as a
preprocessing number token (one that is not a valid [*integer-literal*](lex.icon#nt:integer-literal "5.13.2 Integer literals[lex.icon]") or [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals[lex.fcon]") token),
even though a parse as three preprocessing tokens 0xe, +, and foo can produce a valid expression (for example,
if foo is a macro defined as 1)[.](#6.sentence-1)
Similarly, the
program fragment 1E1 is parsed as a preprocessing number (one
that is a valid [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4Floating-point literals[lex.fcon]") token),
whether or not E is a macro name[.](#6.sentence-2)
— *end example*]
[7](#7)
[*Example [3](#example-3)*:
The program fragment x+++++y is parsed as x++ ++ + y, which, if x and y have integral types,
violates a constraint on increment operators, even though the parse x ++ + ++ y can yield a correct expression[.](#7.sentence-1)
— *end example*]