[lex.pptoken]
# 5 Lexical conventions [[lex]](./#lex)
## 5.5 Preprocessing tokens [lex.pptoken]
[preprocessing-token:](#nt:preprocessing-token "5.5Preprocessing tokens[lex.pptoken]")
[*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]")
import-keyword
module-keyword
export-keyword
[*identifier*](lex.name#nt:identifier "5.11Identifiers[lex.name]")
[*pp-number*](lex.ppnumber#nt:pp-number "5.7Preprocessing numbers[lex.ppnumber]")
[*character-literal*](lex.ccon#nt:character-literal "5.13.3Character literals[lex.ccon]")
[*user-defined-character-literal*](lex.ext#nt:user-defined-character-literal "5.13.9User-defined literals[lex.ext]")
[*string-literal*](lex.string#nt:string-literal "5.13.5String literals[lex.string]")
[*user-defined-string-literal*](lex.ext#nt:user-defined-string-literal "5.13.9User-defined literals[lex.ext]")
[*preprocessing-op-or-punc*](lex.operators#nt:preprocessing-op-or-punc "5.8Operators and punctuators[lex.operators]")
each non-whitespace character that cannot be one of the above
[1](#1)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L549)
A preprocessing token is the minimal lexical element of the language in translation
phases 3 through 6[.](#1.sentence-1)
In this document,
glyphs are used to identify
elements of the basic character set ([[lex.charset]](lex.charset "5.3.1Character sets"))[.](#1.sentence-2)
The categories of preprocessing token are: header names,
placeholder tokens produced by preprocessing import and module directives
(*import-keyword*, *module-keyword*, and *export-keyword*),
identifiers, preprocessing numbers, character literals (including user-defined character
literals), string literals (including user-defined string literals), preprocessing
operators and punctuators, and single non-whitespace characters that do not lexically
match the other preprocessing token categories[.](#1.sentence-3)
If a U+0027 apostrophe or a U+0022 quotation mark character
matches the last category, the program is ill-formed[.](#1.sentence-4)
If any character not in the basic character set matches the last category,
the program is ill-formed[.](#1.sentence-5)
Preprocessing tokens can be separated by whitespace; this consists of comments ([[lex.comment]](lex.comment "5.4Comments")), or whitespace characters
(U+0020 space, U+0009 character tabulation,
new-line, U+000b line tabulation, and U+000c form feed), or both[.](#1.sentence-6)
As described in [[cpp]](cpp "15Preprocessing directives"), in certain
circumstances during translation phase 4, whitespace (or the absence
thereof) serves as more than preprocessing token separation[.](#1.sentence-7)
Whitespace
can appear within a preprocessing token only as part of a header name or
between the quotation characters in a character literal or
string literal[.](#1.sentence-8)
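One case where whitespace does more than separate preprocessing tokens in phase 4 is a `#define` directive: whitespace between the macro name and `(` decides whether the macro is object-like or function-like. The following is a minimal sketch of that effect; the names `ADD_ONE`, `PAREN_X`, and the helper functions are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// No whitespace between the macro name and '(': a function-like macro.
#define ADD_ONE(x) ((x) + 1)

// Whitespace between the macro name and '(': an object-like macro whose
// replacement list is the three preprocessing tokens ( x ).
#define PAREN_X (x)

int use_function_like() {
    return ADD_ONE(2);   // expands to ((2) + 1)
}

int use_object_like() {
    int x = 5;
    return PAREN_X;      // expands to (x), which names the local variable
}

// Hypothetical helper: call it from main() to check both expansions.
void demo_whitespace() {
    assert(use_function_like() == 3);
    assert(use_object_like() == 5);
}
```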
[2](#2)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L583)
Each preprocessing token that is converted to a token ([[lex.token]](lex.token "5.10Tokens"))
shall have the lexical form of a keyword, an identifier, a literal,
or an operator or punctuator[.](#2.sentence-1)
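The requirement above applies only to preprocessing tokens that are actually converted to tokens. A preprocessing token that is consumed earlier, for example by the `#` stringizing operator in phase 4, does not need to have the form of a token. The sketch below illustrates this with a pp-number that is not a valid literal; `STRINGIZE` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// The # operator turns the preprocessing tokens of its argument into a
// string literal during phase 4, so they are never converted to tokens.
#define STRINGIZE(x) #x

// 1.2.3 is a single pp-number, but it is neither a valid integer-literal
// nor a valid floating-point-literal. Using it directly in an expression
// would be ill-formed; stringizing it is fine.
const char* not_a_literal() {
    return STRINGIZE(1.2.3);
}

// Hypothetical helper: call it from main() to check the spelling.
void demo_stringize() {
    assert(std::strcmp(not_a_literal(), "1.2.3") == 0);
}
```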
[3](#3)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L588)
The *import-keyword* is produced
by processing an import directive ([[cpp.import]](cpp.import "15.6Header unit importation")),
the *module-keyword* is produced
by preprocessing a module directive ([[cpp.module]](cpp.module "15.5Module directive")), and
the *export-keyword* is produced
by preprocessing either of the previous two directives[.](#3.sentence-1)
[*Note [1](#note-1)*:
None has any observable spelling[.](#3.sentence-2)
— *end note*]
[4](#4)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L599)
If the input stream has been parsed into preprocessing tokens up to a
given character:
- [(4.1)](#4.1)
If the next character begins a sequence of characters that could be the prefix
and initial double quote of a raw string literal, such as R", the next preprocessing
token shall be a raw string literal[.](#4.1.sentence-1)
Between the initial and final
double quote characters of the raw string, any transformations performed in phase
2 (line splicing) are reverted; this reversion
shall apply before any [*d-char*](lex.string#nt:d-char "5.13.5String literals[lex.string]"), [*r-char*](lex.string#nt:r-char "5.13.5String literals[lex.string]"), or delimiting
parenthesis is identified[.](#4.1.sentence-2)
The raw string literal is defined as the shortest sequence
of characters that matches the raw-string pattern
[*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3Character literals[lex.ccon]")<sub>opt</sub> R [*raw-string*](lex.string#nt:raw-string "5.13.5String literals[lex.string]")
- [(4.2)](#4.2)
Otherwise, if the next three characters are <:: and the subsequent character
is neither : nor >, the < is treated as a preprocessing token by
itself and not as the first character of the alternative token <:[.](#4.2.sentence-1)
- [(4.3)](#4.3)
Otherwise, if the next three characters are [:: and
the subsequent character is not :, or
if the next three characters are [:>,
the [ is treated as a preprocessing token by itself and
not as the first character of the preprocessing token [:[.](#4.3.sentence-1)
[*Note [2](#note-2)*:
The tokens [: and :] cannot be composed from digraphs[.](#4.3.sentence-2)
— *end note*]
- [(4.4)](#4.4)
Otherwise,
the next preprocessing token is the longest sequence of
characters that could constitute a preprocessing token, even if that
would cause further lexical analysis to fail,
except that
* [(4.4.1)](#4.4.1)
a [*string-literal*](lex.string#nt:string-literal "5.13.5String literals[lex.string]") token is never formed
when a [*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]") token can be formed, and
* [(4.4.2)](#4.4.2)
a [*header-name*](lex.header#nt:header-name "5.6Header names[lex.header]") ([[lex.header]](lex.header "5.6Header names")) is only formed
+ [(4.4.2.1)](#4.4.2.1)
immediately after the include, embed, or import preprocessing token in a #include ([[cpp.include]](cpp.include "15.3Source file inclusion")), #embed ([[cpp.embed]](cpp.embed "15.4Resource inclusion")), or import ([[cpp.import]](cpp.import "15.6Header unit importation")) directive, respectively, or
+ [(4.4.2.2)](#4.4.2.2)
immediately after a preprocessing token sequence of __has_include or __has_embed immediately followed by ( in a #if, #elif, or #embed directive ([[cpp.cond]](cpp.cond "15.2Conditional inclusion"), [[cpp.embed]](cpp.embed "15.4Resource inclusion"))[.](#4.4.sentence-1)
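Two of the rules in the list above can be observed in ordinary code: the reversion of line splicing inside a raw string literal (4.1) and the special handling of `<::` (4.2). The following sketch assumes a C++11 or later compiler; `Global`, `Box`, and the other names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

struct Global {};

template <class T>
struct Box {
    static constexpr int tag() { return 42; }
};

// Rule (4.2): the character after "<::" is 'G', which is neither ':' nor
// '>', so '<' is a preprocessing token by itself and "::Global" is a
// qualified name. "<:" is not read as the alternative token for '['.
constexpr int from_global = Box<::Global>::tag();

// Rule (4.1): between the quotes of a raw string literal the phase-2 line
// splice is reverted, so the backslash and the new-line both remain.
const char* spliced_raw = R"(a\
b)";

// Hypothetical helper: call it from main() to check both rules.
void demo_lexing_rules() {
    assert(from_global == 42);
    assert(std::strcmp(spliced_raw, "a\\\nb") == 0);
}
```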
[5](#5)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L655)
[*Example [1](#example-1)*:
#define R "x"
const char* s = R"y";           // ill-formed raw string, not "x" "y"
— *end example*]
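Because the fragment in Example 1 is ill-formed, it cannot be compiled. A well-formed variant of the same idea is sketched below: when `R"` begins a valid raw string literal, the macro `R` is not expanded, whereas separating `R` from the quote with whitespace lets the macro expand. The variable names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

#define R "x"

// R"(y)" begins with R" and is therefore lexed as one raw string literal.
// R is never seen as a separate identifier, so the macro is not expanded.
const char* raw = R"(y)";

// With whitespace after R, the identifier R is its own preprocessing token,
// the macro expands, and the adjacent literals "x" "y" are concatenated.
const char* expanded = R "y";

// Hypothetical helper: call it from main() to check both spellings.
void demo_raw_prefix() {
    assert(std::strcmp(raw, "y") == 0);
    assert(std::strcmp(expanded, "xy") == 0);
}
```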
[6](#6)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L663)
[*Example [2](#example-2)*:
The program fragment 0xe+foo is parsed as a
preprocessing number token (one that is not a valid [*integer-literal*](lex.icon#nt:integer-literal "5.13.2Integer literals[lex.icon]") or [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4Floating-point literals[lex.fcon]") token),
even though a parse as three preprocessing tokens 0xe, +, and foo can produce a valid expression (for example,
if foo is a macro defined as 1)[.](#6.sentence-1)
Similarly, the
program fragment 1E1 is parsed as a preprocessing number (one
that is a valid [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4Floating-point literals[lex.fcon]") token),
whether or not E is a macro name[.](#6.sentence-2)
— *end example*]
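A compilable counterpart to Example 2 is sketched below. It assumes `foo` is a macro defined as 1, as the example suggests, and adds a hypothetical macro `E` to show that `1E1` is unaffected by it. Whitespace is what ends the pp-number after `0xe`.

```cpp
#include <cassert>

#define foo 1
#define E 2

// Written as 0xe+foo this would be a single pp-number that is not a valid
// literal. The whitespace ends the pp-number, giving 0xe, +, and foo.
constexpr int sum = 0xe + foo;

// 1E1 is a single pp-number and a valid floating-point-literal (10.0);
// the macro E is not expanded inside it.
constexpr double ten = 1E1;

static_assert(sum == 15, "0xe + 1 is 15");
static_assert(ten == 10.0, "1E1 is 10.0");

// Remove the illustrative macros so they do not leak into later code.
#undef foo
#undef E

// Hypothetical helper: call it from main() to repeat the checks at run time.
void demo_pp_number() {
    assert(sum == 15);
    assert(ten == 10.0);
}
```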
[7](#7)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lex.tex#L676)
[*Example [3](#example-3)*:
The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y have integral types,
violates a constraint on increment operators, even though the parse x ++ + ++ y can yield a correct expression[.](#7.sentence-1)
— *end example*]
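The longest-sequence rule from Example 3 can be seen in well-formed code as well. In the sketch below, `x+++y` is lexed as `x ++ + y`, and explicit whitespace is used to obtain the `x ++ + ++ y` parse that the unspaced `x+++++y` cannot produce. The function names are hypothetical.

```cpp
#include <cassert>

// x+++y is lexed greedily as x ++ + y: x is post-incremented, y is not.
int munched_sum(int& x, int y) {
    return x+++y;
}

// Whitespace forces the tokens x ++ + ++ y.
int spaced_sum(int& x, int& y) {
    return x++ + ++y;
}

// Hypothetical helper: call it from main() to check both parses.
void demo_maximal_munch() {
    int a = 1;
    assert(munched_sum(a, 2) == 3);   // 1 + 2, then a becomes 2
    assert(a == 2);

    int x = 1, y = 2;
    assert(spaced_sum(x, y) == 4);    // 1 + 3
    assert(x == 2 && y == 3);
}
```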