# 5 Lexical conventions [[lex]](./#lex)
## 5.5 Preprocessing tokens [lex.pptoken]
[*preprocessing-token*:](#nt:preprocessing-token "5.5 Preprocessing tokens [lex.pptoken]")

- [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]")
- *import-keyword*
- *module-keyword*
- *export-keyword*
- [*identifier*](lex.name#nt:identifier "5.11 Identifiers [lex.name]")
- [*pp-number*](lex.ppnumber#nt:pp-number "5.7 Preprocessing numbers [lex.ppnumber]")
- [*character-literal*](lex.ccon#nt:character-literal "5.13.3 Character literals [lex.ccon]")
- [*user-defined-character-literal*](lex.ext#nt:user-defined-character-literal "5.13.9 User-defined literals [lex.ext]")
- [*string-literal*](lex.string#nt:string-literal "5.13.5 String literals [lex.string]")
- [*user-defined-string-literal*](lex.ext#nt:user-defined-string-literal "5.13.9 User-defined literals [lex.ext]")
- [*preprocessing-op-or-punc*](lex.operators#nt:preprocessing-op-or-punc "5.8 Operators and punctuators [lex.operators]")
- each non-whitespace character that cannot be one of the above

[1](#1) A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.
In this document, glyphs are used to identify elements of the basic character set ([[lex.charset]](lex.charset "5.3.1 Character sets")).
The categories of preprocessing token are: header names, placeholder tokens produced by preprocessing import and module directives (*import-keyword*, *module-keyword*, and *export-keyword*), identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-whitespace characters that do not lexically match the other preprocessing token categories.
If a U+0027 apostrophe or a U+0022 quotation mark character matches the last category, the program is ill-formed.
If any character not in the basic character set matches the last category, the program is ill-formed.
Preprocessing tokens can be separated by whitespace; this consists of comments ([[lex.comment]](lex.comment "5.4 Comments")), or whitespace characters (U+0020 space, U+0009 character tabulation, new-line, U+000B line tabulation, and U+000C form feed), or both.
As described in [[cpp]](cpp "15 Preprocessing directives"), in certain circumstances during translation phase 4, whitespace (or the absence thereof) serves as more than preprocessing token separation.
Whitespace can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.

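
As a rough illustration (not part of the standard text), the sketch below shows whitespace that is *inside* a single preprocessing token versus whitespace and comments that merely separate tokens; the variable names are invented for the example.

```cpp
// Illustrative sketch only; identifiers are invented for the example.
const char* s = "two words";   // the space is part of one string-literal pp-token

// A comment is replaced by one space in translation phase 3, so it separates
// pp-tokens exactly like whitespace: this declares an int named n.
int/* acts as whitespace */n = 1;

int main() { return (n == 1 && s[3] == ' ') ? 0 : 1; }
```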
[2](#2) Each preprocessing token that is converted to a token ([[lex.token]](lex.token "5.10 Tokens")) shall have the lexical form of a keyword, an identifier, a literal, or an operator or punctuator.

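
For instance (an illustrative sketch, not standard text): `3.14.15` is a single preprocessing number, yet it does not have the lexical form of any literal, so converting it to a token would violate this requirement. It remains usable where it never reaches that conversion, e.g. when stringized during phase 4; the macro name below is invented for the example.

```cpp
#include <cstring>

// Hypothetical macro for the illustration: # stringizes its argument in phase 4,
// so the pp-number 3.14.15 is never converted to a token.
#define STR(x) #x
const char* v = STR(3.14.15);   // OK: v is "3.14.15"

// int bad = 3.14.15;           // ill-formed: this pp-number is not a valid literal

int main() { return std::strcmp(v, "3.14.15") == 0 ? 0 : 1; }
```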
[3](#3) The *import-keyword* is produced by processing an import directive ([[cpp.import]](cpp.import "15.6 Header unit importation")), the *module-keyword* is produced by preprocessing a module directive ([[cpp.module]](cpp.module "15.5 Module directive")), and the *export-keyword* is produced by preprocessing either of the previous two directives.
[*Note [1](#note-1)*: None has any observable spelling. — *end note*]

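
As a non-normative sketch (assuming an implementation with C++20 header-unit support; which headers are importable is implementation-defined), the directive below is where this replacement happens: after phase 4 the `import` has become the *import-keyword* placeholder token, which has no spelling of its own.

```cpp
// Sketch only: requires a compiler and standard library providing header units.
import <vector>;   // after preprocessing, `import` is the import-keyword placeholder

int main() {
    std::vector<int> v{1, 2, 3};
    return v.size() == 3 ? 0 : 1;
}
```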
[4](#4) If the input stream has been parsed into preprocessing tokens up to a given character:

- [(4.1)](#4.1) If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as `R"`, the next preprocessing token shall be a raw string literal.
  Between the initial and final double quote characters of the raw string, any transformations performed in phase 2 (line splicing) are reverted; this reversion shall apply before any [*d-char*](lex.string#nt:d-char "5.13.5 String literals [lex.string]"), [*r-char*](lex.string#nt:r-char "5.13.5 String literals [lex.string]"), or delimiting parenthesis is identified.
  The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern
  [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")<sub>opt</sub> `R` [*raw-string*](lex.string#nt:raw-string "5.13.5 String literals [lex.string]")

- [(4.2)](#4.2) Otherwise, if the next three characters are `<::` and the subsequent character is neither `:` nor `>`, the `<` is treated as a preprocessing token by itself and not as the first character of the alternative token `<:`.

- [(4.3)](#4.3) Otherwise, if the next three characters are `[::` and the subsequent character is not `:`, or if the next three characters are `[:>`, the `[` is treated as a preprocessing token by itself and not as the first character of the preprocessing token `[:`.
  [*Note [2](#note-2)*: The tokens `[:` and `:]` cannot be composed from digraphs. — *end note*]

- [(4.4)](#4.4) Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail (illustrated in the sketch after this list), except that
  * [(4.4.1)](#4.4.1) a [*string-literal*](lex.string#nt:string-literal "5.13.5 String literals [lex.string]") token is never formed when a [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]") token can be formed, and
  * [(4.4.2)](#4.4.2) a [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]") ([[lex.header]](lex.header "5.6 Header names")) is only formed
    + [(4.4.2.1)](#4.4.2.1) immediately after the `include`, `embed`, or `import` preprocessing token in a `#include` ([[cpp.include]](cpp.include "15.3 Source file inclusion")), `#embed` ([[cpp.embed]](cpp.embed "15.4 Resource inclusion")), or `import` ([[cpp.import]](cpp.import "15.6 Header unit importation")) directive, respectively, or
    + [(4.4.2.2)](#4.4.2.2) immediately after a preprocessing token sequence of `__has_include` or `__has_embed` immediately followed by `(` in a `#if`, `#elif`, or `#embed` directive ([[cpp.cond]](cpp.cond "15.2 Conditional inclusion"), [[cpp.embed]](cpp.embed "15.4 Resource inclusion")).

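
The following non-normative sketch illustrates (4.2) and (4.4): the `<` before `::` is a token by itself rather than the start of the digraph `<:`, while maximal munch makes `x+++++y` ill-formed (see also the examples below). The variable names are invented for the illustration.

```cpp
#include <string>
#include <vector>

// (4.2): `<` followed by `::` (with the next character neither `:` nor `>`)
// is a lone pp-token, so this parses as `<` `::`, not as the digraph `<:`.
std::vector<::std::string> names;

int x = 1, y = 2;
// (4.4) longest-match rule: the next line would tokenize as x ++ ++ + y,
// which is ill-formed, even though x ++ + ++ y would be a valid expression.
// int z = x+++++y;

int main() { return (names.empty() && x + y == 3) ? 0 : 1; }
```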
[5](#5) [*Example [1](#example-1)*:

    #define R "x"
    const char* s = R"y";       // ill-formed raw string, not "x" "y"

— *end example*]

[6](#6) [*Example [2](#example-2)*:
The program fragment `0xe+foo` is parsed as a preprocessing number token (one that is not a valid [*integer-literal*](lex.icon#nt:integer-literal "5.13.2 Integer literals [lex.icon]") or [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals [lex.fcon]") token), even though a parse as three preprocessing tokens `0xe`, `+`, and `foo` can produce a valid expression (for example, if `foo` is a macro defined as `1`).
Similarly, the program fragment `1E1` is parsed as a preprocessing number (one that is a valid [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals [lex.fcon]") token), whether or not `E` is a macro name.
— *end example*]

[7](#7) [*Example [3](#example-3)*:
The program fragment `x+++++y` is parsed as `x ++ ++ + y`, which, if `x` and `y` have integral types, violates a constraint on increment operators, even though the parse `x ++ + ++ y` can yield a correct expression.
— *end example*]