# 5 Lexical conventions [[lex]](./#lex)

## 5.5 Preprocessing tokens [lex.pptoken]

[*preprocessing-token*:](#nt:preprocessing-token "5.5 Preprocessing tokens [lex.pptoken]")

- [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]")
- *import-keyword*
- *module-keyword*
- *export-keyword*
- [*identifier*](lex.name#nt:identifier "5.11 Identifiers [lex.name]")
- [*pp-number*](lex.ppnumber#nt:pp-number "5.7 Preprocessing numbers [lex.ppnumber]")
- [*character-literal*](lex.ccon#nt:character-literal "5.13.3 Character literals [lex.ccon]")
- [*user-defined-character-literal*](lex.ext#nt:user-defined-character-literal "5.13.9 User-defined literals [lex.ext]")
- [*string-literal*](lex.string#nt:string-literal "5.13.5 String literals [lex.string]")
- [*user-defined-string-literal*](lex.ext#nt:user-defined-string-literal "5.13.9 User-defined literals [lex.ext]")
- [*preprocessing-op-or-punc*](lex.operators#nt:preprocessing-op-or-punc "5.8 Operators and punctuators [lex.operators]")
- each non-whitespace character that cannot be one of the above

[1](#1)

A preprocessing token is the minimal lexical element of the language in translation
phases 3 through 6[.](#1.sentence-1)

In this document,
glyphs are used to identify
elements of the basic character set ([[lex.charset]](lex.charset "5.3.1 Character sets"))[.](#1.sentence-2)

The categories of preprocessing token are: header names,
placeholder tokens produced by preprocessing import and module directives
(*import-keyword*, *module-keyword*, and *export-keyword*),
identifiers, preprocessing numbers, character literals (including user-defined character
literals), string literals (including user-defined string literals), preprocessing
operators and punctuators, and single non-whitespace characters that do not lexically
match the other preprocessing token categories[.](#1.sentence-3)

If a U+0027 apostrophe or a U+0022 quotation mark character
matches the last category, the program is ill-formed[.](#1.sentence-4)

If any character not in the basic character set matches the last category,
the program is ill-formed[.](#1.sentence-5)

Preprocessing tokens can be separated by whitespace; this consists of comments ([[lex.comment]](lex.comment "5.4 Comments")), or whitespace characters
(U+0020 space, U+0009 character tabulation,
new-line, U+000b line tabulation, and U+000c form feed), or both[.](#1.sentence-6)

As described in [[cpp]](cpp "15 Preprocessing directives"), in certain
circumstances during translation phase 4, whitespace (or the absence
thereof) serves as more than preprocessing token separation[.](#1.sentence-7)

Whitespace
can appear within a preprocessing token only as part of a header name or
between the quotation characters in a character literal or
string literal[.](#1.sentence-8)
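One way to observe that a comment separates preprocessing tokens exactly as whitespace characters do is to stringize a macro argument. The sketch below is illustrative only: `STR` and the three array names are hypothetical helpers, not anything defined by this subclause. It relies on the stringizing operator `#` described in [cpp], which writes a single space wherever whitespace separated the argument's tokens.

```cpp
// STR is a hypothetical helper macro: `#x` turns the argument's
// preprocessing tokens into a string literal.
#define STR(x) #x

// The comment separates `a` and `b` just like a space would, and
// stringizing collapses any whitespace between tokens to one space.
constexpr char with_comment[] = STR(a/**/b);
constexpr char with_spaces[]  = STR(a      b);

// With nothing between them, `ab` is a single identifier token.
constexpr char no_separator[] = STR(ab);

static_assert(sizeof(with_comment) == 4, "a, space, b, terminator");
static_assert(with_comment[1] == ' ', "the comment acted as whitespace");
static_assert(sizeof(with_spaces) == 4, "whitespace run became one space");
static_assert(sizeof(no_separator) == 3, "a, b, terminator");
```

The checks run at compile time, so a translation unit containing this fragment compiles only if the comment was treated as token-separating whitespace.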

[2](#2)

Each preprocessing token that is converted to a token ([[lex.token]](lex.token "5.10 Tokens"))
shall have the lexical form of a keyword, an identifier, a literal,
or an operator or punctuator[.](#2.sentence-1)

[3](#3)

The *import-keyword* is produced
by processing an import directive ([[cpp.import]](cpp.import "15.6 Header unit importation")),
the *module-keyword* is produced
by preprocessing a module directive ([[cpp.module]](cpp.module "15.5 Module directive")), and
the *export-keyword* is produced
by preprocessing either of the previous two directives[.](#3.sentence-1)

[*Note [1](#note-1)*:

None has any observable spelling[.](#3.sentence-2)

*end note*]

[4](#4)

If the input stream has been parsed into preprocessing tokens up to a
given character:

- [(4.1)](#4.1)

  If the next character begins a sequence of characters that could be the prefix
  and initial double quote of a raw string literal, such as `R"`, the next preprocessing
  token shall be a raw string literal[.](#4.1.sentence-1)
  Between the initial and final
  double quote characters of the raw string, any transformations performed in phase
  2 (line splicing) are reverted; this reversion
  shall apply before any [*d-char*](lex.string#nt:d-char "5.13.5 String literals [lex.string]"), [*r-char*](lex.string#nt:r-char "5.13.5 String literals [lex.string]"), or delimiting
  parenthesis is identified[.](#4.1.sentence-2)
  The raw string literal is defined as the shortest sequence
  of characters that matches the raw-string pattern

  [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")<sub>opt</sub> `R` [*raw-string*](lex.string#nt:raw-string "5.13.5 String literals [lex.string]")

- [(4.2)](#4.2)

  Otherwise, if the next three characters are `<::` and the subsequent character
  is neither `:` nor `>`, the `<` is treated as a preprocessing token by
  itself and not as the first character of the alternative token `<:`[.](#4.2.sentence-1)

- [(4.3)](#4.3)

  Otherwise, if the next three characters are `[::` and
  the subsequent character is not `:`, or
  if the next three characters are `[:>`,
  the `[` is treated as a preprocessing token by itself and
  not as the first character of the preprocessing token `[:`[.](#4.3.sentence-1)
  [*Note [2](#note-2)*:
  The tokens `[:` and `:]` cannot be composed from digraphs[.](#4.3.sentence-2)
  *end note*]

- [(4.4)](#4.4)

  Otherwise,
the next preprocessing token is the longest sequence of
characters that could constitute a preprocessing token, even if that
would cause further lexical analysis to fail,
except that
  * [(4.4.1)](#4.4.1)

    a [*string-literal*](lex.string#nt:string-literal "5.13.5 String literals [lex.string]") token is never formed
    when a [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]") token can be formed, and

  * [(4.4.2)](#4.4.2)

    a [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]") ([[lex.header]](lex.header "5.6 Header names")) is only formed

    + [(4.4.2.1)](#4.4.2.1)

      immediately after the `include`, `embed`, or `import` preprocessing token in a `#include` ([[cpp.include]](cpp.include "15.3 Source file inclusion")), `#embed` ([[cpp.embed]](cpp.embed "15.4 Resource inclusion")), or `import` ([[cpp.import]](cpp.import "15.6 Header unit importation")) directive, respectively, or

    + [(4.4.2.2)](#4.4.2.2)

      immediately after a preprocessing token sequence of `__has_include` or `__has_embed` immediately followed by `(` in a `#if`, `#elif`, or `#embed` directive ([[cpp.cond]](cpp.cond "15.2 Conditional inclusion"), [[cpp.embed]](cpp.embed "15.4 Resource inclusion"))[.](#4.4.sentence-1)
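Two of the rules in paragraph 4 have effects that can be observed directly in a program. The fragment below is a minimal sketch, not normative text. The names `raw`, `spliced`, and `names` are chosen here purely for illustration. It checks that a line splice inside a raw string literal is reverted (4.1), and that `<::` followed by a character other than `:` or `>` does not start the digraph `<:` (4.2).

```cpp
#include <string>
#include <type_traits>
#include <vector>

// (4.1) Inside a raw string literal the phase-2 line splice is reverted,
// so the backslash and the new-line both remain part of the string.
constexpr char raw[] = R"(a\
b)";

// For contrast, in an ordinary string literal the splice removes the
// backslash and the new-line, leaving just "ab".
constexpr char spliced[] = "a\
b";

// (4.2) Here `<::` is followed by `s`, so the `<` is a token by itself
// and the declaration reads as vector< ::std::string >, not as the
// digraph `<:` (an alternative spelling of `[`).
std::vector<::std::string> names;

static_assert(sizeof(raw) == 5, "a, backslash, new-line, b, terminator");
static_assert(sizeof(spliced) == 3, "a, b, terminator");
static_assert(std::is_same<decltype(names), std::vector<std::string>>::value,
              "`<::` was not lexed as the digraph `<:`");
```

If rule (4.2) did not exist, maximal munch would form `<:` first and the declaration of `names` would fail to parse.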

[5](#5)

[*Example [1](#example-1)*:

    #define R "x"
    const char* s = R"y";           // ill-formed raw string, not "x" "y"

*end example*]

[6](#6)

[*Example [2](#example-2)*:

The program fragment `0xe+foo` is parsed as a
preprocessing number token (one that is not a valid [*integer-literal*](lex.icon#nt:integer-literal "5.13.2 Integer literals [lex.icon]") or [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals [lex.fcon]") token),
even though a parse as three preprocessing tokens `0xe`, `+`, and `foo` can produce a valid expression (for example,
if `foo` is a macro defined as `1`)[.](#6.sentence-1)

Similarly, the
program fragment `1E1` is parsed as a preprocessing number (one
that is a valid [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals [lex.fcon]") token),
whether or not `E` is a macro name[.](#6.sentence-2)

*end example*]
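The behavior in this example follows from the shape of the *pp-number* grammar in [lex.ppnumber], where `e`, `E`, `p`, or `P` followed by a sign continues the number. The scanner below is a rough sketch of that grammar under simplifying assumptions: it handles ASCII input only and ignores universal-character-names. The function name `pp_number_length` and its helpers are hypothetical.

```cpp
#include <cstddef>

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}
constexpr bool is_sign(char c) { return c == '+' || c == '-'; }
constexpr bool is_exponent(char c) {
    return c == 'e' || c == 'E' || c == 'p' || c == 'P';
}

// Length of the longest pp-number at the start of the null-terminated
// string s, or 0 if s does not begin with one.
constexpr std::size_t pp_number_length(const char* s) {
    std::size_t i = 0;
    if (is_digit(s[0])) {
        i = 1;                                   // digit
    } else if (s[0] == '.' && is_digit(s[1])) {
        i = 2;                                   // . digit
    } else {
        return 0;
    }
    for (;;) {
        if (is_exponent(s[i]) && is_sign(s[i + 1])) {
            i += 2;                              // pp-number e sign, etc.
        } else if (is_digit(s[i]) || is_nondigit(s[i]) || s[i] == '.') {
            i += 1;                              // identifier-continue or .
        } else if (s[i] == '\'' &&
                   (is_digit(s[i + 1]) || is_nondigit(s[i + 1]))) {
            i += 2;                              // ' digit or ' nondigit
        } else {
            return i;
        }
    }
}

static_assert(pp_number_length("0xe+foo") == 7, "one pp-number");
static_assert(pp_number_length("0xf+foo") == 3, "stops before the +");
static_assert(pp_number_length("1E1") == 3, "E is part of the number");
```

Changing the hexadecimal digit from `e` to `f` is enough to split the fragment into three tokens, because `f+` is not an exponent-and-sign pair.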

[7](#7)

[*Example [3](#example-3)*:

The program fragment `x+++++y` is parsed as `x ++ ++ + y`, which, if `x` and `y` have integral types,
violates a constraint on increment operators, even though the parse `x ++ + ++ y` can yield a correct expression[.](#7.sentence-1)

*end example*]
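The longest-match rule of (4.4) that produces this parse can be sketched as a small greedy tokenizer. This is a toy under stated assumptions, not a real lexer: the operator table `kOperators` holds only three entries instead of the full set in [lex.operators], letters and digits are lumped together as identifier characters, and `token_length` and `tokenize` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, deliberately tiny operator table.
constexpr const char* kOperators[] = {"++", "+=", "+"};

constexpr bool is_ident_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

constexpr bool starts_with(const char* s, const char* prefix) {
    while (*prefix != '\0') {
        if (*s++ != *prefix++) return false;
    }
    return true;
}

// Length of the longest token at the start of s (0 at end of input).
constexpr std::size_t token_length(const char* s) {
    if (*s == '\0') return 0;
    if (is_ident_char(*s)) {
        std::size_t n = 0;
        while (is_ident_char(s[n])) ++n;
        return n;
    }
    std::size_t best = 1;  // any other character is a token by itself
    for (const char* op : kOperators) {
        std::size_t len = 0;
        while (op[len] != '\0') ++len;
        if (len > best && starts_with(s, op)) best = len;
    }
    return best;
}

// Splits s into tokens, always taking the longest match and never
// backtracking, even if a shorter match would parse better later.
std::vector<std::string> tokenize(const char* s) {
    std::vector<std::string> tokens;
    while (*s != '\0') {
        if (*s == ' ') { ++s; continue; }
        const std::size_t n = token_length(s);
        tokens.emplace_back(s, n);
        s += n;
    }
    return tokens;
}

static_assert(token_length("+++++y") == 2, "first ++");
static_assert(token_length("+++y") == 2, "second ++");
static_assert(token_length("+y") == 1, "lone +");
```

Under these assumptions, `tokenize("x+++++y")` yields the five tokens `x`, `++`, `++`, `+`, `y`. The greedy scan never considers the alternative split `x ++ + ++ y`.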