[lex.pptoken] # 5 Lexical conventions [[lex]](./#lex)

## 5.5 Preprocessing tokens [lex.pptoken]

[preprocessing-token:](#nt:preprocessing-token "5.5 Preprocessing tokens [lex.pptoken]")
- [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]")
- *import-keyword*
- *module-keyword*
- *export-keyword*
- [*identifier*](lex.name#nt:identifier "5.11 Identifiers [lex.name]")
- [*pp-number*](lex.ppnumber#nt:pp-number "5.7 Preprocessing numbers [lex.ppnumber]")
- [*character-literal*](lex.ccon#nt:character-literal "5.13.3 Character literals [lex.ccon]")
- [*user-defined-character-literal*](lex.ext#nt:user-defined-character-literal "5.13.9 User-defined literals [lex.ext]")
- [*string-literal*](lex.string#nt:string-literal "5.13.5 String literals [lex.string]")
- [*user-defined-string-literal*](lex.ext#nt:user-defined-string-literal "5.13.9 User-defined literals [lex.ext]")
- [*preprocessing-op-or-punc*](lex.operators#nt:preprocessing-op-or-punc "5.8 Operators and punctuators [lex.operators]")
- each non-whitespace character that cannot be one of the above

[1](#1) A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. In this document, glyphs are used to identify elements of the basic character set ([[lex.charset]](lex.charset "5.3.1 Character sets")). The categories of preprocessing token are: header names, placeholder tokens produced by preprocessing import and module directives (*import-keyword*, *module-keyword*, and *export-keyword*), identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-whitespace characters that do not lexically match the other preprocessing token categories. If a U+0027 apostrophe or a U+0022 quotation mark
character matches the last category, the program is ill-formed. If any character not in the basic character set matches the last category, the program is ill-formed. Preprocessing tokens can be separated by whitespace; this consists of comments ([[lex.comment]](lex.comment "5.4 Comments")), or whitespace characters (U+0020 space, U+0009 character tabulation, new-line, U+000B line tabulation, and U+000C form feed), or both. As described in [[cpp]](cpp "15 Preprocessing directives"), in certain circumstances during translation phase 4, whitespace (or the absence thereof) serves as more than preprocessing token separation. Whitespace can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.

[2](#2) Each preprocessing token that is converted to a token ([[lex.token]](lex.token "5.10 Tokens")) shall have the lexical form of a keyword, an identifier, a literal, or an operator or punctuator.

[3](#3) The *import-keyword* is produced by preprocessing an import directive ([[cpp.import]](cpp.import "15.6 Header unit importation")), the *module-keyword* is produced by preprocessing a module directive ([[cpp.module]](cpp.module "15.5 Module directive")), and the *export-keyword* is produced by preprocessing either of the previous two directives. [*Note [1](#note-1)*: None has any observable spelling. — *end note*]

[4](#4) If the input stream has been parsed into preprocessing tokens up to a given character:

- [(4.1)](#4.1) If the next character begins a sequence of
characters that could be the prefix and initial double quote of a raw string literal, such as `R"`, the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phase 2 (line splicing) are reverted; this reversion shall apply before any [*d-char*](lex.string#nt:d-char "5.13.5 String literals [lex.string]"), [*r-char*](lex.string#nt:r-char "5.13.5 String literals [lex.string]"), or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern

  [*encoding-prefix*](lex.ccon#nt:encoding-prefix "5.13.3 Character literals [lex.ccon]")opt `R` [*raw-string*](lex.string#nt:raw-string "5.13.5 String literals [lex.string]")

- [(4.2)](#4.2) Otherwise, if the next three characters are `<::` and the subsequent character is neither `:` nor `>`, the `<` is treated as a preprocessing token by itself and not as the first character of the alternative token `<:`.
- [(4.3)](#4.3) Otherwise, if the next three characters are `[::` and the subsequent character is not `:`, or if the next three characters are `[:>`, the `[` is treated as a preprocessing token by itself and not as the first character of the preprocessing token `[:`. [*Note [2](#note-2)*: The tokens `[:` and `:]` cannot be composed from digraphs. — *end note*]
- [(4.4)](#4.4) Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that
  * [(4.4.1)](#4.4.1) a [*string-literal*](lex.string#nt:string-literal "5.13.5 String literals [lex.string]") token is never formed when a [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]") token can be formed, and
  * [(4.4.2)](#4.4.2) a [*header-name*](lex.header#nt:header-name "5.6 Header names [lex.header]") ([[lex.header]](lex.header "5.6 Header names")) is only formed
    + [(4.4.2.1)](#4.4.2.1) immediately after the include, embed, or import preprocessing token in a #include ([[cpp.include]](cpp.include "15.3 Source file inclusion")), #embed ([[cpp.embed]](cpp.embed "15.4 Resource inclusion")), or import ([[cpp.import]](cpp.import "15.6 Header unit importation")) directive, respectively, or
    + [(4.4.2.2)](#4.4.2.2) immediately after a preprocessing token sequence of __has_include or __has_embed immediately followed by ( in a #if, #elif, or #embed directive ([[cpp.cond]](cpp.cond "15.2 Conditional inclusion"), [[cpp.embed]](cpp.embed "15.4 Resource inclusion")).

[5](#5) [*Example [1](#example-1)*:

```cpp
#define R "x"
const char* s = R"y";   // ill-formed raw string, not "x" "y"
```

— *end example*]

[6](#6) [*Example [2](#example-2)*: The program fragment `0xe+foo` is parsed as a preprocessing number token (one that is not a valid [*integer-literal*](lex.icon#nt:integer-literal "5.13.2 Integer literals [lex.icon]") or [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals [lex.fcon]") token), even though a parse as three preprocessing tokens `0xe`, `+`, and `foo` can produce a valid expression (for example, if foo is a macro defined as 1). Similarly, the program fragment `1E1` is parsed as a preprocessing number (one that is a valid [*floating-point-literal*](lex.fcon#nt:floating-point-literal "5.13.4 Floating-point literals [lex.fcon]") token), whether or not E is a macro name. — *end example*]

[7](#7) [*Example [3](#example-3)*: The program fragment `x+++++y` is parsed as `x++ ++ + y`,
which, if x and y have integral types, violates a constraint on increment operators, even though the parse `x ++ + ++ y` can yield a correct expression. — *end example*]