# 28 Text processing library [[text]](./#text)

## 28.6 Regular expressions library [[re]](re#grammar)

### 28.6.12 Modified ECMAScript regular expression grammar [re.grammar]

[1](#1)

The regular expression grammar recognized by basic_regex objects constructed with the ECMAScript
flag is that specified by ECMA-262, except as specified below[.](#1.sentence-1)

[2](#2)

Objects of type specialization of basic_regex store within themselves a
default-constructed instance of their traits template parameter, henceforth
referred to as traits_inst[.](#2.sentence-1)

This traits_inst object is used to support localization
of the regular expression; basic_regex member functions shall not call
any locale dependent C or C++ API, including the formatted string input functions[.](#2.sentence-2)

Instead they shall call the appropriate traits member function to achieve the required effect[.](#2.sentence-3)
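
As a rough illustration of the stored traits object, the sketch below assumes the default traits type std::regex_traits<char> and uses basic_regex::imbue and basic_regex::getloc, which forward to that stored instance. The helper name locale_round_trips is made up for this example.

```cpp
#include <locale>
#include <regex>
#include <type_traits>

// std::regex is basic_regex<char>; its default traits type is regex_traits<char>.
static_assert(std::is_same<std::regex::traits_type, std::regex_traits<char>>::value,
              "default traits for std::regex");

// Hypothetical helper: imbue a locale into a regex and read it back.
// Both calls are answered by the traits object stored inside the regex.
bool locale_round_trips(const std::locale& loc) {
    std::regex re;              // default-constructed, holds its own traits instance
    re.imbue(loc);              // forwards to traits_inst.imbue(loc)
    re.assign("[[:alpha:]]+");  // after imbue the regex matches nothing until reassigned
    return re.getloc() == loc;  // forwards to traits_inst.getloc()
}
```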

[3](#3)

The following productions within the ECMAScript grammar are modified as follows:

    ClassAtom ::
        -
        ClassAtomNoDash
        ClassAtomExClass
        ClassAtomCollatingElement
        ClassAtomEquivalence

    IdentityEscape ::
        SourceCharacter but not c

[4](#4)

The following new productions are then added:

    ClassAtomExClass ::
        [: ClassName :]

    ClassAtomCollatingElement ::
        [. ClassName .]

    ClassAtomEquivalence ::
        [= ClassName =]

    ClassName ::
        ClassNameCharacter
        ClassNameCharacter ClassName

    ClassNameCharacter ::
        SourceCharacter but not one of . or = or :

[5](#5)

The productions *ClassAtomExClass*, *ClassAtomCollatingElement* and *ClassAtomEquivalence* provide functionality
equivalent to that of the same features in regular expressions in POSIX[.](#5.sentence-1)
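
A minimal sketch of the POSIX-style class syntax inside ECMAScript bracket expressions, assuming std::regex with its default traits and the default "C" locale. The function names all_digits and is_identifier_like are illustrative only.

```cpp
#include <regex>
#include <string>

// True if the whole of `text` is one or more decimal digits, written with the
// POSIX-style class name inside an ECMAScript bracket expression.
bool all_digits(const std::string& text) {
    static const std::regex re("[[:digit:]]+");  // ECMAScript is the default grammar
    return std::regex_match(text, re);
}

// A class name can be mixed with ordinary class atoms in the same brackets:
// a letter or underscore, followed by letters, digits, or underscores.
bool is_identifier_like(const std::string& text) {
    static const std::regex re("[[:alpha:]_][[:alnum:]_]*");
    return std::regex_match(text, re);
}
```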

[6](#6)

The regular expression grammar may be modified by
any regex_constants::syntax_option_type flags specified when
constructing an object of type specialization of basic_regex according to the rules in Table [118](re.synopt#tab:re.synopt "Table 118: syntax_option_type effects")[.](#6.sentence-1)
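
To make the effect of the flags concrete, here is a small sketch using two of them, icase and nosubs, with std::regex. The helper names are invented for the example, and the behavior shown is what the flag descriptions lead one to expect rather than an exhaustive treatment.

```cpp
#include <regex>
#include <string>

// icase: matching ignores case.
bool matches_ignoring_case(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern,
                        std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_match(text, re);
}

// nosubs: parenthesized groups are not recorded as marked sub-expressions.
unsigned marked_groups(const std::string& pattern, bool nosubs) {
    std::regex_constants::syntax_option_type flags = std::regex_constants::ECMAScript;
    if (nosubs) flags |= std::regex_constants::nosubs;
    return std::regex(pattern, flags).mark_count();
}
```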

[7](#7)

A *ClassName* production, when used in *ClassAtomExClass*,
is not valid if traits_inst.lookup_classname returns zero for
that name[.](#7.sentence-1)

The names recognized as valid *ClassName*s are
determined by the type of the traits class, but at least the following
names shall be recognized: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit, d, s, w[.](#7.sentence-2)

In addition the following expressions shall be equivalent:

- \d and [[:digit:]]
- \D and [^[:digit:]]
- \s and [[:space:]]
- \S and [^[:space:]]
- \w and [_[:alnum:]]
- \W and [^_[:alnum:]]
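
The name lookup and the shorthand equivalences can be probed as follows. This is a sketch assuming std::regex_traits<char> as the traits class; is_known_class and same_verdict are hypothetical helpers, and "zero" is expressed as a value-initialized char_class_type so the comparison does not depend on how an implementation represents the mask.

```cpp
#include <regex>
#include <string>

using Traits = std::regex_traits<char>;

// True if the default traits recognize `name` as a character class name,
// i.e. lookup_classname does not return the zero mask.
bool is_known_class(const std::string& name) {
    Traits traits;
    const Traits::char_class_type mask =
        traits.lookup_classname(name.begin(), name.end());
    return mask != Traits::char_class_type();
}

// True if a shorthand escape and its bracket form give the same answer
// when matched against the single character `c`.
bool same_verdict(char c, const std::string& shorthand, const std::string& bracket) {
    const std::string text(1, c);
    return std::regex_match(text, std::regex(shorthand)) ==
           std::regex_match(text, std::regex(bracket));
}
```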

[8](#8)

A *ClassName* production when used in
a *ClassAtomCollatingElement* production is not valid
if the value returned by traits_inst.lookup_collatename for
that name is an empty string[.](#8.sentence-1)

[9](#9)

The results from multiple calls
to traits_inst.lookup_classname can be combined with bitwise OR
and subsequently passed to traits_inst.isctype[.](#9.sentence-1)
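
A short sketch of combining two class masks before a single isctype call, assuming std::regex_traits<char> in the default "C" locale. The helper in_either_class is a name chosen for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `c` belongs to at least one of two named classes.
bool in_either_class(char c, const std::string& first, const std::string& second) {
    std::regex_traits<char> traits;
    const auto a = traits.lookup_classname(first.begin(), first.end());
    const auto b = traits.lookup_classname(second.begin(), second.end());
    return traits.isctype(c, a | b);  // one isctype call tests the combined mask
}
```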

[10](#10)

A *ClassName* production when used in
a *ClassAtomEquivalence* production is not valid if the value
returned by traits_inst.lookup_collatename for that name is an
empty string or if the value returned by traits_inst.transform_primary for the result of the call to traits_inst.lookup_collatename is an empty string[.](#10.sentence-1)
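
The two-step validity check for an equivalence class name can be sketched like this, assuming std::regex_traits<char>. The first step is also the test that applies to collating element names. The function name usable_in_equivalence_class is hypothetical, and which names the traits class knows is implementation-dependent.

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the validity rule for [= name =]:
// both the collating-element lookup and its primary sort key must be non-empty.
bool usable_in_equivalence_class(const std::string& name) {
    std::regex_traits<char> traits;
    const std::string element =
        traits.lookup_collatename(name.begin(), name.end());
    if (element.empty()) return false;  // the same condition invalidates [. name .]
    const std::string key =
        traits.transform_primary(element.begin(), element.end());
    return !key.empty();
}
```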

[11](#11)

When the sequence of characters being transformed to a finite state
machine contains an invalid class name the translator shall throw an
exception object of type regex_error[.](#11.sentence-1)
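
For instance, constructing a std::regex from a pattern with an unknown class name is expected to throw. The sketch below wraps that in a hypothetical helper, fails_to_compile, and deliberately does not inspect the error code.

```cpp
#include <regex>
#include <string>

// True if compiling `pattern` with the default ECMAScript grammar throws regex_error.
bool fails_to_compile(const std::string& pattern) {
    try {
        const std::regex re(pattern);
        (void)re;  // only construction matters here
        return false;
    } catch (const std::regex_error&) {
        return true;
    }
}
```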

[12](#12)

If the *CV* of a *UnicodeEscapeSequence* is greater than the largest
value that can be held in an object of type charT the translator shall
throw an exception object of type regex_error[.](#12.sentence-1)

[*Note [1](#note-1)*:

This means that values of the form "\uxxxx" that do not fit in
a character are invalid[.](#12.sentence-2)

*end note*]

[13](#13)

Where the regular expression grammar requires the conversion of a sequence of characters
to an integral value, this is accomplished by calling traits_inst.value[.](#13.sentence-1)
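
The per-character building block of that conversion is regex_traits::value, which maps one character to its digit value in radix 8, 10, or 16. A minimal sketch, assuming std::regex_traits<char>; digit_value is an illustrative wrapper.

```cpp
#include <regex>

// Digit value of `c` in the given radix (8, 10, or 16),
// or -1 if `c` is not a valid digit in that radix.
int digit_value(char c, int radix) {
    std::regex_traits<char> traits;
    return traits.value(c, radix);
}
```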

[14](#14)

The behavior of the internal finite state machine representation when used to match a
sequence of characters is as described in ECMA-262[.](#14.sentence-1)

The behavior is modified according
to any match_flag_type flags ([[re.matchflag]](re.matchflag "28.6.4.3 Bitmask type match_flag_type")) specified when using the regular expression
object in one of the regular expression algorithms ([[re.alg]](re.alg "28.6.10 Regular expression algorithms"))[.](#14.sentence-2)

The behavior is also
localized by interaction with the traits class template parameter as follows:

- [(14.1)](#14.1)

  During matching of a regular expression finite state machine
  against a sequence of characters, two characters c and d are compared using the following rules:

  * [(14.1.1)](#14.1.1)

    if (flags() & regex_constants::icase) the two characters are equal
    if traits_inst.translate_nocase(c) == traits_inst.translate_nocase(d);

  * [(14.1.2)](#14.1.2)

    otherwise, if flags() & regex_constants::collate the
    two characters are equal if traits_inst.translate(c) == traits_inst.translate(d);

  * [(14.1.3)](#14.1.3)

    otherwise, the two characters are equal if c == d[.](#14.1.sentence-1)

- [(14.2)](#14.2)

  During matching of a regular expression finite state machine
  against a sequence of characters, comparison of a collating element
  range c1-c2 against a character c is
  conducted as follows: if flags() & regex_constants::collate is false then the character c is matched if c1 <= c && c <= c2, otherwise c is matched in
  accordance with the following algorithm:

      string_type str1 = string_type(1,
        flags() & icase ? traits_inst.translate_nocase(c1) : traits_inst.translate(c1));
      string_type str2 = string_type(1,
        flags() & icase ? traits_inst.translate_nocase(c2) : traits_inst.translate(c2));
      string_type str = string_type(1,
        flags() & icase ? traits_inst.translate_nocase(c) : traits_inst.translate(c));
      return traits_inst.transform(str1.begin(), str1.end())
            <= traits_inst.transform(str.begin(), str.end())
        && traits_inst.transform(str.begin(), str.end())
            <= traits_inst.transform(str2.begin(), str2.end());

- [(14.3)](#14.3)

  During matching of a regular expression finite state machine against a sequence of
characters, testing whether a collating element is a member of a primary equivalence
class is conducted by first converting the collating element and the equivalence
class to sort keys using traits::transform_primary, and then comparing the sort
keys for equality[.](#14.3.sentence-1)

- [(14.4)](#14.4)

  During matching of a regular expression finite state machine against a sequence
of characters, a character c is a member of a character class designated by an
iterator range [first, last) if traits_inst.isctype(c, traits_inst.lookup_classname(first, last, flags() & icase)) is true[.](#14.4.sentence-1)
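
The character comparison and range rules above can be written out as free functions. This is only a sketch of the stated rules, not how any particular library implements matching: it assumes std::regex_traits<char> in the default "C" locale, and the names has_flag, chars_equal, and in_range are invented for the example.

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;
using Traits = std::regex_traits<char>;

// Hypothetical helper: true if every bit of `bit` is set in `flags`.
bool has_flag(rc::syntax_option_type flags, rc::syntax_option_type bit) {
    return (flags & bit) == bit;
}

// Equality of two characters under the regex flags (icase, then collate, then ==).
bool chars_equal(const Traits& t, rc::syntax_option_type flags, char c, char d) {
    if (has_flag(flags, rc::icase))
        return t.translate_nocase(c) == t.translate_nocase(d);
    if (has_flag(flags, rc::collate))
        return t.translate(c) == t.translate(d);
    return c == d;
}

// Does `c` fall inside the range c1-c2?
bool in_range(const Traits& t, rc::syntax_option_type flags,
              char c1, char c2, char c) {
    if (!has_flag(flags, rc::collate))
        return c1 <= c && c <= c2;  // plain code-unit comparison
    // With collate set, compare sort keys of the translated characters.
    auto key = [&](char ch) {
        const std::string s(1, has_flag(flags, rc::icase) ? t.translate_nocase(ch)
                                                          : t.translate(ch));
        return t.transform(s.begin(), s.end());
    };
    const std::string k = key(c);
    return key(c1) <= k && k <= key(c2);
}
```

Calling in_range(Traits(), rc::ECMAScript | rc::collate | rc::icase, 'a', 'z', 'M') takes the sort-key path, where 'M' is first folded by translate_nocase.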

See also: ECMA-262 15.10