# 28 Text processing library [[text]](./#text)

## 28.6 Regular expressions library [[re]](re#grammar)

### 28.6.12 Modified ECMAScript regular expression grammar [re.grammar]

[1](#1) The regular expression grammar recognized by `basic_regex` objects constructed with the `ECMAScript` flag is that specified by ECMA-262, except as specified below.

[2](#2) Objects whose type is a specialization of `basic_regex` store within themselves a default-constructed instance of their `traits` template parameter, henceforth referred to as `traits_inst`. This `traits_inst` object is used to support localization of the regular expression; `basic_regex` member functions shall not call any locale-dependent C or C++ API, including the formatted string input functions. Instead they shall call the appropriate traits member function to achieve the required effect.

[3](#3) The following productions within the ECMAScript grammar are modified as follows:

- *ClassAtom* ::
  * `-`
  * *ClassAtomNoDash*
  * *ClassAtomExClass*
  * *ClassAtomCollatingElement*
  * *ClassAtomEquivalence*
- *IdentityEscape* ::
  * *SourceCharacter* **but not** `c`

[4](#4) The following new productions are then added:

- *ClassAtomExClass* ::
  * `[:` *ClassName* `:]`
- *ClassAtomCollatingElement* ::
  * `[.` *ClassName* `.]`
- *ClassAtomEquivalence* ::
  * `[=` *ClassName* `=]`
- *ClassName* ::
  * *ClassNameCharacter*
  * *ClassNameCharacter* *ClassName*
- *ClassNameCharacter* ::
  * *SourceCharacter* **but not one of** `.` **or** `=` **or** `:`

[5](#5) The productions *ClassAtomExClass*, *ClassAtomCollatingElement*, and *ClassAtomEquivalence* provide functionality equivalent to that of the same features in POSIX regular expressions.

[6](#6) The regular expression grammar may be modified by any `regex_constants::syntax_option_type` flags specified when constructing an object whose type is a specialization of `basic_regex`, according to the rules in Table [118](re.synopt#tab:re.synopt "Table 118: syntax_option_type effects").

[7](#7) A *ClassName* production, when used in *ClassAtomExClass*, is not valid if `traits_inst.lookup_classname` returns zero for that name. The names recognized as valid *ClassName*s are determined by the type of the traits class, but at least the following names shall be recognized: `alnum`, `alpha`, `blank`, `cntrl`, `digit`, `graph`, `lower`, `print`, `punct`, `space`, `upper`, `xdigit`, `d`, `s`, `w`. In addition, the following expressions shall be equivalent:

- `\d` and `[[:digit:]]`
- `\D` and `[^[:digit:]]`
- `\s` and `[[:space:]]`
- `\S` and `[^[:space:]]`
- `\w` and `[_[:alnum:]]`
- `\W` and `[^_[:alnum:]]`

[8](#8) A *ClassName* production, when used in a *ClassAtomCollatingElement* production, is not valid if the value returned by `traits_inst.lookup_collatename` for that name is an empty string.

[9](#9) The results from multiple calls to `traits_inst.lookup_classname` can be combined with bitwise OR (`|`) and subsequently passed to
`traits_inst.isctype`.

[10](#10) A *ClassName* production, when used in a *ClassAtomEquivalence* production, is not valid if the value returned by `traits_inst.lookup_collatename` for that name is an empty string, or if the value returned by `traits_inst.transform_primary` for the result of that call to `traits_inst.lookup_collatename` is an empty string.

[11](#11) When the sequence of characters being transformed to a finite state machine contains an invalid class name, the translator shall throw an exception object of type `regex_error`.

[12](#12) If the *CV* of a *UnicodeEscapeSequence* is greater than the largest value that can be held in an object of type `charT`, the translator shall throw an exception object of type `regex_error`. [*Note [1](#note-1)*: This means that values of the form `"\uxxxx"` that do not fit in a character are invalid. — *end note*]

[13](#13) Where the regular expression grammar requires the conversion of a sequence of characters to an integral value, this is accomplished by calling `traits_inst.value`.

[14](#14) The behavior of the internal finite state machine representation when used to match a sequence of characters is as described in ECMA-262. The behavior is modified according to any `match_flag_type` flags ([[re.matchflag]](re.matchflag "28.6.4.3 Bitmask type match_flag_type")) specified when using the regular
expression object in one of the regular expression algorithms ([[re.alg]](re.alg "28.6.10 Regular expression algorithms")). The behavior is also localized by interaction with the traits class template parameter as follows:

- [(14.1)](#14.1) During matching of a regular expression finite state machine against a sequence of characters, two characters `c` and `d` are compared using the following rules:
  * [(14.1.1)](#14.1.1) if `(flags() & regex_constants::icase)`, the two characters are equal if `traits_inst.translate_nocase(c) == traits_inst.translate_nocase(d)`;
  * [(14.1.2)](#14.1.2) otherwise, if `flags() & regex_constants::collate`, the two characters are equal if `traits_inst.translate(c) == traits_inst.translate(d)`;
  * [(14.1.3)](#14.1.3) otherwise, the two characters are equal if `c == d`.
- [(14.2)](#14.2) During matching of a regular expression finite state machine against a sequence of characters, comparison of a collating element range `c1-c2` against a character `c` is conducted as follows: if `flags() & regex_constants::collate` is `false`, then the character `c` is matched if `c1 <= c && c <= c2`; otherwise, `c` is matched in accordance with the following algorithm:

  ```cpp
  string_type str1 = string_type(1,
    flags() & icase ? traits_inst.translate_nocase(c1) : traits_inst.translate(c1));
  string_type str2 = string_type(1,
    flags() & icase ? traits_inst.translate_nocase(c2) : traits_inst.translate(c2));
  string_type str = string_type(1,
    flags() & icase ? traits_inst.translate_nocase(c) : traits_inst.translate(c));
  return traits_inst.transform(str1.begin(), str1.end())
           <= traits_inst.transform(str.begin(), str.end())
      && traits_inst.transform(str.begin(), str.end())
           <= traits_inst.transform(str2.begin(), str2.end());
  ```

- [(14.3)](#14.3) During matching of a regular expression finite state machine against a sequence of characters, testing whether a collating element is a member of a primary equivalence class is conducted by first converting the collating element and the equivalence class to sort keys using `traits::transform_primary`, and then comparing the sort keys for equality.
- [(14.4)](#14.4) During matching of a regular expression finite state machine against a sequence of characters, a character `c` is a member of a character class designated by an iterator range `[first, last)` if `traits_inst.isctype(c, traits_inst.lookup_classname(first, last, flags() & icase))` is `true`.

See also: ECMA-262 15.10
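The equivalences required by paragraph 7 can be spot-checked against an actual `std::regex`. The sketch below compares, character by character over printable ASCII only, which single characters two ECMAScript patterns accept. The helper name `same_on_printable_ascii` is hypothetical, and the check assumes the default ("C") global locale; agreement on this range is evidence for the equivalence, not a proof of it.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true if two ECMAScript patterns accept exactly
// the same single characters in printable ASCII (codes 32..126).
bool same_on_printable_ascii(const std::string& lhs, const std::string& rhs) {
    const std::regex a(lhs);  // std::regex defaults to the ECMAScript grammar
    const std::regex b(rhs);
    for (int code = 32; code <= 126; ++code) {
        const std::string one(1, static_cast<char>(code));
        if (std::regex_match(one, a) != std::regex_match(one, b)) {
            return false;  // found a character accepted by only one pattern
        }
    }
    return true;
}
```

For example, `same_on_printable_ascii(R"(\w)", "[_[:alnum:]]")` is expected to return `true`, while comparing `\d` with `[[:alpha:]]` returns `false`.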
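Paragraphs 7, 8, and 11 together mean that an unrecognized name inside `[:...:]` or `[. ... .]` makes construction fail with `regex_error`. A minimal sketch of observing this, using a hypothetical helper `compiles` that only reports whether construction threw. It deliberately does not inspect `regex_error::code()`, since which error code an implementation reports for these cases is not assumed here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` compiles as an ECMAScript regex,
// false if construction throws std::regex_error (for example, because a
// [:name:] or [.name.] in a bracket expression is not recognized).
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern, std::regex_constants::ECMAScript);
        (void)re;  // only construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

The required names such as `alnum`, `xdigit`, and the short forms `d`, `s`, `w` should compile, while a made-up class or collating name should not.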
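Paragraph 9 allows results of `lookup_classname` to be OR'ed and passed to `isctype`. The sketch below does this with `std::regex_traits<char>` to build a "digit or alpha" mask. The function name `is_digit_or_alpha` is hypothetical, and the traits object uses whatever global locale is in effect (the classic "C" locale unless the program changes it).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: combines two class masks with bitwise OR and tests a
// character against the combined mask.
bool is_digit_or_alpha(char c) {
    std::regex_traits<char> traits;  // imbued with the current global locale
    const std::string digit = "digit";
    const std::string alpha = "alpha";
    const std::regex_traits<char>::char_class_type mask =
        traits.lookup_classname(digit.begin(), digit.end()) |
        traits.lookup_classname(alpha.begin(), alpha.end());
    return traits.isctype(c, mask);  // true if c is in either class
}
```

This is the same mechanism an implementation can use for a bracket expression such as `[[:digit:][:alpha:]]`.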
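Paragraph 13 delegates digit-to-integer conversion to `traits_inst.value`, which returns the digit's value in the given radix (8, 10, or 16) or -1 if the character is not such a digit. A sketch of accumulating a multi-digit value this way, as needed for constructs like `\x41` or `{12}`; `to_integral` is a hypothetical helper, not a library function, and it does not guard against overflow.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: converts a digit string to an integer by calling
// regex_traits<char>::value on each character. Returns -1 for an empty string
// or if any character is not a valid digit in `radix`. No overflow checking.
long to_integral(const std::string& digits, int radix) {
    std::regex_traits<char> traits;
    if (digits.empty()) return -1;
    long result = 0;
    for (char ch : digits) {
        const int d = traits.value(ch, radix);  // -1 if ch is not a digit in radix
        if (d < 0) return -1;
        result = result * radix + d;
    }
    return result;
}
```

For instance, `to_integral("41", 16)` yields 65, the value a `\x41` escape denotes.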
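The character comparison rules of (14.1) can be mirrored outside the matcher with `std::regex_traits<char>`. This is an illustrative restatement under the assumption of the default global locale, not how any particular implementation structures its matcher; `chars_equal` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helper restating (14.1): icase takes priority, then collate,
// otherwise characters are compared directly.
bool chars_equal(char c, char d, std::regex_constants::syntax_option_type flags) {
    std::regex_traits<char> traits;
    if (flags & std::regex_constants::icase)
        return traits.translate_nocase(c) == traits.translate_nocase(d);
    if (flags & std::regex_constants::collate)
        return traits.translate(c) == traits.translate(d);
    return c == d;
}
```

With `icase` set, `'A'` and `'a'` compare equal, which matches the observable behavior of `std::regex_match("A", std::regex("a", std::regex_constants::ECMAScript | std::regex_constants::icase))`.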
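The (14.2) algorithm can be turned into a standalone function by replacing `traits_inst` with a local `std::regex_traits<char>` and `flags() & icase` with a `bool` parameter. The sketch assumes the classic "C" global locale (the default at program startup), in which `transform` preserves byte order; under other locales the results may differ. `in_collate_range` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical standalone version of the (14.2) collating-range test:
// translate each character, transform it to a sort key, and check that the
// key of c lies between the keys of c1 and c2.
bool in_collate_range(char c1, char c2, char c, bool icase) {
    using traits_type = std::regex_traits<char>;
    using string_type = traits_type::string_type;
    traits_type traits;
    auto tr = [&](char ch) {
        return icase ? traits.translate_nocase(ch) : traits.translate(ch);
    };
    const string_type str1(1, tr(c1));
    const string_type str2(1, tr(c2));
    const string_type str(1, tr(c));
    const string_type key = traits.transform(str.begin(), str.end());
    return traits.transform(str1.begin(), str1.end()) <= key &&
           key <= traits.transform(str2.begin(), str2.end());
}
```

In the "C" locale, `'M'` falls outside `a-z` unless case folding is requested, because its sort key orders before that of `'a'`.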