# 28 Text processing library [[text]](./#text)

## 28.6 Regular expressions library [[re]](re#grammar)

### 28.6.12 Modified ECMAScript regular expression grammar [re.grammar]

[1](#1)

The regular expression grammar recognized by `basic_regex` objects constructed with the `ECMAScript` flag is that specified by ECMA-262, except as specified below.
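As an informal illustration, the sketch below relies on two ECMA-262 features that the POSIX grammars lack: a lookahead assertion and a non-greedy quantifier. It assumes only `std::regex`, whose default grammar is `ECMAScript`. The helper names `foo_followed_by_bar` and `first_tag` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if "foo" occurs immediately followed by "bar".
// The lookahead (?=bar) is ECMA-262 syntax and does not consume "bar".
bool foo_followed_by_bar(const std::string& s) {
    static const std::regex re("foo(?=bar)", std::regex::ECMAScript);
    return std::regex_search(s, re);
}

// Hypothetical helper: returns the first "<...>" tag found in s.
// The non-greedy quantifier +? stops at the first '>'.
std::string first_tag(const std::string& s) {
    static const std::regex re("<.+?>");  // ECMAScript is the default grammar
    std::smatch m;
    return std::regex_search(s, m, re) ? m.str(0) : std::string();
}
```

With a greedy `<.+>`, `first_tag("<a><b>")` would return the whole input instead of `"<a>"`.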

[2](#2)

Objects of type specialization of `basic_regex` store within themselves a default-constructed instance of their `traits` template parameter, henceforth referred to as `traits_inst`. This `traits_inst` object is used to support localization of the regular expression; `basic_regex` member functions shall not call any locale-dependent C or C++ API, including the formatted string input functions. Instead they shall call the appropriate traits member function to achieve the required effect.
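Localization reaches `traits_inst` through `basic_regex::imbue`, which returns the result of `traits_inst.imbue(loc)`. After the call, the regex matches no character sequence until a pattern is assigned again. The minimal sketch below assumes `std::regex` (traits type `std::regex_traits<char>`) and the classic "C" locale. The helper name is hypothetical.

```cpp
#include <cassert>
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: matches s against [[:alpha:]]+ using a regex whose
// traits_inst has been imbued with the classic "C" locale.
bool alpha_only_in_c_locale(const std::string& s) {
    std::regex re;
    re.imbue(std::locale::classic());  // forwards to traits_inst.imbue(...)
    // After imbue() the regex matches nothing, so the pattern is assigned
    // afterwards; class names such as [:alpha:] are resolved via traits_inst.
    re.assign("[[:alpha:]]+");
    return std::regex_match(s, re);
}
```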

[3](#3)

The following productions within the ECMAScript grammar are modified as follows:

*ClassAtom* ::

- `-`
- *ClassAtomNoDash*
- *ClassAtomExClass*
- *ClassAtomCollatingElement*
- *ClassAtomEquivalence*

*IdentityEscape* ::

- *SourceCharacter* **but not** `c`
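The two modified productions can be seen in ordinary patterns. A leading `-` inside brackets is the literal *ClassAtom* `-`, not a range operator. A backslash before a punctuation character is an *IdentityEscape* that makes the character literal. The sketch assumes only `std::regex` with its default grammar, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The leading '-' in [-+0-9] is the literal ClassAtom "-", while 0-9 is a range.
bool is_sign_or_digit(const std::string& s) {
    static const std::regex re("[-+0-9]");
    return std::regex_match(s, re);
}

// IdentityEscape: \[ \. \] are literal characters, so the pattern
// matches exactly the three characters "[.]".
bool is_bracketed_dot(const std::string& s) {
    static const std::regex re(R"(\[\.\])");
    return std::regex_match(s, re);
}
```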

[4](#4)

The following new productions are then added:

*ClassAtomExClass* ::

- `[:` *ClassName* `:]`

*ClassAtomCollatingElement* ::

- `[.` *ClassName* `.]`

*ClassAtomEquivalence* ::

- `[=` *ClassName* `=]`

*ClassName* ::

- *ClassNameCharacter*
- *ClassNameCharacter* *ClassName*

*ClassNameCharacter* ::

- *SourceCharacter* **but not one of** `.` **or** `=` **or** `:`
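A single bracket expression can mix several *ClassAtomExClass* atoms with ordinary characters. The sketch below uses only the class names that paragraph 7 requires every traits class to recognize. The `[.name.]` and `[=name=]` forms parse the same way, but the names they accept are implementation-specific, so they are not exercised here. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: an identifier-like token built from the
// ClassAtomExClass atoms [:alpha:] and [:digit:] plus a literal '_'.
bool is_identifier(const std::string& s) {
    static const std::regex re("[[:alpha:]_][[:alpha:][:digit:]_]*");
    return std::regex_match(s, re);
}
```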

[5](#5)

The productions *ClassAtomExClass*, *ClassAtomCollatingElement*, and *ClassAtomEquivalence* provide functionality equivalent to that of the same features in regular expressions in POSIX.
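To illustrate the POSIX correspondence, the same bracket expression can be compiled once with the `ECMAScript` grammar and once with the POSIX `extended` grammar. The sketch assumes only the standard `std::regex::flag_type` constants. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: compile one pattern under the grammar selected by g.
bool matches_capitalized_word(const std::string& s, std::regex::flag_type g) {
    const std::regex re("[[:upper:]][[:lower:]]+", g);
    return std::regex_match(s, re);
}
```

Called with `std::regex::ECMAScript` and with `std::regex::extended`, the helper should give the same answer for the same input.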

[6](#6)

The regular expression grammar may be modified by any `regex_constants::syntax_option_type` flags specified when constructing an object of type specialization of `basic_regex` according to the rules in Table [118](re.synopt#tab:re.synopt "Table 118: syntax_option_type effects").
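One such flag is `icase`, which makes character comparison case-insensitive. The minimal sketch below adds it to `ECMAScript` with `|=`, which the bitmask type `syntax_option_type` supports. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the same pattern compiled with or without icase.
bool matches_hello(const std::string& s, bool ignore_case) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignore_case)
        flags |= std::regex::icase;
    return std::regex_match(s, std::regex("hello", flags));
}
```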

[7](#7)

A *ClassName* production, when used in *ClassAtomExClass*, is not valid if `traits_inst.lookup_classname` returns zero for that name. The names recognized as valid *ClassName*s are determined by the type of the traits class, but at least the following names shall be recognized: `alnum`, `alpha`, `blank`, `cntrl`, `digit`, `graph`, `lower`, `print`, `punct`, `space`, `upper`, `xdigit`, `d`, `s`, `w`. In addition, the following expressions shall be equivalent:

- `\d` and `[[:digit:]]`
- `\D` and `[^[:digit:]]`
- `\s` and `[[:space:]]`
- `\S` and `[^[:space:]]`
- `\w` and `[_[:alnum:]]`
- `\W` and `[^_[:alnum:]]`
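The equivalences can be spot-checked by matching every 7-bit character against both forms. This is only a single-character check under the default locale, not a proof of equivalence. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical check: for each character 1..127, does short_form match
// exactly when bracket_form does?
bool agree_on_ascii(const char* short_form, const char* bracket_form) {
    const std::regex a(short_form), b(bracket_form);
    for (int ch = 1; ch < 128; ++ch) {
        const std::string s(1, static_cast<char>(ch));
        if (std::regex_match(s, a) != std::regex_match(s, b))
            return false;
    }
    return true;
}
```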

[8](#8)

A *ClassName* production when used in a *ClassAtomCollatingElement* production is not valid if the value returned by `traits_inst.lookup_collatename` for that name is an empty string.
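The rule can be observed on the traits class directly. The minimal sketch below assumes `std::regex_traits<char>` as the traits type. Which collating names are recognized is implementation-specific, so the test relies only on a name that no implementation is expected to know. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the rule above: a ClassName used in
// [. ClassName .] is valid only if lookup_collatename returns a non-empty string.
bool collating_name_is_valid(const std::string& name) {
    const std::regex_traits<char> traits_inst{};  // stands in for traits_inst
    return !traits_inst.lookup_collatename(name.begin(), name.end()).empty();
}
```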

[9](#9)

The results from multiple calls to `traits_inst.lookup_classname` can be bitwise OR'ed together and subsequently passed to `traits_inst.isctype`.
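Because `char_class_type` is a bitmask type, two class masks can be combined with `|` and tested in one `isctype` call. The sketch assumes `std::regex_traits<char>`. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: OR the masks for "alpha" and "digit", then test c
// against the union with isctype.
bool is_alpha_or_digit(char c) {
    const std::regex_traits<char> t{};
    const std::string alpha = "alpha", digit = "digit";
    const auto mask = t.lookup_classname(alpha.begin(), alpha.end())
                    | t.lookup_classname(digit.begin(), digit.end());
    return t.isctype(c, mask);
}
```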

[10](#10)

A *ClassName* production when used in a *ClassAtomEquivalence* production is not valid if the value returned by `traits_inst.lookup_collatename` for that name is an empty string or if the value returned by `traits_inst.transform_primary` for the result of the call to `traits_inst.lookup_collatename` is an empty string.
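The two-step validity rule can be restated directly against the traits class. The sketch below is a hypothetical restatement that assumes `std::regex_traits<char>`. It is not the library's internal code. The primary sort keys that `transform_primary` produces are implementation-specific, so only the first failure condition is tested.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the rule above for [= ClassName =].
bool equivalence_name_is_valid(const std::string& name) {
    const std::regex_traits<char> traits_inst{};
    const std::string coll = traits_inst.lookup_collatename(name.begin(), name.end());
    if (coll.empty())
        return false;  // first condition: unknown collating name
    // second condition: the primary sort key must not be empty
    return !traits_inst.transform_primary(coll.begin(), coll.end()).empty();
}
```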

[11](#11)

When the sequence of characters being transformed to a finite state machine contains an invalid class name, the translator shall throw an exception object of type `regex_error`.
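The following sketch checks only the exception type. [re.err] associates `error_ctype` with an invalid character class name, but the sketch does not assume that every implementation reports that code. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if compiling pattern throws std::regex_error.
bool throws_regex_error(const std::string& pattern) {
    try {
        const std::regex re(pattern);
        (void)re;
    } catch (const std::regex_error&) {
        return true;
    }
    return false;
}
```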

[12](#12)

If the *CV* of a *UnicodeEscapeSequence* is greater than the largest value that can be held in an object of type `charT`, the translator shall throw an exception object of type `regex_error`.

[*Note [1](#note-1)*: This means that values of the form `"\uxxxx"` that do not fit in a character are invalid. — *end note*]
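A `\uxxxx` escape whose value fits in `charT` is accepted and matches that character. The sketch assumes `std::regex` and `std::wregex`. It exercises only values that fit, so the out-of-range `char` case described above is not tested here. The helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// U+0041 fits in char, so \u0041 is a valid escape for std::regex.
bool matches_u0041(const std::string& s) {
    static const std::regex re(R"(\u0041)");
    return std::regex_match(s, re);
}

// U+00E9 fits in wchar_t, so the same kind of escape works for std::wregex.
bool wide_matches_e_acute(const std::wstring& s) {
    static const std::wregex re(LR"(\u00E9)");
    return std::regex_match(s, re);
}
```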

[13](#13)

Where the regular expression grammar requires the conversion of a sequence of characters to an integral value, this is accomplished by calling `traits_inst.value`.
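`value(ch, radix)` converts one digit character and returns -1 when `ch` is not a valid digit in that radix. The interval count in `{3}` is one place where the grammar needs such a conversion. The sketch assumes `std::regex_traits<char>`. The helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: value of one digit character, or -1 if invalid.
int digit_value(char ch, int radix) {
    const std::regex_traits<char> traits_inst{};
    return traits_inst.value(ch, radix);
}

// Hypothetical helper: the interval {3} is parsed into an integral count.
bool three_digits(const std::string& s) {
    static const std::regex re(R"(\d{3})");
    return std::regex_match(s, re);
}
```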

[14](#14)

The behavior of the internal finite state machine representation when used to match a sequence of characters is as described in ECMA-262. The behavior is modified according to any `match_flag_type` flags ([[re.matchflag]](re.matchflag "28.6.4.3 Bitmask type match_flag_type")) specified when using the regular expression object in one of the regular expression algorithms ([[re.alg]](re.alg "28.6.10 Regular expression algorithms")).
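For example, `match_not_bol` tells the algorithm that the first character is not at the beginning of a line, so `^` cannot match there. The sketch uses the same regex and input with and without that flag. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: same regex, same input, different match_flag_type.
bool starts_with_a(const std::string& s, bool not_bol) {
    static const std::regex re("^a");
    const auto flags = not_bol ? std::regex_constants::match_not_bol
                               : std::regex_constants::match_default;
    return std::regex_search(s, re, flags);
}
```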

The behavior is also localized by interaction with the traits class template parameter as follows:

- [(14.1)](#14.1) During matching of a regular expression finite state machine against a sequence of characters, two characters `c` and `d` are compared using the following rules:
  * [(14.1.1)](#14.1.1) if `(flags() & regex_constants::icase)` the two characters are equal if `traits_inst.translate_nocase(c) == traits_inst.translate_nocase(d)`;
  * [(14.1.2)](#14.1.2) otherwise, if `flags() & regex_constants::collate` the two characters are equal if `traits_inst.translate(c) == traits_inst.translate(d)`;
  * [(14.1.3)](#14.1.3) otherwise, the two characters are equal if `c == d`.

- [(14.2)](#14.2) During matching of a regular expression finite state machine against a sequence of characters, comparison of a collating element range `c1-c2` against a character `c` is conducted as follows: if `flags() & regex_constants::collate` is false then the character `c` is matched if `c1 <= c && c <= c2`, otherwise `c` is matched in accordance with the following algorithm:

  ```cpp
  string_type str1 = string_type(1,
    flags() & icase ? traits_inst.translate_nocase(c1) : traits_inst.translate(c1));
  string_type str2 = string_type(1,
    flags() & icase ? traits_inst.translate_nocase(c2) : traits_inst.translate(c2));
  string_type str = string_type(1,
    flags() & icase ? traits_inst.translate_nocase(c) : traits_inst.translate(c));
  return traits_inst.transform(str1.begin(), str1.end())
           <= traits_inst.transform(str.begin(), str.end())
      && traits_inst.transform(str.begin(), str.end())
           <= traits_inst.transform(str2.begin(), str2.end());
  ```

- [(14.3)](#14.3) During matching of a regular expression finite state machine against a sequence of characters, testing whether a collating element is a member of a primary equivalence class is conducted by first converting the collating element and the equivalence class to sort keys using `traits::transform_primary`, and then comparing the sort keys for equality.

- [(14.4)](#14.4) During matching of a regular expression finite state machine against a sequence of characters, a character `c` is a member of a character class designated by an iterator range `[first, last)` if `traits_inst.isctype(c, traits_inst.lookup_classname(first, last, flags() & icase))` is true.
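Rules (14.1), (14.2) and (14.4) can be restated as free functions over a default-constructed `std::regex_traits<char>`, with a `syntax_option_type` argument `f` playing the role of `flags()`. This is a hypothetical restatement for illustration, not how any library implements its matcher. It assumes the global locale is the classic "C" locale, where `transform` preserves byte order.

```cpp
#include <cassert>
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// Hypothetical helper: true if every bit of `bit` is set in f.
bool has(rc::syntax_option_type f, rc::syntax_option_type bit) {
    return (f & bit) == bit;
}

// (14.1): character equality under icase, collate, or neither.
bool chars_equal(char c, char d, rc::syntax_option_type f) {
    const std::regex_traits<char> traits_inst{};
    if (has(f, rc::icase))
        return traits_inst.translate_nocase(c) == traits_inst.translate_nocase(d);
    if (has(f, rc::collate))
        return traits_inst.translate(c) == traits_inst.translate(d);
    return c == d;
}

// (14.2): is c inside the collating element range c1-c2?
bool in_range(char c1, char c2, char c, rc::syntax_option_type f) {
    const std::regex_traits<char> traits_inst{};
    if (!has(f, rc::collate))
        return c1 <= c && c <= c2;
    auto tr = [&](char x) {
        return has(f, rc::icase) ? traits_inst.translate_nocase(x)
                                 : traits_inst.translate(x);
    };
    const std::string str1(1, tr(c1)), str2(1, tr(c2)), str(1, tr(c));
    const std::string key = traits_inst.transform(str.begin(), str.end());
    return traits_inst.transform(str1.begin(), str1.end()) <= key
        && key <= traits_inst.transform(str2.begin(), str2.end());
}

// (14.4): is c a member of the class whose name is given by `name`?
bool in_class(char c, const std::string& name, rc::syntax_option_type f) {
    const std::regex_traits<char> traits_inst{};
    return traits_inst.isctype(
        c, traits_inst.lookup_classname(name.begin(), name.end(), has(f, rc::icase)));
}
```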

See also: ECMA-262 15.10