# 28 Text processing library [[text]](./#text)
## 28.6 Regular expressions library [[re]](re#grammar)
### 28.6.12 Modified ECMAScript regular expression grammar [re.grammar]
[1](#1)
The regular expression grammar recognized by `basic_regex` objects constructed with the `ECMAScript`
flag is that specified by ECMA-262, except as specified below.
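
A minimal sketch of such an object: `std::regex` is `basic_regex<char>`, and passing `std::regex::ECMAScript` selects this grammar explicitly (it is also the default when no grammar flag is given). The pattern and the helper name `split_address` are illustrative assumptions, not part of the text above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: splits "name@host.com" into its two parts using an
// ECMAScript-grammar regex; returns false if the whole input does not match.
bool split_address(const std::string& input, std::string& name, std::string& host) {
    // ECMAScript is spelled out for clarity; it is also the default grammar.
    static const std::regex re("(\\w+)@(\\w+)\\.com", std::regex::ECMAScript);
    std::smatch m;
    if (!std::regex_match(input, m, re)) return false;
    name = m[1].str();  // first capture group
    host = m[2].str();  // second capture group
    return true;
}
```

Calling `split_address("user@example.com", name, host)` leaves `user` in `name` and `example` in `host`; an input ending in `.org` does not match.
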
[2](#2)
Objects whose type is a specialization of `basic_regex` store within themselves a
default-constructed instance of their `traits` template parameter, henceforth
referred to as `traits_inst`.
This `traits_inst` object is used to support localization
of the regular expression; `basic_regex` member functions shall not call
any locale-dependent C or C++ API, including the formatted string input functions.
Instead, they shall call the appropriate traits member function to achieve the required effect.
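
The stored traits object can be re-localized through `basic_regex::imbue`, which forwards the locale to the traits object and leaves the regex matching nothing until a pattern is assigned again. A small sketch, assuming the classic `"C"` locale is wanted; `make_classic_regex` is a hypothetical helper.

```cpp
#include <cassert>
#include <locale>
#include <regex>

// Hypothetical helper: builds a regex whose traits object uses the classic
// "C" locale instead of whatever the global locale happens to be.
std::regex make_classic_regex(const char* pattern) {
    std::regex re;
    re.imbue(std::locale::classic());  // forwarded to the stored traits object
    re.assign(pattern);                // imbue() emptied re, so compile now
    return re;
}
```

Afterwards, `getloc()` on the returned object reports the classic locale, and character classes such as `[[:alpha:]]` are resolved through it.
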
[3](#3)
The following productions within the ECMAScript grammar are modified as follows:

*ClassAtom* ::

- `-`
- *ClassAtomNoDash*
- *ClassAtomExClass*
- *ClassAtomCollatingElement*
- *ClassAtomEquivalence*

*IdentityEscape* ::

- *SourceCharacter* **but not** `c`

[4](#4)
The following new productions are then added:

*ClassAtomExClass* ::

- `[:` *ClassName* `:]`

*ClassAtomCollatingElement* ::

- `[.` *ClassName* `.]`

*ClassAtomEquivalence* ::

- `[=` *ClassName* `=]`

*ClassName* ::

- *ClassNameCharacter*
- *ClassNameCharacter* *ClassName*

*ClassNameCharacter* ::

- *SourceCharacter* **but not one of** `.` **or** `=` **or** `:`

[5](#5)
The productions *ClassAtomExClass*, *ClassAtomCollatingElement*, and *ClassAtomEquivalence* provide functionality
equivalent to that of the same features in POSIX regular expressions.
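
A sketch of a bracket expression that combines two *ClassAtomExClass* atoms with an ordinary character. Because these atoms mirror POSIX, the same pattern is also accepted under the POSIX extended grammar (`std::regex::extended`), which the hypothetical helper `is_identifier_like` lets the caller select.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if s consists only of letters, digits and '_'.
// The bracket expression mixes [:alpha:] and [:digit:] (ClassAtomExClass)
// with the ordinary character '_'.
bool is_identifier_like(const std::string& s,
                        std::regex::flag_type grammar = std::regex::ECMAScript) {
    const std::regex re("[[:alpha:][:digit:]_]+", grammar);
    return std::regex_match(s, re);
}
```

With either grammar, `"abc_123"` matches and `"abc-123"` does not, since `-` belongs to neither class.
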
[6](#6)
The regular expression grammar may be modified, according to the rules in Table [118](re.synopt#tab:re.synopt "Table 118: syntax_option_type effects"), by
any `regex_constants::syntax_option_type` flags specified when
constructing an object whose type is a specialization of `basic_regex`.
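
A minimal sketch of one such flag, `icase`: it is supplied at construction time because it changes how the pattern is compiled. The helper `matches_hello` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: matches `word` against "hello", adding the icase
// option to the ECMAScript grammar flag when requested.
bool matches_hello(const std::string& word, bool ignore_case) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignore_case) flags |= std::regex::icase;  // syntax option, not a match flag
    const std::regex re("hello", flags);
    return std::regex_match(word, re);
}
```

`matches_hello("HeLLo", true)` succeeds, while the same call without `icase` fails.
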
[7](#7)
A *ClassName* production, when used in *ClassAtomExClass*,
is not valid if `traits_inst.lookup_classname` returns zero for
that name.
The names recognized as valid *ClassName*s are
determined by the type of the traits class, but at least the following
names shall be recognized: `alnum`, `alpha`, `blank`, `cntrl`, `digit`, `graph`, `lower`, `print`, `punct`, `space`, `upper`, `xdigit`, `d`, `s`, `w`.
In addition, the following expressions shall be equivalent:

- `\d` and `[[:digit:]]`
- `\D` and `[^[:digit:]]`
- `\s` and `[[:space:]]`
- `\S` and `[^[:space:]]`
- `\w` and `[_[:alnum:]]`
- `\W` and `[^_[:alnum:]]`

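
One way to spot-check these equivalences is to compare, character by character, what each pair of patterns accepts. The sketch below (the helper `same_on_ascii` is hypothetical) only covers ASCII codes 1 to 127 under the default locale, so it is a sanity check rather than a proof.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if patterns a and b accept exactly the same
// single-character strings among ASCII codes 1..127.
bool same_on_ascii(const char* a, const char* b) {
    const std::regex ra(a), rb(b);
    for (int code = 1; code < 128; ++code) {
        const std::string s(1, static_cast<char>(code));
        if (std::regex_match(s, ra) != std::regex_match(s, rb)) return false;
    }
    return true;
}
```

For example, `same_on_ascii("\\w", "[_[:alnum:]]")` compares the escape against its bracket-expression spelling; a mismatched pair such as `\d` and `\w` returns false.
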
[8](#8)
A *ClassName* production, when used in
a *ClassAtomCollatingElement* production, is not valid
if the value returned by `traits_inst.lookup_collatename` for
that name is an empty string.
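
A sketch of this rule from two angles, assuming `std::regex_traits<char>` as the traits class: asking the traits object directly, and checking that the library rejects a pattern that uses the name. Both helper names and the sample name `nosuch` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: a name is usable in "[[.name.]]" only if the traits
// object maps it to a non-empty string.
bool is_valid_collating_name(const std::string& name) {
    std::regex_traits<char> traits;
    return !traits.lookup_collatename(name.begin(), name.end()).empty();
}

// Hypothetical helper: tries to compile a bracket expression containing
// [.name.]; returns false if construction throws std::regex_error.
bool compiles_with_collating_element(const std::string& name) {
    try {
        const std::regex re("[[." + name + ".]]");
        (void)re;  // only compilation matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```
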
[9](#9)
The results from multiple calls
to `traits_inst.lookup_classname` can be combined with bitwise OR
and the result passed to `traits_inst.isctype`.
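
A sketch of combining two class lookups into one mask, assuming `std::regex_traits<char>` and the default locale; `class_of` and `is_digit_or_punct` are hypothetical helpers.

```cpp
#include <cassert>
#include <regex>
#include <string>

using traits_t = std::regex_traits<char>;

// Hypothetical helper: looks up a class name such as "digit" through the
// traits object (case-sensitive lookup).
traits_t::char_class_type class_of(const traits_t& t, const std::string& name) {
    return t.lookup_classname(name.begin(), name.end());
}

// Hypothetical helper: true if c is in [:digit:] or [:punct:]; the two
// lookups are OR'ed into one mask and tested with a single isctype call.
bool is_digit_or_punct(char c) {
    traits_t t;
    const traits_t::char_class_type mask = class_of(t, "digit") | class_of(t, "punct");
    return t.isctype(c, mask);
}
```

With this mask, `'7'` and `'!'` are members, while `'a'` and `' '` are not.
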
[10](#10)
A *ClassName* production, when used in
a *ClassAtomEquivalence* production, is not valid if the value
returned by `traits_inst.lookup_collatename` for that name is an
empty string, or if the value returned by `traits_inst.transform_primary` for the result of that call to `traits_inst.lookup_collatename` is an empty string.
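
The two-step rule can be written directly against the traits interface. This is a sketch assuming `std::regex_traits<char>`; the helper name is hypothetical, and which names pass the second step depends on the locale and library.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: encodes the validity rule for the name in "[[=name=]]".
bool is_valid_equivalence_name(const std::string& name) {
    std::regex_traits<char> traits;
    // Step 1: the name must map to a collating element.
    const std::string element = traits.lookup_collatename(name.begin(), name.end());
    if (element.empty()) return false;
    // Step 2: that element must have a non-empty primary sort key.
    return !traits.transform_primary(element.begin(), element.end()).empty();
}
```
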
[11](#11)
When the sequence of characters being transformed into a finite state
machine contains an invalid class name, the translator shall throw an
exception object of type `regex_error`.
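
A sketch of observing this from user code: construction of `std::regex` is the translation step, so an unknown class name surfaces as a `std::regex_error` thrown by the constructor. The helper `rejects` and the name `nosuch` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if constructing a std::regex from `pattern`
// throws std::regex_error, i.e. the translator rejected the pattern.
bool rejects(const std::string& pattern) {
    try {
        const std::regex re(pattern);
        (void)re;  // only the construction matters here
        return false;
    } catch (const std::regex_error&) {
        return true;
    }
}
```

`rejects("[[:nosuch:]]")` returns true, whereas the required names, including the short forms `d`, `s`, and `w`, compile.
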
[12](#12)
If the *CV* of a *UnicodeEscapeSequence* is greater than the largest
value that can be held in an object of type `charT`, the translator shall
throw an exception object of type `regex_error`.
[*Note [1](#note-1)*:
This means that values of the form `"\uxxxx"` that do not fit in
a character are invalid.
— *end note*]
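
A sketch contrasting an escape that fits in `char` with one that does not. The first helper shows `\u0041` matching `A`. The second only reports whether the library in use throws for `\u0100`; it does not assume the outcome, since implementations have not always diagnosed this case. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helper: "\u0041" denotes 'A', which fits in char.
bool unicode_escape_matches_A() {
    const std::regex re("\\u0041");
    return std::regex_match("A", re);
}

// Hypothetical helper: 0x0100 does not fit in char, so per the paragraph
// above construction should throw; this reports what actually happens.
bool out_of_range_escape_rejected() {
    try {
        const std::regex re("\\u0100");
        (void)re;
        return false;
    } catch (const std::regex_error&) {
        return true;
    }
}
```
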
[13](#13)
Where the regular expression grammar requires the conversion of a sequence of characters
to an integral value, this is accomplished by calling `traits_inst.value`.
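
A sketch of how a parser might fold digits through `value`, assuming `std::regex_traits<char>`: `value(ch, radix)` returns the digit's value, or -1 if `ch` is not a digit in that radix. The helpers `digit_value` and `parse_decimal` are hypothetical; the latter stands in for reading, say, the bounds of a `{n,m}` quantifier.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helper: converts one digit character via the traits object.
int digit_value(char ch, int radix) {
    std::regex_traits<char> traits;
    return traits.value(ch, radix);  // -1 if ch is not a digit in this radix
}

// Hypothetical helper: folds a run of decimal digits into an integer,
// returning -1 if any character is not a decimal digit.
int parse_decimal(const char* first, const char* last) {
    int result = 0;
    for (; first != last; ++first) {
        const int d = digit_value(*first, 10);
        if (d < 0) return -1;
        result = result * 10 + d;
    }
    return result;
}
```

For example, `digit_value('f', 16)` is 15, and `digit_value('8', 8)` is -1.
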
[14](#14)
The behavior of the internal finite state machine representation when used to match a
sequence of characters is as described in ECMA-262.
The behavior is modified according
to any `match_flag_type` flags ([[re.matchflag]](re.matchflag "28.6.4.3 Bitmask type match_flag_type")) specified when using the regular expression
object in one of the regular expression algorithms ([[re.alg]](re.alg "28.6.10 Regular expression algorithms")).
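
A minimal sketch of a match flag changing behavior at match time rather than at construction: `match_not_bol` tells the algorithm that the start of the input is not the beginning of a line, so `^` no longer matches there. The helper `starts_with_a` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: searches for "^a" in text, optionally passing
// match_not_bol to the regex_search algorithm.
bool starts_with_a(const std::string& text, bool not_bol) {
    static const std::regex re("^a");
    const auto flags = not_bol ? std::regex_constants::match_not_bol
                               : std::regex_constants::match_default;
    return std::regex_search(text, re, flags);
}
```

The same compiled regex finds `"abc"` by default but not when `match_not_bol` is passed.
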
The behavior is also
localized by interaction with the traits class template parameter as follows:
- [(14.1)](#14.1)
  During matching of a regular expression finite state machine
  against a sequence of characters, two characters `c` and `d` are compared using the following rules:
  * [(14.1.1)](#14.1.1)
    if `(flags() & regex_constants::icase)`, the two characters are equal
    if `traits_inst.translate_nocase(c) == traits_inst.translate_nocase(d)`;
  * [(14.1.2)](#14.1.2)
    otherwise, if `flags() & regex_constants::collate`, the
    two characters are equal if `traits_inst.translate(c) == traits_inst.translate(d)`;
  * [(14.1.3)](#14.1.3)
    otherwise, the two characters are equal if `c == d`.
- [(14.2)](#14.2)
  During matching of a regular expression finite state machine
  against a sequence of characters, comparison of a collating element
  range `c1-c2` against a character `c` is
  conducted as follows: if `flags() & regex_constants::collate` is false, then the character `c` is matched if `c1 <= c && c <= c2`; otherwise, `c` is matched in
  accordance with the following algorithm:

      string_type str1 = string_type(1,
        flags() & icase ?
          traits_inst.translate_nocase(c1) : traits_inst.translate(c1));
      string_type str2 = string_type(1,
        flags() & icase ?
          traits_inst.translate_nocase(c2) : traits_inst.translate(c2));
      string_type str = string_type(1,
        flags() & icase ?
          traits_inst.translate_nocase(c) : traits_inst.translate(c));
      return traits_inst.transform(str1.begin(), str1.end())
            <= traits_inst.transform(str.begin(), str.end())
        && traits_inst.transform(str.begin(), str.end())
            <= traits_inst.transform(str2.begin(), str2.end());
- [(14.3)](#14.3)
  During matching of a regular expression finite state machine against a sequence of
  characters, testing whether a collating element is a member of a primary equivalence
  class is conducted by first converting the collating element and the equivalence
  class to sort keys using `traits::transform_primary`, and then comparing the sort
  keys for equality.
- [(14.4)](#14.4)
  During matching of a regular expression finite state machine against a sequence
  of characters, a character `c` is a member of a character class designated by an
  iterator range `[first, last)` if `traits_inst.isctype(c, traits_inst.lookup_classname(first, last, flags() & icase))` is true.
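
The rules in (14.1), (14.2), and (14.4) can be restated as free functions over a traits object. This is a sketch assuming `std::regex_traits<char>` and the default `"C"` locale. The `icase` and `collate` booleans stand in for the corresponding `flags()` bits, and all three helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

using traits_t = std::regex_traits<char>;

// (14.1) Hypothetical helper: character equality under the icase/collate rules.
bool chars_equal(const traits_t& t, char c, char d, bool icase, bool collate) {
    if (icase) return t.translate_nocase(c) == t.translate_nocase(d);
    if (collate) return t.translate(c) == t.translate(d);
    return c == d;
}

// (14.2) Hypothetical helper: is c inside the range c1-c2?
bool in_range(const traits_t& t, char c1, char c2, char c, bool icase, bool collate) {
    if (!collate) return c1 <= c && c <= c2;  // plain code-unit comparison
    // Translate each character, then compare the collation sort keys.
    auto key = [&](char x) {
        const std::string s(1, icase ? t.translate_nocase(x) : t.translate(x));
        return t.transform(s.begin(), s.end());
    };
    return key(c1) <= key(c) && key(c) <= key(c2);
}

// (14.4) Hypothetical helper: membership in a named class such as "lower".
bool in_class(const traits_t& t, char c, const std::string& name, bool icase) {
    return t.isctype(c, t.lookup_classname(name.begin(), name.end(), icase));
}
```

Note that with `icase` set, looking up `lower` yields a mask that also accepts uppercase letters, so `in_class(t, 'A', "lower", true)` is true.
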
See also: ECMA-262 15.10