mirror of
https://gitlab.com/fbb-git/cppannotations
synced 2024-11-16 07:48:44 +01:00
Regular expressions are expressions consisting of elements resembling those of
numeric expressions. Regular expressions consist of basic elements and
operators, having various priorities and associativities. As with numeric
expressions, parentheses can be used to group elements together to form
units on which operators operate. For an extensive discussion the reader is
referred to, e.g., section 15.10 of the url(ecma-international.org)
(http://ecma-international.org/ecma-262/5.1/#sec-15.10) page, which
describes the characteristics of the regular expressions used by default by
bf(C++)'s tt(regex) classes.

bf(C++)'s default definition of regular expressions distinguishes the
following em(atoms):
itemization(
itt(x): the character `x';

itt(.): any character except for the newline character;

itt([xyz]): a character class; in this case, either an `x', a `y', or a `z'
matches the regular expression. See also the paragraph about
character classes below;

itt([abj-oZ]): a character class containing a range of characters; this
regular expression matches an `a', a `b', any letter from `j' through
`o', or a `Z'. See also the paragraph about character classes below;

itt([^A-Z]): a negated character class: this regular expression matches
any character but those in the class beyond tt(^). In this case, any
character em(except for) an uppercase letter. See also the paragraph
about character classes below;

itt([:predef:]): a em(predefined) set of characters. See below for an
overview. When used, it is interpreted as an element in a character
class. It is therefore always embedded in a set of square brackets
defining the character class (e.g., tt([[:alnum:]]));
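
itt(\X): if X is `f', `n', `r', `t', or `v', then the ANSI-C
interpretation of `\X'. If X is a character having a special meaning in
regular expressions (e.g., tt(*)) then a literal `X' (used to escape
operators). Other escape sequences have meanings of their own, like the
special atoms listed below and tt(\b), which matches at a word boundary;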

itt((r)): the regular expression tt(r). It is used to override precedence
(see below), but also to define tt(r) as a
emi(marked sub-expression) whose matching characters may directly be
retrieved from, e.g., a tt(std::smatch) object (cf. section
ref(SMATCH));

itt((?:r)): the regular expression tt(r). It is used to override
precedence (see below), but it is em(not) regarded as a em(marked
sub-expression);
)
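
The atoms listed above can be tried out using tt(std::regex_match), which
requires the em(entire) target text to match. The following is merely a
sketch: the helper functions tt(matchesAll), tt(nMarked), and tt(marked) are
hypothetical names introduced for this illustration; they are not part of the
standard library.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: true if all of `text` is matched by `pattern`,
// using the default (ECMAScript) grammar.
bool matchesAll(std::string const &text, std::string const &pattern)
{
    return std::regex_match(text, std::regex{ pattern });
}

// Hypothetical helper: the number of marked sub-expressions in `pattern`.
std::size_t nMarked(std::string const &pattern)
{
    return std::regex{ pattern }.mark_count();
}

// Hypothetical helper: the text matched by marked sub-expression `idx`
// (index 0 refers to the fully matched text). An empty string is returned
// if `text` isn't matched or if `idx` is out of range.
std::string marked(std::string const &text, std::string const &pattern,
                   std::size_t idx)
{
    std::regex const re{ pattern };
    std::smatch results;

    if (!std::regex_match(text, results, re) || idx >= results.size())
        return "";

    return results[idx].str();
}
```

With these helpers tt(matchesAll("k", "[abj-oZ]")) should return tt(true) and
tt(matchesAll("Q", "[^A-Z]")) should return tt(false). The pattern
tt((a)(?:b)) contains only one marked sub-expression, so
tt(nMarked("(a)(?:b)")) should return 1.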

In addition to these basic atoms, the following special atoms are available
(which can also be used in character classes):
itemization(
itt(\s): a whitespace character;

itt(\S): any character but a whitespace character;

itt(\d): a decimal digit character;

itt(\D): any character but a decimal digit character;

itt(\w): an alphanumeric character or an underscore (tt(_)) character;

itt(\W): any character but an alphanumeric character or an underscore
(tt(_)) character.
)
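
The effect of the special atoms is easily made visible by replacing the
characters they match. The next sketch uses tt(std::regex_replace); the helper
tt(replaceAll) is a hypothetical name, only used for this illustration. Raw
string literals are convenient for specifying patterns containing backslashes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of `text` in which each sub-string
// matched by `pattern` (default ECMAScript grammar) is replaced by `by`.
std::string replaceAll(std::string const &text, std::string const &pattern,
                       std::string const &by)
{
    return std::regex_replace(text, std::regex{ pattern }, by);
}
```

E.g., tt(replaceAll("a1 b22", R"(\d)", "#")) should return tt(a# b##), and
tt(replaceAll("a1 b2", R"([\s\d])", "")), using special atoms inside a
character class, should return tt(ab).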

Atoms may be concatenated. If tt(r) and tt(s) are atoms then the regular
expression tt(rs) matches a target text if the target text consists of text
matching tt(r), immediately followed by text matching tt(s) (so without any
intermediate characters inside the target text). E.g., the regular
expression tt([ab][cd]) matches the target text tt(ac), but not the target
text tt(a:c).
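
The concatenation example can be checked with a few lines of code. This is a
minimal sketch; the function names tt(matchesAbCd) and tt(containsAbCd) are
hypothetical, and were chosen for this illustration only.

```cpp
#include <regex>
#include <string>

// True if `text`, in its entirety, is matched by the concatenation of the
// atoms [ab] and [cd].
bool matchesAbCd(std::string const &text)
{
    static std::regex const re{ "[ab][cd]" };
    return std::regex_match(text, re);
}

// True if `text` contains a sub-string matched by [ab][cd].
bool containsAbCd(std::string const &text)
{
    static std::regex const re{ "[ab][cd]" };
    return std::regex_search(text, re);
}
```

Here tt(matchesAbCd("ac")) should return tt(true), whereas
tt(matchesAbCd("a:c")) and tt(containsAbCd("a:c")) should both return
tt(false): the colon separates the characters matching the two atoms.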

Atoms may be combined using operators. Operators bind to the preceding
atom. If an operator should operate on multiple atoms the atoms must be
surrounded by parentheses (see the tt((r)) and tt((?:r)) atoms in the first
itemization above).
To use an operator character as an atom it can be escaped. E.g., tt(*)
represents an operator, tt(\*) the atom character star. Note that the
default grammar also recognizes escape sequences inside character classes:
tt([\*]) represents a character class containing a single character: a star.
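
The binding of operators to the preceding atom, the grouping effect of
parentheses, and the escaping of an operator character may be illustrated as
follows. It is a sketch only; tt(fullMatch) is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if all of `text` is matched by `pattern`
// (default ECMAScript grammar).
bool fullMatch(std::string const &text, std::string const &pattern)
{
    return std::regex_match(text, std::regex{ pattern });
}
```

In tt(ab*) the star only operates on tt(b), so tt(fullMatch("abbb", "ab*"))
should return tt(true), while tt(fullMatch("abab", "ab*")) should return
tt(false). After grouping, tt(fullMatch("abab", "(ab)*")) should return
tt(true). With an escaped star, tt(fullMatch("a*", R"(a\*)")) should return
tt(true), as the star is now an atom.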

The following operators are supported (tt(r) and tt(s) represent regular
expression atoms):
itemization(
itt(r*): zero or more tt(r)s;

itt(r+): one or more tt(r)s;

itt(r?): zero or one tt(r) (that is, an optional tt(r));

itt(r{m,n}): where tt(0 <= m <= n): matches `r' at least m, but at most n
times;

itt(r{m,}): where tt(0 <= m): matches `r' at least m times;

itt(r{m}): where tt(0 <= m): matches `r' exactly m times;

itt(r|s): matches either an `r' or an `s'. This operator has a lower
priority than concatenation and than any of the repetition operators
(multipliers);

itt(^r): tt(^) is a pseudo operator. This expression matches `r', if
appearing at the beginning of the target text. Under the default grammar
tt(^) keeps this meaning wherever it appears outside of a character
class. To match a literal tt(^)-character it must be escaped (tt(\^));

itt(r$): tt($) is a pseudo operator. This expression matches `r', if
appearing at the end of the target text. Under the default grammar tt($)
keeps this meaning wherever it appears outside of a character class. To
match a literal tt($)-character it must be escaped (tt(\$));
)
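
The operators listed above can be exercised using tt(std::regex_match) (the
entire target text must match) and tt(std::regex_search) (some part of the
target text must match). The following is a minimal sketch; the helper names
tt(whole) and tt(partial) are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if all of `text` is matched by `pattern`.
bool whole(std::string const &text, std::string const &pattern)
{
    return std::regex_match(text, std::regex{ pattern });
}

// Hypothetical helper: true if some part of `text` is matched by `pattern`.
bool partial(std::string const &text, std::string const &pattern)
{
    return std::regex_search(text, std::regex{ pattern });
}
```

E.g., tt(whole("aaa", "a{2,4}")) should return tt(true), and
tt(whole("aaaaa", "a{2,4}")) should return tt(false). As tt(|) has a low
priority tt(ab|cd) matches tt(ab) or tt(cd), but not tt(abd). The pseudo
operators are best tried using tt(partial): tt(partial("abc", "^ab")) should
return tt(true), while tt(partial("abc", "ab$")) should return tt(false).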

When a regular expression contains marked sub-expressions and multipliers, and
the marked sub-expressions are multiply matched, then the target's final
sub-string matching the marked sub-expression is reported as the text matching
the marked sub-expression. E.g., when using tt(regex_search) (cf. section
ref(REGSRCH)), the regular expression tt(((a|b)+\s?)+), and target text
tt(a a b), then tt(a a b) is the fully matched text, while tt(b) is reported
as the sub-string matching the first and second marked sub-expressions.
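
The reported sub-strings can be inspected by collecting the elements of a
tt(std::smatch) object. The next sketch assumes that the outer marked
sub-expression is itself repeated, i.e., that the pattern is
tt(((a|b)+\s?)+), ending in a tt(+) multiplier. The helper tt(subMatches) is
a hypothetical name, used for this illustration only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the fully matched text (element 0) followed
// by the texts reported for the marked sub-expressions, for the first match
// of `pattern` found in `text`. An empty vector is returned if no match was
// found.
std::vector<std::string> subMatches(std::string const &text,
                                    std::string const &pattern)
{
    std::vector<std::string> ret;
    std::regex const re{ pattern };
    std::smatch results;

    if (std::regex_search(text, results, re))
    {
        for (auto const &sub: results)  // visit all sub_match objects
            ret.push_back(sub.str());
    }
    return ret;
}
```

Calling tt(subMatches("a a b", R"(((a|b)+\s?)+)")) should return three
strings: tt(a a b), tt(b), and tt(b). The final repetition of the outer
sub-expression only matches tt(b), and that is the text reported for both
marked sub-expressions.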