libregex: implement GNU extensions

Description

18a1e2e9: libregex: Implement a subset of the GNU extensions

The entire patch set is not yet mature enough to commit, but this usable
subset is generally enough to keep googletest happy. The extensions mostly
map to existing concepts, so they are not as invasive.

The specific changes included here are:

  • Branching in BREs with \|
  • \w and \W for [[:alnum:]] and [^[:alnum:]] respectively
  • \s and \S for [[:space:]] and [^[:space:]] respectively
  • Additional quantifiers in BREs, \? and \+ (self-explanatory)
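
As a rough illustration of the shorthand mapping in the list above, the
sketch below rewrites \w, \W, \s and \S into the equivalent bracket
expressions. The helper name expand_gnu_classes is hypothetical, and this is
not how libregex implements the shorthands; it only shows the equivalence on
the pattern text. It does not track bracket expressions, so a shorthand that
appears inside [...] would be expanded incorrectly.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustration only (hypothetical helper, not libregex code): copy a
 * pattern from "in" to "out", replacing \w, \W, \s and \S with the
 * bracket expressions they stand for. Every other escape pair, such as
 * \| or \+, is copied through untouched.
 * Returns 0 on success, -1 if "out" is too small.
 */
static int
expand_gnu_classes(const char *in, char *out, size_t outsz)
{
	size_t n, used = 0;

	if (outsz == 0)
		return (-1);

	while (*in != '\0') {
		const char *rep;
		char lit[3] = { 0 };

		if (in[0] == '\\' && in[1] != '\0') {
			switch (in[1]) {
			case 'w': rep = "[[:alnum:]]"; break;
			case 'W': rep = "[^[:alnum:]]"; break;
			case 's': rep = "[[:space:]]"; break;
			case 'S': rep = "[^[:space:]]"; break;
			default:
				/* Not a class shorthand: keep the pair. */
				lit[0] = in[0];
				lit[1] = in[1];
				rep = lit;
				break;
			}
			in += 2;
		} else {
			/* Ordinary character (or a trailing backslash). */
			lit[0] = *in++;
			rep = lit;
		}

		n = strlen(rep);
		if (used + n + 1 > outsz)
			return (-1);
		memcpy(out + used, rep, n);
		used += n;
	}
	out[used] = '\0';
	return (0);
}
```

For example, the BRE a\w\|\S would come out as a[[:alnum:]]\|[^[:space:]],
with the \| branch operator left in place.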

There's some #ifdef'd out work for allowing empty branches as a match-all.
This feature is still under assessment; future work will determine how
standard the behavior is and act accordingly.

61898cde: libregex: disable some of the unimplemented test cases for now

This should allow the tests to actually pass. Future work will re-enable
each disabled test as the feature it covers is implemented.

7518fb34: libc: regex: factor out ISBOW/ISEOW macros

These will be reused for \b (word boundary, which matches both sides).

No functional change.

ca53e5ae: libregex: implement \` and \' (begin-of-subj, end-of-subj)

These are GNU extensions, generally equivalent to ^ and $, except that in a
multi-line subject \` matches only at the very beginning (not at the start
of any later line) and \' matches only at the very end (not at the end of
any earlier line).
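
The difference between the line anchors and the subject anchors can be
sketched as four position predicates. This is a minimal illustration that
assumes newline-sensitive (multi-line) matching for ^ and $; the function
names are hypothetical and are not taken from the regex engine.

```c
#include <stdbool.h>
#include <stddef.h>

/* ^ : start of the subject, or just after a newline. */
static bool
at_bol(const char *s, size_t pos)
{
	return (pos == 0 || s[pos - 1] == '\n');
}

/* $ : end of the subject, or just before a newline. */
static bool
at_eol(const char *s, size_t len, size_t pos)
{
	return (pos == len || s[pos] == '\n');
}

/* \` : only the very start of the subject. */
static bool
at_bos(size_t pos)
{
	return (pos == 0);
}

/* \' : only the very end of the subject. */
static bool
at_eos(size_t len, size_t pos)
{
	return (pos == len);
}
```

With the subject "one\ntwo", offset 4 (the start of "two") satisfies at_bol
but not at_bos, and offset 3 (just before the newline) satisfies at_eol but
not at_eos.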

6b986646: libregex: implement \b and \B (word boundary, not word boundary)

This is the last of the needed GNU extensions before we can unleash bsdgrep
by default. \b is effectively a direction-agnostic equivalent of \< and \>:
it matches at either the start or the end of a word. \B matches every
position that is not a transition from non-word character to word character
or from word character to non-word character.
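
The relationship between \<, \>, \b and \B can be sketched with a few
position predicates. The names is_bow and is_eow are hypothetical and only
loosely modeled on the ISBOW/ISEOW macros mentioned earlier; treating
alphanumerics and underscore as word characters is an assumption of this
sketch, not something stated in the commit message.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a word character is alphanumeric or an underscore. */
static bool
is_word_char(unsigned char c)
{
	return (isalnum(c) || c == '_');
}

/* \< : non-word (or subject start) on the left, word on the right. */
static bool
is_bow(const char *s, size_t len, size_t pos)
{
	bool prev = pos > 0 && is_word_char((unsigned char)s[pos - 1]);
	bool next = pos < len && is_word_char((unsigned char)s[pos]);

	return (!prev && next);
}

/* \> : word on the left, non-word (or subject end) on the right. */
static bool
is_eow(const char *s, size_t len, size_t pos)
{
	bool prev = pos > 0 && is_word_char((unsigned char)s[pos - 1]);
	bool next = pos < len && is_word_char((unsigned char)s[pos]);

	return (prev && !next);
}

/* \b : either kind of transition. */
static bool
matches_b(const char *s, size_t len, size_t pos)
{
	return (is_bow(s, len, pos) || is_eow(s, len, pos));
}

/* \B : any position with no transition at all. */
static bool
matches_B(const char *s, size_t len, size_t pos)
{
	return (!matches_b(s, len, pos));
}
```

In the subject "foo bar", offsets 0, 3, 4 and 7 are word boundaries under
these assumptions, while offset 1 (between "f" and "o") is a \B position.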

4afa7dd6: libc: regex: retire internal EMPTBR ("Empty branch present")

It was realized just a little too late that this was a hack that belongs in
individual regex(3)-using applications. It was guarded by NOTYET and never
implemented in the engine, so remove it.

4f1efa30: libc: regex: partial revert of r368358 (6b986646)

MFC NOTE: Altered to match the legacy behavior of a\bc => abc.

Part of the libregex functionality leaked into the tests it shares with
the standard regex(3). Introduce a P test flag that sets the REG_POSIX
cflag: libc regex effectively does nothing with it, while libregex takes it
as a request to run the test in non-extended mode.

This unbreaks the libc/regex test run.

(cherry picked from commit 18a1e2e9b9f109a78c5a9274e4cfb4777801b4fb)
(cherry picked from commit 61898cde69374d5a9994e2074605bc4101aff72d)
(cherry picked from commit 7518fb346fe9603f99d2406a073b30fb8e4a270c)
(cherry picked from commit ca53e5aedfebcc1b4091b68e01b2d5cae923f85e)
(cherry picked from commit 6b986646d434baa21ae3d74d6a662ad206c7ddbd)
(cherry picked from commit 4afa7dd61a3a1454a5b3cf5e6de2029c7e2d9a84)
(cherry picked from commit 4f1efa309ca48a088595dd57969ae6a397dd49d1)

Details

Provenance
kevans authored on Aug 4 2020, 2:14 AM
Parents
rG70233fc21258: regex(3): Interpret many escaped ordinary characters as EESCAPE