regex: mixed sets are misidentified as singletons
Closed, Public

Authored by yuripv on Dec 21 2023, 3:57 AM.

Details

Summary

Fix by Bill Sommerfeld <sommerfeld@hamachi.org>

Commit message (and description here) copied from Bill's RTI mail.

Fix "singleton" function used by regcomp() to turn character set matches
into exact character matches if a character set has exactly one
element.

The underlying cset representation is complex; most critically, it
records "small" characters (codepoint less than either 128 or 256,
depending on locale) in a bit vector, and "wide" characters in a
secondary array.

Unfortunately, the "singleton" function used to identify singleton sets
treated a cset as a singleton if either the "small" or the "wide" set
had exactly one element (it would then ignore the other set).
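The difference between the two checks can be sketched with a toy model. This is not the libc code: `struct toy_cset`, `toy_add`, `singleton_buggy` and `singleton_fixed` are hypothetical names, the 128 threshold assumes a UTF-8 locale, and the real cset also carries ranges, character classes and inversion, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SMALL_MAX 128   /* assumed "small" limit for a UTF-8 locale */
#define TOY_WIDE_MAX  8     /* arbitrary capacity for this sketch */

/* Toy stand-in for a cset: a bit vector plus a secondary array. */
struct toy_cset {
    bool     small[TOY_SMALL_MAX];
    uint32_t wides[TOY_WIDE_MAX];
    size_t   nwides;
};

static void
toy_add(struct toy_cset *cs, uint32_t ch)
{
    if (ch < TOY_SMALL_MAX)
        cs->small[ch] = true;
    else if (cs->nwides < TOY_WIDE_MAX)
        cs->wides[cs->nwides++] = ch;
}

/* Count the "small" members and remember the first one seen. */
static void
toy_scan_small(const struct toy_cset *cs, size_t *n, long *first)
{
    *n = 0;
    *first = -1;
    for (size_t i = 0; i < TOY_SMALL_MAX; i++) {
        if (cs->small[i]) {
            if (*n == 0)
                *first = (long)i;
            (*n)++;
        }
    }
}

/* Buggy shape: one element in EITHER half counts as a singleton. */
static long
singleton_buggy(const struct toy_cset *cs)
{
    size_t n;
    long first;

    toy_scan_small(cs, &n, &first);
    if (n == 1)
        return (first);
    if (cs->nwides == 1)
        return ((long)cs->wides[0]);
    return (-1);
}

/* Fixed shape: exactly one element across BOTH halves. */
static long
singleton_fixed(const struct toy_cset *cs)
{
    size_t n;
    long first;

    toy_scan_small(cs, &n, &first);
    if (n == 1 && cs->nwides == 0)
        return (first);
    if (n == 0 && cs->nwides == 1)
        return ((long)cs->wides[0]);
    return (-1);
}

/* Models the set [abà]: two small members and one wide member. */
static void
toy_demo(void)
{
    struct toy_cset cs = { 0 };

    toy_add(&cs, 'a');
    toy_add(&cs, 'b');
    toy_add(&cs, 0xE0);                     /* U+00E0, "à" */
    assert(singleton_buggy(&cs) == 0xE0);   /* wrongly a singleton */
    assert(singleton_fixed(&cs) == -1);     /* correctly not one */
}
```

In this model the buggy check reduces the three-element set to the single character U+00E0, which is why a plain "a" would no longer match.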

The easiest way to demonstrate this bug:

        $ export LANG=C.UTF-8
        $ echo 'a' | grep '[abà]'

It should match (and print "a"), but instead it does not match because
the set is misidentified as a singleton containing only the accented
character.
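The same check can be made directly against the POSIX regex API rather than through grep. The sketch below is only an illustration: `set_matches` is a hypothetical helper, and whether the "C.UTF-8" locale name is available depends on the system.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if text matches pattern, 0 if it does
 * not, and -1 if the pattern fails to compile.
 */
static int
set_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0 ? 1 : 0);
}
```

A caller would first select the locale with setlocale(LC_CTYPE, "C.UTF-8") and then call set_matches("[ab\xc3\xa0]", "a"), where the escaped bytes are the UTF-8 encoding of "à". With the bug described above the call would be expected to return 0; with the fix it should return 1.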

Obtained from:  illumos
Test Plan

See added test case (converted).

Other libc regex tests pass.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kevans added inline comments.
lib/libc/tests/regex/multibyte.sh
68

This won't work as intended, unfortunately; now that it's in a pipe, atf_check can't fail + exit properly (it can fail, but later passes will override that)

This revision is now accepted and ready to land. Dec 21 2023, 4:33 AM
kevans requested changes to this revision. Dec 21 2023, 4:34 AM
This revision now requires changes to proceed. Dec 21 2023, 4:34 AM
yuripv added inline comments.
lib/libc/tests/regex/multibyte.sh
68

Yep, it produces a cryptic "failed: Test case body returned a non-ok exit code, but this is not allowed", and it looks like the previous test cases in this file need to be fixed as well...

yuripv marked an inline comment as done.

Rework test case

This revision is now accepted and ready to land. Dec 21 2023, 6:20 PM