Page MenuHomeFreeBSD

libc: Add `strverscmp(3)` & `versionsort(3)`
ClosedPublic

Authored by obiwac_gmail.com on Jul 13 2022, 8:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 27, 6:34 PM
Unknown Object (File)
Tue, Dec 24, 3:07 PM
Unknown Object (File)
Sun, Dec 22, 9:00 PM
Unknown Object (File)
Sat, Dec 21, 1:25 PM
Unknown Object (File)
Sun, Dec 15, 2:21 AM
Unknown Object (File)
Dec 11 2024, 8:49 AM
Unknown Object (File)
Dec 10 2024, 5:25 AM
Unknown Object (File)
Dec 1 2024, 7:02 PM

Details

Summary

Add a strverscmp(3) function to libc, a GNU extension I implemented by reading its glibc manual page.
It orders strings following a much more natural ordering (e.g. "ent1 < ent2 < ent10" as opposed to "ent1 < ent10 < ent2" with strcmp(3)'s lexicographic ordering).

Also add versionsort(3) for use as scandir(3)'s compar argument.

Update manual page for scandir(3) and add one for strverscmp(3).

Add tests for strverscmp(3).

Original diff: D35783

Test Plan

Tests pass on my side.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
pauamma_gundo.com added inline comments.
lib/libc/gen/scandir.3
116–117

"naturally" doesn't explain anything to me.

lib/libc/string/strverscmp.3
39
42

No contractions.

lib/libc/string/strverscmp.c
27

Does the original use isdigit or isnumber?

This revision now requires changes to proceed.Jul 15 2022, 7:52 AM
lib/libc/gen/scandir.3
116–117

Indeed "naturally" isn't very descriptive. I'm not sure I understand

in a combined numeric (for base-10 digit sequences) and text (for other characters) using strverscmp(3)

though.

lib/libc/string/strverscmp.3
42

Alright

lib/libc/string/strverscmp.c
27

It uses isdigit. strverscmp is analog to strcmp and not strcoll, so it wouldn't make much sense to use isnumber I believe.

lib/libc/gen/scandir.3
116–117

But "natural sort order" is an established term!

Remove contraction in manual page

lib/libc/gen/scandir.3
116–117

If it is, I never got the memo.

116–117

I was trying to convey in a single sentence that fragments made of digits are sorted numerically (with the extra wrinkle for those starting with 0) and fragments made of text without digits are sorted as strings, with the earliest differing fragment pair determining the overall order.

And I just thought of something; given 2 strings foo123456 and foobar789, are the first fragment pairs compared:

  • "foo" and "foobar" (string comparison), with 123456 and 789 never compared

or

  • "foo123" and "foobar" (string comparison), with 456 and 789 never compared?

This may matter depending on the relative ordering of letters and digits, eg EBCDIC or for future-proofing if this is changed in 10 years to accept all Unicode digits.

lib/libc/string/strverscmp.c
24

Style: test non-booleans against values. In this case: *u1 != '\0' etc.

78

If I'm not missing something, there is a possible optimization here: digit_count_1 is always equal to u1 - num_1, so there should be no need for the variable and the instruction to increment it. Same applies to digit_count_2.

lib/libc/gen/scandir.3
116–117

And I just thought of something; given 2 strings foo123456 and foobar789, are the first fragment pairs compared

Both strings are compared byte-by-byte, so it would go up to "foo1" and "foob" before returning.

This may matter depending on the relative ordering of letters and digits, eg EBCDIC or for future-proofing if this is changed in 10 years to accept all Unicode digits.

Again, strverscmp is analog to strcmp, not strcoll, so I don't think that matters.

lib/libc/string/strverscmp.c
24

Got it

78

You're not missing anything. Although I agree it's more readable without the extra variables, looking at the output of both GCC & Clang, the unnecessary increment is optimized away anyway. Modern compilers are wonderful aren't they 😄

This also applies to zero_count_#; I'll change that shortly.

Simplify digit counting code & style

I accept this revision. I don't mind declarations after statements, but others might.

Hey guys, is there anything here that I still need to do?

Hey guys, is there anything here that I still need to do?

My remaining 2 concerns/questions;

Hey guys, is there anything here that I still need to do?

No, it just needs someone with a commit bit to do the commit. I've been busy recently.

In D35807#816089, @pauamma wrote:

IMHO, at that point, the user should just read the strverscmp(3) manpage.
Explaining what strverscmp(3) does in scandir(3) in a precise manner would clutter things up and give quite a bit of redundant information I feel.
And since "natural sort order" is indeed an established term as @pstef mentioned, I personally think it's better as it currently is.

No, it just needs someone with a commit bit to do the commit. I've been busy recently.

Okay, sorry, no pressure. It's my first time using Phabricator - was just wondering if I had missed something!

gbe added a subscriber: gbe.

LGTM for the man page changes.

Hey guys, is there anything here that I still need to do?

While @pstef is busy, I suggest you can have a post on -hackers mailing list to call for more reviews and tests.

Hey guys, is there anything here that I still need to do?

Sorry for the delay. In the meantime style(9) was clarified to forbid declarations following statements, the rule that this patch breaks.
Also the commit message should be in the imperative form: "add", "update" instead of "adds", "updates".
I can fix both of those - or do you want to do it yourself and update this review?

I can fix both of those - or do you want to do it yourself and update this review?

I can do it myself this evening, if that doesn't slow you down for anything else. (Aswell as making that post on -hackers that @lwhsu mentioned, if that's still necessary.)

I can do it myself this evening, if that doesn't slow you down for anything else. (Aswell as making that post on -hackers that @lwhsu mentioned, if that's still necessary.)

Please go ahead.
As for the post on -hackers, I don't think it ever was necessary, but it's not a bad idea either.

  • strverscmp: Respect new rules in style(9)

Please go ahead.

Done :) (All should be okay, but I still need to check if everything compiles 100%.)

Writing post on -hackers...

strverscmp: Respect new rules in style(9)

kib added inline comments.
lib/libc/gen/scandir.c
198

return (...);

lib/libc/string/strverscmp.c
15

Extra blank line

17

Extra blank line

17

Extra blank line

18

No space needed after ). But there must be a space between void and *.

Why not use __DECONST() instead of explicit void cast?

22

s1 == s2 is not aliasing, it is identity

25

Extra blank line

27

return (0);

48

This is a block comment which scope, as I see, extends to the end of u2 loop. Therefore, there is no need for a blank lines 51 and 54.

77

Extra blank line, same comment as above.

83

';' should be on the new line, indented

96

Extra blank line

I'll make those changes a little later, thank you :)

lib/libc/string/strverscmp.c
18

The void cast is simply to cast to unsigned char without having to explicitly type out the type name. I'm not sure I see why __DECONST is apt here?

Where can I find the rules for vertical spacing? I can't find it explicitly in style(9)

Where can I find the rules for vertical spacing? I can't find it explicitly in style(9)

Do you mean indentation? style(9) says "Indentation is an 8 character tab. Second level indents are four spaces." and where the rule isn't stated explicitly, it is shown by example.

Do you mean indentation?

Nah I meant vertical spacing as in where I may or may not add blank lines, as @kib was talking about, but thank you :)

Do you mean indentation?

Nah I meant vertical spacing as in where I may or may not add blank lines, as @kib was talking about, but thank you :)

Yes, it is implicit in examples, and in the existing code.
Not to mention that excess blank lines stole the screen space in editor.

lib/libc/string/strverscmp.c
18

__DECONST is somewhat unfortunate name for the macro. De-facto contract for it is to provide the type cast without triggering any warnings from any possible compiler warning/error flags

DECONST/DEVOLATILE/__DEQUALIFY are about the same. We guarantee to maintain them regardless of the insanity level compiler authors go for.

Replace casts with __DECONST & remove some blank lines

Does spacing like this work for you @kib? Hopefully I'll get the feeling for where blank spaces are/aren't appropriate in FreeBSD code in future patches 😄

lib/libc/string/strverscmp.c
16

Did you tried to compile this?

lib/libc/tests/string/strverscmp_test.c
72

Indent should be +4 spaces for the continuation line.

I still haven't tried compiling anything yet; I don't have access to a FreeBSD machine for the moment.

Stupid mistake in usage of __DECONST

scandir: Add man link from versionsort.3

Could you please send me the complete patch in the form of git format-patch by mail? With the author set, proper commit message etc. This would expedite the commit.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 25 2022, 12:29 AM
This revision was automatically updated to reflect the committed changes.