Page MenuHomeFreeBSD

ls: `-v` flag (and `strverscmp(3)` & `versionsort(3)`)
AbandonedPublic

Authored by obiwac_gmail.com on Jul 12 2022, 2:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 1:07 PM
Unknown Object (File)
Mon, Dec 30, 10:22 AM
Unknown Object (File)
Dec 7 2024, 9:02 PM
Unknown Object (File)
Nov 21 2024, 12:31 PM
Unknown Object (File)
Oct 16 2024, 8:04 PM
Unknown Object (File)
Oct 11 2024, 1:13 AM
Unknown Object (File)
Oct 10 2024, 10:13 PM
Unknown Object (File)
Sep 24 2024, 9:25 AM

Details

Reviewers
pauamma_gundo.com
Group Reviewers
manpages
Summary

Adds a -v flag for ls(1), which sorts entries following a more natural ordering, rather than lexicographically ordering as it usually does (e.g. "ent1 ent2 ent10" as opposed to "ent1 ent10 ent2").

This is done through a strverscmp(3) function in libc, a GNU extension which I implemented by reading the glibc manual page for it.
I took the opportunity to also add a versionsort(3) function for scandir(3).

Manual pages updated for ls(1) & scandir(3), and adds a new manual page for strverscmp(3).

Adds tests for ls -v and strverscmp.

Test Plan

Tested through the tests I wrote, and, AFAICT, everything is working as it should.
A very thorough review is probably required, as I'm not too sure to have done everything the most "correct" way, but as it stands I don't think I missed anything.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 46381
Build 43270: arc lint + arc unit

Event Timeline

I think this is a good idea. I know that strverscmp(3) is learnt from gnu extension, but since this is on FreeBSD, maybe we need to check its behavior is compatible with -t in pkg-version(8)? I think the related code is at https://github.com/freebsd/pkg/blob/master/src/version.c

lib/libc/string/strverscmp.3
13

I think "REGENTS" is not suitable here, not most place uses "AUTHOR", you might want to check the copyright text like what in src/COPYRIGHT.

sys/sys/exec.h
94

I think those are sneaked changes?

Looks interesting. However, personally I would like to see these two ideas (the new libc function and the new ls(1) option) have their separate reviews and ultimately commits. See also my inline comments.

lib/libc/string/strverscmp.c
38

Please see man 9 style how asterisks are supposed to be placed.

39

I don't think we allow BCPL-style comments yet.

lib/libc/tests/string/strverscmp_test.c
5

We're in the process of removing "All rights reserved", please don't add new ones.

since this is on FreeBSD, maybe we need to check its behavior is the compatible with -t in pkg-version(8)?

pkg version -t a b and strverscmp(a, b) look like they work very differently. For reference, it uses pkg_version_cmp, which can be found here.

I would like to see these two ideas (the new libc function and the new ls(1) option) have their separate reviews and ultimately commits.

Alright, should I also separate strverscmp from versionsort while I'm at it? I'll create the two new diffs in a little moment :)

lib/libc/string/strverscmp.3
13

Ah, right, I blindly copied the license text from strcmp, that's my bad.

lib/libc/tests/string/strverscmp_test.c
5

Sure. It may be helpful to update style(9) to say so then, perhaps?

sys/sys/exec.h
94

My editor automatically strips trailing whitespaces and I didn't think was super necessary to remove that from the diff :)

I would like to see these two ideas (the new libc function and the new ls(1) option) have their separate reviews and ultimately commits.

Alright, should I also separate strverscmp from versionsort while I'm at it? I'll create the two new diffs in a little moment :)

I don't think so, personally I think just two reviews would be perfect for this diff.

I implemented something similar to strverscmp() several years back, but in a much simpler (though potentially slower?) way, by leveraging strtol():

int
strnumcmp(
    const char *s1,
    const char *s2)
{
    unsigned int n1 = 0;
    unsigned int n2 = 0;

    while (*s1 == *s2++) {
        if (*s1++ == 0) {
            return 0;
        }
        if (isdigit(*s1) && isdigit(*s2)) {
            /* We have found number strings, so convert the strings to
             * numbers and compare numerically.
             */
            n1 = (unsigned int) strtol(s1, NULL, 10);
            n2 = (unsigned int) strtol(s2, NULL, 10);
            if (n1 != n2)
                return (int) (n1 - n2);
        }
    }

    return (*s1 - *(s2 - 1));
}

The while, the test against 0, and the final return are all cribbed directly from strcmp(); things only get different if we start seeing digits; in that case, we compare the number composed of those digits; if the digits are the same, then we continue on with normal character comparison again.

Just throwing that out there for consideration.

I don't think yours is slower per se, but without testing I do think the behaviour is slightly different for fractional numbers. There's also the issue of strtol potentially overflowing, but that could easily be remedied.

I can't check if it passes the tests until tomorrow :)

So despite the number of nits I commented on, I think this may be close. It has tests, it seems to implement the right ordering, the code seems good on the whole apart from the style nits identified.

lib/libc/string/strverscmp.c
4

Is this a company or a person? In any event, it should end with a '.'

5

You don't need this.

26

Since this is boilerplate, I'd consider just using the SPDX identifier with a copyright.

29–34

Delete all of this. It's not needed. strverscmp.c never was in the CSRG SCCS tree and we don't put $FreeBSD$ in new code.

39

The preferred style is /* .. */ still.

82

We put the } and else on the same line.
I'd also move the comment to inside the if (and as with all the C++ style comments, change them to /* */

91

These should have the n1++ and n2++ on a separate line.

98

Make this a block comment.

101

comma operator is confusing and buys nothing, imho. Just do two assignments.

109

ditto.

lib/libc/tests/string/strverscmp_test.c
5

So stlye(9) doesn't have any all rights reserved in it today though... It's by example and never uses it as an example.
Also, I'd just use the SPDX identifier with a copyright.... It's a policy that's being finalized now.
https://www.freebsd.org/internal/software-license/ spells out the All Rights Reserved thing.

35

{ on a separate line

sys/sys/exec.h
94

Yea, but you didn't change anything else in exec.h, so please drop it from future diffs :)

Thanks for all the feedback!

lib/libc/string/strverscmp.c
4

Just my username. Is it a problem like this?

lib/libc/string/strverscmp.c
4

The project has a strong preference for this to be either an actual person, or a company.
If that's not possible, the preference is to use a well-established pseudonym that has had a longer term presence in the community.
Just a username from a gmail account likely would meet resistance from core@ who would need to grant an exception.

pauamma_gundo.com added inline comments.
bin/ls/ls.1
402–407

I'd explain here instead, perhaps with examples. Sending the reader to library manual pages they may not be able to puzzle out easily or at all will only discourage them.

lib/libc/gen/scandir.3
129

I'd mention here that strverscmp came in 14.0 (

lib/libc/string/strverscmp.3
63

Am I missing something? Lexicographic order would be "10", "420", "9", so different but not the opposite.

67

Is that true even without an intervening decimal point?

This revision now requires changes to proceed.Jul 13 2022, 10:14 PM

Thank you for the review :)

As per pstef's feedback though, I separated out the libc stuff in a separate diff you might wanna see: D35807

bin/ls/ls.1
402–407

Gotcha, that makes sense

lib/libc/string/strverscmp.3
63

You're not missing anything, mistake on my end

67

Yup. It reflects the behaviour of the glibc strverscmp. I'll specify things here a little more explicitly in the next revision perhaps :)