Page MenuHomeFreeBSD

Mk/Scripts/do-users-groups.sh: Make users and groups creation fail-fast
ClosedPublic

Authored by michaelo on Nov 22 2023, 11:34 AM.
Tags
None
Referenced Files
F108493005: D42719.id137030.diff
Sat, Jan 25, 1:23 PM
Unknown Object (File)
Thu, Jan 16, 1:00 AM
Unknown Object (File)
Sat, Jan 11, 3:55 PM
Unknown Object (File)
Nov 27 2024, 7:56 AM
Unknown Object (File)
Nov 13 2024, 12:31 AM
Unknown Object (File)
Oct 24 2024, 12:31 PM
Unknown Object (File)
Oct 24 2024, 12:30 PM
Unknown Object (File)
Oct 23 2024, 11:32 AM
Subscribers

Details

Summary

Fail fast when pw(8) fails to create a user or a group. Especially when it is
not the last command in the pre-install script then the exit code will be 0 and
the failure will go unnoticed.

Also trim/cleanup message endings.

PR: 267384
Approved by: jrm (mentor), otis (mentor), mat, mandree, portmgr

Diff Detail

Repository
R11 FreeBSD ports repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 57103
Build 53991: arc lint + arc unit

Event Timeline

michaelo created this revision.

Would be nice to get a review here as well.

Does this (set -e) have any consequences on existing flow?

In D42719#978591, @otis wrote:

Does this (set -e) have any consequences on existing flow?

I'd say yes. I will fail immediately (fail-fast) when a command exits with non-zero. That's the point. Better be fail than be sorry. My initial problem was a bad NIS setup which I cannot fix at the moment, but the currect create script did not fail, thus left me with a halfbroken installation. I prefer no installation instead of a broken one. I think this should also be on 14 and 13 since it will improve realibility and will affect only a few users.

In D42719#978591, @otis wrote:

Does this (set -e) have any consequences on existing flow?

I'd say yes. I will fail immediately (fail-fast) when a command exits with non-zero. That's the point. Better be fail than be sorry. My initial problem was a bad NIS setup which I cannot fix at the moment, but the currect create script did not fail, thus left me with a halfbroken installation. I prefer no installation instead of a broken one. I think this should also be on 14 and 13 since it will improve realibility and will affect only a few users.

I meant, does this have any consequences on existing flow in case without fail? If not, then LGTM.

In D42719#978626, @otis wrote:
In D42719#978591, @otis wrote:

Does this (set -e) have any consequences on existing flow?

I'd say yes. I will fail immediately (fail-fast) when a command exits with non-zero. That's the point. Better be fail than be sorry. My initial problem was a bad NIS setup which I cannot fix at the moment, but the currect create script did not fail, thus left me with a halfbroken installation. I prefer no installation instead of a broken one. I think this should also be on 14 and 13 since it will improve realibility and will affect only a few users.

I meant, does this have any consequences on existing flow in case without fail? If not, then LGTM.

No consequences if exit code for every single command is zero.

In D42719#978626, @otis wrote:
In D42719#978591, @otis wrote:

Does this (set -e) have any consequences on existing flow?

I'd say yes. I will fail immediately (fail-fast) when a command exits with non-zero. That's the point. Better be fail than be sorry. My initial problem was a bad NIS setup which I cannot fix at the moment, but the currect create script did not fail, thus left me with a halfbroken installation. I prefer no installation instead of a broken one. I think this should also be on 14 and 13 since it will improve realibility and will affect only a few users.

I meant, does this have any consequences on existing flow in case without fail? If not, then LGTM.

Please formally approve if this is OK. will give other some more time to review.

rene added inline comments.
Mk/Scripts/do-users-groups.sh
79

As an alternative to the set -e use \${PW} ... || error "oops" here

Mk/Scripts/do-users-groups.sh
79

True, but I have done that intentionally not to repeat the spot in the code and it was a nice tip from @bapt.

I lean towrads @rene's solution

Mk/Scripts/do-users-groups.sh
79

FWIW, over half of the shell scripts under Mk/Scripts/ set this option.

% grep -l 'set -e' /usr/ports/Mk/Scripts/*.sh | wc -l
      16
ls -l /usr/ports/Mk/Scripts/*.sh | wc -l
      25

despite the warning in sh(1)

-e errexit
        Exit immediately if any untested command fails in non-interactive
        mode.  The exit status of a command is considered to be
        explicitly tested if the command is part of the list used to
        control an if, elif, while, or until; if the command is the left
        hand operand of an “&&” or “||” operator; or if the command is a
        pipeline preceded by the ! keyword.  If a shell function is
        executed and its exit status is explicitly tested, all commands
        of the function are considered to be tested as well.

        It is recommended to check for failures explicitly instead of
        relying on -e because it tends to behave in unexpected ways,
        particularly in larger scripts.

Given that warning, it sounds like an explicit check, as @rene suggests, is preferred. How about both setting the option and adding an explicit check?

I'll pick this up again next year.

Mk/Scripts/do-users-groups.sh
79

error cannot be used in the generated script because it is a custom function on this shell script and uses import variables. One needs to duplicate all of that code. I thinkt hat set -e is a good compromise here. OR we use || exit $?

Mk/Scripts/do-users-groups.sh
79

@rene, @jrm what do you think about this?

Mk/Scripts/do-users-groups.sh
79

I think || exit $? is safer because it's specifically targeting the problem you want to address. However, if you are prepared to deal with any fallout from set -e, and someone from portmgr@ approves that change, it seems reasonable for the reasons I mentioned above.

Mk/Scripts/do-users-groups.sh
79

Makes sense, let's ping @portmgr to see what they think.

Rebased and tested again with poudriere-built packages:

root@132-release-amd64-default-head:/usr/ports/devel/git # grep 964 /etc/passwd /etc/group
/etc/passwd:foo:*:964:964:foo daemon:/nonexistent:/usr/sbin/nologin
/etc/group:foo:*:964:

Before:

root@132-release-amd64-default-head:/tmp/pkgs # pkg add git-2.44.0.pkg
[132-release-amd64-default-head] Installing git-2.44.0...
===> Creating groups
Creating group 'git_daemon' with gid '964'
pw: gid `964' has already been allocated
===> Creating users
Creating user 'git_daemon' with uid '964'
pw: uid `964' has already been allocated
pkg: PRE-INSTALL script failed

After:

root@132-release-amd64-default-head:/tmp/pkgs # pkg add git-2.44.0.pkg
[132-release-amd64-default-head] Installing git-2.44.0...
===> Creating groups
Creating group 'git_daemon' with gid '964'
pw: gid `964' has already been allocated
pkg: PRE-INSTALL script failed

with:

$ ypcat passwd | grep 964
krenek_k:XXX:964:121:krenek,,,:/net/home/krenek_k:/bin/sh.user

If the || exit $? version is preferred I will try it, no issue.

Tried || exit $? locally, same result:

[132-release-amd64-default-head] Installing git-2.44.0...
===> Creating groups
Creating group 'git_daemon' with gid '964'
pw: gid `964' has already been allocated
pkg: PRE-INSTALL script failed

Failed to install the following 1 package(s): git-2.44.0.pkg
patch
diff --git a/Mk/Scripts/do-users-groups.sh b/Mk/Scripts/do-users-groups.sh
index eb5e43681e45..fdaad401de72 100644
--- a/Mk/Scripts/do-users-groups.sh
+++ b/Mk/Scripts/do-users-groups.sh
@@ -49,2 +48,0 @@ cp -f "${dp_UG_INSTALL}" "${dp_UG_DEINSTALL}"
-echo "set -e" >> "${dp_UG_INSTALL}"
-
@@ -81 +79 @@ if [ -n "${GROUPS}" ]; then
-             \${PW} groupadd $group -g $gid
+             \${PW} groupadd $group -g $gid || exit \$?
@@ -134 +132 @@ if [ -n "${USERS}" ]; then
-             \${PW} useradd $login -u $uid -g $gid $class -c "$gecos" -d $homedir -s $shell
+             \${PW} useradd $login -u $uid -g $gid $class -c "$gecos" -d $homedir -s $shell || exit \$?
@@ -190 +188 @@ if [ -n "${GROUPS}" ]; then
-                         \${PW} groupmod ${group} -m ${login}
+                         \${PW} groupmod ${group} -m ${login} || exit \$?

Which one do you prefer?

@michaelo, thanks for sticking with this review. My preference is still to go with the || exit \$? with or without the set -e.

In D42719#1020644, @jrm wrote:

@michaelo, thanks for sticking with this review. My preference is still to go with the || exit \$? with or without the set -e.

@jrm I have not understood the motivation to avoid set -e. My standard is to "set -eu" to trap maintainer oversights, which happen quickly. And possibly even "set -o pipefail" if I know the shell supports it (if bash has been ensured some way).
My clear preference is set -e and stop polluting the code with || exit and what not. You want robust scripts? Then you want error checking EVERYWHERE. I abhor scripts that run past errors and pretend they had completed and possibly even leave exit code 0...

add one inline comment

Mk/Scripts/do-users-groups.sh
79

It is recommended to check for failures explicitly instead of relying on -e because it tends to behave in unexpected ways, particularly in larger scripts.

Looks like spreading Fear, Uncertainty, Doubt to me. There is no clear, objective, factual statement as to what happens. Seems this is from someone who got hurt just slapping set -e on a grown collection of the script and generalized that anectodatal evidence.

I see https://cgit.freebsd.org/src/commit/bin/sh/sh.1?id=b14cfdf665bb8b7b2898a4ee5b073ab87f8ea3d0 that brought these changes one decade ago, but unfortunately, the commit log does not contain proper evidence, PR references or anything. IMO this paragraph should be ditched.

In D42719#1020644, @jrm wrote:

@michaelo, thanks for sticking with this review. My preference is still to go with the || exit \$? with or without the set -e.

@jrm I have not understood the motivation to avoid set -e. My standard is to "set -eu" to trap maintainer oversights, which happen quickly. And possibly even "set -o pipefail" if I know the shell supports it (if bash has been ensured some way).
My clear preference is set -e and stop polluting the code with || exit and what not. You want robust scripts? Then you want error checking EVERYWHERE. I abhor scripts that run past errors and pretend they had completed and possibly even leave exit code 0...

@mandree, as I say in my earlier comments, I'm not against set -e. However, to solve the particular problem that Michael is trying to address, I feel an explicit check is warranted rather than simply relying on set -e. Also, when set -e has been set, there has been fallout, but as I said above, if @michaleo is prepared to deal with any potential fallout, then it's fine to *also* add set -e.

To elaborate, I felt that specifically checking for the issue at hand is unlikely to introduce any new problems, but adding set -e, which, of course, affects the whole script is more likely. That said, since @michael is prepared to take on any new problems, I'm fine either way.

In D42719#1021272, @jrm wrote:

To elaborate, I felt that specifically checking for the issue at hand is unlikely to introduce any new problems, but adding set -e, which, of course, affects the whole script is more likely. That said, since @michael is prepared to take on any new problems, I'm fine either way.

As fallout only legit ones and not those ones I try to find/fix like ours here: assiging uids to humans < 1000. That is the reason why I have noticed and took upon this.

So which route should I go @mandree? set or exit?

Since I haven't had my hands on the script, nor am I volunteering to become its maintainer, I am not going to claim reponsibility for it. I think my stance is clear, "automatically fail everywhere" and handle those few expected errors, and let's remove @jilles's manpage addition (that scares users away from set -e in the sh(1) man page) because it's not substantiated. manpages aren't the place to scare people without giving reasons. We're early enough after a freshly forked quarterly release to clean up fallout over the next ten weeks, so... perhaps that fact makes your decision easier.

This revision is now accepted and ready to land.May 2 2024, 8:36 PM

So I merged now the explicit exits. We have two months until next quarterly for failures. As @mandree said, enough time. I will come up with a review for the manpage as well.