Page MenuHomeFreeBSD

git-arc: Trap on every mktemp
ClosedPublic

Authored by jlduran on Oct 25 2024, 11:21 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Dec 21, 3:02 PM
Unknown Object (File)
Thu, Dec 12, 1:58 AM
Unknown Object (File)
Mon, Dec 9, 5:20 AM
Unknown Object (File)
Nov 23 2024, 8:09 AM
Unknown Object (File)
Nov 21 2024, 11:31 PM
Unknown Object (File)
Nov 20 2024, 11:23 PM
Unknown Object (File)
Nov 18 2024, 9:42 AM
Unknown Object (File)
Nov 15 2024, 10:25 PM
Subscribers

Details

Summary

Trap:

  • EXIT (0)
  • HUP (1)
  • INT (2)
  • QUIT (3)
  • TRAP (5)
  • USR1 (10)
  • TERM (15)

every time mktemp is called to reduce the chances of leaving stray files
or directories with possible sensitive data inside.

We avoid using a template with mktemp, as some operating systems may use
unpredictable base paths by default (macOS).

Suggested by: des

Test Plan

I'm not familiar with all the recommended traps[1]. Also I prefer to
list them by name. If the number is preferred, I'll change them without
hesitation. I changed the visibility of the local file/dir variables so
the function called by trap can catch it, again if any other approach is
preferred, just let me know.

I have updated/rebased the branch thatalso include D46789 and D46804:

https://github.com/jlduran/freebsd-src/commits/fix-git-arc-consolidated/

[1]: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_29

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

jlduran edited the test plan for this revision. (Show Details)
tools/tools/git/git-arc.sh
232

Why remove this? The variable is still used.

798

How does this work when rm_list has more than one element? It looks like it shouldn't be quoted.

Note to self: (Feature requeAdd a -T option to pass a test plan, similar to arcgit's.

tools/tools/git/git-arc.sh
232

Making all the temp files and tempdirs global, so they can be cleaned up.

798

Sorry, this is wrong... wrong diff... I'll update

Sorry, I originally had one revision where the temp file/directory was passed to the cleanup() function. The previously submitted one was a non-working hybrid.

With this approach, the locals for tmp, msg, and tmp_data_dir, are removed, effectively making these variables globals, so they can be evaluated by cleanup() upon trap/exit.

I think this is more readable, but you'll know better.

I'm also not sure about the name SIGUSR1 (is it 10?) .

  • Minor iteration

And a more detailed explanation (I hope):
It was suggested in D46767 to use a trap to remove the temporary files created with mktemp. This is almost always considered a good practice, as it reduces the chances of leaving stray files, which may contain sensitive information.
The suggested line was:

trap 'rc=$?; rm -f "$review_data" "$user_data" "$diff_data" "$tmp"; trap - 0; exit $rc' 0 1 2 3 5 10 15

Meaning that whenever a signal of EXIT HUP INT QUIT TRAP USR1 TERM (0 1 2 3 5 10 15), without the SIG prefix is triggered, the temporary file is removed. The value of $? after the trap action completes shall be the value it had before the trap action was executed. Finally, if the action is -, the shell shall reset each condition to the default value.
It was also suggested to store {review,user,dif,tmp}_data under a directory, in order to just:

trap 'rc=$?; rm -fr "$tmp_data_dir"; trap - 0; exit $rc' 0 1 2 3 5 10 15

The code was changed to use a directory.
There are also other places in this file where mktemp is used—there's even one function where the temporary file is not being removed ($tmp under gitarc__stage())—so a global cleanup() makes sense. However, depending on where the trap is triggered, the local variable containing the temporary file or directory may not exist (i.e., outside the function). It is for this reason that these temporary variables must not be local variables, that is why I removed them from their respective local declarations.

Thank you!

des requested changes to this revision.Oct 28 2024, 2:43 PM
des added inline comments.
tools/tools/git/git-arc.sh
797

Only one of these will be non-empty. You should use just one variable instead (I suggest tmp) which is either a file or a directory.

This revision now requires changes to proceed.Oct 28 2024, 2:43 PM

lgtm but I would suggest waiting for @markj to approve as well

This revision is now accepted and ready to land.Oct 29 2024, 5:42 PM
In D47289#1079383, @des wrote:

lgtm but I would suggest waiting for @markj to approve as well

Absolutely, and thank you for reviewing!

tools/tools/git/git-arc.sh
239–240

Please use TMP or similar as the name instead, now that this is a global variable.

I don't think there are any such bugs in this patch, but now it'll be easy to accidentally clobber TMP by calling a subroutine which also sets TMP, so I'm not a big fan of this approach.

jlduran marked 2 inline comments as done.

This iteration attempts to reach a middle ground between all valid suggestions.

Changes:

  • Use an uppercased global variable which holds a temporary directory that serves as the base for all temporary files. It will be cleaned up upon any of the listed traps. It should be clear that this is the pattern to follow in the event of an addition of a subroutine that makes use of temporary files
  • Incorporate the suggestion to prefer echo over printf (this is actually on a separate commit, just added here to avoid noise with a separate review for such a simple change)
This revision now requires review to proceed.Oct 30 2024, 4:30 PM

I think this is ok. You could still give each tmpfile a unique name by using something like

xmktemp()
{
    mktemp ${GITARC_TMPDIR}/tmp.XXXXXX`
}
This revision is now accepted and ready to land.Oct 31 2024, 1:31 PM

You could still give each tmpfile a unique name

IMO with a unique top level directory it's more convenient to use consistent subdirectory names.

emaste added inline comments.
tools/tools/git/git-arc.sh
378

Not

378

Oops, comment added by accident.

If I understood both suggestions correctly, the issue is that tmp is used in gitarc__{list,patch,stage}, and shadowing of the variable (tmp is a local variable)/overwriting of the file contents could accidentally occur in a future refactor of this script.
Suggestions:

  1. One suggestions is to randomize the temporary file every time with the use of a subroutine xmktemp which nests the random file name under an already randomized GITARC_TMPDIR.
  2. A consistent file naming scheme is desired, so each tmp file name would have to be changed to GITARC_TMPDIR/tmp_{list,patch,stage}, for example.

Both are valid, IMO I would also prefer option 2.
However, it is possible that in the future, someone™ extends this script, and by the virtues of copy/pasting, this nomenclature is lost or not easily followed. Given that this someone could very well be me, let's avoid a possible foot-shooting and implement 1[^1].

[^1]: A more intricate solution would try to accommodate 2 as well by having xmktemp optionally accept a base name for the template, which can be implemented if explicitly requested.

This revision now requires review to proceed.Oct 31 2024, 7:35 PM
jlduran added inline comments.
tools/tools/git/git-arc.sh
378

No problem. This change will be committed separately.

This revision is now accepted and ready to land.Nov 2 2024, 3:35 PM

LGTM.

I've left a question in a comment but it's not blocking. I'm just curious :)

tools/tools/git/git-arc.sh
884

I'm just curious: shouldn't the trap be set up first before invoking mktemp?

jlduran added inline comments.
tools/tools/git/git-arc.sh
884

I've seen that practice you suggest (trap-before-mkdir, or similar) in other places[^1], however in those places the name of the artifact to-be-removed is previously known.
Thinking about it, it would not be an issue here either, because we are using rm -fr which won't fail if GITARC_TMPDIR is empty.
I'll soon commit it as is, but feel free to change it if you prefer it the other way around.

[^1]: tools/tools/nanobsd/Files/root/save_cfg is one that comes to mind (used before mount).

This revision was automatically updated to reflect the committed changes.
tools/tools/git/git-arc.sh
884

If you install the trap first and mktemp fails and the script terminates because of -e, the trap fires pointlessly. In the other direction, the only thing that can terminate the script after mktemp succeeds but before the trap is in place is an improbably (un)lucky signal.

tools/tools/git/git-arc.sh
884

Thanks! Based on what you wrote, my intuition would be that firing a pointless trap just in case is better for the correctness of the program. Am I wrong?

tools/tools/git/git-arc.sh
884

I would rather have a stray empty directory on my disk than run rm -rf "$tmp" without being certain what $tmp is.