Details

Reviewers

bapt
markj
jhb

Commits

rG709783373e57: Fix another race between fork(2) and PROC_REAP_KILL subtree
rG39794d80ad90: Fix a race between fork(2) and PROC_REAP_KILL subtree
rGd1df34736888: kern_procctl: add possibility to take stop_all_proc_block() around exec
rG2e7595ef2f53: Add stop_all_proc_block(9)
rG54a11adbd9c2: reap_kill(): split children and subtree killers into helpers
rGe4ce431e2a02: reap_kill(): de-inline LIST_FOREACH(), twice
rG134529b11b53: reap_kill(): rename the reap variable to reaper
rGb9294a3e152b: reaper_abandon_children(): upgrade proctree_lock assert to exclusive
rGe59b940dcb45: unr(9): allow to avoid internal locking
rGc4be460e84b4: init_unrhdr(): make it usable by initializing everything

Summary

by repeating iteration over the subtree until there are no new processes to signal.

Another race between fork(2) and PROC_REAP_KILL subtree is where we might not yet see a new child when signalling a process. Ensure that this cannot happen by stopping all reapping subtree, which ensures that the child is not inside a syscall, in particular fork(2).

Reported and tested by: pho

Also there:

unr(9): allow to avoid internal locking

Add stop_all_proc_blocker(9)

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

kib created this revision.Apr 20 2022, 11:01 PM

Herald added a subscriber: imp. · View Herald TranscriptApr 20 2022, 11:01 PM

Per-commit view is available at https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/reap

Stop the victim while signalling.

Add stop_all_proc_blocker.

kib edited the summary of this revision. (Show Details)Apr 24 2022, 1:32 PM

markj added inline comments.Apr 25 2022, 2:35 PM

sys/kern/kern_proc.c
3408	Why can't it simply be another sx lock? Does the allproc lock provide some interlocking here, or is it only used to synchronize updates to `stop_all_proc_blocker`?
sys/kern/kern_procctl.c
266
269	Can we derive `need_stop` when needed, instead of passing it around? need_stop = (rk->rk_flags & REAPER_KILL_SUBTREE) != 0
274	Assert that the proc is held too? I think this is required for the relocking dance below to be safe when thread_single(SINGLE_ALLPROC) fails. Also perhaps assert that the proc is not exiting.
335	Can it be an STAILQ?
357	LIST_FOREACH?
358
383	LIST_FOREACH?
393	Looks like we don't need `s` and can just write if (!reap_kill_proc(td, p2, true, ksi, rk, error)) free_unr(pids, p2->p_pid);
403	I'd suggest `s/reap/reaper/` for the variable name, here and above. `reap` sounds like the name of a boolean variable to me.
419	Extra (harmless?) clear_unrhdr() call.

kib marked 11 inline comments as done.Apr 25 2022, 10:31 PM

kib added inline comments.

sys/kern/kern_proc.c
3408	The allproc_lock only syncs updates. In fact I considered making stop_all_proc_blocker a char, but then convinced myself that 1 byte vs 8 bytes is not too much, while right now owning assert makes sense. sizeof(struct sx) == 32 which is more above the threshold for just an anti-foot shooting measure. If you strongly prefer sx, it can be used there, a new one of course.
sys/kern/kern_procctl.c
335	In the code structure as it is ATM, it seems yes it can. But a minimal modifications of the walk algorithm would require returning back-link, and the saving there is quite small: the structures only live during syscall execution. I can change, but I do not see an advantage. Say if you see it important for the review.
419	In fact it is clean_unrhdr() that is extra. 'clean' removes postponed-free blocks, while 'clear' does the finalization, while assumes that postponed-free blocks are already handled. In this use, clean_unrhdr() can be removed because code does not use postponed deallocation. But I kept it to be completely correct. May be clear_unrhdr() should call clean_unrhdr(), but this is a bigger project. See 09828ba947eb87c5ae5e852 for more details.

Handle Mark' comments.

Fix condition for need_stop: it should check that REAPER_KILL_CHILDREN is not set. REAPER_KILL_SUBTREE is legitimately unset when all descendants of the reaper must be killed.

Reported by: pho

Seems ok to me. One general comment: we have to stop the target process because we have no mechanism to synchronize with a forking process. For example, a per-proc sx lock which is held for the duration of fork, so other threads can block until the target is finished forking. Or some blockcount-like barrier. I think I added such a thing for Isilon a long time ago to fix a problem with dtrace, IMO it would be useful here too. Do you agree?

sys/kern/kern_proc.c
3408	If you consider 32 bytes vs. 8 bytes plus extra .text for the implementation, I'm not sure there's much of a difference. I was curious so I tried it. With static struct sx stopproc_lock; void stop_all_proc_block(void) { sx_xlock(&stopproc_lock); } void stop_all_proc_unblock(void) { sx_xunlock(&stopproc_lock); } and an sx_init() call in procinit(), a GENERIC-NODEBUG kernel is 8 bytes smaller than with the current implementation. I don't insist either way, I just prefer mechanisms with witness etc. hooks instead of hand-rolled wait channels.
sys/kern/kern_procctl.c
357	Hmm, should it be LIST_FOREACH_SAFE, in fact? What prevents p2 from exiting before the current thread proceeds to the next iteration?
419	I see, thanks. Seems fine to keep it as it is.
431	I noticed that reaper_abandon_children() asserts SX_LOCKED, i.e., not SX_XLOCKED. But it cannot safely execute in parallel with reap_kill_subtree(). So that assertion should be upgraded to XLOCKED, it is already true in practice.

This revision is now accepted and ready to land.Apr 26 2022, 2:07 PM

In D35014#794658, @markj wrote:

Seems ok to me. One general comment: we have to stop the target process because we have no mechanism to synchronize with a forking process. For example, a per-proc sx lock which is held for the duration of fork, so other threads can block until the target is finished forking. Or some blockcount-like barrier. I think I added such a thing for Isilon a long time ago to fix a problem with dtrace, IMO it would be useful here too. Do you agree?

Suppose that a process just entered a syscall code, but did not even reached the syscall implementation. In this case, there is no block point to stop at, and fork is not interruptible by signals. This means that userspace can leak an unsignalled child from under REAP_KILL and prove that it leaked. I do not see how any barrier can work there except stopping the userspace.

sys/kern/kern_procctl.c
357	proctree_lock should be. p_sibling is annotated with (e) == proctree_lock, and I do not believe that exit1() regressed there.

Use sx for stop_all_proc_block().
Upgrade proctree_lock assert to XLOCKED in reaper_abandon_children().

This revision now requires review to proceed.Apr 26 2022, 2:50 PM

markj accepted this revision.Apr 27 2022, 2:03 PM

This revision is now accepted and ready to land.Apr 27 2022, 2:03 PM

Avoid LOR between stop_all_proc_block() and proctree_lock.

This revision now requires review to proceed.Apr 27 2022, 9:48 PM

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

This revision is now accepted and ready to land.Apr 27 2022, 9:59 PM

In D35014#795107, @markj wrote:

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

I considered not adding the sapblk hook and locking everything inside exec hook, but I decided that explicit listiing of the lock_tree commands is cleaner. I do not have a strong opinion there, say if you want your variant.

In D35014#795115, @kib wrote:

In D35014#795107, @markj wrote:

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

I considered not adding the sapblk hook and locking everything inside exec hook, but I decided that explicit listiing of the lock_tree commands is cleaner. I do not have a strong opinion there, say if you want your variant.