Page MenuHomeFreeBSD

Fix a race between fork(2) and PROC_REAP_KILL subtree
ClosedPublic

Authored by kib on Apr 20 2022, 11:01 PM.
Tags
None
Referenced Files
F96526570: D35014.diff
Wed, Sep 25, 8:56 AM
Unknown Object (File)
Sat, Sep 21, 6:03 AM
Unknown Object (File)
Fri, Sep 20, 10:36 PM
Unknown Object (File)
Wed, Sep 18, 9:49 AM
Unknown Object (File)
Wed, Sep 18, 1:33 AM
Unknown Object (File)
Tue, Sep 17, 7:33 PM
Unknown Object (File)
Tue, Sep 17, 2:47 PM
Unknown Object (File)
Tue, Sep 17, 2:15 PM
Subscribers

Details

Summary

by repeating iteration over the subtree until there are no new processes to signal.

Another race between fork(2) and PROC_REAP_KILL subtree is where we might not yet see a new child when signalling a process. Ensure that this cannot happen by stopping all reapping subtree, which ensures that the child is not inside a syscall, in particular fork(2).

Reported and tested by: pho

Also there:

unr(9): allow to avoid internal locking
Add stop_all_proc_blocker(9)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib edited the summary of this revision. (Show Details)
kib added a reviewer: jhb.

Stop the victim while signalling.

kib edited the summary of this revision. (Show Details)

Add stop_all_proc_blocker.

sys/kern/kern_proc.c
3408 ↗(On Diff #105352)

Why can't it simply be another sx lock? Does the allproc lock provide some interlocking here, or is it only used to synchronize updates to stop_all_proc_blocker?

sys/kern/kern_procctl.c
266 ↗(On Diff #105352)
269 ↗(On Diff #105352)

Can we derive need_stop when needed, instead of passing it around?

need_stop = (rk->rk_flags & REAPER_KILL_SUBTREE) != 0
274 ↗(On Diff #105352)

Assert that the proc is held too? I think this is required for the relocking dance below to be safe when thread_single(SINGLE_ALLPROC) fails. Also perhaps assert that the proc is not exiting.

335 ↗(On Diff #105352)

Can it be an STAILQ?

357 ↗(On Diff #105352)

LIST_FOREACH?

358 ↗(On Diff #105352)
383 ↗(On Diff #105352)

LIST_FOREACH?

393 ↗(On Diff #105352)

Looks like we don't need s and can just write

if (!reap_kill_proc(td, p2, true, ksi, rk, error))
    free_unr(pids, p2->p_pid);
403 ↗(On Diff #105352)

I'd suggest s/reap/reaper/ for the variable name, here and above. reap sounds like the name of a boolean variable to me.

419 ↗(On Diff #105352)

Extra (harmless?) clear_unrhdr() call.

kib marked 11 inline comments as done.Apr 25 2022, 10:31 PM
kib added inline comments.
sys/kern/kern_proc.c
3408 ↗(On Diff #105352)

The allproc_lock only syncs updates. In fact I considered making stop_all_proc_blocker a char, but then convinced myself that 1 byte vs 8 bytes is not too much, while right now owning assert makes sense.

sizeof(struct sx) == 32 which is more above the threshold for just an anti-foot shooting measure.

If you strongly prefer sx, it can be used there, a new one of course.

sys/kern/kern_procctl.c
335 ↗(On Diff #105352)

In the code structure as it is ATM, it seems yes it can. But a minimal modifications of the walk algorithm would require returning back-link, and the saving there is quite small: the structures only live during syscall execution.

I can change, but I do not see an advantage. Say if you see it important for the review.

419 ↗(On Diff #105352)

In fact it is clean_unrhdr() that is extra. 'clean' removes postponed-free blocks, while 'clear' does the finalization, while assumes that postponed-free blocks are already handled.

In this use, clean_unrhdr() can be removed because code does not use postponed deallocation. But I kept it to be completely correct. May be clear_unrhdr() should call clean_unrhdr(), but this is a bigger project.

See 09828ba947eb87c5ae5e852 for more details.

kib marked 3 inline comments as done.

Handle Mark' comments.

Fix condition for need_stop: it should check that REAPER_KILL_CHILDREN is not set. REAPER_KILL_SUBTREE is legitimately unset when all descendants of the reaper must be killed.

Reported by: pho

Seems ok to me. One general comment: we have to stop the target process because we have no mechanism to synchronize with a forking process. For example, a per-proc sx lock which is held for the duration of fork, so other threads can block until the target is finished forking. Or some blockcount-like barrier. I think I added such a thing for Isilon a long time ago to fix a problem with dtrace, IMO it would be useful here too. Do you agree?

sys/kern/kern_proc.c
3408 ↗(On Diff #105352)

If you consider 32 bytes vs. 8 bytes plus extra .text for the implementation, I'm not sure there's much of a difference.

I was curious so I tried it. With

static struct sx stopproc_lock;

void
stop_all_proc_block(void)
{
    sx_xlock(&stopproc_lock);
}

void
stop_all_proc_unblock(void)
{
    sx_xunlock(&stopproc_lock);
}

and an sx_init() call in procinit(), a GENERIC-NODEBUG kernel is 8 bytes smaller than with the current implementation.

I don't insist either way, I just prefer mechanisms with witness etc. hooks instead of hand-rolled wait channels.

sys/kern/kern_procctl.c
432 ↗(On Diff #105426)

I noticed that reaper_abandon_children() asserts SX_LOCKED, i.e., not SX_XLOCKED. But it cannot safely execute in parallel with reap_kill_subtree(). So that assertion should be upgraded to XLOCKED, it is already true in practice.

357 ↗(On Diff #105352)

Hmm, should it be LIST_FOREACH_SAFE, in fact? What prevents p2 from exiting before the current thread proceeds to the next iteration?

419 ↗(On Diff #105352)

I see, thanks. Seems fine to keep it as it is.

This revision is now accepted and ready to land.Apr 26 2022, 2:07 PM
kib marked 4 inline comments as done.Apr 26 2022, 2:48 PM

Seems ok to me. One general comment: we have to stop the target process because we have no mechanism to synchronize with a forking process. For example, a per-proc sx lock which is held for the duration of fork, so other threads can block until the target is finished forking. Or some blockcount-like barrier. I think I added such a thing for Isilon a long time ago to fix a problem with dtrace, IMO it would be useful here too. Do you agree?

Suppose that a process just entered a syscall code, but did not even reached the syscall implementation. In this case, there is no block point to stop at, and fork is not interruptible by signals. This means that userspace can leak an unsignalled child from under REAP_KILL and prove that it leaked. I do not see how any barrier can work there except stopping the userspace.

sys/kern/kern_procctl.c
357 ↗(On Diff #105352)

proctree_lock should be. p_sibling is annotated with (e) == proctree_lock, and I do not believe that exit1() regressed there.

kib marked an inline comment as done.

Use sx for stop_all_proc_block().
Upgrade proctree_lock assert to XLOCKED in reaper_abandon_children().

This revision now requires review to proceed.Apr 26 2022, 2:50 PM
This revision is now accepted and ready to land.Apr 27 2022, 2:03 PM

Avoid LOR between stop_all_proc_block() and proctree_lock.

This revision now requires review to proceed.Apr 27 2022, 9:48 PM

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

This revision is now accepted and ready to land.Apr 27 2022, 9:59 PM

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

I considered not adding the sapblk hook and locking everything inside exec hook, but I decided that explicit listiing of the lock_tree commands is cleaner. I do not have a strong opinion there, say if you want your variant.

In D35014#795115, @kib wrote:

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

I considered not adding the sapblk hook and locking everything inside exec hook, but I decided that explicit listiing of the lock_tree commands is cleaner. I do not have a strong opinion there, say if you want your variant.

I'm fine with your approach.