Page MenuHomeFreeBSD

Fix a race between fork(2) and PROC_REAP_KILL subtree
ClosedPublic

Authored by kib on Apr 20 2022, 11:01 PM.
Tags
None
Referenced Files
F96526570: D35014.diff
Wed, Sep 25, 8:56 AM
Unknown Object (File)
Sat, Sep 21, 6:03 AM
Unknown Object (File)
Fri, Sep 20, 10:36 PM
Unknown Object (File)
Wed, Sep 18, 9:49 AM
Unknown Object (File)
Wed, Sep 18, 1:33 AM
Unknown Object (File)
Tue, Sep 17, 7:33 PM
Unknown Object (File)
Tue, Sep 17, 2:47 PM
Unknown Object (File)
Tue, Sep 17, 2:15 PM
Subscribers

Details

Summary

by repeating iteration over the subtree until there are no new processes to signal.

Another race between fork(2) and PROC_REAP_KILL subtree is where we might not yet see a new child when signalling a process. Ensure that this cannot happen by stopping all reapping subtree, which ensures that the child is not inside a syscall, in particular fork(2).

Reported and tested by: pho

Also there:

unr(9): allow to avoid internal locking
Add stop_all_proc_blocker(9)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib edited the summary of this revision. (Show Details)
kib added a reviewer: jhb.

Stop the victim while signalling.

kib edited the summary of this revision. (Show Details)

Add stop_all_proc_blocker.

sys/kern/kern_proc.c
3408

Why can't it simply be another sx lock? Does the allproc lock provide some interlocking here, or is it only used to synchronize updates to stop_all_proc_blocker?

sys/kern/kern_procctl.c
266
269

Can we derive need_stop when needed, instead of passing it around?

need_stop = (rk->rk_flags & REAPER_KILL_SUBTREE) != 0
274

Assert that the proc is held too? I think this is required for the relocking dance below to be safe when thread_single(SINGLE_ALLPROC) fails. Also perhaps assert that the proc is not exiting.

335

Can it be an STAILQ?

357

LIST_FOREACH?

358
383

LIST_FOREACH?

393

Looks like we don't need s and can just write

if (!reap_kill_proc(td, p2, true, ksi, rk, error))
    free_unr(pids, p2->p_pid);
403

I'd suggest s/reap/reaper/ for the variable name, here and above. reap sounds like the name of a boolean variable to me.

419

Extra (harmless?) clear_unrhdr() call.

kib marked 11 inline comments as done.Apr 25 2022, 10:31 PM
kib added inline comments.
sys/kern/kern_proc.c
3408

The allproc_lock only syncs updates. In fact I considered making stop_all_proc_blocker a char, but then convinced myself that 1 byte vs 8 bytes is not too much, while right now owning assert makes sense.

sizeof(struct sx) == 32 which is more above the threshold for just an anti-foot shooting measure.

If you strongly prefer sx, it can be used there, a new one of course.

sys/kern/kern_procctl.c
335

In the code structure as it is ATM, it seems yes it can. But a minimal modifications of the walk algorithm would require returning back-link, and the saving there is quite small: the structures only live during syscall execution.

I can change, but I do not see an advantage. Say if you see it important for the review.

419

In fact it is clean_unrhdr() that is extra. 'clean' removes postponed-free blocks, while 'clear' does the finalization, while assumes that postponed-free blocks are already handled.

In this use, clean_unrhdr() can be removed because code does not use postponed deallocation. But I kept it to be completely correct. May be clear_unrhdr() should call clean_unrhdr(), but this is a bigger project.

See 09828ba947eb87c5ae5e852 for more details.

kib marked 3 inline comments as done.

Handle Mark' comments.

Fix condition for need_stop: it should check that REAPER_KILL_CHILDREN is not set. REAPER_KILL_SUBTREE is legitimately unset when all descendants of the reaper must be killed.

Reported by: pho

Seems ok to me. One general comment: we have to stop the target process because we have no mechanism to synchronize with a forking process. For example, a per-proc sx lock which is held for the duration of fork, so other threads can block until the target is finished forking. Or some blockcount-like barrier. I think I added such a thing for Isilon a long time ago to fix a problem with dtrace, IMO it would be useful here too. Do you agree?

sys/kern/kern_proc.c
3408

If you consider 32 bytes vs. 8 bytes plus extra .text for the implementation, I'm not sure there's much of a difference.

I was curious so I tried it. With

static struct sx stopproc_lock;

void
stop_all_proc_block(void)
{
    sx_xlock(&stopproc_lock);
}

void
stop_all_proc_unblock(void)
{
    sx_xunlock(&stopproc_lock);
}

and an sx_init() call in procinit(), a GENERIC-NODEBUG kernel is 8 bytes smaller than with the current implementation.

I don't insist either way, I just prefer mechanisms with witness etc. hooks instead of hand-rolled wait channels.

sys/kern/kern_procctl.c
357

Hmm, should it be LIST_FOREACH_SAFE, in fact? What prevents p2 from exiting before the current thread proceeds to the next iteration?

419

I see, thanks. Seems fine to keep it as it is.

431

I noticed that reaper_abandon_children() asserts SX_LOCKED, i.e., not SX_XLOCKED. But it cannot safely execute in parallel with reap_kill_subtree(). So that assertion should be upgraded to XLOCKED, it is already true in practice.

This revision is now accepted and ready to land.Apr 26 2022, 2:07 PM
kib marked 4 inline comments as done.Apr 26 2022, 2:48 PM

Seems ok to me. One general comment: we have to stop the target process because we have no mechanism to synchronize with a forking process. For example, a per-proc sx lock which is held for the duration of fork, so other threads can block until the target is finished forking. Or some blockcount-like barrier. I think I added such a thing for Isilon a long time ago to fix a problem with dtrace, IMO it would be useful here too. Do you agree?

Suppose that a process just entered a syscall code, but did not even reached the syscall implementation. In this case, there is no block point to stop at, and fork is not interruptible by signals. This means that userspace can leak an unsignalled child from under REAP_KILL and prove that it leaked. I do not see how any barrier can work there except stopping the userspace.

sys/kern/kern_procctl.c
357

proctree_lock should be. p_sibling is annotated with (e) == proctree_lock, and I do not believe that exit1() regressed there.

kib marked an inline comment as done.

Use sx for stop_all_proc_block().
Upgrade proctree_lock assert to XLOCKED in reaper_abandon_children().

This revision now requires review to proceed.Apr 26 2022, 2:50 PM
This revision is now accepted and ready to land.Apr 27 2022, 2:03 PM

Avoid LOR between stop_all_proc_block() and proctree_lock.

This revision now requires review to proceed.Apr 27 2022, 9:48 PM

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

This revision is now accepted and ready to land.Apr 27 2022, 9:59 PM

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

I considered not adding the sapblk hook and locking everything inside exec hook, but I decided that explicit listiing of the lock_tree commands is cleaner. I do not have a strong opinion there, say if you want your variant.

In D35014#795115, @kib wrote:

Since PROC_REAP_KILL only operates with P_PID, it could set the PCTL_UNLOCKED flag instead, and the stop_all_proc and proctree locks can be acquired in the proper order within reap_kill().

I considered not adding the sapblk hook and locking everything inside exec hook, but I decided that explicit listiing of the lock_tree commands is cleaner. I do not have a strong opinion there, say if you want your variant.

I'm fine with your approach.