Page MenuHomeFreeBSD

Another fix for PR275286
ClosedPublic

Authored by kib on Nov 26 2023, 9:33 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 11:55 AM
Unknown Object (File)
Fri, Jan 10, 1:31 PM
Unknown Object (File)
Wed, Jan 8, 2:47 AM
Unknown Object (File)
Dec 6 2024, 8:53 AM
Unknown Object (File)
Nov 20 2024, 12:19 PM
Unknown Object (File)
Nov 15 2024, 11:07 AM
Unknown Object (File)
Nov 13 2024, 2:07 AM
Unknown Object (File)
Oct 30 2024, 12:28 AM
Subscribers

Details

Summary
EVFILT_TIMER: intialize stop timer list in type-stable proc init, instead of fork

Since kqueue timer may exist after the process that created it exited
(same scenario with rfork(2) as in PR 275286), make the tailq
p_kqtim_stop accessed by filt_timerdetach() type-stable.
EVFILT_SIGNAL: do not use target process pointer on detach

It is enough to know knlist to remove from it, and the list is
autodestroyed on last removal.

PR:     275286
Revert "kqueue: on process exit, force-clear its registered signal events"

This reverts commit 393ac29f0b8be068c8e46f76c2eeee07d20ea4df.  A
different fix is following, which preserves semantic, required by the
sys.kqueue.proc3_test.proc3 test.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Nov 26 2023, 9:33 PM

One root cause of the original problem is that we allow rfork() to create processes which share a kqueue. Why do we allow this at all, when regular fork() disallows it?

sys/sys/proc.h
754

One root cause of the original problem is that we allow rfork() to create processes which share a kqueue. Why do we allow this at all, when regular fork() disallows it?

The processes not just share a kqueue, they share the file descriptors table. So this is natural. Ugliness is IMO in EVFILT_PROC, the issue with the p_klist lifetime is more fundamental.

kib marked an inline comment as done.Nov 26 2023, 10:34 PM
In D42777#975714, @kib wrote:

One root cause of the original problem is that we allow rfork() to create processes which share a kqueue. Why do we allow this at all, when regular fork() disallows it?

The processes not just share a kqueue, they share the file descriptors table. So this is natural.

It is natural for the child of fork() to inherit copies of the parent's file descriptors. But for kqueue we already make an exception, so the situation is already unnatural. It seems wrong to add additional complexity to handle a corner case that has been broken for a long time. Plus, with the patch we now allocate a large table (PID_MAX * 8B on most systems) and require extra locks for common operations like fork() and kill(). We can't even combine this with the main PID hash table, I think, because the kevent table requires a different lock order with respect to the proc lock (because of tdsendsignal()).

In D42777#975714, @kib wrote:

One root cause of the original problem is that we allow rfork() to create processes which share a kqueue. Why do we allow this at all, when regular fork() disallows it?

The processes not just share a kqueue, they share the file descriptors table. So this is natural.

It is natural for the child of fork() to inherit copies of the parent's file descriptors. But for kqueue we already make an exception, so the situation is already unnatural. It seems wrong to add additional complexity to handle a corner case that has been broken for a long time. Plus, with the patch we now allocate a large table (PID_MAX * 8B on most systems) and require extra locks for common operations like fork() and kill(). We can't even combine this with the main PID hash table, I think, because the kevent table requires a different lock order with respect to the proc lock (because of tdsendsignal()).

No, thi is not about copies. Rforked child, in this case, shares file descriptor table, which is fine. Kqueue must work per-fd table, so this case must be handled by the code. And I do think that the only way to architectural cleanly provide the documented semantic for signal and process filters is to attach knotes to pids and not to struct proc.

But for the case at hands, I think I found a way to fix the bug much simpler, with small trick.

kib retitled this revision from EVFILT_PROC: rework live cycle of the knote list to Another fix for PR275286.
kib edited the summary of this revision. (Show Details)
kib added a subscriber: emaste.

Another way to fix.

I think this addresses the PR, but what about other kevent filters which maintain a proc pointer? In particular, I think filt_timerdetach() is susceptible to a similar problem.

This revision is now accepted and ready to land.Nov 28 2023, 3:36 PM

I think this addresses the PR, but what about other kevent filters which maintain a proc pointer? In particular, I think filt_timerdetach() is susceptible to a similar problem.

Yes, I think filt_timerdetach() is problematic, but not due to p_klist. And again, attaching timer to pid would be the right thing to do.

kib edited the summary of this revision. (Show Details)

Fix for timers

This revision now requires review to proceed.Nov 28 2023, 3:55 PM
sys/kern/kern_proc.c
287

Hmm, I think this isn't sufficient. Suppose a timer callout is armed, and the target process exits. If a child process holds the kqueue reference, the callout will not be drained. Then filt_timerexpire_l() loads kc->p, but this will be a pointer to a freed proc.

sys/kern/kern_proc.c
287

The pointer is freed but still valid. The process mutex and (now) p_kqtim_stop are usable.

markj added inline comments.
sys/kern/kern_proc.c
287

Hmm, I see. So, in essence, we use a random process to be the holder of stopped timers. This is still incorrect, since it means that a program can defeat the safeguard which ensures that SIGKILL/SIGSTOP can be delivered to a process which arms timers with a very short period. Also, it means that if this random process is stopped or killed, then the timer will stop firing.

It is a rather strange scenario.

This revision is now accepted and ready to land.Nov 28 2023, 4:17 PM
sys/kern/kern_proc.c
287

I agree, but again, the fix is to attach everything to pid, not process.

sys/kern/kern_proc.c
287

Perhaps it's possible to use the existing PID hash table and associated locks. This didn't seem easy for EVFILT_SIGNAL (there is a LOR with the proc lock), but perhaps for timers it works.