Page MenuHomeFreeBSD

linuxkpi: races between linux_queue_delayed_work_on() and linux_cancel_delayed_work_sync()
ClosedPublic

Authored by kib on Nov 4 2023, 4:56 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 4, 12:14 AM
Unknown Object (File)
Dec 13 2024, 4:07 AM
Unknown Object (File)
Sep 22 2024, 5:30 PM
Unknown Object (File)
Sep 22 2024, 9:31 AM
Unknown Object (File)
Sep 22 2024, 5:47 AM
Unknown Object (File)
Sep 18 2024, 2:04 AM
Unknown Object (File)
Sep 17 2024, 8:12 PM
Unknown Object (File)
Sep 16 2024, 6:29 PM

Details

Summary
1. Suppose that linux_queue_delayed_work_on() is called with
   non-zero delay and found the work.state WORK_ST_IDLE. It
   resets the state to WORK_ST_TIMER and locks timer.mtx. Now, if
   linux_cancel_delayed_work_sync() was also called meantime, read
   state as WORK_ST_TIMER and already taken the mutex, it is executing
   callout_stop() on non-armed callout. Then linux_queue_delayed_work_on()
   continues and schedules callout.  But the return value from cancel() is
   false, making it possible to the requeue from callback to slip in.

2. If linux_cancel_delayed_work_sync() returned true, we need to cancel
   again.  The requeue from callback could have revived the work.

The end result is that we schedule callout that might be freed, since
cancel_delayed_work_sync() claims that everything was stopped.  This
contradicts the way the KPI is used in Linux, where consumers expect
that cancel_delayed_work_sync() is reliable on its own.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Nov 4 2023, 4:56 PM
bz added inline comments.
sys/compat/linuxkpi/common/src/linux_work.c
475

I would call that lkpi_cancel_...internal(); lkpi_ for it's an internal function and spelt out _internal.

529

is this simpler?

res = false;
while (linux_cancel_delayed_work_sync_int(dwork))
  res = true;
kib marked an inline comment as done.Nov 5 2023, 7:38 PM
kib added inline comments.
sys/compat/linuxkpi/common/src/linux_work.c
475

All functions in this file are prefixed with _linux, even static. I can use _internal, _int is just shorter.

My suggestions are for pre-existing nits, since I had to read the code anyway. Please feel free to ignore them.

  1. If linux_cancel_delayed_work_sync() returned true, we need to cancel again. The requeue from callback could have revived the work.

Assuming that the first bug is fixed, how can this happen? taskqueue_drain() should only return once the task has finished executing and it's no longer pending. But, if the task queued itself with a timeout, it'll be the callout that's pending, not the task. Flipping the order of callout_drain() and taskqueue_drain() calls does not solve the problem either. Ok.

sys/compat/linuxkpi/common/src/linux_work.c
209
233
471
This revision is now accepted and ready to land.Nov 6 2023, 9:55 PM
kib marked 3 inline comments as done.Nov 7 2023, 10:55 AM