linuxkpi: races between linux_queue_delayed_work_on() and linux_cancel_delayed_work_sync()
- Suppose that linux_queue_delayed_work_on() is called with a non-zero delay and finds work.state to be WORK_ST_IDLE. It resets the state to WORK_ST_TIMER and then locks timer.mtx. If linux_cancel_delayed_work_sync() is called in the meantime, reads the state as WORK_ST_TIMER, and takes the mutex first, it executes callout_stop() on a callout that is not yet armed. linux_queue_delayed_work_on() then continues and schedules the callout. But the return value from cancel() is false, which makes it possible for a requeue from the callback to slip in.
- If linux_cancel_delayed_work_sync() returned true, we need to cancel again, because a requeue from the callback could have revived the work.
The end result is that we can schedule a callout in memory that the
consumer might have freed, since cancel_delayed_work_sync() claims that
everything was stopped. This contradicts the way the KPI is used in
Linux, where consumers expect cancel_delayed_work_sync() to be reliable
on its own.
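The "cancel again" rule from the second item can be illustrated with a
small single-threaded user-space model. This is only a sketch, not the
linuxkpi code: struct model_work, model_cancel_once() and
model_cancel_sync() are hypothetical names, and the concurrent requeue
from the callback is simulated by a counter instead of a real callout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state model; the real code has more WORK_ST_* states. */
enum model_state { MODEL_ST_IDLE, MODEL_ST_TIMER };

struct model_work {
	enum model_state state;
	int requeues_left;	/* times the "callback" re-arms the work */
};

/*
 * One cancel pass.  Returns true if pending work was stopped.  The
 * counter simulates a callback that requeues the work while the pass
 * is in progress, so the work may be pending again on return.
 */
static bool
model_cancel_once(struct model_work *w)
{
	if (w->state == MODEL_ST_IDLE)
		return (false);
	w->state = MODEL_ST_IDLE;
	if (w->requeues_left > 0) {
		w->requeues_left--;
		w->state = MODEL_ST_TIMER;	/* revived by the callback */
	}
	return (true);
}

/*
 * Repeat the cancel pass until it reports that nothing was pending.
 * Only then is it safe to claim that everything was stopped.
 */
static bool
model_cancel_sync(struct model_work *w)
{
	bool res;

	res = false;
	while (model_cancel_once(w))
		res = true;
	assert(w->state == MODEL_ST_IDLE);
	return (res);
}
```

In this model a single pass can return true while the work is armed
again, which is why the caller loops until a pass returns false.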
Reviewed by: markj
Discussed with: bz
Sponsored by: NVidia networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42468