Schedule fast taskqueue callouts on the right CPU.
Since fast taskqueues use direct callouts, we can reduce the number of
CPU wakeups by scheduling the callout on the current CPU when the
taskqueue calls taskqueue_enqueue_timeout() on itself. This trick does
not work for regular taskqueues, since the callout thread would then
occupy that CPU. It may also not work with multiple taskqueue threads,
since we do not know which thread will pick up the task, and we want to
avoid excessive callout migrations. So we optimize only the remaining
cases where it is safe to do so.
In practice this allows the iichid(4) taskqueue to stay on the CPU where
the underlying ig4(4) interrupts are routed, instead of waking CPU 0
with a timer interrupt on each sampling period (every 2nd/3rd sleep).
MFC after: 1 month
(cherry picked from commit 7bbac6419d174c98cc6ea969b68fcfe0f9a9bab8)