Save context switch per I/O for iSCSI and IOCTL frontends.
Introduce new CTL core KPI ctl_run(), preprocessing I/Os in the caller
context instead of scheduling another thread just for that. This call
may sleep, that is not acceptable for some frontends like the original
CAM/FC one, but iSCSI already has separate sleepable per-connection RX
threads, and another thread scheduling is mostly just a waste of time.
IOCTL frontend actually waits for the I/O completion in the caller
thread, so the use of another thread for this has even less sense.
With this change I can measure ~5% IOPS improvement on 4KB iSCSI I/Os
to ZFS.
MFC after: 1 month
(cherry picked from commit 812c9f48a2b7bccc31b2a6077b299822357832e4)