Reimplement BIO_ORDERED handling in nvd(4).
This fixes BIO_ORDERED semantics while also improving performance by:
- sleeping also before BIO_ORDERED bio, as defined, not only after;
- not queueing BIO_ORDERED bio to taskqueue if no other bios running;
- waking up sleeping taskqueue explicitly rather then rely on polling.
On Samsung SSD 970 PRO this shows sync write latency, measured with
diskinfo -wS, reduction from ~2ms to ~1.1ms by not sleeping without
reason till next HZ tick.
On the same device ZFS pool with 8 ZVOLs synchronously writing 4KB blocks
shows ~950 IOPS instead of ~750 IOPS before. I suspect ZFS does not need
BIO_ORDERED on BIO_FLUSH at all, but that will be next question.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.