vfs: per-cpu batched requeuing of free vnodes
Constant requeuing adds significant lock contention in certain
workloads. Lessen the problem by batching it.
Per-cpu areas are locked in order to synchronize against UMA freeing
memory.
vnode's v_mflag is converted to short to prevent the struct from
growing.
Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 122.38s user 1780.45s system 6242% cpu 30.480 total
patched: 144.84s user 985.90s system 4856% cpu 23.282 total
Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22998