HPTS timing begins to go off when we reach the threshold of connections (1200 by default)
where we have any returning syscall or LRO stop finding the oldest hpts thread that
has not run but instead using the CPU it is on. This ends up causing quite a lot of times
where hpts threads may not run for extended periods of time. On top of all that which
causes heartburn if you are pacing in tcp, you also have the fact that where AMD's
podded L3 cache may have sets of 8 CPU's that share a L3, hpts is unaware of this
and thus on amd you can generate a lot of cache misses.
So to fix this we will get rid of the CPU mode, and always use oldest. But also make
HPTS aware of the CPU topology and keep the "oldest" to be within the same L3 cache.
This also works nicely for NUMA as well couple with Drew's earlier NUMA changes.