Since r336439 we simply take the session pointer value mod the number of
worker threads (ncpu by default). This distributes sessions unevenly,
likely because aligned pointer values cluster on a few residues.
Instead, maintain an incrementing counter with a unique value per
session, and use that to distribute work to completion threads.
We could alternatively hash the pointer in some way, but a counter gives
consistent distributions across reboots. It also plays more nicely with
GELI, which creates ncpu sessions per GEOM. It would be preferable to
dynamically re-assign sessions to worker threads periodically based on
load, but we're not there yet. In the meantime, using more than one
worker helps IPsec throughput.
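
As a rough illustration of the two selection schemes described above,
here is a minimal userland sketch. It is not the kernel code: struct
session, next_session_id, session_init(), worker_by_pointer() and
worker_by_id() are hypothetical names chosen for this example, and the
counter is left unsynchronized for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a crypto session; not the kernel's structure. */
struct session {
	uint32_t id;	/* unique value taken from the counter at creation */
};

/*
 * Incrementing counter (assumption: a real implementation would need
 * atomic or locked updates; this sketch is single-threaded).
 */
static uint32_t next_session_id;

static void
session_init(struct session *s)
{
	s->id = next_session_id++;
}

/* Old scheme: session pointer value mod the number of workers. */
static unsigned
worker_by_pointer(const struct session *s, unsigned nworkers)
{
	assert(nworkers > 0);
	return ((unsigned)((uintptr_t)s % nworkers));
}

/* New scheme: per-session counter value mod the number of workers. */
static unsigned
worker_by_id(const struct session *s, unsigned nworkers)
{
	assert(nworkers > 0);
	return (s->id % nworkers);
}
```

With the counter, sessions created back to back land on consecutive
workers, so ncpu sessions created for one GEOM would spread over ncpu
workers in this sketch. The pointer-based result depends on where the
allocator happens to place each session.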