locks: tweak backoff a little bit

The previous limits were chosen back when the locking primitives performed
spurious lock accesses.
Flipping the starting point to 1 (or rather 2, as the first call shifts it)
provides a modest win under mild contention while not hurting the worse
cases. Tested on a bunch of one-, two- and four-socket systems, old and new
(Westmere, Skylake, Threadripper and others), by doing concurrent page faults,
buildkernel/buildworld and other workloads (although not all systems got all
the tests).
The other change is the upper limit. It is chosen semi-arbitrarily, as the old
one was getting out of hand on somewhat larger systems (e.g. a 128-thread one).
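For illustration only, the shape of the scheme can be sketched roughly as
below. This is not the kernel code: struct backoff, backoff_init() and
backoff_next() are hypothetical names, and the cap passed by the caller is a
placeholder rather than the value picked by this change. The delay starts at
1, is shifted on every call (so the first spin uses 2) and is clamped to the
upper limit.

```c
/*
 * Minimal sketch of capped exponential backoff.
 * All names here are hypothetical, not the actual kernel interfaces.
 */
struct backoff {
	unsigned int delay;	/* current spin count */
	unsigned int max;	/* upper limit on the spin count */
};

static void
backoff_init(struct backoff *b, unsigned int max)
{

	b->delay = 1;		/* starting point; the first call shifts it to 2 */
	b->max = max;		/* placeholder cap chosen by the caller */
}

/* Return how many pause iterations to spin for on this attempt. */
static unsigned int
backoff_next(struct backoff *b)
{

	b->delay <<= 1;
	if (b->delay > b->max)
		b->delay = b->max;
	return (b->delay);
}
```

A caller would spin for the returned number of pause iterations before
retrying the lock. With a placeholder cap of 64 the successive values are 2,
4, 8, 16, 32, 64 and then stay at 64.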
Note that backoff is fundamentally a speculative bandaid and this change just
makes it fit a little bit better. It remains completely oblivious to both the
hardware topology and the contention pattern. Taking those into account is
being experimented with.