Modify lock_delay() to increase the delay time after spinning,
not before. Previously we would spin at least twice instead of once.
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 34973 Build 31966: arc lint + arc unit
Event Timeline
It is quite unclear if this is a good idea as is, key point being that these locks are unfair and fiddling with this can further negatively affect the achieved fairness. In particular the current "wait 2 spins" setup make fresh arrivals less likely to grab the lock from under threads which already wait. Replacing this with fair locks in a manner which does not negatively impact single-threaded performance is complicated. Any changes here should come with several benchmarks, showcasing longest time to grab the lock and perhaps some fiddling with min/max for backoff.
I don't have specific benchmark details, but I can definitively say we found this change to have a significant impact on throughput and latency in our performance testing (when compared to the BSD10 code which calls cpu_spinwait() directly).
--ap
Sure, it is a tradeoff, helps some workloads and hurts others. Can you share the hw spec? Most notably cores/threads, cpu model and socket count?
I'll add some extra measures here and give you a different patch to test.
On the specific test that saw an improvement, we ran a 2-socket, Intel Sandy Bridge, 8-core/socket, 2.1GHz.
I get what you're saying. If you have something different to test out, we can try it.
--ap
So I got myself an Ivy Bridge for testing (among some other boxes). I failed to get benefit from the patch at hand, but I did not worsen the stuff I was worried about.
However, there is still significant unfairness here and making it less of a problem will negatively affect throughput. The real thing to do is to drop backoff altogether, but that's quite a lot of work.
tl;d i think the patch can go in for the time being