libpthread: allocate rwlocks and spinlocks in dedicated cachelines
Reduces severe performance degradation due to false-sharing. Note that this
does not account for hardware which can perform adjacent cacheline prefetch.
[mjg: massaged the commit message and the patch to use aligned_alloc
instead of malloc]
PR: 272238
MFC after: 1 week