libpthread: allocate rwlocks and spinlocks in dedicated cachelines
Reduces severe performance degradation due to false-sharing. Note that this
does not account for hardware which can perform adjacent cacheline prefetch.
[mjg: massaged the commit message and the patch to use aligned_alloc
instead of malloc]
PR: 272238
MFC after: 1 week
(cherry picked from commit a6c0d801ca5934bb9b9cca6870ea7406d5db0641)