mpr, mps: Establish busdma boundaries for memory pools
Nearly all of the memory used by the cards in the mpr(4) and mps(4)
drivers is required, according to the specs and Broadcom developers,
to be within a 4GB segment of memory.
This includes:
    System Request Message Frames pool
    Reply Free Queues pool
    ReplyDescriptorPost Queues pool
    Chain Segments pool
    Sense Buffers pool
    SystemReply message pool
We got a bug report from Dwight Engen, who ran into data corruption
in the BAE port of FreeBSD:
We have a port of the FreeBSD mpr driver to our kernel and recently
I found an issue under heavy load where a DMA may go to the wrong
address. The test system is a Supermicro X10SRH-CLN4F with the
onboard SAS3008 controller setup with 2 enterprise Micron SSDs in
RAID 0 (striped). I have debugged the issue and narrowed down that
the errant DMA is one that has a segment that crosses a 4GB
physical boundary. There are more details I can provide if you'd
like, but with the attached patch in place I can no longer
re-create the issue.
I'm not sure if this is a known limit of the card (have not found a
datasheet/programming docs for the chip) or our system is just
doing something a bit different. Any helpful info or insight would
be welcome.
Anyway, just thought this might be helpful info if you want to
apply a similar fix to FreeBSD. You can ignore/discard the commit
message as it is my internal commit (blkio is our own tool we use
to write/read every block of a device with CRC verification which
is how I found the problem).
The commit message was:
[PATCH 8/9] mpr: fix memory corrupting DMA when sg segment crosses
4GB boundary
Test case was two SSD's in RAID 0 (stripe). The logical disk was
then partitioned into two partitions. One partition had lots of
filesystem I/O and the other was initially filled using blkio with
CRCable data and then read back with blkio CRC verify in a loop.
Eventually blkio would report a bad CRC block because the physical
page being read-ahead into didn't contain the right data. If the
physical address in the arq/segs was for example 0x500003000 the
data would actually be DMAed to 0x400003000.
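The address pair in the quoted report (0x500003000 expected, 0x400003000
observed) is consistent with a transfer whose upper 32 address bits stay
fixed while only the lower 32 bits advance. The userland sketch below
illustrates that arithmetic. It is a hypothetical model chosen to match
the reported symptom, not confirmed controller behavior, and the names
crosses_4gb() and wrapped_addr() are invented for illustration; neither
exists in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY_4GB	(UINT64_C(1) << 32)

/* True if [addr, addr + len) spans more than one 4GB-aligned window. */
static bool
crosses_4gb(uint64_t addr, uint64_t len)
{
	if (len == 0)
		return (false);
	return ((addr / BOUNDARY_4GB) != ((addr + len - 1) / BOUNDARY_4GB));
}

/*
 * Hypothetical model (an assumption, not documented hardware behavior):
 * the upper 32 bits are taken from the segment start and never change,
 * and only the lower 32 bits advance as the transfer proceeds.
 */
static uint64_t
wrapped_addr(uint64_t seg_start, uint64_t offset)
{
	uint64_t high = seg_start & ~(BOUNDARY_4GB - 1);
	uint64_t low = (seg_start + offset) & (BOUNDARY_4GB - 1);

	return (high | low);
}
```

Under this model, a segment starting at 0x4FFFFD000 that runs 0x6000
bytes forward should reach 0x500003000, but wrapped_addr() yields
0x400003000, the same pair of addresses as in the report.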
The original patch was against mpr(4) before busdma templates were
introduced, and only affected the buffer pool (sc->buffer_dmat) in
the mpr(4) driver. After some discussion with Dwight and the
LSI/Broadcom developers, and after looking through the driver, it
appears that most of the queues in the driver are OK, because they
restrict their allocations to memory below 4GB. The buffer queue and
the chain frames seem to be the exceptions.
This is pretty much the same between the mpr(4) and mps(4) drivers.
So, apply a 4GB boundary limitation for the buffer and chain frame pools
in the mpr(4) and mps(4) drivers.
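
The effect of a boundary constraint can be pictured as splitting any
region that would straddle the boundary into pieces that each stay on
one side of it. The userland sketch below illustrates only that idea. It
is not the busdma implementation and not the code in this change;
struct seg and split_at_boundary() are hypothetical names, and the
boundary is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor, for illustration only. */
struct seg {
	uint64_t addr;
	uint64_t len;
};

/*
 * Split [addr, addr + len) so that no output segment crosses a multiple
 * of 'boundary' (assumed to be a power of two).  Returns the number of
 * segments written; stops early if 'max_out' entries are used up.
 */
static size_t
split_at_boundary(uint64_t addr, uint64_t len, uint64_t boundary,
    struct seg *out, size_t max_out)
{
	size_t n = 0;

	while (len > 0 && n < max_out) {
		/* Bytes left before the next boundary multiple. */
		uint64_t room = boundary - (addr & (boundary - 1));
		uint64_t chunk = len < room ? len : room;

		out[n].addr = addr;
		out[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return (n);
}
```

With a 4GB boundary, a 0x8000-byte region starting at 0x4FFFFD000 comes
out as two segments: 0x3000 bytes at 0x4FFFFD000 and 0x5000 bytes at
0x500000000, so neither piece crosses the boundary.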
Reported by: Dwight Engen <dwight.engen@gmail.com>
Reviewed by: imp
Obtained from: Dwight Engen <dwight.engen@gmail.com>
Differential Revision: https://reviews.freebsd.org/D43008
(cherry picked from commit 264610a86e14f8e123d94c3c3bd9632d75c078a3)