mana: batch ringing RX queue doorbell on receiving packets
It's inefficient to ring the doorbell page every time a WQE is posted to
the received queue. Excessive MMIO writes result in CPU spending more
time waiting on LOCK instructions (atomic operations), resulting in
poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all
WQEs to the receive queue in mana_poll_rx_cq().
In addition, use the correct WQE count for ringing RQ doorbell.
The hardware specification specifies that WQE_COUNT should set to 0 for
the Receive Queue. Although currently the hardware doesn't enforce the
check, in the future releases it may check on this value.
Tested by: whu
MFC after: 1 week
Sponsored by: Microsoft