Page MenuHomeFreeBSD

rtwn: change the USB TX transfers to only do one pending transfer per endpoint
Needs ReviewPublic

Authored by adrian on Tue, Nov 12, 5:00 AM.
Referenced Files
F102936899: D47522.diff
Mon, Nov 18, 11:43 PM
Unknown Object (File)
Sun, Nov 17, 10:39 PM
Subscribers

Details

Reviewers
bz
Group Reviewers
wireless
Summary

I found I was getting constant device timeouts when doing anything
more complicated than a single SSH on laptop with RTL8811AU.

After digging into it, i found a variety of fun situations, including
traffic stalls that would recover w/ a shorter (1 second) USB transfer
timeout. However, the big one is a straight up hang of any TX endpoint
until the NIC was reset. The RX side kept going just fine; only the
TX endpoints would hang.

Reproducing it was easy - just start up a couple of traffic streams
on different WME AC's - eg a best effort + bulk transfer, like
browsing the web and doing an ssh clone - throw in a ping -i 0.1
to your gateway, and it would very quickly hit device timeouts every
couple of seconds.

I put everything into a single TX EP and the hangs went away.

So after some MORE digging, I found that this driver isn't checking
if the transfers are going into the correct EPs for the packet
WME access category / 802.11 TID; and would frequently be able
to schedule multiple transfers into the same endpoint.

My /guess/ was that the firmware isn't happy with one or both of the
above, and so I solved both.

  • drop the USB transfer timeout to 1 second, not 5 seconds - that way we'll either get a 1 second traffic pause and USB transfer failure, or a 5 second device timeout. Having both the TX timeout and the USB transfer timeout made recovery from a USB transfer timeout (without a NIC reset) almost impossible.
  • enforce one transfer per endpoint;
  • separate pending/active buffer tracking per endpoint;
  • each endpoint now has its own TX callback to make sure the queue / end point ID is known;
  • and only frames from a given endpoint pending queue is going into the active queue and into that endpoint.

This seems (fingers crossed) to have fixed the traffic hangs.

Locally tested:

  • rtl8812AU, 11n STA mode

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 60495
Build 57379: arc lint + arc unit

Event Timeline

emaste added inline comments.
sys/dev/rtwn/usb/rtwn_usb_tx.c
204

Worth a XXX or TODO comment to make this more discoverable?

bz requested changes to this revision.Tue, Nov 12, 2:13 PM
bz added a subscriber: bz.

The #ifdef bits probably need to be addressed; the whitespace and punctuation could be optional.

sys/dev/rtwn/usb/rtwn_usb_attach.c
164

I'd possibly group the "inactive" bits together here.

sys/dev/rtwn/usb/rtwn_usb_tx.c
109

That's going to give warning/errors if D4054 gets defined; #ifdef missing here and initialization preferably still down there before the loop then.

200

sentences end in .

This revision now requires changes to proceed.Tue, Nov 12, 2:13 PM
sys/dev/rtwn/usb/rtwn_usb_tx.c
109

also style says A for loop may declare and initialize its counting variable which will avoid the issue entirely for i.

We don't have anything that actually defines D4054 though, right? It's really just a placeholder for "update this if D4054 lands"?

note - I've tested this on a variety of other rtwn NICs now - RTL8821AU, RTL8188EU, RTL8192CU - they worked fine. But RTL8192EU isn't working; TX seems broken. I don't think it's this patch, but I'm going to test on older laptops and also earlier freebsd version snapshots.

So stay tuned, I really do want to make sure this works on all the realtek USB devices we currently have support for before I land this!

hm, looks like it may be a USB controller problem, hmph. Stay tuned.

adrian marked 3 inline comments as done and an inline comment as not done.Thu, Nov 14, 1:08 AM
adrian added inline comments.
sys/dev/rtwn/usb/rtwn_usb_attach.c
164

what do you mean by group here?

sys/dev/rtwn/usb/rtwn_usb_tx.c
204

sure but I think I'm gonna tackle this one next, it's a common problem in the net80211 drivers

sys/dev/rtwn/usb/rtwn_usb_attach.c
164

what do you mean by group here?

The blank line splits awkwardly: the STAILQ_INIT for uc_tx_inactive and the loop with the STAILQ_INSERT_HEAD; I'd remove the blank line or move it above the STAILQ_INIT(..tx_inactive) given there seem to be two "blocks" of logic now.

Add a device_printf() to log when USB transfer errors happen, so it's easier to figure out what happened

hm, looks like it may be a USB controller problem, hmph. Stay tuned.

I do have a patch for this in my tree. Check PR/247528 for the fix.

oh it's THIS bug, ha! ok, lemme go re-review D38854 and see if we can figure out how to get it in cleanly.

oh it's THIS bug, ha! ok, lemme go re-review D38854 and see if we can figure out how to get it in cleanly.

XHCI vs EHCI.

address spacing comment from bz@

ok, please re-review this!

I'm still seeing some traffic hangs, so there's still something subtle going on, but it's at least not constant crashing anymore!