Page MenuHomeFreeBSD

rtwn: Fix RTL8188EU & RTL8192EU cannot associate in STA mode
Needs ReviewPublic

Authored by cy on Nov 14 2024, 2:43 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 22, 8:10 AM
Unknown Object (File)
Mon, Jan 20, 4:20 AM
Unknown Object (File)
Thu, Dec 26, 5:55 AM
Unknown Object (File)
Dec 25 2024, 12:20 PM
Unknown Object (File)
Dec 25 2024, 11:25 AM
Unknown Object (File)
Dec 25 2024, 7:23 AM
Unknown Object (File)
Dec 25 2024, 4:31 AM
Unknown Object (File)
Dec 19 2024, 4:22 AM
Subscribers

Details

Reviewers
adrian
bz
Group Reviewers
wireless
Summary

On some systems RTL8188EU and RTL8192EU will fail to associate in STA
mode while others it will work fine. On the systems RTL8192EU fails to
associate in STA mode, it works perfectly fine in AP mode. This points
to a USB 3.0 timing issue when selecting a channel while in STA mode.
USB 2.0 is unaffected.

While here fix RTL8192EU power off and power on hang for the same reason.

PR: 247528
Reported by: Mc James <realmcjames@protonmail.ch>, many
Tested by: titus@edc.ro
Concept by: titus@edc.ro

Test Plan

Been running here since Jan 9, 2023.

Refactored version. Currently being tested here (since Nov 26, 2024).

Latest version has been running here since Jan 2, 2025.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

cy requested review of this revision.Nov 14 2024, 2:43 AM

Can this be simplified to simply check if uc_delay_us is > 0? Otherwise from scrolling through uc_write_delay would be a bool or are we expecting more values?

There's /gotta/ be a way to do this without having the channel code try get the USB pointer.

Lemme noodle on it, I think we should be able to move uc_delay_us into the rtwn softc, and make uc_write_delay a softc function? instead of how it's currently done?

What's interesting is what's potentially causing the timing differences on XHCI versus EHCI? Like, are we missing some kind of flush/wait somewhere before issuing the next command(s) ?

There's /gotta/ be a way to do this without having the channel code try get the USB pointer.

Lemme noodle on it, I think we should be able to move uc_delay_us into the rtwn softc, and make uc_write_delay a softc function? instead of how it's currently done?

I can certainly do this. Same problem and fix for RTL8188EU. I can poke around at this tomorrow and maybe Monday.

What's interesting is what's potentially causing the timing differences on XHCI versus EHCI? Like, are we missing some kind of flush/wait somewhere before issuing the next command(s) ?

Probably. This affects RTL8188EU as well. Moving this into its own function would be the correct approach. I'll look at tomorrow and Monday.

I did a local version of this to see, and it doesn't /reliably/ associate, but it does associate.

Honestly I really do wonder what's different between EHCI and XHCI here, aiee.

cy retitled this revision from rtwn: Fix RTL8192EU cannot associate in STA mode to rtwn: Fix RTL8188EU & RTL8192EU cannot associate in STA mode.
cy edited the summary of this revision. (Show Details)
cy edited the test plan for this revision. (Show Details)

This update incorporates adrian@'s requested changes.

Refactor.

Also, the previous patch worked with one of my RTL8192EU (the AC1200) but not the other (TL-WN823N). It now works with both. And with the RTL8188EU as well.

oh interesting, so this now works on more?

This diff looks like it's putting a delay before /every/ write now?

I'm ok with this. Stuff in tree has to work. :-)

I'd like to dig into it some more to see if we can figure out the actual root cause (i swear we're just missing a register read somewhere just to ensure things are flushed) but I'd prefer to have everything in tree working whilst we go digging. And it's super non intrusive, so thank.

Thanks for digging into this!

This revision is now accepted and ready to land.Nov 27 2024, 3:17 AM

oh interesting, so this now works on more?

This diff looks like it's putting a delay before /every/ write now?

Remember, this is needed only with xhci(4). I was not able to reproduce this problem on my ehci(4) laptop but is needed on my xhci(4) laptop. And it affects all rtwn(4) devices.

I'm ok with this. Stuff in tree has to work. :-)

I'd like to dig into it some more to see if we can figure out the actual root cause (i swear we're just missing a register read somewhere just to ensure things are flushed) but I'd prefer to have everything in tree working whilst we go digging. And it's super non intrusive, so thank.

Thanks for digging into this!

There is no way to manage *_chan writes lower down in the stack. Lower down in the stack doesn't know who the caller is. IMO my first patch is preferable as it doesn't delay all writes, only _chan writes.

emaste added inline comments.
sys/dev/rtwn/usb/rtwn_usb_reg.c
99–100

I agree with @bz, why not just

if (uc->uc_delay_us > 0)
        (sc->sc_delay)(sc,uc->uc_delay_us);

Unless we really want to be able to have a 0mS sc_delay()?

In D47562#1089421, @cy wrote:

I'm ok with this. Stuff in tree has to work. :-)

I'd like to dig into it some more to see if we can figure out the actual root cause (i swear we're just missing a register read somewhere just to ensure things are flushed) but I'd prefer to have everything in tree working whilst we go digging. And it's super non intrusive, so thank.

Thanks for digging into this!

There is no way to manage *_chan writes lower down in the stack. Lower down in the stack doesn't know who the caller is. IMO my first patch is preferable as it doesn't delay all writes, only _chan writes.

So what I did locally was to create an sc->sc_* method that's called before and after the workaround is needed. Then the default method would be a null one, and for these chipsets it would set it to a method to set and clear the uc_write_delay flag.

I also had delay_us and write_delay in the sc too rather than uc, because the sc is visible by everything and it doesn't hurt.

sys/dev/rtwn/usb/rtwn_usb_reg.c
99–100

We may want to tweak the delay per chipset at some point.

This version of the patch broke TL-WN823N on the ehci(4) laptop. Moving setting the USB delay flag out of the _chan functions caused this regression. The other two rtwn(4) devices I have still work.

sys/dev/rtwn/usb/rtwn_usb_reg.c
99–100

uc_write_delay is just a flag that tells it whether to apply the delay in uc_delay_us or not.

99–100

I will do this.

The delay can be tweaked by the user with the dev.rtwn.0.delay_us tunable. One can dynamically set it using kenv(8), then inserting the device. The dev.rtwn.0.delay_us sysctl can be used to tune the delay after the device has been inserted.

I will look into setting a per chipset default.

sys/dev/rtwn/usb/rtwn_usb_reg.c
99–100

Right, but why not just use a non-zero value as the indication to apply a delay (set uc_delay_us = 0 rather than uc_write_delay = 0)? We can still set a per-chipset delay and set it to 0 for chipsets that don't need a delay? It's not a big deal, I'm just wondering if I'm missing something?

ah wait a sec, i just did a usbdump to look at the control transfers during init and config, and it LOOKS like the control transfers are happening every 125uS, not every 1ms like USB-2.

I /bet/ that our XHCI / usb transfer code isn't handling USB-1 and USB-2 connected devices correctly, and bumping up the control transfer interval.

Let's go dig into that!

ah wait a sec, i just did a usbdump to look at the control transfers during init and config, and it LOOKS like the control transfers are happening every 125uS, not

ok so I dug into it on my haswell laptop with USB3 on a SS port, and it also worked fine.

I also did go do a quick read of the EHCI/XHCI spec and yes, for HS/SS endpoints, the microframe / interval is 125uS, for USB 1 / FS transfers it's 1mS.

The difference?

  • the transfer interval for control transfers on the haswell laptop is around 500uS
  • the transfer interval for control transfers on the tiger lake laptop is 125uS

Now, I haven't yet found anything in the xhci code that would show the difference between platforms.

Is it quite possible the control transfers are just "too fast" (ie, actually able to be queued / run on spec) on newer hardware/CPUs?
Or is it something lurking in the chipsets, or the driver? I have no idea.

But, this is a fun deep dive into the bowels of things. Thanks for adding USB @bz , we're gonna need to dive a bit deeper!

The other problem is the TL-WN823N, which did work on my old laptop, no longer does. Also tested on the sandbox server downstairs. Something has changed since it was last used. To rule out that this device has stopped working I connected it to my $JOB Windows laptop. It works. It would appear that this is a FreeBSD-specific problem I need to hunt down.

sys/dev/rtwn/usb/rtwn_usb_reg.c
99–100

I could. I'm thinking about adding uc_in_chan instead of using uc_write_delay, and moving the USB code into the USB modules instead of polluting the upper non-USB functions. I'm AFK this weekend (taking laptop) and will try to find some time to rework this code.

So, i noticed that the RTL8192EU even when "fixed" would associate fine once, but if you shut it down (eg killed wpa_supplicant) and started it up again, it would crash the firmware in the NIC after a couple of transfers.

I then tested it on my Haswell -HEAD laptop with EHCI and XHCI ports that doesn't have the issue; and it also has this behaviour.

I'm writing a tool right now to go parse usbdumps to try and turn them into realtek formatted register accesses to see what's going on. So far it's getting only a little bit into the power-up sequence and then hitting a specific register and dying on all transfers after that.

This update of the patch addresses Adrian's concern that *_chan() are polluted by references to USB (*uc) data structures. Instead of setting uc->uc_write_delay directly the *chan() functions set a chan_flag in the sc to indicate that execution is in a sensitive part of channel processing, that a delay should be made before the write.

Unfortunately there is no way for the write routines to implicitly know when they are performing a write for one of the affected *_chan() routines.

I also fixed an oversight, fixed in the previous patch, which was not subsequently uploaded to phab.

This new patch works as well as the previous patch. I am using it here and will continue to test it until committed.

Adrian, please verify if this meets your requirement to move manipulation of the USB write delay flag out of the *_chan() functions.

This revision now requires review to proceed.Fri, Jan 3, 4:29 PM

Same diff but with -U99999 to capture the full files for review.

I'm still trying to chase this down!

  • There's stuff in the r92e_init.c power_off path that isn't in the vendor driver / rtl8xxxu driver, if commented out then the driver survives power down / up - https://reviews.freebsd.org/D48428
  • I'm going through the channel setup path for rtl8188eu / rtl8192eu; the vendor driver / rtl8xxxu driver doesn't have 1ms sleeps, but we are missing some sleeps, esp in the RF register read/write path.

Would you mind testing D48428 yourself with this diff? ie, if you ONLY set/clear sc->chan_flag so it enables the write delay when the channel programming happens, comment out the bits I did in D48428 and NOT do any extra sleeping in the rtl8192eu power-up / down path.
That works locally here!