LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)
Closed, Public

Authored by bz on Feb 3 2024, 9:29 PM.

Details

Reviewers

Commits

rGd4b4efc6db6c: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)
rG184ccc414686: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)
rG8c450ea1083b: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)
rG2ac8a2189ac6: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss)

Summary

With firmware-based solutions we cannot just jump from an active session
to a new iv_bss node without tearing down state for the old node and
bringing up the new one. This likely used to work on softmac-based
cards/drivers, where one could essentially set the state and fire at will.

We track (*iv_update_bss) calls from net80211 and set a local flag noting
that we are out of sync. While it is set we do not allow any further
operations up the state machine until we hit INIT or SCAN. That means
someone has to take the state down and clean up firmware state; then we
can join again and build up state.

Apparently this problem has been "known" for a while, as native iwm(4) and
others have similar workarounds (though less strict) and can be equally
pestered into bad states. For LinuxKPI all the KASSERTs just made this
problem far more visible. The solution will be some rewrites in net80211.
Until then, try to keep us more stable at least and not die on second
join1() calls triggered by "service netif start wlan0" and similar.

Sponsored by: The FreeBSD Foundation (2023, partial)
MFC after: 3 days
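
The tracking described in the Summary can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the code from this revision: the names sketch_vif, sketch_update_bss, sketch_newstate and the reduced state enum are hypothetical, and the real driver works on net80211 and LinuxKPI structures and has to talk to firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for the net80211 state machine (hypothetical). */
enum sketch_state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

/* Hypothetical per-vif bookkeeping; not the LinuxKPI struct. */
struct sketch_vif {
	enum sketch_state state;
	bool bss_changed;	/* BSS node was swapped while a session was up */
};

/* Models the (*iv_update_bss) hook: only record that we are out of sync. */
static void
sketch_update_bss(struct sketch_vif *vif)
{
	assert(vif != NULL);
	vif->bss_changed = true;
}

/*
 * Models a state change request.
 * Returns 0 if the transition was taken, -1 if it was refused.
 */
static int
sketch_newstate(struct sketch_vif *vif, enum sketch_state nstate)
{
	assert(vif != NULL);

	if (nstate == S_INIT || nstate == S_SCAN) {
		/*
		 * Going all the way down: this is where old firmware
		 * state would be torn down, so we are in sync again.
		 */
		vif->bss_changed = false;
		vif->state = nstate;
		return (0);
	}

	/* Out of sync: refuse to move further up the state machine. */
	if (vif->bss_changed && nstate > vif->state)
		return (-1);

	vif->state = nstate;
	return (0);
}
```

In this model a request to go from AUTH to ASSOC after sketch_update_bss() is refused, while a following request for SCAN succeeds and clears the flag, after which the join can be rebuilt from the bottom.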

Test Plan

This is currently very verbose; before it goes into main
the ic_printfs should become tracing.
This requires D43389 to be applied to head, as well
as commit 49619f73151aeaca4cef5adf631253da04a46e19.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

bz created this revision. Feb 3 2024, 9:29 PM

Herald added subscribers: linuxkpi, emaste, imp. Feb 3 2024, 9:29 PM

bz requested review of this revision. Feb 3 2024, 9:29 PM

Harbormaster completed remote builds in B55780: Diff 133814. Feb 3 2024, 9:29 PM

bz added a child revision: D43753: LinuxKPI: 802.11: update the ni/lsta reference cycle. Feb 5 2024, 3:44 PM

Given D43389 is the FIRST of a series, is this the SECOND one or is there any dependence? Please help clarify.

sys/compat/linuxkpi/common/src/linux_80211.c
1212	No need to print a NULL when "ni->ni_drv_data == NULL".
1318	No need to print a NULL when "lvif->lvif_bss == NULL".

cc added inline comments. Feb 9 2024, 6:55 PM

sys/compat/linuxkpi/common/src/linux_80211.c
1605	No need to print a NULL when "lvif->lvif_bss == NULL".
1985	No need to print a NULL when "lvif->lvif_bss == NULL".
2122	No need to print a NULL when "lvif->lvif_bss == NULL".

In D43725#999194, @cc wrote:

Given D43389 is the FIRST of a series, is this the SECOND one or is there any dependence? Please help clarify.

I think there is a dependency: I applied only this patch, restarted netif, and hit this panic:

--- trap 0x9, rip = 0xffffffff80cf8661, rsp = 0xfffffe00ab111d00, rbp = 0xfffffe00ab111d10 ---
node_free() at node_free+0x11/frame 0xfffffe00ab111d10
lkpi_sta_auth_to_scan() at lkpi_sta_auth_to_scan+0x27f/frame 0xfffffe00ab111d80
lkpi_iv_newstate() at lkpi_iv_newstate+0x253/frame 0xfffffe00ab111df0
ieee80211_newstate_cb() at ieee80211_newstate_cb+0x226/frame 0xfffffe00ab111e40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe00ab111ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00ab111ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00ab111f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ab111f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100192 ]
Stopped at      kdb_enter+0x32: movq    $0,0xe394d3(%rip)
db>

With the patches from D43389, D43725 and D43753 applied, "service netif restart" no longer appears to cause a panic.

My initial test with the three patches (D43389, D43725, D43753) looks good: no more panics. I still need to sort out some issues in my testbed, so I am approving now rather than letting my testing delay the schedule.

This revision is now accepted and ready to land. Feb 13 2024, 2:54 PM

cc mentioned this in D43753: LinuxKPI: 802.11: update the ni/lsta reference cycle. Feb 13 2024, 2:54 PM

Closed by commit rG2ac8a2189ac6: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss) (authored by bz). Feb 14 2024, 7:50 PM

This revision was automatically updated to reflect the committed changes.

bz marked 5 inline comments as done.

bz added a commit: rG2ac8a2189ac6: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss).

bz added a commit: rG8c450ea1083b: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss). Feb 18 2024, 9:12 PM

bz added a commit: rG184ccc414686: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss). Feb 19 2024, 8:09 AM

bz added a commit: rGd4b4efc6db6c: LinuxKPI: 802.11: band-aid for invalid state changes after (*iv_update_bss). Feb 19 2024, 4:10 PM