The lookup for a IPv6 multicast addresses corresponding to the destination addresses in the datagram is protected by the NET_EPOCH section. Access to each PCB is protected by INP_RLOCK during comparing. But access to socket's so_options field is not protected. And in some cases it is possible, that PCB pointer is still valid, but inp_socket is not. The patch wides lock holding to protect access to inp_socket. It copies locking strategy from IPv4 UDP handling. Also fix lock and mbuf leak in udp_input() that can happen in error case, and modify UDP statistic accounting to be the same as for IPv6. PR: 232192
Details
I wrote small program to reproduce panic:
It joins to IPv6 multicast group and reads packets in the loop. Also it periodically closes and reopens socket.
With packet generator I produce needed multicast packets and with ~15kpps rate I can reproduce the panic within 20 seconds.
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 36333 Build 33222: arc lint + arc unit
Event Timeline
I believe the use of this boolean is the wrong approach. If you want to check and then lock you'll need to an an INP_RLOCKED similar to INP_WLOCKED which can be found in in_pcb.h.
Before epochification we had an integer variable that was used to track locking.
#define UH_WLOCKED 2 #define UH_RLOCKED 1 #define UH_UNLOCKED 0
And currently rwlock doesn't provide public function or macro similar to rw_wowned() for shared lock.
Yup, I understand that we do not currently have that function. I think you ought to add that as part of this fix.
I'm not expert in the locking internals, but as I understand from the code, it is not possible to determine that shared lock was held in the current thread. So, I can't just create RW_RLOCKED() macro, because some other thread can hold shared lock and only it should unlock the lock later, otherwise my unlock will decrease shared lock counter and this will lead to the problem. But I can be wrong.
Yes it is impossible to have RW_RLOCKED() functionality without adding significant overhead. Local boolean tracking the lock status is preferable over lock state interrogation anyway.
Missed one case when PCB could be unlocked before check - when PCB is
already dying and imo is NULL. Also remove extra IN6_IS_ADDR_MULTICAST(),
we already checked it few lines before.
The last patch seems to have fixed the panic for us. But I have doubts, probably we should take INP_RLOCK just before checking in6p_moptions. Any thoughts?
Let me remind what panic it is targeted to fix:
Fatal trap 12: page fault while in kernel mode cpuid = 8; apic id = 12 fault virtual address = 0xcc fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80d36a19 stack pointer = 0x0:0xfffffe015b813440 frame pointer = 0x0:0xfffffe015b813560 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq141: mlx5_core1) trap number = 12 panic: page fault cpuid = 8 time = 1611737705 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015b813100 vpanic() at vpanic+0x182/frame 0xfffffe015b813150 panic() at panic+0x43/frame 0xfffffe015b8131b0 trap_fatal() at trap_fatal+0x387/frame 0xfffffe015b813210 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015b813260 trap() at trap+0x271/frame 0xfffffe015b813370 calltrap() at calltrap+0x8/frame 0xfffffe015b813370 --- trap 0xc, rip = 0xffffffff80d36a19, rsp = 0xfffffe015b813440, rbp = 0xfffffe015b813560 --- udp6_input() at udp6_input+0x759/frame 0xfffffe015b813560 ip6_input() at ip6_input+0xb3a/frame 0xfffffe015b813640 netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe015b813690 ether_demux() at ether_demux+0x138/frame 0xfffffe015b8136c0 ether_nh_input() at ether_nh_input+0x344/frame 0xfffffe015b813720 netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe015b813770 ether_input() at ether_input+0x69/frame 0xfffffe015b8137d0 ether_demux() at ether_demux+0x121/frame 0xfffffe015b813800 ether_nh_input() at ether_nh_input+0x344/frame 0xfffffe015b813860 netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe015b8138b0 ether_input() at ether_input+0x69/frame 0xfffffe015b813910 tcp_lro_queue_mbuf() at tcp_lro_queue_mbuf+0xca/frame 0xfffffe015b813940 mlx5e_rx_cq_comp() at mlx5e_rx_cq_comp+0xfd/frame 0xfffffe015b813a60 mlx5_cq_completion() at mlx5_cq_completion+0x90/frame 0xfffffe015b813ac0 mlx5_eq_int() at mlx5_eq_int+0xb0/frame 0xfffffe015b813b10 mlx5_msix_handler() at mlx5_msix_handler+0x15/frame 0xfffffe015b813b20 ithread_loop() at ithread_loop+0x24d/frame 0xfffffe015b813bb0 fork_exit() at fork_exit+0x7e/frame 0xfffffe015b813bf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015b813bf0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic ... #14 0xffffffff80efab8f in trap_pfault (frame=0xfffffe015b813380, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:732 #15 0xffffffff80efa1f1 in trap (frame=0xfffffe015b813380) at /usr/src/sys/amd64/amd64/trap.c:398 #16 <signal handler called> #17 udp6_input (mp=0xfffffe015b8135a8, offp=<optimized out>, proto=<optimized out>) at /usr/src/sys/netinet6/udp6_usrreq.c:418 #18 0xffffffff80d1caca in ip6_input (m=0xfffff80325811600) at /usr/src/sys/netinet6/ip6_input.c:931 #19 0xffffffff80c5be5a in netisr_dispatch_src (proto=6, source=<optimized out>, m=0xffffffff815742c0 <vnet_entry_udb>) at /usr/src/sys/net/netisr.c:1143 #20 0xffffffff80c4f548 in ether_demux (ifp=0xfffff80003d6b800, m=0x11) at /usr/src/sys/net/if_ethersubr.c:921 #21 0xffffffff80c508a4 in ether_input_internal (ifp=0xfffff80003d6b800, m=0x11) at /usr/src/sys/net/if_ethersubr.c:705 #22 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:735 ... kgdb) f 17 #17 udp6_input (mp=0xfffffe015b8135a8, offp=<optimized out>, proto=<optimized out>) at /usr/src/sys/netinet6/udp6_usrreq.c:418 418 /usr/src/sys/netinet6/udp6_usrreq.c: No such file or directory. (kgdb) i lo inp_locked = false pcblist = <optimized out> last = 0xfffff809753377e0 imo = <optimized out> fromsa = {{sin6_len = 28 '\034', sin6_family = 28 '\034', sin6_port = 56110, sin6_flowinfo = 0, sin6_addr = {__u6_addr = { __u6_addr8 = "\376\200", '\000' <repeats 12 times>, <incomplete sequence \365>, __u6_addr16 = {33022, 0, 0, 0, 0, 0, 0, 62720}, __u6_addr32 = {33022, 0, 0, 4110417920}}}, sin6_scope_id = 12}, {sin6_len = 28 '\034', sin6_family = 28 '\034', sin6_port = 56110, sin6_flowinfo = 0, sin6_addr = {__u6_addr = { __u6_addr8 = "\377\002", '\000' <repeats 12 times>, "\002\020", __u6_addr16 = {767, 0, 0, 0, 0, 0, 0, 4098}, __u6_addr32 = {767, 0, 0, 268566528}}}, sin6_scope_id = 12}} m = 0xfffff80325811600 off = <optimized out> ifp = <optimized out> uh = <optimized out> ip6 = <optimized out> plen = <optimized out> nxt = <optimized out> ulen = <optimized out> cscov_partial = 629216880 uh_sum = <optimized out> pcbinfo = <optimized out> fwd_tag = <optimized out> inp = 0xfffff809753377e0 up = <optimized out> ... (kgdb) p/x last->inp_flags2 $2 = 0x10 (kgdb) p/x last->in6p_moptions $3 = 0x0 (kgdb) p/x last->inp_socket $4 = 0x0 (kgdb) p offsetof (struct socket, so_options) $1 = (int *) 0xcc
Make locking similar to what we heave for IPv4.
Fix mbuf leaks and lock leaks for error case.