This will attempt to use a different thread/core on the same L2
cache when possible, or use the same CPU as the rx thread when not.
If SMP isn't enabled, don't go looking for cores to use. This is mostly
useful when using shared TX/RX queues.
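The selection rule described above can be sketched in userland C. This is only an illustration under stated assumptions, not the iflib code: `NCPU`, the `l2_group` table, and `pick_tx_cpu()` are hypothetical names, and the topology is hard-coded rather than taken from the scheduler.

```c
#include <stdint.h>

/*
 * Hypothetical userland model, not the iflib implementation.
 * l2_group[cpu] is an assumed table naming the L2 cache that each
 * CPU sits behind (here: pairs of hardware threads share an L2).
 */
#define NCPU 8

static const int l2_group[NCPU] = { 0, 0, 1, 1, 2, 2, 3, 3 };

/*
 * Pick a CPU for the TX task, given the CPU the RX task runs on.
 * Prefer a different CPU behind the same L2 cache.  If none is
 * available, or if SMP is not enabled, fall back to rx_cpu itself.
 * "available" is a bitmask of the CPUs the driver may use.
 */
static int
pick_tx_cpu(int rx_cpu, uint32_t available, int smp_enabled)
{
    int cpu;

    if (!smp_enabled)
        return (rx_cpu);        /* do not go looking for cores */
    for (cpu = 0; cpu < NCPU; cpu++) {
        if (cpu == rx_cpu)
            continue;
        if ((available & (1u << cpu)) == 0)
            continue;
        if (l2_group[cpu] == l2_group[rx_cpu])
            return (cpu);       /* sibling on the same L2 */
    }
    return (rx_cpu);            /* no sibling: share the RX CPU */
}
```

With all eight CPUs available, `pick_tx_cpu(0, 0xff, 1)` selects CPU 1, the sibling of CPU 0 in this assumed topology; if the sibling is masked out, the RX CPU is reused.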
Details
- Reviewers
  sbruno, mjg
- Commits
  rS327013: Support attaching tx queues to cpus

Ensure tasks are bound to the current CPUs, test for performance
regressions.

Diff Detail
- Repository
  rS FreeBSD src repository - subversion
- Lint
  Lint Passed
- Unit
  No Test Coverage
- Build Status
  Buildable 13651
  Build 13871: arc lint + arc unit
Event Timeline
I can't get the patch to apply, and when I tried to merge it "manually", I broke the build :-(
root@lame4:/usr/src # svn info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn.freebsd.org/base/head
Relative URL: ^/head
Repository Root: https://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 325763
Node Kind: directory
Schedule: normal
Last Changed Author: kib
Last Changed Rev: 325759
Last Changed Date: 2017-11-13 11:45:31 +0100 (Mon, 13 Nov 2017)
root@lame4:/usr/src # patch -E -p0 < D12446.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: sys/net/iflib.c
|===================================================================
|--- sys/net/iflib.c
|+++ sys/net/iflib.c
--------------------------
Patching file sys/net/iflib.c using Plan A...
Hunk #1 succeeded at 4978 (offset 85 lines).
Hunk #2 succeeded at 4998 (offset 85 lines).
Hunk #3 failed at 5170.
Hunk #4 succeeded at 5026 with fuzz 2 (offset -76 lines).
Hunk #5 failed at 5052.
2 out of 5 hunks failed--saving rejects to sys/net/iflib.c.rej
done
Once I applied this patch to my system (which is already patched with D11727 and D13096), it panics.
Boot message:
pcib4: <ACPI PCI-PCI bridge> irq 47 at device 2.0 numa-domain 0 on pci2
pci4: <ACPI PCI bus> numa-domain 0 on pcib4
pcib5: <ACPI PCI-PCI bridge> irq 47 at device 3.0 numa-domain 0 on pci2
pci5: <ACPI PCI bus> numa-domain 0 on pcib5
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver> port 0x2020-0x203f mem 0x91d00000-0x91dfffff,0x91e04000-0x91e07fff irq 44 at device 0.0 numa-domain 0 on pci5
ix0: using 2048 tx descriptors and 2048 rx descriptors
ix0: msix_init qsets capped at 32
ix0: pxm cpus: 12 queue msgs: 63 admincnt: 1
ix0: using 12 rx queues 12 tx queues
ix0: Using MSIX interrupts with 13 vectors
ix0: allocated for 12 queues
ix0: allocated for 12 rx queues
ix0: Ethernet address: 24:6e:96:5b:92:80
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 12/2048, RX 12/2048
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver> port 0x2000-0x201f mem 0x91c00000-0x91cfffff,0x91e00000-0x91e03fff irq 40 at device 0.1 numa-domain 0 on pci5
ix1: using 2048 tx descriptors and 2048 rx descriptors
ix1: msix_init qsets capped at 32
ix1: pxm cpus: 12 queue msgs: 63 admincnt: 1
ix1: using 12 rx queues 12 tx queues
ix1: Using MSIX interrupts with 13 vectors
ix1: allocated for 12 queues
ix1: allocated for 12 rx queues
Here it pauses for about 30 seconds to 1 minute, then panics:
spin lock 0xffffffff81d916b0 ((null)) held by 0xffffffff81d91960 (tid 0) too long
panic: spin lock held too long
cpuid = 22
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff821cff80
vpanic() at vpanic+0x19c/frame 0xffffffff821d0000
panic() at panic+0x43/frame 0xffffffff821d0060
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x339/frame 0xffffffff821d00d0
turnstile_trywait() at turnstile_trywait+0xd3/frame 0xffffffff821d0100
__mtx_lock_sleep() at __mtx_lock_sleep+0xd9/frame 0xffffffff821d0190
bpfattach2() at bpfattach2+0x146/frame 0xffffffff821d01d0
ether_ifattach() at ether_ifattach+0xe4/frame 0xffffffff821d0210
iflib_device_register() at iflib_device_register+0x2706/frame 0xffffffff821d0550
iflib_device_attach() at iflib_device_attach+0xb7/frame 0xffffffff821d0580
device_attach() at device_attach+0x3f5/frame 0xffffffff821d05d0
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xffffffff821d0600
pci_attach() at pci_attach+0xd5/frame 0xffffffff821d0640
device_attach() at device_attach+0x3f5/frame 0xffffffff821d0690
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xffffffff821d06c0
acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0xa1/frame 0xffffffff821d0700
device_attach() at device_attach+0x3f5/frame 0xffffffff821d0750
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xffffffff821d0780
pci_attach() at pci_attach+0xd5/frame 0xffffffff821d07c0
device_attach() at device_attach+0x3f5/frame 0xffffffff821d0810
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xffffffff821d0840
acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x3bc/frame 0xffffffff821d08b0
device_attach() at device_attach+0x3f5/frame 0xffffffff821d0900
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xffffffff821d0930
acpi_attach() at acpi_attach+0xe85/frame 0xffffffff821d09e0
device_attach() at device_attach+0x3f5/frame 0xffffffff821d0a30
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xffffffff821d0a60
nexus_acpi_attach() at nexus_acpi_attach+0x73/frame 0xffffffff821d0a90
device_attach() at device_attach+0x3f5/frame 0xffffffff821d0ae0
bus_generic_new_pass() at bus_generic_new_pass+0x118/frame 0xffffffff821d0b10
root_bus_configure() at root_bus_configure+0x77/frame 0xffffffff821d0b40
configure() at configure+0x9/frame 0xffffffff821d0b50
mi_startup() at mi_startup+0x9c/frame 0xffffffff821d0b70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
Do you want a panic dump with debug symbols?
So, it looks like I can't reproduce this, so yes please. I'm guessing something is holding BPF_LOCK(), but I have no clue what that would be.
How do I get a crash dump when the kernel panics during boot, before the disk controller drivers are loaded?
Can I compile a kernel with the .debug symbols embedded in it?
OK show dumpdev
/dev/da1s1b
OK boot
/boot/kernel/kernel text=0x14e7ab8 data=0x157688+0x4872f0 syms=[0x8+0x16f140+0x8+0x18bb14]
Booting...
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2017 The FreeBSD Project.
(...)
ix1: Using MSIX interrupts with 13 vectors
ix1: allocated for 12 queues
ix1: allocated for 12 rx queues
spin lock 0xffffffff81d916b0 ((null)) held by 0xffffffff81d91960 (tid 0) too long
panic: spin lock held too long
cpuid = 22
time = 1
KDB: stack backtrace:
(...)
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> dump
Cannot dump: no dump device specified.
Hrm... you could build a kernel without ix in it and kldload if_ix after the system is up... though that may cause the issue to not occur.
OK, I removed the ix driver, but then it crashed after the igb driver attached, so I removed em(4) as well. Here is the backtrace from the panic triggered by loading if_em:
[root@r630]/data# kgdb /usr/lib/debug/boot/kernel/kernel.debug /data/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer:
panic: mtx_lock() of spin mutex (null) @ /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_unit.c:642
cpuid = 1
time = 1511179338
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe3fdda80680
vpanic() at vpanic+0x19c/frame 0xfffffe3fdda80700
kassert_panic() at kassert_panic+0x126/frame 0xfffffe3fdda80770
__mtx_lock_flags() at __mtx_lock_flags+0x162/frame 0xfffffe3fdda807c0
alloc_unr() at alloc_unr+0x25/frame 0xfffffe3fdda807e0
pipe_stat() at pipe_stat+0xa7/frame 0xfffffe3fdda80830
kern_fstat() at kern_fstat+0xa9/frame 0xfffffe3fdda80880
sys_fstat() at sys_fstat+0x1d/frame 0xfffffe3fdda80980
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe3fdda80ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3fdda80ab0
--- syscall (551, FreeBSD ELF64, sys_fstat), rip = 0x800fde1da, rsp = 0x7fffffffe218, rbp = 0x7fffffffe2c0 ---
KDB: enter: panic
Reading symbols from /data/debug/boot/kernel/if_em.ko.debug...done.
Loaded symbols for /data/debug/boot/kernel/if_em.ko.debug
#0 doadump (textdump=0) at pcpu.h:232
232 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump (textdump=0) at pcpu.h:232
#1 0xffffffff80397dbb in db_dump (dummy=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_command.c:572
#2 0xffffffff80397b79 in db_command (cmd_table=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_command.c:479
#3 0xffffffff80397914 in db_command_loop () at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_command.c:532
#4 0xffffffff8039ab9f in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_main.c:248
#5 0xffffffff80a3f3a3 in kdb_trap (type=3, code=-61456, tf=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_kdb.c:654
#6 0xffffffff80ea57db in trap (frame=0xfffffe3fdda805b0) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/trap.c:536
#7 0xffffffff80e85021 in calltrap () at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/exception.S:237
#8 0xffffffff80a3eacb in kdb_enter (why=0xffffffff813e3c8c "panic", msg=<value optimized out>) at cpufunc.h:63
#9 0xffffffff809fb859 in vpanic (fmt=<value optimized out>, ap=0xfffffe3fdda80740) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_shutdown.c:793
#10 0xffffffff809fb696 in kassert_panic (fmt=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_shutdown.c:690
#11 0xffffffff809dab02 in __mtx_lock_flags (c=0xffffffff81d01f30, opts=0, file=0xffffffff813eda4b "/usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_unit.c", line=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_mutex.c:242
#12 0xffffffff80a58ea5 in alloc_unr (uh=0xfffff8012ab5c400) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_unit.c:642
#13 0xffffffff80a68757 in pipe_stat (fp=0xfffff80135dd0050, ub=0xfffffe3fdda80898, active_cred=0xfffff801354fc600, td=0xfffff80135c2f000) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/sys_pipe.c:1534
#14 0xffffffff809a6a69 in kern_fstat (td=0xfffff80135c2f000, fd=<value optimized out>, sbp=0xfffffe3fdda80898) at file.h:339
#15 0xffffffff809a6b6d in sys_fstat (td=<value optimized out>, uap=0xfffff80135c2f3b0) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_descrip.c:1341
#16 0xffffffff80ea6d2b in amd64_syscall (td=0xfffff80135c2f000, traced=0) at subr_syscall.c:132
#17 0xffffffff80e8537b in Xfast_syscall () at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/exception.S:419
#18 0x0000000800fde1da in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal
kldload if_ix panics too:
Fatal trap 12: page fault while in kernel mode
Fatal trap 18: integer divide fault while in kernel mode
cpuid = 36; cpuid = 19; apic id = 17
instruction pointer = 0x20:0xffffffff80b32d99
stack pointer = 0x28:0xfffffe0466e86000
frame pointer = 0x28:0xfffffe0466e86020
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
apic id = 32
fault virtual address = 0x98
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d94074
stack pointer = 0x28:0xfffffe0466e1d700
frame pointer = 0x28:0xfffffe0466e1d750
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 478 (devd)
curthread () at ./machine/pcpu.h:232
232 asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 __curthread () at ./machine/pcpu.h:232
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:347
#2 0xffffffff8039eaab in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:572
#3 0xffffffff8039e869 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:479
#4 0xffffffff8039e604 in db_command_loop () at /usr/src/sys/ddb/db_command.c:532
#5 0xffffffff803a18ef in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:248
#6 0xffffffff80a61aa3 in kdb_trap (type=12, code=0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#7 0xffffffff80f26770 in trap_fatal (frame=0xfffffe0466e1d640, eva=152) at /usr/src/sys/amd64/amd64/trap.c:794
#8 0xffffffff80f26869 in trap_pfault (frame=0xfffffe0466e1d640, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:653
#9 0xffffffff80f260b9 in trap (frame=0xfffffe0466e1d640) at /usr/src/sys/amd64/amd64/trap.c:420
#10 <signal handler called>
#11 uma_zfree_arg (zone=0x0, item=0xfffff8003e863300, udata=0x0) at /usr/src/sys/vm/uma_core.c:2611
#12 0xffffffff80a7fc76 in selrescan (td=0xfffff8003ea20560, ibits=<optimized out>, obits=<optimized out>) at /usr/src/sys/kern/sys_generic.c:1270
#13 kern_select (td=<optimized out>, nd=6, fd_in=<optimized out>, fd_ou=0x0, fd_ex=<optimized out>, tvp=0xfffff8003ef30000, abi_nfdbits=<optimized out>) at /usr/src/sys/kern/sys_generic.c:1136
#14 0xffffffff80a7ffa6 in sys_select (td=0xfffff8003ea20560, uap=0xfffff8003ea20910) at /usr/src/sys/kern/sys_generic.c:945
#15 0xffffffff80f277d7 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:132
#16 amd64_syscall (td=0xfffff8003ea20560, traced=0) at /usr/src/sys/amd64/amd64/trap.c:915
#17 <signal handler called>
#18 0x000000000046861a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffc8d8
Can you try this after r326033? These issues could have been caused by the memory corruption fixed in that commit.
Still crashing (r326034), but with a more understandable backtrace:
(kgdb) bt
#0 doadump (textdump=0) at pcpu.h:232
#1 0xffffffff80397fdb in db_dump (dummy=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_command.c:572
#2 0xffffffff80397d99 in db_command (cmd_table=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_command.c:479
#3 0xffffffff80397b34 in db_command_loop () at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_command.c:532
#4 0xffffffff8039adbf in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/ddb/db_main.c:248
#5 0xffffffff80a3f693 in kdb_trap (type=3, code=-61456, tf=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_kdb.c:654
#6 0xffffffff80ea57db in trap (frame=0xfffffe3fdd9c6030) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/trap.c:538
#7 0xffffffff80e85441 in calltrap () at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/exception.S:237
#8 0xffffffff80a3edbb in kdb_enter (why=0xffffffff813e3dbc "panic", msg=<value optimized out>) at cpufunc.h:65
#9 0xffffffff809fbad9 in vpanic (fmt=<value optimized out>, ap=0xfffffe3fdd9c61c0) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_shutdown.c:803
#10 0xffffffff809fbb63 in panic (fmt=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_shutdown.c:741
#11 0xffffffff809db9a2 in _mtx_lock_spin_cookie (c=<value optimized out>, v=<value optimized out>, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_mutex.c:685
#12 0xffffffff809db5d8 in __mtx_lock_spin_flags (c=0xffffffff81d01878, opts=0, file=0xffffffff813ed54c "/usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_turnstile.c", line=541) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_mutex.c:316
#13 0xffffffff80e951ee in pmap_delayed_invl_finished () at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/pmap.c:494
#14 0xffffffff80e93801 in pmap_remove (pmap=0xffffffff81e72408, sva=<value optimized out>, eva=18446741875211034624) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/pmap.c:3985
#15 0xffffffff80d17b96 in kmem_unback (object=0xffffffff81e22310, addr=18446741875210051584, size=983040) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/vm/vm_kern.c:402
#16 0xffffffff80d182e3 in kmem_free (vmem=0xffffffff81d02840, addr=18446741875210051584, size=983040) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/vm/vm_kern.c:426
#17 0xffffffff80d10f90 in uma_large_free (slab=0xfffff80159bf2d90) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/vm/uma_core.c:1169
#18 0xffffffff809d6716 in free (addr=0xfffffe001f37e000, mtp=0xffffffff81937b00) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_malloc.c:593
#19 0xffffffff80a30604 in device_set_driver (dev=0xfffff801297c2900, driver=0x0) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_bus.c:2766
#20 0xffffffff80a30425 in device_probe_child (dev=0xfffff801297c2c00, child=0xfffff801297c2900) at device_if.h:108
#21 0xffffffff80a31078 in device_probe (dev=0xfffff801297c2900) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_bus.c:2836
#22 0xffffffff80a31132 in device_probe_and_attach (dev=0xfffff801297c2900) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_bus.c:2860
#23 0xffffffff8067cb99 in pci_driver_added (dev=0xfffff801297c2c00, driver=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/dev/pci/pci.c:4367
#24 0xffffffff80a2f16d in devclass_driver_added (dc=0xfffff80126c55a80, driver=0xffffffff82454598) at bus_if.h:204
#25 0xffffffff80a2f094 in devclass_add_driver (dc=0xfffff80126c55a80, driver=0xffffffff82454598, pass=2147483647, dcp=0xffffffff82454c10) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/subr_bus.c:1173
#26 0xffffffff809d9880 in module_register_init (arg=0xffffffff82454550) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_module.c:123
#27 0xffffffff809cc8d8 in linker_load_module (kldname=<value optimized out>, modname=0xfffff8013522ec00 "if_em", parent=0x0, verinfo=<value optimized out>, lfpp=0xfffffe3fdd9c6918) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_linker.c:234
#28 0xffffffff809ce011 in kern_kldload (td=<value optimized out>, file=<value optimized out>, fileid=0xfffffe3fdd9c6964) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_linker.c:1069
#29 0xffffffff809ce13b in sys_kldload (td=0xfffff80135c49560, uap=<value optimized out>) at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/kern/kern_linker.c:1095
#30 0xffffffff80ea6d2b in amd64_syscall (td=0xfffff80135c49560, traced=0) at subr_syscall.c:134
#31 0xffffffff80e8579b in Xfast_syscall () at /usr/local/BSDRP/TESTING/FreeBSD/src/sys/amd64/amd64/exception.S:419
#32 0x000000080086c3ca in ?? ()
Previous frame inner to this frame (corrupt stack?)
I've got a new panic (head with WITNESS and INVARIANTS enabled):
ix1: allocated for 12 queues
ix1: allocated for 12 rx queues
panic: Lock (rw) ifnet_rw not locked @ /usr/local/BSDRP/TESTING/FreeBSD/src/sys/net/if.c:262.
cpuid = 22
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff824a6f20
vpanic() at vpanic+0x19c/frame 0xffffffff824a6fa0
kassert_panic() at kassert_panic+0x126/frame 0xffffffff824a7010
witness_assert() at witness_assert+0x3c8/frame 0xffffffff824a7070
_rw_runlock_cookie() at _rw_runlock_cookie+0x43/frame 0xffffffff824a70a0
if_attach_internal() at if_attach_internal+0x8c/frame 0xffffffff824a70f0
It's this code (prefixed with line number):
255 struct ifnet *
256 ifnet_byindex(u_short idx)
257 {
258         struct ifnet *ifp;
259
260         IFNET_RLOCK_NOSLEEP();
261         ifp = ifnet_byindex_locked(idx);
262         IFNET_RUNLOCK_NOSLEEP();
263         return (ifp);
264 }
265
Would it be possible to get the output of the ddb "ps", "show all locks", and "show witness" commands? This is just weird.
I don't think it's related to your patch; the current state of -head has a locking problem (see r326111, "rwlock: unbreak WITNESS builds after r326110", for example).
I then updated my source tree to r326116, but now I can no longer write the core dump:
[root@r630]~# kldload if_em
igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> mem 0x91a80000-0x91afffff,0x91b04000-0x91b07fff irq 19 at device 0.0 numa-domain 0 on pci8
igb0: attach_pre capping queues at 8
igb0: using 1024 tx descriptors and 1024 rx descriptors
igb0: msix_init qsets capped at 8
igb0: pxm cpus: 12 queue msgs: 9 admincnt: 1
igb0: using 8 rx queues 8 tx queues
igb0: Using MSIX interrupts with 9 vectors
igb0: allocated for 8 tx_queues
igb0: allocated for 8 rx_queues
igb0: Ethernet address: 24:6e:96:5b:92:84
igb0: netmap queues/slots: TX 8/1024, RX 8/1024
igb1: <Intel(R) PRO/1000 PCI-Express Network Driver> mem 0x91a00000-0x91a7ffff,0x91b00000-0x91b03fff irq 18 at device 0.1 numa-domain 0 on pci8
igb1: attach_pre capping queues at 8
igb1: using 1024 tx descriptors and 1024 rx descriptors
igb1: msix_init qsets capped at 8
igb1: pxm cpus: 12 queue msgs: 9 admincnt: 1
igb1: using 8 rx queues 8 tx queues
igb1: Using MSIX interrupts with 9 vectors
igb1: allocated for 8 tx_queues
igb1: allocated for 8 rx_queues
igb1: Ethernet address: 24:6e:96:5b:92:85
igb1: netmap queues/slots: TX 8/1024, RX 8/1024
Fatal trap 9: general protection fault while in kernel mode
cpuid = 8; apic id = 14
instruction pointer = 0x20:0xffffffff809dba17
stack pointer = 0x28:0xfffffe3fddac11f0
frame pointer = 0x28:0xfffffe3fddac1250
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 73741 (kldload)
[ thread pid 73741 tid 100298 ]
Stopped at _mtx_lock_spin_cookie+0x317: movl 0x9c(%rbx),%r8d
db> dump
Dumping 7901 out of 262018 MB:panic: _mtx_lock_sleep: recursed on non-recursive mutex mrsas_sim_lock @ /usr/local/BSDRP/TESTING/FreeBSD/src/sys/dev/mrsas/mrsas_cam.c:1322
cpuid = 8
time = 1511428330
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe3fddac0580
vpanic() at vpanic+0x19c/frame 0xfffffe3fddac0600
kassert_panic() at kassert_panic+0x126/frame 0xfffffe3fddac0670
__mtx_lock_sleep() at __mtx_lock_sleep+0x414/frame 0xfffffe3fddac0700
__mtx_lock_flags() at __mtx_lock_flags+0xf9/frame 0xfffffe3fddac0750
mrsas_cmd_done() at mrsas_cmd_done+0x32/frame 0xfffffe3fddac0780
mrsas_complete_cmd() at mrsas_complete_cmd+0x16f/frame 0xfffffe3fddac07f0
mrsas_cam_poll() at mrsas_cam_poll+0x2a/frame 0xfffffe3fddac0810
xpt_polled_action() at xpt_polled_action+0x1d4/frame 0xfffffe3fddac0870
dadump() at dadump+0x116/frame 0xfffffe3fddac0ae0
dump_append() at dump_append+0xa5/frame 0xfffffe3fddac0b00
blk_write() at blk_write+0x28b/frame 0xfffffe3fddac0b40
minidumpsys() at minidumpsys+0x959/frame 0xfffffe3fddac0c00
dumpsys_generic() at dumpsys_generic+0x35/frame 0xfffffe3fddac0cd0
Do you know of a working revision I can use?
I'm using r326033 without any panics under INVARIANTS+WITNESS on my dev system, but it only has four cores.
Still the same problem: I've updated to r326359 and I still get this panic (only when this patch is applied):
[root@r630]~# uname -a
FreeBSD r630 12.0-CURRENT FreeBSD 12.0-CURRENT r326359M amd64
[root@r630]~# kldload if_ix
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver> port 0x2020-0x203f mem 0x91d00000-0x91dfffff,0x91e04000-0x91e07fff irq 44 at device 0.0 numa-domain 0 on pci5
ix0: using 2048 tx descriptors and 2048 rx descriptors
ix0: msix_init qsets capped at 32
ix0: pxm cpus: 12 queue msgs: 63 admincnt: 1
ix0: queue equality override not set, capping rx_queues at 12 and tx_queues at 12
ix0: using 12 rx queues 12 tx queues
ix0: Using MSIX interrupts with 13 vectors
ix0: allocated for 12 queues
ix0: allocated for 12 rx queues
ix0: Ethernet address: 24:6e:96:5b:92:80
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 12/2048, RX 12/2048
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver> port 0x2000-0x201f mem 0x91c00000-0x91cfffff,0x91e00000-0x91e03fff irq 40 at device 0.1 numa-domain 0 on pci5
ix1: using 2048 tx descriptors and 2048 rx descriptors
ix1: msix_init qsets capped at 32
ix1: pxm cpus: 12 queue msgs: 63 admincnt: 1
ix1: queue equality override not set, capping rx_queues at 12 and tx_queues at 12
ix1: using 12 rx queues 12 tx queues
ix1: Using MSIX interrupts with 13 vectors
ix1: allocated for 12 queues
ix1: allocated for 12 rx queues
panic: Lock (rw) ifnet_rw not locked @ /usr/local/BSDRP/TESTING/FreeBSD/src/sys/net/if.c:262.
cpuid = 16
time = 1511953340
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe018b72af00
vpanic() at vpanic+0x19c/frame 0xfffffe018b72af80
kassert_panic() at kassert_panic+0x126/frame 0xfffffe018b72aff0
witness_assert() at witness_assert+0x3c8/frame 0xfffffe018b72b050
_rw_runlock_cookie_int() at _rw_runlock_cookie_int+0x49/frame 0xfffffe018b72b090
if_attach_internal() at if_attach_internal+0x8c/frame 0xfffffe018b72b0e0
ether_ifattach() at ether_ifattach+0x20/frame 0xfffffe018b72b110
iflib_device_register() at iflib_device_register+0x2ddc/frame 0xfffffe018b72b450
iflib_device_attach() at iflib_device_attach+0xb7/frame 0xfffffe018b72b480
device_attach() at device_attach+0x3f7/frame 0xfffffe018b72b4d0
device_probe_and_attach() at device_probe_and_attach+0x71/frame 0xfffffe018b72b500
pci_driver_added() at pci_driver_added+0xe9/frame 0xfffffe018b72b540
devclass_driver_added() at devclass_driver_added+0x7d/frame 0xfffffe018b72b580
devclass_add_driver() at devclass_add_driver+0x144/frame 0xfffffe018b72b5c0
module_register_init() at module_register_init+0xc0/frame 0xfffffe018b72b5f0
linker_load_module() at linker_load_module+0xb78/frame 0xfffffe018b72b900
kern_kldload() at kern_kldload+0xf1/frame 0xfffffe018b72b950
sys_kldload() at sys_kldload+0x5b/frame 0xfffffe018b72b980
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe018b72bab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe018b72bab0
--- syscall (304, FreeBSD ELF64, sys_kldload), rip = 0x80086c3ca, rsp = 0x7fffffffe5e8, rbp = 0x7fffffffeb60 ---
KDB: enter: panic
[ thread pid 49349 tid 100319 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> dump
Dumping 7311 out of 262018 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
db> show all locks
Process 49349 (kldload) thread 0xfffff801af10d560 (100319)
panic: witness_ddb_list: witness_cold
cpuid = 16
time = 1511953340
Uptime: 58s
Adding mjg, since
panic: Lock (rw) ifnet_rw not locked @ /usr/local/BSDRP/TESTING/FreeBSD/src/sys/net/if.c:262.
here:
260         IFNET_RLOCK_NOSLEEP();
261         ifp = ifnet_byindex_locked(idx);
262         IFNET_RUNLOCK_NOSLEEP();
Looks like it may be related to recent changes in kern/kern_rwlock.c... or at least he may have some ideas about how to debug this.
The ctx->ifc_cpus initialization change has been committed as
r326369; update the patch to remove that change.
Fix non-SMP build
Remove cpu argument to find_nth() since it's not required anymore.
Use CPU_FIRST() instead of 0 for the non-SMP cpu id.
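As an illustration of the kind of lookup these update notes refer to, here is a minimal userland sketch that returns the nth set CPU in a mask, with the first set CPU as the n = 0 case. The names `nth_cpu()` and `first_cpu()` and the 64-bit mask are assumptions made for this example; they are not the kernel's find_nth() or CPU_FIRST(), which may behave differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the id of the nth (0-based) set bit in
 * mask, wrapping around when n exceeds the number of set bits.
 * Returns -1 for an empty mask.  n is assumed to be non-negative.
 */
static int
nth_cpu(uint64_t mask, int n)
{
    int count, cpu;

    count = 0;
    for (cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            count++;
    if (count == 0)
        return (-1);
    n %= count;                 /* wrap so every queue gets a CPU */
    for (cpu = 0; cpu < 64; cpu++) {
        if ((mask & (UINT64_C(1) << cpu)) == 0)
            continue;
        if (n-- == 0)
            return (cpu);
    }
    return (-1);                /* not reached for a non-empty mask */
}

/*
 * Hypothetical helper: lowest-numbered CPU in the mask.  Using this
 * instead of a literal 0 avoids assuming that CPU 0 is in the set.
 */
static int
first_cpu(uint64_t mask)
{
    return (nth_cpu(mask, 0));
}
```

For a mask of 0x2c (CPUs 2, 3 and 5), `nth_cpu(0x2c, 1)` yields 3 and `nth_cpu(0x2c, 3)` wraps back to 2.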
Same problem on fresh r326378 "panic: Lock (rw) ifnet_rw not locked ".
I can work around this problem by adding kern.smp.disabled="1" to /boot/loader.conf.
Without witness it panics too:
Fatal trap 9: general protection fault while in kernel mode
cpuid = 42; apic id = 38
instruction pointer = 0x20:0xffffffff80cb4c31
stack pointer = 0x28:0xfffffe010b7ed030
frame pointer = 0x28:0xfffffe010b7ed050
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1117 (kldload)
__curthread () at ./machine/pcpu.h:223
223 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 __curthread () at ./machine/pcpu.h:223
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:349
#2 0xffffffff8039a41b in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:572
#3 0xffffffff8039a1d9 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:479
#4 0xffffffff80399f74 in db_command_loop () at /usr/src/sys/ddb/db_command.c:532
#5 0xffffffff8039d25f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:248
#6 0xffffffff80a5ebe3 in kdb_trap (type=9, code=0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:660
#7 0xffffffff80f2d800 in trap_fatal (frame=0xfffffe010b7ecf70, eva=0) at /usr/src/sys/amd64/amd64/trap.c:796
#8 0xffffffff80f2cf0d in trap (frame=0xfffffe010b7ecf70) at /usr/src/sys/amd64/amd64/trap.c:202
#9 <signal handler called>
#10 atomic_read (v=<optimized out>) at /usr/src/sys/compat/linuxkpi/common/include/asm/atomic.h:93
#11 linux_queue_work_on (cpu=256, wq=0x202000000000002, work=0xfffff80200000058) at /usr/src/sys/compat/linuxkpi/common/src/linux_work.c:139
#12 0xffffffff80cdf95d in inetaddr_event (this=<optimized out>, event=<optimized out>, ptr=0xfffff801c7ff6800) at /usr/src/sys/compat/linuxkpi/common/include/asm/atomic.h:75
#13 0xffffffff80b0e73c in if_attach_internal (ifp=0xfffff801c7ff6800, vmove=<optimized out>, ifc=<optimized out>) at /usr/src/sys/net/if.c:806
#14 0xffffffff80b18963 in ether_ifattach (ifp=0xfffff801c7ff6800, lla=0xfffff8010a7bdf68 "\220\342\272\230\200\240") at /usr/src/sys/net/if_ethersubr.c:903
#15 0xffffffff80b27036 in iflib_device_register (dev=<optimized out>, sc=<optimized out>, sctx=<optimized out>, ctxp=0xfffffe010b7ed460) at /usr/src/sys/net/iflib.c:4327
#16 0xffffffff80b27e27 in iflib_device_attach (dev=0xfffff8010ad6d800) at /usr/src/sys/net/iflib.c:4365
#17 0xffffffff80a50dc5 in DEVICE_ATTACH (dev=0xfffff8010ad6d800) at ./device_if.h:180
#18 device_attach (dev=0xfffff8010ad6d800) at /usr/src/sys/kern/subr_bus.c:2911
#19 0xffffffff80a509b2 in device_probe_and_attach (dev=0xfffff8010ad6d800) at /usr/src/sys/kern/subr_bus.c:2869
#20 0xffffffff80682c79 in pci_driver_added (dev=0xfffff8010ad6d900, driver=<optimized out>) at /usr/src/sys/dev/pci/pci.c:4369
#21 0xffffffff80a4eaad in BUS_DRIVER_ADDED (_dev=<optimized out>, _driver=0xffffffff82c9d420 <ix_driver>) at ./bus_if.h:204
#22 devclass_driver_added (dc=0xfffff80105e73c80, driver=0xffffffff82c9d420 <ix_driver>) at /usr/src/sys/kern/subr_bus.c:1102
#23 0xffffffff80a4ea15 in devclass_add_driver (dc=0xfffff80105e73c80, driver=0xffffffff82c9d420 <ix_driver>, pass=2147483647, dcp=0xffffffff82c9dac0 <ix_devclass>) at /usr/src/sys/kern/subr_bus.c:1175
#24 0xffffffff809f5c84 in module_register_init (arg=0xffffffff82c9d3d8 <ix_pci_mod>) at /usr/src/sys/kern/kern_module.c:125
#25 0xffffffff809e972f in linker_file_sysinit (lf=<optimized out>) at /usr/src/sys/kern/kern_linker.c:236
#26 linker_load_file (filename=<optimized out>, result=<optimized out>) at /usr/src/sys/kern/kern_linker.c:462
#27 linker_load_module (kldname=<optimized out>, modname=0xfffff801b9879000 "if_ix", parent=0x0, verinfo=<optimized out>, lfpp=<optimized out>) at /usr/src/sys/kern/kern_linker.c:2092
#28 0xffffffff809eaf95 in kern_kldload (td=<optimized out>, file=<optimized out>, fileid=0xfffffe010b7ed964) at /usr/src/sys/kern/kern_linker.c:1071
#29 0xffffffff809eb0db in sys_kldload (td=0xfffff801c7180560, uap=<optimized out>) at /usr/src/sys/kern/kern_linker.c:1097
#30 0xffffffff80f2e8b8 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:134
#31 amd64_syscall (td=0xfffff801c7180560, traced=0) at /usr/src/sys/amd64/amd64/trap.c:917
#32 <signal handler called>
#33 0x000000080087228a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe368
Well, that wq value certainly looks corrupted, quite possibly with a cpuset_t value... perfect, thanks.
We can't call smp_topo() more than once.
Use cpu_top from sched_ule instead of calling smp_topo() ourselves.
Other schedulers will now use the old logic for choosing cores, only
SCHED_ULE is thread-aware.
Yes, it works now, and here are the results: this patch greatly improves D11727 performance!
x head r326964: inet4 packets-per-second
+ head r326964 with D11727: inet4 packets-per-second
* head r326964 with D11727 and D12446: inet4 packets-per-second
+--------------------------------------------------------------------------+
| * |
|+ + + ++ x x *x* * *|
| |_MA__| |
||____M__A______| |
| |________AM________| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       5189413       5269210       5214772       5224912     34842.015
+   5       4663727       4861937       4717880     4755861.2     90276.033
Difference at 95.0% confidence
        -469051 +/- 99792.7
        -8.9772% +/- 1.88859%
        (Student's t, pooled s = 68424.1)
*   5       5246412       5541188       5359188     5354578.6     117994.53
Difference at 95.0% confidence
        129667 +/- 126879
        2.4817% +/- 2.43323%
        (Student's t, pooled s = 86996.2)
OK: this patch improves ixgbe performance on an 8-core Xeon, but on a 4-core Atom with igb I see a big degradation (almost half the performance):
x FreeBSD head r327012: inet4 packet-per-second forwarded (igb and 4-core Atom)
+ FreeBSD head r327017: inet4 packet-per-second forwarded (igb and 4-core Atom)
+--------------------------------------------------------------------------+
|+ |
|+ |
|+ xx|
|++ x xx|
| |A|| |A| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5      909219.5        923241        920470      918943.1     5603.9679
+   5      512049.5        515865        512374      513016.7     1615.5548
Difference at 95.0% confidence
        -405926 +/- 6014.59
        -44.1732% +/- 0.395144%
        (Student's t, pooled s = 4123.98)
Do you need flame graph output for these two revisions?