This update brings both rack and bbr into sync with the NF tree. There have been
many advancements in the rack stack (pacing, fixed rate pacing, performance improvements etc). As
well as some common fixes between bbr and rack. Both rack and bbr have been broken in
head since the lro changes went in, this fixes that as well so that now lro and bbr/rack work
nicely together and for rack the compressed ack feature adds some good amount of performance.
Details
Bring up the stack and of course make sure it works.
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Thanks! This patch fixes the panic I have met with BBR each time my system was initiating a TCP session:
Unread portion of the kernel message buffer: stack pointer = 0x28:0xfffffe028aa5b7d0 frame pointer = 0x28:0xfffffe028aa5b850 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (if_io_tqg_54) trap number = 12 panic: page fault cpuid = 54 time = 1619714192 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe028aa5b480 vpanic() at vpanic+0x181/frame 0xfffffe028aa5b4d0 panic() at panic+0x43/frame 0xfffffe028aa5b530 trap_fatal() at trap_fatal+0x387/frame 0xfffffe028aa5b590 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe028aa5b5f0 trap() at trap+0x27d/frame 0xfffffe028aa5b700 calltrap() at calltrap+0x8/frame 0xfffffe028aa5b700 --- trap 0xc, rip = 0xffffffff82bc42e5, rsp = 0xfffffe028aa5b7d0, rbp = 0xfffffe028aa5b850 --- ctf_process_inbound_raw() at ctf_process_inbound_raw+0x345/frame 0xfffffe028aa5b850 ctf_do_queued_segments() at ctf_do_queued_segments+0x3e/frame 0xfffffe028aa5b870 tcp_lro_flush() at tcp_lro_flush+0xb6a/frame 0xfffffe028aa5b910 tcp_lro_flush_all() at tcp_lro_flush_all+0x18b/frame 0xfffffe028aa5b970 iflib_rxeof() at iflib_rxeof+0xc82/frame 0xfffffe028aa5ba80 _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe028aa5bac0 gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe028aa5bb40 gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xac/frame 0xfffffe028aa5bb70 fork_exit() at fork_exit+0x7d/frame 0xfffffe028aa5bbb0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe028aa5bbb0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Uptime: 5h34m24s Dumping 7988 out of 261972 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0xffffffff80bf511a in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #3 0xffffffff80bf55a0 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:919 #4 0xffffffff80bf53a3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:843 #5 0xffffffff8108d2a7 in trap_fatal (frame=0xfffffe028aa5b710, eva=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:915 #6 0xffffffff8108d2ff in trap_pfault (frame=frame@entry=0xfffffe028aa5b710, usermode=false, signo=<optimized out>, signo@entry=0x0, ucode=<optimized out>, ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:732 #7 0xffffffff8108c95d in trap (frame=0xfffffe028aa5b710) at /usr/src/sys/amd64/amd64/trap.c:398 #8 <signal handler called> #9 tcp_fields_to_host (th=0x0) at /usr/src/sys/netinet/tcp_var.h:1104 #10 ctf_process_inbound_raw (tp=0xfffffe01d20044d8, so=so@entry=0xfffff8309820b3b0, m=0xfffff8209879c300, has_pkt=0) at /usr/src/sys/modules/tcp/bbr/../../../netinet/tcp_stacks/rack_bbr_common.c:396 #11 0xffffffff82bc449e in ctf_do_queued_segments (so=0xfffff8309820b3b0, tp=0xfffffe01d20044d8, have_pkt=-1736850688) at /usr/src/sys/modules/tcp/bbr/../../../netinet/tcp_stacks/rack_bbr_common.c:457 #12 0xffffffff80dbf67a in tcp_lro_flush_tcphpts (lc=0xfffff801028d8930, le=<optimized out>) at /usr/src/sys/netinet/tcp_lro.c:1316 #13 tcp_lro_flush (lc=lc@entry=0xfffff801028d8930, le=0xfffffe01d35f0658) at /usr/src/sys/netinet/tcp_lro.c:1333 #14 0xffffffff80dbf96b in tcp_lro_rx_done (lc=0xfffff801028d8930) at /usr/src/sys/netinet/tcp_lro.c:557 #15 tcp_lro_flush_all (lc=lc@entry=0xfffff801028d8930) at /usr/src/sys/netinet/tcp_lro.c:1484 #16 0xffffffff80d2ab12 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffff801028d8900, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3057 #17 0xffffffff80d25362 in _task_fn_rx (context=0xfffff801028d8900) at /usr/src/sys/net/iflib.c:3989 #18 0xffffffff80c42b0d in gtaskqueue_run_locked ( queue=queue@entry=0xfffff801026ef200) at /usr/src/sys/kern/subr_gtaskqueue.c:371 #19 0xffffffff80c427ac in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe01d33c0518) at /usr/src/sys/kern/subr_gtaskqueue.c:547 #20 0xffffffff80bb244d in fork_exit ( callout=0xffffffff80c42700 <gtaskqueue_thread_loop>, arg=<optimized out>, frame=<optimized out>) at /usr/src/sys/kern/kern_fork.c:1078 #21 <signal handler called>
sys/netinet/tcp_ratelimit.h | ||
---|---|---|
47 | should it be here? |
and yes Oliver this is because when the LRO code went in I forgot to update rack_bbr_common.c. I am glad
it fixes the issue. Neither Michael or I could recreate it since we were running on VM's and we did not
realize vmware fusion does not get LRO enabled .. Drew has given me a patch for that so I should put
it in... at least on my local box ;)
sys/netinet/tcp_ratelimit.h | ||
---|---|---|
47 | I am not understanding your comment. Maybe you are referring to the comment "Temporary".. and yeah The thing I thought, when I put it in, that it was just a nice temp for debugging. But I have found when |
sys/netinet/cc/cc_newreno.c | ||
---|---|---|
225 | Why did you remove this RFC2861 related bound of ssthresh? | |
278 | So, newreno no longer adjusts ssthresh and cwnd on RTO? That doesn't sound right... | |
sys/netinet/tcp_input.c | ||
528 | make this uint8_t flags a uint16_t flags, to cater for the ACE bits? | |
563 | Add the th_x2 << 8 || th_flags to cover all 12 potential TCP flags? Or shall I with the AccECN patch eventually? | |
sys/netinet/tcp_sack.c | ||
165 | SEQ_LEQ exists |
sys/netinet/cc/cc_newreno.c | ||
---|---|---|
225 | The standard does not call for this. The standard calls for | |
278 | Hmm I did not take this out.. but I see it somehow is not | |
sys/netinet/tcp_input.c | ||
528 | Sure we can do that. | |
563 | Lets do that in your patch assuming I get uint16_t in place for iptos | |
sys/netinet/tcp_sack.c | ||
165 | I did not realize someone finally added that.. I will change the code to that :) |
sys/netinet/cc/cc_newreno.c | ||
---|---|---|
225 | Don't quite understand this. This is the after-idle function (thus has no bearing with the initial ssthresh on connection establishment. Also, since this is mid-session, ssthresh should NOT be infinitiy here). RFC2861 suggests to set ssthresh to the maximum of the former ssthresh, or 3/4 of the former cwnd. So taking this out may leave ssthresh at a smaller, former value - but never restrict it as you suggest. If you have observations, that increasing ssthresh in after-idle is counter-productive (e.g. running into extensive burst loss, RTO and Lost Retransmissions) that may be a valuable to know @ IETF. | |
sys/netinet/tcp_sack.c | ||
165 | was apparently always available in fbsd tree, ever since @rgrimes added tcp_seq.h - but yes, not all these macros exist in all tcp stacks. :) https://reviews.freebsd.org/R10:df8bae1de4b67ccf57f4afebd4e2bf258c38910d#change-G7qyRTbZw4J4 |
sys/netinet/cc/cc_newreno.c | ||
---|---|---|
278 | Ok I see whats going on here. Looks like rack does these sets I will fix it in the next patch update with the rest of the fixes |
sys/netinet/cc/cc_newreno.c | ||
---|---|---|
225 | Ahh I thought you were pointing out the initialization code. Actually I never We would never have noticed this since there is a bug in the way being idle |
sys/netinet/tcp_input.c | ||
---|---|---|
528 | Remember if you play with the flags, since they are coming from rack_bbr_common.c you |
sys/netinet/cc/cc_newreno.c | ||
---|---|---|
241 | I wonder if this is a sync-update we missed. Since I only had work on the beta stuff in here and a couple odds and ends. I |
sys/netinet/cc/cc_newreno.c | ||
---|---|---|
241 | Actually I think the idea here is to have a mss to set the cwnd to. And in this a) We don't count options in the cwnd/rwnd ... i.e. sendwin = max(cwnd, rwnd)... and we compare that to (snd_max - snd_una).. that never has any options space in it. b) We want to drive the sending of whole mss pieces. We don't want to have left over 14 bytes or some such. Yes an initial send if sack is on and has data would be smaller than a mss. But Really what is needed here is not tcp_maxseg() but what is in ctf_fixed_maxseg(). |
This addresses Richards comments about mss = tcp_maxseg(). Which actually itself is
incorrect. Instead we need the fixed max segment size i.e. not including all the
dynamic things like sack since we are setting a cwnd that will survive long after
we are out of recovery.
If you think long is better, then just g o ahead.
sys/netinet/tcp_ratelimit.h | ||
---|---|---|
46–48 | Why do you change it from uint64_t to long? |
sys/netinet/tcp_ratelimit.h | ||
---|---|---|
46–48 | uint64_t breaks with 32 bit platforms. This code was yanked after the The other limit is in using, but 2 billion connections using a rate is a bit So I think its an ok compromise though it will be better when the 32 bit |
sys/netinet/tcp_ratelimit.h | ||
---|---|---|
46–48 | Then I'm fine with the changes. Go ahead and commit them. |