Page MenuHomeFreeBSD

tcp: utilize new solisten_clone() and solisten_enqueue()
ClosedPublic

Authored by glebius on Aug 7 2022, 7:16 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 3, 1:06 PM
Unknown Object (File)
Fri, Oct 18, 9:18 PM
Unknown Object (File)
Oct 9 2024, 11:51 AM
Unknown Object (File)
Oct 9 2024, 11:50 AM
Unknown Object (File)
Oct 9 2024, 11:50 AM
Unknown Object (File)
Oct 9 2024, 11:50 AM
Unknown Object (File)
Oct 9 2024, 11:50 AM
Unknown Object (File)
Oct 9 2024, 11:50 AM
Subscribers

Details

Summary

This streamlines cloning of a socket from a listener. Now we do not
drop the inpcb lock during creation of a new socket, do not do useless
state transitions, and put a fully initialized socket+inpcb+tcpcb into
the listen queue.

Before this change, first we would allocate the socket and inpcb+tcpcb via
tcp_usr_attach() as TCPS_CLOSED, link them into global list of pcbs, unlock
pcb and put this onto incomplete queue (see 6f3caa6d815). Then, after
sonewconn() we would lock it again, transition into TCPS_SYN_RECEIVED,
insert into inpcb hash, finalize initialization of tcpcb. And then, in
call into tcp_do_segment() and upon transition to TCPS_ESTABLISHED call
soisconnected(). This call would lock the listening socket once again
with a LOR protection sequence and then we would relocate the socket onto
the complete queue and only now it is ready for accept(2).

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 46797
Build 43686: arc lint + arc unit

Event Timeline

rrs added a reviewer: tuexen.
rrs added a subscriber: rrs.

These look ok but I also think it worth having Michael have a look in
case I am missing something ... :)

sys/netinet/tcp_syncache.c
939

Why don't you use tcp_state_change()? Aren't we now missing the state change when using dtrace?

sys/sys/socketvar.h
501

Should there be a variable name for consistency?

glebius added inline comments.
sys/netinet/tcp_syncache.c
939

Cause there is no state change here. We just allocated newborn tcpcb and freed syncache_entry. Syncache entry goes away without updating any state counters. We inherit its TCP state and its counter and tcpcb is born as TCPS_SYN_RECEIVED.

See deleted comment at line 1285. Previous transition from CLOSING to SYN_RECEIVED was bogus, don't you agree? There is no such transition in TCP diagram.

glebius marked an inline comment as done.

Rebase

sys/netinet/tcp_syncache.c
939

Cause there is no state change here. We just allocated newborn tcpcb and freed syncache_entry. Syncache entry goes away without updating any state counters. We inherit its TCP state and its counter and tcpcb is born as TCPS_SYN_RECEIVED.

I would consider it moving from CLOSED to SYN_RECEIVED.

See deleted comment at line 1285. Previous transition from CLOSING to SYN_RECEIVED was bogus, don't you agree? There is no such transition in TCP diagram.

Not in the protocol specification. But implementations can be different. When using dtrace to observe the state changes I prefer every endpoint to start in CLOSED and eventually end in CLOSED...

sys/netinet/tcp_syncache.c
939

Well, our implementation was with a defect, and now it is not :) Why should we stick to it? Let's look in detail into previous implementation:

  1. The SYN-RCVD state disappears
    • at this point we don't have any state accounted for, but a TCP connection does exist! Ugly! --
  2. A moment later CLOSED appears
  3. A moment later CLOSED transitions to SYN-RCVD
  4. A moment later SYN-RCVD transitions to ESTABLISHED

Note that transitions 1->2->3->4 all happen in the same call into tcp_input(), it is a single network stack event, not a series of events. It can't branch anywhere. With dtrace you see it as series of events, while it is not. In heavy traffic these events potentially can intermix with events from other sockets, while they should not, as the transition is atomic in nature.

New implementation makes one clear transition: SYN-RCVD -> ESTABLISHED. That matches specification and reality. I agree that I missed a dtrace event here, and I will fix that. Maybe add a remark that it is a syncache entry expansion?

sys/netinet/tcp_syncache.c
939

Last sentence wasn't clear. I'm not missing a state transition event with dtrace, SYN-RCVD -> ESTABLISHED happens via tcp_state_change(). However, I can add a dtrace message that notes expansion of syncache entry to tcpcb.

P.S. Also, should look how syncookie path works...

Bugfix: when creating a tcpcb from a syncookie, we would underleak
count of states in TCPS_SYN_RECEIVED.

Oops, used obsoleted git hash in previous 'git arc update'.

sys/netinet/tcp_syncache.c
939

Maybe I wasn't clear: I only wanted to be able to observe the CLOSED->SYN_RECEIVED state change via dtrace.
Do you need a TCPSTATES_INC(TCPS_SYN_RECEIVED); Where is this done?

sys/netinet/tcp_syncache.c
939

Maybe I wasn't clear: I only wanted to be able to observe the CLOSED->SYN_RECEIVED state change via dtrace.

Do you agree that such transition doesn't happen in the protocol and was a misfeature of our implementation? Why stick to it? Here is my suggestion: I'll add probes that would report expansion of syncache entry into tcpcb and a separate probe reporting creation of a tcpcb out of a syncookie. That would give you more information than you got before with that synthetic transition.

Do you need a TCPSTATES_INC(TCPS_SYN_RECEIVED); Where is this done?

This is done when a syncache entry is created. For the syncookie case I just fixed the patch late night yesterday, line 1173 of new version.

sys/netinet/tcp_syncache.c
939

Maybe I wasn't clear: I only wanted to be able to observe the CLOSED->SYN_RECEIVED state change via dtrace.

Do you agree that such transition doesn't happen in the protocol and was a misfeature of our implementation? Why stick to it? Here is my suggestion: I'll add probes that would report expansion of syncache entry into tcpcb and a separate probe reporting creation of a tcpcb out of a syncookie. That would give you more information than you got before with that synthetic transition.

Right now you get:

CPU    DELTA(us)  OLD                     NEW
  2            -  state-closed         -> state-syn-received
  2           21  state-syn-received   -> state-established
  2     49080938  state-established    -> state-close-wait
  2          732  state-close-wait     -> state-last-ack
  2         6722  state-last-ack       -> state-closed

My point is that you always start (and end) in CLOSED. We do represent it often implicitly by not having a tcpcb at all. I don't care much about if you start with CLOSED to SYN-RECEIVED or change that to CLOSED to ESTABLISHED. But all end points are in the CLOSED state if they are not in any other state. I think RFC 793 describes this as a fictional state. socket-based implementations handle the LISTEN state different than what is described in RFC 793.
FreeBSD does (right now) not provide the TIMEWAIT to CLOSED transition, but that can be fixed once you removed the timewait special handling.

sys/netinet/tcp_syncache.c
939

A suggestion, if you prefer to imitate RFC 793:
Add

TCP_PROBE6(state__change, NULL, tp, NULL, tp, NULL, TCPS_LISTEN );

However, you would have to put this after the tcpcb is populated. This would be the behavior Solaris has...

Add dtrace probe reporting LISTEN -> SYN-RCVD transition.

This revision is now accepted and ready to land.Aug 10 2022, 5:24 PM