tcp: keep the inp locked while processing the ACK of the handshake
AbandonedPublic
Actions

Authored by tuexen on Jul 6 2022, 12:43 PM.

Details

Reviewers

rrs
glebius
rscheff

Group Reviewers

transport

Summary

When processing the ACK (the third message) of the three way handshake, keep the inp locked to avoid races.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

tuexen created this revision.Jul 6 2022, 12:43 PM

Herald added a reviewer: transport. · View Herald TranscriptJul 6 2022, 12:43 PM

Herald added subscribers: melifaro, imp. · View Herald Transcript

tuexen requested review of this revision.Jul 6 2022, 12:43 PM

With the improved socket referencing scheme, when a socket created in sonewconn() has a reference immediately, are those races fatal? Would the problem go away completely if we revert 6f3caa6d8159?

In D35730#810477, @glebius wrote:

With the improved socket referencing scheme, when a socket created in sonewconn() has a reference immediately, are those races fatal? Would the problem go away completely if we revert 6f3caa6d8159?

Here is the race I have in mind (unlikely, but possible):

The`ACK` segment is received.
In the TCP syncache code sonewconn() is called, a TCP connection in created in the CLOSED state. After sonewconn() no inp lock is held.
Another TCP segment is received for the TCP connection, the TCP connection is in the CLOSED state, so a RST segment is sent. The peer will drop the connection.
The processing of the ACK segment continues and the TCP connection is moved in the ESTABLSIHED state. The peer however is in CLOSED.

The problem is that during processing of the ACK segment the inp is locked, unlocked, and locked again. The patch here avoids this.

Understood. Let me think if reverting that change that forced incomplete queue will heal such a race.

In D35730#810483, @glebius wrote:

Understood. Let me think if reverting that change that forced incomplete queue will heal such a race.

OK. However, I think this is a TCP specific issue (the processing of the ACK is done in multiple steps) and therefore a TCP specific fix might be appropriate. Can you provide a pointer to the commit you want to revert? I don't have a commit message in my e-mail log for 6f3caa6d8159.

In D35730#810519, @tuexen wrote:

In D35730#810483, @glebius wrote:

Understood. Let me think if reverting that change that forced incomplete queue will heal such a race.

OK. However, I think this is a TCP specific issue (the processing of the ACK is done in multiple steps) and therefore a TCP specific fix might be appropriate. Can you provide a pointer to the commit you want to revert? I don't have a commit message in my e-mail log for 6f3caa6d8159.

Maybe this: https://cgit.freebsd.org/src/commit/?id=6f3caa6d8159

In D35730#810522, @chris_cretaforce.gr wrote:

In D35730#810519, @tuexen wrote:

In D35730#810483, @glebius wrote:

Understood. Let me think if reverting that change that forced incomplete queue will heal such a race.

OK. However, I think this is a TCP specific issue (the processing of the ACK is done in multiple steps) and therefore a TCP specific fix might be appropriate. Can you provide a pointer to the commit you want to revert? I don't have a commit message in my e-mail log for 6f3caa6d8159.

Maybe this: https://cgit.freebsd.org/src/commit/?id=6f3caa6d8159

Yes, that is it. I didn't find it, because it was committed via svn: http://svnweb.freebsd.org/changeset/base/261242

Thanks for the pointer. I looked at the patch. I do not think it changes the race I would like to address with the patch in this review.

I think the race you described is different. The second ACK will not be able to find our CLOSED connection, because it is not yet in the pcb hash. It will find the listening socket instead. The listening socket has exclusive lock through all the processing of the previous packet, as it was a non-SYN. Now the second packet will block on the listening socket until the thread processing the first packet will unlock it after the comment Socket is created in state SYN_RECEIVED (btw, it should be INP_WUNLOCK(), my mistake). The second packet will now proceed through syncache_expand() and latter would fail and we indeed send RST after comment No syncache entry or ACK was. But write locking the new born pcb for a longer time won't help, IMHO.

I agree with your analysis. My mistake. This patch doesn't help.

Hi!

I got a patchset inspired by this revision: D36062, D36063, D36064, D36065. It does not close the discussed race with two packets! However, it definitely streamlines creation of a new socket from a listener. I have done only stability testing so far, but looking into doing benchmarks, too.

I also got a very different patch to fix the discussed race: D36066. I don't yet have a tool to instrument and reproduce the discussed race, so a help here is appreciated. Need a packetdrill guru :)

In D35730#818928, @glebius wrote:

Hi!

I got a patchset inspired by this revision: D36062, D36063, D36064, D36065. It does not close the discussed race with two packets! However, it definitely streamlines creation of a new socket from a listener. I have done only stability testing so far, but looking into doing benchmarks, too.

I also got a very different patch to fix the discussed race: D36066. I don't yet have a tool to instrument and reproduce the discussed race, so a help here is appreciated. Need a packetdrill guru :)

Unfortunately packetdrill is not the right tool for testing race conditions. But I think some pf tests were triggering this kind of issue... You might ask kp@ for a way to reproduce it.