Page MenuHomeFreeBSD

inpcb: Fix SO_REUSEPORT_LB races, part 2
ClosedPublic

Authored by markj on Mon, Jan 20, 7:48 PM.
Tags
None
Referenced Files
F109225352: D48544.id149587.diff
Sun, Feb 2, 7:20 AM
Unknown Object (File)
Sat, Feb 1, 11:07 AM
Unknown Object (File)
Fri, Jan 31, 3:27 PM
Unknown Object (File)
Fri, Jan 31, 4:55 AM
Unknown Object (File)
Fri, Jan 31, 1:00 AM
Unknown Object (File)
Thu, Jan 30, 7:29 PM
Unknown Object (File)
Wed, Jan 29, 6:37 AM
Unknown Object (File)
Tue, Jan 28, 10:58 PM

Details

Summary

Suppose a thread is adds a socket to an existing lbgroup that is
actively accepting connections. It has to do the following operations:

  1. set SO_REUSEPORT_LB on the socket
  2. bind() the socket to the shared address/port
  3. call listen()

Step 2 makes the inpcb visible to incoming connection requests.
However, at this point the inpcb cannot accept new connections. If
in_pcblookup() matches it, the remote end will see ECONNREFUSED even
when other listening sockets are present in the lbgroup. This means
that dynamically adding inpcbs to an lbgroup (e.g., by starting up new
workers) can trigger spurious connection failures for no good reason.
(The same problem exists when removing inpcbs from an lbgroup, but that
is not addressed by this patch.)

Fix this by augmenting each lbgroup with a linked list of inpcbs that
are pending a listen() call. When adding an inpcb to an lbgroup, keep
the inpcb on this list if listen() hasn't been called, so it is not yet
visible to the lookup path. Then, add a new in_pcblisten() routine which
makes the inpcb visible within the lbgroup now that it's safe to let it
handle new connections.

Add a regression test which verifies that we don't get spurious
connection errors while adding sockets to an LB group.

Sponsored by: Klara, Inc.
Sponsored by: Stormshield

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision.Mon, Jan 20, 7:48 PM

(The same problem exists when removing inpcbs from an lbgroup, but that is not addressed by this patch.)

This problem is harder to solve. Basically, if userspace closes a listening socket, we should transfer complete child sockets to another listening socket in the same lbgroup. However, this is quite hard to implement, partly due to complicated locking constraints. DragonFlyBSD (where SO_REUSEPORT_LB originates) implements it (see soinherit() and its caller), but its synchronization mechanism is quite different. It's also hard to know whether some tcpcb state needs to be migrated as well. Any pointers here would be appreciated.

Thanks! Just one comment on the PR_CONNREQUIRED check above.

sys/netinet/in_pcb.c
337

The PR_CONNREQUIRED doesn't map perfectly on what we are actually trying to look. Maybe make pr_listen_notsupp() non-statice, put it declaration into socketvar.h and check here:

if ((inp->inp_socket->so_proto->pr_listen != pr_listen_notsupp && !SOLISTENING(inp->inp_socket)
sys/netinet/in_pcb.h
432

Just unrelated observation: this structure declaration can be moved to in_pcb_var.h. It is shared only and only between in_pcb.c and in6_pcb.c.

markj added inline comments.
sys/netinet/in_pcb.h
432

I will move it in a follow-up commit.

markj marked an inline comment as done.

Handle feedback.

This revision was not accepted when it landed; it landed in state Needs Review.Thu, Jan 23, 5:12 PM
This revision was automatically updated to reflect the committed changes.