Page MenuHomeFreeBSD

inpcb: fix a panic with SO_REUSEPORT_LB + connect(2) misuse
ClosedPublic

Authored by glebius on Thu, Feb 27, 4:10 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Mar 16, 1:45 PM
Unknown Object (File)
Tue, Mar 11, 2:39 AM
Unknown Object (File)
Mon, Mar 10, 2:19 PM
Unknown Object (File)
Mon, Mar 10, 1:44 PM
Unknown Object (File)
Sun, Mar 9, 9:20 PM
Unknown Object (File)
Sun, Mar 9, 9:17 PM
Unknown Object (File)
Sun, Mar 9, 6:05 PM
Unknown Object (File)
Sun, Mar 9, 2:43 PM

Details

Summary

This combination doesn't make any sense. However, we shouldn't panic.
There are two options here: refuse connect(2) on a socket that has the
option set previously, or ignore (and clear) the option. We have chosen
the latter, because connect(2) is a SUS API that has a defined set of
error codes, none of which corresponds to "a socket has non-standard
incompatible socket option set". We don't want to extend connect(2) error
codes list to report a awkward misuse of a non-standard socket option.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

What exactly is the panic you see? (I can't easily run the test case at this moment.)

What happens if the socket is bound before connect() is called? It would still be in the lbgroup hash table, but not visible to lookups (since listen() has not been called). Do we need to remove it from the lbgroup structure?

What exactly is the panic you see? (I can't easily run the test case at this moment.)
What happens if the socket is bound before connect() is called? It would still be in the lbgroup hash table, but not visible to lookups (since listen() has not been called). Do we need to remove it from the lbgroup structure?

The lbgroup linkage is aliased with exact hash linkage, so a socket can't be connected one and a member of lbgroup at the same time. That of course makes sense. Actual panic message is something about queue(3) invariants violated.

What exactly is the panic you see? (I can't easily run the test case at this moment.)
What happens if the socket is bound before connect() is called? It would still be in the lbgroup hash table, but not visible to lookups (since listen() has not been called). Do we need to remove it from the lbgroup structure?

The lbgroup linkage is aliased with exact hash linkage, so a socket can't be connected one and a member of lbgroup at the same time. That of course makes sense. Actual panic message is something about queue(3) invariants violated.

Then, isn't the patch incomplete? If the SO_REUSEPORT_LB socket is already bound, connect() needs to fail.

Then, isn't the patch incomplete? If the SO_REUSEPORT_LB socket is already bound, connect() needs to fail.

connect(2) shall fail on any listening socket. Hmm, bound but not listening... I'm going to check that...

Yep, confirmed that if we bind(2), then connect(2) the panic happens and indeed patch is incomplete.

Maybe we'd better fail any connect(2) on the socket that has the flag set?

This just updates with additional test case, that would trigger a panic
that is not _yet_ covered by this revision.

Yep, confirmed that if we bind(2), then connect(2) the panic happens and indeed patch is incomplete.

Maybe we'd better fail any connect(2) on the socket that has the flag set?

Yes, I think so. It doesn't make sense to call connect(2) on a bound socket with SO_REUSEPORT_LB set. Thanks for looking at this and writing tests.

Yep, confirmed that if we bind(2), then connect(2) the panic happens and indeed patch is incomplete.

Maybe we'd better fail any connect(2) on the socket that has the flag set?

Yes, I think so. It doesn't make sense to call connect(2) on a bound socket with SO_REUSEPORT_LB set. Thanks for looking at this and writing tests.

Well, I think it should be consistent with not bound socket. Do you agree? Any socket with SO_REUSEPORT_LB set should fail connect() same way as an already listening one does.

I started to write patch and found immediately two problems here :)

  1. For AF_INET and AF_INET6 we would return different errors on connect(2) after listen(2).
  2. The SOLISTENING() check is done without a lock.

I'll try to come up at least with some improvement here.

Fail instead of silently hiden misuse. Mostly rewrite the commit message.

Yep, confirmed that if we bind(2), then connect(2) the panic happens and indeed patch is incomplete.

Maybe we'd better fail any connect(2) on the socket that has the flag set?

Yes, I think so. It doesn't make sense to call connect(2) on a bound socket with SO_REUSEPORT_LB set. Thanks for looking at this and writing tests.

Well, I think it should be consistent with not bound socket. Do you agree? Any socket with SO_REUSEPORT_LB set should fail connect() same way as an already listening one does.

Yes, you're right.

I started to write patch and found immediately two problems here :)

  1. For AF_INET and AF_INET6 we would return different errors on connect(2) after listen(2).
  2. The SOLISTENING() check is done without a lock.

It is done under the inpcb lock, which is sufficient to interlock with tcp_usr_listen().

sys/netinet/tcp_usrreq.c
596

Should we disallow this for UDP too, just for consistency?

Note that the added check is racy. An application with one thread bombing on setsockopt(2) and other thread doing connect(2) may trigger the panic. This is generic problem with "mislayered" setsockopts. This is how I called setsockopts that are badly named when they were introduced and pretend to belong to one socket level, while actually affecting a different one. In this particular case the option works solely on TCP level, not socket level but is declared in the latter namespace. I have some plan how to close those races, but don't want to pack it in this fairly small revision.

It is done under the inpcb lock, which is sufficient to interlock with tcp_usr_listen().

Right, but nor the setsockopt() :(

P.S. Just updated this review.

sys/netinet/tcp_usrreq.c
596

Do we have any test case or a working application for UDP and SO_REUSEPORT_LB?

Let's check in as is and don't pretend UDP was really in mind at this moment :)

It is done under the inpcb lock, which is sufficient to interlock with tcp_usr_listen().

Right, but nor the setsockopt() :(

Hum, yes. Another potential use for pr_lock...

sys/netinet/tcp_usrreq.c
596

I believe BIND can be configured to use REUSEPORT_LB with UDP sockets.

I'm ok with leaving it as it is though.

This revision was not accepted when it landed; it landed in state Needs Review.Fri, Mar 7, 7:01 AM
This revision was automatically updated to reflect the committed changes.