Currently we use a single hash table per PCB database for connected and
bound PCBs. Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.
Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb. Now the database has one table each
for connected and unconnected sockets. The patch currently refers to
them as "exact" and "wild" but these names aren't very good; it should
probably be just "conn" and "unconn"?
When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use. This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away. There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.
I also made the "rehash" parameter of in(6)_pcbconnect() unused. This
parameter seems confusing and it is simpler to let the in_pcb code
figure out what to do.
There are two wrinkles in this patch, hence the "WIP":
1a. The raw_ip code (ab)uses these hash tables as well, for now I just
hard-coded the "exact" table.
1b. The inpcb iterator functions can be used to search a hash bucket,
but this functionality is currently only used by raw_ip.
- UDP allows a socket to be connected and disconnected multiple times.
For (1) the plan is to modify raw_ip to avoid using these hash tables.
For now the patch works by just using "exact" everywhere, which works
for existing consumers.
For (2), the patch plugs a hole in the inpcb structure and uses it to
store an SMR sequence number. When an inpcb is disconnected, the global
write sequence number is advanced, and in order to re-connect, the
caller must wait for all readers to drain before reusing the hash chain
linkage pointers. I suspect that this is acceptable, but I'm not sure
how common it is to re-use UDP sockets for multiple connections.
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay