
inpcb: retire two-level port hash database
Closed, Public

Authored by glebius on Thu, Feb 27, 5:59 AM.

Details

Summary

This structure originates from pre-FreeBSD times, when system RAM was
measured in single-digit megabytes and Internet speeds in kilobits per
second. At the first level the database hashes only the port value to
compute an index into an array of pointers to lazily allocated headers,
each of which holds a list of inpcbs with the same local port. This design
was apparently chosen to conserve kernel memory.
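
For illustration, the two-level layout described above might look roughly
like the following userland sketch. It is not the kernel code: the names
(struct pcb, struct porthead, porthash, port_lookup, pcb_insert) and the
table size are hypothetical, and calloc(3) stands in for the kernel's
non-sleepable allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PORTHASH_SIZE 64        /* illustrative size; must be a power of two */

struct pcb {                    /* hypothetical stand-in for struct inpcb */
    uint16_t    lport;
    struct pcb *next;           /* next PCB bound to the same port */
};

struct porthead {               /* second level: one header per bound port */
    uint16_t         port;
    struct pcb      *pcbs;      /* PCBs sharing this local port */
    struct porthead *next;      /* next header in the same first-level slot */
};

static struct porthead *porthash[PORTHASH_SIZE];   /* first level */

/* Find the header for a port, or NULL if nothing is bound to it. */
static struct porthead *
port_lookup(uint16_t port)
{
    struct porthead *ph;

    for (ph = porthash[port & (PORTHASH_SIZE - 1)]; ph != NULL; ph = ph->next)
        if (ph->port == port)
            return (ph);
    return (NULL);
}

/* Returns 0 on success, -1 if the lazily allocated header cannot be obtained. */
static int
pcb_insert(struct pcb *p)
{
    unsigned slot;
    struct porthead *ph;

    assert(p != NULL);
    slot = p->lport & (PORTHASH_SIZE - 1);
    ph = port_lookup(p->lport);
    if (ph == NULL) {
        /* First PCB on this port: the header is allocated on demand. */
        ph = calloc(1, sizeof(*ph));
        if (ph == NULL)
            return (-1);
        ph->port = p->lport;
        ph->next = porthash[slot];
        porthash[slot] = ph;
    }
    p->next = ph->pcbs;
    ph->pcbs = p;
    return (0);
}
```

Note that in this sketch insertion can fail purely because the header
allocation fails, and every lookup has to pass through the header before it
reaches a PCB.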

In the modern kernel the size of the first level of the hash is derived
from maxsockets, which is derived from maxfiles, which in turn is derived
from the amount of physical memory. The size of the hash is then capped at
IPPORT_MAX, because it doesn't make sense to have a hash table larger than
the set of possible values. In practice this cap takes effect even on my
laptop. I haven't done precise calculations or experiments, but my guess is
that any system with more than 8 GB of RAM will be autotuned to an
IPPORT_MAX-sized hash. Apparently, such a hash is degenerate: it never has
more than one entry in any slot. You can check this with kgdb:

set $i = 0
while ($i <= tcbinfo->ipi_porthashmask)
    set $p = tcbinfo->ipi_porthashbase[$i].clh_first
    set $c = 0
    while ($p != 0)
        set $c = $c + 1
        set $p = $p->phd_hash.cle_next
    end
    if ($c > 1)
        printf "Slot %u count %u\n", $i, $c
    end
    set $i = $i + 1
end
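
The sizing argument above can also be sketched in C. This is a minimal
illustration, assuming the table size is rounded down to a power of two and
the slot is simply the host-order port masked by (size - 1). The helpers
round_down_pow2(), porthash_mask() and porthash_slot() are hypothetical
names, not kernel functions; IPPORT_MAX is 65535 as in <netinet/in.h>.

```c
#include <assert.h>
#include <stdint.h>

#define IPPORT_MAX 65535u   /* largest possible port number */

/* Hypothetical helper: largest power of two that is <= n. */
static uint32_t
round_down_pow2(uint32_t n)
{
    uint32_t p = 1;

    assert(n > 0);
    while (p <= n / 2)
        p *= 2;
    return (p);
}

/*
 * Hypothetical helper: cap the requested table size at the number of
 * distinct port values and return the resulting hash mask.
 */
static uint32_t
porthash_mask(uint32_t requested)
{
    uint32_t cap = IPPORT_MAX + 1;
    uint32_t size = requested < cap ? requested : cap;

    return (round_down_pow2(size) - 1);
}

/* Slot index for a port given in host byte order (assumed hash function). */
static uint32_t
porthash_slot(uint16_t port, uint32_t mask)
{
    return (port & mask);
}
```

Under these assumptions, once the requested size reaches the cap the mask
covers all 16 bits of the port, so two different ports can never share a
slot. With a smaller table, for example 512 slots, ports 80 and 592 would
collide.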

By retiring the two-level hash we remove a lot of complexity at the cost of
a single comparison, 'inp->inp_lport != lport', in the lookup loop, which
will always be false on most machines anyway. This comparison should
certainly be cheaper than an extra pointer traversal.

Another positive change worth singling out is that we no longer need to
allocate memory in a non-sleepable context in in_pcbinshash(), so a
potential ENOMEM on connect(2) is removed.
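
A single-level variant might look roughly like the sketch below. As before
this is a simplified userland illustration with hypothetical names (struct
pcb, porthash, pcb_insert, pcb_lookup), not the code from the diff: PCBs are
linked directly into the port hash slot, insertion allocates nothing, and
the lookup loop carries the one extra port comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PORTHASH_SIZE 64        /* illustrative size; must be a power of two */

struct pcb {                    /* hypothetical stand-in for struct inpcb */
    uint16_t    lport;
    struct pcb *next;           /* next PCB in the same hash slot */
};

static struct pcb *porthash[PORTHASH_SIZE];

/* Link the PCB straight into its slot: no allocation, so it cannot fail. */
static void
pcb_insert(struct pcb *p)
{
    unsigned slot;

    assert(p != NULL);
    slot = p->lport & (PORTHASH_SIZE - 1);
    p->next = porthash[slot];
    porthash[slot] = p;
}

/* Return the first PCB bound to lport, or NULL if there is none. */
static struct pcb *
pcb_lookup(uint16_t lport)
{
    struct pcb *p;

    for (p = porthash[lport & (PORTHASH_SIZE - 1)]; p != NULL; p = p->next) {
        /*
         * The one extra comparison: skip PCBs that merely share the
         * slot. With a table covering every port value this is never
         * taken.
         */
        if (p->lport != lport)
            continue;
        return (p);
    }
    return (NULL);
}
```

The table here is deliberately small so that ports 80 and 144 share a slot
and the comparison actually matters. With a table that has one slot per
possible port the branch would never be taken.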

Diff Detail

Repository: rG FreeBSD src repository
Lint: Lint Not Applicable
Unit: Tests Not Applicable