Pre-calculate L2 prepends for routes with gateway and avoid arp/nd lookup
Needs Review · Public

Authored by melifaro on Dec 26 2021, 12:18 AM.
Tags
None
Referenced Files
F102617441: D33658.id100573.diff

Details

Reviewers
None
Group Reviewers
network
Summary

Currently, every non-TCP packet (and the first packet of each TCP connection) triggers an ARP/ND lookup when transmitted via an IFT_ETHER interface.
These lookups account for ~7% of CPU time when doing IP forwarding. In addition, LLE refcounting for short-lived TCP connections going through the default gateway may be a contention point.

This change eliminates L2 lookup and LLE refcounting for all output/forward routes that have a gateway.

The diff introduces a "glue" nhop_neigh layer between nexthops and LLE entries: nexthops "subscribe" to link-layer notifications, and the LLE layer provides those notifications.
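To make the subscription idea concrete, here is a minimal userspace sketch in C of how such a glue layer could behave. Everything in it is hypothetical: `struct nhop_neigh`, `nhop_neigh_subscribe()` and `nhop_neigh_notify()` are illustrative names, not the structures or functions in the diff, and the per-VNET hash table, locking and epoch handling are left out. It shows one glue object per gateway fanning out a resolved (or deleted) MAC address to every subscribed nexthop, each of which rebuilds its own Ethernet header. The EtherType comes from the nexthop's own family, so an IPv4 nexthop can subscribe to a neighbor entry for an IPv6 gateway.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model; none of these are real FreeBSD types. */
#define ETHER_ADDR_LEN  6
#define MAX_SUBSCRIBERS 8

struct nhop;

/* One glue object per (interface, gateway family, gateway address). */
struct nhop_neigh {
	uint32_t ifindex;
	int ip_version;                   /* 4 or 6: family of the gateway */
	uint8_t gw[16];                   /* IPv4 uses the first 4 bytes */
	struct nhop *subs[MAX_SUBSCRIBERS];
	int nsubs;
};

/* Simplified nexthop caching a ready-made Ethernet header. */
struct nhop {
	uint8_t prepend[14];              /* dst MAC, src MAC, EtherType */
	int prepend_valid;                /* 0 => use the ARP/ND slow path */
	uint16_t ethertype;               /* from the nexthop's own family */
	uint8_t src_mac[ETHER_ADDR_LEN];
};

/* A nexthop "subscribes" to link-layer changes for its gateway. */
static int
nhop_neigh_subscribe(struct nhop_neigh *nn, struct nhop *nh)
{
	if (nn->nsubs == MAX_SUBSCRIBERS)
		return (-1);
	nn->subs[nn->nsubs++] = nh;
	nh->prepend_valid = 0;            /* nothing cached until resolved */
	return (0);
}

/* Rebuild one nexthop's Ethernet header from the resolved MAC. */
static void
nhop_update_prepend(struct nhop *nh, const uint8_t *dst_mac)
{
	memcpy(nh->prepend, dst_mac, ETHER_ADDR_LEN);
	memcpy(nh->prepend + ETHER_ADDR_LEN, nh->src_mac, ETHER_ADDR_LEN);
	nh->prepend[12] = nh->ethertype >> 8;
	nh->prepend[13] = nh->ethertype & 0xff;
	nh->prepend_valid = 1;
}

/*
 * Called by the (modelled) LLE layer when the entry is resolved or its
 * MAC changes (dst_mac != NULL), or when it goes away (dst_mac == NULL).
 */
static void
nhop_neigh_notify(struct nhop_neigh *nn, const uint8_t *dst_mac)
{
	for (int i = 0; i < nn->nsubs; i++) {
		if (dst_mac != NULL)
			nhop_update_prepend(nn->subs[i], dst_mac);
		else
			nn->subs[i]->prepend_valid = 0;
	}
}
```

The point of the extra indirection is that a single LLE change is pushed to all nexthops that use the gateway, instead of every packet pulling the LLE state.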

Implementation details:

  • The fast path uses the struct route ro_prepend and ro_plen infrastructure, allowing it to bypass most of ether_output().
  • The nhop_neigh data structure is implemented as a per-VNET resizable hash table, since nexthops from different fibs can reference the same interface and an IPv4 nexthop can reference an IPv6 LLE.
  • Datapath feedback ("get the timestamp of the first packet traversing a given LLE, starting from now") accounts for nearly half of the implementation. When such feedback is requested, the sum of the packet counters of all matching nexthops is recorded. Then a global callout checks each affected nhop_neigh structure every second for a change in that sum.
  • All prepends are assumed to be at most 64 bytes, and the cache line size at least 64 bytes. This is required so that the prepend and its length can be updated atomically together.
  • Nhop prepends are allocated from a newly created UMA zone and are reclaimed via epoch(9).
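A minimal userspace sketch of the prepend handling described in the list, assuming one plausible reading of it: each prepend is an immutable object that packs its length and header bytes into a single 64-byte, cache-line-aligned block, and a nexthop publishes a new object with one atomic pointer exchange, so a reader never sees a new length paired with old bytes. `struct nhop_prepend`, `nhop_set_prepend()` and `nhop_output_fast()` are hypothetical names; `aligned_alloc()` stands in for the UMA zone, and the caller freeing the returned old object stands in for epoch(9) deferred reclamation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PREPEND_OBJ_SIZE 64

/* Hypothetical immutable prepend: length + bytes in one cache line. */
struct nhop_prepend {
	alignas(64) uint8_t len;
	uint8_t data[PREPEND_OBJ_SIZE - 1];
};
_Static_assert(sizeof(struct nhop_prepend) == PREPEND_OBJ_SIZE,
    "prepend object must fit in one 64-byte cache line");

struct nhop_model {
	_Atomic(struct nhop_prepend *) prepend;   /* NULL => slow path */
};

/*
 * Writer: build a new object and publish it with a single atomic
 * exchange. The previous object is returned via *oldp; in the kernel
 * it could only be freed after an epoch(9) grace period.
 */
static int
nhop_set_prepend(struct nhop_model *nh, const uint8_t *hdr, size_t len,
    struct nhop_prepend **oldp)
{
	struct nhop_prepend *np = NULL;

	if (hdr != NULL) {
		if (len > sizeof(np->data))
			return (-1);              /* does not fit in one line */
		np = aligned_alloc(PREPEND_OBJ_SIZE, sizeof(*np));
		if (np == NULL)
			return (-1);
		np->len = (uint8_t)len;
		memcpy(np->data, hdr, len);
	}
	*oldp = atomic_exchange_explicit(&nh->prepend, np,
	    memory_order_acq_rel);
	return (0);
}

/*
 * Reader fast path: copy the cached header in front of the payload.
 * Returns the number of bytes written, or 0 if the caller must fall
 * back to the full ether_output()/ARP/ND path.
 */
static size_t
nhop_output_fast(struct nhop_model *nh, const uint8_t *payload,
    size_t plen, uint8_t *out, size_t outsz)
{
	struct nhop_prepend *np =
	    atomic_load_explicit(&nh->prepend, memory_order_acquire);

	if (np == NULL || np->len + plen > outsz)
		return (0);
	memcpy(out, np->data, np->len);
	memcpy(out + np->len, payload, plen);
	return (np->len + plen);
}
```

Because length and bytes live in the same immutable object, the reader needs only one atomic load; the 64-byte limit keeps each object within a single cache line.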
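The datapath feedback mechanism from the list can be modelled roughly as follows. This is a hypothetical userspace sketch: `struct neigh_feedback`, `feedback_request()` and `feedback_tick()` are illustrative names, plain `uint64_t` counters stand in for the kernel's per-nexthop packet counters, and the caller plays the role of the global once-per-second callout. A request snapshots the sum of the matching nexthops' counters. Each tick recomputes the sum, and the first tick that sees a difference records its own timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NHOPS 8

/* Hypothetical per-nexthop packet counter. */
struct nh_counter {
	uint64_t pkts;
};

/* Hypothetical feedback state attached to one nhop_neigh. */
struct neigh_feedback {
	struct nh_counter *nhops[MAX_NHOPS];  /* nexthops using this LLE */
	int nnhops;
	bool requested;
	uint64_t baseline;                    /* counter sum at request time */
	int64_t first_seen;                   /* seconds, or -1 if not yet */
};

static uint64_t
feedback_sum(const struct neigh_feedback *fb)
{
	uint64_t sum = 0;

	for (int i = 0; i < fb->nnhops; i++)
		sum += fb->nhops[i]->pkts;
	return (sum);
}

/* LLE layer asks: "when does traffic next hit this neighbor?" */
static void
feedback_request(struct neigh_feedback *fb)
{
	fb->baseline = feedback_sum(fb);
	fb->requested = true;
	fb->first_seen = -1;
}

/* Invoked once per second by the (modelled) global callout. */
static void
feedback_tick(struct neigh_feedback *fb, int64_t now)
{
	if (!fb->requested)
		return;
	if (feedback_sum(fb) != fb->baseline) {
		/* Only accurate to the callout interval. */
		fb->first_seen = now;
		fb->requested = false;
	}
}
```

The recorded time is therefore an upper bound within one second of the real first packet. This appears to be the trade-off that keeps the per-packet cost down to a counter increment, with no LLE access on the forwarding path.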

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 43600
Build 40488: arc lint + arc unit

Event Timeline

melifaro retitled this revision from fff to [WIP] Pre-calculate L2 prepends for routes with gateway and avoid arp/nd lookup. (Dec 26 2021, 12:51 AM)
melifaro edited the summary of this revision.
melifaro retitled this revision from [WIP] Pre-calculate L2 prepends for routes with gateway and avoid arp/nd lookup to Pre-calculate L2 prepends for routes with gateway and avoid arp/nd lookup.
melifaro added a reviewer: network.

Hi @melifaro,
I applied this diff and D33662 to a test environment (lagg0->vlans->vnet->jail, bird 2.0.10). Simple routing between VLANs works normally, but when I start bird configured with OSPF and BGP sessions (full feed, open peerings), the system crashes immediately. I can provide core dumps.

stable/13-c9b215066

Hi Konrad, if you could share the stack trace from the dump, that would be a good place to start.
Meanwhile I'll update this diff to reflect the changes that happened since publishing.

This one doesn't currently apply to 13 as there are a number of outstanding changes to be back-merged. Hopefully I'll land them next week.
@olivier: any chance you could benchmark it?

Interesting: on an 8-core Atom with a 10G Chelsio NIC, there is a good improvement with inet (+12%) but no difference with inet6:
https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/forwarding-pf-ipfw/results/fbsd14-n277567-D33658/README.md

On the flame graphs, I notice that arpresolve no longer shows up for inet forwarding with this patch, but nd6_resolve still does for inet6 forwarding.

I haven't really read the change, but it seems an awful lot of code just to avoid holding a reference to the LLE for the gateway and storing the pointer there. That is usually a one-time lookup; afterwards you can likely just check locklessly that the state of the LLE is still good (given you know the LLE pointer is valid), and unless the LLE changed, you are done. If the LLE changed, you take the one-time lock hit, do a new lookup and free the old one, but that is normally a very rare operation. I just wonder how another pseudo-struct (as far as I understand it) and an extra hash will solve this better than a simple pointer to a stable data structure?
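For comparison, the simpler alternative suggested in this comment (hold a referenced LLE pointer, check it locklessly, and redo the lookup only when it changes) might look roughly like this userspace sketch. All names (`struct lle_model`, `struct gw_cache`, `gw_cache_get_mac()`, the one-entry `struct lle_table`) are hypothetical. A generation counter plus a validity flag stand in for "the state of the lle is still good", locking around the slow path is elided, and dropped references never free anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified LLE whose state can be checked locklessly. */
struct lle_model {
	atomic_uint gen;        /* bumped whenever the MAC changes */
	atomic_bool valid;      /* cleared when the entry is deleted/expired */
	atomic_int refcnt;      /* references held by nexthops */
	uint8_t mac[6];
};

static void
lle_model_init(struct lle_model *lle, const uint8_t mac[6])
{
	atomic_init(&lle->gen, 0);
	atomic_init(&lle->valid, true);
	atomic_init(&lle->refcnt, 0);
	memcpy(lle->mac, mac, 6);
}

/* Hypothetical one-entry neighbor table standing in for arp/nd lookup. */
struct lle_table {
	struct lle_model *entry;
	unsigned lookups;       /* number of slow-path lookups performed */
};

/* Per-nexthop cache: referenced LLE pointer plus what was seen last. */
struct gw_cache {
	struct lle_model *lle;
	unsigned gen;
	uint8_t mac[6];
};

/* Slow path: one-time lookup (table lock elided), swap references. */
static int
gw_cache_refresh(struct gw_cache *gc, struct lle_table *t)
{
	struct lle_model *lle;

	t->lookups++;
	lle = t->entry;
	if (lle == NULL || !atomic_load(&lle->valid))
		return (-1);
	atomic_fetch_add(&lle->refcnt, 1);      /* new ref before old drop */
	if (gc->lle != NULL)
		atomic_fetch_sub(&gc->lle->refcnt, 1);  /* freeing elided */
	gc->lle = lle;
	gc->gen = atomic_load(&lle->gen);
	memcpy(gc->mac, lle->mac, sizeof(gc->mac));
	return (0);
}

/* Fast path: lockless check; redo the lookup only if the LLE changed. */
static int
gw_cache_get_mac(struct gw_cache *gc, struct lle_table *t, uint8_t mac[6])
{
	struct lle_model *lle = gc->lle;

	if (lle == NULL || !atomic_load(&lle->valid) ||
	    atomic_load(&lle->gen) != gc->gen) {
		if (gw_cache_refresh(gc, t) != 0)
			return (-1);
	}
	memcpy(mac, gc->mac, sizeof(gc->mac));
	return (0);
}
```

In this model the steady state costs two atomic loads per packet and no lock. What it does not address, and what the nhop_neigh layer in the summary targets, is sharing one such cache across nexthops from different fibs and families.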

@melifaro, is there a chance you could sync this to head? I'm interested in benchmarking it.