Page MenuHomeFreeBSD

Introduce DXR IPv4 LPM as a fib_algo module.
ClosedPublic

Authored by zec on Apr 18 2021, 3:41 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 3, 2:14 AM
Unknown Object (File)
Oct 3 2024, 9:48 AM
Unknown Object (File)
Oct 3 2024, 8:24 AM
Unknown Object (File)
Sep 30 2024, 8:34 PM
Unknown Object (File)
Sep 30 2024, 10:29 AM
Unknown Object (File)
Sep 30 2024, 12:55 AM
Unknown Object (File)
Sep 27 2024, 7:39 PM
Unknown Object (File)
Sep 27 2024, 7:39 PM

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

zec requested review of this revision.Apr 18 2021, 3:41 PM

LGTM!
Please see some minor comments in the diff.

sys/modules/Makefile
112

Should be build only when FIB_ALGO is enabled.
Something similar to _dpdk_lpm4 handling should work.

sys/netinet/dxr.c
62 ↗(On Diff #87653)

This header is internal to the routing subsystem and should not be included.

332 ↗(On Diff #87653)

I'll provide a wrapper (publish rib_lookup_bysa())

335 ↗(On Diff #87653)

You shouldn't access rt fields directly, please consider using rt_get_inet_prefix_plen().

382 ↗(On Diff #87653)

Nit: maybe bool?

423 ↗(On Diff #87653)

Generic question on the lack of M_ZERO in the memory allocations: if the concern here is the performance, would it be possible to quantify the performance loss?
With the large amount of fields in the structures it's a bit hard to track what is initialised and what is not.

641 ↗(On Diff #87653)

Please don't use rt fields directly and consider switching to rt_get_inet_prefix_plen() or rt_get_inet_prefix_pmask().

1142 ↗(On Diff #87653)

Nit: maybe if (new) update_delta++; if (old) update_delta--; ?

zec marked 4 inline comments as done.Apr 21 2021, 12:02 PM
zec added inline comments.
sys/netinet/dxr.c
62 ↗(On Diff #87653)

Not possible (yet), because we need to resolve rnh_walktree_from(), and to lesser extent rnh_matchaddr(), inside struct rib_head, which is defined in net/routr/route_var.h

382 ↗(On Diff #87653)

That change would have to propagated elsewhere, so not explicitly against it, but maybe later :)

423 ↗(On Diff #87653)

In principle I'm against M_ZERO since it acts as a cache flush with huge structures, in addition to wasting cycles when it's not really needed.

The main struct dxr_aux (malloc()ed at line 809), as well as range_tbl and x_tbl which hang of it and which may be occasionally realloc()ed, are repeatedly recycled at each FIB update, so having the bzeroed at allocation time buys us nothing in terms of preventing potential bugs after the first update.

The lookup structures (malloc()ed at line 945) are exact-fit taylored to the currently required size, and are completely filled bottom to top via several memcpy()s, so no reason to M_ZERO that one either.

Range chunk and trie trackers, which are stored in separate UMA zone pools, may be recycled many times, so M_ZERO at uma_zalloc() time buys us nothing there as well.

Plus, the code has been thoroughly stress-tested in a private userspace app over several years, including >1 year of continuous uninterrupted uptime in a production environment.

Once this patch applied, how to check it is dxr_lpm4 that is used ?

kldload dxr
sysctl net.route.algo should list dxr
sysctl net.route.algo.inet.algo=dxr should forcibly select it

Also yhere’s quite a lot of debug if you raise net.route debuglevel to 5 or 6, showing the actual events

Switch to the new rib_walk_from KPI.

LGTM, please see some comments inline :-)
Also: do you have any preference/requirements on the commit message?

sys/netinet/in_fib_dxr.c
58
63–64
66
348
658
1027

The framework provides a guarantee that the old instance won't get deleted until this instance setup is finished (successfully or not).
Thus, my understanding of the current code is that this refcount is not necessary.
Furthermore, if init (or the later dump) fails, we end up never freeing old data.

1249

So now it looks like we've solved all easy problems and the hardest one in CS remains: naming :-)
It would be nice to have a bit more consistency here - e.g. we have fib_dxr as a part of the filename, dxr_lpm4 as an internal identity, dxr as a framework and module identifier. Maybe reducing it to 2 variants is possible? Personally, I'd prefer dxr_lpm4, but fib_dxrlooks good as well.

This revision is now accepted and ready to land.Apr 29 2021, 9:53 PM
This revision now requires review to proceed.May 3 2021, 6:09 PM
zec marked 7 inline comments as done.May 3 2021, 6:19 PM
zec added inline comments.
sys/netinet/in_fib_dxr.c
63–64

Nuked route_var.h, but can't get rid of fib_algo.h.

1027

The refcount is for an auxiliary structure which may get shared by several main dxr instances (generations), thus must stay.

Once the refcount on the aux struct is bumped, dxr_init() can't fail. If the dump later fails, the framework will call the dxr_destroy() destructor which will DTRT WRT dropping the refcount and freeing the aux structure if required.

If there was a scenario where the aux structure would leak that would be easy to track given that it is tagged as M_DXRAUX.

1249

I've settled for "fib_dxr" as the module name, while retaining simply "dxr" as the label for net.route.algo.inet.algo.

melifaro added inline comments.
sys/netinet/in_fib_dxr.c
63–64

fib_algo.h is actually fine. It was moved below in the followup comment.

This revision is now accepted and ready to land.May 4 2021, 8:32 PM
This revision was automatically updated to reflect the committed changes.
zec marked 3 inline comments as done.

It's great to see another, performant route search algo in reworked FreeBSD's routing stack, especially that MFC is planned after 1 week. I have to admit that really, the name "dxr_lpm4" and would be probably more recognizable since at a glance fib_dxr looks to me like an addition or supplementation to dpdk_lpm4 and other (bsearch4, radix4, radix4_lockless) routing algos, but from the above review, I assume that it's another independent routing algo module and should be considered as full replacement, not any supplementation.
As a potential tester on stable/13 branch, I have a question regarding FIB_ALGO auto-selection - as far as I understand it's neither enabled nor planned yet? I want also to ask if any fib_dxr6 for IPv6 is planned?

...

I assume that it's another independent routing algo module and should be considered as full replacement, not any supplementation.

Right, DXR has no dependencies on any of the other algos.

As a potential tester on stable/13 branch, I have a question regarding FIB_ALGO auto-selection - as far as I understand it's neither enabled nor planned yet?

If you kldload fib_dxr and start adding routes, at some point the FIB_ALGO will automagically switch to dxr. The automatic switchover can be always manually overriden.

I want also to ask if any fib_dxr6 for IPv6 is planned?

DXR is tailored to IPv4's densely populated address space. IPv6 address space utilization is less uniform, so DXR with IPv6 is less likely to yield stellar results. To be more specific, I don't have IPv6 version of DXR in the works.

I have cherry-picked "2aca58e16f50 - main - Introduce DXR as an IPv4 longest prefix matching / FIB module" with "aad59c79f5f2 - main - Fix panic when trying to delete non-existent gateway in multipath route" to stable/13 and tested a few hours under light load without problems. Everything was going fine until I tried to switch manually to dpdk_lpm4 what immediately introduced panic. Unfortunately I have no core of this panic. After reboot I tried to reproduce this but it seems to be not so easily reproducible. DXR was working fine, function dxr_build with caller dxr_change_rib_batch seems to be still active, but not overloading this 8-core ATOM.

function dxr_build with caller dxr_change_rib_batch seems to be still active

If you can reproduce the problem pls. also try to provoke it with sysctl net.route.multipath=0.

It seems not to be easily reproducible. I am testing it on stable/13 for more than 1 mont and so far I have had only two panics while switching from dxr to dpdk_lpm4, though neither one was properly recorded (no core dumps). Likely those panics are more related to algo switching than to specific algo, but for some time I have USB key plugged in and configured as a dump device.
The overall performance of DXR algo seems to be decent., but the MFC to stable/13 might help with bringing it to a wider audience and more intensely testing.