Add support for IPoIB lagg devices in FreeBSD.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Differential D26254
Authored by hselasky on Aug 31 2020, 6:36 PM.
Details
# Machine A
kldload mlx5en mlx5ib ipoib if_lagg
ifconfig ib0 up
ifconfig ib1 up
ifconfig lagg0 create laggtype infiniband
ifconfig lagg0 laggproto failover laggport ib0 laggport ib1 1.1.1.1 netmask 255.255.255.0
opensm -d mlx5_0 -B

# Machine B
kldload mlx5en mlx5ib ipoib if_lagg
ifconfig ib0 up
ifconfig ib1 up
ifconfig lagg0 create laggtype infiniband
ifconfig lagg0 laggproto failover laggport ib0 laggport ib1 1.1.1.2 netmask 255.255.255.0
ping 1.1.1.1

# To test failover, use ifconfig to down/up the master link and see that the traffic moves from one link to the other
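One way to watch the failover step is to check which member port lagg0 reports as active before and after taking the master link down. The helper below is only a sketch: the name lagg_active_ports is hypothetical, and it assumes that `ifconfig lagg0` lists members as lines of the form `laggport: ib0 flags=5<MASTER,ACTIVE>`, as it typically does for Ethernet lagg devices. The output for an InfiniBand lagg may differ.

```sh
#!/bin/sh
# Hypothetical helper (not part of the patch): print the lagg member
# ports that carry the ACTIVE flag.
# Assumption: "ifconfig laggN" prints member lines such as
#     laggport: ib0 flags=5<MASTER,ACTIVE>
lagg_active_ports() {
    # Field 1 is the "laggport:" tag, field 2 the port name,
    # field 3 the flags word; keep ports whose flags mention ACTIVE.
    awk '$1 == "laggport:" && $3 ~ /ACTIVE/ { print $2 }'
}
```

On Machine A this could be used as `ifconfig lagg0 | lagg_active_ports`, then `ifconfig ib0 down`, then the same pipeline again, while the ping from Machine B keeps running. If failover works as described, the reported port would be expected to change from ib0 to ib1.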
Anmol is no longer with Panasas. 🙁 We remain interested in this feature, but we no longer have anyone with the cycles to participate in the review.

Looks like a really nice neat incorporation of IB into lagg. Please see some comments inline :-)

Thank you for addressing the comments!

I agree with @melifaro; Link Aggregation Group covers Ethernet (failover, lacp, loadbalance, roundrobin); why can't it also cover InfiniBand failover as part of the same?

Hi, It is not possible to re-use lagg<N> for infiniband, because we set the type of the network device when it is created. lagg<N> are created like ethernet and bond<N> are created like infiniband. Please suggest a better name than "bond", likely four letters. Else we end up having to do hacks, like in the initial patch.
--HPS

Let's try to work it backwards?
(1) has the benefit that it won't require any userland changes when merging, compared to (2); however, it comes at the cost of greater kernel complexity.
I'd really prefer not to go with inventing the name of every permutation of an interface property :-)
I tried to find the approach in the first published revision but failed to do so :-(

I like (2): assume Ethernet LAGG by default, but allow override to InfiniBand LAGG.
iblagg seems not-terrible to me. Or iblag, if six chars is too long.

@melfario
I'll look into this. Doing it this way would remove the need for a separate device name! The default would be ethernet then. --HPS

It appears that tests/sys/net/if_lagg_test.sh has been failing since this change (https://ci.freebsd.org/job/FreeBSD-main-amd64-test/16921/ is the first build where this test fails).

Is the test failure only related to the LORs or something else? This change does not affect those code paths mentioned in the LOR, from what I can see.
--HPS

The test appears to grep for "lagg_" in the LOR messages and there are two that match:

Lock order reversal between "in_multi_sx"(sx) and "if_lagg sx"(sx)!
Lock order "in_multi_sx"(sx) -> "if_lagg sx"(sx) first seen at:
#0 0xffffffff80c7bfcd at witness_checkorder+0x46d
#1 0xffffffff80c17b17 at _sx_xlock+0x67
#2 0xffffffff826d1bf0 at lagg_ioctl+0xe0
#3 0xffffffff80d3026d at if_addmulti+0x3fd
#4 0xffffffff80dbfbdd at in_joingroup_locked+0x27d
#5 0xffffffff80dbf932 at in_joingroup+0x42
#6 0xffffffff80dba825 at in_control+0xa25
#7 0xffffffff80d30a38 at ifioctl+0x3d8
#8 0xffffffff80c824d9 at kern_ioctl+0x289
#9 0xffffffff80c8219a at sys_ioctl+0x12a
#10 0xffffffff810bd629 at amd64_syscall+0x749
#11 0xffffffff8109010e at fast_syscall_common+0xf8

and

Lock order reversal between "in_control"(sx) and "if_lagg sx"(sx)!
Lock order "in_control"(sx) -> "if_lagg sx"(sx) first seen at:
#0 0xffffffff80c7bfcd at witness_checkorder+0x46d
#1 0xffffffff80c17b17 at _sx_xlock+0x67
#2 0xffffffff826d19ee at lagg_init+0x2e
#3 0xffffffff80d36c1e at ether_ioctl+0x1be
#4 0xffffffff826d20e1 at lagg_ioctl+0x5d1
#5 0xffffffff80dba7dd at in_control+0x9dd
#6 0xffffffff80d30a38 at ifioctl+0x3d8
#7 0xffffffff80c824d9 at kern_ioctl+0x289
#8 0xffffffff80c8219a at sys_ioctl+0x12a
#9 0xffffffff810bd629 at amd64_syscall+0x749
#10 0xffffffff8109010e at fast_syscall_common+0xf8

So I'm not sure why it would start failing after this change.
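
For reference, the kind of check described above (looking for "lagg_" frames inside lock order reversal reports) can be sketched as follows. This is not the code from tests/sys/net/if_lagg_test.sh; the function name count_lagg_lors is hypothetical, and it assumes each report begins with a line starting with "Lock order reversal", as in the traces quoted in this thread.

```sh
#!/bin/sh
# Hypothetical sketch: count lock order reversal reports on stdin that
# contain at least one stack frame whose text includes "lagg_".
# Assumption: every report starts with a "Lock order reversal" line.
count_lagg_lors() {
    awk '
        # A new report begins: account for the previous one, then reset.
        /^Lock order reversal/ { if (hit) n++; hit = 0; next }
        # Remember that the current report mentions a lagg_ symbol.
        /lagg_/ { hit = 1 }
        # Account for the last report and print the total.
        END { if (hit) n++; print n + 0 }
    '
}
```

Fed the two reports quoted in this thread, a check like this would count both, since the first contains a lagg_ioctl frame and the second contains lagg_init and lagg_ioctl frames.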