Page MenuHomeFreeBSD

carp: Fix checking IPv4 multicast address
ClosedPublic

Authored by zlei on Wed, Feb 19, 7:14 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Mar 18, 2:25 PM
Unknown Object (File)
Sun, Mar 16, 3:05 AM
Unknown Object (File)
Tue, Mar 11, 8:38 PM
Unknown Object (File)
Wed, Feb 26, 10:42 AM
Unknown Object (File)
Wed, Feb 26, 10:42 AM
Unknown Object (File)
Wed, Feb 26, 10:42 AM
Unknown Object (File)
Wed, Feb 26, 12:35 AM
Unknown Object (File)
Fri, Feb 21, 5:53 PM

Details

Summary

An IPv4 address stored in struct in_addr is in network byte order but
IN_MULTICAST wants host order.

PR: 284872
Reported by: Brett Merrick <brett.merrick@itcollective.nz>
Fixes: 137818006de5 carp: support unicast
MFC after: 3 days

Test Plan

Setup CARP unicast peer,

#!/bin/sh

kldload -n carp

ifconfig epair create

jail -ic name=foo vnet persist
jail -ic name=bar vnet persist

ifconfig epair0b vnet bar
ifconfig epair0a vnet foo

ifconfig -j foo epair0a inet 192.168.2.224/24
ifconfig -j foo epair0a vhid 1 peer 192.168.2.1 192.168.2.99/32 alias

ifconfig -j bar epair0b inet 192.168.2.1/24
ifconfig -j bar epair0b vhid 1 peer 192.168.2.224 192.168.2.99/32 alias

Make epair0a in jail foo a CARP backup, and observe the traffic in jail bar,

# ifconfig -j foo epair0a vhid 1 state backup
# ifconfig -j foo epair0a
epair0a: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:02:cc:b9:51:0a
	inet 192.168.2.224 netmask 0xffffff00 broadcast 192.168.2.255
	inet 192.168.2.99 netmask 0xffffffff broadcast 192.168.2.99 vhid 1
	groups: epair
	carp: BACKUP vhid 1 advbase 1 advskew 0
	      peer 224.0.0.18 peer6 ff02::12
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
# ifconfig -j bar epair0b
epair0b: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:02:cc:b9:51:0b
	inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
	inet 192.168.2.99 netmask 0xffffffff broadcast 192.168.2.99 vhid 1
	groups: epair
	carp: MASTER vhid 1 advbase 1 advskew 0
	      peer 192.168.2.224 peer6 ff02::12
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

# jexec bar tcpdump -enpi epair0b -T carp -c 3
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on epair0b, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:06:35.769990 02:02:cc:b9:51:0b > 02:02:cc:b9:51:0a, ethertype IPv4 (0x0800), length 70: 192.168.2.1 > 192.168.2.224: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=10996631475009634537
15:06:36.831949 02:02:cc:b9:51:0b > 02:02:cc:b9:51:0a, ethertype IPv4 (0x0800), length 70: 192.168.2.1 > 192.168.2.224: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=10996631475009634538
15:06:37.870947 02:02:cc:b9:51:0b > 02:02:cc:b9:51:0a, ethertype IPv4 (0x0800), length 70: 192.168.2.1 > 192.168.2.224: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=10996631475009634539

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

zlei requested review of this revision.Wed, Feb 19, 7:14 AM

Did the same here https://github.com/opnsense/src/commit/8f86d0fdd37 but haven't heard from the user yet whether the provided test kernel works for them.

This revision is now accepted and ready to land.Wed, Feb 19, 7:17 AM

I have exactly the same patch in my queue.

Ideally I'd like to add a test case too.

I personally do not like this ntohl sprinkle everywhere. Some usages in sys/netlink/route/iface.c are also suspicious.

static uint8_t
ifa_get_scope(const struct ifaddr *ifa)
{
        const struct sockaddr *sa;
        uint8_t addr_scope = RT_SCOPE_UNIVERSE;

        sa = ifa->ifa_addr;
        switch (sa->sa_family) {
#ifdef INET
        case AF_INET:
                {
                        struct in_addr addr;
                        addr = ((const struct sockaddr_in *)sa)->sin_addr;
                        if (IN_LOOPBACK(addr.s_addr))
                                addr_scope = RT_SCOPE_HOST;
                        else if (IN_LINKLOCAL(addr.s_addr))
                                addr_scope = RT_SCOPE_LINK;
                        break;
                }
#endif
#ifdef INET6
        case AF_INET6:
                {
                        const struct in6_addr *addr;
                        addr = &((const struct sockaddr_in6 *)sa)->sin6_addr;
                        if (IN6_IS_ADDR_LOOPBACK(addr))
                                addr_scope = RT_SCOPE_HOST;
                        else if (IN6_IS_ADDR_LINKLOCAL(addr))
                                addr_scope = RT_SCOPE_LINK;
                        break;
                }
#endif
        }

        return (addr_scope);
}

Do you think a IN_IS_ADDR_MULTICAST akin to IN6_IS_ADDR_MULTICAST is valuable ?

+#ifdef _KERNEL
+struct in_addr {
+       union {
+               in_addr_t       s_addr;
+               uint8_t         __u4_addr8[4];
+               uint16_t        __u4_addr16[2];
+       } __u4_addr;
+};
+#define s4_addr   __u4_addr.__u4_addr8
+#define s4_addr8  __u4_addr.__u4_addr8
+#define s4_addr16 __u4_addr.__u4_addr16
+
+#define IN_IS_ADDR_MULTICAST(a)        (((a)->s4_addr8[0] & 0xf0) == 0xe0)
+#define IN_IS_ADDR_LINKLOCAL(a)        ((a)->s4_addr16[0] == htons(0xa9fe))
+#define IN_IS_ADDR_LOOPBACK(a) ((ntohl((a)->s_addr) & V_in_loopback_mask) == 0x7f000000)
+
+#define        _STRUCT_IN_ADDR_DECLARED
+#endif /* _KERNEL */

That may want more attention from the network .

Linux KPI has a pragmatic approach:

sys/compat/linuxkpi/common/include/linux/in.h:#define ipv4_is_multicast(be) IN_MULTICAST(ntohl(be))

IPv6 is better in this regard, but the ambiguity of dealing with a host or network order is still there unless IN_ macros are specifically meant for one or the other? Unsure about the historic intention.

> Do you think a IN_IS_ADDR_MULTICAST akin to IN6_IS_ADDR_MULTICAST is valuable ?

How this should help? You still need to know how an address is stored in variable - in host byte order or in network.

A demo of the replacement,

-               if (IN_MULTICAST(ntohl(sc->sc_carpaddr.s_addr)))
+               if (IN_IS_ADDR_MULTICAST(&sc->sc_carpaddr))

.

In D49053#1118522, @ae wrote:

> Do you think a IN_IS_ADDR_MULTICAST akin to IN6_IS_ADDR_MULTICAST is valuable ?

How this should help? You still need to know how an address is stored in variable - in host byte order or in network.

IIRC, historically FreeBSD convert network byte order to host order on the input routine ( ip_input.c ), and later revert that. So in a mbuf it is ensured to be as is ( that is the network byte order ).

Given an IPv4 address is actually 4 octets, I'd prefer to always keep its form, aka the network byte order everywhere. So developers wont worry in which order it is stored.

Linux KPI has a pragmatic approach:

sys/compat/linuxkpi/common/include/linux/in.h:#define ipv4_is_multicast(be) IN_MULTICAST(ntohl(be))

IPv6 is better in this regard, but the ambiguity of dealing with a host or network order is still there unless IN_ macros are specifically meant for one or the other? Unsure about the historic intention.

IN_ is short for inet but no doubt it is a little ambiguity esp. for newbies. That requires network domain knowledge but apparently IN6_ is much better.

In D49053#1118522, @ae wrote:

> Do you think a IN_IS_ADDR_MULTICAST akin to IN6_IS_ADDR_MULTICAST is valuable ?

How this should help? You still need to know how an address is stored in variable - in host byte order or in network.

IIRC, historically FreeBSD convert network byte order to host order on the input routine ( ip_input.c ), and later revert that. So in a mbuf it is ensured to be as is ( that is the network byte order ).

Given an IPv4 address is actually 4 octets, I'd prefer to always keep its form, aka the network byte order everywhere. So developers wont worry in which order it is stored.

I think that is why in_addr_t was introduced, that clearly denotes to store an IPv4 address but not a plain 32bit integer.

typedef	uint32_t		in_addr_t;

On linux an IPv4 address was declared with __be32 that is big endian ( the network byte order ) 32 bit integer.

/* Internet address. */
struct in_addr {
	__be32	s_addr;
};

IN_ is short for inet but no doubt it is a little ambiguity esp. for newbies. That requires network domain knowledge but apparently IN6_ is much better.

The comparison is difficult. I think that's just because IPv6 is always serialised into a byte array in network order which is much easier to check especially for multicast address. IN_MULTICAST wants host order and adding IN_IS_ADDR_MULTICAST then wants network order when historically IN_MULTICAST should have taken network order as one would expect now it seems.

It is absolutely fine to have those ntohl()s. Together with the macro they all will be optimized out by compiler. And ntohl() also documents that here we are working with data arrived from network.

A new macros and union (why not anonymous?!) will create more obfuscation than bring any new clarity.

@franco_opnsense.org

Did the same here https://github.com/opnsense/src/commit/8f86d0fdd37 but haven't heard from the user yet whether the provided test kernel works for them.

I tested and can confirm that this fix works.

Also confirmed now but the reporters have operational remarks when running with the fix, see https://github.com/opnsense/src/issues/239#issuecomment-2669497329

It is absolutely fine to have those ntohl()s. Together with the macro they all will be optimized out by compiler. And ntohl() also documents that here we are working with data arrived from network.

A new macros and union (why not anonymous?!) will create more obfuscation than bring any new clarity.

OK, I'm landing this revision as is.

In D49053#1118512, @kp wrote:

I have exactly the same patch in my queue.

Ideally I'd like to add a test case too.

The testcase is little complicated, as bridge(4) used in the testcase unicast_v4 forwards all traffic so the testcase is unable to catch this special bug. Post the update of testcase to https://reviews.freebsd.org/D49076.

Also confirmed now but the reporters have operational remarks when running with the fix, see https://github.com/opnsense/src/issues/239#issuecomment-2669497329

I'd say that is expected behavior for unicast CARP setup. I'll response directly in https://github.com/opnsense/src/issues/239 .

This revision was automatically updated to reflect the committed changes.

I personally do not like this ntohl sprinkle everywhere. Some usages in sys/netlink/route/iface.c are also suspicious.

...

That may want more attention from the network .

Post the fix to D49226 .