There are multiple problems in both the IPv4 and IPv6 code that lead to rtsock messages being generated with incorrect data.
Additionally, due to the lack of a proper API, the IPv6 code uses ugly hacks to generate rtm messages of the needed kind.
Some background on the routing messages and their handling by the popular daemons is provided below for the interested reader:
rtsock/sysctl interface
Typically, routing daemons use both the rtsock "async" interface and the sysctl(3) "sync" interface to keep their information on interfaces and routes in sync with the kernel.
This allows them to recover from both missed (for example, due to excessive RTM_MISSMSG generation) and incorrect rtsock messages. However, as the sync calls are not cheap, especially for reading the routing tables, they are performed at a cadence of minutes to hours, which makes recovery slow.
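The interplay between the two channels can be illustrated with a toy model. This is only a sketch: the route view is reduced to a set of prefix strings, and apply_async() and full_sync() are hypothetical names that come from neither the kernel nor any daemon.

```python
# Toy model only: a daemon's route view is a set of prefix strings.
# apply_async() and full_sync() are hypothetical helper names.

def apply_async(view, event):
    """Apply one rtsock-style notification, given as (kind, prefix)."""
    kind, prefix = event
    if kind == "add":
        view.add(prefix)
    elif kind == "delete":
        view.discard(prefix)


def full_sync(view, kernel_routes):
    """Replace the view with a full table dump, as a periodic scan would.

    Returns (missing, stale): prefixes the async channel failed to deliver,
    and prefixes the view still held although the kernel no longer has them.
    """
    kernel_routes = set(kernel_routes)
    missing = kernel_routes - view
    stale = view - kernel_routes
    view.clear()
    view.update(kernel_routes)
    return missing, stale


view = set()
apply_async(view, ("add", "10.1.0.0/24"))
# The notification for 10.1.0.1/32 never arrives (or is ignored),
# so only the next full scan discovers it.
missing, stale = full_sync(view, {"10.1.0.0/24", "10.1.0.1/32"})
```

In this model the host route only shows up in `missing` after the scan, which mirrors the delayed learning described here: correctness is eventually restored, but only at the scan cadence.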
Interface address setup
The following resources are generated:
- (1) link-local entry, corresponding to the interface (given the interface is ethernet or similar). This translates to an RTM_ADD rtsock notification with RTF_LLDATA.
- (2 "newaddrs") interface address itself. This translates to an RTM_NEWADDR rtsock notification.
- (3 "hostroute") host route for the interface address. This translates to RTM_ADD for the host route.
- (4 "prefixroute") prefix route for the interface address&mask. This translates to RTM_ADD for the prefix.
- corresponding multicast group(s)
IPv4 world
Let's see how it works in IPv4:
The user's SIOCAIFADDR is handled by in_aifaddr_ioctl(), which calls rtinit1() via in_addprefix().
rtinit1() generates the "newaddrs" message:
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> vlan2:52.54.0.42.f.ef 10.1.0.1 10.1.0.255
Note the _netmask_ sa: it is actually AF_LINK and not AF_INET. This is one of the things fixed in this change.
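To make the netmask problem concrete, here is a minimal sketch of how a userland consumer might walk the sockaddrs that follow an rtsock header. Assumptions: the RTA_* bits and the AF_INET/AF_LINK numbers are the FreeBSD values as I understand them, LONG_SIZE = 8 assumes an LP64 platform, the function names are illustrative, and the sample payload is built by hand rather than captured from a kernel.

```python
import struct

# Address bits as in FreeBSD <net/route.h> (assumed values), lowest bit first.
RTA_NAMES = {0x1: "DST", 0x2: "GATEWAY", 0x4: "NETMASK", 0x8: "GENMASK",
             0x10: "IFP", 0x20: "IFA", 0x40: "AUTHOR", 0x80: "BRD"}
AF_INET = 2    # FreeBSD value
AF_LINK = 18   # FreeBSD value
LONG_SIZE = 8  # assumption: LP64, sockaddrs are padded to sizeof(long)


def sa_size(sa_len):
    """Padded size of a sockaddr, mirroring the usual SA_SIZE() rounding."""
    if sa_len == 0:
        return LONG_SIZE
    return 1 + ((sa_len - 1) | (LONG_SIZE - 1))


def walk_sockaddrs(addrs_mask, payload):
    """Map each sockaddr present in addrs_mask to (sa_family, raw bytes)."""
    found = {}
    offset = 0
    for bit in sorted(RTA_NAMES):
        if not addrs_mask & bit:
            continue
        sa_len, sa_family = struct.unpack_from("BB", payload, offset)
        found[RTA_NAMES[bit]] = (sa_family, payload[offset:offset + sa_len])
        offset += sa_size(sa_len)
    return found


def netmask_family_ok(parsed):
    """A consumer can only use the netmask if its family matches the IFA."""
    return parsed["NETMASK"][0] == parsed["IFA"][0]


def sockaddr_in(addr):
    """Hand-built BSD sockaddr_in: len, family, port, address, padding."""
    return struct.pack("=BBH4s8x", 16, AF_INET, 0, bytes(addr))


payload = sockaddr_in([255, 255, 255, 0]) + sockaddr_in([10, 1, 0, 1])
parsed = walk_sockaddrs(0x4 | 0x20, payload)  # NETMASK | IFA
```

With a well-formed payload like this one, netmask_family_ok(parsed) holds. When the netmask sockaddr carries AF_LINK instead, the same check fails, and the consumer has to guess the mask or work around the message.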
Message 3 "hostroute" is simply not generated for IPv4, resulting in routing daemons picking up the host /32 from the scan at a later point in time (see testing).
Message 4 "prefixroute" is also generated by rtinit1():
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.1.0.0 link#3 255.255.255.0
As the flags do NOT contain RTF_DONE, this message is ignored by bird/quagga. However, they typically don’t care, as they derive these prefixes from message 2 "newaddrs". Please see more details in the Testing section.
IPv6 world
The IPv6 control plane starts in in6_update_ifa(). IPv6 address setup is complex, so the code currently takes an "all-or-nothing" approach to reporting changes to userland: it generates rtsock messages if and only if everything was successful.
However, this approach has the drawback of not having the data (rtentries) to generate these messages from, resulting in a hackish in6_newaddrmsg() that creates the "newaddrs" and "hostroute" messages.
Message 2 "newaddrs":
got message of size 164 on Fri Dec 27 21:59:50 2019 RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:<HOST> sockaddrs: <NETMASK,IFP,IFA> link#0 vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
Note the _netmask_ sa: it is actually AF_LINK and not AF_INET6. This is one of the things fixed in this change.
Message 3 "hostroute":
got message of size 272 on Fri Dec 27 21:59:50 2019 RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,HOST,STATIC> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 2a02:6b8:6::6 link#0 ffff:ffff:ffff:ffff::
Note that the RTF_DONE flag is not set, so this message is ignored by bird/quagga.
Note also the useless NETMASK sockaddr being passed, despite the fact that this is a host route.
Message 4 "prefixroute" is generated by nd6_prefix_onlink() afterwards:
RTM_ADD: Add Route: len 344, pid: 0, seq 0, errno 0, flags:<UP,DONE> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA> 2a02:6b8:6:: link#3 (255) ffff ffff ffff ffff ffff ffff ffff vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
This looks fine at first glance. In reality, however, the interface index is not filled in in rtm->rtm_index (rt_missmsg_fib() does not do that), so this message is ignored by bird.
How these messages are perceived by the routing daemons
Quagga/FRR
rtm_read() ignores all rtm messages without RTF_DONE (see kernel_socket.c at 88d6516676cbcefb6ecdc1828cf59ba3a6e5fe7b in Quagga/quagga on GitHub).
Additionally, quagga ignores all routes without a gateway (RTF_GATEWAY) (same file and revision).
This makes quagga ignore both the "hostroute" and "prefixroute" messages, relying only on the "newaddrs" message to construct its view. Lastly, the netmask SA in "newaddrs" is incorrect (wrong family/af_size). However, as variations of this have been present in the *BSDs for decades, routing daemons have already worked around that.
bird
bird also ignores all rtm messages without RTF_DONE (see krt-sock.c at 822a7ee6d5cd9bf38548026e0dd52fbc4634030d in BIRD/bird on GitHub).
It also relies on the "newaddrs" message to construct the prefix routes. Learning the host route is postponed until the next route sync:
2019-12-27 21:59:50 <ERR> KRT: Received route 2a02:6b8:6::/64 with unknown ifindex 0
..
2019-12-27 22:00:05 <TRACE> kernel1: Scanning routing table
2019-12-27 22:00:05 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien] created
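The acceptance rules described in this section can be summarised in a small model. This is a deliberately simplified sketch of the behaviour as described here, not the daemons' actual code: quagga_accepts() and bird_accepts() are hypothetical names, the RTF_* values are the FreeBSD flag bits as I understand them, and the ifindex of 3 in the usage note is an assumption matching link#3 in the messages above.

```python
# Route flag bits as in FreeBSD <net/route.h> (assumed values).
RTF_UP = 0x1
RTF_GATEWAY = 0x2
RTF_HOST = 0x4
RTF_DONE = 0x40
RTF_STATIC = 0x800
RTF_PINNED = 0x100000


def quagga_accepts(flags):
    """Model: quagga drops messages without RTF_DONE or without RTF_GATEWAY."""
    return bool(flags & RTF_DONE) and bool(flags & RTF_GATEWAY)


def bird_accepts(flags, ifindex):
    """Model: bird drops messages without RTF_DONE or with ifindex 0."""
    return bool(flags & RTF_DONE) and ifindex != 0


# Flags of the IPv6 messages shown above, before this change.
old_v6_hostroute = RTF_UP | RTF_HOST | RTF_STATIC
old_v6_prefixroute = RTF_UP | RTF_DONE
```

Under this model, bird_accepts(old_v6_hostroute, 0) and bird_accepts(old_v6_prefixroute, 0) are both false, while the same prefix route with a non-zero index, for example bird_accepts(old_v6_prefixroute, 3), is accepted. quagga_accepts() rejects both messages either way, since neither carries RTF_GATEWAY.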
Changes
IPv4
Newaddrs
OLD:
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> vlan2:52.54.0.42.f.ef 10.1.0.1 10.1.0.255
NEW:
got message of size 124 on Mon Dec 30 22:56:57 2019 RTM_NEWADDR: address being added to iface: len 124, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan2:52.54.0.14.e3.19 10.1.0.1 10.1.0.255
Changes:
- proper netmask AF
Prefixroute
OLD:
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.1.0.0 link#3 255.255.255.0
NEW:
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.1.0.0 link#3 255.255.255.0
Changes:
- RTF_DONE is set
IPv6
"newaddrs" message:
OLD:
RTM_NEWADDR: address being added to iface: len 164, metric 0, flags:<HOST> sockaddrs: <NETMASK,IFP,IFA> link#0 vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
NEW:
RTM_NEWADDR: address being added to iface: len 140, metric 0, flags:<HOST> sockaddrs: <NETMASK,IFP,IFA> ffff:ffff:ffff:ffff:: vlan2:52.54.0.14.e3.19 2a02:6b8:6::6
Changes:
- Netmask has proper AF
"Hostroute" message:
OLD:
RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,HOST,STATIC> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 2a02:6b8:6::6 link#0 ffff:ffff:ffff:ffff::
NEW:
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,HOST,DONE,STATIC,PINNED> locks: inits: sockaddrs: <DST,GATEWAY> 2a02:6b8:6::6 link#3
Changes:
- RTF_DONE is set, along with RTF_PINNED, which is the real flag set on the route by ifa_maintain_loopback_route()
- No netmask SA
"Prefixroute" message:
OLD:
RTM_ADD: Add Route: len 344, pid: 0, seq 0, errno 0, flags:<UP,DONE> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA> 2a02:6b8:6:: link#3 (255) ffff ffff ffff ffff ffff ffff ffff vlan2:52.54.0.42.f.ef 2a02:6b8:6::6
NEW:
RTM_ADD: Add Route: len 272, pid: 0, seq 0, errno 0, flags:<UP,DONE> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 2a02:6b8:6:: link#3 ffff:ffff:ffff:ffff::
Changes:
- rtm_index is filled in (not visible here)
- IFP/IFA sockaddrs have been removed. Other route messages do not contain this information, as it is redundant: it can easily be obtained via getifaddrs(3).
BEFORE - bird:
2019-12-27 21:59:50 <ERR> KRT: Received route 2a02:6b8:6::/64 with unknown ifindex 0
..
2019-12-27 22:00:05 <TRACE> kernel1: Scanning routing table
2019-12-27 22:00:05 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien] created
AFTER - bird:
2019-12-30 23:01:12 <TRACE> static1 < interface vlan2 goes up
2019-12-30 23:01:12 <TRACE> direct1 < primary address 2a02:6b8:6::/64 on interface vlan2 added
2019-12-30 23:01:12 <TRACE> direct1 > added [best] 2a02:6b8:6::/64 dev vlan2
2019-12-30 23:01:12 <TRACE> kernel1 < rejected by protocol 2a02:6b8:6::/64 dev vlan2
2019-12-30 23:01:12 <TRACE> kernel1: 2a02:6b8:6::6/128: [alien async] created