Page MenuHomeFreeBSD

IPv4: fix redirect sending conditions
ClosedPublic

Authored by bz on Dec 5 2021, 5:17 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 9, 3:39 AM
Unknown Object (File)
Fri, Jan 3, 7:23 AM
Unknown Object (File)
Mon, Dec 30, 9:19 AM
Unknown Object (File)
Fri, Dec 20, 10:51 PM
Unknown Object (File)
Dec 9 2024, 4:39 AM
Unknown Object (File)
Dec 6 2024, 1:49 AM
Unknown Object (File)
Nov 29 2024, 8:48 AM
Unknown Object (File)
Nov 25 2024, 2:22 AM

Details

Summary

RFC792,1009,1122 state the original conditions for sending a redirect.
RFC1812 further refine this.
ip_forward() still sepcifies the checks originally implemented for these
(we do slightly more/different than suggested as makes sense).
The implementation added in 8ad114c082a159c0dde95aa35d2e3e108aa30a75
to ip_tryforward() however is flawed and may send a "multi-hop"
redirects (to a host not on the directly connected network) even with
a "This host on this network" destination (which makes no sense at all).

Do proper checks in ip_tryforward() to stop us from sending redirects
in situations we may not. Move almost all logic from ip_tryforward()
into ip_redir_alloc() apart from the sysctl check to see if redirects
are disabled.

While here enhance and fix comments as to which conditions are handled
for sending redirects in various places.

Reported by: pi (on net@ 2021-12-04)
MFC after: 3 days

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 43549
Build 40437: arc lint + arc unit

Event Timeline

bz requested review of this revision.Dec 5 2021, 5:17 PM
sys/netinet/ip_fastfwd.c
476–487

This creates extra lookup on every packet even if redirects are disabled. Many disable them in production environments. Is it possible to code like this?

sys/netinet/ip_fastfwd.c
476–487

Agreed on the per-packet which was there before the code was rewritten, broken, and a broken fix added I believe.

I really hate embedded lookups like this as they are just messy lines.
I'd rather fold them all together but I still haven't figured out what the current code in ip_redir_alloc() was supposed to do. I can see where it came from looking at ip_forward().

So put it all behind

if (V_ipsendredirects) {
  ...
}

as the ifp comparison should go away in the long way (in both cases here and ip_forward) as indicated in the comment above and then probably also change the order to do all non-lookup code first before doing the actual lookup? It's just all details which went wrong over multiple iterations since at least stable/11.

Ideally I'd also not have a splattered V_ipsendredirects check in multiple files but that would mean incurring a function call and that is worse.

sys/netinet/ip_fastfwd.c
476–487

The idea behind ip_redir_alloc() is simple - the situation when we need to allocate mbuf / send redirect is a corner case, hence we move it away to not bloat the main function /icache.

I'd rather suggest keeping the current condition logic inside ip_forward() as is and implement the tweaked business logic inside ip_redir_alloc(), simply returning NULL when sending redirect is not required.

sys/netinet/ip_fastfwd.c
476–487

I have that separate ip_redir_alloc() logic but not tested yet; and that won't happen anymore tonight; so I'll send it tomorrow;

What I meant with ip_redir_alloc() is not the duplication of the mbuf parts, which is understaood; if is the 2nd bit to decide which "new Gateway Internet Address" to report back; we don't set it in many cases and still report back the mcopy as to trigger the redirect; also both ip_redir_alloc() and ip_forward() set the new Gateway Internet Address to either the gateway from the lookup or the destination address of the packet which makes not much sense as it can be at the other end of the world 15 hops away; it further isn't always the original address anymore anyway in the current code, so we may send a redirect to the address of www.freebsd.org for what it could matter; hence my comments about historically grown misbehaviour (unless I totally misunderstand what the code is supposed to do).

sys/netinet/ip_fastfwd.c
476–487
> if (V_ipsendredirects) {
>   ...
> }

That will also work!

rgrimes added a subscriber: rgrimes.

Pi, thank you for spotting this, and bz thanks for the quick fix. This explains some issues I had when I tried to migrate to 13 on a router, but had no time to investigate what was going wrong.

This revision is now accepted and ready to land.Dec 6 2021, 5:57 PM

Update; migrate all the logic but the sysctl check to see if
redirects are enabled into ip_redir_alloc() and out of ip_tryforward().

Still still does not yet harmonize the code with the one in
ip_forward().

This revision now requires review to proceed.Dec 6 2021, 11:44 PM
bz added a subscriber: karels.
sys/netinet/ip_fastfwd.c
148

This "else" case probably was never needed/fully correct from day one of the code entering CSRG and we should not send a redirect with that as new gateway as it would only happen as a result of a bug IP stack of the source hosts to my understanding; discussed with @karels in private email 2021-12-06.

Should still update ip_forward() with that change or rather call ip_redir_alloc() there; will do that separately as the code there will at least not send a redirect if it should not.

sys/netinet/ip_fastfwd.c
240

extra space to remove

sys/netinet/ip_fastfwd.c
218

If SNAT is applied by pfil(9) and the source address of the packet is changed, then ICMP redirect to original source seems causing side effects.

I'd prefer do NOT send ICMP redirect in this case.

And also a secondary route table lookup can be eliminated.

477

If SNAT is applied by pfil(9) and the source address of the packet is changed, then ICMP redirect to original source seems causing side effects.

I'd prefer do NOT send ICMP redirect in this case.

sys/netinet/ip_fastfwd.c
477

Probably a good check to add; we hadn't dealt with that in the past and I only made the original address (as much as we can still tell) the reference point. I'll update the patch in a few minutes.

add an additional source NAT check and if ip_src does not change
we do not need to do the 2nd route lookup either until we actually
deal with multiple interfaces correctly.

bz marked 5 inline comments as done.Dec 7 2021, 2:09 PM

Anyone else? Looks okay to go in?

sys/netinet/ip_fastfwd.c
153

Nit: shouldn't this be AF_INET rather than PF_INET?

sys/netinet/ip_fastfwd.c
135

Could you elaborate on why this is flawed? I'm not sure what is the problem with ROUTE_MPATH.

172

I'm not sure if this is correct. It is possible (but not straight-forwarded ATM) to specify a preferred source address that (1) does not belong to the transmit interface OR (2) belongs to the same interface but is not on the same subnet as the gateway.
I'd rather iterate over the IPv4 addresses on the interface and check if there is a match,

476–478

I'd really love to see the interface check being the second one instead of src check, so we don't call redir alloc on every packet.

bz marked 3 inline comments as done.
  • Move ifp check into caller (despite function probably being inlined)
  • PF_INET to AF_INET
  • Correct a comment which is wrongly numbered in the RFC itself.
sys/netinet/ip_fastfwd.c
135

Well at the moment there is no full first-hop mpath so that's the end of this there yet.
Without that you can have the following (lab setup to show):

epair2a: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        inet 198.51.100.1 netmask 0xffffff00 broadcast 198.51.100.255
epair4a: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        inet 198.51.100.10 netmask 0xffffff00 broadcast 198.51.100.255

Historically, while this was possible, the "alias" on the second interface would have been a /32 (the routing code apparently still does, see below) but with MPATH that is not actually needed anymore as we can have routes with multiple next-hop (interfaces/links).

Your routing looking like this:

# netstat -rn4
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            203.0.113.254      UGS     epair5a
127.0.0.1          link#1             UH          lo0
192.0.2.0/24       198.51.100.2       UGS     epair2a
198.18.0.0/15      198.51.100.5       UGS     epair2a
198.51.100.0/24    link#2             U       epair2a
198.51.100.1       link#2             UHS         lo0
198.51.100.10      link#3             UHS         lo0
203.0.113.0/24     link#4             U       epair5a
203.0.113.1        link#4             UHS         lo0

Host S sends a packet from 198.51.100.2 to 198.18.7.42 via 198.51.100.10 (either because of a static route or an MPATH route weighted between .1 and .10 as next-hops).
The packet comes in on interface epair4a will not generate a redirect for the destination route 198.18.0.0/15 with the recvifp == nh_ifp check despite being on the same subnet.

Now the the routing table currently does not reflect that either.
Once we'd support connected routes in MPATH as well, we'll find to have the same problem with a packet coming in one and going out the other interface, just "weighted".

And we are still talking about the same subnet on multiple interfaces here (compared to multiple logical subnets on a single interface which is mentioned in the NB further down and currently not considered).

170

Actually (14) but mis-numbered in the RFC.

172

You still assume there is only 1 interface?

476–478

Suggestion: I'll move that change here for the moment and we stay strictly RFC compliant and rather handle some more packets in/out than sending the redirect for now; if you fix the mpath code, then you can re-consider this?

sys/netinet/ip_fastfwd.c
135

Thank you for the detailed explanation!

Let me try to describe my perception of the multipath for the interface routes and redirects.

Firstly, the setup is, well, weird. You should not have a case when the same L2 domain is reachable via multiple L3 routed interfaces. Typically you solve such cases by using lagg (increasing capacity, doing load balancing/failover) or bridge (figuring which devices are behind which port and doing L2 stiching). Without those it is harder for the system to decide which interface to use for outgoing traffic in the general case, w/o some aid from the firewall.
Nothing (in the routing code) prevents doing that, it's explicitly disabled for now as other parts of the stack need to support this properly. I'd rather wait for the compelling use case for that, as implementation will add some complexity at least to the arp code (and arp+bridge is already a mess). Maybe we won't need it at all if IPv6 adoption will progress fast enough.

Though, the support for such setup has been added to Linux a while ago (before 2.6.17 IIRC). The feature requires tweaking ARP stack behaviour (arp_filter=1). I've spent some time searching for the usecases. My impression is that the vast majority of the usecases revolves around splitting the L2 domains without actualy splitting the prefix. The classic example is attaching multiple VMs to the same /24 network, without using a bridge.
In this case, we'd actually do not want to send redirects, as there is no real L2 connectivity between VMs.

Similarly, from what I see in Linux code, is that they always require src==dst to trigger redirect message: net/ipv4/route.c

With all that in mind, I'd vote to make a simpler implementation and not handle such use case in any specific way.

sys/netinet/ip_fastfwd.c
135

In that case (modulo the comments and #if 0 block) the current implementation should be what you want?

This revision is now accepted and ready to land.Dec 14 2021, 12:43 AM
zlei added inline comments.
sys/netinet/ip_fastfwd.c
477

Should be nh->nh_ifp == m->m_pkthdr.rcvif IIUC.

477

Probably a good check to add; we hadn't dealt with that in the past and I only made the original address (as much as we can still tell) the reference point. I'll update the patch in a few minutes.

I did not realize the SNAT issue before I see the new local variable osrc .
Reconsidered the NAT issue I think it is more complicated than I thought.

NAT and related scenes:

  1. SNAT, source address changed
  2. DNAT, destination address changed
  3. PAT, source or destination port changed
  4. other stuff such as TTL / QoS changed.

As the original source and destination hosts do not know the transparent translation( NAT / PAT , etc. ), an ICMP redirect would prevent the source host sending following packets through this firewall and thus the connection flow is disturbed.

Maybe we leave a TODO here to emphasis the side effects.

CC @kp

  • Fix the condition != -> ==
  • remove the not_yet comments about multi-interface or mpath and stay strictly rfc conform for now.
This revision now requires review to proceed.Dec 17 2021, 2:39 PM
bz marked 3 inline comments as done.Dec 17 2021, 2:40 PM
bz marked an inline comment as done.

Are we all good with this now?

melifaro added inline comments.
sys/netinet/ip_fastfwd.c
112–170

Nit: worth cleaning up this part as well.

This revision is now accepted and ready to land.Dec 19 2021, 11:07 PM

remove the notyet variable declarations as well.

This revision now requires review to proceed.Dec 20 2021, 1:10 PM
bz marked an inline comment as done.Dec 20 2021, 1:11 PM

In all the editiing we lost a ntohl() and would never send a redirect; that's fixed now; I tested things again. This will go in within 24h unless there'll be objections.

cy added a subscriber: cy.

looks good

This revision is now accepted and ready to land.Dec 24 2021, 12:15 AM
This revision was automatically updated to reflect the committed changes.