Overview
Current ip6_output() behaviour is not consistent across cached and non-cached lookup versions (followup of D18769).
For example, TCP retransmits (and, to some extent, normal TCP traffic) for local connections look like the following:
13:58 [0] m@devel2 ifconfig vtnet0 inet6
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4c04bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
	inet6 fe80::5054:ff:fe14:e319%vtnet0 prefixlen 64 scopeid 0x1
	inet6 2a01:4f8:13a:70c:ffff::8 prefixlen 96
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

telnet 2a01:4f8:13a:70c:ffff::8 22
...
## dtrace probe checking ifp & originifp @ ip6_output_send():
# First TCP SYN:
* TX ifp=lo0 origifp=vtnet0 2a01:4f8:13a:70c:ffff::8
# Second TCP SYN:
* TX ifp=lo0 origifp=vtnet0 2a01:4f8:13a:70c:ffff::8
# Third TCP SYN:
* TX ifp=lo0 origifp=lo0 2a01:4f8:13a:70c:ffff::8
Apart from being inconsistent, it also adds complexity to the recently-added source address validation (D32915).
So, what happens here?
Let's start with some background: what is origifp, and why is it needed?
Unlike the IPv4 world (mostly), IPv6 has a concept of scopes (i.e. non-overlapping zones to which an address can belong). One such scope is the link-local scope (i.e. a link-local address is only "valid" within its link). Traditionally, we shortcut traffic to local addresses via the loopback interface instead of relying on the L2 output route (or the actual NIC) to do the loop. To support this shortcut for IPv6 link-local addresses, one needs to pass the original zone/interface to the loopback input so that ip6_input() can work properly. This is what origifp is used for: passing the "address" interface. For the sake of simplicity, it is used for all IPv6 traffic, not just link-local traffic.
Let's look into the dtrace results once again. The first one (ifp=lo0, origifp=vtnet0) is exactly what is expected - transmit interface is loopback, and the original interface is properly retained.
However, this result is achieved in a non-obvious manner. In the middle of ip6_output(), at the routing lookup phase, in6_selectroute() is called.
It returns the correct nexthop, specific to the address in question (2a01:4f8:13a:70c:ffff::8), with the proper nh_ifp=lo0 and nh_aifp=vtnet0. Surprisingly, the ifp returned by in6_selectroute() is vtnet0 instead of the expected lo0. (In fact, in6_selectroute() explicitly returns nh_aifp on a successful lookup.)
It is changed once again in the "Check for valid scope ID" section: origifp takes the value of ifp, and ifp is set to ia->ia_ifp. This ia is derived from the same nexthop and is currently ::1 for such routes, but that can change in the future, so the logic looks pretty fragile.
The second result looks exactly like the first, so let's jump to the third. There, origifp suddenly becomes lo0. This happens because the nexthop finally gets cached in the inpcb, so the call to in6_selectroute() is avoided. Thus, ifp starts as lo0, and the machinery described above results in both origifp and ifp becoming lo0.
Why is the nexthop not cached immediately? Because in6_selectroute() (and the underlying selectroute()) updates the nexthop in the provided inpcb route but does not update the route generation id (inp_rt_cookie). The next validation simply wipes the cached nexthop because the route generation id is wrong.
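The two paths can be condensed into a toy userland model. This is a sketch of the behaviour as described here, not the ip6_output() code: the structures are cut down to the fields mentioned, ia->ia_ifp is modelled as a plain ia_ifp field on the nexthop, and the names current_behaviour() and out_ifps are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields discussed here. */
struct ifnet {
	const char *if_xname;
};

struct nhop {
	struct ifnet *nh_ifp;	/* transmit interface (lo0 in the example) */
	struct ifnet *nh_aifp;	/* "address" interface (vtnet0 in the example) */
	struct ifnet *ia_ifp;	/* assumption: models ia->ia_ifp for this nexthop */
};

struct out_ifps {
	struct ifnet *ifp;
	struct ifnet *origifp;
};

/* Models the current, inconsistent selection of ifp and origifp. */
static struct out_ifps
current_behaviour(const struct nhop *nh, bool cached)
{
	struct out_ifps out;
	struct ifnet *ifp;

	assert(nh != NULL);

	/* Non-cached: the lookup hands back nh_aifp. Cached: ifp starts as nh_ifp. */
	ifp = cached ? nh->nh_ifp : nh->nh_aifp;

	/* Scope ID section: origifp takes the old ifp, ifp becomes ia->ia_ifp. */
	out.origifp = ifp;
	out.ifp = nh->ia_ifp;
	return (out);
}
```

With nh_ifp=lo0, nh_aifp=vtnet0 and ia_ifp=lo0, the non-cached call yields ifp=lo0/origifp=vtnet0 and the cached call yields ifp=lo0/origifp=lo0, matching the first and third dtrace lines.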
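The delayed caching can be reproduced with a minimal model of the cookie check. Everything below is a simplified sketch with hypothetical names (route_cache, send_once()). It also assumes that validation runs only when a nexthop is already cached, that a failed validation records the current generation, and that the cookie initially differs from the current generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct nhop {
	int unused;		/* contents are irrelevant for this model */
};

struct route_cache {
	const struct nhop *ro_nh;	/* cached nexthop, NULL if none */
	uint32_t cookie;		/* models inp_rt_cookie */
};

/* Models one output call; returns true if the cached nexthop was used. */
static bool
send_once(struct route_cache *rc, const struct nhop *nh, uint32_t cur_gen)
{
	assert(rc != NULL && nh != NULL);

	/* Validation (assumed to run only when a nexthop is cached). */
	if (rc->ro_nh != NULL && rc->cookie != cur_gen) {
		rc->ro_nh = NULL;	/* stale generation: wipe the cache */
		rc->cookie = cur_gen;	/* assumption: generation is recorded here */
	}
	if (rc->ro_nh != NULL)
		return (true);

	/* Lookup path: stores the nexthop but leaves the cookie untouched. */
	rc->ro_nh = nh;
	return (false);
}
```

Starting from an empty cache with cookie 0 and a current generation of 1, three consecutive calls take the lookup path twice and hit the cache only on the third call, which is the pattern seen for the three SYNs.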
Proposed solution
The proposed idea is relatively simple and consists of two actions. The first is explicitly filling in the proper ifp and origifp at the route lookup stage. The second is simplifying the source/destination scope ID checks, as no actions other than pass/fail are expected.
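A rough userland sketch of the two actions follows; it is not the actual patch. It assumes that "proper" means ifp=nh_ifp and origifp=nh_aifp, as in the expected first dtrace result. The names fill_ifps() and scope_ok(), the zone_id field, and the comparison rule itself are hypothetical; the text above only requires that the check be pass/fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ifnet {
	const char *if_xname;
	uint32_t zone_id;	/* hypothetical: link-local zone of the interface */
};

struct nhop {
	struct ifnet *nh_ifp;	/* transmit interface */
	struct ifnet *nh_aifp;	/* "address" interface */
};

struct out_ifps {
	struct ifnet *ifp;
	struct ifnet *origifp;
};

/* Action 1: take both interfaces from the nexthop at lookup time. */
static struct out_ifps
fill_ifps(const struct nhop *nh)
{
	struct out_ifps out;

	assert(nh != NULL);
	out.ifp = nh->nh_ifp;
	out.origifp = nh->nh_aifp;
	return (out);
}

/*
 * Action 2: a pure pass/fail scope check with no side effects.
 * Assumed rule: zone 0 means "no zone embedded"; otherwise the
 * address zone has to match the zone of origifp.
 */
static bool
scope_ok(uint32_t addr_zone, const struct ifnet *origifp)
{
	assert(origifp != NULL);
	return (addr_zone == 0 || addr_zone == origifp->zone_id);
}
```

Because the result depends only on the nexthop, the cached and non-cached paths would produce the same ifp/origifp pair in this model.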