Page MenuHomeFreeBSD

if_enc: pullup for ip header if m_len == 0
Needs ReviewPublic

Authored by igoro on Jun 27 2024, 8:40 PM.
Tags
None
Referenced Files
F102544333: D45762.diff
Wed, Nov 13, 8:48 PM
Unknown Object (File)
Oct 3 2024, 10:00 PM
Unknown Object (File)
Sep 25 2024, 6:26 AM
Unknown Object (File)
Sep 25 2024, 4:23 AM
Unknown Object (File)
Sep 22 2024, 11:30 AM
Unknown Object (File)
Sep 8 2024, 4:16 AM
Unknown Object (File)
Sep 7 2024, 4:03 PM
Unknown Object (File)
Aug 20 2024, 5:02 PM

Details

Reviewers
None
Group Reviewers
network
Summary
if_enc(4) can pass IPsec payload to pfil(9) with the outer header or without
it. In case of a small packet like ICMP, when mbuf cluster is not used,
everything works fine. Otherwise, the first mbuf in a chain has m_len == 0
if it is asked to strip the outer header. Both ipfw and pf usually do not
expect such situation, and they may end up reading the outer header. It
leads to unexpected behavior of the firewalls.
Test Plan
# kldload ipsec if_enc pf
# kyua test -k /usr/tests/sys/net/Kyuafile if_enc

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

igoro requested review of this revision.Jun 27 2024, 8:40 PM

I’ve been working on this PR:
[14.0 CURRENT] PF recognizes encrypted IPSec traffic as coming from WAN. | NAT with IPsec Phase 2 Networks
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273198

My initial hypothesis is that the issue mentioned is caused by a misconfiguration. But later I found that I cannot do a usual pf port forwarding for enc0 traffic. The research led me to the conclusion mentioned in the summary.

I was thinking about the place of the fix. My current reasoning is as follows:

  • it looks to be a generic thing for both main firewalls to expect non-empty first mbuf
  • adding such case handling on the firewall side could alter performance or existing stability
  • ipsec side is also could avoid changing if possible
  • if_enc itself looks to be a good place for a change, located for its users only

Another question is the way of fixing. m_pullup looks to be a good enough solution, the standard KPI. The performance optimized custom way (presumably) could be considered in future if there are business needs.

Also, it looks to be more appropriate to limit it only for the m_len == 0 cases.

“IPv4 only patch” seems to be good enough for the start of discussion.

AFAIK, firewalls were always m_pullup-ing for themselves as much as they need, regardless of which interface or layer gave them a packet. Why doesn't it work for enc?

I think it is firewall problem when it can not handle some unexpected data. Pfil hook expects that mbus has M_PKTHDR and m->m_pkthdr.len in this case should not be 0, even when m_len is 0. Thus, I think if doesn't work properly, it should be fixed in firewall.

Yeah, having such case unhandled, for years I guess, makes a feeling of some intention there. That's the source of my doubts. Thanks for sorting this out.

The next iteration goes on the pf side: https://reviews.freebsd.org/D45780

I think you should had not abandon this revision. enc(4) creates suboptimal packets, and this should be improved. Hence I suggested to use __predict_false() in the pf(4) review.

Okay, sure.

Do you expect it to be improved like it's proposed in the patch? (some cleanup is needed I guess, and the test case does not make sense now)
Or you have another vision in mind?

I think you should had not abandon this revision. enc(4) creates suboptimal packets, and this should be improved. Hence I suggested to use __predict_false() in the pf(4) review.

it is not if_enc(4), as I understand, such packets can be produced by m_striphdr() in ipsec_input.c

Ideally, the packets should be produced in a manner that later would not require m_pullup() neither in the driver, nor in a firewall. If we are expecting header(s) to be prepended to the packet, then we should allocate a new mbuf with empty space in front. For historical reasons this is called "aligning" of mbufs.

I've invested into detailed analysis. tl;dr: It seems nothing needs changing for now on ipsec/if_enc side.

The output path:
  • The ipsec4_perform_request() meets a usual outgoing mbuf for a TCP SYN packet with options:

<lspace=16, m_len=60>
, where max_hdr (56 bytes by default) - ip+tcp headers (40 bytes) = 16 bytes left for the link layer

  • HHOOK_TYPE_IPSEC_OUT hhook is called for the IPSEC_ENC_BEFORE case
  • ipsec_encap() / ipsec_prepend() finds out that there is not enough leading space for an IP header. Hence, it prepends a new mbuf:

<lspace=56, m_len=20>, <lspace=16, m_len=60>

  • HHOOK_TYPE_IPSEC_OUT hhook is called for the IPSEC_ENC_AFTER case
The input path

As long as if_enc.sh does the in-kernel round-trip we get the following input path:

  • ipsec4_common_input_cb() meets the same mbuf chain, after IPsec+IV header removal:

<lspace=56, m_len=20>, <lspace=16, m_len=60>

  • HHOOK_TYPE_IPSEC_IN hhook is called for the IPSEC_ENC_BEFORE case
  • m_striphdr() removes the outer IP header:

<lspace=56, m_len=0>, <lspace=16, m_len=60>

  • HHOOK_TYPE_IPSEC_IN hhook is called for the IPSEC_ENC_AFTER case
The thoughts

I have the following reasoning for now:

Output:

  • It looks that the output is not expected to call hhooks with m0.m_len=0. Only if something creates such a chain before the ipsec, that's out of this scope.
  • As a takeaway, we could consider additional improvement here, not to allocate an extra mbuf and not to move pkthdr, etc. But it means to change max_hdr default (it could be quite disputable) or something else. I guess pro users of IPsec are changing it via sysctl already.

Input:

  • It seems to be non-expected to receive such mbuf split upon the input path, only if something creates it specifically before the ipsec.

In general:

  • It sounds impractical to improve the in-kernel round-trip case with possible trade-offs for real world cases, until it's highlighted by someone as a production need (highly unlikely).
  • We could think of some forwarding case, but anything may happen with a chain before it hits ipsec. I guess, if we find such a case we will apply the same rule to that module -- it should not provide suboptimal mbufS.

Firewall:

  • The firewalls still require fixing. As a result of the previous points, having the __predict_false() on their side appears to be a practical thing.

What do you think? Does it help to uncover the topic?