Page MenuHomeFreeBSD

jail: network epoch protection for IP address lists
ClosedPublic

Authored by glebius on Dec 8 2021, 6:43 PM.
Tags
None
Referenced Files
F102894174: D33339.diff
Mon, Nov 18, 10:22 AM
Unknown Object (File)
Tue, Oct 29, 5:04 AM
Unknown Object (File)
Oct 7 2024, 9:43 AM
Unknown Object (File)
Oct 3 2024, 3:09 PM
Unknown Object (File)
Oct 3 2024, 11:17 AM
Unknown Object (File)
Oct 3 2024, 2:05 AM
Unknown Object (File)
Oct 2 2024, 6:15 PM
Unknown Object (File)
Oct 2 2024, 2:03 PM

Details

Summary

To speed up packet processing and also avoid grabbing the prison
mutex in SMR section (a regression from de2d47842e8) allocate and
free prison address lists with the network epoch protection:
o Provide functions prison_ip_alloc()/prison_ip_free() that would

prepend epoch context to the array of addresses.

o Merge prison_restrict_ip[46] into address family agnostic

prison_ip_restrict().  This new function also differs from the
old ones that it doesn't edit address arrays in place, but rather
always allocates (or uses pre-allocated) new array of addresses,
to guarantee stability of the old list to epoch-protected readers.

o prison_check_ip[46]_locked now assert thay either prison mutex is held

or we are in the network epoch.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 43357
Build 40245: arc lint + arc unit

Event Timeline

I like the idea overall, but IMO this should go further and try to encapsulate the address lists properly. That requires more churn, but all of it is confined to kern_jail.c, I believe.

sys/kern/kern_jail.c
567

Why not use atomic(9)? I strongly prefer not to have two sets of intrinsics for the same purpose.

More fundamentally, using ck_pr/equivalent atomic(9) here does not accomplish anything other than preventing compiler reordering of the stores. At least a barrier is needed to prevent other CPUs from observing stores out of order.

Can we not encapsulate all three of the array, array length, and epoch context in a structure (one for each AF), and allocate and free them all together? That seems simpler and won't require this tricky synchronization. That is, have

struct prison_ip4_list {
    epoch_context_t ctx;
    int len;
    struct in_addr a[];
};

and have each prison hold a pointer to the current list.

675

Extra newline.

sys/kern/kern_jail.c
583

prison_qcmp_v4 is in in_jail.c, "optional inet". And likewise prison_qcmp_v6 is "optional inet6." Most of the code in this function is generic, but those will need some #ifdef work, or possibly being passed as arguments.

sys/kern/kern_jail.c
567

Can we not encapsulate all three of the array, array length, and epoch context in a structure (one for each AF), and allocate and free them all together? That seems simpler and won't require this tricky synchronization. That is, have

struct prison_ip4_list {
    epoch_context_t ctx;
    int len;
    struct in_addr a[];
};

and have each prison hold a pointer to the current list.

This is doable and I like it. That would require massive rewrite, since many functions manipulate pr_ip4 and pr_ip6 arrays. I will post an updated patch, but that would take several days.

583

prison_qcmp_v4 is in in_jail.c, "optional inet". And likewise prison_qcmp_v6 is "optional inet6." Most of the code in this function is generic, but those will need some #ifdef work, or possibly being passed as arguments.

Thanks for noticing. With updated patch (see reply to Mark) I will take care of NOINET and NOINET6 builds.

sys/kern/kern_jail.c
567

Am I right in thinking that this would all be confined to kern_jail.c and in_jail.c?

sys/kern/kern_jail.c
567

Yes, 90% would happen in kern_jail.c and in_jail.c and in6_jail.c would shrink a lot. And nothing touched outside of these files.

sys/kern/kern_jail.c
567

Later we would be able to remove pr_mtx acquisition inside accessor functions, but that would require callers to guarantee network epoch protection. That would require some sweeping changes, if we really want to go for it.

A new version which will have a single structure holding epoch context,
address count and array of addresses. It also generalizes lots of v4/v6
code into address agnostic.

sys/sys/jail.h
181

The removed ip_ip4s and pr_ip6s should probably be replaced by some spares to preserve struct offsets. Or maybe a version bump (and then could IMHO remove the other spares).

sys/sys/jail.h
181

What ports kernel modules are known to use struct prison and dereference its fields directly, instead of jail.h KPI?

sys/sys/jail.h
181

I don't really know that any actually exist; it's more of a habit I've always been in, that I think was in place when I started. But then, I'm not familiar with the ports codebase. I know that within the kernel itself, there's a lot of bare use of pr_allow and pr_hostname, and wouldn't be surprised if that was reflected elsewhere.

glebius added inline comments.
sys/sys/jail.h
181

I'd rather not add placeholders (I'd even vote for deleting pr_pspare[3], if asked) and discourage non KPI use of the structure. In the last years we are aiming towards this policy (already done to struct socket, struct inpcb, *pcb, struct ifnet). The fact that we have it hidden under defined(_WANT_PRISON) tells me that same policy is also desired for prison. Cc @pjd who added _WANT_PRSION.

@jamie I actually got one more question about this code. The code that checks for IP addresses clashing, in the old code under comment that starts "Check for conflicting IP addresses", in the new code separated into function prison_ip_conflict_check(), it would not allow to create a child that has IP address of a parent if parent has multiple addresses:

root@bobrik:/home/glebius:|>jail -c  host.hostname=jopa2 ip4.addr=127.0.0.2,127.0.0.3 children.max=10 command=/bin/sh
root@jopa2:/:|>jail -c  host.hostname=jopa3 ip4.addr=127.0.0.3 command=/bin/sh
jail: IPv4 addresses clash

So my patch doesn't change this behavior, but it seems counter-intuitive to me. And the comment actually says it is intended:

If there is a duplicate on a jail with more than one IP stop checking and return error.

Why so?

@jamie I actually got one more question about this code. The code that checks for IP addresses clashing, in the old code under comment that starts "Check for conflicting IP addresses", in the new code separated into function prison_ip_conflict_check(), it would not allow to create a child that has IP address of a parent if parent has multiple addresses
...
So my patch doesn't change this behavior, but it seems counter-intuitive to me. And the comment actually says it is intended:

If there is a duplicate on a jail with more than one IP stop checking and return error.

Why so?

This is probably better answered by bz (who is also watching this), since I'm not certain myself.

My best understanding is that the networking code doesn't know how the handle the case of multiple jails listening on wildcards that cover the same addresses. Since the single-address case collapses the wildcard to the jail's single address, duplicates are handled by the existing machinery that understands single-address conflicts. That's hinted at in the IN[46]_IS_ADDR_UNSPECIFIED check in prison_local_ip[46].

sys/sys/jail.h
181

I wasn't thinking userland - I'm pretty sure nothing there uses struct prison. I thought this was for kernel modules that refer to struct prison, and making sure they don't link into a kernel with an altered structure.

sys/sys/jail.h
181

We guarantee binary compatibility for kernel modules only in the stable branches. In the main branch we change structures all the time. This change is not planned for merging.

If I try to run a poudriere bulk -a with this patch applied, I see:

[00:01:49] Starting/Cloning builders
jail: IPv6 addresses clash
jail: IPv6 addresses clash
jail: IPv6 addresses clash
jail: IPv6 addresses clash
...
[00:01:50] Cleaning up
main-main: removed
main-main-n: removed
[00:01:52] Unmounting file systems

and the run is aborted.

sys/sys/jail.h
441
sys/kern/kern_jail.c
1230

ip6 = prison_ip_copyin(PR_INET6, op, M_WAITOK);

This is the culprit. Must be ip6 count, not M_WAITOK. Don't know how did that happen. Wish we had a typedef for malloc flags.

Two bugfixes:

  • Use ip6 count as argument to prison_ip_copyin(), not stupid typo
  • prison_ip_parent_match() can be legally called with NULL ppip
This revision is now accepted and ready to land.Dec 19 2021, 9:27 PM