Page MenuHomeFreeBSD

[2/2] kern: cpuset: properly rebase when attaching to a jail
ClosedPublic

Authored by kevans on Nov 20 2020, 7:53 PM.
Tags
None
Referenced Files
F102705107: D27298.diff
Sat, Nov 16, 2:57 AM
Unknown Object (File)
Sat, Oct 26, 10:34 AM
Unknown Object (File)
Oct 4 2024, 3:26 PM
Unknown Object (File)
Oct 4 2024, 4:51 AM
Unknown Object (File)
Oct 3 2024, 10:53 PM
Unknown Object (File)
Oct 1 2024, 8:26 PM
Unknown Object (File)
Oct 1 2024, 5:40 PM
Unknown Object (File)
Sep 29 2024, 11:00 AM
Subscribers

Details

Summary

The current logic is a fine choice for a system administrator modifying process cpusets or a process creating a new cpuset(2), but not ideal for processes attaching to a jail.

Currently, when a process attaches to a jail, it does exactly what any other process does and loses any mask it might have applied in the process of doing so because cpuset_setproc() is entirely based around the assumption that non-anonymous cpusets in the process can be replaced with the new parent set.

This approach slightly improves the jail attach integration by modifying cpuset_setproc() callers to indicate if they should rebase their cpuset to the indicated set or not (i.e. cpuset_setproc_update_set).

If we're rebasing and the new root set really is different from the current process's root (perhaps due to a race while we're dropping the proc lock), then allocate an extra cpuset to use as the process's new base set with its parent being the jail's set with the process base mask and domainset applied.

We avoid actually creating a new base set as long as all of the threads in the process are using either an anonymous set or the currently-containing prison's set, indicating that there's really nothing special happening.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 34942

Event Timeline

Slightly less silly version after increased coffee intake; we don't need to inspect every thread in the process to decide, we just need to inspect the base of the first thread... I have a series of cpuset(2) tests that I'm finishing up that also provide sanity checks for the four scenarios we care about (along with some other non-jail-related):

1.) Process attaches to a jail with its own cpuset+mask (e.g. cpuset -c -l 1,2 jail -c path=/ command=/bin/sh)
2.) Process attaches to a jail with its own cpuset (e.g. cpuset -c jail -c path=/ command=/bin/sh)
3.) Process attaches to a jail with the containing jail's root cpuset (e.g. jail -c path=/ command=/bin/sh)
4.) Process attaches to a jail with the containing jail's root cpuset+mask.

#1 and #2 should yield a new base cpuset separate from the jail cpuset, since they were given a distinct cpuset for whatever reason -- we assume someone may already know they'll want to manipulate it, perhaps.

#3 and #4 should yield no new base cpuset, both take the new jail's cpuset and create the appropriate anonymous cpuset for the #4 case.

The tests also test for the small change in cpuset_setproc_setthread(), and I'll likely commit that fix separately. The consequence of using tdset instead of set as the new parent is that cpuset_getid(CPU_LEVEL_ROOT, CPU_WHICH_TID, -1, ...) could return the thread's previous jail's root set instead of the new jail's root set, as the parent chain leads back through the thread's original set. The reproducing circumstances should be just as easy as set an affinity, attach to the jail, then execute the above.

sys/kern/kern_cpuset.c
1199

[I'm still wrapping my head around cpuset+jails as I'm not very familiar with how they interact and so might be asking some misguided questions.]

If we're here because a process is attaching to a jail, set is the jail's cpuset. Jail cpusets are always root cpusets, so I'd expect nroot == set. Is that right? Then, how can tdroot != nroot ever be false in this scenario?

sys/kern/kern_cpuset.c
1199

Correct, nroot == set in that case. It could be the case that tdroot != nroot if the process was previously (or becomes, in the case of below where we're dropping the proc lock) assigned a cpuset that descends from the jail's root, for whatever reason. We don't actually forbid the possibility, just that a process inside the jail cannot see sibling or parent cpusets of the jail's root.

Now that I say that out loud, I wonder if we shouldn't bail out just after this loop if rebase && tdroot == nroot. If the process already had a cpuset inside the jail, I don't necessarily see a reason we should change anything.

sys/kern/kern_cpuset.c
1199

Sorry, that should read "It could be the case that tdroot == nroot ..."

As I had hoped, it took away the expected problem of attaching to a jail (when the process doesn't have its own visible cpuset), and ending up with the process still having its old root cpuset (though under a new anonymous masked bit). That was the real problem I see with the current setup (not that what I didn't see aren't problems as well, but at least there was something I noticed ;-).

But I was kind of thinking that when the process had its own cpuset, it would retain that number. I guess that's not particularly important, since you can find the new cpuset ID from the process, and it would only work if the process was the set's only ref. Maybe I'm just more hung on known numeric IDs than I need to be - a bad habit from jails that I need to lose.

The only thing I don't get is:

If we're rebasing and the new root set really is different from the current process's root (perhaps due to a race while we're dropping the proc lock), then allocate an extra cpuset to use as the process's new base set with its parent being the jail's set with the process base mask and domainset applied.

Isn't that what happens in this case though? It seems a fairly regular and expected thing: you give the process a cpuset for whatever reason you want, and then you attach it to the jail. It doesn't seem like the "perhaps a race" exceptional condition you mention, but then perhaps you're talking there about something else entirely.

Anyway, this seems to be mostly musing to myself about my own misunderstandings. It all works well.

But I was kind of thinking that when the process had its own cpuset, it would retain that number. I guess that's not particularly important, since you can find the new cpuset ID from the process, and it would only work if the process was the set's only ref. Maybe I'm just more hung on known numeric IDs than I need to be - a bad habit from jails that I need to lose.

Right, I think we could make it work if it's the set's only ref, but then you run into the problem that you have to check the ID anyways unless you're 110% sure that that really was the only ref.

The only thing I don't get is:

If we're rebasing and the new root set really is different from the current process's root (perhaps due to a race while we're dropping the proc lock), then allocate an extra cpuset to use as the process's new base set with its parent being the jail's set with the process base mask and domainset applied.

Isn't that what happens in this case though? It seems a fairly regular and expected thing: you give the process a cpuset for whatever reason you want, and then you attach it to the jail. It doesn't seem like the "perhaps a race" exceptional condition you mention, but then perhaps you're talking there about something else entirely.

Ah, sorry, I should have noted it somewhere- the race that annoys me is when you're creating a non-persistent jail and you want to alter the cpuset of the process that's creating and attaching to the jail. Right now you have to either alter the process to do cpuset stuff after attaching, or independently cpuset(1) it after it's attached and hope that it hasn't created new processes and a game of whack-a-mole. Alternatively, you can just restrict the whole jail, but that's perhaps also reliant on the process being a long-lived one since you can't specify a mask for the jail up front (which is maybe another neat feature).

Both of the latter two scenarios have the additional problem that you can cpuset the process/jail mask down to whatever you want, but if it uses cpuset to collect the real number of CPUs available to it early on then it will be working with stale information.

This revision is now accepted and ready to land.Nov 23 2020, 9:36 PM