Page MenuHomeFreeBSD

New setcred() system call and associated MAC hooks
ClosedPublic

Authored by olce on Nov 15 2024, 5:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 9 2024, 2:04 PM
Unknown Object (File)
Dec 8 2024, 1:56 AM
Unknown Object (File)
Nov 22 2024, 4:38 AM
Unknown Object (File)
Nov 18 2024, 3:41 AM

Details

Summary

This revision is part of a series. Click on the Stack tab below to see the context.
This series has also been squeezed into D47633 to provide an overall view.

Commit message:
This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs, supplementary groups and the MAC label. Its advantage over
standard credential-setting system calls (such as setuid(), seteuid(),
etc.) is that it enables MAC modules, such as MAC/do, to restrict the
set of credentials some process may gain in a fine-grained manner.

Traditionally, credential changes rely on setuid binaries that call
multiple credential system calls and in a specific order (setuid() must
be last, so as to remain root for all other credential-setting calls,
which would otherwise fail with insufficient privileges). This
piecewise approach causes the process to transiently hold credentials
that are neither the original nor the final ones. For the kernel to
enforce that only certain transitions of credentials are allowed, either
these possibly non-compliant transient states have to disappear (by
setting all relevant attributes in one go), or the kernel must delay
setting or checking the new credentials. Delaying setting credentials
could be done, e.g., by having some mode where the standard system calls
contribute to building new credentials but without committing them. It
could be started and ended by a special system call. Delaying checking
could mean that, e.g., the kernel only verifies the credentials
transition at the next non-credential-setting system call (we just
mention this possibility for completeness, but are certainly not
endorsing it).

We chose the simpler approach of a new system call, as we don't expect
the set of credentials one can set to change often. It has the
advantages that the traditional system calls' code doesn't have to be
changed and that we can establish a special MAC protocol for it, by
having some cleanup function called just before returning (this is
a requirement for MAC/do), without disturbing the existing ones.

The mac_cred_check_setcred() hook is passed the flags received by
setcred() (including the version) and both the old and new kernel's
'struct ucred' instead of 'struct setcred' as this should simplify
evolving existing hooks as the 'struct setcred' structure evolves. The
mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
called by pairs around potential calls to mac_cred_check_setcred().
They allow MAC modules to allocate/free data they may need in their
mac_cred_check_setcred() hook, as the latter is called under the current
process' lock, rendering sleepable allocations impossible. MAC/do is
going to leverage these in a subsequent commit. A scheme where
mac_cred_check_setcred() could return ERESTART was considered but is
incompatible with proper composition of MAC modules.

While here, add missing includes and declarations for standalone
inclusion of <sys/ucred.h> both from kernel and userspace (for the
latter, it has been working thanks to <bsm/audit.h> already including
<sys/types.h>).

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 60976
Build 57860: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I like the overall idea. A few issues reviewing from a syscall API perspective:

  • No manpage
  • missing 32-bit compat code
  • I'm not overly sold on the compatibility model beyond the size argument and a flag mask check. I can't convince myself that enough edge cases have been covered to make it worth covering any of them.
sys/kern/kern_prot.c
529

Unfortunately the size will vary when pointer size does so size != sizeof(wcred) isn't right.

I think that ultimately you'll want ABI and version aware copyin helpers that produce the current version of struct setcred with appropriate userspace pointers and then do the copyin for groups and mac from that.

sys/kern/syscalls.master
3347

This is missing a _Contains_ptr_.

sys/security/mac/mac_cred.c
223

Blank lines after nonexistent variable declarations were removed from style(9) long ago.

sys/sys/ucred.h
130

I can see some reasons to do it, but I'm not a fan of exposing the version in the structure name used by the public API. In the likely common case the structure will be extended with things that can be zero. If you want to ensure initialization when the structure changes, then a pthread_attr style API might be appropriate (or just encourage designated initializers.

137

Maybe add a pad before sc_supp_groups_nb since there will be a pad after on 64-bit systems.

olce marked 3 inline comments as done.

Add missing 32-bit compatibility code.

Turn sc_supp_groups_nb field into some u_int, fixing improper boundary checks (ngroups_max, which is an integer, is clamped to reasonable values by some recent commit and in particular cannot be negative, so comparing it directly with sc_supp_groups_nb is OK).

olce marked an inline comment as done and an inline comment as not done.Dec 3 2024, 5:40 PM

I like the overall idea.

Thanks.

A few issues reviewing from a syscall API perspective:

  • No manpage

I will (of course) do it, but after review in case some details change.

  • missing 32-bit compat code

Yes, forgot to handle this case. Does the updated code seem OK?

  • I'm not overly sold on the compatibility model beyond the size argument and a flag mask check. I can't convince myself that enough edge cases have been covered to make it worth covering any of them.

Not sure what you mean here. The current compatibility model has been chosen precisely because it can accommodate *any* change, and it effectively relies on no more than the structure size and the flag mask (passing a version along the flags changes its nature away from just a "flag mask", but in reality does not fundamentally change its role). What are the edge cases, covered or not, that you are talking about? Please see also the inline comments on that matter.

Thanks.

sys/kern/kern_prot.c
529

Unfortunately the size will vary when pointer size does so size != sizeof(wcred) isn't right.

I think it is right from the point of view that sys_setcred() is the "native" entry point for the ABI, which isn't supposed to ever change (so pointer size will never change). But yes, I forgot the 32-bit compatibility entry point. If sys_setcred() was to be the single one for both cases, I agree testing for sizeof(wcred) (which would expand to the size of the structure for the 64-bit variant) would be wrong as it would prevent any call from 32-bit processes to work.

I think that ultimately you'll want ABI and version aware copyin helpers that produce the current version of struct setcred with appropriate userspace pointers and then do the copyin for groups and mac from that.

Indeed. I don't think we have any other practical way for now. I would love to use some automatic ABI-specific structure, but we don't have such machinery in place AFAIK.

sys/kern/syscalls.master
3347

Does _Contains_ptr_ has any relevance on a void * pointer? I'm asking because all uses of _Contains_ptr_ I have seen are with pointers to structures, which is deliberately not the case here (but might change, see the response to another comment below).

sys/security/mac/mac_cred.c
223

Not really, it still says: "Optionally, insert a blank line at the beginning of functions with no local variables". If for anything, I prefer to put some here for consistency with the rest of the file. I'm not against removing these from the whole file to comply, but as part of a separate commit.

sys/sys/ucred.h
130

I can see some reasons to do it, but I'm not a fan of exposing the version in the structure name used by the public API.

That's practically compulsory if the structure is to change significantly, which leads me to:

In the likely common case the structure will be extended with things that can be zero.

I'm not really sure if/how setcred() will be extended in the future. I agree that "natural" extensions would be to add more fields controlling more process security attributes (provided we add some more). Another future use case might be to select one set of credentials from an admin-devised stores, where the user can select one among a pre-defined set, in which case however a completely different structure would be called for. I can imagine other scenarios where the passed structure would have to change significantly. However, the probability of such scenarios materializing seems very low as of today. And, if they occur, it might be best to just create another system call.

I chose the current versioning scheme essentially because I think it is the most flexible (and clean for this flexibility), out of caution. It might be going too far and, given also the existing tooling (e.g., see my question regarding _Contains_ptr_) , it might be better to just have a single version of setcred(), and once something quite different is needed we create a new system call. The drawback of this approach is the potential multiplication of related calls with different names, which in itself isn't worse than some stuff we already have, but I would prefer that we stop piling new names for related functionality.

As just an example, we have the old thr_create() and the thr_new() replacement (since 2005), and I have some (not yet submitted to review) code that extends thr_new() to accept POSIX priority specification, using some bits in a flag field to pass a version, and the new version differing by the main structure (struct thr_param) having some of its fields pointing to a different object. Creating another system call in that case feels wrong as thr_new() is the most appropriate name, and versioning by flags (passed in-structure, in that case) actually allows to extend it.

In general, I hate implementing versioning through structure size, I've seen too many ugly hacks done with or because of it. However, for just the addition of some fields to an existing structure, and given that here we also pass flags indicating which structure fields are meaningful, that could be acceptable to me. But is it worth the trouble now given the more involved, but flexible solution has already been coded here?

If you want to ensure initialization when the structure changes, then a pthread_attr style API might be appropriate (...)

That seems a bit overkill since the fields are currently independent (except for sc_supp_groups_nb and sc_supp_groups).

or just encourage designated initializers

Seems the most practical way for "compatible" structure changes. Since access to fields is gated by flags, it's not really necessary, but can improve security slightly (related to your comment in D47621).

137

I can add a field there later, but at the price of having #ifdefs for 64-bit and 32-bit machines, since the field added there for 64-bit ones would have to be added elsewhere on 32-bit ones. Of course, I agree this would start becoming quite messy.

Else, I can declare the pad unconditionally now, wasting 4 bytes on 32-bit archs (which is not a problem, as this system call is not performance-critical and these archs are going to go away).

I guess you meant the latter. Or were you thinking of something else entirely?

olce marked an inline comment as done and an inline comment as not done.Dec 3 2024, 5:42 PM
olce added inline comments.
sys/compat/freebsd32/freebsd32_misc.c
4224–4225

TODO: This line must be conditional on MAC.

brooks requested changes to this revision.Dec 3 2024, 6:05 PM
brooks added inline comments.
sys/compat/freebsd32/freebsd32_misc.c
4193

I would be nice if there was less duplicated code with sys_setcred. My experience suggests there is little chance people will manage to keep them in sync.

sys/compat/freebsd32/syscalls.conf
37 ↗(On Diff #147408)

This shouldn't be needed if syscalls.master has the right _Contains_ tags.

sys/kern/syscalls.master
3347

Not done.

This revision now requires changes to proceed.Dec 3 2024, 6:05 PM

Oops, it looks like our last comments raced.

sys/kern/syscalls.master
3347

_Contains_ptr_ says the thing pointed to (may) contain pointers. If it's possible it contains pointers then special handling is required in (most) compat ABIs.

sys/security/mac/mac_cred.c
223

Ah, less was removed from style than I remembered. I'll propose a cleanup there.

I feel this whole thing is foolish consistency so I don't add the lines and tend to remove nearby ones in passing since they impact the diff, but not blame.

sys/sys/ucred.h
130

I strongly agree about structure size not being a good key (it's something that's awful about ioctl). I just worry you're spelling out versioning prematurely so taking all the costs today for little value. For example, as long as there's no danger of consuming all the flag bits and you reject disallowed bits, a version of zero is implicit so that code adds nothing today beyond pre-reserving the bits. You can always add code for a version later (even using this exact scheme) when that's needed.

Similarly, embedding the ABI/API in the structure name commits us to supporting code (not just old binaries) using that structure until we're willing to make an API break and removing the structure from the public header. That has some value in that we can break the code to force migration (except that rust/go/etc won't follow so it doesn't achieve that goal.)

I don't feel all that strongly about this, it's just different from other things and not an established pattern.

137

An explicit pad is what I meant. FWIW it's actually only i386 that wouldn't pad.

New system calls often retain the old name,

But nested structures are a royal pain to emulate...

Sorry for bad comments... ignore them, and replace them with this:

So about thr_new: Just create a new thr_new and rename the old one. Don't try to make it versioned. That's useless. We have lots of system call numbers and trying to 'repurpose' an old system call in an overly clever way just makes my job at implementing it for bsd-user a lot harder. And it doesn't really help. We've renamed the old system call several times, I suggest you just keep it simple like that.

I'd also consider just making a new system call rather than worrying about versioning this one. Again, it's more code that has to be written by hand and can't be generated easily (I'm in the process of moving bsd-user to being primarily generated from syscalls.master with the annotations). I really don't see any benefit from just having a new system call when the cred struct changes, wrapping the old versions and calling common code over trying to decode the system call versions inline. That will require explicitly written code to emulate.

sys/security/mac/mac_cred.c
223

I'd further suggest that the tone of this exchange is a bit too combative. You don't need to use such strong words to push back. Just remove the line and accept that "common practice" isn't perfectly reflected in style(9) and the point of code reviews is to communicate the imperfect match that will always be there.

sys/sys/ucred.h
140

Is this a pointer to a user structure we later copy in? That's going to be fun to emulate and translate. I'm not a fan of new instances of interfaces like that, but don't see much of a better way to do that.

259

Doesn't this need to be protected by #if __BSD_VISIBLE?

sys/security/mac/mac_cred.c
223

FYI, my proposed change to style(9) is in D47887 and apparently I wasn't the only one surprised by the current state.

sys/sys/ucred.h
140

I suppose you could embed the struct mac, but you'd still have to translate it so probably no gain.

olce marked 8 inline comments as done.
olce edited the summary of this revision. (Show Details)

Remove the explicit version bits in flags and the _v0 suffix for the setcred()'s structure.

Completely fold the native and 32-bit compatibility setcred() implementations into a single one (sys_setcred_common()) to avoid the risk of divergence.

olce marked an inline comment as done.

Remove the blank line at start of functions with no local variables added to MAC.

olce marked 4 inline comments as done.Dec 4 2024, 6:04 PM
In D47618#1091926, @imp wrote:

So about thr_new: Just create a new thr_new and rename the old one. Don't try to make it versioned. That's useless. We have lots of system call numbers and trying to 'repurpose' an old system call in an overly clever way just makes my job at implementing it for bsd-user a lot harder. And it doesn't really help. We've renamed the old system call several times, I suggest you just keep it simple like that.

So I think underneath what you say, and what @brooks said about embedding the version in the API, lies one of the missing pieces of the puzzle to me: That we generally don't commit to preserving the API over major versions, but just provide ABI compatibility. I hadn't fully realized that despite the several examples I already know of. Additionally, providing API compatibility is not really necessary even when there is a need to recompile old programs. Indeed, with ABI compatibility, one can just install an old userland with old headers in a jail, and compile the old program there (this is less practical of course than just compiling on the pristine host, but works).

Is that correct?

For thr_new(), I don't think there is a need to preserve the old type of data that was passed aside for API preservation, so if what I wrote above is true then there's no real need of co-existence and I'll indeed just rename the old syscall and reuse the name.

I'd also consider just making a new system call rather than worrying about versioning this one. Again, it's more code that has to be written by hand and can't be generated easily (I'm in the process of moving bsd-user to being primarily generated from syscalls.master with the annotations). I really don't see any benefit from just having a new system call when the cred struct changes, wrapping the old versions and calling common code over trying to decode the system call versions inline. That will require explicitly written code to emulate.

In the examples I took, if a new system call is needed, then it should be able to coexist in the API with the old one, which rules out having the same name, but I don't think it is a problem as in the same examples the semantics of setcred() would be significantly changed, and that I agree warrants a new name.

I've removed all the versioning.

As for bsd-user (first time I hear about it; searching a bit it seems you're doing some QEMU userland emulation of FreeBSD), I understand your pain. This new version should be more palatable in this respect. Perhaps your problem highlights another potential use case for augmenting the current syscall generation automation to walk syscall argument structures and produce translation functions at different depths, but this is obviously not an overnight project.

sys/security/mac/mac_cred.c
223

FYI, my proposed change to style(9) is in D47887 and apparently I wasn't the only one surprised by the current state.

Thanks! I've answered there.

223

You don't need to use such strong words to push back.

If this is directed to me also, then would you please make these words explicit, as I don't see any?

Just remove the line and accept that "common practice" isn't perfectly reflected in style(9) and the point of code reviews is to communicate the imperfect match that will always be there.

I've essentially answered this for the most part in a comment in D47887. Additionally, how this very style matter had been handled before the creation of D47887 is in itself already ample evidence that such a(n absence of) process does not scale.

I'd further suggest that the tone of this exchange is a bit too combative.

That may just be over-interpretation. My first comment just reflects reality and doesn't read as combative/offensive AFAICT. I don't think @brooks
took it as such (perhaps I'm mistaken?), and myself I have not been offended by stronger words like "foolish consistency" either.

sys/sys/ucred.h
130

You can always add code for a version later (even using this exact scheme) when that's needed.

That's right for the "versioning through flags" part.

Similarly, embedding the ABI/API in the structure name commits us to supporting code (not just old binaries) using that structure until we're willing to make an API break and removing the structure from the public header. That has some value in that we can break the code to force migration (except that rust/go/etc won't follow so it doesn't achieve that goal.)

I agree, except I don't think that what Rust or Go is doing can take away that value. If the version is embedded in the structure name, then we are free to release new versions as we see fit without breaking the API for consumers that have not evolved, as all these old and new APIs can coexist, and we can remove the old ones again as we see fit. In the other case, application code will automatically use the latest API version, which then has to evolve in a compatible way (and then, it is harder to spot breakage as it may not be visible at compile time).

Anyway, I'll back out all the versioning for now as explained in the answer to @imp's similar concern in my main comment.

137

FWIW it's actually only i386 that wouldn't pad.

Sorry, I don't understand. Why is i386 special here, e.g., compared to some ARM 32-bit platform? If I'm not mistaken, all fields of struct setcred have same length and offset on all 32-bit platforms on one side, and on all 64-bit platforms on the other, and between these two sides only the last two fields (pointers) differ, both in size and alignment.

140

Is this a pointer to a user structure we later copy in?

Yes. I don't see a better way either at the moment.

259

Given setcred() is not standard, I'd say yes. Changed.

olce marked an inline comment as done.Dec 4 2024, 6:05 PM

Looks good! A few quibbles (mostly things that would make things easier for use in CheriBSD), but nothing mandatory.

sys/kern/kern_prot.c
501–507

This should likely be next to sys_setcred.

554

I'd probably name this user_setcred. There's a small amount of in tree precedence and quite a lot of these in CheriBSD.

You don't have to do this as it doesn't effect FreeBSD today, but rather than bool is_32bit, I'd make this function take two function pointers. One for each of the copyin cases. In FreeBSD today this doesn't matter, but with CHERI we have additional ABIs to support so a bool won't do the job.

592

Why not unconditionally PTRIN_CP and then poise the umac assignment out. There's no real value in eliding one load/store pair in the non-MAC case (and the compiler should do it anyway if it sees the store is dead due to the later NULL assignment.)

sys/sys/ucred.h
137

You're right. I was incorrectly thinking of the pointer as 64-bit so it is indeed all 32-arches.

This revision is now accepted and ready to land.Dec 4 2024, 6:32 PM

Looks good! A few quibbles (mostly things that would make things easier for use in CheriBSD), but nothing mandatory.

Sure. I'll handle that (tomorrow, it's quite late here now).

sys/kern/kern_prot.c
501–507

Sure, that was unintentional.

In the final commit, I'll probably move sys_setcred_common()/user_setcred() after kern_setcred(), but just didn't do it here so that you can compare versions more easily within Phabricator.

554

I'd probably name this user_setcred. There's a small amount of in tree precedence and quite a lot of these in CheriBSD.

Oh great, I wasn't aware, and was wondering how I should establish a precedent. AFAIU, kern_XXX() functions are to only get arguments that point to kernel structures, and thus should not include copyin()/copyout() code. The wrappers around kern_setcred() doing copyin()/copyout() were mostly duplicate code, and it was indeed best to factorize the code somewhere (even if it leads to slightly more spaghetti). I didn't have any clue about that "somewhere", so just came up with something ad hoc.

I'll look into it.

You don't have to do this as it doesn't effect FreeBSD today, but rather than bool is_32bit, I'd make this function take two function pointers. One for each of the copyin cases. In FreeBSD today this doesn't matter, but with CHERI we have additional ABIs to support so a bool won't do the job.

I see, but I'm not sure if I completely understand the implications. I can propose something, but would rather do it in a separate revision once the full series here has been committed.

592

I could (still using PTRIN(), as wcred.sc_label is set to NULL below, before being set to the read-in value in the MAC case only), but isn't it better for the reader's understanding if MAC-specific code is clearly delineated as such? Also, if doing that, could we get a warning from the compiler saying that umac is assigned to but never read?

sys/kern/kern_prot.c
592

My thought is that you'd populate wcred unconditionally and then you'd change the bit below the if block to:

#ifdef MAC
umac = wcred.sc_label;
#endif
wcred.sc_label = NULL; /* Defensive measure on !MAC */

I'm not sure I'd keep the comment.

olce marked 3 inline comments as done.

Apply suggestions. Fix some whitespace issues. Remove #if __BSD_VISIBLE.

This revision now requires review to proceed.Dec 5 2024, 8:32 PM

Fix the public setcred() prototype.

sys/kern/kern_prot.c
592

OK, didn't understand initially. I've reworked the comment a bit, but prefer to keep it to ensure the corresponding line is not deleted.

sys/sys/ucred.h
259

I've removed this addition for the time being. That change I made was incomplete (if setcred() is gated, then so should struct setcred), and in the end I thought that gating with __BSD_VISIBLE was not really necessary as the whole sys/ucred.h header is non-standard anyway. Feel free to correct me if this is not the expected practice (I'm not too sure about visibility matters).

brooks added inline comments.
sys/kern/kern_prot.c
612

Might be easier to read without the curly braces

This revision is now accepted and ready to land.Dec 5 2024, 10:26 PM
olce marked an inline comment as done.Dec 6 2024, 8:17 AM
olce added inline comments.
sys/kern/kern_prot.c
612

Funny you say that, as I normally don't put braces around a single line. I still did it here for consistency with the braces after the else, which I think are some kind of useful visual hint that the code after the #endif is syntactically tied to the code before it, which might be overlooked. Unless removing the latter also, I feel it's more readable to keep braces for the if as well.

sys/kern/kern_prot.c
612

The #if makes it a bit awkward but IMO leaving the braces out is more clear, e.g.

#ifdef MAC
        if ((flags & SETCREDF_MAC_LABEL) != 0) {

#ifdef COMPAT_FREEBSD32
                if (is_32bit) 
                        error = mac_label_copyin32(umac, &mac, NULL);
                else
#endif           
                        error = mac_label_copyin(umac, &mac, NULL);

                if (error != 0)
                        goto free_groups;
                wcred.sc_label = &mac;
        }
#endif
olce marked 2 inline comments as done.Dec 6 2024, 2:18 PM
olce added inline comments.
sys/kern/kern_prot.c
612

That was my initial version. Noting it is fine to do that. Changed in my tree.

olce marked an inline comment as done.

Committed as ddb3eb4efe55e57c206f3534263c77b837aff1dc.