This revision is part of a series. Click on the Stack tab below to see the context.
This series has also been squeezed into D47633 to provide an overall view.
Commit message:
TL;DR:
Now monitor setcred() calls, and reject or grant them according to the
new rules specification.
Drop monitoring setuid() and setgroups(). As previously explained in
the commit introducing the setcred() system call, MAC/do must know the
entire new credentials while the old ones are still available to be able
to approve or reject the requested changes. To this end, the chosen
approach was to introduce a new system call, setcred(), instead of
modifying existing ones to be able to participate in a "prepare then
commit"-like protocol.
The MAC framework typically calls several hooks of its registered
policies as part of the privilege checking/granting process. Each
system call calls some dedicated hook early, to which it usually passes
the same arguments it received, whose goal is to forcefully deny access
to the functionality when needed (i.e., a single deny by any policy
globally denies the access). Then, the system call usually calls
priv_check() or priv_check_cred() an unspecified number of times, each
of which may trigger calls to two generic MAC hooks. The first such
call is to mac_priv_check(), and always happens. Its role is to deny
access early and forcefully, as for system calls' dedicated early hooks.
The second, mac_priv_grant(), is called only if the priv_check*() and
prison_priv_check() generic code doesn't handle the request by itself,
i.e., doesn't explicitly grant access (to the super user, or to all
users for a few specific privileges). It allows any single policy to
grant the requested access (regardless of whether the other policies do
so or not).
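
The ordering described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's priv_check_cred(): 'struct model_policy', model_priv_check() and the two sample policies are hypothetical names, the generic code is reduced to a single "is the caller root" test, and jail handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a MAC policy: two optional hooks. */
struct model_policy {
	int (*priv_check)(int priv);	/* 0: no objection, else an errno */
	int (*priv_grant)(int priv);	/* 0: grant, else an errno */
};

/* Sample policy hook: forcefully denies (made-up) privilege number 7. */
static int
sample_deny_check(int priv)
{
	return (priv == 7 ? EACCES : 0);
}

/* Sample policy hook: grants (made-up) privilege number 3. */
static int
sample_grant(int priv)
{
	return (priv == 3 ? 0 : EPERM);
}

/* Model of the three-step decision; returns 0 on success or an errno. */
static int
model_priv_check(const struct model_policy *pols, size_t n, bool is_root,
    int priv)
{
	assert(pols != NULL || n == 0);

	/* Step 1: always runs; a single denial by any policy is final. */
	for (size_t i = 0; i < n; i++) {
		if (pols[i].priv_check != NULL) {
			int error = pols[i].priv_check(priv);

			if (error != 0)
				return (error);
		}
	}

	/* Step 2: generic code (here reduced to "the super user passes"). */
	if (is_root)
		return (0);

	/* Step 3: reached only if step 2 did not grant; one grant suffices. */
	for (size_t i = 0; i < n; i++) {
		if (pols[i].priv_grant != NULL && pols[i].priv_grant(priv) == 0)
			return (0);
	}
	return (EPERM);
}
```

In this model a denial in step 1 wins even for the super user, while a grant in step 3 only matters for requests that the generic code did not already accept.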
MAC/do only has an effect on processes spawned from the '/usr/bin/mdo'
executable. It implements all setcred() hooks, called via
mac_cred_setcred_enter(), mac_cred_check_setcred() and
mac_cred_setcred_exit(). In the first one, implemented in
mac_do_setcred_enter(), it checks if MAC/do has to apply to the current
process, allocates (or re-uses) per-thread data to be later used by the
other hooks (those of setcred() and the mac_priv_grant() one, called by
priv_check*()) and fills them with the current context (the rules to
apply). This is both because memory allocations cannot be performed
while holding the process lock and to ensure that all hooks called by
a single setcred() see the same rules to apply (not doing this would be
a security hazard as rules can be changed concurrently by the
administrator, as explained in more detail below). In the second one
(implemented by mac_do_check_setcred()), it checks for forbidden or
mandatory supplementary groups according to applicable rules, and if
present denies access (which takes precedence over any other MAC policy
that would want to grant the request). If it doesn't deny access, it
stores in MAC/do's per-thread data the new credentials. Indeed, the
next MAC/do hook implementation to be called, mac_do_priv_grant()
(implementing the mac_priv_grant() hook) must have knowledge of the new
credentials that setcred() wants to install in order to validate them
(or not), which the MAC framework can't provide as the priv_check*() API
only passes the current credentials and a specific privilege number to
the mac_priv_check() and mac_priv_grant() hooks. By contrast, the very
point of MAC/do is to grant or deny the privilege of changing
credentials not only based on the current ones but also on the
requested ones.
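
The hand-off of the new credentials between hooks can be pictured with the following user-space sketch. Everything in it is an assumption made for illustration: the 'stash_*' functions, 'struct model_cred', MODEL_PRIV_SETCRED and the single "from UID to UID" rule are hypothetical stand-ins, and C11 thread-local storage replaces the kernel's per-thread data. It only shows why the grant hook, which receives just the current credentials and a privilege number, needs data saved earlier by the setcred()-specific hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_PRIV_SETCRED	1	/* hypothetical privilege number */

struct model_cred {
	unsigned int uid;
};

/* Hypothetical per-thread data shared by the hooks of one setcred(). */
struct model_td_data {
	bool have_new_cred;		/* set by the check hook */
	struct model_cred new_cred;	/* credentials setcred() wants */
	unsigned int rule_from_uid;	/* one-rule snapshot taken at enter */
	unsigned int rule_to_uid;
};

static _Thread_local struct model_td_data td_data;

/* Enter hook: snapshot the rule that applies to this call. */
static void
stash_enter(unsigned int from_uid, unsigned int to_uid)
{
	td_data.have_new_cred = false;
	td_data.rule_from_uid = from_uid;
	td_data.rule_to_uid = to_uid;
}

/* Check hook: no denial here, but remember the requested credentials. */
static int
stash_check_setcred(const struct model_cred *new_cred)
{
	td_data.new_cred = *new_cred;
	td_data.have_new_cred = true;
	return (0);
}

/* Grant hook: only sees the current credentials and a privilege number. */
static int
stash_priv_grant(const struct model_cred *cur, int priv)
{
	if (priv != MODEL_PRIV_SETCRED || !td_data.have_new_cred)
		return (EPERM);
	if (cur->uid == td_data.rule_from_uid &&
	    td_data.new_cred.uid == td_data.rule_to_uid)
		return (0);
	return (EPERM);
}

/* Exit hook: invalidate the stashed data. */
static void
stash_exit(void)
{
	td_data.have_new_cred = false;
}
```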
This split of checks between two MAC hooks is a consequence of the
necessity to preserve MAC policy composition (i.e., using several
policies at once in a compatible, predictable way, as foreseen when
establishing the MAC framework itself). Both the split and the
composition requirement impose further restrictions on MAC/do's design.
Because MAC/do's rules are tied to
jails, accessing the current rules requires holding the corresponding
jail's lock. As other policies might try to grab the same jail's lock,
it is not possible to keep the rules' jail's lock between
mac_do_setcred_enter() and mac_do_priv_grant() to ensure consistency of
the checks. But such consistency is necessary, as rules can change
concurrently: if newly installed rules start to deny some specific changes,
and some thread is past the mac_cred_check_setcred() hook but before the
mac_priv_grant() one, the latter may grant some privileges that should
have been rejected first by the former (depending on the content of
user-supplied rules). To this end, we have augmented 'struct rules'
with a reference count, and its lifecycle is now decoupled from being
referenced or not by a jail. As a thread enters setcred_enter(), it
grabs a hold on the current rules and keeps a pointer to them in the
per-thread data, ensuring that all hooks have a consistent view of rules
to apply. In its mac_do_setcred_exit(), MAC/do just "frees" the
per-thread data, in particular by dropping its reference on the rules (we
wrote "frees" in quotes, as in fact the per-thread structure is
reused, and only freed when a thread exits or the module is unloaded).
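
The reference-counting scheme can be sketched in user space as follows. This is a minimal model under stated assumptions, not MAC/do's code: 'struct rules_model', the 'rules_*' helpers and the 'snapshot_*' functions are hypothetical, a 'generation' number stands in for the actual rule list, and C11 atomics replace the kernel's reference-count primitives. In the real module, reading the jail's current rules pointer would additionally require the jail's lock, which is held only for the duration of that read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for 'struct rules' with its reference count. */
struct rules_model {
	atomic_uint refs;
	int generation;		/* stands in for the actual rule list */
};

/* Hypothetical per-thread data: the rules snapshot for one setcred(). */
struct snapshot_data {
	struct rules_model *rules;
};

/* Allocate rules with one reference, owned by the jail. */
static struct rules_model *
rules_alloc(int generation)
{
	struct rules_model *r = malloc(sizeof(*r));

	if (r == NULL)
		return (NULL);
	atomic_init(&r->refs, 1);
	r->generation = generation;
	return (r);
}

static void
rules_hold(struct rules_model *r)
{
	atomic_fetch_add(&r->refs, 1);
}

/* Drop one reference; returns 1 if this freed the rules, else 0. */
static int
rules_release(struct rules_model *r)
{
	unsigned int old = atomic_fetch_sub(&r->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(r);
		return (1);
	}
	return (0);
}

/* Enter: pin the rules that are current right now. */
static void
snapshot_enter(struct snapshot_data *d, struct rules_model *current)
{
	rules_hold(current);
	d->rules = current;
}

/* Exit: unpin them; the per-thread structure itself is kept for reuse. */
static int
snapshot_exit(struct snapshot_data *d)
{
	int freed = rules_release(d->rules);

	d->rules = NULL;
	return (freed);
}
```

With this arrangement, an administrator replacing the jail's rules in the middle of a setcred() call only drops the jail's reference; the old rules stay alive until the thread that pinned them leaves through its exit hook.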
The separate definition of 'struct mac_do_data_header' may seem odd, as
it is only used in 'struct mac_do_setcred_data'. It is a remnant of an
earlier version that was not using setcred(), but rather implemented
hooks for setuid() and setgroups(). We however kept it, as it clearly
separates the machinery to pass data from dedicated system call hooks to
priv_grant() from the actual data that MAC/do needs to monitor a call to
setcred() specifically. It may be useful in the future if we evolve
MAC/do to also grant privileges through other system calls (each seen as
a complete credentials transition on its own).
The target supplementary groups are checked with merge-like algorithms
leveraging the fact that all supplementary groups in credentials
('struct ucred') and in each rule ('struct rule') are sorted. This avoids
starting a binary search for each considered GID, which would be
asymptotically more costly.
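
As an illustration of what "merge-like" means here, the sketch below walks two sorted GID arrays in lockstep. It is not the code from this revision: the function names are hypothetical, and both arrays are assumed to be sorted in ascending order without duplicates. Each call costs at most one pass over both arrays, i.e., O(n + m) comparisons, against O(n log m) for one binary search per GID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Returns true if every GID of 'need' also appears in 'have'
 * (e.g., to verify that all mandatory groups are present).
 */
static bool
sorted_contains_all(const gid_t *need, size_t n_need,
    const gid_t *have, size_t n_have)
{
	size_t i = 0, j = 0;

	assert(need != NULL || n_need == 0);
	assert(have != NULL || n_have == 0);

	while (i < n_need && j < n_have) {
		if (have[j] < need[i]) {
			j++;		/* extra group, skip it */
		} else if (have[j] == need[i]) {
			i++;		/* found, move to next needed GID */
			j++;
		} else {
			return (false);	/* need[i] cannot appear later */
		}
	}
	return (i == n_need);
}

/*
 * Returns true if the two arrays share at least one GID
 * (e.g., to detect a forbidden group among the target groups).
 */
static bool
sorted_intersects(const gid_t *a, size_t n_a, const gid_t *b, size_t n_b)
{
	size_t i = 0, j = 0;

	assert(a != NULL || n_a == 0);
	assert(b != NULL || n_b == 0);

	while (i < n_a && j < n_b) {
		if (a[i] < b[j])
			i++;
		else if (a[i] > b[j])
			j++;
		else
			return (true);
	}
	return (false);
}
```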