Page MenuHomeFreeBSD

jail: Avoid a use-after-free when destroying jails
ClosedPublic

Authored by markj on Dec 8 2024, 7:37 PM.
Tags
None
Referenced Files
F107347730: D47992.diff
Sun, Jan 12, 8:29 PM
Unknown Object (File)
Mon, Jan 6, 11:35 PM
Unknown Object (File)
Thu, Dec 26, 10:11 AM
Unknown Object (File)
Thu, Dec 26, 3:26 AM
Unknown Object (File)
Thu, Dec 26, 3:20 AM
Unknown Object (File)
Thu, Dec 26, 3:07 AM
Unknown Object (File)
Wed, Dec 25, 10:30 PM
Unknown Object (File)
Wed, Dec 25, 5:54 PM

Details

Summary

prison_deref() and prison_deref_kill() have to handle the case where
destruction of a jail will release the final reference on the jail's
parent, resulting in destruction of the parent jail. They thus maintain
a list of jails whose references have gone away; the loop at the end of
prison_deref() then goes through the list and deallocates resources
associated with each jail. In particular, if a jail's VNET is not the
same as that of its parent, this loop destroys the VNET.

Suppose prison_deref() removes the last reference on a jail, releasing a
reference to its parent and causing the jail to be placed in the
"freeprison" list. Suppose then that the parent jail is destroyed
before the "freeprison" list is processed. When destroying the
now-orphaned child jail, prison_deref() derefences its parent to see
whether the child jail's VNET needs to be freed, but if this race
occurs, this is a use-after-free.

Fix the problem by introducing a flag which determines whether a jail's
VNET is to be destroyed when the jail is destroyed.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 61049
Build 57933: arc lint + arc unit

Event Timeline

markj requested review of this revision.Dec 8 2024, 7:37 PM
sys/kern/kern_jail.c
3213–3215

Perhaps it'd be a good idea to set pr->pr_parent = NULL here?

Suppose then that the parent jail is destroyed before the "freeprison" list is processed

I think that is not possible. Child jails should be always freed / destroyed prior to parent jail. See the above logic

	TAILQ_INSERT_TAIL(&freeprison, pr, pr_list);
...
	pr = pr->pr_parent;

So the freeprison list will have correct order, that child jails come first then parent last, if parent jail's pr_uref reach 0.

sys/kern/kern_jail.c
3239

Maybe we want to asserting that the parent jail is still alive, or in this freeprison list.

Suppose then that the parent jail is destroyed before the "freeprison" list is processed

I think that is not possible. Child jails should be always freed / destroyed prior to parent jail. See the above logic

	TAILQ_INSERT_TAIL(&freeprison, pr, pr_list);
...
	pr = pr->pr_parent;

So the freeprison list will have correct order, that child jails come first then parent last, if parent jail's pr_uref reach 0.

In the race that I hit, freeprison contains only the child jail, freed by a call to jail_remove(2). The parent jail is freed by a different thread.

The bug is that the freeprison loop assumes that each jail's parent is still live, but by that point there is no reference providing that guarantee - it has already been dropped.

sys/kern/kern_jail.c
3213–3215

Yes. Since a race could invalidate that parent, best to indicate it shouldn't be trusted.

3239

I'm not sure how that can be done, since it's a potentially dangling pointer, except by searching the entire prison list (which means locking allprison_lock).

Suppose then that the parent jail is destroyed before the "freeprison" list is processed

I think that is not possible. Child jails should be always freed / destroyed prior to parent jail. See the above logic

	TAILQ_INSERT_TAIL(&freeprison, pr, pr_list);
...
	pr = pr->pr_parent;

So the freeprison list will have correct order, that child jails come first then parent last, if parent jail's pr_uref reach 0.

In the race that I hit, freeprison contains only the child jail, freed by a call to jail_remove(2). The parent jail is freed by a different thread.

Emm, I dislike concurrent destroying. It makes the lifecycle of jail even more complex ;-)

The bug is that the freeprison loop assumes that each jail's parent is still live, but by that point there is no reference providing that guarantee - it has already been dropped.

I see. Should be safe to go.

This revision is now accepted and ready to land.Dec 9 2024, 5:03 AM
sys/kern/kern_jail.c
3213–3215

+1 since no need to reference pr->pr_parent.

sys/sys/jail.h
237

Or, equivalent, prison has a different / dedicated vnet :)

markj marked 4 inline comments as done.

Unset the parent pointer after releasing the child reference.

This requires a modification to FOREACH_PRISON_DESCENDANT_PRE_POST in order to
avoid dereferencing a null pointer in prison_deref_kill(). The change is a bit
ugly, maybe a better solution is possible?

This revision now requires review to proceed.Dec 9 2024, 3:19 PM
sys/kern/kern_jail.c
3341

I would rather not set pr_parent to NULL here. While it's true that later on, in the final cleanup loop of prison_deref, we don't want to depend on pr_parent's value, it does have a use in FOREACH_PRISON_DESCENDANT_PRE_POST.

Given the choice between leaving a non-NULL pointer that shouldn't be used later and prematurely clearing a pointer that should be used, I'd rather keep the useful part around. Then FOREACH_PRISON_DESCENDANT_PRE_POST can remain as it is.

Avoid modifying prison_deref_kill().

markj added inline comments.
sys/kern/kern_jail.c
3341

Ok, I was on the fence about this as well. Updated.

markj marked an inline comment as done.

The new PR_VNET_ROOT flag is unneeded: it's equivalent to PR_VNET, just check
that instead.

sys/kern/kern_jail.c
1690–1696

If you have a strong belief that the question mark belongs with the "test" part instead of the "if" part, go ahead an change the dozen or so places where I have that, but otherwise...

Revert unintended style change.

The new PR_VNET_ROOT flag is unneeded: it's equivalent to PR_VNET, just check
that instead.

Not quite, I'm afraid. This is true once a prison has been created, but errors during the creation process can break that. pr_vnet is set when a new jail is created, but the flags are not actually set until after some parameter checking and other error opportunities. Chances are, this just requires setting the prison's PR_VNET flag at the same time the vnet is created (still saving other flags for later).

markj marked an inline comment as done.

The new PR_VNET_ROOT flag is unneeded: it's equivalent to PR_VNET, just check
that instead.

Not quite, I'm afraid. This is true once a prison has been created, but errors during the creation process can break that. pr_vnet is set when a new jail is created, but the flags are not actually set until after some parameter checking and other error opportunities. Chances are, this just requires setting the prison's PR_VNET flag at the same time the vnet is created (still saving other flags for later).

I see. I think you're right that we can set PR_VNET earlier, but I need to take a close look.

sys/kern/kern_jail.c
1690–1696

style(9) indicates that operators should precede a line break, so I think your way is technically wrong. But, there's no example there to confirm my belief and I didn't mean to change it anyway.

markj marked an inline comment as done.

Set PR_VNET earlier, so that prison_deref() won't leak the VNET if an
error occurs during jail initialization.

@jamie , I wonder if you had any further comments on the patch?

No further comments, except sorry for taking so long to mention that.

This revision is now accepted and ready to land.Mon, Jan 6, 10:40 PM