commit 998a1c313a986dc7c7457d0b5369381f65fa1330 Author: Mateusz Guzik <mjg@FreeBSD.org> Date: Wed Mar 22 20:42:04 2023 +0000 cred: convert the refcount from int to long On 64-bit platforms this sorts out worries about mitigating bugs which overflow the counter without pessimizng anything, most notably it avoids whacking per-thread operation in favor of refcount(9) API. The struct already had two instances of 4 byte padding with 256 bytes in size, cr_flags gets moved around to avoid growing it. 32-bit platforms could also get the extended counter, but I did not do it as one day(tm) the mutex protecting centralized operation should be replaced with atomics and 64-bit ops on 32-bit platforms remain quite penalizing. While worries of counter overflow are addressed, the following is not: - counter *underflows* - buffer overruns from adjacent allocations - UAF due to stale cred pointer As such, while lipstick was placed, the pig should not be participating in any beauty pageants. Reviewed by: Differential Revision: commit ceb8f401fcc5d956b9b92cff6aa6946a932f48bf Author: Mateusz Guzik <mjg@FreeBSD.org> Date: Wed Mar 22 21:44:55 2023 +0000 cred: make ref signed There are asserts on the count being > 0, but they are less useful than they can be because the type itself is unsigned. The kernel is compiled with -frapv, making wraparound perfectly defined.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sys/kern/kern_prot.c | ||
---|---|---|
1854 | Please use __SIZEOF_LONG__ or some other similar variable for these; CHERI architectures are neither ILP32 nor LP64 but the native integer size is still 32/64-bit depending on the base architecture (only ever 64-bit in CheriBSD). Though really for the formats I'd be inclined to say just use intmax_t, it's not performance-critical and involves (undocumented) tight coupling between here and ucred.h. Though why not just use long itself? | |
sys/sys/ucred.h | ||
64 | This was previously unsigned, is signed intended? The summary doesn't make any reference to this change. |
I'm going to have to sleep on whether to do 64 bit everywhere.
Will post an updated review one way or the other tomorrow.
I have no objection, though I suspect that you cannot change the layout of ucred on stable branches.
Not handling underflow seems like a pretty basic weakness. I don't understand why we can't simply check for over/underflow when manipulating the counter. Yes, it has non-zero cost, but we do similar checking for all refcount adjustments already.
sys/kern/kern_prot.c | ||
---|---|---|
1863 | I see crcowget() is called during thread creating. Can we possibly reach 2^32 threads? That sounds horrible. Maybe user want be warned when tuning and kern.maxproc * kern.threads.max_threads_per_proc exceed 2^32 . | |
1870 |
+1 for that. So we might assert a overflow on increase. KASSERT(cr->cr_ref > 0, ("overflow")); And bench on a machine with huge amount of resources. | |
2058 | And assert an underflow on decrease. KASSERT(cr->cr_ref > 0, ("will underflow")); |
And bench on a machine with huge amount of resources.
So we have such a REAL machine with 4096 cpu cores and 8 hardware threads per core, and run average 8192 soft threads per hardware thread, then this machine will have total ~ 2^28 threads. Then a 32bit u_int is still sufficient.
I doubt we will have such machine and real usage in years.
No, but if someone wants this in stable, they can use the spare fields for the new ref.
Not handling underflow seems like a pretty basic weakness. I don't understand why we can't simply check for over/underflow when manipulating the counter. Yes, it has non-zero cost, but we do similar checking for all refcount adjustments already.
How the refcount API could possibly address it? You drop what it sees like the last reference and past that the object is getting freed. Subsequent attempts at manipulating the counter are pretty UB. There is no leeway here.
At best some of the asserts could be converted into runtime checks, but then again the refcount API has no advantage over any of it and has the disadvantage of not scaling.
sys/kern/kern_prot.c | ||
---|---|---|
1870 |
I don't quite understand the statement by summing up all the counts from all threads . As both cr->cr_users and cr->cr_ref is mutex protected, it is impossible that cr->cr_ref got touched by other threads (if they follows the order lock and write), so I think the overflow (of cr_ref) can be safely and stably caught. Am I misreading? That is indeed not enough as I see crunuse() and crunusebatch() also increase cr_ref. | |
1883 | KASSERT(cr->cr_ref >= td->td_ucredref, ("overflow")); | |
1911 | KASSERT(cr->cr_ref >= ref, ("overflow")); |
sys/kern/kern_prot.c | ||
---|---|---|
1870 | i refer you to the comment starting with 'Credential management' in the file |
Certainly it cannot be addressed in general, but there are specific bug patterns which can be mitigated this way. For instance:
- A thread decrements the reference count to zero but erroneously fails to invoke the object's destructor and free it. If something else later tries to manipulate the counter, it'll stay frozen.
- A thread decrements an object refcount to zero and invokes a destructor, which releases other references and invokes other destructors, one of which incorrectly tries to release a reference on the original object (e.g., because some code path incorrectly treats a weak (i.e., not-counted) reference as a strong reference. In this scenario, we'd avoid a double free of the original object.
Such a mitigation might not help in the face of some particular bug, but it's all UB once buggy refcount handling triggers a double free or a UAF anyway.