umtxq_requeue() changes the key, and umtxq_sleep() loads it again. However, this load isn't synchronized by the mutex, so we can read a stale key. Close the race by re-checking the key with the lock held.
Diff Detail
- Repository
- rG FreeBSD src repository
Event Timeline
I'm not sure this patch is the best one, but the issue is real and the patch closes it. The problem reproduces easily when playing CS:GO.
An alternative patch that avoids goto could look like this:
@@ -832,10 +837,21 @@ umtxq_sleep(struct umtx_q *uq, const char *wmesg,
 		if (error != 0)
 			break;
 	}
-	error = msleep_sbt(uq, &uc->uc_lock, PCATCH | PDROP, wmesg,
+	error = msleep_sbt(uq, &uc->uc_lock, PCATCH, wmesg,
 	    sbt, 0, flags);
-	uc = umtxq_getchain(&uq->uq_key);
-	mtx_lock(&uc->uc_lock);
+	if ((nuc = umtxq_getchain(&uq->uq_key)) != uc) {
+		/*
+		 * If umtxq_requeue() has changed the key, we need to
+		 * change the lock.  To load the new key value we need
+		 * to hold the old lock.  The lock order is the same as
+		 * in linux_futex_requeue(): first old, then new.
+		 */
+		mtx_lock(&nuc->uc_lock);
+		mtx_unlock(&uc->uc_lock);
+		uc = nuc;
+		counter_u64_add(umtxq_races, 1);
+	}
I'm testing it now with an event counter. The old patch has already been tested successfully.
It looks like someone woke up a sleeping thread while requeue() ran; perhaps the thread got a signal?
I think this should be fixed in another way, i.e., in requeue(), to ensure that uq_key is stable.
It would be nice to show the panic stack trace and add markj@ or kib@ as reviewers.
This is better; it guarantees that the thread will not wake up during the requeue, but the lock order most likely should be reversed.
This patch locks the new lock before unlocking the old one. I wonder what the lock order policy is there and whether it can cause a deadlock?
Good question. The lock order in umtxq_requeue() is first the old lock, then the new lock. We follow that. However, is it possible that two requeue events happen while we are exiting the sleep? If so, the second event can swap the lock order. This is one reason I prefer the first patch over the second.
The other reason is that with the second, alternative patch we do the extra work of double locking in umtxq_sleep() every time umtxq_requeue() happened. With the first patch the actual race happens roughly 1 in 1000 times, so there is no performance impact to worry about.
What I immediately do not like in the patch is that it seems to allow unbounded looping in the kernel. If requeueing is done in step with uc_lock, a user can potentially control how long the thread spins.
A similar issue with ll/sc took a lot of effort to solve, and I do not want more such code added. Why not check for errors before retrying?
I don't understand how that would help. The lock may change independently of what msleep returned, and we would return from the function holding the wrong lock.
I mean, change the lock and check the break-out conditions before looping again. In fact, I do not understand why you need to loop there (retrylock) but not in all the other places that take the uc_lock.
Sorry for the delay!
That would overcomplicate the code, IMHO. The race does not happen often enough that its resolution needs to be optimized.
> In fact, I do not understand why you need to loop there (retrylock) but not in all the other places that take the uc_lock.
That's a good question! I think we don't panic anywhere else (yet) because there are no other places like umtx_sleep() that take uq as an argument. Most take the key as an argument, so the value stays stable for the function's runtime. That prevents panicking on unlocking the wrong mutex, but probably leaves other races in. For example, umtxq_busy() looks suspicious to me: msleep would drop the lock and sleep on it. What if the lock changes? Is it not going to miss the wakeup then?
I do not think so, but this is probably moot. See below.
In fact, I do not quite understand how this requeue is useful. A typical umtx blocking operation loops over umtxq_sleep(), and there it re-looks up the key on each iteration. Look at the simplest example, do_lock_normal().
If not that, I would say that the right fix is to delegate the requeueing to the victim thread: umtxq_requeue() records an indicator that the key needs to change, then does a wakeup. The umtxq_sleep() code checks for this bit after the wakeup and goes back to sleep on the new chain. The wakeup code would need to be fixed too, to clear the requeue indicator; otherwise wakeups could be lost.
What's the path forward for this? This still fixes an outstanding bug for me, but it requires manual effort to apply.
I'll note that qemu has a number of hangs in umtx code... I wonder if the emulation changes the timing.
I'll have to see if this band-aid helps that at all and report back.
Since I filed the patch two things happened:
- Steam again stopped working on CURRENT (at least for me)
- CS:GO was superseded by CS2. Does anybody know whether it works on FreeBSD with linux-steam-utils?
Since playing Counter-Strike is a low-priority thing for me :) and @kib has valid concerns about the patch, the patch still sits in my TODO list.
Dan, what do you use the patch for? What software, and on what FreeBSD version, exercises the bug we address here?
Re: Warner, the bug manifests itself only in the Linux compat layer. AFAIK, QEMU won't use umtx_requeue() as long as it is native.
Sorry, been MIA for a bit with $WORK
Whatever issue I had seems to have been overcome. It was also Steam related, but with The Witcher 3.
The issue hasn't been overcome; it is just hidden. In CS:GO there were only two maps where it reproduced quickly :) The problem is that we don't have ready tools to write reproducing code, because it needs to be Linux code.
Is this possibly the same bug reported here? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280667
Kostik had a valid concern about the patch. It is a big improvement, but not a bulletproof fix. Since we don't have a reproducer, it is really hard to work on a proper fix. If anybody is ready to take responsibility, I would agree to pushing this to main and merging it to stable/14. Otherwise it is just taking too long and has already affected a number of people.