Page MenuHomeFreeBSD

umtx: close race between umtxq_sleep() and umtxq_requeue()
AcceptedPublic

Authored by glebius on Jun 9 2023, 11:05 PM.
Tags
None
Referenced Files
F108309231: D40481.id123051.diff
Thu, Jan 23, 6:29 PM
Unknown Object (File)
Fri, Jan 17, 1:55 PM
Unknown Object (File)
Mon, Dec 30, 8:52 PM
Unknown Object (File)
Dec 21 2024, 3:43 PM
Unknown Object (File)
Dec 11 2024, 12:30 AM
Unknown Object (File)
Dec 9 2024, 2:46 AM
Unknown Object (File)
Nov 18 2024, 12:01 PM
Unknown Object (File)
Oct 17 2024, 11:24 AM

Details

Summary

The umtxq_requeue() changes the key and umtxq_sleep() loads it again.
However, this load isn't synchronized by the mutex, so we can read out
an old key. Close the race by performing a re-check with the lock.

Fixes: fd6ca665d206b74970e7c01d06ae06fed71500fc

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 51986
Build 48877: arc lint + arc unit

Event Timeline

I'm not sure this patch is the best, but the issue is real and the patch closes it. The problem easily reproduces playing CS:GO.

Alternative patch, that doesn't use goto, can look like this:

@@ -832,10 +837,21 @@ umtxq_sleep(struct umtx_q *uq, const char *wmesg,
                        if (error != 0)
                                break;
                }
-               error = msleep_sbt(uq, &uc->uc_lock, PCATCH | PDROP, wmesg,
+               error = msleep_sbt(uq, &uc->uc_lock, PCATCH, wmesg,
                    sbt, 0, flags);
-               uc = umtxq_getchain(&uq->uq_key);
-               mtx_lock(&uc->uc_lock);
+               if ((nuc = umtxq_getchain(&uq->uq_key)) != uc) {
+                       /*
+                        * If umtxq_requeue() has changed the key we need to
+                        * change the lock.  To load the new key value we need
+                        * to hold the old key.  The lock order is the same as
+                        * in linux_futex_requeue(): first old, then new.
+                        */
+                       mtx_lock(&nuc->uc_lock);
+                       mtx_unlock(&uc->uc_lock);
+                       uc = nuc;
+                       counter_u64_add(umtxq_races, 1);
+               }

Testing it now with event counter. The old patch has already been tested successfully.

The second patch definitely does some unnecessary relocking, but doesn't use goto.

It looks like someone woke up a slept thread while requeue() run, perhaps thread got a signal?
I think this should be fixed in other way, i.e., in requeue() to ensure that uq_key is stable
it would be nice to show panic stack trace and add markj@ or kib@ to reviewers

I'm not sure this patch is the best, but the issue is real and the patch closes it. The problem easily reproduces playing CS:GO.

Alternative patch, that doesn't use goto, can look like this:

@@ -832,10 +837,21 @@ umtxq_sleep(struct umtx_q *uq, const char *wmesg,
                        if (error != 0)
                                break;
                }
-               error = msleep_sbt(uq, &uc->uc_lock, PCATCH | PDROP, wmesg,
+               error = msleep_sbt(uq, &uc->uc_lock, PCATCH, wmesg,
                    sbt, 0, flags);
-               uc = umtxq_getchain(&uq->uq_key);
-               mtx_lock(&uc->uc_lock);
+               if ((nuc = umtxq_getchain(&uq->uq_key)) != uc) {
+                       /*
+                        * If umtxq_requeue() has changed the key we need to
+                        * change the lock.  To load the new key value we need
+                        * to hold the old key.  The lock order is the same as
+                        * in linux_futex_requeue(): first old, then new.
+                        */
+                       mtx_lock(&nuc->uc_lock);
+                       mtx_unlock(&uc->uc_lock);
+                       uc = nuc;
+                       counter_u64_add(umtxq_races, 1);
+               }

Testing it now with event counter. The old patch has already been tested successfully.

This is better, it guarantees that the thread will not wake up during requeue time, just lock order most likely should be reverse

Alternative patch, that doesn't use goto, can look like this:

This patch locks the new lock before unlocking the old. I wonder what is the lock order policy there and whether it can cause deadlock?

This revision is now accepted and ready to land.Jun 12 2023, 4:22 PM
In D40481#922105, @mav wrote:

Alternative patch, that doesn't use goto, can look like this:

This patch locks the new lock before unlocking the old. I wonder what is the lock order policy there and whether it can cause deadlock?

Good question. The lock order in the umtxq_requeue() is first old lock, then new lock. We follow that. However, is it possible that two requeue events happen while we exit from sleep? If so, then the second event can swap the lock order. This is one reason I prefer the first patch, not the second.

The other reason is that with the second alternative patch we are doing extra work of double locking in umtxq_sleep() every time umtxq_requeue() happened. With the first patch the actual race happens ~ 1/1000 times, so there is not performance impact if we care.

What I immediately do not like in the patch is that it seems to allow unbound looping in kernel. If requeueing is done in step with uc_lock, potentially user can control how long the thread spins.

Similar issue with ll/sc took a lot of efforts to solve, I do not want to get more such code added. Why not check for errors before retrying?

In D40481#922256, @kib wrote:

What I immediately do not like in the patch is that it seems to allow unbound looping in kernel. If requeueing is done in step with uc_lock, potentially user can control how long the thread spins.

Similar issue with ll/sc took a lot of efforts to solve, I do not want to get more such code added. Why not check for errors before retrying?

What errors?

In D40481#922256, @kib wrote:

What I immediately do not like in the patch is that it seems to allow unbound looping in kernel. If requeueing is done in step with uc_lock, potentially user can control how long the thread spins.

Similar issue with ll/sc took a lot of efforts to solve, I do not want to get more such code added. Why not check for errors before retrying?

What errors?

Errors from msleep that break out from the outer loop.

In D40481#922261, @kib wrote:

Errors from msleep that break out from the outer loop.

I don't understand how this is going to help. The lock may change independent of what msleep returned. And we return from the function holding wrong lock.

I propose to use an alternative patch with busy/unbusy second lock

In D40481#922261, @kib wrote:

Errors from msleep that break out from the outer loop.

I don't understand how this is going to help. The lock may change independent of what msleep returned. And we return from the function holding wrong lock.

I mean, change the lock and check the break-out conditions before looping again. In fact, I do not understand why do you need to loop there (retrylock), but not in all other places that take the uc_lock.

Sorry for delay!

In D40481#922309, @kib wrote:

I mean, change the lock and check the break-out conditions before looping again.

That will overcomplicate the code, IMHO. The race happens not so often that its resolution needs to be optimized.

In fact, I do not understand why do you need to loop there (retrylock), but not in all other places that take the uc_lock.

That's a good question! I think we don't panic anywhere else (yet) cause there are no other places like umtx_sleep() that take uq as argument. Most take key as argument so the value stays stable through the function runtime. That prevents panicing with unlocking wrong mutex, but probably leaves other races in. For example, umtxq_busy() looks suspicious to me. The msleep would drop the lock and sleep on it. What if the lock changes? Is it not going to wakeup then?

Sorry for delay!

In D40481#922309, @kib wrote:

I mean, change the lock and check the break-out conditions before looping again.

That will overcomplicate the code, IMHO. The race happens not so often that its resolution needs to be optimized.

I do not think so, but this is probably moot. See below.

In fact, I do not understand why do you need to loop there (retrylock), but not in all other places that take the uc_lock.

That's a good question! I think we don't panic anywhere else (yet) cause there are no other places like umtx_sleep() that take uq as argument. Most take key as argument so the value stays stable through the function runtime. That prevents panicing with unlocking wrong mutex, but probably leaves other races in. For example, umtxq_busy() looks suspicious to me. The msleep would drop the lock and sleep on it. What if the lock changes? Is it not going to wakeup then?

In fact, I do not quite understand how this requeue is useful. Typical umtx blocking operation loops over umtxq_sleep(), and there it re-lookup the key on each iteration. Look at the simplest example of do_lock_normal().

If not that, I would say that the right fix is to delegate requeueing to the victim thread, by recording the indicator that changing of the key is needed by umtxq_requeue(), and doing wakeup. Then the umtxq_sleep() code checks for this bit after wakeup and goes to sleep with the new chain. The wakeup code would need to be fixed too, to remove requeue indicator, otherwise wakeups could be lost.

What's the path forwards for this? This is still fixing and outstanding bug for me but requires manual effort to apply.

I'll note that qemu has a number of hangs in umtx code... I wonder if the emulation changes the timing...
I'll have to see if this band-aide helps that at all and report back.

What's the path forwards for this? This is still fixing and outstanding bug for me but requires manual effort to apply.

Since I filed the patch two things happened:

  • Steam again stopped working CURRENT (at least for me)
  • CS:GO was superseded by CS2. Anybody knows does it work on FreeBSD with linux-steam-utils?

Since playing Counter Strike is a low priority thing for me :) and also @kib has valid concerns on the patch, the patch still sits in TODO list.

Dan, what do you use patch for? What software and on what FreeBSD version exercises the bug we address here?

Re: Warner, the bug manifests itself only in the Linux compat layer. AFAIK, QEMU won't use umtx_requeue() as long as it is native.

Sorry, been MIA for a bit with $WORK

Whatever issue I had seems to have been overcome. It was also Steam related, but The Witcher 3.

Whatever issue I had seems to have been overcome. It was also Steam related, but The Witcher 3.

The issue hadn't been overcome, it is just hidden. In CS:GO it was only two maps where it reproduced quickly :) The problem is that we don't have ready tools to write a reproducing code, because code needs to be a Linux code.

Is this possibly the same bug reported here? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280667

PR author reports their panic does not reproduce after applying this patch.

Kostik has had a valid concern about the patch. It is a big improvement but not a bullet proof fix. Since we don't have a reproducer, it is really hard to work on a proper fix. If anybody is ready to take responsibility, I would agree on pushing this to main and merging to stable/14. Otherwise it is just taking too long and already affected number of people.