
riscv pmap_fault: SFENCE.VMA more selectively
AcceptedPublic

Authored by freebsdphab-AX9_cmx.ietfng.org on Jul 9 2021, 1:30 PM.
Referenced Files
F97387226: D31118.diff

Details

Summary

pmap_fault can change at most one page's PTE (either 2M or 4K) and, if I read the RISC-V specification correctly, that means that SFENCE.VMA with rs1 != x0 should be applicable and will spare the rest of the TLB.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

This revision is now accepted and ready to land. Jul 9 2021, 1:47 PM

Although perhaps that's fine if this is in the _fault_ path?

And how does this interact with the fact that we don't invalidate on promotion? I believe sfence.vma with an address is only required to invalidate leaves, but TLBs can cache non-leaves, so we could still return, have the processor use the stale cached non-superpage L2 entry, find the old leaves that are still around (because we keep them around) and reuse those?

Oof, that erratum; thanks for the heads up. I guess that means there should be some mechanism to replace (other uses of) sfence_vma_page with sfence_vma on affected chips? I think specifically for this case, though, it's fine: the ITLB may still fill with the old entry before this sfence.vma, but pmap_fault only changes A/D here. I suppose there could be an extra fault delivered from caching an A-clear PTE (if the load is done now by some very prognosticative speculation, say) even though pmap_fault just set it, and I imagine the ITLB doesn't care about D at all. This extra fault will land us here again and cause another sfence.vma, and even if the ITLB simultaneously refills the PTE about to be sfence.vma-ed, it will definitely see the A-set PTE from the last go-around. That is, I don't think this makes anything worse.

As to interaction with transparent superpage promotion... an excellent question and thank you for making me think about it in more detail. My reading of the RISC-V spec is at best guesswork, so any or all of the below may be wrong, but, from the latest draft... Not issuing sfence.vma on promotion seems explicitly called out as tolerated in this bit of informative prose.

A consequence of this specification is that an implementation may use any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address.
[...]
In a conventional TLB design, it is possible for multiple entries to match a single address if, for example, a page is upgraded to a superpage without first clearing the original non-leaf PTE’s valid bit and executing an SFENCE.VMA with rs1=x0. In this case, a similar remark applies: it is unpredictable whether the old non-leaf PTE or the new leaf PTE is used, but the behavior is otherwise well defined.

Irritatingly, the spec does not rigorously define what it means for sfence.vma to "subsume [an] address". One presumes that, for the address-carrying sfence.vmas, it's determined by mapping the address to a leaf PTE within the appropriate ASID (or a global one) and then treating all addresses in that PTE's bailiwick as subsumed. The closest the spec comes seems to be...

For the common case that the translation data structures have only been modified for a single address mapping (i.e., one page or superpage), rs1 can specify a virtual address within that mapping to effect a translation fence for that mapping only
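To make the presumed reading of "subsume" concrete, here is a minimal sketch that computes the address range covered by the leaf mapping containing a given virtual address. It encodes only the guess above, not anything the spec states; the names subsumed_range, struct va_range, L2_SHIFT, and L3_SHIFT are hypothetical, and the shifts assume 4K (L3) and 2M (L2) leaves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed leaf sizes: L3 leaf = 4 KiB, L2 leaf = 2 MiB. */
#define L3_SHIFT 12
#define L2_SHIFT 21

/* Half-open virtual address range [start, end). */
struct va_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: the range presumed to be subsumed by an
 * address-carrying sfence.vma, given the size of the leaf that maps va.
 */
struct va_range
subsumed_range(uint64_t va, unsigned shift)
{
	struct va_range r;
	uint64_t size;

	assert(shift == L3_SHIFT || shift == L2_SHIFT);
	size = UINT64_C(1) << shift;
	r.start = va & ~(size - 1);	/* round down to the leaf boundary */
	r.end = r.start + size;
	return (r);
}
```

Under this reading, a fence on any address inside a 2M leaf covers the whole 2M range, while a fence on an address mapped by a 4K leaf covers only that page.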

Thinking about this for quite a while, I think this permits the following pathological behavior: if a run of (suitably aligned) 512 L3 PTEs all have A clear (or have A and W set but D clear), and the L2 PTE is transitioned from non-leaf to leaf, then an access which triggers a fault based on A (or D) could land in pmap_fault, which will set the A (or D) bits for the L2 PTE and issue this sfence.vma with rs1 != x0. The TLB then evicts the L3 PTE in question but does not evict its stale copy of the old, non-leaf L2 PTE. Upon retranslation, it uses the old non-leaf L2 PTE to begin its search and so re-loads the unmodified L3 PTE, and then faults again, forming a tight loop until some broader sfence.vma evicts the old, non-leaf L2 PTE from the TLB.

The good news is that after https://reviews.freebsd.org/D30644 either all 512 L3 PTEs within a promoted L2 PTE will have both W and D set or all will have both clear, so pmap_fault will bail on stores to promoted superpages and this change won't be reached; that is, it doesn't make anything worse. I believe the subsequent trip through vm_fault will land in pmap_enter, which will demote the superpage, install an L3 PTE with W and D both set, and then fail to re-promote the superpage. The sfence.vma issued in response will shoot down the L2 leaf PTE from promotion and/or the old L3 leaf PTE (possibly leaving the old L2 non-leaf PTE, which now again matches the page tables in memory).
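The A-clear fault loop described above can be sketched with a toy model. This is not FreeBSD code and not a model of any real TLB: every name here (toy_pt, toy_tlb, toy_translate, and so on) is hypothetical, and the model simply assumes that an address-carrying fence evicts only the cached leaf while a full fence also evicts the stale non-leaf L2 entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_V	0x001u
#define PTE_R	0x002u
#define PTE_A	0x040u
#define NL3	512

/* In-memory page tables for one 2 MiB region (toy model). */
struct toy_pt {
	bool l2_is_leaf;	/* true once promoted */
	uint32_t l2_leaf;	/* L2 leaf PTE bits, meaningful if l2_is_leaf */
	uint32_t l3[NL3];	/* old L3 leaves, kept around after promotion */
};

/* Toy TLB: may hold a stale non-leaf L2 entry plus one cached leaf. */
struct toy_tlb {
	bool l2_nonleaf_cached;
	bool leaf_cached;
	uint32_t leaf;
};

/* Returns true if translation succeeds, false on an A-clear fault. */
static bool
toy_translate(struct toy_tlb *tlb, const struct toy_pt *pt, unsigned idx)
{
	assert(idx < NL3);
	if (!tlb->leaf_cached) {
		/* A cached non-leaf L2 entry steers the walk to the old L3 table. */
		if (tlb->l2_nonleaf_cached || !pt->l2_is_leaf)
			tlb->leaf = pt->l3[idx];
		else
			tlb->leaf = pt->l2_leaf;
		tlb->leaf_cached = true;
	}
	return ((tlb->leaf & PTE_A) != 0);
}

/* Assumed effect of sfence.vma with rs1 != x0: only the leaf is evicted. */
static void
toy_fence_page(struct toy_tlb *tlb)
{
	tlb->leaf_cached = false;
}

/* Assumed effect of sfence.vma with rs1 == x0: everything is evicted. */
static void
toy_fence_all(struct toy_tlb *tlb)
{
	tlb->leaf_cached = false;
	tlb->l2_nonleaf_cached = false;
}

/* Stand-in for the fault handler: set A on the PTE the tables now name. */
static void
toy_fault_fixup(struct toy_pt *pt, unsigned idx)
{
	if (pt->l2_is_leaf)
		pt->l2_leaf |= PTE_A;
	else
		pt->l3[idx] |= PTE_A;
}

/* Count faults taken before an access succeeds, giving up at limit. */
unsigned
toy_faults_until_success(bool full_fence, unsigned limit)
{
	struct toy_pt pt = { .l2_is_leaf = true, .l2_leaf = PTE_V | PTE_R };
	struct toy_tlb tlb = { .l2_nonleaf_cached = true };
	unsigned faults = 0;

	for (unsigned i = 0; i < NL3; i++)
		pt.l3[i] = PTE_V | PTE_R;	/* every old L3 leaf has A clear */
	while (faults < limit && !toy_translate(&tlb, &pt, 0)) {
		faults++;
		toy_fault_fixup(&pt, 0);
		if (full_fence)
			toy_fence_all(&tlb);
		else
			toy_fence_page(&tlb);
	}
	return (faults);
}
```

In this model, toy_faults_until_success(true, n) takes a single fault, whereas toy_faults_until_success(false, n) keeps faulting until it hits the limit, because the stale non-leaf entry is never evicted.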

Unfortunately, the same cannot be said for faults arising from A being clear. I think the appropriate fix is to have pmap_fault issue sfence_vma() in response to A-clear L2 PTEs, which should be fairly rare. Alternatively, it would suffice to restrict promotion to runs of L3 PTEs that all have A set.
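As a rough illustration of the second alternative, a promotion precondition could look something like the sketch below. It is not taken from the pmap code: l3_run_all_accessed and demo_run_promotable are hypothetical names, and a real check would also have to compare physical addresses and the remaining attribute bits across the run. Only the PTE bit positions (V, R, A) come from the RISC-V privileged spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_V	0x001u
#define PTE_R	0x002u
#define PTE_A	0x040u
#define NL3	512

/*
 * Hypothetical promotion precondition: every L3 PTE in the aligned run
 * must be valid and already have A set.
 */
bool
l3_run_all_accessed(const uint64_t l3[NL3])
{
	for (size_t i = 0; i < NL3; i++) {
		if ((l3[i] & (PTE_V | PTE_A)) != (PTE_V | PTE_A))
			return (false);
	}
	return (true);
}

/*
 * Demo helper: build a run with A set everywhere except at
 * a_clear_index (pass -1 for no exception) and apply the check.
 */
bool
demo_run_promotable(int a_clear_index)
{
	uint64_t run[NL3];

	assert(a_clear_index < NL3);
	for (size_t i = 0; i < NL3; i++)
		run[i] = PTE_V | PTE_R | PTE_A;
	if (a_clear_index >= 0)
		run[a_clear_index] &= ~(uint64_t)PTE_A;
	return (l3_run_all_accessed(run));
}
```

With such a restriction, a promoted L2 leaf would never need its A bit set by pmap_fault, so the A-clear case would not arise for promoted superpages.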