Page MenuHomeFreeBSD

linuxkpi: Fix arch_io_reserve_memtype_wc
AcceptedPublic

Authored by tijl on May 2 2024, 1:38 PM.
Tags
None
Referenced Files
Unknown Object (File)
Nov 27 2024, 4:07 AM
Unknown Object (File)
Oct 9 2024, 1:36 PM
Unknown Object (File)
Oct 9 2024, 1:35 PM
Unknown Object (File)
Oct 9 2024, 1:12 PM
Unknown Object (File)
Sep 27 2024, 2:45 PM
Unknown Object (File)
Sep 24 2024, 3:29 AM
Unknown Object (File)
Sep 22 2024, 7:03 AM
Unknown Object (File)
Sep 21 2024, 3:16 PM

Details

Summary

arch_io_reserve_memtype_wc is implemented using PHYS_TO_DMAP and
pmap_change_attr but not all architectures have a DMAP and this doesn't
match the upstream implementation. This function should register a
given range of IO memory addresses as write-combining for later
mappings, like FreeBSD vm_phys_fictitious_reg_range does, so implement
it as a wrapper around that. For non-x86 it is a no-op.

Include missing headers sys/queue.h from vm/_vm_phys.h for TAILQ, and
sys/param.h from vm/vm_phys.h for NULL so linux/io.h doesn't have to
include them.

Bump __FreeBSD_version for drm-kmod. It has to be patched to remove
calls to vm_phys_fictitious_reg_range that are now done by the drm
drivers calling arch_io_reserve_memtype_wc.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 57541
Build 54429: arc lint + arc unit

Event Timeline

tijl requested review of this revision.May 2 2024, 1:38 PM

The patch for graphics/drm-*-kmod ports is in D45059.
Example use of arch_io_reserve_memtype_wc:
https://github.com/freebsd/drm-kmod/blob/1f0859f54b6f4bcce3862f30c9c1b5cfd08710db/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c#L1048
Note that arch_io_reserve_memtype_wc is only called conditionally here so removing vm_phys_fictitious_reg_range calls like the drm-*-kmod patch does may not be correct. The i915 driver doesn't call arch_io_reserve_memtype_wc at all. It calls io_mapping_init_wc instead, which calls ioremap_wc, which calls pmap_mapdev_attr, so I think it doesn't need to call vm_phys_fictitious_reg_range but I cannot test that.

markj added inline comments.
sys/vm/vm_phys.h
43

You can just include sys/_null.h here then.

This revision is now accepted and ready to land.May 3 2024, 2:08 AM

As I noted, I am not sure that vm_phys_fictitious_reg() KPI is prepared for such use. IMO it needs to be enhanced to check if the registering range is already backed by some vm_page_t storage and not create an overlap. Right now there are only minor piece of such care, for the case where the range intersects with vm_page_array[]. Otherwise the result is UB both for memattr use, and for consequent RBTREE manipulations.

Maybe it would be better to turn arch_io_reserve_memtype_wc into a no-op returning 0 and leave the drm-*-kmod ports unchanged. That would be safer this close to 14.1 release. Before 1e99b2ee9095 the function interpreted a physical address as a virtual address so it must have always failed and after that commit the BSDFIXME comment in the link I posted before suggests it still doesn't work. So whatever this function is supposed to do has not been necessary on FreeBSD. The only benefit of implementing it as a wrapper around vm_phys_fictitious_reg_range is that it makes the FreeBSD specific code in drm-*-kmod smaller, but that's not important. Agreed?

Include missing headers sys/queue.h from vm/_vm_phys.h for TAILQ, and sys/param.h from vm/vm_phys.h for NULL so linux/io.h doesn't have to include them.

I think this part should be committed separately.

arch_io_reserve_memtype_wc is implemented using PHYS_TO_DMAP and pmap_change_attr but not all architectures have a DMAP and this doesn't match the upstream implementation.

Looking at Linux, the old version of this diff actually does partially match the implementation. If the platform doesn't have a direct map, or the PA range isn't backed by RAM, then it does nothing. Otherwise it updates the mapping type of the range in the direct map, presumably to avoid the UB that kib mentions.

I think what this function is supposed to do is check whether the PA range is covered by the vm_page_array and update the direct map if so, otherwise register a fictitious range. Perhaps this is best implemented in the VM system rather than the LinuxKPI?

AFAICS, this function is used in only one place in the AMD drivers, to reserve memory for a frame buffer. I suspect it will never overlap with the vm_page_array[] in its current usage, so the patch is ok for now if we only consider those callers.

Include missing headers sys/queue.h from vm/_vm_phys.h for TAILQ, and sys/param.h from vm/vm_phys.h for NULL so linux/io.h doesn't have to include them.

I think this part should be committed separately.

arch_io_reserve_memtype_wc is implemented using PHYS_TO_DMAP and pmap_change_attr but not all architectures have a DMAP and this doesn't match the upstream implementation.

Looking at Linux, the old version of this diff actually does partially match the implementation. If the platform doesn't have a direct map, or the PA range isn't backed by RAM, then it does nothing. Otherwise it updates the mapping type of the range in the direct map, presumably to avoid the UB that kib mentions.

I think what this function is supposed to do is check whether the PA range is covered by the vm_page_array and update the direct map if so, otherwise register a fictitious range. Perhaps this is best implemented in the VM system rather than the LinuxKPI?

The function should check the intersection with the DMAP, with the vm_page_array, and with the existing registrations. Of course all this must be done in vm_phys_register_range().

AFAICS, this function is used in only one place in the AMD drivers, to reserve memory for a frame buffer. I suspect it will never overlap with the vm_page_array[] in its current usage, so the patch is ok for now if we only consider those callers.

Yes, initially register_range() was created to provide the vm_page_t backing for managed fictitious pages in the i915 aperture. Since then another usages grown, like Xen, and now AMD.

I spent yesterday and today on this but it's taking too much time to familiarise myself with the code so I won't be able to fix this in a timely manner. It's also too risky to make changes to drm-*-kmod this close to 14.1. So I've created D45125 which should be safe to MFC.

Are the following statements correct:

  • Physical RAM may be divided in segments with address ranges that are separated from each other.
  • Fictitious ranges may be registered in between such segments, or before the lowest address or after the highest address.
  • vm_page_array covers physical RAM from lowest address to highest address (minus vm_page_array itself). In the VM_PHYSSEG_DENSE case that means it also covers the address ranges between RAM segments. In the VM_PHYSSEG_SPARSE case it doesn't.
  • Why is rG5ebe728d5300 needed? Overlap at the beginning or the end seems always invalid to me because there's always RAM there.
  • Overlap needs to be tested for using vm_phys_segs and vm_phys_early_segs, not vm_page_array.
  • Overlap with existing registrations seems to be prevented by a panic call in vm_phys_fictitious_cmp.
  • What about overlap with MMIO addresses used by devices?

I spent yesterday and today on this but it's taking too much time to familiarise myself with the code so I won't be able to fix this in a timely manner. It's also too risky to make changes to drm-*-kmod this close to 14.1. So I've created D45125 which should be safe to MFC.

Are the following statements correct:

  • Physical RAM may be divided in segments with address ranges that are separated from each other.

Right

  • Fictitious ranges may be registered in between such segments, or before the lowest address or after the highest address.

Right

  • vm_page_array covers physical RAM from lowest address to highest address (minus vm_page_array itself). In the VM_PHYSSEG_DENSE case that means it also covers the address ranges between RAM segments. In the VM_PHYSSEG_SPARSE case it doesn't.

Strictly the first statement is not true. The vm_page_array[] only covers phys_avail[] chunks of memory, and we carve initial memory allocations from there, reducing sizes of some phys_avail[] elements.

True for the second statement (about VM_PHYSSEG_DENSE/SPARSE).

  • Why is rG5ebe728d5300 needed? Overlap at the beginning or the end seems always invalid to me because there's always RAM there.

This RAM might be used specially by a hypervisor, and the address is mandated by the hypervisor. Or we put vm_page_array[] in a way that extends it into no-RAM region (no idea why).

  • Overlap needs to be tested for using vm_phys_segs and vm_phys_early_segs, not vm_page_array.

No. The final purpose of the vm_phys_fictitious_reg_range() is to make PHYS_TO_VM_PAGE() operational on that range. There you can see in which order which structures are consulted, also dependent on DENSE/SPARSE.

  • Overlap with existing registrations seems to be prevented by a panic call in vm_phys_fictitious_cmp.

When it is called on the overlap, yes. But I am sure this is not guaranteed by the RB_TREE machinery.

  • What about overlap with MMIO addresses used by devices?

I am not sure about you question. But providing the vm_page backing for MMIO (like aperture) was the initial reason for fictitious managed pages and related fictitious range registration.