Page MenuHomeFreeBSD

TLB flush enlightment using Hyper-V hypercall
ClosedPublic

Authored by schakrabarti_microsoft.com on May 13 2024, 10:13 AM.
Tags
None
Referenced Files
F107879390: D45174.diff
Sat, Jan 18, 11:05 PM
F107857535: D45174.id139504.diff
Sat, Jan 18, 6:41 PM
Unknown Object (File)
Thu, Jan 16, 8:07 PM
Unknown Object (File)
Thu, Jan 9, 6:55 PM
Unknown Object (File)
Mon, Jan 6, 9:53 AM
Unknown Object (File)
Sun, Jan 5, 4:38 PM
Unknown Object (File)
Fri, Jan 3, 9:57 PM
Unknown Object (File)
Thu, Jan 2, 6:33 PM
Subscribers

Details

Summary

Currently FreeBSD uses IPI based TLB flushing for remote
TLB flushing. Hyper-V allows hypercalls to flush local and
remote TLB. The use of Hyper-V hypercalls gives significant
performance improvement in TLB operations.

This patch set during test has shown near to 40 percent
performance improvement.

Also this patch adds rep hypercall implementation as well.

Authored-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>

Signed-off-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>

Test Plan

Have been tested in multiple memory allocation,
deallocation tests.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Updated vmbus_var.h to remove the unused variable.

The patch adds a macro in kernel.mk NO_WNO_GNU_VARIABLE_SIZED_TYPE_NOT_AT_END, to add an exception for compilation of hyperv_mmu.c c in conf/files.x86 .
This is to address the compiler warning for multiple fam attributes in hv_tlb_flush_ex structure. We are working with Hyper-V team on changing the structure definition for the
hypercall HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX and corresponding protocol. Once we reach the agreement, it could be changed on both Linux and FreeBSD guest.

sys/modules/hyperv/vmbus/Makefile
16 ↗(On Diff #138465)

So this bug was not fixed, despite discussion on the lists.

sys/modules/hyperv/vmbus/Makefile
16 ↗(On Diff #138465)

We are working with Hyper-V team on changing the structure definition for the
hypercall HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX and corresponding protocol and internal handling in Hyper-V.
Once we reach the agreement, it could be changed on both Linux and FreeBSD guest.

sys/modules/hyperv/vmbus/Makefile
16 ↗(On Diff #138465)

Also, this is not a bug, but specific to particular compiler. As we defined same structure in Linux and it works without adding any additional flag. There is no warning during compiling time on Linux. I think FreeBSD uses clang which is stricter than the compilers other OSes use. Second, we have expressed the concerns we got from FreeBSD community about using this structure to the team who defined HyperV protocol in Microsoft. It is possible they may change it in the future protocol. For the time being to improve the tlb flush performance, we decide to still use this structure for the current patch.

sys/amd64/amd64/mp_machdep.c
588–589

I suggest you to convert this function ifunc and provide a complete separate implementation for hyperv.

Not last because now kernel cannot be compiled without hyperv support.

Respectfully, this code is non-standards-conforming, since the standard doesn't allow multiple variable sized elements in a structure.

C23, section 6.7.2.1 paragraph 20 (final ballot draft, nearly identical language going back to C99):

As a special case, the last member of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.

Here, the last two elements have an incomplete array type, though with nesting. That's not well defined in the standard. As an extension, gcc allows this specific construct, but that's a compiler-specific extension, not well formed, standards compliant code. One can quibble of what to call that thing, but I bristle over characterizing it as "not a bug" because it happens to be OK on Linux.

sys/dev/hyperv/vmbus/hyperv_mmu.c
242 ↗(On Diff #138465)

You could just as easily recast this as &flush->hv_vp_set.bank_contents[nr_bank], since that's where it's going to go and eliminate the 'offset' argument. That would allow you to replace the not-well-defined-in-C structure with something that's semantically the same (since you are doing the calculation above to find the maximum number you can put there).

sys/modules/hyperv/vmbus/Makefile
16 ↗(On Diff #138465)

Linux's gcc has this bug (or rather extension) that allows this non-standards-conforming construct to have a well-defined meaning. When using non-standards-conforming constructs, that's a bug, regardless of whether or not other compilers don't complain. The fact there's no warning on Linux carries no weight for me in this argument.

Changed the structure to remove extra FAM attribute.

sys/amd64/amd64/mp_machdep.c
516

The move of the enum to pmap.h should be a separate change.

588–589

Still not fixed. Now I think ifunc would not help. Instead, you need e.g. a pointer to system smp_targeted_tlb_shootdown function, inited by our mp_machdep.c for normal IPIs, and set to HyperV method on HyperV. As is this cannot be merged, unconditional dependency of kernel on hyperv code is not a way forward.

sys/conf/kern.mk
18 ↗(On Diff #138630)

Stray change

  • Merge branch 'freebsd:main' into review_branch
  • added some changes
  • refactored with pointer
  • pointer for the function

added changes for function pointer and contigmalloc

  • Merge branch 'freebsd:main' into review_branch
    • added some changes
    • refactored with pointer
    • pointer for the function
sys/amd64/amd64/mp_machdep.c
588–589

I have tried with function pointer, but the system is hitting panic:
I have commented hyper-v side of the pointer assignment but still the same panic is happening. Which makes me wonder if the smp_targeted_tlb_shootdown
should be initialized much before of pmap_pinit0().

Please let me know, where should it be initialized.

The trace :
Loading kernel...
/boot/kernel/kernel text=0x182e30 text=0xd7d038 text=0x42dcfc data=0x180+0xe80 data=0x196430+0x469bd0 0x8+0x199b60+0x8+0x1c3a97/
Loading configured modules...
/boot/kernel/mlx4en.ko size 0x25b68 at 0x215e000
loading required module 'mlx4'
/boot/kernel/mlx4.ko size 0x65258 at 0x2184000
/etc/hostid size=0x25
/boot/entropy size=0x1000

Booting [/boot/kernel/kernel]...
staging 0x36c00000 (not copying) tramp 0x36a97000 PT4 0x36a8e000
Start @ 0xffffffff80383000 ...
EFI framebuffer information:
addr, size 0x40000000, 0x800000
dimensions 1024 x 768
stride 1024
masks 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read instruction, page not resent
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xffffffff8199aa08
frame pointer = 0x28:0xffffffff8199aa50
code segment = base 0x0, limit 0xfffff, type 0x1b

= DPL 0, pres 1, long 1, def32 0, gran 1

processor eflags = resume, IOPL = 0
current process = 0 ()
rdi: ffffffff8195e640 rsi: ffffffff82a10000 rdx: ffffffff82a11000
rcx: ffffffff81019ec0 r8: 0000000000000000 r9: 0000000000000000
rax: ffffffff8195e640 rbx: 0000000000000014 rbp: ffffffff8199aa50
r10: 800000003fffa163 r11: 000000003fffa000 r12: 0000000000001000
r3: ffffffff82a10000 r14: ffffffff82a11000 r15: 0000000000000006
trap mber = 12
panic: page fault
cpuid = 0
time = 1
KDB: stack backtrace:
Uptime: 1s
Rebooting...

sys/amd64/amd64/mp_machdep.c
170

Stray change

589

Why do you call it legacy? It is anything but legacy.

sys/amd64/amd64/pmap.c
4531 ↗(On Diff #138773)

Yes, you need to statically initialize it instead of trying to do it there. Or at least it should go into pmap_bootstrap().

sys/x86/include/x86_smp.h
85 ↗(On Diff #138773)

Why do you need this double definition and protection?

schakrabarti_microsoft.com added inline comments.
sys/amd64/amd64/pmap.c
4531 ↗(On Diff #138773)

Thank you, static initialization has fixed the issue.

sys/x86/include/x86_smp.h
85 ↗(On Diff #138773)

we need the smp_invl_cb_t typedef defined in x86_smp.h in pmap.h. But including x86_smp.h in pmap.h is causing compilation problem:

/datadrive/sandbox/src/sys/dev/acpica/acpi_pxm.c:56:15: error: redefinition of 'cpu_info'                                                                                                                                                                      56 | static struct cpu_info {                                                                                                                                                                                                                                  |               ^                                                                                                                                                                                                                                     ./x86/x86_smp.h:55:8: note: previous definition is here
   55 | struct cpu_info {
      |

Doing forward declaration the same in pmap.h without protection, is causing double definition for some
file compilation. So added this protection. I will try to figure out some cleaner way in next version patch.

  • fixed smp tb shootdown function pointer
  • fixed double declaration and cleanup
  • fixed indentation

The core changes look fine to me. I have no idea about hyper-v part.

sys/vm/pmap.h
190

Indent should be +4 spaces, same as for smp_innvl_local_cb_t line

193

No need to split after 'void', better fill more of the previous line

This revision is now accepted and ready to land.May 21 2024, 5:56 PM

When committing, please push the core changes alone. Then do the hyper-v part as a separate commit.

This revision now requires review to proceed.May 29 2024, 6:00 AM
This revision was not accepted when it landed; it landed in state Needs Review.Jun 5 2024, 12:39 PM
This revision was automatically updated to reflect the committed changes.