It broke interrupt routing on some test hardware.

This reverts commit eee6537665ae9830e831d7473dedd6b2cc81c2ea.
This reverts commit 2bb16c6352494bf7aba92be700908ddf01e0c08b.

Sponsored by:	The FreeBSD Foundation
Details
- Reviewers: markj
Event Timeline
It'd be useful to know where to start looking for the root cause. The call chain is nexus_bind_intr() -> _intr_event_bind() -> ie_assign_cpu; I think ie_assign_cpu will be intr_assign_cpu() on amd64, so the problem is likely in msi_assign_cpu()?
eee6537665ae9830e831d7473dedd6b2cc81c2ea and 2bb16c6352494bf7aba92be700908ddf01e0c08b are correct and required; this revert only "works" because it truncates the CPU number to below 256, and adding cpu &= 0xff would "work" equally well.
I'm unsure whether it fits into any existing differential, but a cpu_t type definition was suggested. That didn't fit into D36901, but it does seem worthwhile, and it would likely have caught the breakage found here.
I think we should indeed have a cpu_t definition, but it would not help the issue here.
For context, I have a system with 512 CPUs. sys/dev/nvme/nvme_ctrlr.c does:
	for (i = c = n = 0; i < ctrlr->num_io_queues; i++, c += n) {
		qpair = &ctrlr->ioq[i];

		/*
		 * Admin queue has ID=0. IO queues start at ID=1 -
		 * hence the 'i+1' here.
		 */
		qpair->id = i + 1;
		if (ctrlr->num_io_queues > 1) {
			/* Find number of CPUs served by this queue. */
			for (n = 1; QP(ctrlr, c + n) == i; n++)
				;
			/* Shuffle multiple NVMe devices between CPUs. */
			qpair->cpu = c + (device_get_unit(ctrlr->dev)+n/2) % n;
			qpair->domain = pcpu_find(qpair->cpu)->pc_domain;
		} else {
			qpair->cpu = CPU_FFS(&cpuset_domain[ctrlr->domain]) - 1;
			qpair->domain = ctrlr->domain;
		}

		/*
		 * For I/O queues, use the controller-wide max_xfer_size
		 * calculated in nvme_attach().
		 */
		error = nvme_qpair_construct(qpair, num_entries, num_trackers,
		    ctrlr);
		if (error)
			return (error);

		/*
		 * Do not bother binding interrupts if we only have one I/O
		 * interrupt thread for this controller.
		 */
		if (ctrlr->num_io_queues > 1)
			bus_bind_intr(ctrlr->dev, qpair->res, qpair->cpu);
	}
That is, it has its own logic to choose a CPU for each qpair when there is more than one, and it ends up choosing CPUs with APIC ID > 255 for some of the qpairs. Prior to 2bb16c6352494bf7aba92be700908ddf01e0c08b (or with my cpu &= 0xff hack) we ended up binding to a CPU < 256 despite nvme's intent, and interrupts worked by being handled by a CPU other than the one nvme requested.
With 2bb16c635249 we no longer truncate the value, and we then fail in msi_map() if the APIC ID is > 255 (if we don't have an IOMMU).