Page MenuHomeFreeBSD

iflib: Allow drivers to determine which queue to TX on
ClosedPublic

Authored by erj on Aug 9 2021, 10:06 PM.
Tags
None
Referenced Files
F108310065: D31485.id101873.diff
Thu, Jan 23, 6:36 PM
Unknown Object (File)
Sat, Jan 18, 10:10 PM
Unknown Object (File)
Fri, Jan 17, 11:39 PM
Unknown Object (File)
Sat, Jan 11, 7:03 PM
Unknown Object (File)
Fri, Jan 10, 3:55 AM
Unknown Object (File)
Thu, Jan 9, 7:20 PM
Unknown Object (File)
Thu, Jan 9, 9:27 AM
Unknown Object (File)
Wed, Jan 1, 1:21 AM

Details

Summary

Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.

(This motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 40954
Build 37843: arc lint + arc unit

Event Timeline

erj held this revision as a draft.
erj published this revision for review.Dec 3 2021, 10:51 PM
kbowling added a subscriber: markj.
kbowling added inline comments.
sys/net/iflib.h
407

I guess we could bump the build version instead (including in the stable branches), that seems to be how openzfs does checks. @markj any advice here?

This revision is now accepted and ready to land.Dec 10 2021, 12:53 AM
sys/net/iflib.h
407

You mean __FreeBSD_version? Yeah, I'd be inclined to go with that approach: the kernel ABI is changing here, so we'd want to bump it to ensure that modules compiled against the old iflib are rejected. iflib.h can define symbolic feature names in terms of __FreeBSD_version to make drivers a bit neater.

sys/net/iflib.h
407

So, then, something like bumping the _FreeBSD_version along with replacing the line above like:

#define IFLIB_FEATURE_QUEUE_SELECT <the new __FreeBSD_version above>

?

Though, part of the problem with this patch is that I'm not entirely sure with what I want the parameters of ift_txq_select to be.

Right now it takes the mbuf pointer which is enough for what we want to do with it in an upcoming ice(4) patch, but in the future I'll need to use DSCP/IP ToS information in that function as well. I think I should be able to put that information inside the mbuf along with something like an M_DSCP flag. Though, one version I had before also had the if_pkt_info_t struct as well as having the headers (partially) parsed before this queue selection function was called in order to avoid having to modify mbufs like that.

sys/net/iflib.h
407

Sure, and drivers could write

#if __FreeBSD_version >= IFLIB_FEATURE_QUEUE_SELECT
<tx queue selection code>
#endif

It'd be nice to have another iflib.h macro to wrap the comparison, so you can write

#if IFLIB_HAS_FEATURE(QUEUE_SELECT)
<tx queue selection code>
#endif

but of course drivers can't rely on IFLIB_HAS_FEATURE being defined.

Bump __FreeBSD_version and define IFLIB_FEATURE_QUEUE_SELECT to that new version

This revision now requires review to proceed.Dec 11 2021, 12:21 AM
erj marked 2 inline comments as done.Dec 11 2021, 12:22 AM
This revision is now accepted and ready to land.Dec 14 2021, 5:51 PM
In D31485#754650, @erj wrote:

Though, part of the problem with this patch is that I'm not entirely sure with what I want the parameters of ift_txq_select to be.

Right now it takes the mbuf pointer which is enough for what we want to do with it in an upcoming ice(4) patch, but in the future I'll need to use DSCP/IP ToS information in that function as well. I think I should be able to put that information inside the mbuf along with something like an M_DSCP flag. Though, one version I had before also had the if_pkt_info_t struct as well as having the headers (partially) parsed before this queue selection function was called in order to avoid having to modify mbufs like that.

Either seems reasonable to me. I would bias toward including the if_pkt_info since it has fields for offload handling but I don't know much about other expected use cases, maybe @jhb has some thoughts from crypto and tcp offloads.

Either seems reasonable to me. I would bias toward including the if_pkt_info since it has fields for offload handling but I don't know much about other expected use cases, maybe @jhb has some thoughts from crypto and tcp offloads.

Yeah, I think his input would be good. Maybe it's worth discussing what I want to do in the future here, too.

I looked through more of the kernel to try to find a good approach, but they all seem like they could be problematic, performance-wise:

  • I thought maybe storing the DSCP value in the mbuf would work, but the first problem is that we seem to be out of M_* flags; there are no spare flags left, and all of the remaining M_PROTO flags are used up by ieee80211 it seems. It looks like some of the code that stores VLAN PCP information uses mbuf tags, so that could be an option, but I'd have to look at every place in the code that sets the IP ToS/DSCP information and wrap that with a function that creates an mbuf tag (and similarly to the vlan_mtag_pcp stuff here, probably enable/disable that with a sysctl). Though, my gut feeling is that this seems like it could have a large negative impact on performance.
  • Instead, dealing with extracting the DSCP value from the packet inside iflib or the driver would be less invasive, but like I said in my previous comment, the thread that initially queues the packet onto a per-queue thread would have to do more work to parse headers to get that value. Though, that may be partially mitigated by the fact that we only need to peer into the outermost L3 header, which may not be too bad and might not require an m_pullup() in most cases. But that m_pullup cruft still needs to be called in there to make sure we can read at least the minimal IPv4/IPv6 header.
    • We could maybe avoid having to have the if_pkt_info_t in the function parameter list by allocating an mbuf tag or using the M_PROTO flags and empty PH_loc space to indicate a valid DSCP value exists. But, there might be more information in there that we could use in the future that might not fit into an mbuf. e.g. maybe something ADQ-like where we check flows and map the flows to one of the 16k Tx queues in ice(4) hardware in the future. I wanted to avoid adding a new parameter because we're already releasing an ice(4) driver that expects the function signature used in this patch, and it wouldn't be feasible to change it. But if we absolutely have to, then it's better to do it early I suppose.

And I think both of these options have performance impacts, but I'm not sure how to quantify them properly. For now, I'm working on patches for both approaches, and I already have a draft of the latter.

sys/net/iflib.c
4113

What's the cost these days on common platforms of following a function pointer vs an if/then/else check? I'm wondering if it might be just as fast to put the default iflib queue selection into a a default function for isc_txq_select()

sys/net/iflib.c
4113

I hope someone else can answer your question. I considered using just a function pointer, but my gut feeling was that an if-else statement was probably faster and less disruptive.

I wish someone would benchmark the difference, but I'd also believe the performance difference is small enough that people wouldn't notice the difference between the two approaches. I don't really know how to approach a benchmark for that either -- something like netmap's pkt-gen, but one that goes through this function? (Does it do that already? I didn't think it did)

gallatin added inline comments.
sys/net/iflib.c
4113

@olivier still has his benchmarking setup, and has volunteered to test this. If there is a measurable regression, do you mind implementing the alternate patch based on just function pointers for him to test as well?

Everyone here (at work) is mostly going to be on vacation for the next couple weeks, so we probably won't do much with this patch until people (and me) come back.

sys/net/iflib.c
4113

Nope, I don't mind.

We're back, and I think I want to get this committed soon because we're about ready to start adding reviews for the ice(4) update that uses this, as well as upstreaming the FreeBSD RDMA driver. Maybe I'll have to worry about the DSCP changes after this patch.

If no one has any objections, then I'll update this patch with a new FreeBSD version number, upload that here, and commit it. Then, I can submit a follow-on ice(4) update that uses it.

Rebase on latest main; change version number to latest

This revision now requires review to proceed.Jan 25 2022, 1:43 AM
This revision was not accepted when it landed; it landed in state Needs Review.Jan 25 2022, 2:25 AM
This revision was automatically updated to reflect the committed changes.