Page MenuHomeFreeBSD

memdesc: Helper function to construct mbuf chain backed by memdesc buffer
ClosedPublic

Authored by jhb on Dec 6 2023, 10:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Oct 22, 5:58 PM
Unknown Object (File)
Tue, Oct 22, 9:53 AM
Unknown Object (File)
Tue, Oct 22, 9:52 AM
Unknown Object (File)
Tue, Oct 22, 9:52 AM
Unknown Object (File)
Tue, Oct 22, 9:52 AM
Unknown Object (File)
Tue, Oct 22, 9:42 AM
Unknown Object (File)
Mon, Oct 21, 5:11 PM
Unknown Object (File)
Mon, Oct 21, 4:55 PM
Subscribers

Details

Summary

memdesc_alloc_ext_mbufs constructs a chain of external (M_EXT or
M_EXTPG) mbufs backed by a data buffer described by a memory
descriptor.

Since memory descriptors are not an actual buffer just a description
of the buffer, the caller is required to supply a couple of helper
routines to manage allocation of the raw mbufs and associating them
with a reference to the underlying buffer.

Sponsored by: Chelsio Communications

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 55141
Build 52030: arc lint + arc unit

Event Timeline

jhb requested review of this revision.Dec 6 2023, 10:51 PM
sys/kern/subr_memdesc.c
43

This comment is in the wrong place.

191

This is a duplicate of the comment added above.

394

I'd assert len == 0 || (pa & PAGE_MASK) == 0 here.

409

I can't see why this assertion is true. What if paddr_ext_mbuf() asks you to put MBUF_PEXT_MAX_PGS + 1 pages of data into an mbuf? The caller expects you to return MBUF_PEXT_MAX_PGS * PAGE_SIZE, at which point it'll allocate the next mbuf in the chain and keep going.

It looks like the check should be len > 0 && len < PAGE_SIZE && m->m_epg_npgs < MBUF_PEXT_MAX_PGS.

432

I don't really understand the rationale for this behaviour. It'd make more sense to truncate only if we'd otherwise have to allocate another mbuf.

444

Is it sufficient to simply check m->m_epg_npgs == MBUF_PEXT_MAX_PGS? I can see an argument for doing it the way you did, just double checking my understanding.

453

Perhaps also assert that appended <= len.

705
709
727
788

Or KASSERT(can_truncate || done == len, ...); to avoid an empty statement when INVARIANTS is not defined.

sys/sys/memdesc.h
172
175
186

Why have a callback at all then? What's an example scenario where you want to customize the allocation function?

sys/kern/subr_memdesc.c
43

I was trying to describe the group of functions as these are all the helpers leading up to memdesc_copyback. Perhaps the comment could be improved in some way? I was wanting to demarcate different parts of the file by the public functions and their associated helpers.

191

It's a bit different (copydata vs copyback and the source and dest buffers are flipped as a result).

409

Yes, I think the check is wrong.

432

The intention of the "can_truncate" flag is that the caller knows that there is indeed additional data (for another mbuf chain) and would like to just send the page in that following mbuf chain rather than splitting it across two chains.

The specific use case is in NVMe over TCP where a data transfer of a buffer can be split into one or more C2H (controller to host, for things like a READ) or H2C (host to controller, like a WRITE) PDUs. The two sides negotiate a maximum size for H2C PDUs and if a data transfer for an individual command is larger than that limit it is split up into multiple PDUs. The "can_truncate" case is used when a request has hit the limit so we know there's another PDU to send after this one and want to avoid splitting page across transactions. When it is the last PDU in the transfer we have to be exact and send the partial page at the end however.

444

Yes, in this case that is true as the first partial page will always go into m and all values of pa after the first call to append_paddr_range should be page-aligned.

sys/sys/memdesc.h
186

The to store a reference on the underlying memory object.

https://github.com/bsdjhb/freebsd/blob/nvmf2/sys/dev/nvmf/nvmf_tcp.c#L1085

is a set of callbacks (along with the internal ext_free routines used to drop a reference).

They are used by the function defined here:

https://github.com/bsdjhb/freebsd/blob/nvmf2/sys/dev/nvmf/nvmf_tcp.c#L1137

to build an mbuf chain for a subset of a command buffer (the logical data buffer) associated with an NVMe command transfer.

You can see an example of how a transfer can be split into multiple PDUs here:

https://github.com/bsdjhb/freebsd/blob/nvmf2/sys/dev/nvmf/nvmf_tcp.c#L1236

186

s/The to store/To store/

sys/kern/subr_memdesc.c
409

Actually I think it's right and I meant to delete my comment before submitting. :)

The m->m_epg_npgs < MBUF_PEXT_MAX_PGS check will catch the case I'm talking about.

jhb marked 9 inline comments as done.Dec 27 2023, 6:53 PM
jhb added inline comments.
sys/kern/subr_memdesc.c
705

This was copied from m_copym and should perhaps be fixed there as well?

788

I actually added an assertion for done <= len for the can_truncate case instead.

add some assertions and fix comments

markj added inline comments.
sys/kern/subr_memdesc.c
688

mbuf_subseg or _subchain would be a bit more accurate. This function is returning a contiguous subregion of the input chain, and "subset" sounds a bit strange to me for that purpose.

This revision is now accepted and ready to land.Jan 3 2024, 3:31 PM