Page MenuHomeFreeBSD

Netdump: coalesce writes less than buffer size for more efficient use of network
ClosedPublic

Authored by sam_samgwydir.com on May 19 2019, 4:38 PM.
Tags
None
Referenced Files
F108274675: D20317.diff
Thu, Jan 23, 9:31 AM
F108212951: D20317.id57563.diff
Wed, Jan 22, 5:26 PM
F108212456: D20317.id58118.diff
Wed, Jan 22, 5:21 PM
Unknown Object (File)
Sat, Jan 18, 10:16 PM
Unknown Object (File)
Sat, Jan 18, 9:55 PM
Unknown Object (File)
Sat, Jan 18, 5:22 PM
Unknown Object (File)
Sat, Jan 18, 5:20 PM
Unknown Object (File)
Sat, Jan 18, 5:12 PM
Subscribers

Details

Summary

The netdump-client currently utilizes a 64KB buffer to buffer pages while sending packets to the server. When writing out 64K blocks this buffer is fully utilized, but dump writes only come in 64K chunks when dumping the page directories, all other writes are in chunks of 4K. This patch buffers (at most) 16 of these blocks before sending for a slight performance improvement especially for smaller dump sizes (where the leader and headers are a larger portion of the dump).

This patch won't have a large impact on high memory systems because pages are already dumped in chunks of 64K, further work is needed to add a larger buffer and sliding window to the UDP stack in netdump_send() to see a larger improvement in those cases.

Test Plan

I have tested this patch using a bhyve VM as the netdump client, and the vm host as the server.

VM w/ 2GB of RAM (50.83% improvement)
Without: 3.2839 Gbps (2008 MB in 4.8917 seconds)
With: 4.953 Gbps (2008 MB in 3.243 seconds)

VM w/ 8GB of RAM (15.96% improvement)
Without the patch: 7.247707 Gbps (8153 MB in 8.999260 s)
With: of 8.4043 Gbps (8153 MB in 7.760788 s)

I still need to test with real hardware, especially on 10G and 100G cards

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

I think this is probably fine - we are lying to the upper layers about having successfully written a block when in fact we're only caching it, but that shouldn't cause problems. You might consider doing this in the minidump code instead, since the coalescing will help disk dumps as well. But I don't mind having this be netdump-specific for now.

When uploading patches by hand it helps a lot if you use -U9999 as suggested in https://wiki.freebsd.org/Phabricator . If you use the command-line arc client this will happen automatically.

sys/netinet/netdump/netdump_client.c
117

I think these can be made local to netdump_dumper(). Or, define a context structure containing these fields and initialize dumper.priv.

953

This file is pretty consistent about omitting braces around single statements, so I prefer to keep that style.

The changes look straightforward and correct to me, modulo stylistic nits. I agree with everything Mark has already said.

sys/netinet/netdump/netdump_client.c
118

This might be better named something related to "size" or "length" rather than offset, since it seems to track the amount of buffered data.

Also, buffer sizes are canonically size_t rather than off_t. off_t is fine (and correct) for nd_tx_offset.

sam_samgwydir.com marked 3 inline comments as done.

Thanks! Some more personal preference stuff, Mark can veto me :-).

sys/netinet/netdump/netdump.h
81 ↗(On Diff #57636)

This can just be private to the c file.

sys/netinet/netdump/netdump_client.c
946

Cast from void* is redundant in C

1424

If this dumper is unregistered, does something free this? Sorry, I’m on my phone and it’s difficult to check.

This revision is now accepted and ready to land.May 21 2019, 4:44 PM
sys/netinet/netdump/netdump_client.c
946

Also the declaration should be at the beginning of the function.

969

Indentation should be by four spaces.

1418

You can add M_ZERO to the malloc() flags instead.

1424

Hmm, no, this will be leaked. We could perhaps avoid the problem by adding the fields to nd_conf instead.

sys/netinet/netdump/netdump_client.c
1424

I'd prefer non-global netdump configuration at some point (something like this use of priv) but it doesn't have to be in this change.

sys/netinet/netdump/netdump_client.c
1424

Right, but nd_conf would become non-global at that point too, no?

sys/netinet/netdump/netdump_client.c
1424

Yep.

sam_samgwydir.com marked 5 inline comments as done.

I put the transfer state variables under nd_conf but kept the nd_ prefix to differentiate them from config.

This revision now requires review to proceed.May 22 2019, 4:25 PM

Stylistic nits aside, the change LGTM.

sys/netinet/netdump/netdump_client.c
133

I'd suggest a comment separating these members from the configuration state too. Can be a one-liner, "runtime state" or something seems fine.

969

It looks like we lost 4-space line wrap indentation.

This revision is now accepted and ready to land.May 22 2019, 5:01 PM

+1 to Conrad's comments.

sys/netinet/netdump/netdump_client.c
952

Indentation is wrong here too.

Followed style(9), added comment.

This revision now requires review to proceed.May 23 2019, 12:33 AM
This revision is now accepted and ready to land.May 23 2019, 12:43 AM

I was just about to commit this, but then I noticed that we are assuming writes from the upper layer are contiguous. I think that's always true today, but we would get silent corruption if it stopped being true. So, before appending to the buffer, we should also check that nd_tx_off + nd_buf_len == offset.

I was just about to commit this, but then I noticed that we are assuming writes from the upper layer are contiguous. I think that's always true today, but we would get silent corruption if it stopped being true. So, before appending to the buffer, we should also check that nd_tx_off + nd_buf_len == offset.

Didn't we recently change the dump APIs to be streaming-only anyway (dump_append; there is no dump_write at offset anymore)? It seems like maybe a safe assumption.

In D20317#440195, @cem wrote:

I was just about to commit this, but then I noticed that we are assuming writes from the upper layer are contiguous. I think that's always true today, but we would get silent corruption if it stopped being true. So, before appending to the buffer, we should also check that nd_tx_off + nd_buf_len == offset.

Didn't we recently change the dump APIs to be streaming-only anyway (dump_append; there is no dump_write at offset anymore)? It seems like maybe a safe assumption.

That's true, but the compression and encryption code are their own upper layer and specify an offset with each write. At least, I think we should assert the condition above as an invariant.

In D20317#440195, @cem wrote:

I was just about to commit this, but then I noticed that we are assuming writes from the upper layer are contiguous. I think that's always true today, but we would get silent corruption if it stopped being true. So, before appending to the buffer, we should also check that nd_tx_off + nd_buf_len == offset.

Didn't we recently change the dump APIs to be streaming-only anyway (dump_append; there is no dump_write at offset anymore)? It seems like maybe a safe assumption.

That's true, but the compression and encryption code are their own upper layer and specify an offset with each write. At least, I think we should assert the condition above as an invariant.

Does a kassert fit well during panic dump? I’m fine with asserting contiguous writes in some way, ie print and return with error?

Long term it might make sense to push offset tracking down a layer, if it isn’t too inconvenient for the disk dumper.

In D20317#440220, @cem wrote:
In D20317#440195, @cem wrote:

I was just about to commit this, but then I noticed that we are assuming writes from the upper layer are contiguous. I think that's always true today, but we would get silent corruption if it stopped being true. So, before appending to the buffer, we should also check that nd_tx_off + nd_buf_len == offset.

Didn't we recently change the dump APIs to be streaming-only anyway (dump_append; there is no dump_write at offset anymore)? It seems like maybe a safe assumption.

That's true, but the compression and encryption code are their own upper layer and specify an offset with each write. At least, I think we should assert the condition above as an invariant.

Does a kassert fit well during panic dump? I’m fine with asserting contiguous writes in some way, ie print and return with error?

I mean, it's not much more work to just handle non-contiguous writes: just add a check to the beginning of the function:

if (virtual == NULL || (nd_conf.nd_buf_len != 0 && nd_conf.nd_tx_off + nd_conf.nd_buf_len != offset))
    <flush buffer>

To add an assert or error check you'd need to write the same predicate anyway.

Long term it might make sense to push offset tracking down a layer, if it isn’t too inconvenient for the disk dumper.

Could be, though then we'd get some duplication. I do think the MD kernel dump code should be doing this buffering in the first place.

I mean, it's not much more work to just handle non-contiguous writes: just add a check to the beginning of the function:

if (virtual == NULL || (nd_conf.nd_buf_len != 0 && nd_conf.nd_tx_off + nd_conf.nd_buf_len != offset))
    <flush buffer>

To add an assert or error check you'd need to write the same predicate anyway.

Sorry, I didn't realize netdump_send could express non-contiguous offsets. I guess that makes sense. Does netdumpd on the other end tolerate them? :-)

Long term it might make sense to push offset tracking down a layer, if it isn’t too inconvenient for the disk dumper.

Could be, though then we'd get some duplication. I do think the MD kernel dump code should be doing this buffering in the first place.

Yeah, that's true. Buffering up to MAXPHYS or DFLTPHYS (or some other way of detecting dumper iosize) could be beneficial on conventional disk media as well.

In D20317#440267, @cem wrote:

I mean, it's not much more work to just handle non-contiguous writes: just add a check to the beginning of the function:

if (virtual == NULL || (nd_conf.nd_buf_len != 0 && nd_conf.nd_tx_off + nd_conf.nd_buf_len != offset))
    <flush buffer>

To add an assert or error check you'd need to write the same predicate anyway.

Sorry, I didn't realize netdump_send could express non-contiguous offsets. I guess that makes sense. Does netdumpd on the other end tolerate them? :-)

Yep, it has to because we're using UDP and packets can be reordered.

Try this.

I felt it was clearer if the buffer flushing due to non-contiguous writes and a leftover unfilled buffer were separate.

I added (offset != 0) to the predicate to ignore the final len=0, offset=0 write.

This revision now requires review to proceed.May 30 2019, 7:59 PM

I added (offset != 0) to the predicate to ignore the final len=0, offset=0 write.

This could be avoided by moving the non-contiguous write flush below the if (virtual == NULL) { check, which is also true of the final len=0,offset=0 "flush" pseudo-write.

I felt it was clearer if the buffer flushing due to non-contiguous writes and a leftover unfilled buffer were separate.

I'd prefer to just merge the cases and flush buffer in one location. An if statement can have an OR clause between the two reasons. If it isn't immediately obvious from the code, a comment can be added explaining the two distinct conditions check for. If we extend the existing case at line ~980, the offset != 0 condition is not necessary.

sys/netinet/netdump/netdump_client.c
951

I would leave these out of the final commit — too noisy. Fine for testing.

Moved the non-contiguous write to the other other flush buffer case -- it's cleaner that way.

The (virtual == NULL) case still needs special handling because it cannot return the error where it would in the other case.

Missed an "if" -- this should be it.

This revision is now accepted and ready to land.May 30 2019, 8:34 PM
This revision was automatically updated to reflect the committed changes.