Page MenuHomeFreeBSD

cp: Add a -S option which preserves holes in sparse files.
AbandonedPublic

Authored by des on Jan 25 2023, 3:36 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 9, 12:47 AM
Unknown Object (File)
Fri, Nov 8, 8:27 PM
Unknown Object (File)
Oct 4 2024, 9:35 AM
Unknown Object (File)
Oct 2 2024, 4:55 PM
Unknown Object (File)
Oct 2 2024, 4:55 PM
Unknown Object (File)
Oct 2 2024, 4:55 PM
Unknown Object (File)
Oct 2 2024, 4:43 PM
Unknown Object (File)
Sep 17 2024, 8:16 PM
Subscribers

Details

Reviewers
mckusick
Group Reviewers
Klara
Summary

MFC after: 1 week
Sponsored by: Klara, Inc.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 49349
Build 46239: arc lint + arc unit

Event Timeline

des requested review of this revision.Jan 25 2023, 3:36 PM

cp already preserves holes in files when it uses copy_file_range(). At least, the generic implementation used by UFS and ZFS does. So do we really need a new option? To give consistent behaviour, it seems like it should be sufficient to look for a hole somewhere in the file and refuse to use mmap() if one is present.

cp already preserves holes in files when it uses copy_file_range(). At least, the generic implementation used by UFS and ZFS does. So do we really need a new option? To give consistent behaviour, it seems like it should be sufficient to look for a hole somewhere in the file and refuse to use mmap() if one is present.

The copy_file_range(2) man page explicitly says it shouldn't be relied upon to preserve holes...

In D38191#868007, @des wrote:

cp already preserves holes in files when it uses copy_file_range(). At least, the generic implementation used by UFS and ZFS does. So do we really need a new option? To give consistent behaviour, it seems like it should be sufficient to look for a hole somewhere in the file and refuse to use mmap() if one is present.

The copy_file_range(2) man page explicitly says it shouldn't be relied upon to preserve holes...

Well, certainly it can't in general since the destination might be on a filesystem that doesn't support them or has a different minimum hole size, but that's true of this implementation as well. Are there some known cases where this implementation preserves holes but copy_file_range() does not? All of your tests pass for me on UFS and ZFS with an unmodified cp(1) if I remove "-S" from the test invocations.

That verbiage in the man page seems similar to that in Linux. Maybe @rmacklem can comment as to whether it applies here?

For local file systems that support SEEK_HOLE/SEEK_DATA,
copy_file_range(2) should retain holes.
It also tries to handle the case where the input file system
does not support SEEK_HOLE/SEEK_DATA by scanning read
blocks to see if they are all 0 bytes and not doing the write
for that case.
(At this time, using the default VOP_COPY_FILE_RANGE(9).)

If file systems implement custom VOP_COPY_FILE_RANGE(9)
VOPs then all bets are off.
For NFS, holes are only retained if the mount is NFSv4.2 and
the server supports the Seek operation and if the source and
destination are on the same mount.
--> Since I'm an NFS guy, that is why the man page says what
it says.

I'll look at the patch later, but I will note that it might be
interesting to benchmark doing small files via mmap(2)
vs copy_file_range(2). It might turn out that mmap(2)
is not faster and, if so, just using copy_file_range(2) for
all copies might make sense?

You might want to add asomers@, since he did the conversion of
cp(1) to use copy_file_range(2).

I'd suggest you test copying of sparse files
via "cp" without your patch and then focus
on any cases where the holes are not retained.

I suspect there will be few of them, unless you
test small files, where "cp" uses mmap(2) instead
of copy_file_range(2), assuming regular files on
file system(s) that support SEEK_DATA/SEEK_HOLE.

This patch seems to assume that copy_file_range(2)
does not use SEEK_DATA/SEEK_HOLE. If you look
at vn_generic_copy_file_range(), you will see that it
does use them for local file systems that support them.

Independently of the matter of sparse files, should we remove the mmap() code entirely? Even in the absence of copy_file_range(), there is no real benefit outside of a narrow range of file sizes — much narrower, I suspect, than the [0, 8MB] range set out in the code.

Moved test cases to D38290.

Abandoning in favor of D38290 and D38291.