Page MenuHomeFreeBSD

makefs: Add ZFS support
ClosedPublic

Authored by markj on May 18 2022, 6:58 PM.
Tags
None
Referenced Files
F102854007: D35248.id106145.diff
Mon, Nov 18, 12:50 AM
Unknown Object (File)
Sun, Nov 3, 4:24 AM
Unknown Object (File)
Tue, Oct 22, 2:42 PM
Unknown Object (File)
Sun, Oct 20, 2:02 AM
Unknown Object (File)
Sat, Oct 19, 8:12 PM
Unknown Object (File)
Oct 10 2024, 1:07 PM
Unknown Object (File)
Oct 10 2024, 11:16 AM
Unknown Object (File)
Oct 10 2024, 11:15 AM

Details

Summary

Note: this is a work in progress, there are some minor code issues still left to address

This allows one to create a ZFS pool out of a directory tree. It's
intended to be used to build VM images so that users can take advantage
of the features of ZFS as a root filesystem.

The goals of the implementation are to:

  1. provide reproducible images
  2. not require any special privileges to create the image
  3. be reasonably fast

The implementation makes use of some ZFS code from the loader,
specifically a small nvlist implementation, fletcher4 and SHA
implementations for checksums, and zfsimpl.h. There is no dependency on
OpenZFS for now, so this will hopefully be easy to maintain.

Test Plan

regression tests, using VM images created by this code

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Warnings
SeverityLocationCodeMessage
Warningusr.sbin/makefs/zfs.c:1676SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs.c:1685SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs.c:1688SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs.c:1690SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs.c:1691SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs.c:1692SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs.c:1692SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs.c:1693SPELL1Possible Spelling Mistake
Warningusr.sbin/makefs/zfs/zfsimpl.h:1723SPELL1Possible Spelling Mistake
Unit
No Test Coverage
Build Status
Buildable 45654
Build 42542: arc lint + arc unit

Event Timeline

markj requested review of this revision.May 18 2022, 6:58 PM
markj edited the test plan for this revision. (Show Details)
markj added a subscriber: emaste.

Shouldn't you reach over into the boot loader to get the shared files?
If there's some reason you can't do this, it should be documented and/or discussed.

In D35248#798943, @imp wrote:

Shouldn't you reach over into the boot loader to get the shared files?
If there's some reason you can't do this, it should be documented and/or discussed.

Yes, there's no real reason not to. I just did copied them because I had to modify some of the shared files and wanted to keep things self-contained.

I also wonder if there's precedent for sharing code with the loader like that. Should it perhaps be moved to some common area, or should I just set .PATH=stand/libsa/zfs and not worry about it?

Drop a few spurious diffs.

We build with LOCAL_XTOOL_DIRS="lib/libnetbsd usr.sbin/makefs usr.bin/mkimg" so makefs(+mkimg) can be used on non-FreeBSD to build disk images. Do you know if this code builds fine on Linux and macOS? Having it turned off for defined(BOOTSTRAPPING) && ${.MAKE.OS} != "FreeBSD" would be fine if not and it's awkward to fix.

We build with LOCAL_XTOOL_DIRS="lib/libnetbsd usr.sbin/makefs usr.bin/mkimg" so makefs(+mkimg) can be used on non-FreeBSD to build disk images. Do you know if this code builds fine on Linux and macOS? Having it turned off for defined(BOOTSTRAPPING) && ${.MAKE.OS} != "FreeBSD" would be fine if not and it's awkward to fix.

I have no idea if it'd work. I can try it and see. If not it should be easy to exclude it, and this reminds me that I have to do that anyway if WITHOUT_ZFS is set.

should I just set .PATH=stand/libsa/zfs and not worry about it?

I think this is fine, at least initially.

usr.sbin/makefs/Makefile
51

can drop the trailing \

usr.sbin/makefs/makefs.c
272

Heh,

Supercede has occurred as a spelling variant of supersede since the 17th century, and it is common in current published writing. It continues, however, to be widely regarded as an error.

https://www.merriam-webster.com/dictionary/supercede

usr.sbin/makefs/tests/Makefile
16

huh, what do the existing tests do

usr.sbin/makefs/tests/makefs_zfs_tests.sh
174–175

Interesting point - ideally we'd be able to generate sparse files within a makefs-created image.

usr.sbin/makefs/zfs.c
1632

(sigh)

I also wonder if there's precedent for sharing code with the loader like that. Should it perhaps be moved to some common area, or should I just set .PATH=stand/libsa/zfs and not worry about it?

We don't have a current plan for this. We reach over to the first place that had it. But for things like this, maybe we need a new top-level common directory?

In D35248#798943, @imp wrote:

Shouldn't you reach over into the boot loader to get the shared files?
If there's some reason you can't do this, it should be documented and/or discussed.

Yes, there's no real reason not to. I just did copied them because I had to modify some of the shared files and wanted to keep things self-contained.

I also wonder if there's precedent for sharing code with the loader like that. Should it perhaps be moved to some common area, or should I just set .PATH=stand/libsa/zfs and not worry about it?

In the past some of the code that was shared between the loaders and/or other parts not from upstream lived under sys/cddl/boot/zfs

usr.sbin/makefs/zfs.c
748

when ZFS grows the pool, it will add more metaslabs of this same size.

So we want to keep the number of slabs small, so that growing the pool to 100x the size doesn't result in 16,000 metaslabs.

The only downside is, any slack that doesn't fit into a metaslab will be unused, so if we just set a metaslab size of 4G, then we'd only have 8G usable from a 10G image.

Although without knowing how large they are going to grow it, it can be difficult to guess what number is right.

In D35248#798943, @imp wrote:

Shouldn't you reach over into the boot loader to get the shared files?
If there's some reason you can't do this, it should be documented and/or discussed.

Yes, there's no real reason not to. I just did copied them because I had to modify some of the shared files and wanted to keep things self-contained.

I also wonder if there's precedent for sharing code with the loader like that. Should it perhaps be moved to some common area, or should I just set .PATH=stand/libsa/zfs and not worry about it?

In the past some of the code that was shared between the loaders and/or other parts not from upstream lived under sys/cddl/boot/zfs

I'll be using some of the stuff under sys/cddl/boot/zfs, but the nvlist implementation is BSD-licensed and so does not belong there. For now I'll just keep it in stand/libsa.

usr.sbin/makefs/tests/Makefile
16

It was just something I did early on to make it easier to run tests, where I had a bunch of tests that didn't require any privileges (since I hadn't yet gotten to the point of successfully importing the generated pool). Dropped now, I just forgot to revert that bit.

usr.sbin/makefs/tests/makefs_zfs_tests.sh
174–175

Yeah, that's a future TODO. For now I could just have the tests write data from /dev/random or something. Maybe that'd be a bit too slow under qemu though, not sure yet.

usr.sbin/makefs/zfs.c
748

Ahh, I thought that ZFS would potentially resize the metaslabs themselves. Apparently not. I'm not sure how best to handle this, given that OpenZFS itself doesn't seem to bother (at least based on the end of the comment in vdev_metaslab_set_size()).

It's not really possible to know which metaslab size is the "right" one, though. The tradeoffs involved also depend on whether certain optional features are enabled, like the log spacemap and spacemap histograms, I think. Right now, makefs' choice is too closely tied to the initial image size and will give excessively small metaslabs for most purposes. If our chosen metaslab size is large, we can also adjust the output image size to compensate based on the -m and -M parameters, so unused space isn't a major problem.

I'm inclined to keep it simple and do the following:

  • let the metaslab size be 10% of the image size, up to 512MB
  • add a -o msshift option to let the user override the heuristic

That'll give basically the same size as OpenZFS for images expanded to up to 100GB. Beyond that, the answer might just be to create a separate pool if you really need a larger metaslab size, or somehow teach OpenZFS to resize metaslabs if it can determine that it'd be profitable to do so.

markj marked an inline comment as done.
  • Change the metaslab sizing heuristic to use larger metaslabs
  • Add a -o mssize option to provide user control over the metaslab size
  • Share libsa's nvlist implementation rather than copying it
  • Handle WITHOUT_ZFS in the makefs build
  • Fix file ACL definitions to properly respect the input file mode
  • Define a guid for DSL datasets, previously it was always just zero
  • Give the root and disk vdevs different GUIDs, otherwise autoexpand doesn't work
  • Add a few test cases, including one for autoexpand, clean up some existing ones
gbe added a subscriber: gbe.

LGTM for the man page part.

Manual page English looks good to me. Can't attest to consistency with source code.

usr.sbin/makefs/makefs.c
272

Either way, "superscedes" (both s and c) is a misspelling. (I'd quibble with MW's "common in current published writing". It's attested, but much less than "supersede". But I'm wandering.)

This revision is now accepted and ready to land.May 21 2022, 4:15 PM
markj marked an inline comment as done.
  • Rebase.
  • Share zfsimpl.h with the boot loader rather than copying it.
  • Split zfs.c into multiple files, extend various interfaces so that structure definitions (e.g., objsets, ZAPs, can be private to a single file).
  • Man page updates.
  • Allocate dnodes lazily rather than requiring the consumer to determine how many are needed up front when creating an object set.
This revision now requires review to proceed.May 26 2022, 3:45 PM

I don't plan to make any further functional changes at the moment, so if anyone is interested in reviewing the implementation, it should be safe to look.

I don't plan to make any further functional changes at the moment, so if anyone is interested in reviewing the implementation, it should be safe to look.

Search box (top right) hints ZFS may be a good group reviewer to add.

usr.sbin/makefs/Makefile
22

We might want to update the WITHOUT_ZFS description also - it includes "libraries and user commands" now, but doesn't reference ZFS support in tools that are not exclusively ZFS.

Although now that I look at it, stand doesn't use MK_ZFS at all right now, but probably should? A separate issue in any case.

usr.sbin/makefs/Makefile
22

In most of the Makefiles, it's spelled MK_LOADER_ZFS because it's traditionally been decoupled from the host for a number of reasons, not least in that not all loaders support ZFS. There's not a good, general mechanism for a lot of these loader variables. I did a lot of cleanup when I imported lua (I retired a lot of accumulated technical debt, but we have a long ways to go still). Detangling it would be more than just including MK_ZFS somewhere, I'd think.

usr.sbin/makefs/Makefile
22

Hmm, maybe this should be spelled MK_CDDL (as in libprocstat) instead. The makefs code itself is BSD-licensed but makes use of the CDDL-licensed zfsimpl.h.

usr.sbin/makefs/Makefile
22

Downstream we changed that to MK_ZFS so we could turn it off for CHERI as ZFS has not been ported, and MK_ZFS makes more sense to me there. ZFS is CDDL-licensed but gets its own knob, so why should using bits of ZFS elsewhere not use the same knob just because they're CDDL-licensed? Turning off CDDL implies turning off (LOADER_)ZFS anyway.

usr.sbin/makefs/Makefile
22

That makes sense to me. All of the other WITHOUT_CDDL uses are related to DTrace.

src.conf.5 update for WITHOUT_ZFS: https://reviews.freebsd.org/D35337

If there's no objection to changing the option for libprocstat, I can do that.

usr.sbin/makefs/Makefile
22

Ah, I see. It seems the same could apply here since we can have ZFS support in makefs regardless of whether we have kernel / userland support, but not really a practical concern.

usr.sbin/makefs/tests/makefs_zfs_tests.sh
33

Curious about this - just to avoid issues from simultaneous test execution? (Is this one case where $$ might actually be a good option?)

markj marked an inline comment as done.Jun 1 2022, 2:03 PM
markj added inline comments.
usr.sbin/makefs/tests/makefs_zfs_tests.sh
33

Yeah, $$ could be used here.

The randomized name ensures that tests can be executed simultaneously. I was also a bit paranoid about colliding with an existing pool, even though it's already quite unlikely that anyone has a pool called "makefstest"...

This revision was not accepted when it landed; it landed in state Needs Review.Aug 5 2022, 5:44 PM
This revision was landed with ongoing or failed builds.
Closed by commit rG240afd8c1fcc: makefs: Add ZFS support (authored by markj). · Explain Why
This revision was automatically updated to reflect the committed changes.
markj marked an inline comment as done.