Page MenuHomeFreeBSD

Makefile.inc1: Make reproducible release tarballs
Needs ReviewPublic

Authored by guest-patmaddox on Dec 9 2024, 10:47 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 27, 2:33 PM
Unknown Object (File)
Thu, Dec 26, 10:43 PM
Unknown Object (File)
Thu, Dec 26, 9:29 PM
Unknown Object (File)
Thu, Dec 26, 8:34 PM
Unknown Object (File)
Sun, Dec 22, 5:51 PM
Unknown Object (File)
Thu, Dec 19, 7:02 PM
Unknown Object (File)
Sat, Dec 14, 7:04 PM
Unknown Object (File)
Sat, Dec 14, 6:43 PM

Details

Reviewers
dch
Group Reviewers
releng
docs
Summary

moved to https://github.com/freebsd/freebsd-src/pull/1550


When WITH_REPRODUCIBLE_BUILD is on, use mtree to pin the file times to PKG_TIMESTAMP.
mtree lists files in lexicographical order, removing the non-determinism from tar -cf .

newvers.sh: set explicit --short length, which is otherwise a configurable value

PR: 283214


status: in development
base: c6e56e65

notes

make -j $(sysctl -n hw.cpu) packagesystem is currently non-deterministic for base.txz

dev work

TODO:

  • DONE verify schg
  • DONE do not apply file time when WITHOUT_REPRODUCIBLE_BUILD
  • DONE apply for all dist files when run as root
  • DONE document
  • move mtree generation from package* to distribute*

TRY:

  • fix -j non-determinism

DEFER:

  • consolidate NO_ROOT
Test Plan
  • (TODO) enumerate build variations to test
  • compare mtrees of tarballs produced by this patch to those without it, to check that it only changes the file times and order within the tarball
  • check that it does not apply file time when WITHOUT_REPRODUCIBLE_BUILD

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

guest-patmaddox edited the test plan for this revision. (Show Details)

Looks like a generally viable solution to me.

Please upload the diff with full context (-U99999).

I guess for the NO_ROOT case we should either append time= when generating METALOG entries or process the file to add it.

Makefile.inc1
606

Awk should be able to handle any comments with another match line.

1575

Looks like a debugging leftover?

1981–1982

I'm tempted to drop the ability to specify a source directory since it seems fragile and instead generate an mtree 2.0 file (which requires a pipe through mtree -C) like is used in the NO_ROOT case.

Makefile.inc1
606

It will... I just wasn't sure how to do it without tracking state. Something like:

awk 'do_print=1; /^\#/ { do_print = 0 }; ...; do_print { print }'

maybe that's just fine though. I'll condense it to just awk the best I can. Let me know if that basic idea looks good, or you have something else.

1575

yep... will remove when patch is finalized

1981–1982

I like that, will update.

Makefile.inc1
606

I had to refresh my memory, but you can use next:

printf "#foo\nbar\n" | awk '/^\#/ {print; next}  /./ {print "xxx" $0}'

produces

#foo
xxxbar
guest-patmaddox marked an inline comment as done and an inline comment as not done.
guest-patmaddox edited the summary of this revision. (Show Details)
  • fix mtree link entry
  • verify schg
  • do not apply file time when WITHOUT_REPRODUCIBLE_BUILD
  • apply for all dist files when run as root

with this diff, base.txz and kernel.txz are deterministic

mtree -R time -f before/kernel.mtree -f after/kernel.mtree lists no difference; same for base

Updated release(7). I'd like feedback on content before I get into the nitty-gritty of formatting.

share/man/man7/release.7
287–288
guest-patmaddox edited the summary of this revision. (Show Details)
  • add reproducible.sh to demonstrate usage
  • edit release(7) language

Added some notes to potential reviewers. Done for now, until I get feedback.

Also, I am new to contributing to FreeBSD. If there are ways to improve how I'm going about proposing this contribution, please let me know!

share/man/man7/release.7
286

@doc I welcome feedback on the content, before I format it.

287–288

@releng I now realize I am making quite a claim here... one that I don't have the authority to back up :)

My proposal with this change is that releng would build with a PKG_TIMESTAMP of their choosing, and publish that along with the REVISION in release notes. Anyone could then verify the build by running it with the same info.

If that's too much, I can back this off to simply state that it's possible to make reproducible builds, and not mention official releases.

Well, I sure made a major miscalculation... buildworld buildkernel has to be reproducible first :facepalm:

sorry to disturb folks, clearly I got carried away. But - I know there has been a lot of effort towards reproducible builds before. So if you have any suggestions, or ways that I may be able to help, please let me know.

buildworld buildkernel has to be reproducible first :facepalm:

The src.conf knob WITH_REPRODUCIBLE_BUILD is set by default on releases but not on main. With it off the only non-reproducibility should be metadata like build time, user, and host inserted into the kernel and boot loaders. If anything is not reproducible with WITH_REPRODUCIBLE_BUILD set that's a bug to be fixed. We had a couple of compiler-induced non-reproducibility issues recently fixed.

Update reproducible.sh to build a clean tree and env

Makefile.inc1
1564

If we keep doing this and don't just force the NO_ROOT path, I'd suggest hoisting the mtree commands up to this point an a .else branch and unconditionally creating ${dist}.meta and ${dist}.debug.meta as part of distributeworld. We could then eliminate the difference between the NO_ROOT and non-NO_ROOT paths.

1590

This is the only place where NO_ROOT and non-NO_ROOT differ (the L flag). I tried to figure out where it came from and it arrived without explanation in the original commit. I think it's most likely an error that slipped through review.

Makefile.inc1
606

I'm not a fan of implicitly using . here. Can we split out the mtree and awk parts so the path argument can be explicitly given at the uses?

Makefile.inc1
606

@brooks had the opposite preference: https://reviews.freebsd.org/D48010/new/#1094353

I prefer it together as well because it ends up as a single ${MTREE_DIST} to produce the mtree file

Makefile.inc1
606

Well, I particularly don't like hiding pipes, it looks like ${MTREE_DIST} foo bar should pass the arguments to mtree, but that's not true. Really though we should just be unconditionally creating the mtree files in distributeworld rather than reconstructing them in packageworld, and doing so with the right timestamps.

Makefile.inc1
1590

Yes, for reference the commit is 2d0bcb76c8530. It looks like I sent a draft "first attempt at debug.txz" as a start to this work. I reported having not tried even building with the patch, which was:

 diff --git a/Makefile.inc1 b/Makefile.inc1
index d65dd31..f9512ab 100644
--- a/Makefile.inc1
+++ b/Makefile.inc1
@@ -834,12 +834,31 @@ packageworld:
 .if defined(NO_ROOT)
        ${_+_}cd ${DESTDIR}/${DISTDIR}/${dist}; \
            tar cvJf ${DESTDIR}/${DISTDIR}/${dist}.txz \
+           --exclude usr/lib/debug \
            @${DESTDIR}/${DISTDIR}/${dist}.meta
 .else
        ${_+_}cd ${DESTDIR}/${DISTDIR}/${dist}; \
-           tar cvJf ${DESTDIR}/${DISTDIR}/${dist}.txz .
+           tar cvJf ${DESTDIR}/${DISTDIR}/${dist}.txz \
+           --exclude usr/lib/debug .
+.endif
+.endfor
+.if ${MK_DEBUG_FILES} != "no"
+       ${_+_}:>  ${DESTDIR}/${DISTDIR}/debug.tar
+.for dist in base ${EXTRA_DISTRIBUTIONS}
+.if defined(NO_ROOT)
+       ${_+_}cd ${DESTDIR}/${DISTDIR}/${dist}; \
+           tar rvf ${DESTDIR}/${DISTDIR}/debug.tar \
+           --include usr/lib/debug \
+           @${DESTDIR}/${DISTDIR}/${dist}.meta
+.else
+       ${_+_}cd ${DESTDIR}/${DISTDIR}/${dist}; \
+           tar rvf ${DESTDIR}/${DISTDIR}/debug.tar \
+           --include usr/lib/debug .
 .endif
 .endfor
+       ${_+_}xz ${DESTDIR}/${DISTDIR}/debug.tar
+.endif
+

 #
 # reinstall

The patch that got committed had tar cvJfL (which doesn't work), changed to tar cvJLf in 3b66656a204dc8b167c72863efb75b53a5707001.

So I concur, this was probably just an error that slipped through review. I don't see off hand why it would make sense to pass -L for the creating the debug tarball.

guest-patmaddox edited the summary of this revision. (Show Details)

Remove additional sources of non-determinism:

  • mtree -R flags: ZFS sets uarch for some files, resulting in a different tarball than one built on UFS
  • exclude .bak and .out files left behind with high -j value
  • newvers.sh: set explicit --short length, which is otherwise a configurable value

This patch produces identical MANIFEST on:

  • 14.2 ZFS
  • 14.2 UFS
  • 14.1 UFS
  • 13.4 UFS
Makefile.inc1
606

I think unconditional creation in distributeworld is the right way ago. Even if we decide to reconstruct them in the non-NO_ROOT case, we could just open code most of this since we'd only be writing it once.

(FWIW, what I didn't like about the . being exposed is it would be fragile to changing the command arguments because it depends -p being last.)

Makefile.inc1
607

You can't just unilaterally remove flags. Otherwise you lose flags=schg on various key system files. -DNO_ROOT is really the best way to get deterministic output, because all the metadata comes from the build system itself, not whatever ended up on disk and affected by the environment.

Makefile.inc1
606

we should just be unconditionally creating the mtree files in distributeworld rather than reconstructing them in packageworld, and doing so with the right timestamps.

Agreed

607

Addition of -R flags is because ZFS systems set uarch flag on a number of files, whereas UFS does not. I don't think this is the right place for the fix - it should be in distribute*.

1578

Similar to -R flags, excluding *.bak and *.old should not happen here.

Should they even be necessary at all? I only got these extra files with -j20 buildworld buildkernel. Seems like some cleanup may be needed before building mtree.

1579

One consequence of piping into ${XZ_CMD} is that if there's an issue tarring the mtree, then it produces an incomplete file and doesn't fail.

Is there a way to keep the pipe, but fail if there's an error on the ${TAR_CMD}? Or would it be okay to ${TAR_CMD} -f ${PACKAGEDIR}/${dist}-dbg.txz and bypass the pipe altogether? I don't know if someone out there is relying on the separation.

Makefile.inc1
607

You're right, I had misunderstood mtree -i.

Moving mtree into distribute* and consolidating -DNO_ROOT is next up.

Makefile.inc1
1579

The use of XZ_CMD was added in bd9cab6fb43aa and is probably obsolete. We might want --options xz:threads=<some value> to the tar command instead.

sys/conf/newvers.sh
258 ↗(On Diff #147892)

This change is good and independent of the rest. Would you care to open a pull request and I'll pick this up?

Makefile.inc1
607

D48041 is a first step in always using -DNO_ROOT to build release artifacts to facilitate consolidation.

sys/conf/newvers.sh
258 ↗(On Diff #147892)

Thanks, merged.

I'm moving this work to https://github.com/freebsd/freebsd-src/pull/1550 where I'm more comfortable working with small commits, and it seems like GitHub is preferred for outside contributors.