Page MenuHomeFreeBSD

diff: Import new diff algorithms from libdiff
AbandonedPublic

Authored by thj on Oct 3 2022, 9:35 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 13, 7:47 PM
Unknown Object (File)
Dec 18 2024, 1:22 AM
Unknown Object (File)
Dec 14 2024, 5:55 AM
Unknown Object (File)
Nov 3 2024, 6:52 PM
Unknown Object (File)
Oct 10 2024, 11:33 PM
Unknown Object (File)
Sep 26 2024, 3:15 PM
Unknown Object (File)
Sep 13 2024, 7:55 AM
Unknown Object (File)
Sep 13 2024, 7:55 AM

Details

Reviewers
bapt
pstef
jhb
Group Reviewers
Klara
Summary

Import and integrate modern diff algorithms from Game of Tree's libdiff.
This adds support for two new diff algorithms, Myers diff and Patience
diff.

These algorithms perform a different form of search compared to the
classic Stone algorithm and support escapes when worst case scenarios
are encountered.

Add the -A flag to allow selection of the algorithm, but default to
using the new Myers diff implementation.

The libdiff implementation currently only supports a subset of input and
output options supported by diff. When these options are used, but the
algorithm is not selected, automatically fallback to the classic Stone
algorithm until support for these modes can be added.

This review is published as a WIP, there are a couple of patches in the libdiff
import that still need to be upstreamed, but I think this is ready for feedback
for the integration of the library.

Test Plan

The new algorithms pass all but one subtest of the existing diff tests. This
one I have forced to stone, I think this is okay as it very recent
functionality.

Due to different diff algorithms generating different, but equivalent diffs we
have to verify functionality by running each an ensuring that applying the
created diff with patch creates the same file.

The following test file runs through commits passed on the command line and
checks that the diff remains valid between unpatched diff, gdiff and the
updated diff. I have used this for testing and it is happy with the most recent
1000 FreeBSD commits.

#!/bin/sh
#
# Test diff command by running through the commits in a git repository that is
# the current working directory. Test is performed by taking each commit and
# checking if it is equivalent (patched file is the same) as installed diff,
# gdiff and the git diff output. test-commit.sh uses $testdiffcmd and tries to
# only log minimal progress and errors.

testdiffcmd=/usr/obj/code/freebsd/worktrees/diff-got/amd64.amd64/usr.bin/diff/diff.full

#
#
# Takes commits to test on the command line
#	$ test-commit.sh 57f7a82fbb4299a255c2c20ec812258004a90631 7d98cc096b995ca3bfd85277ed009b0f872c3e1b ...
#
# failed diffs end up in test-commit.log
#
# To get big lists of commits to test we can use git log and xargs, Run through
# many commits as generated by git in 100 commit chunks increasing -P adds more
# jobs, but there seem to be concurrency issues:
# 	git log -n 1000 --pretty=format:"%H" | xargs -P1 -L 100 -t sh ./test-commit.sh unified
# will run through 1000 commits, in 100 commit chunks, with one thread
# 
# Testing files are named to try and allow concurrent access to the git repo
# for tests.

mode=$1
shift

case $mode in 
"classic")
	diffflags=""
	patchflags="-s -n"
	;;
"ed")
	diffflags="-e"
	patchflags="-e"
	;;
"unified")
	diffflags="-u"
	patchflags="-s -u"
	;;
"func")
	diffflags="-u -p"
	patchflags="-s -u"
	;;
*)
	echo "no valid mode specified"
	echo "usage: test-commit.sh [classic | ed | unified ] hash hash hash ..."
	exit
	;;
esac

commits=$@

if [ -z "$commits" ]
then
	echo no commits specified
	exit
fi

outdir=testdir
logfile=test-commit.log

patchcmd=gpatch

commitcount=0
filecount=0
skipcount=0

for commit in $commits
do
	commitcount=$((commitcount+1))
	prevcommit=$(git log -n 1 --pretty=format:"%H" $commit^1)
	files=$(git diff --name-only $prevcommit $commit)

	if [ -z $prevcommit ]
	then
		echo "no previous commits for $commit"
		continue
	fi
	printf "$commitcount:  $commit -> $prevcommit\n\t"

	for f in $files
	do
		printf "."
		filecount=$((filecount+1))
		filename=$(basename $f)

		afile=${outdir}/${filename}-${prevcommit}-a
		bfile=${outdir}/${filename}-${commit}-b

		gitdiff=${outdir}/gitdiff-${prevcommit}-${commit}
		gdiff=${outdir}/gdiff-${prevcommit}-${commit}
		bsddiff=${outdir}/bsddiff-${prevcommit}-${commit}
		testdiff=${outdir}/testdiff-${prevcommit}-${commit}

		gitpatch=${outdir}/gitpatch-${prevcommit}-${commit}
		gdiffpatch=${outdir}/gdiffpatch-${prevcommit}-${commit}
		bsdpatch=${outdir}/bsdpatch-${prevcommit}-${commit}
		testdiffpatch=${outdir}/testdiffpatch-${prevcommit}-${commit}

		git --no-pager show ${prevcommit}:${f} 2>/dev/null > $afile
		if [ $? -ne 0 ]
		then
			#echo "no a file skipping $f"
			skipcount=$((skipcount+1))
			continue
		fi

		git --no-pager show ${commit}:${f} 2>/dev/null > $bfile
		if [ $? -ne 0 ]
		then
			#echo "no b file skipping $f"
			skipcount=$((skipcount+1))
			continue
		fi

		cp ${afile} ${gitpatch}
		cp ${afile} ${gdiffpatch}
		cp ${afile} ${bsdpatch}

		git diff ${prevcommit}:${f} ${commit}:${f} > ${gitdiff}
		gdiff ${diffflags} $afile $bfile > ${gdiff}
		diff ${diffflags} $afile $bfile > ${bsddiff}

		# Check that the git patch creates b file
		$patchcmd -s -u ${gitpatch} ${gitdiff}

		if [ $? -gt 1 ]
		then
			echo "patch failed skipping $f"
			skipcount=$((skipcount+1))
			continue
		fi

		cmp ${bfile} ${gitpatch}

		# Check that the bsddiff (stone) patch creates b file
		$patchcmd ${patchflags} ${bsdpatch} ${bsddiff}
		cmp ${bfile} ${bsdpatch}

		if [ $? -ne 0 ]
		then
			# if the first test failed, this might be a binary. If
			# it is skip this file
			if ! file --mime $f | grep -q "text"
			then
				file --mime $f | grep -q "text"
				echo "binary file skipping $f"
				skipcount=$((skipcount+1))
				continue
			fi
		fi

		# Check that the gdiff patch creates b file
		$patchcmd ${patchflags} ${gdiffpatch} ${gdiff}
		cmp ${bfile} ${gdiffpatch}

#diffstat < ${gitdiff}
#wc -l ${afile} ${bfile}

		if [ ! -z $testdiffcmd ]
		then
#echo "running: $testdiffcmd ${diffflags} $afile $bfile > ${testdiff}"
			$testdiffcmd -A myers ${diffflags} $afile $bfile > ${testdiff}

			cp ${afile} ${testdiffpatch}

			$patchcmd ${patchflags} ${testdiffpatch} ${testdiff}
			cmp ${bfile} ${testdiffpatch}

			if [ $? -ne 0 ]
			then
				echo failed $prevcommit $commit >> ${logfile}
				echo $(tput bold) test diff failed $(tput sgr0)
				echo diff ${diffflags} ${afile} ${bfile}
				echo "processed $filecount files from $commitcount commits (skipped $skipcount)"
				exit 255
			fi
		fi
	done
	printf "\n"
done

#echo "Sucessfully diffed $filecount files from $commitcount commits (skipped $skipcount)"
#echo "$commits" | wc -l

#commitbacklog=100
#commits=$(git log -n ${commitbacklog} --pretty=format:"%H")
# rm log.out; time git log -n 1000 --pretty=format:"%H" | xargs -P4 -L 100 -t sh ./test-commit.sh

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 47762
Build 44649: arc lint + arc unit

Event Timeline

thj requested review of this revision.Oct 3 2022, 9:35 AM

When ready to commit is the intent that libdiff will go via contrib?

  • move libdiff into contrib
usr.bin/diff/diffreg.c
253

What's this?

Would it be possible to omit the compat subdirectory from the import?

What's preventing us from landing this?

usr.bin/diff/diffreg_new.c
142

Our diffreg() expects diffreg_new() to return D_SAME, D_DIFFER, D_BINARY or D_ERROR. However in several places we give rc the return value from a library function (e.g. diff_atomize_file) which returns 0 or an error code, then goto done which eventually returns rc unmodified. This means that unless an error occurs we almost always return 0 from diffreg_new() regardless of whether the files differed. The net result is that diff -s foo bar, where foo and bar are not identical, will correctly show the differences between foo and bar, but will then incorrectly add Files foo and bar are identical.

216

rc needs to be D_ERROR here.

219

rc needs to be D_ERROR here.

260

rc needs to be D_DIFFER here.