The comparison size of 40 bytes is known at compile time, but clang still does not optimize the call, so hand-roll a short variant.
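For illustration only, a fixed-size comparison of this kind could look roughly like the sketch below. It is not the code from this change: the name key40_differs and the KEY_SIZE/KEY_WORDS macros are hypothetical, and it assumes the 40 bytes can be treated as five 64-bit words. Like bcmp, it returns 0 when the buffers match and non-zero otherwise.

```c
#include <stdint.h>
#include <string.h>

#define KEY_SIZE	40				/* bytes compared, per the text above */
#define KEY_WORDS	(KEY_SIZE / sizeof(uint64_t))	/* five 64-bit words */

/*
 * Hypothetical helper: compare two 40-byte buffers word by word.
 * Returns 0 if they are equal and 1 if they differ.
 */
static inline int
key40_differs(const void *a, const void *b)
{
	uint64_t wa[KEY_WORDS], wb[KEY_WORDS];

	/* Fixed-size copies avoid unaligned or aliasing-unsafe loads. */
	memcpy(wa, a, KEY_SIZE);
	memcpy(wb, b, KEY_SIZE);

	/* Unrolled by hand; bail out on the first mismatching word. */
	if (wa[0] != wb[0])
		return (1);
	if (wa[1] != wb[1])
		return (1);
	if (wa[2] != wb[2])
		return (1);
	if (wa[3] != wb[3])
		return (1);
	if (wa[4] != wb[4])
		return (1);
	return (0);
}
```

The early returns mean a mismatch in the first word costs a single compare, which is the case a generic out-of-line bcmp call cannot exploit as cheaply.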
Benchmarked on a kernel with other changes on top of it. Interestingly, this fails to provide an overall speedup; instead it shifts CPU time elsewhere.
before: https://people.freebsd.org/~mjg/pf_nohashrow.svg
after: https://people.freebsd.org/~mjg/pf_nohashrow_custom_bcmp.svg
Time spent in netisr_dispatch -> ip_input -> ip_tryforward drops from 12.19% to 10.40% of the total.