amd64: compare TLB shootdown target to all_cpus
On amd64, the pmap code passes all_cpus to
smp_targeted_tlb_shootdown() when unmapping from the
kernel pmap. This function has an optimized path to send IPIs
to all but itself, which it intends to do when the target
is all cpus. However, we need to compare the target cpu mask
with all_cpus, rather than using CPU_ISFULLSET(). Comparing with
CPU_ISFULLSET() will only work when we have MAXCPU cpus active in
the system, otherwise, we'll be sending repeated IPIs, rather than
a single IPI to all CPUs but ourself.
Fixing this should reduce the time spent in native_lapic_ipi_wait()
as we will be sending ipis in parallel, rather than one-by-one.
This is confirmed by dtrace.
Reviewed by: alc, jhb, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D28102