We use pmap_invalidate_cpu_mask() to get the set of active CPUs. This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.
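A minimal userland sketch of the copying described above. It assumes a 256-CPU set stored as four 64-bit words (32 bytes). The names demo_cpuset_t, middle_by_value() and leaf_by_value() are hypothetical stand-ins, not the kernel's cpuset_t API. The sketch only shows that every by-value parameter is a fresh 32-byte copy, independent of the caller's set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 256-CPU cpuset_t: 4 x 64-bit words = 32 bytes. */
typedef struct {
    uint64_t bits[4];
} demo_cpuset_t;

/* Innermost frame: receives its own 32-byte copy; clearing it leaves the caller's set intact. */
static size_t leaf_by_value(demo_cpuset_t set)
{
    set.bits[0] = 0;
    return sizeof(set);
}

/* Intermediate frame: copies the set once more when it calls the leaf. */
static size_t middle_by_value(demo_cpuset_t set)
{
    return leaf_by_value(set);
}
```

Two call levels mean two 32-byte copies on top of the caller's original, which is the overhead the change removes.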
Avoid this copying by having pmap_invalidate_*() make a local copy of
the active CPU set and pass it by reference. Also use the
non-destructive CPU_FOREACH_ISSET() to avoid unneeded copying
within smp_targeted_tlb_shootdown().
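A rough sketch of the new shape of the code, under stated assumptions. The caller takes one local snapshot of the active set and passes its address. The callee walks the set without modifying it. A destructive loop that clears each bit as it goes would need its own scratch copy. All names here (demo_cpuset_t, DEMO_FOREACH_ISSET, demo_invalidate(), demo_shootdown()) are hypothetical. The macro is a simple bit-by-bit scan in the spirit of CPU_FOREACH_ISSET(), not its actual definition.

```c
#include <stdint.h>

#define DEMO_NCPU      256
#define DEMO_WORD_BITS 64

/* Hypothetical stand-in for cpuset_t: 256 bits = 32 bytes. */
typedef struct {
    uint64_t bits[DEMO_NCPU / DEMO_WORD_BITS];
} demo_cpuset_t;

/* Test one bit in place through a const pointer. */
static int demo_isset(int cpu, const demo_cpuset_t *set)
{
    return (int)((set->bits[cpu / DEMO_WORD_BITS] >> (cpu % DEMO_WORD_BITS)) & 1u);
}

/*
 * Non-destructive iteration: each bit is only read, never cleared,
 * so no scratch copy of the set is needed.
 */
#define DEMO_FOREACH_ISSET(cpu, setp)                \
    for ((cpu) = 0; (cpu) < DEMO_NCPU; (cpu)++)      \
        if (demo_isset((cpu), (setp)))

/* Hypothetical shootdown helper: takes the set by reference. */
static int demo_shootdown(const demo_cpuset_t *mask, int *targets, int max)
{
    int cpu, n = 0;

    DEMO_FOREACH_ISSET(cpu, mask) {
        if (n < max)
            targets[n++] = cpu;  /* stand-in for sending an IPI to cpu */
    }
    return n;
}

/* Hypothetical caller: one local copy of the active set, then pass its address. */
static int demo_invalidate(const demo_cpuset_t *active, int *targets, int max)
{
    demo_cpuset_t local = *active;  /* the only copy made on this path */

    return demo_shootdown(&local, targets, max);
}
```

With this layout the 32-byte set is copied exactly once, in demo_invalidate(), and every deeper frame only receives an 8-byte pointer.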
No functional change intended.