Accessing the TCP hostcache happens frequently on some classes
of servers, so it was recommended to use atomic_add/subtract
rather than per-CPU distributed counters, which have to be
summed up on every read at a high cost to cache efficiency.
This eliminates a counter_u64_fetch() call, introduced by D29510,
from a hot path.
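
The trade-off can be illustrated with a minimal userland sketch. It
is not the kernel code: it uses C11 <stdatomic.h> as a stand-in for
the kernel's atomic(9) and counter(9) primitives, and NCPU,
pcpu_counter, shared_counter and all function names are hypothetical
names chosen for illustration only.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count, for illustration only */

/* One cache line per slot, so that updates from different CPUs do not collide. */
struct pcpu_slot {
	_Alignas(64) uint64_t v;
};

/* Per-CPU style counter: updates are cheap and local to one slot. */
struct pcpu_counter {
	struct pcpu_slot slot[NCPU];
};

static void
pcpu_add(struct pcpu_counter *c, int cpu, uint64_t n)
{
	c->slot[cpu].v += n;
}

/* Reading must visit every slot, touching NCPU cache lines. */
static uint64_t
pcpu_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < NCPU; i++)
		sum += c->slot[i].v;
	return (sum);
}

/* Single shared counter: updates are atomic, a read is one load. */
struct shared_counter {
	atomic_uint v;
};

static void
shared_add(struct shared_counter *c, unsigned int n)
{
	atomic_fetch_add_explicit(&c->v, n, memory_order_relaxed);
}

static void
shared_sub(struct shared_counter *c, unsigned int n)
{
	atomic_fetch_sub_explicit(&c->v, n, memory_order_relaxed);
}

static unsigned int
shared_read(struct shared_counter *c)
{
	return (atomic_load_explicit(&c->v, memory_order_relaxed));
}
```

When the value is read on a hot path and updated comparatively
rarely, the single-load read of the shared counter tends to be the
better fit, which is the reasoning behind this change.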
PR: 254333
MFC after: 2 weeks
Sponsored by: NetApp, Inc.