With a dynamic pcpu layout we cannot use startup_alloc() to allocate
per-CPU data. The radix SMR structure was the only dynamically
allocated per-CPU structure that violated this rule. I think it is
preferable to defer creation of SMR structures in any case, since
startup_alloc() does not provide domain-local memory.
Defer initialization of pwd_zone for the same reason.
Permit allocation using uma_zalloc_smr() before the SMR structure
is attached to the zone. This is safe so long as the attach happens
before the APs are started.
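
The ordering constraint on uma_zalloc_smr() can be sketched with a
small userland model. This is only an illustration under stated
assumptions, not the kernel code: every model_* name below is
hypothetical, calloc() stands in for the zone allocator, and the NULL
return for a late allocation without an SMR is a choice made for the
model (the kernel may well assert instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct model_smr {
	unsigned wr_seq;
};

struct model_zone {
	size_t size;
	struct model_smr *smr;	/* NULL until model_zone_set_smr() runs */
};

/* Models "the APs have been started"; false during early boot. */
static bool model_aps_started = false;

/*
 * Allocate from an SMR zone.  A missing SMR is tolerated only while
 * the boot CPU is the sole CPU running.
 */
static void *
model_zalloc_smr(struct model_zone *zone)
{
	if (zone->smr == NULL && model_aps_started)
		return (NULL);	/* model choice: reject the allocation */
	return (calloc(1, zone->size));
}

/* Deferred attach: must happen exactly once, before the APs start. */
static void
model_zone_set_smr(struct model_zone *zone, struct model_smr *smr)
{
	assert(!model_aps_started);
	assert(zone->smr == NULL);
	zone->smr = smr;
}

static void
model_start_aps(void)
{
	model_aps_started = true;
}

/* Walk the boot order described above; returns 1 if every step worked. */
static int
model_boot_sequence(void)
{
	static struct model_smr smr;
	struct model_zone zone = { .size = 64, .smr = NULL };
	void *early, *late;
	int ok;

	model_aps_started = false;
	early = model_zalloc_smr(&zone);	/* before attach: permitted */
	model_zone_set_smr(&zone, &smr);	/* deferred SMR attach */
	model_start_aps();
	late = model_zalloc_smr(&zone);		/* after attach: normal path */

	ok = (early != NULL && late != NULL && zone.smr == &smr);
	free(early);
	free(late);
	return (ok);
}
```

In this model, model_boot_sequence() succeeds because the attach
precedes model_start_aps(). A zone that still has no SMR once the APs
are running is refused by model_zalloc_smr().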