Define memmove and make bcopy alt entry point
Make a memmove entry point just before bcopy and have it swap its args
before continuing into the body of bcopy. Adjust the returns to return
dst (original %o0 swapped to %o1) from both entry points. bcopy users
will ignore them. Since these are in the branch delay slot, it should
take no additional time. I use %o6 for this rather than just move %o1
back to %o2 at the end since my sparc64 assembler knowledge is weak.
Also eliminate wrapper call from memmove to bcopy.
Differential Revision: https://reviews.freebsd.org/D15374