Fix copying planar bitmaps when the horizontal start and end are both not
multiples of 8. Then the misaligned pixels at the end were not copied.
Clean up variable misuse related to this bug. The width in bytes was
first calculated correctly and used to do complicated reblocking
correctly, but it was stored in an unrelated scratch variable and later
recalculated with an off-by-1-error, so the last byte (times 4 planes)
in the intermediate copy was not copied.
This doubly-misaligned case is especially slow. Misalignment complicates
the reblocking, and each misaligment requires a read before write, and this
read is still not done from the shadow buffer.