AudioBuffer, processBlock, alignment and SIMD

I was looking through this and related threads. Apart from the fact that AudioBuffer only aligns to the sample type, its alignment is pretty much irrelevant for processBlock anyway, where the data comes preallocated from the host. I don't see how this could be fixed either in JUCE or in the plugin. Even if the first channel of the first bus were 16-byte aligned, every other channel could only be guaranteed to be aligned to the sample type. Copying the whole buffer into per-channel 16-byte aligned buffers is pointless: reading from and writing back to the host buffer still takes the same number of unaligned loads and stores, and the aligned accesses on the copies come on top of that.

So it seems the best you can do is a scalar prologue that brings the first channel of the first bus up to alignment, and hope the host sends block sizes that are a multiple of 16 bytes (4 samples for float). Then the other channels, if the host lays them out contiguously after the first, end up aligned as well. I'd guess most hosts do that on ARM, considering how expensive unaligned access seems to be there…
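To make the prologue idea concrete, here is a minimal sketch for a single channel pointer (for example, what `getWritePointer()` hands you inside processBlock). It runs a scalar prologue up to the next 16-byte boundary, an aligned SSE body, and a scalar epilogue for the remainder. The helper names (`samplesUntilAligned`, `applyGain`, `allChannelsAligned`) are hypothetical, the gain is only a stand-in for real processing, and the sketch assumes 128-bit SSE on x86. Other targets (including ARM/NEON) just fall back to the scalar loop here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64)
  #include <xmmintrin.h>
  #define HAVE_SSE 1
#else
  #define HAVE_SSE 0   // assumption: no SIMD path on other targets in this sketch
#endif

constexpr std::size_t kAlign = 16;                      // bytes per 128-bit register
constexpr std::size_t kLanes = kAlign / sizeof(float);  // 4 floats per register

// Hypothetical helper: number of leading samples to handle with scalar code
// before ptr reaches a 16-byte boundary. Returns numSamples if the block is
// too short, or if ptr is not even float-aligned (alignment is unreachable).
inline std::size_t samplesUntilAligned(const float* ptr, std::size_t numSamples)
{
    const std::size_t misalignBytes = reinterpret_cast<std::uintptr_t>(ptr) % kAlign;
    if (misalignBytes == 0)
        return 0;
    if (misalignBytes % sizeof(float) != 0)
        return numSamples;
    const std::size_t n = (kAlign - misalignBytes) / sizeof(float);
    return n < numSamples ? n : numSamples;
}

// Hypothetical per-channel routine: scalar prologue, aligned SIMD body,
// scalar epilogue. Modifies data in place.
inline void applyGain(float* data, std::size_t numSamples, float gain)
{
    assert(reinterpret_cast<std::uintptr_t>(data) % alignof(float) == 0);

    std::size_t i = samplesUntilAligned(data, numSamples);
    for (std::size_t j = 0; j < i; ++j)          // scalar prologue
        data[j] *= gain;

#if HAVE_SSE
    const __m128 g = _mm_set1_ps(gain);
    for (; i + kLanes <= numSamples; i += kLanes) // data + i is 16-byte aligned here
        _mm_store_ps(data + i, _mm_mul_ps(_mm_load_ps(data + i), g));
#endif

    for (; i < numSamples; ++i)                  // scalar epilogue (or full fallback)
        data[i] *= gain;
}

// Hypothetical check: do all channel pointers share 16-byte alignment?
inline bool allChannelsAligned(float* const* channels, int numChannels)
{
    for (int ch = 0; ch < numChannels; ++ch)
        if (reinterpret_cast<std::uintptr_t>(channels[ch]) % kAlign != 0)
            return false;
    return true;
}
```

`allChannelsAligned` is one way to check at runtime whether the host's channel pointers happen to share the first channel's alignment before picking a path that processes all channels in lockstep with aligned loads. If it returns false, that path is off the table, although a per-channel routine like `applyGain` still computes its own prologue for each pointer.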