I found recently that I had some SSE code that assumed 16-byte alignment without checking for it. Unfortunately, PluginHost doesn't enforce alignment so sometimes my code would crash and sometimes it wouldn't.
To test for 16-byte alignment I'm doing something like this:
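The snippet from the original post is not preserved here. A minimal sketch of such a check might look like the following; the helper name `isAligned` is made up for illustration and is not a JUCE function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of JUCE): returns true when `ptr` sits on an
// `alignment`-byte boundary. `alignment` must be a power of two
// (16 for SSE, 32 for AVX).
inline bool isAligned (const void* ptr, std::size_t alignment = 16) noexcept
{
    assert (alignment != 0 && (alignment & (alignment - 1)) == 0);

    // For a power-of-two alignment, the low bits of the address must all be zero.
    return (reinterpret_cast<std::uintptr_t> (ptr) & (alignment - 1)) == 0;
}
```

Inside processBlock this could be called on each channel pointer before handing the data to a routine that requires aligned input.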
After a bit of investigation I've found that by using DirectSound and trying different buffer sizes I can usually find a case where the audio data isn't aligned. Note that the allocation is kind of random, so the same buffer size might sometimes result in aligned data.
The attached plugin code can be used to demonstrate this. It also shows how to allocate data using _mm_malloc and then use AudioSampleBuffer::setDataToReferTo() so we end up with an aligned AudioSampleBuffer. It's important never to call AudioSampleBuffer::setSize() afterwards, as this will reallocate the memory using malloc. As you can see, that's all a bit kludgy.
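The attachment itself is not reproduced here. As a rough sketch of the same idea, the class below owns aligned per-channel storage whose pointers could then be handed to a buffer that only refers to them. Note that it swaps _mm_malloc for a portable over-allocate-and-round-up scheme, and that `AlignedChannels` and its methods are hypothetical names, not JUCE API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of JUCE): owns one over-allocated block and
// exposes channel pointers that all start on an `alignment`-byte boundary.
class AlignedChannels
{
public:
    AlignedChannels (int numChannels, int numSamples, std::size_t alignment = 16)
        : channelCount (numChannels), sampleCount (numSamples)
    {
        assert (numChannels > 0 && numSamples > 0);
        assert (alignment >= sizeof (float) && (alignment & (alignment - 1)) == 0);

        // Round each channel up to a whole number of alignment-sized chunks,
        // so every channel start stays aligned if the first one is.
        const std::size_t floatsPerChunk = alignment / sizeof (float);
        const std::size_t stride = (static_cast<std::size_t> (numSamples) + floatsPerChunk - 1)
                                     / floatsPerChunk * floatsPerChunk;

        // Over-allocate so that an aligned start always fits inside the block.
        storage.resize (stride * static_cast<std::size_t> (numChannels) + floatsPerChunk, 0.0f);

        const auto address = reinterpret_cast<std::uintptr_t> (storage.data());
        const auto aligned = (address + alignment - 1) & ~static_cast<std::uintptr_t> (alignment - 1);
        float* first = storage.data() + (aligned - address) / sizeof (float);

        for (int ch = 0; ch < numChannels; ++ch)
            channels.push_back (first + stride * static_cast<std::size_t> (ch));
    }

    // Copying would leave the copy's pointers referring to the original storage.
    AlignedChannels (const AlignedChannels&) = delete;
    AlignedChannels& operator= (const AlignedChannels&) = delete;

    float** getChannelPointers() noexcept   { return channels.data(); }
    int getNumChannels() const noexcept     { return channelCount; }
    int getNumSamples() const noexcept      { return sampleCount; }

private:
    std::vector<float> storage;
    std::vector<float*> channels;
    int channelCount, sampleCount;
};
```

With JUCE available, the result of getChannelPointers() is the sort of array that would be passed to setDataToReferTo() along with the channel and sample counts. The AlignedChannels object has to outlive the buffer that refers to it, and the caveat about setSize() still applies.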
Jules - can you think of a neat way to add aligned allocation as an option for AudioSampleBuffer? Seems to me it might be a bit of work to make it cater for different alignments (16 for SSE, 32 for AVX) and also make it safely portable.
I can't quite see the usefulness of making the AudioSampleBuffer align the start of the block that it allocates.. Even if it did so, surely very little code is written that will only take a pointer to the buffer's data at sample index 0, and will always process a number of samples that's a multiple of 4..?
In almost all the places where I use buffers, I seem to do things like working on a sub-section of the buffer, starting from an arbitrary sample index, so even if the overall buffer is aligned, the sample you're starting from won't be! So surely the correct thing to do here is not to expect the AudioBuffer to magically always be aligned, but to write SSE routines that handle stray unaligned values at the start + end of the block while using SSE for the bulk of their work?
A great deal of the code I put in processBlock processes all of the samples in the buffer, starting at sample index 0. I would have thought that was true for many audio plugins! However, I see what you're saying and I could re-write my own SSE routines to fix the start.
The problem is that I've got a heavy reliance on libraries which assume that the first sample is aligned and then process any remaining ones at the end.
Yes, I understand, and it certainly wouldn't do any harm to tweak AudioSampleBuffer in that way. I probably should do at some point..
But if these are 3rd party libraries, it seems even stranger to me that anyone would publish a library routine that will crash if given a non-aligned pointer. What do they expect you to do if you need to run it on a sub-section of your data?
I've no idea if this is of any use. In the dark recesses of my mind I recall that quadword alignment works well on certain processors (early ARMs, see http://stardot.org.uk/forums/viewtopic.php?t=9055&p=102086, and possibly others; it was often used by demo coders): both data and code (tight loops) aligned to quadwords run more efficiently, AFAICR. Another half-remembered thing is that some native malloc libraries automatically start memory blocks on the most efficient boundary for their implementation (again, more a feeling than a memory). I think what I'm suggesting is that a #define JUCE_QUADWORD_ALIGNED option for certain implementations might be handy.
Writing algorithms based on a data alignment premise seems kinda fragile. The canonical way is to write a scalar prologue and epilogue around the main vectorized loop.
Doing this incurs little-to-no cost for aligned buffers, and provides compatibility with arbitrarily sized buffers of arbitrary alignment. Of course, a library should always provide aligned data where possible, though.
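For anyone who hasn't seen the pattern, here is a minimal sketch of a scalar prologue and epilogue around an SSE loop. The routine `applyGain` and the macro `SKETCH_USE_SSE` are invented for this example; without SSE the whole block simply runs through the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined (__SSE__) || defined (_M_X64) || (defined (_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_USE_SSE 1
#else
 #define SKETCH_USE_SSE 0
#endif

// Hypothetical example routine: multiplies `numSamples` floats in place by
// `gain`. Works for any pointer alignment and any length.
inline void applyGain (float* data, std::size_t numSamples, float gain) noexcept
{
    assert (data != nullptr || numSamples == 0);

    std::size_t i = 0;

    // Prologue: scalar steps until data + i reaches a 16-byte boundary
    // (at most 3 samples for a normally aligned float pointer).
    while (i < numSamples && (reinterpret_cast<std::uintptr_t> (data + i) & 15) != 0)
        data[i++] *= gain;

#if SKETCH_USE_SSE
    // Main loop: aligned loads and stores, four floats at a time.
    const __m128 g = _mm_set1_ps (gain);

    for (; i + 4 <= numSamples; i += 4)
        _mm_store_ps (data + i, _mm_mul_ps (_mm_load_ps (data + i), g));
#endif

    // Epilogue: scalar steps for whatever is left over.
    for (; i < numSamples; ++i)
        data[i] *= gain;
}
```

When the pointer is already aligned the prologue loop exits on its first test, which is why the extra cost for aligned buffers is negligible.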
Actually, the AudioBuffer class will already align each of its channels to 16 byte boundaries, I think.
But like Mayae says, that’s not really much use in the real world. You can’t write code that only works on 16-byte aligned data, as any practical algorithm will need to process sub-sections of data, or lengths that aren’t a multiple of 4, so you need to write preamble code anyway.
I agree that it’s sensible to write code that won’t crash if data is not aligned. However one of the libraries I use only uses an epilogue and no prologue and hence misses some opportunities to apply vectorisation if the start of the buffer is not aligned. Seeing as there is no real cost to address this, and some benefit (at least for me), then why not?
Maybe I misunderstand something, but I just tried this code:
AudioBuffer<float> buff(2, 10);
auto a = buff.getReadPointer(0);
a = buff.getReadPointer(1);
and these are the pointers for the 2 channels:
0x0000 0000 0060 4088
0x0000 0000 0060 40b0
I don’t see the alignment here, but again maybe it’s something I don’t get. The second pointer points to a 16-byte aligned float, but that’s just accidental because the 10 samples need 40 bytes. I also don’t see any alignment in the AudioBuffer::allocateData() function.
Could you clarify what you mean by aligned AudioBuffers?
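To spell out the arithmetic on the two addresses posted above (a small compile-time check; `offsetFrom16` is just an illustrative name, and a 4-byte float is assumed):

```cpp
#include <cstdint>

// The two channel addresses quoted in the post above.
constexpr std::uint64_t channel0 = 0x604088;
constexpr std::uint64_t channel1 = 0x6040b0;

// Distance of an address past the previous 16-byte boundary (0 means aligned).
constexpr std::uint64_t offsetFrom16 (std::uint64_t address) { return address % 16; }

static_assert (offsetFrom16 (channel0) == 8, "channel 0 is 8 bytes past a boundary");
static_assert (offsetFrom16 (channel1) == 0, "channel 1 happens to be aligned");
static_assert (channel1 - channel0 == 10 * sizeof (float), "channels are packed 40 bytes apart");
```

So the channels are packed back to back with no padding, and channel 1 only lands on a boundary because 8 + 40 happens to be a multiple of 16.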
Thanks in advance!
Just had to implement my own workaround with setDataToReferTo() to have buffers with aligned memory. It works but is really ugly and not safe (AudioSampleBuffer can reallocate).
@jules Please consider adding an option to AudioSampleBuffer to allocate arbitrarily aligned memory. It is simple to add, backwards compatible, and would make our lives much easier in certain circumstances.