Alignment of AudioSampleBuffers

Hi all & Jules in particular :)

I found recently that I had some SSE code that assumed 16-byte alignment without checking for it. Unfortunately, PluginHost doesn't enforce alignment so sometimes my code would crash and sometimes it wouldn't.

To test for 16-byte alignment I'm doing something like this:

bool isAligned = (reinterpret_cast<uintptr_t> (buffer.getReadPointer (ch)) & 0xF) == 0;
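
A slightly more general form of that check, as a minimal sketch. `isAligned` here is a hypothetical helper name, not a JUCE function, and it assumes the alignment is a power of two (16 for SSE, 32 for AVX):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of JUCE): returns true if ptr sits on an
// `alignment`-byte boundary. `alignment` is assumed to be a power of two.
inline bool isAligned (const void* ptr, std::size_t alignment = 16) noexcept
{
    return (reinterpret_cast<std::uintptr_t> (ptr) & (alignment - 1)) == 0;
}
```

In a plugin this could be called as isAligned (buffer.getReadPointer (ch)) for each channel.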

After a bit of investigation I've found that by using DirectSound and trying different buffer sizes I can usually find a case where the audio data isn't aligned. Note that the allocation is kind of random, so the same buffer size might sometimes result in aligned data.

The attached plugin code can be used to demonstrate this. It also shows how to allocate data using _mm_malloc and then use AudioSampleBuffer::setDataToReferTo() so we end up with an aligned AudioSampleBuffer. It's important to never call AudioSampleBuffer::setSize() afterwards, as this will reallocate the memory using malloc. As you can see, that's all a bit kludgy.
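
The attachment isn't reproduced here, so the following is only a rough sketch of the same idea rather than the attached code. It swaps _mm_malloc for over-allocation inside a std::vector so that it stays portable, and `AlignedChannels` is a hypothetical class name, not a JUCE type. It assumes the alignment is a power of two and a multiple of sizeof (float):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of JUCE): owns one block of memory and exposes
// per-channel pointers that all start on an `alignment`-byte boundary.
class AlignedChannels
{
public:
    AlignedChannels (int numChannels, int numSamples, std::size_t alignment = 16)
    {
        // Round each channel's length up so every channel start stays aligned.
        const std::size_t perBoundary = alignment / sizeof (float);
        const std::size_t stride = (static_cast<std::size_t> (numSamples) + perBoundary - 1)
                                     / perBoundary * perBoundary;

        // Over-allocate by `alignment` bytes so an aligned start always fits.
        // resize() zero-fills the bytes, so the samples start out as 0.0f.
        storage.resize (stride * static_cast<std::size_t> (numChannels) * sizeof (float) + alignment);

        const auto address = reinterpret_cast<std::uintptr_t> (storage.data());
        const auto padding = (alignment - (address & (alignment - 1))) & (alignment - 1);
        auto* base = reinterpret_cast<float*> (storage.data() + padding);

        for (int ch = 0; ch < numChannels; ++ch)
            channels.push_back (base + static_cast<std::size_t> (ch) * stride);
    }

    // Copying would leave the channel pointers referring to the old storage.
    AlignedChannels (const AlignedChannels&) = delete;
    AlignedChannels& operator= (const AlignedChannels&) = delete;

    float** data() noexcept         { return channels.data(); }
    int size() const noexcept       { return static_cast<int> (channels.size()); }

private:
    std::vector<unsigned char> storage;
    std::vector<float*> channels;
};
```

The pointer array could then be handed to AudioSampleBuffer::setDataToReferTo (chans.data(), chans.size(), numSamples). The same caveat applies: a later setSize() call would replace the aligned storage.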

Jules - can you think of a neat way to add aligned allocation as an option for AudioSampleBuffer? Seems to me it might be a bit of work to make it cater for different alignments (16 for SSE, 32 for AVX) and also make it safely portable.

-Andrew

PS some useful links:

  • https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_allocate_free_aligned_mem.htm
  • http://msdn.microsoft.com/en-us/library/83ythb65.aspx

I can't quite see the usefulness of making the AudioSampleBuffer align the start of the block that it allocates.. Even if it did so, surely very little code is written that will only take a pointer to the buffer's data at sample index 0, and will always process a number of samples that's a multiple of 4..?

In almost all the places where I use buffers, I seem to do things like working on a sub-section of the buffer, starting from an arbitrary sample index, so even if the overall buffer is aligned, the sample you're starting from won't be! So surely the correct thing to do here is not to expect the AudioBuffer to magically always be aligned, but to write SSE routines that handle stray unaligned values at the start + end of the block while using SSE for the bulk of their work?

A great deal of the code I put in processBlock processes all of the samples in the buffer, starting at sample index 0. I would have thought that was true for many audio plugins! However, I see what you're saying and I could re-write my own SSE routines to fix the start.

The problem is that I've got a heavy reliance on libraries which assume that the first sample is aligned and then process any remaining ones at the end.

Yes, I understand, and it certainly wouldn't do any harm to tweak AudioSampleBuffer in that way. I probably should do at some point..

But if these are 3rd party libraries, it seems even stranger to me that anyone would publish a library routine that will crash if given a non-aligned pointer. What do they expect you to do if you need to run it on a sub-section of your data?

That'd be great if you could.

Wrt the 3rd party libraries, one of them checks for alignment and uses non-SSE code if necessary. Another one uses _mm_loadu_ps for the input data. Each approach works OK but gives up a bit of performance.
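
For anyone unfamiliar with the second approach, it looks roughly like this. This is a sketch only, not code from either library: `sumUnaligned` and the `SKETCH_HAS_SSE` macro are made-up names, and the routine falls back to a plain scalar loop when SSE isn't available:

```cpp
#include <cstddef>

#if defined (__SSE__) || defined (_M_X64) || (defined (_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// Hypothetical example routine: sums n floats from a pointer of any alignment.
inline float sumUnaligned (const float* data, std::size_t n)
{
    std::size_t i = 0;
    float total = 0.0f;

#if SKETCH_HAS_SSE
    __m128 acc = _mm_setzero_ps();

    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps (acc, _mm_loadu_ps (data + i)); // unaligned load: valid at any address

    float lanes[4];
    _mm_storeu_ps (lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif

    for (; i < n; ++i) // scalar tail (or the whole loop when SSE is unavailable)
        total += data[i];

    return total;
}
```

Nothing here can crash on an unaligned pointer, which is the point of the approach; the trade-off is that the unaligned load is used even when the data happens to be aligned.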

Having these aligned would be handy :)

It's a common request. I should probably just hard-code them to always use 16-byte aligned memory, since the overhead of doing so is pretty minimal.

I've no idea if this is of any use. In the dark recesses of my mind I think I recall quadword alignment works well on certain processors (early ARMs, see http://stardot.org.uk/forums/viewtopic.php?t=9055&p=102086 , but possibly others; often used by demo coders): both data and code (tight loops) aligned to quadwords work more efficiently, AFAICR. Another thing (half-remembered) is that some (native) malloc libraries automatically start memory blocks on the most efficient boundary for their implementation (again, more a feeling than a memory). [I think what I'm suggesting is that a #define JUCE_QUADWORD_ALIGNED option for certain implementations might be handy]

Do it.  

I don't ever want to have to show anyone the embarrassing hack I've used instead :)

I’d like that too :slight_smile:

Writing algorithms based on a data alignment premise seems kinda fragile. The canonical way is to write a scalar prologue and epilogue to the main vectorized loop.

Doing this incurs little-to-no cost for aligned buffers, and provides compatibility for arbitrarily sized buffers of arbitrary alignment. Of course, a library should always provide aligned data where possible, though.
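
A minimal sketch of that prologue / main loop / epilogue structure, using a simple gain as the example operation. The names `applyGain` and `SKETCH_HAS_SSE` are made up for illustration, and the vector loop is compiled only when SSE is available:

```cpp
#include <cstddef>
#include <cstdint>

#if defined (__SSE__) || defined (_M_X64) || (defined (_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// Hypothetical example routine: multiplies n floats by gain, for any start
// address and any length.
inline void applyGain (float* data, std::size_t n, float gain)
{
    std::size_t i = 0;

    // Prologue: scalar steps until data + i reaches a 16-byte boundary.
    // For an already aligned buffer this loop does nothing.
    while (i < n && (reinterpret_cast<std::uintptr_t> (data + i) & 0xF) != 0)
    {
        data[i] *= gain;
        ++i;
    }

#if SKETCH_HAS_SSE
    // Main loop: aligned loads and stores, four samples at a time.
    const __m128 g = _mm_set1_ps (gain);

    for (; i + 4 <= n; i += 4)
        _mm_store_ps (data + i, _mm_mul_ps (_mm_load_ps (data + i), g));
#endif

    // Epilogue: whatever is left over (fewer than four samples when SSE is used).
    for (; i < n; ++i)
        data[i] *= gain;
}
```

Because the prologue is driven by the address rather than by the sample index, the same routine works on a sub-section of a buffer starting at an arbitrary sample.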

Actually, the AudioBuffer class will already align each of its channels to 16 byte boundaries, I think.

But like Mayae says, that’s not really much use in the real world. You can’t write code that only works on 16-byte aligned data, as any practical algorithm will need to process sub-sections of data, or lengths that aren’t a multiple of 4, so you need to write preamble code anyway.

I agree that it’s sensible to write code that won’t crash if data is not aligned. However one of the libraries I use only uses an epilogue and no prologue and hence misses some opportunities to apply vectorisation if the start of the buffer is not aligned. Seeing as there is no real cost to address this, and some benefit (at least for me), then why not? :wink:

Like I said, AFAIK the AudioBuffer class has been 16-byte aligned for a long time already.

That’s fine if you’re only using your own AudioBuffer objects and can make sure you always use an offset of 0, but that’s pretty limiting.

Maybe we need an emoticon for rhetorical questions :grin:

Maybe I misunderstand something, but I just tried this code:

AudioBuffer<float> buff(2, 10);
auto a = buff.getReadPointer(0);
a = buff.getReadPointer(1);

and these are the pointers for the 2 channels:
0x0000 0000 0060 4088
0x0000 0000 0060 40b0

I don’t see the alignment here, but again maybe it’s something I don’t get. The second pointer points to a 16-byte aligned float, but that’s just accidental because the 10 samples need 40 bytes. I also don’t see any alignment in the AudioBuffer::allocateData() function.
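
The arithmetic behind that observation can be checked at compile time. This is only a sketch using the two addresses printed above, and it assumes sizeof (float) is 4:

```cpp
#include <cstdint>

constexpr std::uintptr_t ch0 = 0x604088;                  // first pointer printed above
constexpr std::uintptr_t ch1 = ch0 + 10 * sizeof (float); // 10 samples, i.e. 40 bytes, later

static_assert (ch1 == 0x6040b0,  "matches the second pointer printed above");
static_assert ((ch0 & 0xF) == 8, "channel 0 is only 8-byte aligned");
static_assert ((ch1 & 0xF) == 0, "channel 1 lands on a 16-byte boundary by coincidence");
```

With a sample count that is a multiple of 4, channel 1 would sit at the same 8-byte offset as channel 0.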

Could you clarify what you mean by aligned AudioBuffers?
Thanks in advance!

Same here (on Xcode 9.2): the second assert triggers because getReadPointer(0) returns a value which is 8-byte aligned rather than 16-byte aligned as required for NEON, SSE etc. ALMASK is 15 in this case.

const float* pSamples = buffer.getReadPointer(0);

jassert(pSamples);
jassert((reinterpret_cast<uintptr_t>(pSamples) & ALMASK) == 0);

pSamples is 0x101e1b018 - so I agree with FineCutBodies.

Since I discovered it, I use a wrapper that makes sure the buffer’s channels are aligned. Not sure why this is not a feature of the AudioBuffer class.

I found this topic because I faced the same issue (processing sample data in an AudioBuffer with SSE code).

In my case, I solved it by using the AudioBlock class in the juce_dsp module.
The AudioBlock class can work with a byte-aligned HeapBlock,
and an AudioBuffer can refer to the AudioBlock’s channel array.

Here is a snippet. Below, a byte-aligned HeapBlock is wrapped by an AudioBlock, and an AudioBuffer refers to the AudioBlock’s channel array.

void xxxAudioProcessor::processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages)
{
    const int numChannels = buffer.getNumChannels();
    const int numSamples  = buffer.getNumSamples();

    // Note: allocating on every callback is not real-time safe. In practice the
    // HeapBlock would probably be allocated once, e.g. in prepareToPlay().
    HeapBlock<char> alignedBlock;
    AudioBuffer<float> tmpBuffer;

    // Create an AudioBlock that refers to a 16-byte aligned HeapBlock
    dsp::AudioBlock<float> alignedAudioBlock (alignedBlock, (size_t) numChannels, (size_t) numSamples, 16);

    // Collect the aligned channel pointers so the AudioBuffer can refer to them
    Array<float*> alignedChannels;
    for (int ch = 0; ch < numChannels; ch++)
        alignedChannels.add (alignedAudioBlock.getChannelPointer ((size_t) ch));

    tmpBuffer.setDataToReferTo (alignedChannels.data(), numChannels, numSamples);

    // Copy the input into the aligned buffer
    for (int ch = 0; ch < numChannels; ch++)
        tmpBuffer.copyFrom (ch, 0, buffer.getReadPointer (ch), numSamples);

    // Process the aligned buffer with some SSE code (PROCESS_BY_SSE_CODE is a placeholder)
    PROCESS_BY_SSE_CODE (tmpBuffer);

    // Copy the result back to the host's buffer
    for (int ch = 0; ch < numChannels; ch++)
        buffer.copyFrom (ch, 0, tmpBuffer.getReadPointer (ch), numSamples);
}

Just had to implement my own workaround with setDataToReferTo() to have buffers with aligned memory. It works but is really ugly and not safe (AudioSampleBuffer can reallocate).

@jules Please consider adding an option to AudioSampleBuffer to allocate arbitrarily aligned memory. It is simple to add, backwards compatible, and makes our lives much easier under certain circumstances.
