Alignment of AudioSampleBuffers

andrewj · July 27, 2014, 8:35am

Hi all & Jules in particular :)

I found recently that I had some SSE code that assumed 16-byte alignment without checking for it. Unfortunately, PluginHost doesn't enforce alignment so sometimes my code would crash and sometimes it wouldn't.

To test for 16-byte alignment I'm doing something like this:

const bool isAligned = (reinterpret_cast<uintptr_t> (buffer.getReadPointer (ch)) & 0xF) == 0;
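For reference, the same test can be wrapped in a small standalone helper. This is only a sketch in plain C++ with no JUCE dependency; the name `isAligned` and its `alignment` parameter are illustrative and not part of any JUCE API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a JUCE function): returns true if ptr sits on an
// `alignment`-byte boundary. `alignment` must be a power of two, for example
// 16 for SSE or 32 for AVX.
inline bool isAligned (const void* ptr, std::size_t alignment = 16) noexcept
{
    assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t> (ptr) & (alignment - 1)) == 0;
}
```

In a plugin this would be called once per channel, e.g. `isAligned (buffer.getReadPointer (ch))`.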

After a bit of investigation I've found that by using DirectSound and trying different buffer sizes I can usually find a case where the audio data isn't aligned. Note that the allocation is kind of random, so the same buffer size might sometimes result in aligned data.

The attached plugin code can be used to demonstrate this. It also shows how to allocate data using _mm_malloc and then use AudioSampleBuffer::setDataToReferTo() so we end up with an aligned AudioSampleBuffer. It's important to never call AudioSampleBuffer::setSize() afterwards, as this will reallocate the memory using malloc. As you can see, that's all a bit kludgy.
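Since the attachment is not reproduced here, the following is a rough sketch of the same idea rather than the attached code. It swaps _mm_malloc for C++17 aligned `operator new` to stay portable, and pads each channel so that every channel pointer is aligned. The class name `AlignedChannels` and its members are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical helper (not part of JUCE): owns one aligned allocation and
// exposes per-channel pointers that each start on an `alignment`-byte boundary.
class AlignedChannels
{
public:
    AlignedChannels (int numChannels, int numSamples, std::size_t alignment = 16)
        : align (alignment)
    {
        assert (numChannels > 0 && numSamples > 0);
        assert (alignment >= sizeof (float) && (alignment & (alignment - 1)) == 0);

        // Round the channel length up so the next channel also starts aligned.
        const std::size_t floatsPerBoundary = alignment / sizeof (float);
        stride = ((std::size_t) numSamples + floatsPerBoundary - 1)
                     / floatsPerBoundary * floatsPerBoundary;

        channels.reserve ((std::size_t) numChannels);
        const std::size_t totalFloats = stride * (std::size_t) numChannels;
        raw = ::operator new (totalFloats * sizeof (float), std::align_val_t (alignment));

        auto* base = static_cast<float*> (raw);
        std::fill_n (base, totalFloats, 0.0f);   // start with silence

        for (int ch = 0; ch < numChannels; ++ch)
            channels.push_back (base + (std::size_t) ch * stride);
    }

    ~AlignedChannels() { ::operator delete (raw, std::align_val_t (align)); }

    AlignedChannels (const AlignedChannels&) = delete;
    AlignedChannels& operator= (const AlignedChannels&) = delete;

    float** data() noexcept                 { return channels.data(); }
    int getNumChannels() const noexcept     { return (int) channels.size(); }

private:
    std::size_t align, stride = 0;
    void* raw = nullptr;
    std::vector<float*> channels;
};
```

The channel array could then be handed to a buffer with something like `buffer.setDataToReferTo (aligned.data(), aligned.getNumChannels(), numSamples)`. The `AlignedChannels` object has to outlive the buffer, and setSize() must still be avoided.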

Jules - can you think of a neat way to add aligned allocation as an option for AudioSampleBuffer? Seems to me it might be a bit of work to make it cater for different alignments (16 for SSE, 32 for AVX) and also make it safely portable.

-Andrew

PS some useful links:

https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_allocate_free_aligned_mem.htm
http://msdn.microsoft.com/en-us/library/83ythb65.aspx

jules · July 27, 2014, 10:06am

I can't quite see the usefulness of making the AudioSampleBuffer align the start of the block that it allocates.. Even if it did so, surely very little code is written that will only take a pointer to the buffer's data at sample index 0, and will always process a number of samples that's a multiple of 4..?

In almost all the places where I use buffers, I seem to do things like working on a sub-section of the buffer, starting from an arbitrary sample index, so even if the overall buffer is aligned, the sample you're starting from won't be! So surely the correct thing to do here is not to expect the AudioBuffer to magically always be aligned, but to write SSE routines that handle stray unaligned values at the start + end of the block while using SSE for the bulk of their work?

andrewj · July 27, 2014, 11:11am

A great deal of the code I put in processBlock processes all of the samples in the buffer, starting at sample index 0. I would have thought that was true for many audio plugins! However, I see what you're saying and I could re-write my own SSE routines to fix the start.

The problem is that I've got a heavy reliance on libraries which assume that the first sample is aligned and then process any remaining ones at the end.

jules · July 27, 2014, 11:29am

Yes, I understand, and it certainly wouldn't do any harm to tweak AudioSampleBuffer in that way. I probably should do at some point..

But if these are 3rd party libraries, it seems even stranger to me that anyone would publish a library routine that will crash if given a non-aligned pointer. What do they expect you to do if you need to run it on a sub-section of your data?

andrewj · July 27, 2014, 11:48am

That'd be great if you could.

Wrt the 3rd party libraries, one of them checks for alignment and uses non-SSE code if necessary. Another one uses _mm_loadu_ps for the input data. Each approach works OK but gives up a bit of performance.
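To make the two strategies concrete, here is a sketch of what each might look like for a simple gain routine. This is not code from either library. The function names and the `SKETCH_HAS_SSE` macro are made up for illustration, and the second routine also uses unaligned stores, which is an assumption beyond what is described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0   // non-x86 builds just run the scalar loops
#endif

// Strategy A: use aligned SSE only when both pointers are 16-byte aligned,
// otherwise fall back to plain scalar code for the whole block.
inline void applyGainChecked (float* dst, const float* src, std::size_t n, float gain) noexcept
{
    assert (dst != nullptr && src != nullptr);
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    const auto bits = reinterpret_cast<std::uintptr_t> (dst)
                    | reinterpret_cast<std::uintptr_t> (src);
    if ((bits & 0xF) == 0)
    {
        const __m128 g = _mm_set1_ps (gain);
        for (; i + 4 <= n; i += 4)
            _mm_store_ps (dst + i, _mm_mul_ps (_mm_load_ps (src + i), g));
    }
#endif
    for (; i < n; ++i)   // scalar fallback, and the tail of the SSE path
        dst[i] = src[i] * gain;
}

// Strategy B: always use SSE, but with unaligned loads and stores, which are
// safe for any address.
inline void applyGainUnaligned (float* dst, const float* src, std::size_t n, float gain) noexcept
{
    assert (dst != nullptr && src != nullptr);
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    const __m128 g = _mm_set1_ps (gain);
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps (dst + i, _mm_mul_ps (_mm_loadu_ps (src + i), g));
#endif
    for (; i < n; ++i)
        dst[i] = src[i] * gain;
}
```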

jimc · February 12, 2015, 6:14pm

Having these aligned would be handy :)

jules · February 13, 2015, 9:52am

It's a common request. I should probably just hard-code them to always use 16-byte aligned memory, since the overhead of doing so is pretty minimal.

ShareCommunity · March 17, 2015, 5:54pm

I've no idea if this is of any use. In the dark recesses of my mind I think I recall that quadword alignment works well on certain processors (early ARMs, see http://stardot.org.uk/forums/viewtopic.php?t=9055&p=102086, but possibly others; often used by demo coders). Aligning both data and code (tight loops) to quadwords works more efficiently, AFAICR. Another thing (half-remembered) is that some (native) malloc libraries automatically start memory blocks on the most efficient boundary for their implementation (again, more a feeling than a memory). [I think what I'm suggesting is that a #define JUCE_QUADWORD_ALIGNED option for certain implementations might be handy]

jimc · March 18, 2015, 5:48pm

Do it.

I don't ever want to have to show anyone the embarrassing hack I've used instead :)

lalala · May 30, 2016, 4:46pm

I’d like that too

Mayae · May 30, 2016, 10:47pm

Writing algorithms based on a data alignment premise seems kinda fragile. The canonical way is to write a scalar prologue and epilogue to the main vectorized loop.

Doing this incurs little-to-no cost for aligned buffers, and provides compatibility for arbitrarily sized buffers of arbitrary alignment. Of course, a library should always provide aligned data where possible, though.
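A minimal sketch of that prologue / main loop / epilogue structure, using an in-place gain as the example operation. The function name `applyGainInPlace` and the `SKETCH_HAS_SSE` macro are hypothetical, and the SSE path assumes an x86 target (other targets fall through to the scalar loop).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// Multiplies n samples in place by gain. Works for any start address and any n.
inline void applyGainInPlace (float* data, std::size_t n, float gain) noexcept
{
    assert (data != nullptr || n == 0);
    std::size_t i = 0;

#if SKETCH_HAS_SSE
    // Prologue: scalar samples until data + i reaches a 16-byte boundary.
    while (i < n && (reinterpret_cast<std::uintptr_t> (data + i) & 0xF) != 0)
    {
        data[i] *= gain;
        ++i;
    }

    // Main loop: aligned loads and stores, four floats per iteration.
    const __m128 g = _mm_set1_ps (gain);
    for (; i + 4 <= n; i += 4)
        _mm_store_ps (data + i, _mm_mul_ps (_mm_load_ps (data + i), g));
#endif

    // Epilogue: whatever is left over (fewer than four samples on the SSE path).
    for (; i < n; ++i)
        data[i] *= gain;
}
```

For a buffer that already starts on a 16-byte boundary the prologue loop exits immediately, so the only extra cost is one address test.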

jules · May 31, 2016, 8:22am

Actually, the AudioBuffer class will already align each of its channels to 16 byte boundaries, I think.

But like Mayae says, that’s not really much use in the real world. You can’t write code that only works on 16-byte aligned data, as any practical algorithm will need to process sub-sections of data, or lengths that aren’t a multiple of 4, so you need to write preamble code anyway.

andrewj · May 31, 2016, 9:28am

I agree that it’s sensible to write code that won’t crash if data is not aligned. However one of the libraries I use only uses an epilogue and no prologue and hence misses some opportunities to apply vectorisation if the start of the buffer is not aligned. Seeing as there is no real cost to address this, and some benefit (at least for me), then why not?

jules · May 31, 2016, 9:41am

Like I said, AFAIK the AudioBuffer class has been 16-byte aligned for a long time already.

That’s fine if you’re only using your own AudioBuffer objects and can make sure you always use an offset of 0, but that’s pretty limiting.

andrewj · May 31, 2016, 10:00am

Maybe we need an emoticon for rhetorical questions

finecutbodies · August 30, 2018, 3:34am

Maybe I misunderstand something, but I just tried this code:

AudioBuffer<float> buff(2, 10);
auto a = buff.getReadPointer(0);
a = buff.getReadPointer(1);

and these are the pointers for the 2 channels:
0x0000 0000 0060 4088
0x0000 0000 0060 40b0

I don’t see the alignment here, but again maybe it’s something I don’t get. The second pointer points to a 16-byte aligned float, but it’s just accidental because the 10 samples need 40 bytes. I also don’t see any alignment in the AudioBuffer::allocateData() function.

Could you clarify what you mean by aligned AudioBuffers?
Thanks in advance!
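The arithmetic behind those two addresses can be checked at compile time. The sketch below only restates the numbers quoted in this post; the constant and function names are illustrative, and it assumes a 4-byte float.

```cpp
#include <cstddef>
#include <cstdint>

// Addresses reported above for channels 0 and 1 of AudioBuffer<float> (2, 10).
constexpr std::uintptr_t channel0 = 0x604088;
constexpr std::uintptr_t channel1 = 0x6040b0;

// Distance of an address past the previous 16-byte boundary (0 means aligned).
constexpr std::uintptr_t offsetFrom16 (std::uintptr_t address) noexcept
{
    return address & 0xF;
}

static_assert (sizeof (float) == 4, "sketch assumes 4-byte floats");
static_assert (offsetFrom16 (channel0) == 8, "channel 0 is only 8-byte aligned");
static_assert (offsetFrom16 (channel1) == 0, "channel 1 happens to land on a boundary");
static_assert (channel1 - channel0 == 10 * sizeof (float),
               "channels are packed back to back: 10 samples * 4 bytes = 40 bytes");
```

So channel 1 is aligned only because 8 + 40 = 48 is a multiple of 16; with a different sample count it would not be.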

SNG · May 1, 2019, 5:39pm

Same here (on Xcode 9.2): the second assert triggers because getReadPointer(0) returns a value which is 8-byte aligned rather than 16-byte aligned as required for NEON, SSE etc. ALMASK is 15 in this case.

const float* pSamples = buffer.getReadPointer(0);

jassert(pSamples);
jassert((reinterpret_cast<uintptr_t>(pSamples) & ALMASK) == 0);

pSamples is 0x101e1b018 - so I agree with FineCutBodies.

finecutbodies · May 1, 2019, 7:42pm

Since I discovered it, I use a wrapper that makes sure the buffer’s channels are aligned. Not sure why this is not a feature of the AudioBuffer class.

COx2 · November 12, 2019, 8:08am

I found this topic because I ran into the same issue (processing sample data in an AudioBuffer with SSE code).

In my case, I solved it by using the AudioBlock class in the juce_dsp module.
AudioBlock can allocate its data in a byte-aligned HeapBlock,
and an AudioBuffer can refer to the AudioBlock’s channel array.

Here is a snippet. Below, a byte-aligned HeapBlock is wrapped by an AudioBlock, and an AudioBuffer refers to the AudioBlock’s channel pointers.

void xxxAudioProcessor::processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages)
{
    const auto numChannels = (size_t) buffer.getNumChannels();
    const auto numSamples  = (size_t) buffer.getNumSamples();

    // Note: this allocates on every callback; the allocation could be moved to prepareToPlay().
    HeapBlock<char> alignedBlock;
    AudioBuffer<float> tmpBuffer;

    // Create an AudioBlock whose channels live in a 16-byte aligned HeapBlock
    dsp::AudioBlock<float> alignedAudioBlock (alignedBlock, numChannels, numSamples, 16);

    // Collect the aligned channel pointers so that an AudioBuffer can refer to them
    Array<float*> alignedChannels;
    for (size_t ch = 0; ch < alignedAudioBlock.getNumChannels(); ch++)
        alignedChannels.add (alignedAudioBlock.getChannelPointer (ch));

    tmpBuffer.setDataToReferTo (alignedChannels.data(), alignedChannels.size(), (int) alignedAudioBlock.getNumSamples());

    // Copy the host's (possibly unaligned) samples into the aligned buffer
    for (int ch = 0; ch < buffer.getNumChannels(); ch++)
        tmpBuffer.copyFrom (ch, 0, buffer.getReadPointer (ch), buffer.getNumSamples());

    // Process the aligned AudioBuffer with some SSE code.
    PROCESS_BY_SSE_CODE (tmpBuffer);

    // Copy the processed samples back to the host's buffer
    for (int ch = 0; ch < buffer.getNumChannels(); ch++)
        buffer.copyFrom (ch, 0, tmpBuffer.getReadPointer (ch), tmpBuffer.getNumSamples());
}

sebk · November 12, 2019, 4:47pm

Just had to implement my own workaround with setDataToReferTo() to have buffers with aligned memory. It works but is really ugly and not safe (AudioSampleBuffer can reallocate).

@jules Please consider adding an option to AudioSampleBuffer to allocate arbitrarily aligned memory. It is simple to add, backwards compatible, and makes our lives much easier under certain circumstances.
