Callback Performance and Stability. Direction?

I’m looking to improve the stability and performance of my audio callback. I’ve been doing some stress testing with two test configurations, both with 30 tracks. The first is straight file playback; the second adds 3 audio units per track.

I had been using an AudioProcessorGraph but for the sake of simplicity/testing I pulled together the essential processors for each track and placed them into respective “EngineTrack” objects.

- The first processor on each track handles file reading. Each uses an AudioTransportSource with the read-ahead buffer size set to 32768.
- There are 3 audio files used, with 10 tracks per audio file.
- The audio units are plugins from MeldaProduction: MTuner, MEqualizer and MCompressor.
- The device is running 64-sample buffers at 44100 Hz.
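For a sense of scale, here is the arithmetic behind those settings as a small C++ sketch: at 64 samples and 44100 Hz each callback has about 1.45 ms to finish, and the 32768-sample read-ahead covers about 743 ms of audio. The helper names are made up for illustration, and the per-plugin figure assumes all 90 AU instances (30 tracks x 3) run one after another on the audio thread, ignoring everything else the callback does.

```cpp
// Hypothetical helpers for back-of-the-envelope timing.

// Wall-clock length of `numSamples` at `sampleRate`, in milliseconds.
double samplesToMs (int numSamples, double sampleRate)
{
    return 1000.0 * numSamples / sampleRate;
}

// Average time each plugin instance may take, in microseconds, if all of them
// run serially on the audio thread within a single block. Ignores file
// reading, mixing and driver overhead, so the real budget is smaller.
double perPluginBudgetUs (int blockSize, double sampleRate, int numPlugins)
{
    return 1000.0 * samplesToMs (blockSize, sampleRate) / numPlugins;
}
```

With 90 instances that works out to roughly 16 µs per plugin per block on a single core, which helps explain why the AU configuration falls over.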

The callback:

[code]// Clear every output channel before the tracks are mixed into it.
for (int i = 0; i < numOutputChannels; ++i)
    if (outputChannelData[i] != nullptr)
        zeromem (outputChannelData[i], sizeof (float) * (size_t) numSamples);

if (isPlaying)
{
    for (int i = 0; i < enTrackSize; ++i)
    {
        EngineTrack* t = enTracks.getUnchecked (i);

        // Run the track's processors in series: file reader first, then the AUs.
        for (int j = 0; j < t->plugSize; ++j)
        {
            AudioPluginInstance* inst = t->plugs.getUnchecked (j);
            inst->processBlock (t->buff, t->midiBuff);
        }

        // Sum the processed track into the device outputs.
        mixBuffs (t->buff, outputChannelData, numOutputChannels, numSamples);
    }

    trackPosition += numSamples * playbackSpeed;
}[/code]



With just file playback, performance seems pretty good: between 10 and 20% CPU usage (as reported by getCpuUsage()). Glitches still occur from time to time, but I’m not sure what causes them. The file buffering seems healthy throughout playback. Window resizing seems to encourage glitches, but not consistently.
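One possible reason glitches appear while the meter reads only 10-20%: a load figure like this is typically an average of callback time relative to block duration, so a single late callback barely moves it. The class below is a hypothetical, simplified model of such a meter, not JUCE’s actual getCpuUsage() implementation. It also counts callbacks that overran their deadline, which is the number that actually corresponds to dropouts.

```cpp
// Hypothetical, simplified load meter (not JUCE's implementation):
// tracks the smoothed ratio of callback time to block duration, plus a
// count of callbacks that took longer than the block itself.
class CallbackLoadMeter
{
public:
    CallbackLoadMeter (int blockSize, double sampleRate)
        : blockSeconds (blockSize / sampleRate) {}

    // Feed in how long one audio callback took, in seconds.
    void addMeasurement (double callbackSeconds)
    {
        const double instant = callbackSeconds / blockSeconds;
        load += 0.1 * (instant - load);   // one-pole smoothing over recent blocks
        if (instant > 1.0)
            ++overruns;                   // deadline missed: likely an audible dropout
    }

    double getLoad() const { return load; }
    int getOverruns() const { return overruns; }

private:
    double blockSeconds;
    double load = 0.0;
    int overruns = 0;
};
```

Timing the callback body with std::chrono::steady_clock and logging the overrun count (rather than the average) should make it easier to correlate glitches with things like window resizing.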

When testing with the audio units added, performance is terrible. CPU usage is at 80-100% and the audio output is unusable. For comparison, I set up Logic with the same configuration and it happily chugs away: its CPU meter runs hot, but the output doesn’t seem to suffer at all. Of course I can only imagine what magic they are doing. I’ve profiled both my app and Logic, and most of the time is spent in the audio units, with the compressor taking the largest share. That’s not surprising, since the callback itself does little besides reading from the file buffers.

So with that said, I’m left scratching my head a bit. The callback is really basic and most of the time is taken up in the audio units. I can only guess that the load of the audio units needs to be spread out somehow, possibly using multiple threads. I’m imagining using one thread per track, where the callback would signal the tracks to process the incoming data and then wait for them to finish. Is that a ridiculous concept? Would there be any parallel benefit? I’m not sure how else to look at it, so that’s why I’m reaching out in the hope that someone has some guidance to share. I’ve been following Vinn’s thread on lock-free queues, and lately I’ve been reading up on real-time processing topics, specifically Ross Bencina’s blog post on real-time processing and the chapter on SuperCollider’s internals. Very interesting stuff, but I’m not sure how it would apply here.
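The “signal the tracks, then wait” idea is a fork-join pattern, and it can pay off when spare cores are available: each track’s plugin chain is serial, but different tracks are independent until the final mix. Rather than one thread per track (30 threads on a 2-core machine tends to add switching overhead), a small pool of persistent workers can share the tracks of each callback. The class below is a hypothetical sketch using only the standard library; its mutex/condition-variable hand-off is not real-time safe, so a production version would typically use lock-free signalling instead.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fork-join pool: persistent workers share the tracks of one
// callback, the calling thread helps, then waits until every track is done.
// Sketch only: the mutex/condition-variable hand-off is not real-time safe.
class TrackProcessorPool
{
public:
    explicit TrackProcessorPool (int numWorkers)
    {
        for (int i = 0; i < numWorkers; ++i)
            workers.emplace_back ([this] { workerLoop(); });
    }

    ~TrackProcessorPool()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            quit = true;
        }
        wake.notify_all();
        for (auto& w : workers)
            w.join();
    }

    // Calls processTrack (i) exactly once for each i in [0, numTracks).
    void processAll (int numTracks, std::function<void (int)> processTrack)
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            job = std::move (processTrack);
            jobSize = numTracks;
            nextIndex = 0;
            remaining = numTracks;
            ++generation;
        }
        wake.notify_all();

        runTasks();   // the calling (audio) thread processes tracks too

        std::unique_lock<std::mutex> lock (mutex);
        done.wait (lock, [this] { return remaining == 0 && busy == 0; });
    }

private:
    // Claim track indices until none are left.
    void runTasks()
    {
        for (int i = nextIndex++; i < jobSize; i = nextIndex++)
        {
            job (i);
            --remaining;
        }
    }

    void workerLoop()
    {
        unsigned long seen = 0;
        std::unique_lock<std::mutex> lock (mutex);

        for (;;)
        {
            wake.wait (lock, [&] { return quit || generation != seen; });
            if (quit)
                return;

            seen = generation;
            if (remaining == 0)
                continue;   // woke too late; the other threads finished this block

            ++busy;
            lock.unlock();
            runTasks();
            lock.lock();
            --busy;
            done.notify_all();
        }
    }

    std::vector<std::thread> workers;
    std::mutex mutex;
    std::condition_variable wake, done;
    std::function<void (int)> job;
    int jobSize = 0;
    int busy = 0;
    unsigned long generation = 0;
    bool quit = false;
    std::atomic<int> nextIndex { 0 }, remaining { 0 };
};
```

In the callback, each task would run one track’s plugins into that track’s own buffer; the existing mixBuffs loop can then stay on the audio thread after processAll returns, so the summing remains single-threaded.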

Anyway, any thoughts are greatly appreciated!


I guess it stands to reason that unless a track is being monitored with live input, it doesn’t need to be processed on demand and could be buffered instead. :shock: doh…

If you have a multicore processor, Logic is likely spreading the AUs across several cores, while your app runs all of the audio processing on a single core.

Thanks dinaiz, I’d guess you are right about Logic doing that. I’m running Lion on a mid-2010 MacBook Pro with a 2.4 GHz Intel Core 2 Duo.

I’ve changed my setup so that each track is buffered using an AbstractFifo, and the results are a million times better. Well, OK, maybe not a million… but much, much better! I’ve been trying out numbers: I’ve set the FIFO size to 32768 and I’m feeding it buffers of 512 samples. Each track is on its own thread with a priority of 10. (If any of this sounds like a bad idea, please feel free to point me in the right direction.) The audio callback is using 64-sample buffers. Overall CPU is running heavy at 70%, but the callback load (via getCpuUsage()) is minimal and the buffering appears to be keeping up. I still have a lot of tweaking and testing to do, I’m sure, but it feels like I’m on the right track here. :smiley:
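For anyone following along, the per-track buffering described above boils down to a single-producer/single-consumer FIFO: the track thread renders 512-sample chunks through its plugins and pushes them, and the audio callback pulls 64 samples at a time without ever blocking. In JUCE, AbstractFifo handles the index bookkeeping for this; the class below is a standalone C++ equivalent for illustration only, with hypothetical names, that pads with silence on underrun.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical single-producer/single-consumer FIFO of samples, playing the
// role AbstractFifo + a sample buffer plays in JUCE. Exactly one thread may
// call write() (the track thread) and exactly one may call read() (the
// audio callback); neither ever blocks.
class TrackFifo
{
public:
    explicit TrackFifo (std::size_t capacity) : buffer (capacity + 1) {}

    // Track thread: copies as many samples as fit, returns how many were written.
    std::size_t write (const float* src, std::size_t num)
    {
        const std::size_t size = buffer.size();
        const std::size_t w = writePos.load (std::memory_order_relaxed);
        const std::size_t r = readPos.load (std::memory_order_acquire);
        num = std::min (num, (r + size - w - 1) % size);   // free space
        for (std::size_t i = 0; i < num; ++i)
            buffer[(w + i) % size] = src[i];
        writePos.store ((w + num) % size, std::memory_order_release);
        return num;
    }

    // Audio callback: fills dst, padding with silence if the track fell behind.
    std::size_t read (float* dst, std::size_t num)
    {
        const std::size_t size = buffer.size();
        const std::size_t r = readPos.load (std::memory_order_relaxed);
        const std::size_t w = writePos.load (std::memory_order_acquire);
        const std::size_t n = std::min (num, (w + size - r) % size);   // samples ready
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = buffer[(r + i) % size];
        std::fill (dst + n, dst + num, 0.0f);
        readPos.store ((r + n) % size, std::memory_order_release);
        return n;
    }

    // Safe from either thread; the value may be stale by the time it is used.
    std::size_t numReady() const
    {
        const std::size_t w = writePos.load (std::memory_order_acquire);
        const std::size_t r = readPos.load (std::memory_order_acquire);
        return (w + buffer.size() - r) % buffer.size();
    }

private:
    std::vector<float> buffer;   // one slot always stays empty so full != empty
    std::atomic<std::size_t> writePos { 0 }, readPos { 0 };
};
```

One consequence of this design: with a 32768-sample FIFO the tracks render up to about 743 ms ahead of playback, so anything that should be heard immediately (parameter changes, live input) would lag by up to that much. That matches the earlier point that only non-monitored tracks can be buffered this way.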

I recently ran some tests in Ableton Live (PC) on a 4-core processor and, by watching the load on each core, could easily tell that the plugins run on separate cores, so yes, I guess Logic works the same way.

As for the number of threads, I think it’s a tradeoff between parallelism and multithreading overhead. I’m no expert on the topic, but I’ve found that Tim Blechmann (the developer of supernova, SuperCollider’s multicore server) has written a lot of interesting material about it. (I actually found the links at the end of the article you linked to in your first post :slight_smile: )
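One common way to balance that tradeoff is to use roughly one worker thread per core and split the tracks between them, instead of one thread per track. The function below sketches a simple static split using the greedy “longest processing time first” heuristic. This is a standard scheduling technique swapped in here for illustration, not something taken from Tim Blechmann’s work or from any particular host, and the per-track cost estimates are assumed to come from your own measurements.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: spread tracks over a fixed number of worker threads
// (e.g. one per core) using the greedy "longest processing time first" rule.
// trackCost holds per-track estimates, e.g. measured average processing time.
// Returns, for each track, the index of the thread it is assigned to.
std::vector<int> assignTracks (const std::vector<double>& trackCost, int numThreads)
{
    std::vector<int> order (trackCost.size());
    std::iota (order.begin(), order.end(), 0);
    std::sort (order.begin(), order.end(),
               [&] (int a, int b) { return trackCost[a] > trackCost[b]; });

    std::vector<double> load (numThreads, 0.0);
    std::vector<int> threadOf (trackCost.size(), 0);

    for (int track : order)
    {
        // Put the next most expensive track on the least loaded thread.
        const int t = (int) (std::min_element (load.begin(), load.end()) - load.begin());
        threadOf[track] = t;
        load[t] += trackCost[track];
    }
    return threadOf;
}
```

Tracks assigned to the same thread still run serially, and a static split like this would need recomputing when plugins are added or removed; the dynamic work-claiming approach in a fork-join pool adapts automatically but pays a little more synchronization per block.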

Good luck, that’s definitely non-trivial!

Could anyone please help me with the implementation of an audio fifo buffer that can be used for reading from an AudioFormatReaderSource?

I wish I could, but honestly, I think my knowledge of multithreading is too limited to be of any help…

Have you considered using the BufferingAudioSource instead of AudioFormatReaderSource?

I’ll try that! I completely forgot that class!

…and we have a winner! It works like a charm…

However, I’m experiencing some glitches during file playback.
I set the BufferingAudioSource’s numberOfSamplesToBuffer parameter to 44100. It works fine with long audio files (longer than 44100 samples), but glitches occur when reading short files (e.g. a kick drum sample).

Just guessing, but this could be a problem with the way you’re using the BufferingAudioSource. IIRC it returns silence if getNextAudioBlock() is called before its internal thread has pre-buffered enough samples; it doesn’t wait for the input to become ready, to avoid blocking the audio thread. If you or the host call prepareToPlay right before the first block, you’re likely to miss the first samples.
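If missing the first samples is the problem, one workaround is to prime the buffer before starting playback, waiting on a non-audio thread until enough audio has been read ahead. The sketch below is hypothetical: `PrebufferState` stands in for whatever progress counter your buffering setup can expose (BufferingAudioSource itself may not expose one in this form), and the wait must never happen inside the audio callback.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical progress counter that a background reader thread advances
// as it fills a pre-buffer for one source.
struct PrebufferState
{
    std::atomic<long> samplesBuffered { 0 };
};

// Call from the message thread before starting the transport, never from the
// audio callback: polls until `needed` samples are buffered or time runs out.
bool waitUntilPrimed (const PrebufferState& state, long needed,
                      std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;

    while (state.samplesBuffered.load() < needed)
    {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;   // give up; the caller decides whether to start anyway

        std::this_thread::sleep_for (std::chrono::milliseconds (1));
    }
    return true;
}
```

For short files, `needed` would be capped at the file length, since a 44100-sample target can never be reached by a file shorter than that.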

For your stress test, I can provide a logging class that I recently wrote for inspecting multithreading behaviour in Ableton Live. It’s in an experimental state and will most likely not work in every host. But if it does, it can give you some useful info about which thread calls each of your plugin instances, and when.

Thank you steffen, but my app is the host and I’m pretty sure I’m not calling prepareToPlay just before calling getNextAudioBlock on the BufferingAudioSource.
I’m using the graph too, but in my plugins (the audio file players) I have an AudioTransportSource to which I attach a custom PositionableAudioSource, which contains an array of BufferingAudioSources.
If I give, let’s say, 44100 samples to the AudioTransportSource as the readAheadBufferSize parameter it works fine, no glitches at all, but I miss the first part of the audio files.

Ah, sorry, I didn’t follow the whole thread, but the bit about the dropouts in the BufferingAudioSource caught my attention, because it sounds a lot like some problems I ran into while using that class.

Are you changing the nextReadPosition of the BufferingAudioSource right before the glitches occur? the SharedBufferingAudioSourceThread will be notified immediately, but if it is pre-buffering for a lot of other BufferingAudioSource instances, it might take some time before the section at the new read position is read into memory.
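To make the seeking issue concrete: after a jump, a block can only be served from memory if it falls inside the range that has already been read. The sketch below models that check, plus a crude worst-case delay assuming the shared thread works through its sources one at a time. Both the names and the one-at-a-time assumption are hypothetical, not a description of how SharedBufferingAudioSourceThread actually schedules its work.

```cpp
// Hypothetical model of one source's read-ahead window, in file sample positions.
struct BufferedWindow
{
    long long validStart = 0;   // first sample held in memory
    long long validEnd = 0;     // one past the last sample held in memory
};

// True if a block starting at `pos` can be served without waiting on the disk thread.
bool canServe (const BufferedWindow& w, long long pos, int numSamples)
{
    return pos >= w.validStart && pos + numSamples <= w.validEnd;
}

// Crude worst case for how long a source may wait after a seek, assuming one
// shared thread refills sources one at a time and the seeking source has just
// missed its turn.
double worstCaseRefillMs (int numSources, double msPerRead)
{
    return numSources * msPerRead;
}
```

Under that assumption, 30 sources at 2 ms per read could leave a freshly seeked source silent for up to about 60 ms, which would be heard as exactly the kind of dropout described.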

If you are using AudioProcessors within your host app, the class I mentioned might still work and give you some insight into which threads are calling your audio file players, and how much time is spent in each instance’s callback function. I’ll post it in another thread.