Buffer copying on processBlock not fast enough?

I’m not entirely sure I understand your question correctly, but for now I assume it is not about the buffer shared between plugin instances that the original author asked about. Do I get that right, @TypeWriter?

Now if your question is what’s the fastest way to copy float data in C++ then probably std::memcpy is the answer. The cppreference site on std::memcpy tells us

std::memcpy is meant to be the fastest library routine for memory-to-memory copy.

It is fast because it will usually use the widest SIMD instruction set available on the CPU you are running on instead of doing an element-by-element copy, so it will probably copy multiple samples with a single CPU instruction. But these are implementation details that you don’t need to care about. Now, std::memcpy operates on void pointers and takes the number of bytes to copy. As this always involves a bit of math and the risk of getting that math wrong, a good choice when using JUCE is juce::FloatVectorOperations::copy, which takes float pointers and the number of floats to copy and calls memcpy internally. And if we look into the juce::AudioBuffer copy member functions, we see that they call juce::FloatVectorOperations::copy internally. So it’s likely that using the AudioBuffer member functions will give you the best speed possible.
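To illustrate the byte-count point, here is a minimal sketch of a typed wrapper around std::memcpy. The name copyFloats is made up for this example and is not a JUCE function; it only shows the idea of taking a float count so the caller never has to multiply by sizeof (float).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of JUCE: a typed wrapper around std::memcpy.
// The caller passes a number of floats, never a number of bytes.
inline void copyFloats (float* dest, const float* src, std::size_t numFloats)
{
    if (numFloats == 0)
        return;

    assert (dest != nullptr && src != nullptr);

    // std::memcpy counts bytes, so the float count is scaled exactly once, here.
    // The source and destination ranges must not overlap.
    std::memcpy (dest, src, numFloats * sizeof (float));
}
```

Calling copyFloats (dst, src, 4) copies four floats; the equivalent raw call would be std::memcpy (dst, src, 4 * sizeof (float)), and forgetting the sizeof factor there would silently copy only a quarter of the data.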

One thing however that looks a bit dangerous is your example function signature

void function (juce::AudioBuffer<float>)

If you pass a buffer by value like that, it will create a copy of the buffer, which has the exact same downside as calling makeCopyOf: the copied instance allocates a new heap buffer. Never do this on the audio thread, otherwise the speed comparison between different sample copy strategies is the least of your performance problems :wink: Always pass buffers by reference, and by const reference if they should be read-only, like

void function (const juce::AudioBuffer<float>& buffer)
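A small sketch to make the difference visible. CountingBuffer is a hypothetical stand-in that owns heap memory and counts its copies; it is not juce::AudioBuffer, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer class that owns heap memory.
// It is NOT juce::AudioBuffer; it only counts how often it gets copied.
struct CountingBuffer
{
    explicit CountingBuffer (std::size_t numSamples) : samples (numSamples, 0.0f) {}

    CountingBuffer (const CountingBuffer& other) : samples (other.samples)
    {
        ++copyCount(); // copying the vector allocates and copies all samples
    }

    static int& copyCount()
    {
        static int count = 0;
        return count;
    }

    std::vector<float> samples;
};

// By value: every call copy-constructs the argument (allocation + sample copy).
inline float firstSampleByValue (CountingBuffer buffer)
{
    assert (! buffer.samples.empty());
    return buffer.samples[0];
}

// By const reference: no copy, read-only access to the caller's buffer.
inline float firstSampleByRef (const CountingBuffer& buffer)
{
    assert (! buffer.samples.empty());
    return buffer.samples[0];
}
```

Calling firstSampleByValue increases the copy counter by one per call, while firstSampleByRef leaves it unchanged.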

And another thing to consider if you care about speed: Your code example says

// loop by channel or sample.

If you need to perform sample-wise operations, always loop over channels in the outer loop and over samples in the inner loop. The reason is that the individual samples of a channel are next to each other in memory, which means that after accessing the first sample in a channel the CPU will already have the next few samples in its cache and the upcoming memory accesses will be super fast. Doing it the other way round will create a lot of cache misses, and your CPU will spend more time waiting for data to be loaded from RAM into the CPU cache.
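As a rough sketch of that loop order, assuming a planar layout with one contiguous float array per channel: applyGain and its parameters are hypothetical names for this example, not a JUCE API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example function: scales every sample of every channel.
// 'channels' is an array of per-channel pointers, each pointing to
// 'numSamples' contiguous floats.
inline void applyGain (float* const* channels, std::size_t numChannels,
                       std::size_t numSamples, float gain)
{
    assert (channels != nullptr || numChannels == 0);

    for (std::size_t ch = 0; ch < numChannels; ++ch)     // outer loop: channels
    {
        float* samples = channels[ch];

        for (std::size_t i = 0; i < numSamples; ++i)     // inner loop: samples
            samples[i] *= gain;                          // walks memory linearly
    }
}
```

The inner loop touches consecutive addresses within a single channel, which is the access pattern described above.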


I thought I put a reference in the function in my comment…

First question: would you recommend converting my projects from copying buffers with JUCE to something like memcpy? Or is the difference negligible?

What do you think about passing float pointers instead of references?

Clearing with memset(samples,0,numsamples * sizeof(float));

void function (float* left, float* right) {}

I usually don’t iterate over channels; I process both channels at the same time, for mid/side.

Or is all of this negligible? I’m making a pretty big synth with many buffers and I didn’t know if the JUCE classes would be clunky because of all the extra data.

I thought I made that clear in my post. juce::FloatVectorOperations::copy uses std::memcpy internally. The juce::AudioBuffer and juce::dsp::AudioBlock copy functions use juce::FloatVectorOperations::copy internally. This means that if you just use the JUCE-supplied buffer copy functions, you already use std::memcpy; it’s just hidden away from you. So my advice: just stick to the JUCE functions and you’ll likely get very good performance. In the end, copy operations are among the cheaper things to do in your code anyway, compared with all kinds of math operations.

Short answer: Passing raw pointers is an outdated C technique, passing references is modern C++. Go with references wherever you can. Performance-wise this doesn’t make any difference; a reference is equivalent to a pointer in the final machine code. But you preserve a lot of information, like the buffer size, that would otherwise get lost or would have to be passed as an extra argument, which is error-prone.
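A tiny sketch of that difference, using std::vector<float> as a stand-in for a buffer class that knows its own size. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer style: the length travels as a separate argument and nothing
// ties it to the memory behind 'samples'. A wrong count is undefined behaviour.
inline void clearRaw (float* samples, std::size_t numSamples)
{
    assert (samples != nullptr || numSamples == 0);

    for (std::size_t i = 0; i < numSamples; ++i)
        samples[i] = 0.0f;
}

// Reference style: the container carries its own size, so the caller
// cannot pass a length that does not match the data.
inline void clearByRef (std::vector<float>& samples)
{
    for (float& sample : samples)
        sample = 0.0f;
}
```

With clearRaw the caller has to get the count right on every call; with clearByRef there is simply no count to get wrong.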


i can not help you all with solving this issue as i am also puzzled by the challenges with the different threads and processing sequences and stuff, but i can tell you that 1. this is possible and 2. it is reasonable. here’s why:

  1. this is possible because apparently it is done already in plugins like smart eq or pro-q3 in order to share the signals of all instances of the plugins in the whole project without setting up any sidechaining. so apparently that’s a thing and we just don’t know how it works.
  2. it is reasonable. same plugins as an example: can be helpful if you need more than 1 sidechain input for a certain feature, like in pro-q’s example the ability to compare spectrums in order to detect frequency masking. it would even be reasonable if you only want 1 sidechain input, because in some DAWs it’s kind of a pain to set that up and would be cool if the plugins just made it easier

I’m not familiar with pro eq.

I would assume that if they are passing anything, it is only parameters, and that they are writing them to some sort of XML file at a location agreed upon by all plugins.

Regardless, getting a sidechain working was a hassle, as I remember. I think it only involves the constructor and the isBusesLayoutSupported function, and then grabbing the read pointers in the block. I’ll have to reference my old code to see…

idk if the instances really share whole signals but it’s definitely more than parameter values. you see the spectrums of different tracks are compared in realtime to check for frequency masking. so it must be at least some sort of almost realtime-ish FFT representation of the signals. and it’s probably similar with smart EQ3, where the spectral information is compared in order to come up with EQ curves that push some things to the foreground and others to the background.

so idk if you guys want to keep on discussing this “automatic” sidechain stuff or not, but definitely keep in mind it’s possible to some extent

Maybe it has something to do with multi producer buffers in this…

I’m not sure if this thread is really helpful to anyone anymore, because I see two completely separated issues discussed here:

  • “What’s the fastest way of copying audio buffers?”: A rather low-level optimisation question, but definitely something that you should be aware of when writing high-performance DSP code
  • “How to send audio from plugin instance A to B in real-time without using the host’s sidechain facility”: A super specific question about creating a data channel that none of the plugin APIs out there is designed to provide. There are a lot of challenges involved here, but trust me, finding the most efficient buffer copy strategy is really not one of the main problems that you’ll face here.

One final statement on this from my side: I am part of the team that built smart:EQ 3. For obvious reasons I won’t go into detail here on how exactly we made it work :wink: But as said above, inter-plugin communication is by no means something the plugin formats intend to make possible. So if you want something like that to work smoothly and in a real-time safe way, you need to create some management facility that runs in parallel to the host and makes sure that

  • All plugin instances that belong to a certain session are identified and known
  • Data is sent from the right source plugin to the right destination plugin
  • Timestamps of the exchanged data match
  • Multi-threaded rendering and out-of-process plugin hosting are no problem for you

This is really no trivial task; you should have some deeper insight into the whole technology stack that you are working with if you want to create something like that on your own. No offence, but if someone can’t answer the question about the fastest way to copy float buffers on their own, they are probably not experienced enough yet to set up a reliable inter-plugin communication implementation, and I’d recommend starting with slightly less complicated projects to build up their skills.


it’s still valuable that you wrote down this list for everyone, even if OP shouldn’t use that approach, because i do see this topic pop up from time to time. also to me personally it is an honor to actually gain a bit of insight into one of the plugins that fascinate me most. now everyone who really needs to implement this monstrosity of a sidechain alternative can come back here to get a starting point, which is cool