Convolution reverb with the JUCE Convolution class (high CPU usage... why?)

Hi,

I would like to develop a convolution reverb and tried it with the juce::dsp::Convolution class. It works fine, but the convolution process function kills my CPU. It seems that the CPU usage increases when I make the impulse response (source) buffer longer. My DSP process function processes 64 samples per call. Let me show you my code…

In the constructor I generate a test buffer filled with decaying noise like this:

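const int TEST_SIZE = 2048 * 8; // IR length in samples (16384, about 0.37 s at 44.1 kHz)
testBuffer.setSize(2, TEST_SIZE);

float* tL = testBuffer.getWritePointer(0);
float* tR = testBuffer.getWritePointer(1);
juce::Random random;

for (int i = 0; i < TEST_SIZE; ++i)
{
	// linear fade-out so the noise sounds like a decaying tail
	const float multiply = 1.0f - (float)i / (float)TEST_SIZE;
	// bipolar noise in [-1, 1); unipolar noise would add a large DC component
	tL[i] = (random.nextFloat() * 2.0f - 1.0f) * multiply;
	tR[i] = (random.nextFloat() * 2.0f - 1.0f) * multiply;
}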

Also in the constructor I initialise the convolution (config::BUFFER_LENGTH = 64, the block size of my DSP loop):

juce::dsp::ProcessSpec specs;
specs.maximumBlockSize = config::BUFFER_LENGTH;
specs.numChannels = 2;
specs.sampleRate = 44100;
juceConvolution.prepare(specs);
juceConvolution.copyAndLoadImpulseResponseFromBuffer(testBuffer, 44100, true, TEST_SIZE);

The convolution is now ready. In the DSP function I call the Convolution's process():

void Convolution::process(AudioSampleBuffer& buffer)
{
   // ....
   juce::dsp::AudioBlock<float> block(buffer);
   juce::dsp::ProcessContextReplacing<float> context(block);
   juceConvolution.process(context);
   // ....
}

It works and sounds like a noise reverb, but it takes more than 15% CPU in Release mode and over 80% in Debug. If I make the test buffer longer, the CPU usage increases as well. That’s strange, because when I load convolution VSTs from other companies, I can choose WAV files as long as I want and the CPU usage doesn’t increase; it stays constant. What am I doing wrong here?

I haven’t studied the Juce convolution algorithm in detail but it may be the kind where the CPU use increases as the impulse response length increases. (There are various ways to do the convolution including ones where the CPU use does not increase much as the IR length increases.) There isn’t really anything you could do in that case except to find another convolution algorithm that has different CPU use characteristics. What happens if you increase the processing buffer size from 64 samples to some higher value?

Hello!

Well, I’m not sure you can really find any convolution reverb plug-in where the CPU load doesn’t increase with the IR size!

The convolution algorithm in JUCE is a state-of-the-art uniformly partitioned convolution algorithm. That means it is not really suited for IRs longer than about 1 second; those require non-uniformly partitioned algorithms, which is what virtually every convolution reverb plug-in uses.
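To see why the cost follows the IR length, here is a rough back-of-the-envelope model of uniformly partitioned overlap-save convolution. It is a sketch under my own assumptions (FFT size twice the block size, one forward and one inverse FFT per block, one complex multiply-accumulate per bin per IR partition), not a measurement of JUCE's implementation; constant factors and SIMD are ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Rough per-sample cost of uniformly partitioned overlap-save (UPOLS) convolution.
// Assumptions (not taken from JUCE's source): FFT size = 2 * blockSize, one forward and
// one inverse FFT per block, one complex multiply-accumulate per frequency bin per partition.
struct UniformCost
{
    std::size_t partitions;      // ceil(irLength / blockSize)
    double complexMacsPerSample; // spectral work, grows linearly with irLength
    double fftOpsPerSample;      // ~N log2 N per FFT, independent of irLength
};

UniformCost estimateUniformCost (std::size_t irLength, std::size_t blockSize)
{
    assert (blockSize > 0);
    UniformCost c {};
    c.partitions = (irLength + blockSize - 1) / blockSize;

    // non-redundant bins of a real FFT of size 2 * blockSize
    const double bins = double (blockSize + 1);
    c.complexMacsPerSample = double (c.partitions) * bins / double (blockSize);

    // two FFTs (forward + inverse) per block, amortised over blockSize output samples
    const double fftSize = 2.0 * double (blockSize);
    c.fftOpsPerSample = 2.0 * fftSize * std::log2 (fftSize) / double (blockSize);
    return c;
}
```

For the 16384-sample test IR at 64 samples per block this gives 256 partitions and 260 complex multiply-accumulates per output sample and channel; at 1024 samples per block it drops to 16 partitions and about 16 per sample, paid for with extra latency if the host keeps sending 64-sample buffers. A 5-second IR at 44.1 kHz with 64-sample blocks needs 3446 partitions, which is why uniform partitioning alone struggles with long reverbs.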

The reason we decided not to include such an algorithm in JUCE is that a lot of (more or less) troll patents cover nearly all the ways to do a proper non-uniformly partitioned convolution algorithm, and until we can see what is possible in an open-source library without getting calls from an army of lawyers, we won’t be able to include it in JUCE, I’m afraid.

If you still want to use JUCE for your convolution reverb, you might want to increase the block size the algorithm processes and handle the additional latency yourself. Or you might want to try another convolution library. Moreover, if you are testing on Windows, the CPU load might decrease if you install Intel MKL and use it through the JUCE FFT class wrapper (on macOS, vDSP is used automatically).

Otherwise, the difference in CPU load between Debug and Release configurations is normal; it’s collateral damage from using SIMD-accelerated instructions in the Convolution processing code.


Thanks for the answers. That’s strange. Try the convolver in FL Studio. It’s a convolution reverb, and it doesn’t matter how long the IR you load is. Just for fun I created a 30-minute WAV file and used it in the convolver as the IR. The CPU usage doesn’t increase: only 1% and (almost) no latency. Hmm.
I will try to find another algorithm.

Sounds suspicious 🙂 How do you know it’s 1%? Are you looking at the CPU load in FL Studio or in a task manager? Since non-uniformly partitioned convolution involves a background thread, it is (more than) possible that this 1% is just the audio thread part (the first second) and not the other threads’ part (the 30 minutes minus 1 second 🙂).

I get similar results with Reaper’s ReaVerb: I can load a 60-second IR (sadly, the longest ReaVerb allows) and the CPU usage is under 1% during audio playback in Process Explorer (a Task Manager replacement that shows more accurate information).

Edit: Hah, better yet, with 2 instances of the ReaVerb plugin using the 60-second IR, the CPU use still doesn’t rise above 1%, so I guess most of that 1% is just Reaper’s general audio processing going on… The plugins show ~0.1% CPU use in Reaper’s own performance meters.

In ReaVerb, I think you need to enable the ZL and LL options for a fair comparison; otherwise the algorithm works with high latency, which is a very good way to reduce the CPU consumption of a convolution algorithm.

Anyway, at some point I’d really like to do some testing with non-uniform partitioning to get this kind of performance, but first I have to clear up all the doubts about patents and licensing…


I have the same issue on Windows: I get 90% to 100% CPU load in FL Studio. But what is weird is that in Ableton I get more or less 10% CPU load. Did anyone figure out what is causing this issue?

These kinds of discrepancies often occur when “denormals” are enabled. It’s possible that FL Studio does not disable them while Ableton does, and the algorithm may produce denormals, which bring everything to a crawl.
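A minimal illustration in plain C++ rather than JUCE: an exponentially decaying signal (like a reverb tail fading into silence) eventually passes through the subnormal range, and arithmetic on those values can be much slower on many x86 CPUs. In a JUCE plug-in the usual fix is a juce::ScopedNoDenormals object at the top of processBlock, which sets the CPU's flush-to-zero flags for that scope; this sketch instead flushes tiny values in software so it stays portable. The 1e-15 threshold is an arbitrary choice of mine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// True if x is a subnormal ("denormal") float.
bool isDenormal (float x) { return std::fpclassify (x) == FP_SUBNORMAL; }

// Simulates a decaying tail: each sample is the previous one times `decay`.
// Without flushing, the values walk down through the subnormal range before reaching 0.
std::vector<float> decayingTail (std::size_t numSamples, float decay, bool flushTinyValues)
{
    std::vector<float> out (numSamples);
    float y = 1.0f;

    for (auto& s : out)
    {
        y *= decay;

        if (flushTinyValues && std::abs (y) < 1.0e-15f) // software flush-to-zero (threshold is arbitrary)
            y = 0.0f;

        s = y;
    }
    return out;
}

// Counts how many samples of a signal are subnormal.
std::size_t countDenormals (const std::vector<float>& signal)
{
    std::size_t n = 0;
    for (float s : signal)
        n += isDenormal (s) ? 1 : 0;
    return n;
}
```

With decay 0.5 and no flushing, the tail spends a couple of dozen samples in the subnormal range before underflowing to zero (unless the compiler or host has already enabled hardware flush-to-zero); with flushing it goes straight from about 1e-15 to 0.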


AU or VST/VST3?
Are you comparing the same format on both?

FL Studio is also well known for its strange behaviour regarding variable input buffer sizes. The convolution algorithm is supposed to do FFTs as rarely as possible, but that’s not possible if the host makes tons of small-buffer process calls, which is probably the cause of this.

Apparently the convolver included in FL Studio works properly in this context, but I wouldn’t be surprised if convolution software from other companies exhibited the same CPU load increase.

On our side, the only way to remove that problem would, in my opinion, be to replace the zero-latency behaviour with an optional fixed-latency scheme. I’m not sure there is another solution, unfortunately, but I could be wrong…


Yeah, FL Studio is using a random number generator to figure out how many samples you get in your processBlock callback, which is toxic for any fixed-size algorithm. I am using something like this in my codebase:

if (juce::PluginHostType().isFruityLoops())
{
    processWithLatencyThroughFIFO(); // collect fixed-size blocks, report the extra latency
}
else
{
    processNormal();
}
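A possible shape for the processWithLatencyThroughFIFO() idea: the function name comes from the post, but the class below is my own hypothetical mono sketch. It collects whatever the host sends into a FIFO, runs the fixed-size algorithm only on complete blocks, and outputs the previous block's result. That adds exactly blockSize samples of latency, which a JUCE plug-in would report with AudioProcessor::setLatencySamples().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical fixed-block adapter: accepts host blocks of any size (even 1 sample)
// and calls the wrapped processor only with exactly blockSize samples.
class FixedBlockFifo
{
public:
    using BlockProcessor = std::function<void (float* block, std::size_t numSamples)>;

    FixedBlockFifo (std::size_t blockSizeToUse, BlockProcessor processorToUse)
        : blockSize (blockSizeToUse), processor (std::move (processorToUse)),
          inputFifo (blockSizeToUse, 0.0f), outputFifo (blockSizeToUse, 0.0f)
    {
        assert (blockSize > 0);
    }

    std::size_t getLatencyInSamples() const { return blockSize; }

    // Processes numSamples samples in place, whatever size the host block is.
    void process (float* data, std::size_t numSamples)
    {
        for (std::size_t i = 0; i < numSamples; ++i)
        {
            const float in = data[i];
            data[i] = outputFifo[position]; // result computed from the previous full block
            inputFifo[position] = in;

            if (++position == blockSize)    // a full block has been collected
            {
                processor (inputFifo.data(), blockSize);
                outputFifo.swap (inputFifo); // processed block becomes the next output
                position = 0;
            }
        }
    }

private:
    std::size_t blockSize;
    BlockProcessor processor;
    std::vector<float> inputFifo, outputFifo;
    std::size_t position = 0;
};
```

For the convolution case, blockSize would be the maximumBlockSize the convolution engine was prepared with, so the engine always sees full blocks no matter how erratic the host's buffer sizes are.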

Regarding patent trolls, there is a convolution library on GitHub that does use non-uniformly partitioned convolution:

While I understand the caution of not wanting to deal with these kinds of people, I seriously doubt that a technique this obvious can be used for serious patent claims.

Also note that almost every convolution reverb that uses 0.1% CPU for a 17-minute impulse response is most probably offloading the heavy work to a background thread.


FL Studio is a PITA. I was testing a 64-bit VST2 on Windows. FL Studio uses some horrible block sizes; sometimes I even get 1 sample to process. What they recommend for “buggy” plugins is to activate the “fixed buffer size” option, and I also increased the buffer size in the audio preferences. Although this is not a real solution, it at least works.

I have tested that library; at the time I was wondering why someone would use a variable block size in a convolution library (now I know, lol). But the coding style is quite horrible and it has a GPL license, so using it is not an option.

The license is MIT (or BSD, can’t tell from just the license text, lol) according to the COPYING.TXT file.

Regarding coding style, it was one of the reasons I switched from WDL. A convolution library is somewhat of a black box (“here are samples, give me the output”), so you don’t care too much about API consistency, but I found it really easy to integrate and the background threading just works.

I know of a few commercial projects that use this library.


Greetings,

I can recommend this expired patent as a good way to go for non-uniformly partitioned low-latency FFT convolution:

https://patents.google.com/patent/US6625629B1

I assume Bill Gardner’s (earlier) methods are in the clear by now as well. As I recall, the earliest patents were by Lake Engineering, and those should be expired.
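As a general illustration only (not the method of the linked patent, and not JUCE's implementation), here is a minimal sketch of a Gardner-style non-uniform layout as it is commonly described: segment sizes B, B, 2B, 2B, 4B, 4B, …, capped at a maximum block size. Each segment of size s would be convolved with its own FFT block of s samples, so the number of segments, rather than the IR length divided by the small block size, drives the per-sample work. The cap and the helper names are my own assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One segment of a non-uniformly partitioned impulse response.
struct Segment
{
    std::size_t offset; // first IR sample covered by this segment
    std::size_t size;   // segment length = FFT block size used for it (FFT size 2 * size)
};

// Gardner-style doubling layout (as commonly described): sizes headSize, headSize,
// 2 * headSize, 2 * headSize, ..., capped at maxSize, until the IR is covered.
// The last segment may extend past the IR end; that part is treated as zero padding.
std::vector<Segment> gardnerLayout (std::size_t irLength, std::size_t headSize, std::size_t maxSize)
{
    assert (headSize > 0 && maxSize >= headSize);
    std::vector<Segment> layout;
    std::size_t offset = 0, size = headSize, countAtSize = 0;

    while (offset < irLength)
    {
        layout.push_back ({ offset, size });
        offset += size;

        if (++countAtSize == 2 && size < maxSize) // double after every second segment
        {
            size = std::min (size * 2, maxSize);
            countAtSize = 0;
        }
    }
    return layout;
}
```

With a 64-sample head and an 8192-sample cap, a 5-second IR at 44.1 kHz needs 39 segments instead of 3446 uniform 64-sample partitions. In this layout every segment after the first starts at an offset at least as large as its own size, which leaves time to compute it; the large segments are the ones usually moved to a background thread or spread over several callbacks.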

Cheers,
Julius


Hello Julius, and happy to see you here! We met a few times at AES conferences when I was still a PhD candidate, and I’m a big fan of all your work, the DSP documentation on your websites, and everything you have been working on in general 😉

About the convolution class in JUCE: the current version, which I worked on with the JUCE team (who greatly improved the threading behaviour), uses either a specific type of non-uniformly partitioned convolution called canonical (since JUCE 6) or uniform partitioning (available since JUCE 5), depending on the settings. The canonical algorithm is the fastest possible for the mid-to-high latencies set by the plug-in user in their DAW, but it can occasionally lead to dropouts if the DAW latency is very low. It’s still a big improvement over the JUCE 5 version, because the engine can now deal with long IRs, for example for reverb modeling.

I’m currently working on a version that distributes the processing to remove the dropout issues, either with an additional processing thread (like in the LoFI-HiFi convolver, for example) or with only one thread. Unfortunately I can’t say when, or whether, it’s going to be available in JUCE, since I’m just a contractor.


Shameless promotion:
I can offer a commercial license for my zero-latency, non-uniform, multichannel, multiplatform real-time convolution, already proven to work in a number of commercially available products.


Hey everyone. I’m trying out the non-uniform partitioning in juce::dsp and it doesn’t appear to help CPU usage for longer reverbs/IRs. I would have assumed this would make it a viable option for a reverb? I’ve set the head size to 512 and 1024 and neither seems to help much. I’ve been working on a Gardner-style implementation. Is it worth just cracking on with my own implementation for a reverb, or is the JUCE one going to be best? I want a maximum of about 5 s of reverb, let’s say.
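One possible reason, sketched under an assumption I have not verified against JUCE's source: suppose the non-uniform mode is a two-stage split, where the first headSize IR samples use partitions of the host block size and the rest is uniformly partitioned with headSize-sample blocks, all computed on the audio thread. Then the average work drops a lot, but the callback in which a tail block completes still does almost as much work as the fully uniform engine, and that peak is what causes dropouts at low latencies. The model counts spectral multiply-accumulates only and ignores FFT cost.

```cpp
#include <cassert>
#include <cstddef>

std::size_t ceilDiv (std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// Average complex multiply-accumulates per output sample for uniform partitioning.
double uniformMacsPerSample (std::size_t irLength, std::size_t blockSize)
{
    return double (ceilDiv (irLength, blockSize)) * double (blockSize + 1) / double (blockSize);
}

// Hypothetical two-stage split: head at the host block size, tail at headSize.
double twoStageMacsPerSample (std::size_t irLength, std::size_t hostBlock, std::size_t headSize)
{
    assert (hostBlock > 0 && headSize >= hostBlock);
    const std::size_t head = headSize < irLength ? headSize : irLength;
    const std::size_t tail = irLength - head;
    return uniformMacsPerSample (head, hostBlock)
         + (tail > 0 ? uniformMacsPerSample (tail, headSize) : 0.0);
}

// Worst single callback: if the tail stage runs on the audio thread, all of its
// spectral work lands in one callback every headSize / hostBlock callbacks.
double twoStagePeakMacsPerCallback (std::size_t irLength, std::size_t hostBlock, std::size_t headSize)
{
    assert (hostBlock > 0 && headSize >= hostBlock);
    const std::size_t head = headSize < irLength ? headSize : irLength;
    const std::size_t tail = irLength - head;
    const double headWork = double (ceilDiv (head, hostBlock)) * double (hostBlock + 1);
    const double tailWork = tail > 0 ? double (ceilDiv (tail, headSize)) * double (headSize + 1) : 0.0;
    return headWork + tailWork;
}
```

For a 5-second IR at 44.1 kHz with 64-sample host blocks and a 1024-sample head, the average drops from about 3500 to about 231 multiply-accumulates per sample, yet the peak callback still does about 221,000 against about 224,000 for the uniform engine. If this model is close, a CPU meter that tracks the worst callback would show little improvement, which is why spreading the tail work over time or onto another thread matters.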

Why does the convolution spoil the read pointer? I wanted to mix with the dry signal, took getReadPointer(), and got the wet convolution signal instead. I had to make a separate buffer and copy the dry signal into it first.
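The read pointer and the write pointer of an AudioBuffer channel point to the same memory, so with juce::dsp::ProcessContextReplacing the wet output overwrites the dry input in place, and copying the dry signal first is the right fix. JUCE also offers juce::dsp::ProcessContextNonReplacing (separate input and output blocks) and, since JUCE 6, juce::dsp::DryWetMixer for this job. The plain C++ sketch below uses a hypothetical two-tap filter as a stand-in for the convolution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an in-place ("replacing") processor such as a convolution engine:
// the output overwrites the input. Hypothetical effect: y[n] = 0.5 * (x[n] + x[n-1]).
void processReplacing (std::vector<float>& buffer)
{
    float previous = 0.0f;
    for (auto& s : buffer)
    {
        const float x = s;
        s = 0.5f * (x + previous);
        previous = x;
    }
}

// Dry/wet mix on an in-place buffer: the dry copy must be taken *before* processing.
void processWithDryWet (std::vector<float>& buffer, float wetAmount)
{
    const std::vector<float> dry (buffer); // in a plug-in, preallocate this in prepareToPlay
    processReplacing (buffer);             // buffer now holds only the wet signal

    for (std::size_t i = 0; i < buffer.size(); ++i)
        buffer[i] = (1.0f - wetAmount) * dry[i] + wetAmount * buffer[i];
}
```

The sketch allocates the dry copy on every call for brevity; in real-time audio code the copy buffer should be allocated once up front and reused.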