A binaural panner by convolution


#1

Hello JUCE experts.
I am trying to implement a simple binaural panner in JUCE.
I am just using an HRTF data set provided by one of my professors.
When I test my plugin with Reaper, it sounds like it is working but I hear some weird noise.
I have been trying to fix it but could not find out why.
Here is a block of my code (processBlock function).
void YoonBinauralPannerAudioProcessor::processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages)
{
//ScopedNoDenormals noDenormals;
//auto totalNumInputChannels = getTotalNumInputChannels();
//auto totalNumOutputChannels = getTotalNumOutputChannels();

auto x_L = buffer.getReadPointer(0);
//auto x = x_L;
auto bufferL = buffer.getWritePointer(0);
auto bufferR = buffer.getWritePointer(1);
const auto Length = buffer.getNumSamples();

for(auto i = 0; i<Length; i++)
{
    for(auto j = 0; j<Length; j++)
    {
        bufferL[i] += hrir_l[theta][j]*x_L[(i+(Length-j))%Length];
        bufferR[i] += hrir_r[theta][j]*x_L[(i+(Length-j))%Length];
     }
}

}

So, I am just trying to convolve hrir arrays with the x_L buffer.
Is it possible that the noise is caused by the number of computations in the for loops?
Thanks in advance.


#2

Short answer: your convolution approach makes no sense. Two obvious mistakes. First, the HRIRs I know can be quite long, but at least their length is fixed. The Length variable you're using in the inner loop, however, depends on the length of the current block, which might be anything. The chance that it's exactly the length of your HRIR is quite limited, so using it for this loop doesn't make any sense. Second, a convolution needs to take your sample history into account, which your approach doesn't do either.
I'd advise you to use the JUCE dsp::Convolution class instead, and maybe first read some literature on real-time convolution.


#3

Thanks for the advice. I have only just started learning JUCE, so there are probably lots of things I get wrong about it. I first printed out the length of the buffers and found that it was equal to 512. Do you mean that it is not consistent when I test it as an audio plugin? That's why I generated a text file that contains 512 samples of the data set as 2D arrays (the hrir_l and hrir_r arrays above).


#4

The length of the buffer is determined by the audio driver and/or the host application and can be pretty much anything: 1 sample, 3 samples, 256 samples, 441 samples, 20007 samples, etc. You saw the number 512 just by chance in your particular audio device and host configuration. Your code needs to be able to deal with any buffer size. (In some hosts the buffer size can also vary between calls to processBlock.)


#5

I think you are having several separate problems here. First one: Understanding of what happens in processBlock

So, why did you use const auto Length = buffer.getNumSamples(); and not const int Length = 512; if you are sure that it will always return 512 and your code relies on that assumption? The answer is that buffer.getNumSamples() obviously can return any number of samples, and your code should be designed in such a way that it can handle any number of samples. Don't expect anything, just be prepared for every value. If your HRIRs have a length of 512 samples and you expect buffer.getNumSamples() to return 512 in every situation, your code is likely to break the moment it returns a value greater than 512, and it will produce an unwanted result if the value is smaller. So iterating over Length in the inner loop doesn't make any sense, while the outer loop is fine, as it just handles the number of samples passed to it.

Second: Understanding how to do real-time convolution.
If you convolve an input signal with a length of 10 samples with an impulse response of length 512, the resulting output will be 10 + 512 - 1 = 521 samples long. Now think of a signal with a length of 20 samples, convolved with the same impulse response. This will result in 20 + 512 - 1 = 531 output samples. What block-based audio processing does is split the data up into blocks. Suppose you want to process your 20 input samples in two blocks of 10 samples but still expect the correct output. To do that, you need to compute all 521 output samples of the first block, write the first 10 into your output buffer and store the remaining 511 samples for later use. Then compute the second block and add the 511 stored samples from the previous block to the first 511 samples of the new 521-sample result. Again return the first 10 samples and store the rest for later use. You have now returned two times 10 samples and still have 511 in memory, which is 10 + 10 + 511 = 531 and therefore matches the theoretical result for a 20-sample input signal. Of course this also works with a variable-sized input buffer. If the next call gives you 500 samples, just return 500 samples and again store the remaining 511 for the next block, and so on. But if you don't store any data, the result will be anything but the expected convolution.

Third: Proper way of loading a HRIR.
Embedding the HRIR arrays in a static header file works well for a quick test. However, in a real-world application you would probably want to load a variable-sized HRIR from a .wav file: read the file(s) at startup and then work with the content as you just did. This makes your application flexible enough to work with any HRIR.

Fourth: Length of your HRIRs.

Where do you get your HRIRs from? Did you measure them yourself or did you use HRIRs from an existing database? Normally you should choose the length of your HRIR based on the original data's length and not pick 512 samples just because processBlock seems to always use that number of samples. That said, I know of significantly longer AND shorter HRIRs, depending on the measurement setup. So before artificially making your original data longer by zero-padding (a waste of resources) or, even worse, just cutting off the end of your original data, you should stick to the number of samples supplied by your database.

Fifth: Time-Domain-Convolution.
Doing such a convolution in the time domain is possible, but in most real-world applications you would rather compute it in the frequency domain and therefore use HRTFs instead of HRIRs. I would advise you to take a look at that topic. And even if you want to stick to time-domain convolution, you should read up on efficient ways to implement it with ring buffers and other tricks. Or just stick to the JUCE classes such as dsp::FIR (time domain) or dsp::Convolution (frequency domain).

If you need any deeper information on that topic, I did my bachelor thesis on real-time binaural synthesis. This publication helped me a lot to get started: Partitioned convolution algorithms for real-time auralization. And if you happen to understand German, just drop me a DM and I can send you some extracts of my bachelor thesis.


#6

Thank you so much for your detailed explanation.
I see what I am doing wrong. As you said it actually does not make any sense.
Actually, I implemented a binaural panner on a DSP board once and it is sample-based.
And I used the following code for convolution part. In a sample-based case it works.
[Image: screenshot of the sample-based convolution code]
But my code above does not even follow this method lol. Let me just spend more time on this. If I get stuck I will leave a comment again. Thank you so much.