Help needed with reverberation problem

I am currently dabbling with an idea for a new DAW mixing process, and want to test-drive it by means of a proof-of-concept implementation. I am usually quite capable of doing so, but I have one gap in my understanding, and I would welcome anyone who could offer me some help on the problem described further down the page.

Before I get into the actual problem, I think it is helpful to understand what exactly I am trying to accomplish, and why. So let’s start with the why.

If you look at the state-of-the-art workflow in DAWs, the mixing process is mostly concerned with treating individual signals, and uses reverb and the spatial relationships between signals more as an afterthought. In the real world, however, no (natural) sound source is heard without the room and its spatial orientation colouring it from the start; this is because we live in a 3D world, and anything in that 3D world can generate sound. Another thing is that reverbs are quite unintuitive: the presented parameters often don't map directly to human spatial perception, but rather describe the properties of the model used to generate the reverberation. Of course it is possible to get convincing results by tweaking by ear without watching the controls, but often this is more hit-and-miss than deliberate design of the space the sound lives in.

So, what I am trying to build is a DAW application that takes the spatial relationships between sound sources and receivers into account right from the start. This means the first step in the workflow is not going to be putting in the kick and listening to it ad nauseam in isolation, but rather creating a room and its properties and a listener position, then bringing in the kick, positioning it, and going from there. I hope this little description of the workflow makes the principle a bit more tangible.

So, technically, this involves designing reverbs, and by reverbs I mean convolution reverberation. The process I am going to stick to for the moment is to generate the impulse response(s) for the room being designed offline, meaning pre-calculate them at some point, by means of geometry and applying the laws of optics to sound propagation in the room. This is simple, although computationally expensive, and I see no problem doing that.
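For illustration, here is a stripped-down sketch of the kind of pre-calculation I mean: an image-source style approach for a simple box-shaped room. The dimensions, positions and reflectivity are just placeholder numbers, and a real implementation would at least need fractional-delay placement and frequency-dependent absorption, but it shows the principle.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Vec3 { double x, y, z; };

// Mirror-image position along one axis: even image indices keep the source
// coordinate, odd indices flip it inside the tiled room.
static double imageCoord(int n, double roomDim, double srcCoord)
{
    return n * roomDim + ((n % 2 == 0) ? srcCoord : roomDim - srcCoord);
}

std::vector<float> imageSourceIR(const Vec3& room, const Vec3& src, const Vec3& mic,
                                 double sampleRate, int maxOrder, double reflectivity)
{
    const double c = 343.0;  // speed of sound, m/s
    const double diag = std::sqrt(room.x * room.x + room.y * room.y + room.z * room.z);
    const double maxDist = 2.0 * (maxOrder + 1) * diag;          // crude upper bound
    std::vector<float> ir(static_cast<size_t>(maxDist / c * sampleRate) + 1, 0.0f);

    for (int nx = -maxOrder; nx <= maxOrder; ++nx)
      for (int ny = -maxOrder; ny <= maxOrder; ++ny)
        for (int nz = -maxOrder; nz <= maxOrder; ++nz) {
            const double dx = imageCoord(nx, room.x, src.x) - mic.x;
            const double dy = imageCoord(ny, room.y, src.y) - mic.y;
            const double dz = imageCoord(nz, room.z, src.z) - mic.z;
            const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);

            const int bounces = std::abs(nx) + std::abs(ny) + std::abs(nz);
            const double gain = std::pow(reflectivity, bounces) / std::max(dist, 0.1);

            // Nearest-sample placement; a real implementation would interpolate.
            const size_t sample = static_cast<size_t>(dist / c * sampleRate);
            if (sample < ir.size())
                ir[sample] += static_cast<float>(gain);
        }
    return ir;
}

int main()
{
    // Toy example: 6 x 4 x 3 m room, arbitrary source and mic positions.
    const std::vector<float> ir = imageSourceIR({6.0, 4.0, 3.0}, {2.0, 1.5, 1.2},
                                                {4.0, 2.5, 1.6}, 48000.0, 8, 0.8);
    std::printf("IR length: %zu samples\n", ir.size());
    return 0;
}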

Assuming for the moment we have two listeners (a stereo mic configuration in the room), and eight sources (let’s assume them to be mono), this would result in sixteen impulse responses to convolve at runtime to place everything correctly into the same listening space. This is computationally intensive, but nevertheless doable on a modern machine, and the implementation is quite straightforward.
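Just to make the bookkeeping concrete, the runtime structure I have in mind looks roughly like this (naive time-domain convolution only to keep the sketch self-contained and compilable; the real engine would obviously use FFT-based convolution, and all data here is placeholder):

#include <array>
#include <cstdio>
#include <vector>

// Plain time-domain convolution, just so the sketch compiles on its own.
std::vector<float> convolve(const std::vector<float>& x, const std::vector<float>& h)
{
    std::vector<float> y(x.size() + h.size() - 1, 0.0f);
    for (size_t i = 0; i < x.size(); ++i)
        for (size_t j = 0; j < h.size(); ++j)
            y[i + j] += x[i] * h[j];
    return y;
}

int main()
{
    constexpr size_t kSources = 8, kChannels = 2;    // 8 mono sources, stereo mic pair

    // irs[s][c] is the IR from source s to receiver channel c: 16 IRs in total.
    std::array<std::array<std::vector<float>, kChannels>, kSources> irs;
    std::array<std::vector<float>, kSources> sources;
    for (size_t s = 0; s < kSources; ++s) {
        sources[s].assign(256, 0.0f);
        for (size_t c = 0; c < kChannels; ++c)
            irs[s][c] = {1.0f};                      // identity IR as a placeholder
    }

    // Mix: each source is convolved with its two IRs and summed onto the stereo bus.
    std::array<std::vector<float>, kChannels> bus;
    for (size_t c = 0; c < kChannels; ++c)
        for (size_t s = 0; s < kSources; ++s) {
            const std::vector<float> wet = convolve(sources[s], irs[s][c]);
            if (bus[c].size() < wet.size()) bus[c].resize(wet.size(), 0.0f);
            for (size_t i = 0; i < wet.size(); ++i) bus[c][i] += wet[i];
        }

    std::printf("stereo bus length: %zu samples\n", bus[0].size());
    return 0;
}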

Where I am stuck is the actual deconvolution of the room to generate the impulse response, and the actual convolution of signals. It's not that I don't understand in theory how this works, or the math behind it, but I find it quite hard to translate from the equations to an algorithm that applies them. Of course I have looked everywhere I could think of for examples and solutions on how to do this, but it seems that every paper published and every article written on the subject either assumes you already know how to write that code, or expects you to use an existing piece of software from a third party. The abundantly available signal processing libraries are no great help either: I haven't found a single one that actually does deconvolution, and only a few that do convolution at all. They all seem to be in love with FIR and IIR implementations and content to use ready-made IRs.

What I am asking for

... is someone who can help me implement convolution and deconvolution algorithms.

What I am not asking for

... is someone to implement it for me (I wouldn’t mind, but I won’t ask for it).

... is a pointer to musicdsp.org or dspguide.com, I know those sites, thank you very much.

... are comments questioning the feasibility of the actual project.

... are lame questions on why I’m not using reverb plugin XXX instead.

... are similarly lame questions on why yet another DAW app.

It's not a DAW (yet), and it is a completely new concept. I don't expect it to be perfect the first time around, and I don't intend it to be feature-complete enough to even call it a proper DAW for a long time. Most of all, I'm doing this for fun and curiosity.

If you have nothing to say that can actually help me out, I'd really rather concentrate on answers that help me progress with this than on bullshitting.

Okay, a few days later I think I figured out what I need to do.
However, I have another question.

When doing FFT convolution, is there any advantage to convolving the complex FFT results compared to convolving just the real components of the result? I would believe so, but maybe someone who has actually implemented this can give me a pointer on whether it contributes significantly to the accuracy of the reproduction?

*edit*
Just for clarification: I am not worried about the influence of noise. Since I am creating an artificial room, and my aim is to determine the impulse response of that room for a given source and receiver position in that room, it can be considered a noise-free environment, apart from the single-precision floating-point errors introduced into the calculations.

 

Hi lucem,

Basically, I find your idea pretty cool, but also very challenging, at least as far as I imagine the final system. :)

When doing FFT convolution, is there any advantage to convolving the complex FFT results compared to convolving just the real components of the result? I would believe so, but maybe someone who has actually implemented this can give me a pointer on whether it contributes significantly to the accuracy of the reproduction?

When doing FFT convolution, you preferably do a real-to-complex FFT, i.e. your input signal is purely real, but the output of the FFT is complex (you could also perform a complex-to-complex FFT, but as your data is purely real, your imaginary parts would be zero, so you would waste a lot of calculation time). Then you perform the complex multiplication, which considers both the real and the imaginary parts, and do a complex-to-real FFT to obtain the purely real final result. However, to achieve high speed and low latency, you need something like partitioned convolution with uniform, or even better non-uniform, block sizes; otherwise your latency is as long as your impulse response.
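For the basic (non-partitioned) case, a minimal sketch of the whole chain could look like the following. For brevity it uses a plain complex-to-complex radix-2 FFT instead of the real-to-complex variant described above, so it is neither fast nor low-latency; it only shows where the complex multiplication happens.

#include <cmath>
#include <complex>
#include <cstdio>
#include <utility>
#include <vector>

using cpx = std::complex<double>;
const double kPi = 3.141592653589793;

// In-place radix-2 Cooley-Tukey FFT; the size must be a power of two.
// sign = -1 gives the forward transform, +1 the (unscaled) inverse.
static void fft(std::vector<cpx>& a, int sign)
{
    const size_t n = a.size();
    for (size_t i = 1, j = 0; i < n; ++i) {          // bit-reversal permutation
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    for (size_t len = 2; len <= n; len <<= 1) {      // butterflies
        const double ang = sign * 2.0 * kPi / static_cast<double>(len);
        const cpx wlen(std::cos(ang), std::sin(ang));
        for (size_t i = 0; i < n; i += len) {
            cpx w(1.0, 0.0);
            for (size_t k = 0; k < len / 2; ++k) {
                const cpx u = a[i + k];
                const cpx v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Linear convolution of two real signals via the FFT: zero-pad both to the same
// power-of-two length, transform, multiply the spectra bin by bin (a full complex
// multiplication), transform back and keep the real part.
std::vector<double> fftConvolve(const std::vector<double>& x, const std::vector<double>& h)
{
    const size_t outLen = x.size() + h.size() - 1;
    size_t n = 1;
    while (n < outLen) n <<= 1;

    std::vector<cpx> X(n, cpx(0.0, 0.0)), H(n, cpx(0.0, 0.0));
    for (size_t i = 0; i < x.size(); ++i) X[i] = cpx(x[i], 0.0);
    for (size_t i = 0; i < h.size(); ++i) H[i] = cpx(h[i], 0.0);

    fft(X, -1);
    fft(H, -1);
    for (size_t i = 0; i < n; ++i) X[i] *= H[i];     // the complex multiplication
    fft(X, +1);

    std::vector<double> y(outLen);
    for (size_t i = 0; i < outLen; ++i) y[i] = X[i].real() / static_cast<double>(n);
    return y;
}

int main()
{
    // Smoke test: convolving with a one-sample delay just shifts the signal.
    for (double v : fftConvolve({1.0, 2.0, 3.0}, {0.0, 1.0})) std::printf("%f\n", v);
    return 0;
}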

I've implemented such an FFT convolver (currently GPL-licensed), so if you want to use it for your hobby or open source project, please feel free to give it a try:

https://github.com/HiFi-LoFi/FFTConvolver

It is actually pretty fast (partitioned convolution, SSE optimized) and comes without any further dependencies, so it should be easy to add to your project (otherwise, please let me know). However, you should know how partitioned convolution basically works in order to configure the block size parameters reasonably.
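A rough usage sketch, written from memory, so please double-check the actual class and method names against the headers and README before relying on it:

#include "FFTConvolver.h"   // from the repository above

#include <vector>

int main()
{
    // Some impulse response; here just a 1-second identity IR as a placeholder.
    std::vector<float> ir(48000, 0.0f);
    ir[0] = 1.0f;

    fftconvolver::FFTConvolver convolver;
    convolver.init(512, ir.data(), ir.size());       // 512-sample partitions

    std::vector<float> input(512, 0.0f), output(512, 0.0f);
    convolver.process(input.data(), output.data(), input.size());  // once per audio block
    return 0;
}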

You can also find here a pretty fast, cross-platform real-to-complex FFT implementation in C++, including some time measurements, which you might find useful if you want to implement your own convolution (it uses either Ooura, FFTW3 or vDSP, depending on platform and configuration):

https://github.com/HiFi-LoFi/AudioFFT
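Roughly like this (again from memory, so please verify against the header):

#include "AudioFFT.h"   // from the repository above

#include <vector>

int main()
{
    const size_t fftSize = 1024;   // must be a power of two

    std::vector<float> input(fftSize, 0.0f);
    std::vector<float> re(audiofft::AudioFFT::ComplexSize(fftSize));
    std::vector<float> im(audiofft::AudioFFT::ComplexSize(fftSize));
    std::vector<float> output(fftSize);

    audiofft::AudioFFT fft;
    fft.init(fftSize);
    fft.fft(input.data(), re.data(), im.data());     // real -> complex (re/im halves)
    fft.ifft(output.data(), re.data(), im.data());   // complex -> real
    return 0;
}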

 

When it comes to the deconvolution, i.e. the IR creation part, I guess that is the even harder job. Just for an idea:

https://ccrma.stanford.edu/realsimple/imp_meas/imp_meas.pdf
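The core of most of these approaches is a division in the frequency domain. A very rough sketch, assuming you already have the two spectra (e.g. from an FFT of equal-length, zero-padded buffers), with a small regularisation term so that near-zero bins don't blow up:

#include <complex>
#include <cstdio>
#include <vector>

// Estimate the transfer function H = Y / X from the spectrum Y of the signal
// recorded at the receiver and the spectrum X of the excitation that was played.
// The inverse FFT of H (keeping the real part) then gives the impulse response.
std::vector<std::complex<double>>
deconvolve(const std::vector<std::complex<double>>& Y,
           const std::vector<std::complex<double>>& X,
           double epsilon = 1e-12)
{
    std::vector<std::complex<double>> H(Y.size());
    for (size_t i = 0; i < Y.size(); ++i) {
        // Wiener-style division: Y * conj(X) / (|X|^2 + epsilon)
        const double denom = std::norm(X[i]) + epsilon;
        H[i] = Y[i] * std::conj(X[i]) / denom;
    }
    return H;
}

int main()
{
    // Toy check: if Y == X, the estimated transfer function is ~1 in every bin.
    const std::vector<std::complex<double>> X = {{1.0, 0.0}, {0.5, -0.5}, {2.0, 1.0}};
    for (const auto& h : deconvolve(X, X))
        std::printf("%f %+fi\n", h.real(), h.imag());
    return 0;
}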

Maybe going with some already available IRs or some external tools is a way to go for a very first "proof of concept" prototype...

Regards, and good luck with your project!

Uli

 

 

Thanks :) Of course I know your project - and I'd use it for the actual realtime part any time. However, the part I am talking about is the IR generation, or rather the generation of IRs, as basically every audio source in the virtual room (and every channel of it) has its own calculated impulse response, depending on the room geometry, the source position/orientation and the receiver (mic) position and orientation. For lots of channels this will probably mean it's not feasible to do in real time, but I guess one could freeze the tracks one isn't working on at the time and solve it that way. I can imagine doing lots of fun stuff with this concept, but before I start thinking deeply about that, I want the core of it working properly, which is room generation and correct spatial positioning.