Is there a good way to do parallel processing with the dsp module? In particular, I have a bunch of dsp::IIR filters that I want to run in parallel.
Look at juce::dsp::SIMDRegister.
It’s not completely clear to me what you want.
Do you want your filters to handle multiple channels, i.e. multiple parallel filters, one per audio channel, all with the same parameters? Or do you want to split the input signal into multiple copies, process each copy in parallel, and then join the results back together, e.g. by summing the paths?
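To make the first option concrete: one filter per channel with shared parameters amounts to keeping a separate state for each channel. Below is a minimal JUCE-free sketch of that idea. MultiChannelOnePole and its members are hypothetical names, and a one-pole lowpass stands in for a real IIR filter. In JUCE itself, juce::dsp::ProcessorDuplicator is the class meant for this case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example type: y[n] = a * x[n] + (1 - a) * y[n - 1], per channel
struct MultiChannelOnePole
{
    float a = 1.0f;             // parameter shared by all channels
    std::vector<float> states;  // one filter state per channel

    void prepare (std::size_t numChannels)
    {
        states.assign (numChannels, 0.0f);
    }

    // channels[c] points to numSamples samples of channel c, processed in place
    void process (float* const* channels, std::size_t numSamples)
    {
        assert (channels != nullptr);

        for (std::size_t c = 0; c < states.size(); ++c)
        {
            for (std::size_t n = 0; n < numSamples; ++n)
            {
                states[c] = a * channels[c][n] + (1.0f - a) * states[c];
                channels[c][n] = states[c];
            }
        }
    }
};
```

Each channel only ever touches its own state, so the channels stay independent even though they share the coefficient.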
I want to split the input signal into copies, send one to each filter, and then join them after.
You can use a juce::dsp::ProcessContextNonReplacing to use one block as input and another as output.
You’ll need to preallocate all the extra buffers up front and must not resize them in processBlock(), since resizing allocates memory on the audio thread. Use your input block as the source, then run each filter with one of the other blocks as its target. At the end, sum them back up and compensate for the level change.
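The steps above can be sketched without JUCE. This is only an illustration under stated assumptions: OnePole and ParallelSum are hypothetical names, plain std::vector buffers stand in for the preallocated blocks, and scaling by 1/N is just one possible way to account for level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a real filter: y[n] = a * x[n] + (1 - a) * y[n - 1]
struct OnePole
{
    float a = 1.0f;
    float state = 0.0f;

    // Non-replacing: reads from in, writes to out
    void process (const float* in, float* out, std::size_t numSamples)
    {
        for (std::size_t n = 0; n < numSamples; ++n)
        {
            state = a * in[n] + (1.0f - a) * state;
            out[n] = state;
        }
    }
};

class ParallelSum
{
public:
    explicit ParallelSum (const std::vector<float>& coeffs)
    {
        for (auto a : coeffs)
        {
            OnePole f;
            f.a = a;
            filters.push_back (f);
        }
    }

    // Call once before processing: the only place where buffers are allocated
    void prepare (std::size_t maximumBlockSize)
    {
        scratch.assign (filters.size(), std::vector<float> (maximumBlockSize, 0.0f));
    }

    // numSamples may be smaller than maximumBlockSize; nothing is resized here
    void process (const float* in, float* out, std::size_t numSamples)
    {
        assert (! scratch.empty() && numSamples <= scratch.front().size());

        // Every filter reads the same input and writes to its own scratch buffer
        for (std::size_t i = 0; i < filters.size(); ++i)
            filters[i].process (in, scratch[i].data(), numSamples);

        // Sum the paths and scale by 1/N (assumed level compensation)
        const float gain = 1.0f / static_cast<float> (filters.size());

        for (std::size_t n = 0; n < numSamples; ++n)
        {
            float sum = 0.0f;

            for (const auto& s : scratch)
                sum += s[n];

            out[n] = sum * gain;
        }
    }

private:
    std::vector<OnePole> filters;
    std::vector<std::vector<float>> scratch;
};
```

Because the output is written only after every filter has read the input, in and out may point to the same memory.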
Assuming you meant “split the input signal into copies” and meant summing by “join”, a helper class like this could work for you (untested, just quickly written for the sake of the example):
#include <array>

template <class ProcessorType, size_t numProcessors>
class ParallelProcessors
{
public:
    static_assert (numProcessors > 0, "ParallelProcessors needs at least one processor");

    void prepare (const juce::dsp::ProcessSpec& spec)
    {
        // Allocate the temp buffers once, sized for the largest expected block
        for (size_t i = 0; i < numTempBuffers; ++i)
            tempBuffers[i] = juce::dsp::AudioBlock<float> (tempBuffersMemory[i], spec.numChannels, spec.maximumBlockSize);

        for (auto& p : processors)
            p.prepare (spec);
    }

    template <class ProcessContext>
    void process (const ProcessContext& context)
    {
        const auto numSamples = context.getInputBlock().getNumSamples();

        // The current context might hold smaller blocks than the one that we allocated.
        // This is why we make temporary sub-blocks from the pre-allocated temp buffers
        std::array<juce::dsp::AudioBlock<float>, numTempBuffers> processingTempBlocks;

        // Let the first processors process with a non-replacing context that writes to the
        // temp buffers
        for (size_t i = 0; i < numTempBuffers; ++i)
        {
            processingTempBlocks[i] = tempBuffers[i].getSubBlock (0, numSamples);
            processors[i].process (juce::dsp::ProcessContextNonReplacing<float> (context.getInputBlock(), processingTempBlocks[i]));
        }

        // Let the last processor process the context passed in so that it writes to the
        // desired output block. It runs last so that a replacing context does not
        // overwrite the input before the other processors have read it
        processors.back().process (context);

        // Accumulate the temporary block data into the output buffer
        for (auto& tempBlock : processingTempBlocks)
            context.getOutputBlock() += tempBlock;
    }

    /** The actual processors */
    std::array<ProcessorType, numProcessors> processors;

private:
    static constexpr auto numTempBuffers = numProcessors - 1;

    std::array<juce::HeapBlock<char>, numTempBuffers> tempBuffersMemory;
    std::array<juce::dsp::AudioBlock<float>, numTempBuffers> tempBuffers;
};
It simply wraps a desired number of processors of the same type, makes them work on the same input signal and takes care of allocating and managing some temporary buffers.
Example usage could look like this:

// An example instance of four IIRs.
// Note: juce::dsp::IIR::Filter only processes mono signals, so this example assumes
// spec.numChannels == 1 and a single-channel context
ParallelProcessors<juce::dsp::IIR::Filter<float>, 4> fourParallelIIRs;

// Assigning some coefficients to each instance by accessing the processors array
for (auto& iir : fourParallelIIRs.processors)
    *iir.coefficients = juce::dsp::IIR::ArrayCoefficients<float>::makeBandPass (sampleRate, frequency);

// Preparing the parallel chain
fourParallelIIRs.prepare (spec);

// Processing the parallel chain with some process context
fourParallelIIRs.process (context);
Hope that gives you a good starting point. The code is of course subject to tweaking depending on your specific needs and, as I said, untested, but I’ve implemented something like that a few times successfully.
Took a minute to wrap my brain around it but it works! Thank you!