Q about best place in signal chain for a couple functions

Hi everyone! I’m working on a pitch-shifting plugin that uses an ESOLA shifting algorithm, implemented in a custom rewrite of the Synthesiser class to allow polyphony. Within each individual voice, the ESOLA algorithm requires detecting the input pitch (fundamental frequency) and the locations of the signal’s epochs.

I already have my detectPitch() and extractEpochSampleIndices() functions written. My question is: would it be wiser to put these two functions inside the Synthesiser class’s renderVoices(), or at the top level of my processBlock(), before calling the Synthesiser’s renderNextBlock()?

to clarify…

Option 1:

void AudioProcessor::processBlock (juce::AudioBuffer<float>& buffer, juce::MidiBuffer& midiMessages)
{
    // wetBuffer and inputMidi come from elsewhere in the processor (not shown here)
    synth.renderNextBlock (buffer, 0, buffer.getNumSamples(), wetBuffer, inputMidi);
}

and then inside renderNextBlock(), the Synthesiser class breaks the input buffer down into smaller chunks between MIDI messages and calls this function on each chunk:

void Synthesiser::renderVoices (AudioBuffer<float>& inputAudio, const int startSample, const int numSamples, AudioBuffer<float>& outputBuffer)
{
    // epochIndices, currentInputFreq and sampleRate are members of the synth
    epochIndices = extractEpochSampleIndices (inputAudio, startSample, numSamples, sampleRate);
    currentInputFreq = findPitch (inputAudio, startSample, numSamples, sampleRate);

    for (auto* voice : voices)
    {
        voice->updateInputFreq (currentInputFreq);
        voice->renderNextBlock (inputAudio, startSample, numSamples, outputBuffer, epochIndices);
    }
}

Option 2:

void AudioProcessor::processBlock (juce::AudioBuffer<float>& buffer, juce::MidiBuffer& midiMessages)
{
    // these two would be custom functions that take the pitch & epoch data and propagate it to each synth voice:
    synth.updateInputPitch (findPitch (buffer, 0, buffer.getNumSamples(), getSampleRate()));
    synth.updateEpochs (extractEpochSampleIndices (buffer, 0, buffer.getNumSamples(), getSampleRate()));

    synth.renderNextBlock (buffer, 0, buffer.getNumSamples(), wetBuffer, inputMidi);
}

and then renderVoices() would simply be:

void Synthesiser::renderVoices (AudioBuffer<float>& inputAudio, const int startSample, const int numSamples, AudioBuffer<float>& outputBuffer)
{
    for (auto* voice : voices)
        voice->renderNextBlock (inputAudio, startSample, numSamples, outputBuffer, epochIndices);
}
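
In case it helps clarify Option 2, here’s roughly what I have in mind for those two custom propagation functions (just a sketch; I’m assuming findPitch() returns the frequency in Hz and that the epoch indices live in a std::vector<int> member, matching the epochIndices member used above):

void Synthesiser::updateInputPitch (float newPitchHz)
{
    currentInputFreq = newPitchHz;

    // push the detected pitch down to every voice before rendering
    for (auto* voice : voices)
        voice->updateInputFreq (currentInputFreq);
}

void Synthesiser::updateEpochs (std::vector<int> newEpochIndices)
{
    // stored as a member so renderVoices() can hand it to each voice
    epochIndices = std::move (newEpochIndices);
}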

Is there any practical difference between these two approaches, in terms of performance or stability?

The only potential concern I have is this: because the Synthesiser class’s renderNextBlock() breaks the input buffer into small chunks in between MIDI messages, if my detectPitch() and extractEpochSampleIndices() live inside renderVoices(), they may be handed chunks too short to correctly detect the pitch/epochs… But I could just use setMinimumRenderingSubdivisionSize() to enforce a large enough minimum block size…
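
For reference, assuming my rewrite keeps the stock Synthesiser’s setMinimumRenderingSubdivisionSize() behavior, that would just be a one-liner in prepareToPlay(), something like this (the 256-sample minimum is only a placeholder value):

void AudioProcessor::prepareToPlay (double sampleRate, int samplesPerBlock)
{
    synth.setCurrentPlaybackSampleRate (sampleRate);

    // never let renderNextBlock() subdivide the buffer into chunks smaller than 256 samples,
    // so the pitch/epoch analysis always sees a usable window
    synth.setMinimumRenderingSubdivisionSize (256);
}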

I was thinking that putting these functions in renderVoices() might keep the analysis more tightly synchronized with the rendering, but perhaps that’s not the case. There’s also the possibility that the input pitch varies over the course of the input buffer, so detecting pitch in smaller chunks may actually be desirable…

Sorry for the long post. Thanks for reading!

Does the synth class typically take audio input?

Seems a bit confusing that a synth would take an input block?

No, the synth class normally doesn’t.

My particular use case is that I’m building a vocal harmonizer instrument that will shift input audio to desired pitches, and I would like polyphony, so I’m basically doing a custom rewrite of the Synthesiser class.

I tried to simplify my code and explanation a bit for the forums because I know this is kind of an exotic use case ¯\_(ツ)_/¯


Just taking a look at the paper, it looks super interesting and I’d love to hear the results. I’ve written classic OLA algorithms and am working on a phase vocoder in my free time, so I’d love to hear how this sounds compared to them.

Here are a couple of notes from this short glance:

  1. Your analysis is at the mercy of a lot of factors running it this way: host block size and MIDI timing. If you’re not streaming the audio into the synth in real time, and it’s a true instrument rather than an effect, you’d have a better time doing the analysis on the entire audio file when it’s loaded into the plugin, rather than trying to cope with the hurdles of variable block sizes.

  2. The JUCE synth should be more than equipped to handle polyphony out of the box, so I’m a bit confused about that.

  3. You can hold your sample in one place and give each voice a shared_ptr to it, so each voice can access the data it needs as it needs it.

I would have something more like:

void sampleLoaded()
{
    // perform the pitch/epoch analysis once, over the entire sample

    for (auto* voice : voices)
    {
        voice->setAnalysisData (analysisData);   // whatever the analysis produced
        voice->setSample (sample);               // a std::shared_ptr to the loaded sample
    }
}
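
A nice side effect of the shared_ptr is that the sample stays alive for as long as any voice still holds a reference to it, so you don’t have to worry about lifetime if a new sample gets loaded mid-note.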

If you’re streaming the audio input into the synth, doesn’t that mean the voices won’t each have their own ability to “restart” the sample independently of one another?

So if you swapped to this method, you could have polyphony, pitch shifting, individual trigger control and so on, much more like a classic synth?


Thanks for the response, it’s cool to hear from someone who’s done this sort of thing!

Yes, the audio will be streaming into the plugin in real time from the host; my entire goal with this project is to create a plugin that can be used for live performance.

You’re right that the analysis is at the mercy of a lot of factors… I’m leaning toward doing the analysis in the top-level processBlock as opposed to the renderVoices function…

Yes, the JUCE Synthesiser class can handle polyphony right out of the box. I decided to do a custom rewrite for two reasons: I have a lot of custom things I need to do that are different from the JUCE class’s behavior, and also, I’m a JUCE beginner and it’s teaching me a lot to pull apart the JUCE code and interrogate why it’s written the way it is.

In your reply, are you using “sample” to mean “a prerecorded bit of audio that’s triggered for playback”?

Because my plugin doesn’t deal with “samples” in that sense at all; it simply takes the input audio buffer from the host and runs a shifting algorithm on it (ESOLA in my case).

I see what you mean now.

So it’s an effects plugin but you’re using a custom version of the synth to manage your layers.

It’s a very cool idea and I wish you luck with it! I don’t see any problem with the approach, and I can see how there would be a lot of benefit in using the MIDI and voice-handling features of the Synthesiser to do what you want.

In any case, with regard to your original question: there’s really no practical difference between those two approaches. In most DSP code, saving CPU on processing that can’t be removed comes down to one question:

Do I do this once per block, or once per sample? In both of your options you’re doing it once per block, so there’s really no difference either way.
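
To make that concrete, here’s a rough sketch using the function names from your snippets (purely illustrative, not your actual code):

void AudioProcessor::processBlock (juce::AudioBuffer<float>& buffer, juce::MidiBuffer& midiMessages)
{
    const int numSamples = buffer.getNumSamples();

    // once per block: a single analysis pass over the whole buffer (what both of your options do)
    const float pitch = findPitch (buffer, 0, numSamples, getSampleRate());
    synth.updateInputPitch (pitch);

    // once per sample: repeating the same analysis inside a per-sample loop would cost
    // roughly numSamples times as much, and that is the situation worth avoiding:
    //
    // for (int i = 0; i < numSamples; ++i)
    //     findPitch (buffer, i, 1, getSampleRate());
}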

Good luck with it!


OK great, thank you for the help :slightly_smiling_face: