Realtime Rubberband Pitchshifter - Working Example

Hi All,

I’ve been thinking about having a go with Rubber Band for a while, to use its pitch shifting in a plugin. I’ve seen various posts on the forum saying this can’t be done because of the several-hundred-millisecond delay etc., so I thought I would give it a go anyway and post some functional code here, as I haven’t seen any JUCE examples. In return, there are a couple of issues that I hope someone with a little more knowledge of Rubber Band could help me with, as I have only spent a day with the thing.

My Aim
I wanted to create a pitch shifter with the possibility of using fast modulation.

My Implementation
Once I had Rubber Band up and running, using the process() and retrieve() functions, I noticed that the latency was incredibly inconsistent. Just because you feed the required number of samples into the input doesn’t mean you will get the same number of samples (no more, no fewer) out the other end. This led to Rubber Band’s internal buffers filling up at lower pitches and emptying faster at higher pitches. Using ring buffers on their own doesn’t fix this extreme latency problem either; over time I found the latency just kept creeping up.

I looked at their example Linux plugin to see how they deal with this, and the solution is to modulate the stretch (time) ratio depending on the number of samples available in the output ring buffer. After a bit of trial and error I came up with two “modes”: a low-latency one that doesn’t mind the buffers running nearly empty, and another suited to smooth modulation (the output empties rapidly when you change pitch, so the extra latency gives a bit more leeway). Here comes the first problem… Although the smooth mode prevents any tearing caused by running out of output samples, and is fine on many sources, with some sources (synths with lots of harmonics) there is a vinyl-like crackle when changing the pitch (more prominent when the change is gradual), which I think is coming from inside Rubber Band. Does anyone have suggestions for different Rubber Band settings that might eliminate this (possibly internal) crunchiness?
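The ratio control described above can be isolated as a small pure function. This is a sketch, not code from the post: the name targetTimeRatio is hypothetical, and it only reproduces the three-way decision that processBuffer() (in the code below) makes before smoothing the ratio with timeSmoothing. In Rubber Band a time ratio above 1.0 produces more output than input, which is why an emptying output buffer gets 1.1 and an overfull one gets 0.9.

```cpp
#include <cassert>

// Hypothetical helper (not from the post): picks the target time ratio from the
// number of samples waiting in the output ring buffer, using the same values
// as processBuffer(). A ratio above 1.0 stretches (more output per input
// sample); below 1.0 compresses (less output per input sample).
float targetTimeRatio (int readSpace, int smallestAcceptable, int largestAcceptable)
{
    assert (smallestAcceptable <= largestAcceptable);

    if (readSpace < smallestAcceptable)
        return 1.1f;   // Output running dry: stretch to refill it.

    if (readSpace > largestAcceptable)
        return 0.9f;   // Output backing up: compress to drain it and cut latency.

    return 1.0f;       // Inside the acceptable window: no correction.
}
```

With maxSamples = 256 in the smooth mode (thresholds 512 and 1024), a read space of 100 gives 1.1, 700 gives 1.0 and 1500 gives 0.9. The gap between the two thresholds acts as a dead band, so small fluctuations in the buffer fill don’t keep nudging the ratio.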

The other issue is the potentially random amount of latency. I just wanted to check I hadn’t missed something that would make it a little more constant. I’d like to improve it by putting a variable delay line on the dry signal and monitoring the latency of the wet signal, so that the two line up reasonably well when dryCompensationDelay is enabled. At the moment it’s not very good for a percussive dry/wet mix, but it works for other sources!
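The variable dry delay mentioned above could start as a plain integer delay line whose length is set from a measured wet latency. The class below is a hypothetical single-channel sketch, not part of the posted code; in JUCE, juce::dsp::DelayLine or DryWetMixer::setWetLatency() would be the natural building blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel integer delay line for the dry path.
// The delay (in samples) would be set from a measurement of the wet latency.
class DryDelayLine
{
public:
    explicit DryDelayLine (int maxDelaySamples)
        : buffer (static_cast<std::size_t> (maxDelaySamples) + 1, 0.0f) {}

    void setDelay (int newDelay)
    {
        assert (newDelay >= 0 && newDelay < static_cast<int> (buffer.size()));
        delay = newDelay;
    }

    // Writes one input sample and returns the sample written 'delay' calls ago.
    float processSample (float in)
    {
        const int size = static_cast<int> (buffer.size());
        buffer[static_cast<std::size_t> (writePos)] = in;

        int readPos = writePos - delay;
        if (readPos < 0)
            readPos += size;              // Wrap around the circular buffer.

        const float out = buffer[static_cast<std::size_t> (readPos)];
        writePos = (writePos + 1) % size;
        return out;
    }

private:
    std::vector<float> buffer;
    int writePos = 0;
    int delay = 0;
};
```

Jumping the delay from one value to another will click, so a real implementation would probably ramp or crossfade between delay lengths when the measured wet latency changes.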

To get it to work I also created a simple RingBuffer class that gives me the buffer pointers I need etc… It could be improved to make it a little more “safe”, but it seems to work.

My implementation seems to have pretty good latency but I haven’t done extensive testing at different buffer sizes.

Anyway, here’s my code!

Pitch shifter class:

#include <JuceHeader.h>
#include "rubberband/RubberBandStretcher.h"
#include "RingBuffer.h"

class PitchShifter
{
 public:
    /** Set up the pitch shifter. By default the dry signal is not delayed. Enabling dryCompensationDelay delays the dry signal to give it a latency somewhat similar to the wet signal's (this is not accurate!). Enabling minLatency reduces latency at the expense of potential tearing when the pitch parameter is modulated.
     */
    PitchShifter(int numChannels, double sampleRate, int samplesPerBlock, bool dryCompensationDelay=false, bool minLatency=false)
    {
        rubberband = std::make_unique<RubberBand::RubberBandStretcher>((size_t) sampleRate, (size_t) numChannels, RubberBand::RubberBandStretcher::Option::OptionProcessRealTime | RubberBand::RubberBandStretcher::Option::OptionPitchHighConsistency, 1.0, 1.0);   // Options are bit flags, so combine them with |.
        //rubberband->setMaxProcessSize(samplesPerBlock);
        initLatency = (int) rubberband->getLatency();
        maxSamples = 256;

        input.initialise(numChannels, sampleRate);
        output.initialise(numChannels, sampleRate);
        
        juce::dsp::ProcessSpec spec;
        spec.maximumBlockSize = samplesPerBlock;
        spec.numChannels = numChannels;
        spec.sampleRate = sampleRate;
        if (dryCompensationDelay)
        {
            dryWet = std::make_unique<juce::dsp::DryWetMixer<float>>(samplesPerBlock * 3.0 + initLatency);
            dryWet->prepare(spec);
            dryWet->setWetLatency(samplesPerBlock * ((minLatency) ? 2.0 : 3.0) + initLatency);
        } else
        {
            dryWet = std::make_unique<juce::dsp::DryWetMixer<float>>();
            dryWet->prepare(spec);
        }
        
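        timeSmoothing.reset(sampleRate, 0.05);
        mixSmoothing.reset(sampleRate, 0.3);
        pitchSmoothing.reset(sampleRate, 0.1);
        
        pitchSmoothing.setCurrentAndTargetValue(1.0);                      // Start at unity so the first ramps don't begin at 0 (the linear SmoothedValue default).
        timeSmoothing.setCurrentAndTargetValue(1.0);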
        
        if (minLatency)
        {
            smallestAcceptableSize = maxSamples * 1.0;
            largestAcceptableSize = maxSamples * 3.0;
        } else
        {
            smallestAcceptableSize = maxSamples * 2.0;
            largestAcceptableSize = maxSamples * 4.0;
        }
    }
    
    ~PitchShifter()
    {
        
    }
    
    /** Pitch shift a juce::AudioBuffer<float>
     */
    void processBuffer (juce::AudioBuffer<float>& buffer)
    {
        dryWet->pushDrySamples(buffer);
        
        pitchSmoothing.setTargetValue(powf(2.0, pitchParam / 12));          // Convert semitone value into pitch scale value.
        auto newPitch = pitchSmoothing.skip(buffer.getNumSamples());
        if (oldPitch != newPitch)
        {
            rubberband->setPitchScale(newPitch);
            oldPitch = newPitch;
        }

        for (int sample = 0; sample < buffer.getNumSamples(); sample++) {   // Loop to push samples to input buffer.
            for (int channel = 0; channel < buffer.getNumChannels(); channel++) {
                input.pushSample(buffer.getSample(channel, sample), channel);
                buffer.setSample(channel, sample, 0.0);
                
                if (channel == buffer.getNumChannels() - 1) {
                    auto reqSamples = rubberband->getSamplesRequired();
                    
                    if (reqSamples <= input.getAvailableSamples(0)) {       // Check to trigger rubberband to process when full enough.
                        auto readSpace = output.getAvailableSamples(0);
                        
                        if (readSpace < smallestAcceptableSize) {           // Compress or stretch time when output ring buffer is too full or empty.
                            timeSmoothing.setTargetValue(1.1);
                        } else if (readSpace > largestAcceptableSize) {
                            timeSmoothing.setTargetValue(0.9);
                        } else {
                            timeSmoothing.setTargetValue(1.0);
                        }
                        rubberband->setTimeRatio(timeSmoothing.skip((int) reqSamples));
                        rubberband->process(input.readPointerArray((int) reqSamples), reqSamples, false);   // Process stored input samples.
                    }
                }
            }
        }
        
        auto availableSamples = rubberband->available();
        
        if (availableSamples > 0) {                                         // If rubberband samples are available then copy to the output ring buffer.
            rubberband->retrieve(output.writePointerArray(), availableSamples);
            output.copyToBuffer(availableSamples);
        }
        
        auto availableOutputSamples = output.getAvailableSamples(0);        // Copy samples from output ring buffer to output buffer where available.
        for (int channel = 0; channel < buffer.getNumChannels(); channel++) {
            for (int sample = 0; sample < buffer.getNumSamples(); sample++) {
                if (output.getAvailableSamples(channel) > 0) {
                    buffer.setSample(channel, ((availableOutputSamples >= buffer.getNumSamples()) ? sample : sample + buffer.getNumSamples() - availableOutputSamples), output.popSample(channel));
                }
            }
        }
        
        if (pitchParam == 0 && mixParam != 100.0) {                         // Ensure no phasing with mix occurs when pitch is set to +/-0 semitones.
            mixSmoothing.setTargetValue(0.0);
        } else
        {
            mixSmoothing.setTargetValue(mixParam/100.0);
        }
        dryWet->setWetMixProportion(mixSmoothing.skip(buffer.getNumSamples()));
        dryWet->mixWetSamples(buffer);                                      // Mix in the dry signal.
    }
    
    /** Set the wet/dry mix as a % value.
     */
    void setMixPercentage(float newPercentage)
    {
        mixParam = newPercentage;
    }
    
    /** Set the pitch shift in semitones.
     */
    void setSemitoneShift(float newShift)
    {
        pitchParam = newShift;
    }
    
    /** Get the % value of the wet/dry mix.
     */
    float getMixPercentage()
    {
        return mixParam;
    }
    
    /** Get the pitch shift in semitones.
     */
    float getSemitoneShift()
    {
        return pitchParam;
    }
    
    /** Get the estimated latency. This is an average guess of latency with no pitch shifting but can vary by a few buffers. Changing the pitch shift can cause less or more latency.
     */
    int getLatencyEstimationInSamples()
    {
        return maxSamples * 3.0 + initLatency;
    }
 
 private:
    std::unique_ptr<RubberBand::RubberBandStretcher> rubberband;
    RingBuffer input, output;
    juce::AudioBuffer<float> inputBuffer, outputBuffer;
    int maxSamples = 0, initLatency = 0, smallestAcceptableSize = 0, largestAcceptableSize = 0;
    float oldPitch = 1.0f, pitchParam = 0.0f, mixParam = 100.0f;      // Initialised so processBuffer() never reads indeterminate values.
    std::unique_ptr<juce::dsp::DryWetMixer<float>> dryWet;
    juce::SmoothedValue<float> timeSmoothing, mixSmoothing, pitchSmoothing;
};
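The line pitchSmoothing.setTargetValue(powf(2.0, pitchParam / 12)) converts a shift in semitones into the frequency ratio that setPitchScale() takes. Pulled out as standalone helpers (hypothetical names, not from the post), the conversion and its inverse look like this:

```cpp
#include <cassert>
#include <cmath>

// Semitones -> pitch scale (frequency ratio): 2^(semitones / 12).
// +12 semitones doubles the frequency, -12 halves it.
float semitonesToPitchScale (float semitones)
{
    return std::pow (2.0f, semitones / 12.0f);
}

// Pitch scale -> semitones, e.g. for displaying the current shift.
float pitchScaleToSemitones (float pitchScale)
{
    assert (pitchScale > 0.0f);   // A ratio of zero or below has no meaning.
    return 12.0f * std::log2 (pitchScale);
}
```

A fifth up (+7 semitones) gives a pitch scale of about 1.4983.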

Simple ring buffer class:

#pragma once
#include <JuceHeader.h>

class RingBuffer
{
public:
    RingBuffer(){}
    ~RingBuffer(){}
    
    void initialise(int numChannels, int numSamples)
    {
        readPos.resize(numChannels);
        writePos.resize(numChannels);
        
        for (int i = 0; i < readPos.size(); i++)
        {
            readPos[i] = 0.0;
            writePos[i] = 0.0;
        }
        
        buffer.setSize(numChannels, numSamples);
        pointerArrayBuffer.setSize(numChannels, numSamples);
    }
    
    void pushSample(float sample, int channel)
    {
        buffer.setSample(channel, writePos[channel], sample);
        
        if (++writePos[channel] >= buffer.getNumSamples()) {
            writePos[channel] = 0;
        }
    }
    
    float popSample(int channel)
    {
        auto sample = buffer.getSample(channel, readPos[channel]);
        
        if (++readPos[channel] >= buffer.getNumSamples()) {
            readPos[channel] = 0;
        }
        return sample;
    }
    
    int getAvailableSamples(int channel)
    {
        if (readPos[channel] <= writePos[channel]) {
            return writePos[channel] - readPos[channel];
        } else
        {
            return writePos[channel] + buffer.getNumSamples() - readPos[channel];
        }
    }
    
    const float** readPointerArray(int reqSamples)
    {
        for (int sample = 0; sample < reqSamples; sample++) {
            for (int channel = 0; channel < buffer.getNumChannels(); channel++) {
                pointerArrayBuffer.setSample(channel, sample, popSample(channel));
            }
        }
        return pointerArrayBuffer.getArrayOfReadPointers();
    }
    
    float** writePointerArray()
    {
        return pointerArrayBuffer.getArrayOfWritePointers();
    }
    
    void copyToBuffer(int numSamples)
    {
        for (int channel = 0; channel < buffer.getNumChannels(); channel++) {
            for (int sample = 0; sample < numSamples; sample++) {
                pushSample(pointerArrayBuffer.getSample(channel, sample), channel);
            }
        }
    }

private:
    juce::AudioBuffer<float> buffer, pointerArrayBuffer;
    std::vector<int> readPos, writePos;
    JUCE_DECLARE_NON_COPYABLE_WITH_LEAK_DETECTOR (RingBuffer)
};
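David notes that the RingBuffer could be made a little more “safe”. One concrete weakness: getAvailableSamples() returns 0 both when the buffer is empty and when it is completely full, and pushSample() silently overwrites unread data. The class below is a hypothetical sketch of a safer variant, not a drop-in replacement: it keeps a single read/write position shared by all channels (the posted code always moves all channels together), tracks the fill count explicitly, and refuses to overflow or underflow. It assumes single-threaded use, as in processBuffer().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical safer multichannel ring buffer (single-threaded use only).
// All channels share one read and one write position, and an explicit fill
// count distinguishes "full" from "empty".
class SafeRingBuffer
{
public:
    SafeRingBuffer (int numChannels, int capacityInSamples)
        : capacity (capacityInSamples)
    {
        assert (numChannels > 0 && capacityInSamples > 0);
        data.assign (static_cast<std::size_t> (numChannels),
                     std::vector<float> (static_cast<std::size_t> (capacityInSamples), 0.0f));
    }

    int getAvailableSamples() const { return count; }
    int getFreeSpace() const        { return capacity - count; }

    // Writes one frame (one sample per channel). Returns false, and writes
    // nothing, if the buffer is full instead of overwriting unread samples.
    bool pushFrame (const float* frame)
    {
        if (count == capacity)
            return false;

        for (std::size_t ch = 0; ch < data.size(); ++ch)
            data[ch][static_cast<std::size_t> (writePos)] = frame[ch];

        writePos = (writePos + 1) % capacity;
        ++count;
        return true;
    }

    // Reads one frame into 'frame'. Returns false, leaving 'frame' untouched,
    // if there is nothing to read.
    bool popFrame (float* frame)
    {
        if (count == 0)
            return false;

        for (std::size_t ch = 0; ch < data.size(); ++ch)
            frame[ch] = data[ch][static_cast<std::size_t> (readPos)];

        readPos = (readPos + 1) % capacity;
        --count;
        return true;
    }

private:
    int capacity;
    std::vector<std::vector<float>> data;
    int readPos = 0, writePos = 0, count = 0;
};
```

Callers would then check the return value (or getFreeSpace()) and decide explicitly what to do when the stretcher produces more output than fits, rather than corrupting the queue.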

Hope this can help someone! It should be good if you don’t need constant modulation and/or your sources aren’t harmonically rich. If anyone tries it out and has a solution to the slight crunchiness, please let me know!

Thanks,
David


Thanks David! You saved my day!

Hi All, I also implemented a pitch shift plugin using JUCE and Rubber Band; here is the code: GitHub - jiemojiemo/rubberband_pitch_shift_plugin: A Pitch shifter plugin implementation using JUCE and rubberband

Hope it can help you guys.


@jiemojiemo thanks for making a repo out of your solution (I just opened a PR to fix a build issue on Windows). How does your solution compare to @DavidCNAntonia’s, though?

@saintmatthieu I’m fairly sure I’ve made some changes since my original post; see the updated code below. I’m planning at some point soon to give Rubber Band v3 a try, to have a selectable formant option. I expect the CPU use will shoot up, though. This code itself is pretty heavy on CPU when shifting up, as it’s optimised for really low latency (that’s subjective, as users seem happy that it’s low CPU). It is, however, in a commercial product.

David

#pragma once
#include <JuceHeader.h>
#include "rubberband/RubberBandStretcher.h"
#include "RingBuffer.h"

class PitchShifter
{
 public:
    /** Set up the pitch shifter. The dry signal is delayed by latencyInSamples (see getLatency()) so that it roughly lines up with the wet signal; this is an estimate and will not be exact.
     */
    PitchShifter(int numChannels, double sampleRate, int samplesPerBlock)
    {
        rubberband = std::make_unique<RubberBand::RubberBandStretcher>((size_t) sampleRate, (size_t) numChannels, RubberBand::RubberBandStretcher::Option::OptionProcessRealTime | RubberBand::RubberBandStretcher::Option::OptionPitchHighConsistency | RubberBand::RubberBandStretcher::Option::OptionTransientsSmooth | RubberBand::RubberBandStretcher::Option::OptionPhaseIndependent | RubberBand::RubberBandStretcher::Option::OptionFormantPreserved | RubberBand::RubberBandStretcher::Option::OptionChannelsTogether | RubberBand::RubberBandStretcher::Option::OptionWindowShort | RubberBand::RubberBandStretcher::Option::OptionEngineFaster, 1.0, 1.0);   // Bit flags combined with |; OptionEngineFaster requires Rubber Band v3.
        
        initLatency = (int) rubberband->getLatency();
        maxSamples = (int) (sampleRate / 1000.0 * 4.0);                   // 4 ms worth of samples.
        
        input.initialise(numChannels, sampleRate);
        output.initialise(numChannels, sampleRate);
        
        for (int sample = 0; sample < (int) rubberband->getPreferredStartPad(); ++sample) {   // Pre-fill the input with the stretcher's preferred start pad of silence.
            for (int channel = 0; channel < numChannels; channel++) {
                input.pushSample(0.0, channel);
            }
        }
        
        samplesToSkip = (int) rubberband->getStartDelay();
        
        juce::dsp::ProcessSpec spec;
        spec.maximumBlockSize = samplesPerBlock;
        spec.numChannels = numChannels;
        spec.sampleRate = sampleRate;

        timeSmoothing.reset(sampleRate, 0.05);
        mixSmoothing.reset(sampleRate, 0.1);
        pitchSmoothing.reset(sampleRate, 0.0);
        
        pitchSmoothing.setCurrentAndTargetValue(1.0);
        timeSmoothing.setCurrentAndTargetValue(1.0);
        
        smallestAcceptableSize = maxSamples * 0.5;
        largestAcceptableSize = maxSamples * 1.5;
        
        latencyInSamples = (initLatency + maxSamples);
        
        dryWet = std::make_unique<juce::dsp::DryWetMixer<float>>(latencyInSamples * 2);
        dryWet->prepare(spec);
        dryWet->setWetLatency(latencyInSamples);
        
        formantPreserving = true;
    }
    
    ~PitchShifter()
    {
        
    }
    
    void setFormantPreserving(bool shouldPreserveFormants)
    {
        if (shouldPreserveFormants != formantPreserving) {
            formantPreserving = shouldPreserveFormants;
            rubberband->setFormantOption((shouldPreserveFormants) ? RubberBand::RubberBandStretcher::Option::OptionFormantPreserved : RubberBand::RubberBandStretcher::Option::OptionFormantShifted);
        }
    }
    
    int getLatency()
    {
        return latencyInSamples;
    }
    
    /** Pitch shift a juce::dsp::AudioBlock<float> in place.
     */
    void processBuffer (juce::dsp::AudioBlock<float>& block)
    {
        pitchSmoothing.setTargetValue(powf(2.0, pitchParam / 12));          // Convert semitone value into pitch scale value.
            
        if (!pitchSmoothing.isSmoothing())
        {
            if (pitchSmoothing.getCurrentValue() <= 1.0)
            {
                smallestAcceptableSize = maxSamples * 0.5;
                largestAcceptableSize = maxSamples * 1.5;
            } else if (pitchSmoothing.getCurrentValue() > 1.0)
            {
                smallestAcceptableSize = maxSamples * 0.5;
                largestAcceptableSize = maxSamples * 2.5;
            }
        }
        
        dryWet->pushDrySamples(block);
        
        for (int sample = 0; sample < block.getNumSamples(); ++sample) {   // Loop to push samples to input buffer.
            for (int channel = 0; channel < block.getNumChannels(); channel++) {
                input.pushSample(block.getSample(channel, sample), channel);
                block.setSample(channel, sample, 0.0);
                
                if (channel == block.getNumChannels() - 1) {
                    reqSamples = rubberband->getSamplesRequired();
                    
                    if (reqSamples <= input.getAvailableSamples(0)) {       // Check to trigger rubberband to process when full enough.
                        readSpace = output.getAvailableSamples(0);
                        
                        if (readSpace < smallestAcceptableSize) {           // Compress or stretch time when output ring buffer is too full or empty.
                            timeSmoothing.setTargetValue(1.1);
                        } else if (readSpace > largestAcceptableSize) {
                            timeSmoothing.setTargetValue(0.9);
                        } else {
                            timeSmoothing.setTargetValue(1.0);
                        }
                        rubberband->setTimeRatio(timeSmoothing.skip((int) reqSamples));
                        newPitch = pitchSmoothing.skip((int) reqSamples);
                        if (oldPitch != newPitch)
                        {
                            rubberband->setPitchScale(newPitch);
                            oldPitch = newPitch;
                        }
                        rubberband->process(input.readPointerArray((int) reqSamples), reqSamples, false);   // Process stored input samples.
                        
                    }
                }
            }
        }
        
        auto availableSamples = rubberband->available();
        
        if (availableSamples > 0) {                                         // If rubberband samples are available then copy to the output ring buffer.
            rubberband->retrieve(output.writePointerArray(), availableSamples);
            output.copyToBuffer(availableSamples);
        }
        
        auto availableOutputSamples = output.getAvailableSamples(0) - samplesToSkip;        // Copy samples from output ring buffer to output buffer where available.
        if (samplesToSkip > 0)
        {
            int thisSkip = juce::jmin(output.getAvailableSamples(0), samplesToSkip);
            for (int sample = 0; sample < thisSkip; ++sample)
            {
                for (int channel = 0; channel < block.getNumChannels(); ++channel)
                {
                    output.popSample(channel);
                }
                samplesToSkip--;
            }
        }
        
        for (int channel = 0; channel < block.getNumChannels(); ++channel) {
            for (int sample = 0; sample < block.getNumSamples(); ++sample) {
                if (output.getAvailableSamples(channel) > 0) {
                    block.setSample(channel, (int) ((availableOutputSamples >= block.getNumSamples()) ? sample : sample + block.getNumSamples() - availableOutputSamples), output.popSample(channel));
                }
            }
        }
        
        if (pitchParam == 0 && mixParam != 100.0) {                         // Ensure no phasing with mix occurs when pitch is set to +/-0 semitones.
            mixSmoothing.setTargetValue(0.0);
        } else
        {
            mixSmoothing.setTargetValue(mixParam/100.0);
        }
        dryWet->setWetMixProportion(mixSmoothing.skip((int) block.getNumSamples()));
        dryWet->mixWetSamples(block);                                      // Mix in the dry signal.
    }
    
    /** Set the wet/dry mix as a % value.
     */
    void setMixPercentage(float newPercentage)
    {
        mixParam = newPercentage;
    }
    
    /** Set the pitch shift in semitones.
     */
    void setSemitoneShift(float newShift)
    {
        pitchParam = newShift;
        
        smallestAcceptableSize = maxSamples * 10.0;
        largestAcceptableSize = maxSamples * 20.0;
    }
    
    /** Get the % value of the wet/dry mix.
     */
    float getMixPercentage()
    {
        return mixParam;
    }
    
    /** Get the pitch shift in semitones.
     */
    float getSemitoneShift()
    {
        return pitchParam;
    }
    
    /** Get the estimated latency. This is an average guess of latency with no pitch shifting but can vary by a few buffers. Changing the pitch shift can cause less or more latency.
     */
    int getLatencyEstimationInSamples()
    {
        return maxSamples * 3.0 + initLatency;
    }
 
 private:
    std::unique_ptr<RubberBand::RubberBandStretcher> rubberband;
    RingBuffer input, output;
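    int maxSamples = 0, initLatency = 0, smallestAcceptableSize = 0, largestAcceptableSize = 0;
    float oldPitch = 1.0f, pitchParam = 0.0f, mixParam = 100.0f, newPitch = 1.0f;   // Initialised so processBuffer() never reads indeterminate values.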
    std::unique_ptr<juce::dsp::DryWetMixer<float>> dryWet;
    juce::SmoothedValue<float> timeSmoothing, mixSmoothing, pitchSmoothing;
    bool formantPreserving;
    int latencyInSamples = 0, samplesToSkip = 0, readSpace;
    size_t reqSamples;
};
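Both versions of processBuffer() handle an output ring buffer holding fewer samples than the block needs in the same way: the index expression sample + numSamples - availableOutputSamples writes the available samples to the end of the block, leaving the (already zeroed) start silent. Here is a single-channel sketch of that rule, with a std::deque standing in for the RingBuffer; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical single-channel version of the output copy in processBuffer().
// 'block' is assumed to be zeroed already (processBuffer() clears each input
// sample after pushing it). If at least block.size() samples are queued they
// fill the whole block; otherwise the queued samples go at the end of the block.
void fillBlockRightAligned (std::vector<float>& block, std::deque<float>& queue)
{
    const std::size_t blockSize = block.size();
    const std::size_t available = queue.size();
    const std::size_t offset = available >= blockSize ? 0 : blockSize - available;

    for (std::size_t i = offset; i < blockSize; ++i)
    {
        assert (! queue.empty());   // Guaranteed by the offset calculation.
        block[i] = queue.front();
        queue.pop_front();
    }
}
```

One plausible reason for right alignment: the last sample of this block is followed directly by the first queued sample of the next block, so the output stays continuous across the block boundary and the dropout is confined to the start of the short block.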

Low latency? I’m interested, I’ll try it out. @jiemojiemo’s solution sounds great (I think it also uses v2) but seems to introduce a latency of 150ms, which is too much for my use case. I haven’t looked at the code in detail, though; that might be improvable there too.

Yeah, it’s about as low as I felt I could push it, really. I think it’s about 10ms-ish. I’ve just realised I have two latency getter functions in there (getLatency() and getLatencyEstimationInSamples()). I’ll have to check which one is the one to use!

David

Hey @DavidCNAntonia,
I did quite some testing of your rubberband wrapper.

  • Latency: with a constant pitch shift of a 5th up, I measured less than 30ms.
  • Quality: I modulated the pitch shift parameter with a sine wave of amplitude 7 semitones and frequency 1Hz (i.e., going up and down a fifth every second) without any artifacts. (*)
  • Computing power: 70 times real-time (**)

(*) I experimented with every possible window setting, i.e. OptionWindowStandard, OptionWindowShort and OptionWindowLong. Only the short window was crackle-free. I guess you might have tuned your integration with that window?

(**) I changed one option, OptionFormantPreserved to OptionFormantShifted, which reduces processing time by a factor of more than two, with at least equivalent quality for my use case. But I guess that if you use it for vocals you’ll probably want formant preservation; in that case it would be more like 30 times real time. (One-time measurement, ballpark only.)
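The modulation test above (a 1Hz sine of amplitude 7 semitones driving the pitch parameter) can be reproduced with a block-rate LFO. This is a hypothetical sketch: the actual test code isn’t shown in the thread, and evaluating the sine once per block is an assumption that matches processBuffer() reading pitchParam once per call.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical block-rate LFO for testing pitch modulation: returns the
// semitone shift at the start of block 'blockIndex' for a sine of the given
// depth (semitones) and rate (Hz).
double lfoSemitonesForBlock (long blockIndex, int blockSize, double sampleRate,
                             double depthSemitones, double rateHz)
{
    assert (blockSize > 0 && sampleRate > 0.0);
    const double pi = 3.14159265358979323846;
    const double timeSeconds = static_cast<double> (blockIndex) * blockSize / sampleRate;
    return depthSemitones * std::sin (2.0 * pi * rateHz * timeSeconds);
}
```

Each block, the result would be passed to setSemitoneShift() before calling processBuffer(). With 480-sample blocks at 48kHz, block 25 (0.25s) gives +7 semitones and block 75 (0.75s) gives -7.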

Congratulations, that’s the most suitable pitch shifter I’ve tried so far for my overdriven guitar solo use case, and I’ve tried expensive ones… and this one’s all open source, nice!

By the way, are you interested in making a GitHub repo out of it? If not, I’d volunteer, with acknowledgements of course. Like @jiemojiemo, I would wrap it in a JUCE VST.

Hi @DavidCNAntonia and @saintmatthieu, maybe this answer could help you regarding Rubber Band in real-time usage:

It seems it’s not possible to use Rubber Band in a real-time environment 😕

Thanks for this link, @DEADBEEF, it’s interesting to see the breadth of interest in this library.

@DavidCNAntonia’s wrapper performs well enough for me, though 🤷

In this directory you will find Les_Petits_Poissons.mp3 and Les_Petits_Poissons_harmonized.mp3. Latency seems like a constant 16ms.
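A constant latency figure like the 16ms above can be measured offline by feeding a click through the plugin and comparing onset positions in the dry and wet recordings; plain waveform cross-correlation is less reliable here because the wet signal is pitch shifted. The sketch below assumes a click-like test signal and a simple amplitude threshold. It is one possible approach with hypothetical function names, not how the figure above was obtained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Index of the first sample whose magnitude exceeds 'threshold', or -1 if none.
long firstOnset (const std::vector<float>& signal, float threshold)
{
    for (std::size_t i = 0; i < signal.size(); ++i)
        if (std::fabs (signal[i]) > threshold)
            return static_cast<long> (i);
    return -1;
}

// Hypothetical latency measurement: distance between the click onset in the
// dry recording and in the wet (pitch-shifted) recording, in samples.
// Returns -1 if either onset cannot be found.
long measureLatencySamples (const std::vector<float>& dry, const std::vector<float>& wet, float threshold)
{
    assert (threshold >= 0.0f);
    const long dryOnset = firstOnset (dry, threshold);
    const long wetOnset = firstOnset (wet, threshold);
    if (dryOnset < 0 || wetOnset < 0)
        return -1;
    return wetOnset - dryOnset;
}
```

For example, at 48kHz a result of 768 samples corresponds to 768 / 48000 = 16ms.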

Thank you @saintmatthieu for the detailed answer.
Well, it’s strange 🤔: according to the message I shared, the minimum latency we could get was about 5000 samples.
@icebreakeraudio, do you have any insight into this?

To be brief - the library was simply not designed for live input. This is what the developers told me, and where that 5000 samples number came from.
I don’t have much more detail about it, but there are a lot of factors as to how much delay there would be. The problem I had was not only the size of the delay, but the fact the library wouldn’t tell me what the delay would be. I couldn’t get it lower than 9k samples without it sounding odd.
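The figures in this thread are quoted partly in samples and partly in milliseconds; the conversion is samples × 1000 / sample rate. The sample rate behind the 5000 and 9000 sample figures isn’t stated, so the usage note below assumes 44.1kHz and 48kHz. The helper names are hypothetical.

```cpp
#include <cassert>

// Converts a latency in samples to milliseconds at the given sample rate.
double samplesToMs (double samples, double sampleRate)
{
    assert (sampleRate > 0.0);
    return samples * 1000.0 / sampleRate;
}

// Converts milliseconds to samples (rounded down), e.g. the 4 ms maxSamples
// window in the updated PitchShifter constructor.
int msToSamples (double ms, double sampleRate)
{
    assert (sampleRate > 0.0 && ms >= 0.0);
    return static_cast<int> (ms * sampleRate / 1000.0);
}
```

5000 samples is about 113ms at 44.1kHz (about 104ms at 48kHz), and 9000 samples is about 204ms (187.5ms at 48kHz), several times the 10-20ms reported for David’s wrapper. At 48kHz the 4ms window is msToSamples(4.0, 48000.0) = 192 samples.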

Sorry for the long silence! Glad you found the wrapper useful. Interesting idea to make a GitHub repo, I’ll see what I can put together.

Yes, the short window was the one I tested and tuned to. It’s been a while since I did it, and I can’t remember whether I tuned to that window for the sound quality of synths or for latency; I found synths particularly hard to make sound good.

I have formant preservation toggleable, as I felt this was needed for my use case.

It’s good enough for me for real-time use; I have a commercial licence and am now shipping a product with it: https://www.caelumaudio.com/CaelumAudio/?Page=Choric The best demo of the pitch shift doing its thing is the Instagram reel partway down the page. Maybe it’s because I’m blending the signal that the latency isn’t too much of an issue, though; I’m reporting roughly 10-20ms latency to the host.

That being said, the plugin also installs a “low-latency” version with pitch shifting disabled specifically for live performance as this was heavily requested.