DSP beginner asking for assistance with PSOLA


Hi all,
I am trying to build a pitch shifting audio plugin.

I took the audio input tutorial and edited like this

    void getNextAudioBlock (const AudioSourceChannelInfo& bufferToFill) override
    {
        auto* device = deviceManager.getCurrentAudioDevice();
        auto activeInputChannels  = device->getActiveInputChannels();
        auto activeOutputChannels = device->getActiveOutputChannels();
        auto maxInputChannels  = activeInputChannels .getHighestBit() + 1;
        auto maxOutputChannels = activeOutputChannels.getHighestBit() + 1;

        auto level = (float) levelSlider.getValue();
        /// values for the pitchShift routine
        float shift = 2; // up an octave
        long sr = device->getCurrentSampleRate(); // samplerate
        long os = 32; // oversampling
        long numSamps = bufferToFill.numSamples; // size of buffer
        long fftSize = 2048; // fft window size

        for (auto channel = 0; channel < maxOutputChannels; ++channel)
        {
            if ((! activeOutputChannels[channel]) || maxInputChannels == 0)
            {
                bufferToFill.buffer->clear (channel, bufferToFill.startSample, bufferToFill.numSamples);
            }
            else
            {
                auto actualInputChannel = channel % maxInputChannels; // [1]

                if (! activeInputChannels[channel]) // [2]
                {
                    bufferToFill.buffer->clear (channel, bufferToFill.startSample, bufferToFill.numSamples);
                }
                else // [3]
                {
                    auto* inBuffer = bufferToFill.buffer->getReadPointer (actualInputChannel,
                                                                          bufferToFill.startSample);
                    auto* outBuffer = bufferToFill.buffer->getWritePointer (channel, bufferToFill.startSample);
                    //for (auto sample = 0; sample < bufferToFill.numSamples; ++sample)
                    //    outBuffer[sample] = inBuffer[sample] * random.nextFloat() * level;
                    float *inbuf = (float *)inBuffer;
                    float *outbuf = (float *)outBuffer;
                    // void smbPitchShift(float pitchShift, long numSampsToProcess, long fftFrameSize, long osamp, float sampleRate, float *indata, float *outdata)
                    // call routine with values I added
                    smbPitchShift(shift, numSamps, fftSize, os, sr, inbuf, outbuf);
                }
            }
        }
    }

I am calling the smbPitchShift routine (its signature is in the comment above).
The result seems to be transposing up the octave, which it should, but it’s pretty noisy :slight_smile:

Casting like this seems a bit iffy but it compiles and makes noise :slight_smile:

    float *inbuf = (float *)inBuffer;
    float *outbuf = (float *)outBuffer;

Do I need to interpolate between the bins or something …
Not sure where to go from here.
Perhaps I need to read a DSP book !
Thanks for your replies !


Hey Sean,

I suspect that the smbPitchShift routine is indeed meant for processing an entire buffered audio file, and not only small, continuous buffers. Try changing the sample buffer size (you can do it in the standalone program’s options), and see if the noise changes frequency.

I’ve been working on pitch-shifting software (auto-tune) for the past 6 months myself, and believe me when I tell you I tried cheating my way around actually having to know what I’m doing, but that didn’t work out. If you are serious about this (or any other DSP project, for that matter), you need to know how the pitch shifting method you are implementing works, and code it yourself.

The smbPitchShift routine implements a frequency-domain pitch shifting method using fast Fourier transforms (FFTs). If you choose to implement a similar algorithm, I can only recommend the ffts library (specifically, linkotec’s fork), which implements high-performance FFTs under a permissive license, allowing you to use it in commercial applications.
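
On the earlier question about interpolating between bins: phase-vocoder style routines usually do not interpolate magnitudes. Instead they use the phase change between two overlapping frames to estimate where inside a bin the sinusoid really sits. The function below is a minimal sketch of that single step, not the smbPitchShift source; the name `estimateBinFrequency` and its parameters are made up for illustration, and it assumes the hop size is `fftSize / osamp`.

```cpp
#include <cmath>

// Hypothetical helper: estimate the frequency (in Hz) of the sinusoid seen in
// FFT bin `bin`, given the phase change `phaseDelta` (radians) measured in that
// bin between two consecutive frames that are fftSize / osamp samples apart.
double estimateBinFrequency (int bin, double phaseDelta, int fftSize, int osamp, double sampleRate)
{
    const double twoPi = 6.283185307179586;

    // Phase advance a sinusoid exactly at the bin centre would show over one hop.
    const double expected = twoPi * bin / osamp;

    // Difference between measured and expected advance, wrapped into [-pi, pi].
    const double deviation = std::remainder (phaseDelta - expected, twoPi);

    // Convert the phase deviation into a fractional bin offset.
    const double binOffset = deviation * osamp / twoPi;

    return (bin + binOffset) * sampleRate / fftSize;
}
```

A pitch shifter of this kind then multiplies each estimated frequency by the shift factor and resynthesises the phases from the new frequencies. A larger `osamp` means more overlap between frames, so the wrapped phase deviation can represent a wider range of frequencies around each bin centre without ambiguity.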

Good luck on the journey that lies ahead of you, pitch shifting is indeed a complex (and equally exciting) endeavour!


Thanks CrushedPixel.
I was thinking the same as you on my morning work, but I was also wondering why that routine would have a buffer in it if it was meant to process an entire file.

Thankfully I started another thread about threading which gave me some hints …

I made this small change

        //for (auto channel = 0; channel < maxOutputChannels; ++channel)
        for (auto channel = 0; channel < 1; ++channel)

The noise has gone and it seems to work with a bit of latency which I would expect.
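
A likely reason the one-channel change removes the noise: as far as I can tell from the published listing, smbPitchShift keeps its FIFOs and phase history in static variables, so every call shares one set of state. Feeding it the left and right channels alternately mixes their histories. The sketch below reproduces that effect with a hypothetical stand-in (`OneSampleDelay`, not part of JUCE or smbPitchShift) and shows the usual fix of one state object per channel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for any block processor that remembers samples
// between calls. The "effect" is a one-sample delay so results are easy to read.
struct OneSampleDelay
{
    float previous = 0.0f; // state carried from one call to the next

    void process (const float* in, float* out, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i)
        {
            const float current = in[i]; // read first, so in and out may alias
            out[i] = previous;
            previous = current;
        }
    }
};

// Processes two equally long channels block by block, the way an audio
// callback would, and returns the left output. With shareState == true both
// channels go through the same state object (the situation in the post above).
std::vector<float> processLeft (const std::vector<float>& left,
                                const std::vector<float>& right,
                                int blockSize, bool shareState)
{
    OneSampleDelay stateL, stateR;
    OneSampleDelay& rightState = shareState ? stateL : stateR;

    std::vector<float> outL (left.size()), outR (right.size());

    for (std::size_t start = 0; start < left.size(); start += (std::size_t) blockSize)
    {
        const int n = (int) std::min<std::size_t> ((std::size_t) blockSize, left.size() - start);
        stateL.process     (left.data()  + start, outL.data() + start, n);
        rightState.process (right.data() + start, outR.data() + start, n);
    }

    return outL;
}
```

With left = 1, 2, 3, 4, right = 10, 20, 30, 40 and a block size of 2, separate states give a left output of 0, 1, 2, 3, while the shared state gives 0, 1, 20, 3: a sample from the right channel leaks in at the block boundary. That is one glitch per block, which would also explain noise that changes with the buffer size.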

It doesn’t sound amazing with polyphonic input but at least I made some progress!

I have a general understanding of what the algorithm is doing. It’s passing a kernel through the buffer and using it to detect periodicity in the waveform. Not enough to build the ultimate product but I will keep battling.
The version I built using PYO sounds good enough that I want to continue!
Perhaps when I finish my computer science degree in a year I can consider some further post-grad DSP study of some sort …


Cuda phase vocoder

I came to the conclusion long ago that smbPitchShift was a misleading listing, almost designed to throw people off the scent, because the absolute best way to pitch shift is to time-stretch THEN change the playback rate. I spent years researching my own algorithm. Yes, I said YEARS! Good luck! :smiley:
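
To make the two-step idea concrete: to shift up by a factor r, first stretch the audio to r times its length without changing pitch, then play it back r times faster, which restores the original duration and raises the pitch by r. The sketch below covers only the second step, using plain linear interpolation; the function name `changePlaybackRate` is hypothetical, the stretching step is not shown, and a production resampler would use a better interpolator with anti-alias filtering.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reads `in` at `ratio` times normal speed using linear
// interpolation. ratio > 1 shortens the signal and raises its pitch,
// ratio < 1 lengthens it and lowers its pitch.
std::vector<float> changePlaybackRate (const std::vector<float>& in, double ratio)
{
    std::vector<float> out;

    if (in.empty() || ratio <= 0.0)
        return out;

    const double lastIndex = (double) (in.size() - 1);

    for (std::size_t n = 0; ; ++n)
    {
        const double pos = (double) n * ratio; // fractional read position
        if (pos > lastIndex)
            break;

        const auto i = (std::size_t) pos;
        const double frac = pos - (double) i;
        const float next = (i + 1 < in.size()) ? in[i + 1] : in[i];

        out.push_back ((float) ((1.0 - frac) * in[i] + frac * next));
    }

    return out;
}
```

For an octave up, a block stretched to twice its length and passed through `changePlaybackRate (stretched, 2.0)` comes back at its original length.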


Well yours does pretty much what I intended to build (without the ADSR and sustain pedal). Sounds great and the latency is good !!
I might use my time more wisely and solve another problem !!!


PSOLA was designed for working in the time-domain. It’s mainly for voices and certainly only for single notes. The main aspect of it is the pitch detection part, which I haven’t looked at in depth. I believe people have run several pitch detection routines at once and discarded any result that changes too rapidly; for example, one technique may report sudden octave jumps, which are unnatural for a human voice. I’m sure there are a few caveats like that, but it’s all part of the fun :grin:
Getting in deep with pitch shifting opens up a world of psycho-acoustics, where mathematics can’t model how the brain works in a linear way - Now THAT’S a rabbit hole. :grinning:


I am a bit confused because the thread title says PSOLA but then you mention smbPitchShift, which is an FFT/phase vocoder based process…