Time‑stretching with JUCE + signalsmith‑stretch results in pitch drop instead of tempo change

Hello,
I’m trying to implement a time‑stretching feature using JUCE and signalsmith‑stretch, but I haven’t been able to get it working correctly yet.

I’ve extracted the parts of the implementation that seem relevant and included them below.
stretchRatio is a predefined value (smaller = slower, larger = faster).
My expectation is that setting stretchRatio to 2 would give “same pitch, double speed,” and setting it to 0.5 would give “same pitch, half speed.”
However, in both cases the pitch drops while the tempo stays the same.
If anyone has any insights or suggestions, I would greatly appreciate your help.

void ClipAudioSourceModel::prepareToPlay(int samplesPerBlockExpected, double sampleRate) {
    sr = sampleRate;
    if (readerSource) {
        readerSource->prepareToPlay(samplesPerBlockExpected, sampleRate);
    }
    if (resampler) {
        const double ratio = (sourceSampleRate > 0.0 && sr > 0.0) ? (sourceSampleRate / sr) : 1.0;
        resampler->setResamplingRatio(ratio);
        resampler->prepareToPlay(samplesPerBlockExpected, sampleRate);
    }

    if (!stretcher) {
        int numChannels = readerSource && readerSource->getAudioFormatReader()
            ? readerSource->getAudioFormatReader()->numChannels
            : 2;
        stretcher = std::make_unique<signalsmith::stretch::SignalsmithStretch<float>>();
        stretcher->reset();
        stretcher->presetDefault(numChannels, sampleRate);
    }
    if (stretcher) {
        stretcher->setTransposeFactor(1.0);
    }

    int numChannels = readerSource && readerSource->getAudioFormatReader()
        ? readerSource->getAudioFormatReader()->numChannels
        : 2;
    stretchInputBuffer.setSize(numChannels, samplesPerBlockExpected * 4);
    stretchOutputBuffer.setSize(numChannels, samplesPerBlockExpected * 4);
}

void ClipAudioSourceModel::getNextAudioBlock(const juce::AudioSourceChannelInfo& bufferToFill) {
    int outputSamples = bufferToFill.numSamples;
    int inputSamples = static_cast<int>(outputSamples * stretchRatio);
    
    juce::AudioSourceChannelInfo inputInfo;
    inputInfo.buffer = &stretchInputBuffer;
    inputInfo.startSample = 0;
    inputInfo.numSamples = inputSamples;

    resampler->getNextAudioBlock(inputInfo);

    int numChannels = bufferToFill.buffer->getNumChannels();
    float* const* inputChannels = stretchInputBuffer.getArrayOfWritePointers();
    float* const* outputChannels = stretchOutputBuffer.getArrayOfWritePointers();

    stretcher->process(inputChannels, inputSamples, outputChannels, outputSamples);

    for (int ch = 0; ch < numChannels; ++ch) {
        bufferToFill.buffer->copyFrom(ch, bufferToFill.startSample,
                                       stretchOutputBuffer, ch, 0, outputSamples);
    }
}

This is a weird one. I don’t see anything obviously wrong with how the SignalsmithStretch class is used here, but I’m also not seeing the whole picture.
The weird part is that you get a drop in pitch whether the ratio is 2 or 0.5, which suggests that something else is off. Are there any other artefacts in the audio?

One thing that makes me a little nervous is that the number of channels in bufferToFill is never checked against the number of channels in stretchOutputBuffer. That could cause problems if stretchOutputBuffer is mono and bufferToFill is stereo, though it may be harmless depending on the rest of the project.
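
To illustrate the channel check described above, here is a minimal sketch in plain C++. `ChannelBuffer` and `copyWithChannelGuard` are hypothetical names, and the vector-of-vectors type is only a stand-in for `juce::AudioBuffer<float>`; it is not code from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a multi-channel audio buffer: one vector per channel.
using ChannelBuffer = std::vector<std::vector<float>>;

// Copies numSamples from the start of src into dst, beginning at dstStart.
// Channels that exist in both buffers are copied; destination channels that
// the source does not have are zeroed instead of being read out of range.
void copyWithChannelGuard(const ChannelBuffer& src, ChannelBuffer& dst,
                          std::size_t dstStart, std::size_t numSamples) {
    const std::size_t shared = std::min(src.size(), dst.size());
    const auto offset = static_cast<std::ptrdiff_t>(dstStart);

    for (std::size_t ch = 0; ch < dst.size(); ++ch) {
        assert(dstStart + numSamples <= dst[ch].size());
        if (ch < shared) {
            assert(numSamples <= src[ch].size());
            std::copy_n(src[ch].begin(), numSamples, dst[ch].begin() + offset);
        } else {
            std::fill_n(dst[ch].begin() + offset, numSamples, 0.0f);
        }
    }
}
```

In the posted code, the equivalent would presumably be to loop only up to the smaller of the two buffers' `getNumChannels()` values and clear whatever output channels remain.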

icebreakeraudio, thank you for your reply.

Since the copy of signalsmith-stretch had originally been added by the AI and I wasn’t sure where it came from, I re-added it via git submodule and checked again, but the issue didn’t improve.
I’m using signalsmith-stretch at tag 1.1.0, with linear at 0.3.0.

Are there any other artefacts in the audio?

You mean whether there are any other sources, right?

I’ll paste the entire source code here.

header

#pragma once

#include <JuceHeader.h>

#include "../../ThirdParty/signalsmith-stretch/signalsmith-stretch.h"

class ClipAudioSourceModel : public juce::PositionableAudioSource {
  public:
    ClipAudioSourceModel(std::unique_ptr<juce::AudioFormatReaderSource> readerSrc,
        const juce::String&                                             sourceName,
        double                                                          startTimeSec,
        double                                                          cropStartSec,
        double                                                          cropEndSec,
        juce::AudioFormatManager&                                       formatManager);

    void        prepareToPlay(int samplesPerBlockExpected, double sampleRate) override;
    void        releaseResources() override;
    void        getNextAudioBlock(const juce::AudioSourceChannelInfo& bufferToFill) override;
    void        setNextReadPosition(juce::int64 newPosition) override;
    juce::int64 getNextReadPosition() const override;
    juce::int64 getTotalLength() const override;
    bool        isLooping() const override;

    double              getStartTimeSec() const;
    void                setStartTimeSec(double newStartTime);
    double              getDurationSec() const;
    double              getCropStart() const;
    double              getCropEnd() const;
    void                setCropStart(double newCropStart);
    void                setCropEnd(double newCropEnd);
    const juce::String& getSourceName() const;

    juce::String getClipId() const;
    void         setClipId(const juce::String& id);

    void                  setThumbnailSource(juce::InputSource* source);
    juce::AudioThumbnail& getThumbnail() {
        return thumbnail;
    }
    const juce::AudioThumbnail& getThumbnail() const {
        return thumbnail;
    }

    double getSourceSampleRate() const;

    void setStretchRatio(double ratio);
    double getStretchRatio() const;

  private:
    std::unique_ptr<juce::AudioFormatReaderSource> readerSource;
    juce::String                                   sourceLabel;
    double                                         startTime = 0.0, cropStart = 0.0, cropEnd = 0.0;
    double                                         sr               = 44100.0;
    double                                         sourceSampleRate = 44100.0;
    juce::String                                   clipId;

    std::unique_ptr<juce::ResamplingAudioSource> resampler;
    juce::AudioThumbnailCache                    thumbnailCache;
    juce::AudioThumbnail                         thumbnail;

    double stretchRatio = 1.0;
    std::unique_ptr<signalsmith::stretch::SignalsmithStretch<float>> stretcher;
    juce::AudioBuffer<float> stretchInputBuffer;
    juce::AudioBuffer<float> stretchOutputBuffer;
};

You are using setTransposeFactor :wink:

I forgot to paste the source file.

#include "ClipAudioSourceModel.h"

ClipAudioSourceModel::ClipAudioSourceModel(std::unique_ptr<juce::AudioFormatReaderSource> readerSrc,
    const juce::String&                                                                   sourceName,
    double                                                                                startTimeSec,
    double                                                                                cropStartSec,
    double                                                                                cropEndSec,
    juce::AudioFormatManager&                                                             formatManager)
    : readerSource(std::move(readerSrc)),
      sourceLabel(sourceName),
      startTime(startTimeSec),
      cropStart(cropStartSec),
      cropEnd(cropEndSec),
      thumbnailCache(5),
      thumbnail(512, formatManager, thumbnailCache) {
    if (readerSource && readerSource->getAudioFormatReader() != nullptr) {
        sourceSampleRate = readerSource->getAudioFormatReader()->sampleRate;
    }
    resampler = std::make_unique<juce::ResamplingAudioSource>(readerSource.get(), false, 2);
}

void ClipAudioSourceModel::prepareToPlay(int samplesPerBlockExpected, double sampleRate) {
    sr = sampleRate;
    if (readerSource) {
        readerSource->prepareToPlay(samplesPerBlockExpected, sampleRate);
    }
    if (resampler) {
        const double ratio = (sourceSampleRate > 0.0 && sr > 0.0) ? (sourceSampleRate / sr) : 1.0;
        resampler->setResamplingRatio(ratio);
        resampler->prepareToPlay(samplesPerBlockExpected, sampleRate);
    }

    if (!stretcher) {
        int numChannels = readerSource && readerSource->getAudioFormatReader()
            ? readerSource->getAudioFormatReader()->numChannels
            : 2;
        stretcher = std::make_unique<signalsmith::stretch::SignalsmithStretch<float>>();
        stretcher->reset();
        stretcher->presetDefault(numChannels, sampleRate);
    }
    if (stretcher) {
        stretcher->setTransposeFactor(1.0);
    }

    int numChannels = readerSource && readerSource->getAudioFormatReader()
        ? readerSource->getAudioFormatReader()->numChannels
        : 2;
    stretchInputBuffer.setSize(numChannels, samplesPerBlockExpected * 4);
    stretchOutputBuffer.setSize(numChannels, samplesPerBlockExpected * 4);
}

void ClipAudioSourceModel::releaseResources() {
    if (resampler)
        resampler->releaseResources();
    if (readerSource)
        readerSource->releaseResources();
}

void ClipAudioSourceModel::getNextAudioBlock(const juce::AudioSourceChannelInfo& bufferToFill) {
    if (std::abs(stretchRatio - 1.0) < 0.001) {
        DBG("[DEBUG] ClipAudioSourceModel::getNextAudioBlock No time stretch");
        if (resampler) {
            resampler->getNextAudioBlock(bufferToFill);
            return;
        }
        if (readerSource) {
            readerSource->getNextAudioBlock(bufferToFill);
            return;
        } else {
            bufferToFill.clearActiveBufferRegion();
            return;
        }
    }

    if (!stretcher || !resampler) {
        bufferToFill.clearActiveBufferRegion();
        return;
    }

    int outputSamples = bufferToFill.numSamples;
    int inputSamples = static_cast<int>(outputSamples * stretchRatio);
    
    double timeFactor = static_cast<double>(outputSamples) / static_cast<double>(inputSamples);
    DBG("[DEBUG] ClipAudioSourceModel::getNextAudioBlock - stretchRatio=" << stretchRatio 
        << ", outputSamples=" << outputSamples << ", inputSamples=" << inputSamples
        << ", timeFactor=" << timeFactor
        << " (if >1: slower, if <1: faster)");

    juce::AudioSourceChannelInfo inputInfo;
    inputInfo.buffer = &stretchInputBuffer;
    inputInfo.startSample = 0;
    inputInfo.numSamples = inputSamples;

    resampler->getNextAudioBlock(inputInfo);

    int numChannels = bufferToFill.buffer->getNumChannels();
    float* const* inputChannels = stretchInputBuffer.getArrayOfWritePointers();
    float* const* outputChannels = stretchOutputBuffer.getArrayOfWritePointers();

    stretcher->process(inputChannels, inputSamples, outputChannels, outputSamples);

    for (int ch = 0; ch < numChannels; ++ch) {
        bufferToFill.buffer->copyFrom(ch, bufferToFill.startSample,
                                       stretchOutputBuffer, ch, 0, outputSamples);
    }
}

void ClipAudioSourceModel::setNextReadPosition(juce::int64 newPosition) {
    if (readerSource)
        readerSource->setNextReadPosition(newPosition);
    if (resampler)
        resampler->flushBuffers();
}

juce::int64 ClipAudioSourceModel::getNextReadPosition() const {
    return readerSource ? readerSource->getNextReadPosition() : 0;
}

juce::int64 ClipAudioSourceModel::getTotalLength() const {
    return readerSource ? readerSource->getTotalLength() : 0;
}

bool ClipAudioSourceModel::isLooping() const {
    return false;
}

double ClipAudioSourceModel::getStartTimeSec() const {
    return startTime;
}

void ClipAudioSourceModel::setStartTimeSec(double newStartTime) {
    startTime = newStartTime;
}

double ClipAudioSourceModel::getDurationSec() const {
    return juce::jmax(0.0, cropEnd - cropStart);
}

double ClipAudioSourceModel::getCropStart() const {
    return cropStart;
}

double ClipAudioSourceModel::getCropEnd() const {
    return cropEnd;
}

void ClipAudioSourceModel::setCropStart(double newCropStart) {
    cropStart = newCropStart;
}

void ClipAudioSourceModel::setCropEnd(double newCropEnd) {
    cropEnd = newCropEnd;
}

const juce::String& ClipAudioSourceModel::getSourceName() const {
    return sourceLabel;
}

void ClipAudioSourceModel::setThumbnailSource(juce::InputSource* source) {
    thumbnail.setSource(source);
}

double ClipAudioSourceModel::getSourceSampleRate() const {
    return sourceSampleRate;
}

juce::String ClipAudioSourceModel::getClipId() const {
    return clipId;
}

void ClipAudioSourceModel::setClipId(const juce::String& id) {
    clipId = id;
}

void ClipAudioSourceModel::setStretchRatio(double ratio) {
    if (ratio <= 0.0)
        return;

    double oldRatio = stretchRatio;
    stretchRatio = ratio;
    
    DBG("[DEBUG] ClipAudioSourceModel::setStretchRatio - oldRatio=" << oldRatio << ", newRatio=" << ratio);

    if (stretcher && sr > 0.0) {
        stretcher->setTransposeFactor(1.0);
        DBG("[DEBUG] ClipAudioSourceModel::setStretchRatio - stretcher reset and transposeFactor set to 1.0");
    } else {
        DBG("[DEBUG] ClipAudioSourceModel::setStretchRatio - stretcher not yet initialized, will be configured in prepareToPlay");
    }
}

double ClipAudioSourceModel::getStretchRatio() const {
    return stretchRatio;
}

lcapozzi

I’m passing 1.0 as the argument to setTransposeFactor,
but does that also affect the pitch?
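
For reference, a small sketch of how a transpose factor relates to a pitch shift, assuming (as the library's README appears to describe it) that the factor is a plain frequency multiplier. The helper name `transposeFactorFromSemitones` is hypothetical and not part of signalsmith-stretch.

```cpp
#include <cassert>
#include <cmath>

// Frequency multiplier for a shift of `semitones` in 12-tone equal temperament.
// Assumption: this is the kind of value setTransposeFactor expects.
double transposeFactorFromSemitones(double semitones) {
    return std::pow(2.0, semitones / 12.0);
}
```

Under that assumption, a factor of 1.0 corresponds to a shift of zero semitones, so passing 1.0 should leave the pitch unchanged rather than lower it.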

Are you processing samples or real-time audio? If you are processing an incoming signal, you have to account for the different buffer lengths. As far as I remember, I had many headaches trying to use stretch on real-time audio, while it works like a charm for samples.
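
One way to account for the differing buffer lengths is to carry the fractional part of the input length from block to block, so the average input/output ratio stays exact even when a single block's length has to be rounded down. This is only a sketch: `InputLengthTracker` is a hypothetical helper, not part of JUCE or signalsmith-stretch, and it is not offered as the cause of the pitch drop (for ratios of 2 or 0.5 with even block sizes, nothing is lost to rounding anyway).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: chooses how many input samples to read for each output
// block and carries the fractional remainder forward to the next call.
class InputLengthTracker {
public:
    explicit InputLengthTracker(double ratio) : ratio_(ratio) {
        assert(ratio > 0.0);
    }

    // Returns the whole number of input samples to consume for this block.
    int nextInputLength(int outputSamples) {
        assert(outputSamples >= 0);
        const double exact = outputSamples * ratio_ + carry_;
        const int whole = static_cast<int>(std::floor(exact));
        carry_ = exact - whole;  // fraction left over for the next block
        return whole;
    }

private:
    double ratio_;
    double carry_ = 0.0;
};
```

With a ratio of 1.5 and blocks of 3 output samples, successive calls return 4 and then 5, for 9 input samples over 6 output samples. Truncating each block independently, as `static_cast<int>(outputSamples * stretchRatio)` does, would give 4 and 4.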

I’m processing samples, but I’m still running into problems…