Here’s the real-time PSOLA implementation I’m currently working on:
The idea here is that there’s one “PsolaAnalyzer” object that can communicate with an unlimited number of “PsolaShifter” objects, with the goal of lowering the CPU load, since analysis is the heavy part of the process. This repo is still a work in progress; it should give you some pitch-shifted sound, but there are definitely still some issues that I’m attempting to iron out. If anyone wants to cast an eye over this and help me identify & fix issues, that would be amazing…
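Roughly, the intended shape is something like this (a simplified skeleton for illustration — the method names are my shorthand for the design, not the exact interface in the repo):

```cpp
// Simplified skeleton of the analyzer/shifter split (not the actual repo API).
// The point: the expensive work happens once, and any number of shifters reuse it.
struct PsolaAnalyzer
{
    // Heavy: pitch detection + pitch-mark / grain extraction, once per block.
    void analyzeInput (const float* input, int numSamples) { /* ... */ }

    // Shared analysis state (pitch estimate, pitch marks, grains) lives here.
};

struct PsolaShifter
{
    explicit PsolaShifter (PsolaAnalyzer& a) : analyzer (a) {}

    // Cheap: OLA resynthesis at this shifter's own target pitch ratio,
    // reading the shared analysis data instead of redoing the detection.
    void renderShifted (float* output, int numSamples, float pitchRatio) { /* ... */ }

    PsolaAnalyzer& analyzer; // many shifters can share one analyzer
};
```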
With this kind of algorithm, the latency is generally determined by the pitch detection required for the PSOLA’s analysis phase. With a time-domain approach like ASDF or AMDF, a good rule of thumb is that the latency will be 2 * the maximum possible detectable period (i.e., twice the period of the lowest detectable frequency). And even if you use some sort of FFT trickery that can detect pitch from fewer samples, the resynthesis OLA process will still need analysis grains that are at least a period long, so I think it would be hard to get around requiring at least 1 or 2 input periods’ worth of latency (meaning, of course, that the plugin’s overall latency must be 1 or 2 maximum possible input periods’ worth of samples).
But even if you’re calibrating for a bass voice: say your lowest detectable input frequency is D2 (MIDI note 38), which is 73.42 Hz. At a sampling rate of 44.1 kHz, a single period of that frequency is ~601 samples. So if the latency is double that, we get 1202 samples = ~27.26 milliseconds. Not terrible, but definitely perceivable.
Obviously, with this paradigm the latency decreases as the lowest note of the input range gets higher. So for a soprano, if your lowest possible note is as high as C4 (MIDI note 60, 261.63 Hz), the max period would be ~169 samples, making the latency 338 samples = ~7.66 ms!
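To sanity-check the arithmetic, here’s the same calculation as a tiny standalone helper (nothing repo-specific, just the numbers from above):

```cpp
#include <cmath>
#include <cstdio>

// Latency = 2 * the period of the lowest detectable frequency, in samples.
int latencySamples (double lowestDetectableHz, double sampleRate)
{
    return (int) std::ceil (2.0 * sampleRate / lowestDetectableHz);
}

int main()
{
    const double sr = 44100.0;
    const int bass    = latencySamples (73.42, sr);  // D2 -> 1202 samples
    const int soprano = latencySamples (261.63, sr); // C4 -> 338 samples

    std::printf ("bass:    %d samples = %.2f ms\n", bass,    1000.0 * bass / sr);
    std::printf ("soprano: %d samples = %.2f ms\n", soprano, 1000.0 * soprano / sr);
    // prints ~27.26 ms and ~7.66 ms respectively
}
```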
Here’s an example of an ASDF-based pitch detector. It’s thoroughly tested and should be decently accurate; the latency is as described above, 2 * the max input period. Let me know if it misbehaves for you.
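For anyone curious, the heart of the ASDF approach is small enough to show inline. This is just a brute-force sketch of the idea (no parabolic interpolation, no voicing/confidence checks — not the tested detector itself), but it shows exactly why the buffer needs 2 * the max period’s worth of samples:

```cpp
#include <limits>

// Bare-bones ASDF period detector (sketch only). Expects the buffer to hold
// at least 2 * maxPeriod samples -- which is exactly the latency requirement
// discussed above. Returns the winning candidate period in samples.
int detectPeriodASDF (const float* x, int minPeriod, int maxPeriod)
{
    int bestPeriod = minPeriod;
    float bestAsdf = std::numeric_limits<float>::max();

    for (int tau = minPeriod; tau <= maxPeriod; ++tau)
    {
        // Average squared difference between the signal and itself delayed by tau
        float sum = 0.0f;
        for (int n = 0; n < maxPeriod; ++n) // reads up to x[2 * maxPeriod - 1]
        {
            const float diff = x[n] - x[n + tau];
            sum += diff * diff;
        }
        const float asdf = sum / (float) maxPeriod;

        // The best lag is the one where the signal most resembles itself
        if (asdf < bestAsdf)
        {
            bestAsdf = asdf;
            bestPeriod = tau;
        }
    }

    return bestPeriod; // detected frequency = sampleRate / bestPeriod
}
```

A real detector would refine the raw minimum with parabolic interpolation and add some confidence thresholding on top of this loop, but that’s the core of it.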
Low latency is always hard to get right, especially for live performance. There’s often a trade-off between latency and sound quality; for pitch shifting in particular, lower latency often means more distorted formants. The lowest-latency option would be a classic vocoder technique, which imprints the input signal’s formants onto a synthesized carrier signal that’s already at the desired pitch, instead of doing any actual pitch resynthesis. Those can be essentially one sample in, one sample out, with no granularity required (I think).
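For what it’s worth, the per-band core of that vocoder idea is tiny. One band looks roughly like this (a sketch with made-up coefficient values; a real channel vocoder runs a bank of bandpass-filtered bands through this in parallel):

```cpp
#include <cmath>

// One band of a classic channel vocoder, sketching the "one sample in, one
// sample out" idea: the input's amplitude envelope in each band is imprinted
// onto a carrier that's already at the target pitch. A real vocoder runs
// ~8-32 of these in parallel over a bandpass filter bank; the coefficient
// values here are illustrative, not tuned.
struct VocoderBand
{
    float envelope     = 0.0f;
    float attackCoeff  = 0.99f;  // smoothing while the envelope rises (assumed value)
    float releaseCoeff = 0.999f; // smoothing while it falls (assumed value)

    float processSample (float modulatorBand, float carrierBand)
    {
        const float level = std::abs (modulatorBand);
        const float coeff = level > envelope ? attackCoeff : releaseCoeff;
        envelope = coeff * envelope + (1.0f - coeff) * level; // one-pole follower

        return carrierBand * envelope; // imprint the voice's envelope on the carrier
    }
};
```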
I’ve found that on the pitch-shifting front, PSOLA seems to be a good middle ground between latency and quality. I know less about timestretching, but AFAIK the same basic principles apply.
Hope this helps.