Messing with FFTs and buffering

Hi there! I'm a complete amateur at audio programming. I'm interested in making a real-time phase vocoder plugin for pitch shifting (and eventually formant shifting, which is a bit more complicated). I tried building one by following references, but the references only cover the concrete implementation itself, not how to make it work well in real time. I've tried my best to make it real-time, but it just doesn't work and the output is incredibly choppy. I'm not experienced in audio programming, especially real-time processing. I am a bit familiar with FFTs because my high-school CS project was a diffusion model that used spectrograms for audio denoising, and I had to write a LOT in my paper about the theory behind FFTs. I'm a bit slow and a harder nut to crack, especially when it comes to official papers and other people's code - it's hard for me to read pseudocode, especially implementations that use advanced syntax beyond my amateur-to-intermediate level of C++ ;-;
If anyone could explain how I would go about making a real-time phase vocoder (directly tracking the phases and adjusting the frequencies), let me know! Things such as accounting for latency, proper buffering, proper real-time tracking and rendering, etc. Let me knowww :slight_smile:. I've already made reverb from scratch (meaning I wrote the algorithm myself: the comb and all-pass filter loops and the basic form of a delay feedback loop. I like having control of effects, especially effects that are simple to write from scratch :D). I understand I'm getting ahead of myself a little by making a pitch shifter, but still! I can't get my head off of it. I think I have a fundamental understanding of the algorithm itself, but zero understanding of, and flexibility for, actual real-time application.
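To show where my understanding is at, here's how I currently picture the per-bin phase bookkeeping (all the names are mine, and I'm not at all sure this is right):

```cpp
#include <cmath>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Sketch of per-bin phase tracking for a phase vocoder.
// For bin k, the expected phase advance per hop is 2*pi*k*hop/fftSize.
// The deviation of the measured advance from that expectation gives the
// bin's true frequency, which is what gets scaled when pitch shifting.
struct PhaseTracker {
    int fftSize, hop;
    std::vector<double> prevPhase; // analysis phase of the previous frame
    std::vector<double> sumPhase;  // accumulated synthesis phase

    PhaseTracker(int fftSize, int hop)
        : fftSize(fftSize), hop(hop),
          prevPhase(fftSize / 2 + 1, 0.0),
          sumPhase(fftSize / 2 + 1, 0.0) {}

    // Wrap an angle into roughly [-pi, pi).
    static double wrap(double x) {
        return x - 2.0 * kPi * std::round(x / (2.0 * kPi));
    }

    // Given the measured phase of bin k in the current frame, return the
    // bin's true frequency in "bins" (k plus a fractional deviation).
    double trueBinFrequency(int k, double phase) {
        double expected = 2.0 * kPi * k * hop / fftSize;
        double delta = wrap(phase - prevPhase[k] - expected);
        prevPhase[k] = phase;
        return k + delta * fftSize / (2.0 * kPi * hop);
    }

    // Accumulate the synthesis phase for bin k from a (possibly
    // pitch-scaled) true frequency in bins.
    double synthesisPhase(int k, double trueBins) {
        sumPhase[k] += 2.0 * kPi * trueBins * hop / fftSize;
        return sumPhase[k];
    }
};
```

The `prevPhase` / `sumPhase` vectors are exactly the "previous state" kind of thing I suspect I need to keep between frames.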

I understand that I have to collect enough samples first and store them in a (I hope simple?) buffer, send the buffer to the vocoder, split the audio into windows, process the chunks, and send them back to the audio processor and basically out. Do I maybe need to keep temporal info, like the previous state of a buffer?
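Roughly, this is the plumbing I have in mind, written out as a sketch (made-up names; the actual FFT/phase processing is left as an identity, and a real plugin would preallocate instead of allocating in the audio callback):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Skeleton of the block-based STFT plumbing: a circular input FIFO, a
// windowed frame processed every `hop` samples, and overlap-add into a
// circular output buffer. Assumes hop = frameSize/4 so the squared Hann
// windows sum to a constant. Output is delayed by frameSize samples,
// which is the latency the plugin should report to the host.
class StftPipeline {
public:
    StftPipeline(std::size_t frameSize, std::size_t hop)
        : frameSize(frameSize), hop(hop),
          fifo(frameSize, 0.0f), outBuf(frameSize, 0.0f),
          window(frameSize) {
        for (std::size_t n = 0; n < frameSize; ++n)
            window[n] = 0.5f - 0.5f * std::cos(2.0 * kPi * n / frameSize);
    }

    // Push one input sample, pull one output sample.
    float processSample(float in) {
        fifo[writePos] = in;
        float out = outBuf[writePos];
        outBuf[writePos] = 0.0f; // consume and clear for future frames
        if (++writePos == frameSize) writePos = 0;
        if (++hopCounter == hop) { hopCounter = 0; processFrame(); }
        return out;
    }

private:
    void processFrame() {
        // Unroll the circular FIFO into a linear frame, oldest sample first,
        // applying the analysis window.
        std::vector<float> frame(frameSize);
        for (std::size_t n = 0; n < frameSize; ++n)
            frame[n] = fifo[(writePos + n) % frameSize] * window[n];

        // ... FFT -> phase processing -> inverse FFT would go here ...

        // Overlap-add with the synthesis window; the norm factor undoes the
        // constant gain of overlapped Hann^2 windows (3N/8H).
        float norm = 8.0f / 3.0f * static_cast<float>(hop)
                                 / static_cast<float>(frameSize);
        for (std::size_t n = 0; n < frameSize; ++n)
            outBuf[(writePos + n) % frameSize] += frame[n] * window[n] * norm;
    }

    std::size_t frameSize, hop, writePos = 0, hopCounter = 0;
    std::vector<float> fifo, outBuf, window;
};
```

With the processing step left as identity, this should pass audio through untouched, just delayed by one frame; if a skeleton like this alone is already choppy, the bug is in the buffering rather than the vocoder math.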

There was an ADC talk & github repo that may be helpful.


Formant shifting involves remapping the spectral envelope of an STFT representation. Instead of shifting the bins themselves (which would change pitch and probably mess up windowing), it interpolates the envelope to shift the resonant frequencies up or down. Bin phases are left unaffected to preserve the original timing and clarity of the signal; only bin magnitudes are changed.

Making the calculations efficient enough for real-time processing is the tricky part, particularly estimating the spectral envelope. I'd recommend starting from an existing phase vocoder codebase, and can offer the open-source project Bungee (a library for audio time-stretching and pitch-shifting). Bungee uses modern C++ to implement an efficient STFT pipeline, and it could give you a big head start.
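For experimenting, a very crude envelope estimate is just a moving average over the bin magnitudes. To be clear, this is a toy stand-in of mine, not what Bungee does; cepstral smoothing or "true envelope" estimation are the more standard, better-behaved approaches:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Crude spectral-envelope estimate: a centered moving average over the
// magnitude spectrum, clamped at the spectrum's edges. The radius trades
// off how much fine harmonic structure is smoothed away.
std::vector<float> smoothEnvelope(const std::vector<float>& mags,
                                  std::size_t radius)
{
    std::vector<float> env(mags.size());
    for (std::size_t k = 0; k < mags.size(); ++k) {
        std::size_t lo = k > radius ? k - radius : 0;
        std::size_t hi = std::min(mags.size() - 1, k + radius);
        float sum = 0.0f;
        for (std::size_t i = lo; i <= hi; ++i) sum += mags[i];
        env[k] = sum / static_cast<float>(hi - lo + 1);
    }
    return env;
}
```

It's O(bins * radius) as written, so for real-time use you'd want a running-sum version, but it's enough to hear formant shifting working.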