Converting signal from frequency back to time domain creates artifacts when pitch shifting up

This is kind of a weird one--I'm a little stuck.

I'm making a sampler, and I have implemented pitch shifting of samples similarly to how Jules did in the source. Everything was working fine--pitch shifting up as well as down resulted in no artifacts. Now, what I'm trying to do is calculate the FFT and IFFT in the rendering callback so I can do some custom filtering, etc.

In the SampleVoice::renderNextBlock function I calculate the FFT (using FFTW3), then the IFFT, which results in the exact same input (as long as you divide the unnormalized output by the numSamples parameter); however, when pitch shifting samples after this conversion, I'm getting artifacts only ABOVE the root note and not below. This is very puzzling to me.
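
For anyone wondering where the divide-by-numSamples comes from, here is a minimal sketch using a naive DFT in place of FFTW3 (so it runs without the library). The names `naiveDft` and `roundTripUnnormalized` are made up for illustration. A forward transform followed by an unscaled inverse should return the input multiplied by N, which is the behaviour described above.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Naive O(N^2) DFT with no scaling. sign = -1 is the forward transform,
// sign = +1 is the inverse. This is a stand-in for an FFT library call.
std::vector<Complex> naiveDft(const std::vector<Complex>& in, int sign)
{
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<Complex> out(n);

    for (std::size_t k = 0; k < n; ++k)
    {
        Complex sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
        {
            const double angle = sign * 2.0 * pi * double(k * t) / double(n);
            sum += in[t] * Complex(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Forward then inverse transform, deliberately left unnormalized:
// each returned value is the matching input value times input.size().
std::vector<double> roundTripUnnormalized(const std::vector<double>& input)
{
    const std::vector<Complex> x(input.begin(), input.end());
    const std::vector<Complex> restored = naiveDft(naiveDft(x, -1), +1);

    std::vector<double> result;
    result.reserve(restored.size());
    for (const Complex& c : restored)
        result.push_back(c.real());
    return result;
}
```

Dividing each returned value by the block length recovers the original samples, up to floating-point error.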

I'm using the same linear interpolation used in the source code:

const int pos = (int) samplePosition;
float alpha = (float) (samplePosition - pos);
float invAlpha = 1.0f - alpha;

// outFL and outFR hold the unnormalized IFFT output for the left and right
// channels (i.e. the original input scaled by numSamples).
float l = (outFL[pos - start] / numSamples) * invAlpha + (outFL[pos - start + 1] / numSamples) * alpha;
float r = inR != nullptr ? (outFR[pos - start] / numSamples) * invAlpha + (outFR[pos - start + 1] / numSamples) * alpha : l;

*outL += l*attack_multiplier*release_multiplier;
*outR += r*attack_multiplier*release_multiplier;

samplePosition += pitchRatio;


I'm not sure why the artifacts would happen only ABOVE the root note and not below, when the same interpolation algorithm works fine on the original input that was not transformed via FFT and then IFFT. Any ideas what's going on?



Playing the sample back after doing an FFT and then an IFFT causes artifacts ONLY when the triggered note is above the sample's root note. To implement pitch shifting, you have to interpolate between samples to avoid artifacts. To be clear, I am getting artifacts only when pitch shifting a sample higher (above the root note) after doing the FFT and then the IFFT on the input.

I compared the original input to the FFT/IFFT transformed input, and it's identical. If I re-trigger the sample at its root-note pitch or below its root-note pitch, it sounds perfect. If I trigger the sample ABOVE the root note, I get artifacts. If I get rid of the FFT/IFFT on the original input and just sample the original input, I don't get artifacts when triggering the sample above its root note pitch.

I'll look into the rounding errors, though. Maybe normalizing the IFFT (dividing by the numSamples integer) is causing rounding errors...

It was a noob mistake.

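The IFFT was only calculated for numSamples values. When you pitch shift up, pitchRatio is greater than 1, so samplePosition advances by more than numSamples source samples per block, and the interpolation ends up indexing values further into the sample that were never initialized. When pitching down, the read position advances by fewer than numSamples per block, so the reads stay inside the transformed region (give or take the one extra sample the interpolation touches).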
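
A rough way to size the transform, assuming the read position steps by pitchRatio once per output sample and the interpolation reads pos and pos + 1 as in the snippet above. The function name `sourceSamplesNeeded` and the `startFraction` parameter (the fractional part of the read position at the start of the block) are hypothetical, not part of JUCE or FFTW.

```cpp
#include <cmath>
#include <cstddef>

// Number of source samples one block touches when the read position
// advances by pitchRatio per output sample and linear interpolation
// reads both pos and pos + 1.
std::size_t sourceSamplesNeeded(std::size_t numSamples,
                                double pitchRatio,
                                double startFraction = 0.0)
{
    if (numSamples == 0)
        return 0;

    // Read position (relative to the block start) for the last output sample.
    const double lastPosition = startFraction + pitchRatio * double(numSamples - 1);

    // floor(lastPosition) is the last "pos"; interpolation also reads pos + 1,
    // so the highest index is floor(lastPosition) + 1 and the count is one more.
    return std::size_t(std::floor(lastPosition)) + 2;
}
```

For a 512-sample block this gives 1024 source samples at a ratio of 2.0 and 257 at a ratio of 0.5, so the FFT/IFFT input has to cover noticeably more than numSamples once the ratio goes above 1.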

I got it all working, though I'm realizing that it's fairly slow to do in realtime. Using FFTW, you have to create the FFT plans, initialize the input arrays, compute the output, do your filtering, then create plans for the IFFT, compute the IFFT, and add the result to the output buffer.

There's a lot of allocation going on during that, and it's causing a few artifacts. Any tips for calculating FFT in realtime but minimizing allocation? I suppose you could calculate the FFT of the entire sample on the startNote callback, and then you'd only have to do your filtering and IFFT transform in realtime, and I suppose you could store those arrays as members instead of allocating each time...some food for thought...I am new to this, it's a fun learning experience :)

I've never used FFTW, but in most FFT libraries I've come across you only need to do the FFT/IFFT setup once for each FFT block size you intend to use (I believe this is what you're referring to as 'create plans'?).

So for example, if you only intend to use FFT block sizes of (say) 1024 and 2048 throughout your app, you only need to create and initialize two of these structures at start up.  After that, you just choose which one to use based on the block size you have.

If the FFT library's any good, it will only perform memory allocations during creation of these structures.
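
A minimal sketch of that idea, with no real FFT library involved: `FftSetup` is a hypothetical stand-in for whatever the library needs per block size (with FFTW this would presumably hold the plans and their buffers), and `FftSetupCache` is an invented name. The point is only that every allocation happens in `prepare`, called at start up, while the audio callback just does a lookup.

```cpp
#include <complex>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical per-block-size state: working buffers allocated once.
// A real version would also own the library's plan/setup handles.
struct FftSetup
{
    explicit FftSetup(std::size_t size)
        : blockSize(size),
          timeDomain(size),
          freqDomain(size / 2 + 1) // bin count for a real-input transform
    {
    }

    std::size_t blockSize;
    std::vector<float> timeDomain;
    std::vector<std::complex<float>> freqDomain;
};

class FftSetupCache
{
public:
    // Call at start up (not from the audio callback) for each block size
    // you intend to use. Preparing the same size twice is a no-op.
    void prepare(std::size_t blockSize)
    {
        auto& slot = setups[blockSize];
        if (slot == nullptr)
            slot = std::make_unique<FftSetup>(blockSize);
    }

    // Lookup only, no allocation. Returns nullptr for an unprepared size.
    FftSetup* find(std::size_t blockSize) const
    {
        const auto it = setups.find(blockSize);
        return it != setups.end() ? it->second.get() : nullptr;
    }

    std::size_t size() const { return setups.size(); }

private:
    std::map<std::size_t, std::unique_ptr<FftSetup>> setups;
};
```

In a sampler voice, the cache could live as a member: call `prepare(1024)` and `prepare(2048)` once during setup, then call `find` inside the render callback and reuse the buffers it returns.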