FFT resynthesis (to oscillators instead of iFFT)

Hi all, I’ve been stuck on this for days. I already posted it on KVR Audio, but maybe I’ll have more luck here, since it involves the JUCE FFT and I may be doing something wrong with it.

I’m trying to recreate this additive resynthesis to experiment a bit with sounds. Basically it analyzes the frequencies, amplitudes and phases of a sample and sets some sine oscillators to reproduce it.

So my approach is:

  1. In each block, load N samples (usually 128-256, depending on the block size) from the file/sample into an array, apply a window to it (multiplyWithWindowingTable), and then zero-pad it to 1024 or 2048 samples in a second array. That second array is 2 * fftSize long, and the window type is Hann or Hamming.
  2. In each block, FFT that array with performRealOnlyForwardTransform. That should give interleaved [real, imaginary] float values, which I use to calculate magnitudes, phases, and frequencies (SampleRate * index / fftSize). I calculate magnitudes and phases the way pizzafilms does. I have also tried calculating them manually, with sqrt(re * re + im * im) for magnitudes and atan2 for phases, but it made no difference.
  3. In each block, set those magnitudes, phases and frequencies on sine oscillators, transitioning with linear interpolation. Setting the amplitudes and frequencies produces a sound that resembles the original sample, but in a really distorted way. Setting the phases doesn’t help at all, and I think I may be mishandling them. If one phase is, say, -1.5 and the next is 1.0, that means moving 2.5 radians upwards in phase/wavetable position; could that jump, on top of the phase increment from the oscillator’s frequency, be creating the FM-ish noise I hear?
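On the phase question in step 3: atan2 returns angles in [-pi, pi], so a raw difference such as 1.0 - (-1.5) can hide whole turns, and a free-running oscillator has already advanced by 2 * pi * f * hop / sampleRate between two analysis frames. The sketch below is plain C++ rather than JUCE; `wrapPhase` and `phaseDeviation` are hypothetical helper names, and it assumes each phase is measured relative to the start of its analysis frame, with frames `hop` samples apart. It computes how far the measured phase deviates from where the oscillator would have ended up anyway. This is the same bookkeeping a phase vocoder does.

```cpp
#include <cassert>
#include <cmath>

// Wrap any angle (radians) into the interval (-pi, pi].
double wrapPhase(double phase)
{
    const double pi = std::acos(-1.0);
    double wrapped = std::fmod(phase + pi, 2.0 * pi);
    if (wrapped <= 0.0)
        wrapped += 2.0 * pi; // std::fmod keeps the sign of the dividend
    return wrapped - pi;
}

// Deviation between the phase measured in the current frame and the phase a
// free-running oscillator at freqHz would have reached after `hop` samples.
// Assumption: both phases are referenced to the first sample of their frame.
double phaseDeviation(double prevPhase, double currPhase,
                      double freqHz, int hop, double sampleRate)
{
    assert(hop > 0 && sampleRate > 0.0);
    const double pi = std::acos(-1.0);
    const double expectedAdvance = 2.0 * pi * freqHz * hop / sampleRate;
    return wrapPhase(currPhase - prevPhase - expectedAdvance);
}
```

A deviation near zero means the oscillator is already on track, so overwriting its phase would only add a discontinuity. Spreading a nonzero deviation over the next hop (adding deviation / hop radians per sample) is one way to steer towards the measured phase without an audible jump.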

I’ve also tried a full 1024-2048-sample FFT instead of small blocks, interpolating the transition (frequencies, amplitudes, phases) from big block to big block across the small blocks, since that should give much better resolution (I thought 128-256 samples may not be enough), but it didn’t solve anything; it sounds even worse.
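On the resolution question, a little arithmetic may help. The zero-padded FFT length sets the spacing between bins, but the width of each spectral peak is set by how many real, windowed samples went in. For both Hann and Hamming, the main lobe is about 4 bins wide, measured in bins of the unpadded window length. The helper names below are hypothetical, and the 44.1 kHz sample rate in the numbers is an assumption.

```cpp
#include <cassert>

// Distance between FFT bins in Hz; zero-padding to a larger fftSize shrinks this.
double binSpacingHz(double sampleRate, int fftSize)
{
    assert(fftSize > 0);
    return sampleRate / fftSize;
}

// Approximate null-to-null width in Hz of a Hann window's main lobe:
// 4 bins of the *unpadded* window length, so zero-padding does not change it.
double hannMainLobeWidthHz(double sampleRate, int windowLength)
{
    assert(windowLength > 0);
    return 4.0 * sampleRate / windowLength;
}
```

At 44.1 kHz, a 128-sample frame padded to 2048 has bins about 21.5 Hz apart, but each partial still shows up as a lobe about 1378 Hz wide. Harmonics a few hundred Hz apart therefore merge into one peak however much padding is added. A true 2048-sample frame narrows the lobe to about 86 Hz, at the cost of smearing changes over roughly 46 ms.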

Any thoughts? I suspect I may be doing something wrong in how I get the FFT magnitudes, frequencies and phases.
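The analysis in steps 1 and 2 can be checked in isolation with a standalone sketch. So that it runs without JUCE, a naive DFT stands in for performRealOnlyForwardTransform, and the names `Bin`, `hannWindow` and `analyseFrame` are hypothetical. It assumes a periodic Hann window and scales magnitudes by 2 / sum(window), so that a partial centred on a bin reports its sine amplitude directly. That linear value is what an oscillator's gain expects.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

struct Bin
{
    double freqHz;    // centre frequency of the bin: sampleRate * k / fftSize
    double amplitude; // sine amplitude for a partial centred on this bin (not valid for DC/Nyquist)
    double phase;     // cosine phase at the first sample of the frame, in [-pi, pi]
};

// Periodic Hann window of length n.
std::vector<double> hannWindow(int n)
{
    const double pi = std::acos(-1.0);
    std::vector<double> w(n);
    for (int i = 0; i < n; ++i)
        w[i] = 0.5 - 0.5 * std::cos(2.0 * pi * i / n);
    return w;
}

// Window `frame`, zero-pad it to fftSize and return bins 0..fftSize/2.
std::vector<Bin> analyseFrame(const std::vector<double>& frame, int fftSize, double sampleRate)
{
    const int n = static_cast<int>(frame.size());
    assert(n > 0 && n <= fftSize);
    const double pi = std::acos(-1.0);
    const std::vector<double> w = hannWindow(n);

    double windowSum = 0.0;
    for (double v : w)
        windowSum += v;

    std::vector<Bin> bins(fftSize / 2 + 1);
    for (int k = 0; k <= fftSize / 2; ++k)
    {
        // The padded samples are zero, so only the first n terms of the DFT sum matter.
        std::complex<double> acc(0.0, 0.0);
        for (int i = 0; i < n; ++i)
            acc += frame[i] * w[i] * std::polar(1.0, -2.0 * pi * k * i / fftSize);

        bins[k].freqHz = sampleRate * k / fftSize;
        bins[k].amplitude = 2.0 * std::abs(acc) / windowSum;
        bins[k].phase = std::arg(acc); // same as atan2(im, re)
    }
    return bins;
}
```

Feeding it a sinusoid that sits exactly on a bin should give back its amplitude and phase, which makes it a useful reference to compare the JUCE-based numbers against. The phase is that of a cosine, so if the oscillators use sin(), adding pi/2 converts it.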

Shower thought: I can hear the CPU fan spinning up every time the guy in the video plays notes. Computationally speaking, maybe this approach isn’t worth it, and I should just tweak some parameters and then do the iFFT to alter the sound?


I would just get rid of the phase and let each oscillator continue with the phase it had in the previous block. Good thing we human beings can’t really hear phase 🙂 (I don’t want to start a discussion here: yes, I can hear the difference between a Dirac impulse and a linear sweep 😉)

That should make things a little easier: your oscillators (how many do you have?) then just have to follow the peaks in the spectrum, i.e. amplitude and frequency.
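Here is a sketch of an oscillator bank that follows only amplitude and frequency. Each partial keeps its own running phase, which is never reset from the analysis, while frequency and amplitude are ramped linearly across the block as in step 3 of the question. It is plain C++, and `Partial` and `renderBlock` are hypothetical names. It assumes frequencies stay between 0 and Nyquist, and that partial p is fed the same spectral peak from one block to the next.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Partial
{
    double phase = 0.0;     // radians, carried over from block to block
    double freqHz = 0.0;    // frequency reached at the end of the previous block
    double amplitude = 0.0; // amplitude reached at the end of the previous block
};

// Render one block: ramp each partial linearly to its new targets while its
// phase keeps accumulating, so nothing jumps at block boundaries.
void renderBlock(std::vector<Partial>& partials,
                 const std::vector<double>& targetFreqHz,
                 const std::vector<double>& targetAmp,
                 double sampleRate, float* out, int numSamples)
{
    assert(numSamples > 0);
    assert(targetFreqHz.size() == partials.size() && targetAmp.size() == partials.size());
    const double twoPi = 2.0 * std::acos(-1.0);

    for (int n = 0; n < numSamples; ++n)
        out[n] = 0.0f;

    for (std::size_t p = 0; p < partials.size(); ++p)
    {
        Partial& part = partials[p];
        const double freqStep = (targetFreqHz[p] - part.freqHz) / numSamples;
        const double ampStep = (targetAmp[p] - part.amplitude) / numSamples;
        double freq = part.freqHz;
        double amp = part.amplitude;

        for (int n = 0; n < numSamples; ++n)
        {
            freq += freqStep;
            amp += ampStep;
            part.phase += twoPi * freq / sampleRate;
            while (part.phase >= twoPi)
                part.phase -= twoPi; // keep the accumulator small
            out[n] += static_cast<float>(amp * std::sin(part.phase));
        }
        part.freqHz = targetFreqHz[p];
        part.amplitude = targetAmp[p];
    }
}
```

The "same peak" assumption matters. If slot p is handed an unrelated peak in the next frame, its frequency ramp turns into an audible glide, which may sound similar to the FM-ish artefacts described above.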


Yeah, that’s what I was doing in the beginning: ignoring phase and just setting frequencies and amplitudes on the oscillators in each block. The percussive, spectrally dense samples sounded quite distorted (although recognizable), but the more harmonic/tonal sounds like violins and pianos were totally distorted.

I am using 32 sines (the guy in the video uses only 20 and it sounds good), and if a chunk has fewer than 32 peaks, I just set the rest of the amplitudes to 0 so they don’t add to the sound. One thing I forgot to say is that I’m using basic peak detection to find where the important frequencies are: if ((fft_magnitude[i] > fft_magnitude[i-1]) && (fft_magnitude[i] > fft_magnitude[i+1])), and then I use the quadratic method to interpolate the frequency, since many frequencies won’t fall in the center of a bin and the peak may lie between two bins.
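Here is a sketch of that peak picking: the same local-maximum test, followed by quadratic (parabolic) interpolation through the three bins around each maximum. It then keeps the 32 (or `maxPeaks`) largest peaks and pads with silent ones. Fitting the parabola to log magnitudes is a common choice and an assumption here. `Peak` and `pickPeaks` are hypothetical names, and `mag` holds the linear magnitudes of bins 0..fftSize/2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Peak
{
    double freqHz;
    double magnitude;
};

std::vector<Peak> pickPeaks(const std::vector<double>& mag, int fftSize,
                            double sampleRate, std::size_t maxPeaks)
{
    assert(fftSize > 0);
    std::vector<Peak> found;
    for (std::size_t i = 1; i + 1 < mag.size(); ++i)
    {
        if (!(mag[i] > mag[i - 1] && mag[i] > mag[i + 1]))
            continue; // not a local maximum

        // Parabola through the log magnitudes; the tiny floor avoids log(0).
        const double a = std::log(mag[i - 1] + 1e-12);
        const double b = std::log(mag[i] + 1e-12);
        const double c = std::log(mag[i + 1] + 1e-12);
        const double denom = a - 2.0 * b + c; // negative for a true maximum
        const double p = denom < 0.0 ? 0.5 * (a - c) / denom : 0.0; // offset in bins
        const double peakLog = b - 0.25 * (a - c) * p;

        found.push_back({ (i + p) * sampleRate / fftSize, std::exp(peakLog) });
    }

    // Largest peaks first; truncate, or pad with zero-amplitude peaks.
    std::sort(found.begin(), found.end(),
              [](const Peak& x, const Peak& y) { return x.magnitude > y.magnitude; });
    found.resize(maxPeaks, Peak{ 0.0, 0.0 });
    return found;
}
```

Sorting by magnitude means slot p can hold a completely different partial in the next frame. If slots map straight onto oscillators, matching each new peak to the nearest frequency from the previous frame (partial tracking, as in McAulay-Quatieri style sinusoidal modelling) may be worth trying before the oscillators interpolate towards it.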

Checking the logged frequencies and amplitudes, I didn’t see anything weird. That’s why I thought it was either the phases, as I was told on KVR Audio, or that I wasn’t taking enough samples to get decent FFT results. The problem with the latter is that it sounds worse whether I use one big block every X audio blocks (e.g. a 512-sample file block every 4 × 128-sample audio blocks) or overlapping blocks (e.g. each block reading file samples 0-256, then 128-384, 256-512…).
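For the overlapping variant, one option is to decouple the analysis hop from the host's block size. The sketch below collects incoming samples and hands out a frame of `frameLength` samples every `hop` samples, reproducing the 0-256, 128-384, 256-512 schedule above. `FrameScheduler` is a hypothetical name. For clarity it erases from a std::vector and returns freshly allocated frames, so it is not real-time safe as written; a preallocated ring buffer would be needed inside processBlock.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class FrameScheduler
{
public:
    FrameScheduler(std::size_t frameLength, std::size_t hop)
        : frameLength_(frameLength), hop_(hop)
    {
        assert(hop > 0 && hop <= frameLength);
    }

    // Push one audio block; returns every analysis frame completed during it.
    std::vector<std::vector<float>> push(const float* samples, std::size_t numSamples)
    {
        std::vector<std::vector<float>> frames;
        for (std::size_t n = 0; n < numSamples; ++n)
        {
            history_.push_back(samples[n]);
            if (history_.size() == frameLength_)
            {
                frames.push_back(history_); // a full frame is ready
                // Slide forward by one hop; the remaining samples start the next frame.
                history_.erase(history_.begin(),
                               history_.begin() + static_cast<std::ptrdiff_t>(hop_));
            }
        }
        return frames;
    }

private:
    std::size_t frameLength_;
    std::size_t hop_;
    std::vector<float> history_;
};
```

Each returned frame would then go through the windowing and FFT step. Because a frame describes the audio around its centre, oscillator targets derived from it lag the input by about half a frame.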

I honestly didn’t expect it to be so “hard” to get it working 😄