Revising pYIN to use JUCE's FFT

I am trying to rewrite the fastDifference function in pYIN by Matthias Mauch to use JUCE’s FFT. I am doing something wrong, but I can’t see what.

Here is the original:

void
YinUtil::fastDifference(const double *in, double *yinBuffer)
{

int frameSize = 2 * m_yinBufferSize;
int halfSize = m_yinBufferSize;

double *audioTransformedComplex = new double[frameSize + 2];
double *audioOutReal = new double[frameSize];
double *kernel = new double[frameSize];
double *kernelTransformedComplex = new double[frameSize + 2];
double *yinStyleACFComplex = new double[frameSize + 2];
double *powerTerms = new double[m_yinBufferSize];

// POWER TERM CALCULATION
// ... for the power terms in equation (7) in the Yin paper
powerTerms[0] = 0.0;
for (int j = 0; j < m_yinBufferSize; ++j) {
    powerTerms[0] += in[j] * in[j];
}

// now iteratively calculate all others (saves a few multiplications)
for (int tau = 1; tau < m_yinBufferSize; ++tau) {
    powerTerms[tau] = powerTerms[tau-1] -
        in[tau-1] * in[tau-1] +
        in[tau+m_yinBufferSize] * in[tau+m_yinBufferSize];  
}

// YIN-STYLE AUTOCORRELATION via FFT
// 1. data
m_fft.forward(in, audioTransformedComplex);

// 2. half of the data, disguised as a convolution kernel
for (int j = 0; j < m_yinBufferSize; ++j) {
    kernel[j] = in[m_yinBufferSize-1-j];
}
for (int j = m_yinBufferSize; j < frameSize; ++j) {
    kernel[j] = 0.;
}
m_fft.forward(kernel, kernelTransformedComplex);

// 3. convolution via complex multiplication -- written into
for (int j = 0; j <= halfSize; ++j) {
    yinStyleACFComplex[j*2] = // real
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2] -
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2+1];
    yinStyleACFComplex[j*2+1] = // imaginary
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2+1] +
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2];
}

m_fft.inverse(yinStyleACFComplex, audioOutReal);

// CALCULATION OF difference function
// ... according to (7) in the Yin paper.
for (int j = 0; j < m_yinBufferSize; ++j) {
    yinBuffer[j] = powerTerms[0] + powerTerms[j] - 2 *
        audioOutReal[j+m_yinBufferSize-1];
}
delete [] audioTransformedComplex;
delete [] audioOutReal;
delete [] kernel;
delete [] kernelTransformedComplex;
delete [] yinStyleACFComplex;
delete [] powerTerms;

}

Here is mine:

void
YinUtil::fastDifference(const float *in, float *yinBuffer)
{

int frameSize = 2 * m_yinBufferSize;

const juce::dsp::Complex<float>* inComplex = reinterpret_cast<const juce::dsp::Complex<float>*>(in);
juce::dsp::Complex<float>* audioOutComplex = new juce::dsp::Complex<float>[frameSize];
juce::dsp::Complex<float>* audioTransformedComplex = new juce::dsp::Complex<float>[frameSize];

float *kernel = new float[frameSize];
juce::dsp::Complex<float>* kernelTransformedComplex = new juce::dsp::Complex<float>[frameSize];

juce::dsp::Complex<float>* yinStyleACFComplex = new juce::dsp::Complex<float>[frameSize];
float *powerTerms = new float[m_yinBufferSize];

// POWER TERM CALCULATION
// ... for the power terms in equation (7) in the Yin paper
powerTerms[0] = 0.0;
for (int j = 0; j < m_yinBufferSize; ++j) {
    powerTerms[0] += in[j] * in[j];
}

// now iteratively calculate all others (saves a few multiplications)
for (int tau = 1; tau < m_yinBufferSize; ++tau) {
    powerTerms[tau] = powerTerms[tau-1] -
        in[tau-1] * in[tau-1] +
        in[tau+m_yinBufferSize] * in[tau+m_yinBufferSize];  
}

// YIN-STYLE AUTOCORRELATION via FFT
// 1. data
m_fft.perform(inComplex, audioTransformedComplex, false);

// 2. half of the data, disguised as a convolution kernel
for (int j = 0; j < m_yinBufferSize; ++j) {
    kernel[j] = in[m_yinBufferSize-1-j];
}
for (int j = m_yinBufferSize; j < frameSize; ++j) {
    kernel[j] = 0.;
}

const juce::dsp::Complex<float>* kernalComplex = reinterpret_cast<const juce::dsp::Complex<float>*>(kernel);
m_fft.perform(kernalComplex, kernelTransformedComplex, false);

// 3. convolution via complex multiplication -- written into
for (int j = 0; j < frameSize; ++j) {

    yinStyleACFComplex[j] = audioTransformedComplex[j] * kernelTransformedComplex[j];
    
}

m_fft.perform(yinStyleACFComplex, audioOutComplex, true);

// CALCULATION OF difference function
// ... according to (7) in the Yin paper.
for (int j = 0; j < m_yinBufferSize; ++j) {
    yinBuffer[j] = (powerTerms[0] + powerTerms[j]) - (2 * audioOutComplex[j+m_yinBufferSize].real());
}

delete [] powerTerms;
delete [] audioTransformedComplex;
delete [] audioOutComplex;
delete [] kernel;
delete [] kernelTransformedComplex;
delete [] yinStyleACFComplex;

}

Any ideas what I am doing wrong?

Thanks!
Shawn

I got it working with adamski’s AudioFFT:

void
YinUtil::fastDifference(const float *in, float *yinBuffer)
{

int frameSize = 2 * m_yinBufferSize;
int halfSize = m_yinBufferSize;

float *audioTransformedComplex = new float[frameSize + 2];
float *audioOutReal = new float[frameSize];
float *kernel = new float[frameSize];
float *kernelTransformedComplex = new float[frameSize + 2];
float *yinStyleACFComplex = new float[frameSize + 2];
float *powerTerms = new float[m_yinBufferSize];

// POWER TERM CALCULATION
// ... for the power terms in equation (7) in the Yin paper
powerTerms[0] = 0.0;
for (int j = 0; j < m_yinBufferSize; ++j) {
    powerTerms[0] += in[j] * in[j];
}

// now iteratively calculate all others (saves a few multiplications)
for (int tau = 1; tau < m_yinBufferSize; ++tau) {
    powerTerms[tau] = powerTerms[tau-1] -
        in[tau-1] * in[tau-1] +
        in[tau+m_yinBufferSize] * in[tau+m_yinBufferSize];
}

// YIN-STYLE AUTOCORRELATION via FFT
// 1. data
//m_fft.forward(in, audioTransformedComplex);
m_fft.init(frameSize);
std::vector<float> real1 (audiofft::AudioFFT::ComplexSize(frameSize));
std::vector<float> imag1 (audiofft::AudioFFT::ComplexSize(frameSize));
m_fft.fft(in, real1.data(), imag1.data());
for (int j = 0; j <= halfSize; ++j) { // ComplexSize(frameSize) == halfSize + 1 bins
    audioTransformedComplex[j*2] = real1[j];
    audioTransformedComplex[j*2+1] = imag1[j];
}

// 2. half of the data, disguised as a convolution kernel
for (int j = 0; j < m_yinBufferSize; ++j) {
    kernel[j] = in[m_yinBufferSize-1-j];
}
for (int j = m_yinBufferSize; j < frameSize; ++j) {
    kernel[j] = 0.;
}
//m_fft.forward(kernel, kernelTransformedComplex);
std::vector<float> real2 (audiofft::AudioFFT::ComplexSize(frameSize));
std::vector<float> imag2 (audiofft::AudioFFT::ComplexSize(frameSize));
m_fft.fft(kernel, real2.data(), imag2.data());
for (int j = 0; j <= halfSize; ++j) { // ComplexSize(frameSize) == halfSize + 1 bins
    kernelTransformedComplex[j*2] = real2[j];
    kernelTransformedComplex[j*2+1] = imag2[j];
}

// 3. convolution via complex multiplication -- written into
for (int j = 0; j <= halfSize; ++j) {
    yinStyleACFComplex[j*2] = // real
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2] -
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2+1];
    yinStyleACFComplex[j*2+1] = // imaginary
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2+1] +
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2];
}

//m_fft.inverse(yinStyleACFComplex, audioOutReal);
std::vector<float> real3 (audiofft::AudioFFT::ComplexSize(frameSize));
std::vector<float> imag3 (audiofft::AudioFFT::ComplexSize(frameSize));
for (int j = 0; j <= halfSize; ++j) {
    real3[j] = yinStyleACFComplex[j*2];
    imag3[j] = yinStyleACFComplex[j*2+1];
}
m_fft.ifft(audioOutReal, real3.data(), imag3.data());

// CALCULATION OF difference function
// ... according to (7) in the Yin paper.
for (int j = 0; j < m_yinBufferSize; ++j) {
    yinBuffer[j] = powerTerms[0] + powerTerms[j] - 2 *
        audioOutReal[j+m_yinBufferSize-1];
}
delete [] audioTransformedComplex;
delete [] audioOutReal;
delete [] kernel;
delete [] kernelTransformedComplex;
delete [] yinStyleACFComplex;
delete [] powerTerms;

}

There must be something I don’t understand about the JUCE FFT.

I can’t answer your question, sorry. I have a question though: I have tried the time-domain version of YIN, and it was OK, but I wondered whether it needs more than one technique running in parallel as a check for inconsistencies.
How do you find this algorithm’s accuracy?
Thanks.

I am just starting to learn about this stuff, so I don’t have many answers. Here is the description from Matthias Mauch and Simon Dixon’s paper:

“we modify YIN to output multiple pitch candidates with associated probabilities (PYIN Stage 1). These probabilities arise naturally from a prior distribution on the YIN threshold parameter. We use these probabilities as observations in a hidden Markov model, which is Viterbi-decoded to produce an improved pitch track (PYIN Stage 2). We demonstrate that the combination of Stages 1 and 2 raises recall and precision substantially.”

The pYIN code has time domain (slowDifference) and frequency domain (fastDifference) versions. I need speed, so I’m looking into the fft version. I am not sure yet about accuracy.

The results of JUCE’s inverse FFT are scaled by 1 / (buffer size). Other libraries like FFTW or vDSP (Apple) do not scale the results of the inverse transform.
If you want to avoid the scaling you’ll need to change the JUCE FFT source code.
Best regards.
