Revising pYIN to use JUCE's FFT

I am trying to rewrite the fastDifference function in pYIN by Matthias Mauch to use JUCE’s FFT. I am doing something wrong, but I can’t see what.

Here is the original:

void
YinUtil::fastDifference(const double *in, double *yinBuffer)
{

int frameSize = 2 * m_yinBufferSize;
int halfSize = m_yinBufferSize;

double *audioTransformedComplex = new double[frameSize + 2];
double *audioOutReal = new double[frameSize];
double *kernel = new double[frameSize];
double *kernelTransformedComplex = new double[frameSize + 2];
double *yinStyleACFComplex = new double[frameSize + 2];
double *powerTerms = new double[m_yinBufferSize];

// POWER TERM CALCULATION
// ... for the power terms in equation (7) in the Yin paper
powerTerms[0] = 0.0;
for (int j = 0; j < m_yinBufferSize; ++j) {
    powerTerms[0] += in[j] * in[j];
}

// now iteratively calculate all others (saves a few multiplications)
for (int tau = 1; tau < m_yinBufferSize; ++tau) {
    powerTerms[tau] = powerTerms[tau-1] -
        in[tau-1] * in[tau-1] +
        in[tau+m_yinBufferSize] * in[tau+m_yinBufferSize];  
}

// YIN-STYLE AUTOCORRELATION via FFT
// 1. data
m_fft.forward(in, audioTransformedComplex);

// 2. half of the data, disguised as a convolution kernel
for (int j = 0; j < m_yinBufferSize; ++j) {
    kernel[j] = in[m_yinBufferSize-1-j];
}
for (int j = m_yinBufferSize; j < frameSize; ++j) {
    kernel[j] = 0.;
}
m_fft.forward(kernel, kernelTransformedComplex);

// 3. convolution via complex multiplication -- written into
for (int j = 0; j <= halfSize; ++j) {
    yinStyleACFComplex[j*2] = // real
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2] -
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2+1];
    yinStyleACFComplex[j*2+1] = // imaginary
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2+1] +
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2];
}

m_fft.inverse(yinStyleACFComplex, audioOutReal);

// CALCULATION OF difference function
// ... according to (7) in the Yin paper.
for (int j = 0; j < m_yinBufferSize; ++j) {
    yinBuffer[j] = powerTerms[0] + powerTerms[j] - 2 *
        audioOutReal[j+m_yinBufferSize-1];
}
delete [] audioTransformedComplex;
delete [] audioOutReal;
delete [] kernel;
delete [] kernelTransformedComplex;
delete [] yinStyleACFComplex;
delete [] powerTerms;

}

Here is mine:

void
YinUtil::fastDifference(const float *in, float *yinBuffer)
{

int frameSize = 2 * m_yinBufferSize;

const juce::dsp::Complex<float>* inComplex = reinterpret_cast<const juce::dsp::Complex<float>*>(in);
juce::dsp::Complex<float>* audioOutComplex = new juce::dsp::Complex<float>[frameSize];
juce::dsp::Complex<float>* audioTransformedComplex = new juce::dsp::Complex<float>[frameSize];

float *kernel = new float[frameSize];
juce::dsp::Complex<float>* kernelTransformedComplex = new juce::dsp::Complex<float>[frameSize];

juce::dsp::Complex<float>* yinStyleACFComplex = new juce::dsp::Complex<float>[frameSize];
float *powerTerms = new float[m_yinBufferSize];

// POWER TERM CALCULATION
// ... for the power terms in equation (7) in the Yin paper
powerTerms[0] = 0.0;
for (int j = 0; j < m_yinBufferSize; ++j) {
    powerTerms[0] += in[j] * in[j];
}

// now iteratively calculate all others (saves a few multiplications)
for (int tau = 1; tau < m_yinBufferSize; ++tau) {
    powerTerms[tau] = powerTerms[tau-1] -
        in[tau-1] * in[tau-1] +
        in[tau+m_yinBufferSize] * in[tau+m_yinBufferSize];  
}

// YIN-STYLE AUTOCORRELATION via FFT
// 1. data
m_fft.perform(inComplex, audioTransformedComplex, false);

// 2. half of the data, disguised as a convolution kernel
for (int j = 0; j < m_yinBufferSize; ++j) {
    kernel[j] = in[m_yinBufferSize-1-j];
}
for (int j = m_yinBufferSize; j < frameSize; ++j) {
    kernel[j] = 0.;
}

const juce::dsp::Complex<float>* kernalComplex = reinterpret_cast<const juce::dsp::Complex<float>*>(kernel);
m_fft.perform(kernalComplex, kernelTransformedComplex, false);

// 3. convolution via complex multiplication -- written into
for (int j = 0; j < frameSize; ++j) {

    yinStyleACFComplex[j] = audioTransformedComplex[j] * kernelTransformedComplex[j];
    
}

m_fft.perform(yinStyleACFComplex, audioOutComplex, true);

// CALCULATION OF difference function
// ... according to (7) in the Yin paper.
for (int j = 0; j < m_yinBufferSize; ++j) {
    yinBuffer[j] = (powerTerms[0] + powerTerms[j]) - (2 * audioOutComplex[j+m_yinBufferSize].real());
}

delete [] powerTerms;
delete [] audioTransformedComplex;
delete [] audioOutComplex;
delete [] kernel;
delete [] kernelTransformedComplex;
delete [] yinStyleACFComplex;

}

Any ideas what I am doing wrong?

Thanks!
Shawn

I got it working with adamski’s AudioFFT:

void
YinUtil::fastDifference(const float *in, float *yinBuffer)
{

int frameSize = 2 * m_yinBufferSize;
int halfSize = m_yinBufferSize;

float *audioTransformedComplex = new float[frameSize + 2];
float *audioOutReal = new float[frameSize];
float *kernel = new float[frameSize];
float *kernelTransformedComplex = new float[frameSize + 2];
float *yinStyleACFComplex = new float[frameSize + 2];
float *powerTerms = new float[m_yinBufferSize];

// POWER TERM CALCULATION
// ... for the power terms in equation (7) in the Yin paper
powerTerms[0] = 0.0;
for (int j = 0; j < m_yinBufferSize; ++j) {
    powerTerms[0] += in[j] * in[j];
}

// now iteratively calculate all others (saves a few multiplications)
for (int tau = 1; tau < m_yinBufferSize; ++tau) {
    powerTerms[tau] = powerTerms[tau-1] -
        in[tau-1] * in[tau-1] +
        in[tau+m_yinBufferSize] * in[tau+m_yinBufferSize];
}

// YIN-STYLE AUTOCORRELATION via FFT
// 1. data
//m_fft.forward(in, audioTransformedComplex);
m_fft.init(frameSize);
std::vector<float> real1 (audiofft::AudioFFT::ComplexSize(frameSize));
std::vector<float> imag1 (audiofft::AudioFFT::ComplexSize(frameSize));
m_fft.fft(in, real1.data(), imag1.data());
for (int j = 0; j <= halfSize; ++j) { // ComplexSize(frameSize) == halfSize + 1 bins
    audioTransformedComplex[j*2] = real1[j];
    audioTransformedComplex[j*2+1] = imag1[j];
}

// 2. half of the data, disguised as a convolution kernel
for (int j = 0; j < m_yinBufferSize; ++j) {
    kernel[j] = in[m_yinBufferSize-1-j];
}
for (int j = m_yinBufferSize; j < frameSize; ++j) {
    kernel[j] = 0.;
}
//m_fft.forward(kernel, kernelTransformedComplex);
std::vector<float> real2 (audiofft::AudioFFT::ComplexSize(frameSize));
std::vector<float> imag2 (audiofft::AudioFFT::ComplexSize(frameSize));
m_fft.fft(kernel, real2.data(), imag2.data());
for (int j = 0; j <= halfSize; ++j) { // ComplexSize(frameSize) == halfSize + 1 bins
    kernelTransformedComplex[j*2] = real2[j];
    kernelTransformedComplex[j*2+1] = imag2[j];
}

// 3. convolution via complex multiplication -- written into
for (int j = 0; j <= halfSize; ++j) {
    yinStyleACFComplex[j*2] = // real
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2] -
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2+1];
    yinStyleACFComplex[j*2+1] = // imaginary
        audioTransformedComplex[j*2] * kernelTransformedComplex[j*2+1] +
        audioTransformedComplex[j*2+1] * kernelTransformedComplex[j*2];
}

//m_fft.inverse(yinStyleACFComplex, audioOutReal);
std::vector<float> real3 (audiofft::AudioFFT::ComplexSize(frameSize));
std::vector<float> imag3 (audiofft::AudioFFT::ComplexSize(frameSize));
for (int j = 0; j <= halfSize; ++j) {
    real3[j] = yinStyleACFComplex[j*2];
    imag3[j] = yinStyleACFComplex[j*2+1];
}
m_fft.ifft(audioOutReal, real3.data(), imag3.data());

// CALCULATION OF difference function
// ... according to (7) in the Yin paper.
for (int j = 0; j < m_yinBufferSize; ++j) {
    yinBuffer[j] = powerTerms[0] + powerTerms[j] - 2 *
        audioOutReal[j+m_yinBufferSize-1];
}
delete [] audioTransformedComplex;
delete [] audioOutReal;
delete [] kernel;
delete [] kernelTransformedComplex;
delete [] yinStyleACFComplex;
delete [] powerTerms;

}

There must be something I don’t understand about the JUCE FFT.

I can’t answer your question, sorry. I have a question though: I have tried the time-domain version of YIN, and it was OK, but I wondered whether it needs more than one technique running in parallel as a check for inconsistencies.
How do you find this algorithm’s accuracy?
Thanks.

I am just starting to learn about this stuff, so I don’t have many answers. Here is the description from Matthias Mauch and Simon Dixon’s paper:

“we modify YIN to output multiple pitch candidates with associated probabilities (PYIN Stage 1). These probabilities arise naturally from a prior distribution on the YIN threshold parameter. We use these probabilities as observations in a hidden Markov model, which is Viterbi-decoded to produce an improved pitch track (PYIN Stage 2). We demonstrate that the combination of Stages 1 and 2 raises recall and precision substantially.”

The pYIN code has time domain (slowDifference) and frequency domain (fastDifference) versions. I need speed, so I’m looking into the fft version. I am not sure yet about accuracy.

The results of JUCE’s inverse FFT are scaled by 1 / (buffer size). Other libraries like FFTW or vDSP (Apple) do not scale the results of the inverse transform.
If you want to avoid the scaling you’ll need to change the JUCE FFT source code.
Best regards.
