I am looking for some advice regarding down-sampling buffers of audio in a plugin.
Basically I want to downsample the contents of the processBlock function’s incoming audio buffer into another “down sampled” buffer. I want to do this as the plugin involves various feature extraction/analysis routines which compute different characteristics (spectral centroid, MFCCs etc.) on the current buffer. I would like to down-sample as a reduction step prior to feature extraction. The plugin is mainly targeted at analysing vocal/beat-box performances, so the bandwidth of interest is narrower than what a 44100 Hz sample rate provides.
The plugin also synthesises different sounds in response to certain feature analysis techniques, so audio will be output at standard host sample rates (44100 Hz and up), hence the wish to down-sample to a separate buffer for analysis.
I am aware some libraries exist such as libsamplerate etc.
I’m interested in other approaches though. For example I know using an FIR lowpass filter and then decimating is one approach.
Does anyone have experience of implementing down-sampling using the LagrangeInterpolator class or similar? Any advice appreciated.
This is a little beyond my DSP knowledge at the moment so I may just need to get my head into some textbooks. Ideally I’d like something I can use/implement with minimal hassle, as this is an academic project and the down-sampling stage is not the project focus.
I’m wondering whether the following approach is sufficient for down-sampling a buffer:
interpolator.process((downSamplingRate / sampleRate), buffer.getReadPointer(0), downSampledBuffer.getWritePointer(0, 0), numSamples);
downSamplingRate is some constant float value like
Is this enough to down-sample without any horrific aliasing issues etc.?
Apologies for the total ignorance, my DSP knowledge isn’t up to scratch enough for me to get my head around everything going on in LagrangeInterpolator.cpp
If anyone is able to offer any explanation behind the code I’d be hugely grateful. I don’t fully understand the calcCoefficient code at the moment (appreciate that is a bit of an ask though)
Is it possible to use the LagrangeInterpolator as is for down-sampling, without pre-processing/filtering the input buffer or anything first? Unsure as to whether doing so is completely the wrong approach and will result in all sorts of alias nastiness
EDIT: Going to have a read through the theory in the meantime…off down the rabbit hole.
Will the output be used for audio or analysis?
FIR yields the best results, and is pretty efficient using polyphase filters. However, it adds latency (N/2 samples). If you’re only going to analyse, a simple IIR filter may be enough followed by decimation.
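For illustration, here is a minimal sketch of the "low-pass then decimate" idea in plain C++ (no JUCE). The names `makeLowPassTaps` and `decimateFir`, the Hann window and the integer-only decimation factor are all assumptions made for the example, not part of any library. It only evaluates the filter at the samples that are kept, which is the basic saving that a polyphase structure formalises. It also starts from zero history, so a real plugin would need to carry the last `taps.size() - 1` input samples across processBlock calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Hann-windowed sinc low-pass.
// cutoff is a fraction of the INPUT sample rate (0 < cutoff < 0.5).
std::vector<float> makeLowPassTaps (std::size_t numTaps, double cutoff)
{
    assert (numTaps >= 3 && cutoff > 0.0 && cutoff < 0.5);

    const double pi = 3.14159265358979323846;
    const double centre = static_cast<double> (numTaps - 1) / 2.0;
    std::vector<double> h (numTaps);
    double sum = 0.0;

    for (std::size_t n = 0; n < numTaps; ++n)
    {
        const double x = static_cast<double> (n) - centre;
        const double sinc = (std::fabs (x) < 1.0e-12)
                                ? 2.0 * cutoff
                                : std::sin (2.0 * pi * cutoff * x) / (pi * x);
        const double window = 0.5 - 0.5 * std::cos (2.0 * pi * static_cast<double> (n)
                                                    / static_cast<double> (numTaps - 1));
        h[n] = sinc * window;
        sum += h[n];
    }

    // Normalise so that the gain at DC is 1.
    std::vector<float> taps (numTaps);
    for (std::size_t n = 0; n < numTaps; ++n)
        taps[n] = static_cast<float> (h[n] / sum);

    return taps;
}

// Hypothetical helper: filter and keep every factor-th sample.
// The convolution is only computed for the samples that survive.
std::vector<float> decimateFir (const std::vector<float>& input,
                                const std::vector<float>& taps,
                                std::size_t factor)
{
    assert (factor > 0);

    std::vector<float> output;
    output.reserve (input.size() / factor + 1);

    for (std::size_t i = 0; i < input.size(); i += factor)
    {
        double acc = 0.0;
        for (std::size_t k = 0; k < taps.size() && k <= i; ++k)
            acc += static_cast<double> (taps[k]) * input[i - k];

        output.push_back (static_cast<float> (acc));
    }

    return output;
}
```

As a usage example, `decimateFir (block, makeLowPassTaps (31, 0.1), 4)` would take a 44100 Hz block down to 11025 Hz with the cutoff at roughly 4.4 kHz. The tap count and cutoff here are illustrative guesses rather than tuned values.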
Yep the down-sampled buffer will be used purely for analysis purposes. Still looking into all of this now.
So do you think some sort of reasonably steep biquad / cascade of biquads would work?
I’m looking at the polyphase approach now.
EDIT: I’ve located the following on polyphase filters which might be worth an attempt: link
If it’s purely for analysis, and the analysis is about the broad content of the signal rather than its phase, I would just go for elliptic filters if you want steepness. Otherwise any common biquad implementation will work.
With IIR filters the phase delay is concentrated at the top frequencies, leaving you with minimal delay in most of the passband, in contrast to the much larger delay of FIR filters.
If, however, you need to preserve phase or it does have to be linear, FIR is your only choice.
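A rough sketch of the IIR route, again in plain C++ with no JUCE: a cascade of second-order low-pass sections followed by dropping samples. The `Biquad` struct and `decimateIir` are hypothetical names for this example. The coefficients follow the commonly used "Audio EQ Cookbook" low-pass formulas, and using identical sections is just for illustration, not a designed elliptic or Butterworth cascade. Unlike the FIR case, every input sample has to go through the filter because of its internal state.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One second-order low-pass section (transposed direct form II).
struct Biquad
{
    double b0 = 1.0, b1 = 0.0, b2 = 0.0, a1 = 0.0, a2 = 0.0;
    double z1 = 0.0, z2 = 0.0; // state, kept between calls

    static Biquad lowPass (double sampleRate, double cutoffHz, double q)
    {
        assert (cutoffHz > 0.0 && cutoffHz < 0.5 * sampleRate && q > 0.0);

        const double pi    = 3.14159265358979323846;
        const double w0    = 2.0 * pi * cutoffHz / sampleRate;
        const double cosW0 = std::cos (w0);
        const double alpha = std::sin (w0) / (2.0 * q);
        const double a0    = 1.0 + alpha;

        Biquad f;
        f.b0 = 0.5 * (1.0 - cosW0) / a0;
        f.b1 = (1.0 - cosW0) / a0;
        f.b2 = f.b0;
        f.a1 = -2.0 * cosW0 / a0;
        f.a2 = (1.0 - alpha) / a0;
        return f;
    }

    double process (double x)
    {
        const double y = b0 * x + z1;
        z1 = b1 * x - a1 * y + z2;
        z2 = b2 * x - a2 * y;
        return y;
    }
};

// Run every sample through the cascade, keep every factor-th result.
// The stages are passed by reference so their state survives across blocks.
std::vector<float> decimateIir (const std::vector<float>& input,
                                std::vector<Biquad>& stages,
                                std::size_t factor)
{
    assert (factor > 0);

    std::vector<float> output;
    output.reserve (input.size() / factor + 1);

    for (std::size_t i = 0; i < input.size(); ++i)
    {
        double y = input[i];
        for (auto& stage : stages)
            y = stage.process (y);

        if (i % factor == 0)
            output.push_back (static_cast<float> (y));
    }

    return output;
}
```

For a 44100 Hz input decimated by 4, two sections built with `Biquad::lowPass (44100.0, 5000.0, 0.7071)` would be one plausible starting point. Those numbers are assumptions for the example and would need checking against the features being extracted.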
If it’s just for analysis, can’t you work on the original data and simply translate the results? That is probably easier, and it also avoids analysing potential artefacts of your resampling into the result.
I mean translate like multiply any frequency value by the resampling factor, and any amplitude information stays the same.
Hmm it’s a fair point…
The analysis/feature-extraction results are being fed into a classification algorithm basically. So strictly speaking the results are still going to be discernible between the various “classifiable” classes/sounds.
I was more thinking of reducing the data points per buffer to run through the classifier by downsampling/decimating. As a reduction step it might aid the performance of the classifier. Maybe it is overkill…
@daniel would you mind clarifying what you mean by:
Do you mean multiply FFT bin values etc by the down-sampling ratio itself ? i.e. by a ratio of 1/4 etc.
Well, kind of: your FFT returns a sequence of complex numbers; the magnitude of each one is the amplitude of a certain frequency and its angle is the phase of that frequency. These you can use as normal.
Let’s assume you downsample from 48000 to 44100, so the ratio would be 0.91875.
Further, if you expect the values for 1 kHz in, e.g., bin 20 (the 20th number in your array of complex numbers), then it is actually the amplitude and phase for the frequency (1000 Hz / 0.91875) = 1088.43537414966 Hz.
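The arithmetic above can be written as two small helpers. This is only a sketch: `binToHz` and `translateFrequency` are made-up names, and which bin actually holds 1 kHz depends on the FFT size, which isn't stated here.

```cpp
#include <cassert>
#include <cstddef>

// Centre frequency of an FFT bin for a given FFT size and sample rate.
double binToHz (std::size_t bin, std::size_t fftSize, double sampleRate)
{
    assert (fftSize > 0 && bin <= fftSize / 2);
    return static_cast<double> (bin) * sampleRate / static_cast<double> (fftSize);
}

// Translate a frequency read at one rate into the other rate's terms,
// following the division used in the example above.
// resamplingRatio = newRate / oldRate, e.g. 44100.0 / 48000.0 = 0.91875.
double translateFrequency (double frequencyHz, double resamplingRatio)
{
    assert (resamplingRatio > 0.0);
    return frequencyHz / resamplingRatio;
}
```

With the numbers from the example, `translateFrequency (1000.0, 44100.0 / 48000.0)` gives roughly 1088.435 Hz. Amplitude values are left untouched.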
Was that understandable?
You will need a large FFT to get good resolution at lower frequencies. I don’t know if there are any rules, but I like to have a number of wavelengths occur within my required frequency band for a filter, otherwise you are analysing DC offset. DC offset is OK as an indicator of all wavelengths below the DC bin point (i.e. an LPF).
A sample rate decrease is only ever a filtering of the samples, enough that no information remains above (or maybe nearish) the new Nyquist frequency, followed by a dropping of samples. Averaging upon averaging upon averaging is the cheapest filter in CPU terms. It introduces a delay, is linear phase and is not sharp. A biquad is just a more complex averaging scheme. Combining samples is averaging, but it needs to be done many times in series to have the desired effect.
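A minimal sketch of that "average several times, then drop samples" idea, in plain C++. The names `movingAverage` and `decimateByAveraging` are invented for the example, and using an averaging length equal to the decimation factor is an assumption, not a rule. Each pass starts from zero history, so the first few outputs of a block are still settling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Causal moving average over the last `length` samples, using a running sum.
std::vector<float> movingAverage (const std::vector<float>& input, std::size_t length)
{
    assert (length > 0);

    std::vector<float> output (input.size());
    double sum = 0.0;

    for (std::size_t i = 0; i < input.size(); ++i)
    {
        sum += input[i];
        if (i >= length)
            sum -= input[i - length]; // drop the sample that left the window

        output[i] = static_cast<float> (sum / static_cast<double> (length));
    }

    return output;
}

// Apply the moving average `passes` times in series, then keep every
// factor-th sample. More passes give more attenuation above the new Nyquist
// at the cost of more delay and more roll-off inside the passband.
std::vector<float> decimateByAveraging (const std::vector<float>& input,
                                        std::size_t factor,
                                        std::size_t passes)
{
    assert (factor > 0);

    std::vector<float> smoothed = input;
    for (std::size_t p = 0; p < passes; ++p)
        smoothed = movingAverage (smoothed, factor);

    std::vector<float> output;
    output.reserve (smoothed.size() / factor + 1);
    for (std::size_t i = 0; i < smoothed.size(); i += factor)
        output.push_back (smoothed[i]);

    return output;
}
```

This is cheap but, as noted above, not sharp, so some aliasing from content just above the new Nyquist should be expected.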
Good that you mention it @lpb, somebody at ircam claims:
The duration of the window must be five time longer than the period of the signal
[from: Introduction - Window Size]
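To put numbers on that rule of thumb, here is a small sketch in plain C++. The helper names are hypothetical and the five-period figure is just the quoted guideline. It shows how the required window length in samples scales with the sample rate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Smallest window (in samples) that spans `periods` periods of lowestHz.
std::size_t minWindowSamples (double lowestHz, double sampleRate, double periods = 5.0)
{
    assert (lowestHz > 0.0 && sampleRate > 0.0 && periods > 0.0);
    return static_cast<std::size_t> (std::ceil (periods * sampleRate / lowestHz));
}

// Round up to a power of two, as most FFT implementations expect.
std::size_t nextPowerOfTwo (std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

Assuming, purely for illustration, a lowest frequency of interest of 80 Hz: at 44100 Hz this gives 2757 samples (a 4096-point FFT), while at 11025 Hz it gives 690 samples (a 1024-point FFT). The window covers the same duration in both cases, only the number of samples changes.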
But keep in mind, that downsampling changes the rules for everything, you don’t gain or lose data, you just change the scale.
Yep that makes sense. The more I look into it the less I’m thinking it’s necessary as a reduction step before computing the various features for the classification algorithm.
I think something like adaptive whitening is going to be a better approach as a reduction step prior to onset detection and then I will re-assess as to whether or not it’s worth it before the classifier.
Thanks a lot for the points guys. I’ve learnt things!