Is it just a sharp rise in amplitude over the course of a few samples, or is there something else to look out for? What’s been your experience with this?
Two envelope followers with different speeds, then compare them for divergence. Search for the old SPL Transient Designer manual for a pretty good overview. I’m sure there are more sophisticated ways today, but that’s how it was done for a long time.
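For anyone who wants to see that idea spelled out, here is a minimal sketch, not taken from the SPL manual. It assumes a plain rectify-and-one-pole follower, and the names (`envelope`, `divergence_db`) and the 1 ms / 20 ms time constants are just illustrative choices. The divergence is expressed as the fast/slow ratio in dB.

```python
import math

def one_pole_coeff(time_ms, sample_rate):
    """Smoothing coefficient for a one-pole filter with the given time constant."""
    return math.exp(-1.0 / (0.001 * time_ms * sample_rate))

def envelope(samples, time_ms, sample_rate):
    """Rectify, then smooth with a one-pole low-pass (a basic envelope follower)."""
    a = one_pole_coeff(time_ms, sample_rate)
    env, out = 0.0, []
    for x in samples:
        env = a * env + (1.0 - a) * abs(x)
        out.append(env)
    return out

def divergence_db(samples, sample_rate, fast_ms=1.0, slow_ms=20.0, floor=1e-6):
    """Fast/slow envelope ratio in dB; large positive values mean a sudden rise."""
    fast = envelope(samples, fast_ms, sample_rate)
    slow = envelope(samples, slow_ms, sample_rate)
    # The small floor (an assumption) avoids dividing by zero during silence.
    return [20.0 * math.log10((f + floor) / (s + floor))
            for f, s in zip(fast, slow)]
```

Because the result is a ratio, scaling the input changes it only through the tiny floor term, so a threshold in dB keeps working at different input levels.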
The Ballistics Filter in the factory DSP library has two level-detector types: peak and RMS.
Effectively, “peak” mode just rectifies the signal before passing it through an LPF, whereas “RMS” mode squares the signal, passes it through the filter, and then takes the square root. A typical envelope follower would just multiply your input signal by the output signal of the (rectified/filtered) ballistic filter.
I’ve had the idea for a while that if one were to parallel-process the audio in both modes at the same time (sharing a single time constant), and then take the delta signal between those two processes, you’d have a kind of “crest factor” measurement of your input signal - literally, the difference between “peak” and “RMS”, with respect to your chosen window length.
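A rough sketch of that parallel idea, in case it helps. This is not the library’s implementation: it assumes one symmetric one-pole smoother shared by both paths, and all function names are made up for illustration. One thing the sketch makes visible: with a symmetric smoother the rectified path is really an average of |x|, which can never exceed the RMS path, so the delta is taken as RMS minus rectified.

```python
import math

def smooth(values, time_ms, sample_rate):
    """One-pole low-pass with a single time constant (no separate attack/release)."""
    a = math.exp(-1.0 / (0.001 * time_ms * sample_rate))
    state, out = 0.0, []
    for v in values:
        state = a * state + (1.0 - a) * v
        out.append(state)
    return out

def rectified_level(samples, time_ms, sample_rate):
    """'Peak' mode as described above: rectify, then filter."""
    return smooth([abs(x) for x in samples], time_ms, sample_rate)

def rms_level(samples, time_ms, sample_rate):
    """'RMS' mode as described above: square, filter, square root."""
    squared = smooth([x * x for x in samples], time_ms, sample_rate)
    return [math.sqrt(v) for v in squared]

def rms_minus_rectified(samples, time_ms, sample_rate):
    """Delta between the two detectors; it grows as the signal gets spikier."""
    rect = rectified_level(samples, time_ms, sample_rate)
    rms = rms_level(samples, time_ms, sample_rate)
    return [r - p for r, p in zip(rms, rect)]
```

For a steady signal the two paths converge and the delta settles near zero, while sparse spikes keep it clearly positive. A detector with a fast attack and slow release on the rectified path would behave more like a true peak reading.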
There are possibly other solutions involving FFTs and so forth, depending on your actual needs.
If you try the above, let me know how it goes.
The basic approaches already mentioned work. I’d recommend going with the two-envelope-follower approach, since it’s easily tweakable and you gain the benefit that the result is independent of your input gain (as long as you compare the two envelopes as a ratio, or as a difference in dB).
An additional rather simple thing to try is using two thresholds, an “on” and a lower “off” threshold, to prevent rapid double-detection of transients.
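The two-threshold trick could look roughly like this. It is a sketch under the assumption that you already have some detector signal per sample (for example a fast/slow envelope ratio); `detect_with_hysteresis` and its parameters are hypothetical names.

```python
def detect_with_hysteresis(detector, on_threshold, off_threshold):
    """Return indices where the detector crosses on_threshold while armed.

    After a trigger, the detector is re-armed only once the signal has
    fallen below the lower off_threshold.
    """
    if off_threshold >= on_threshold:
        raise ValueError("off_threshold must be lower than on_threshold")
    onsets, armed = [], True
    for i, value in enumerate(detector):
        if armed and value >= on_threshold:
            onsets.append(i)
            armed = False
        elif not armed and value < off_threshold:
            armed = True
    return onsets
```

With `detector = [0, 1, 7, 5, 7, 2, 0, 8, 1]`, an “on” threshold of 6 and an “off” threshold of 3, the dip to 5 does not re-arm the detector, so the second 7 is not reported as a new transient.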
To make detection of the actual onset of a transient more accurate, it might also help to “backtrack” the signal to where it started to rise once you detect a transient.
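One simple way to do the backtracking, sketched under the assumption that you keep a short history of a reasonably smooth envelope around. The function name and the `max_lookback` limit are illustrative, not from any library.

```python
def backtrack_onset(envelope, detect_index, max_lookback=512):
    """Walk backwards from the detection point while the envelope was rising.

    Returns the index where the rise began, or the lookback limit if the
    envelope was rising for longer than max_lookback samples.
    """
    i = detect_index
    stop = max(0, detect_index - max_lookback)
    while i > stop and envelope[i - 1] < envelope[i]:
        i -= 1
    return i
```

On a rippling envelope this stops at the first small dip it meets, so smoothing the envelope first (or tolerating small dips) is probably needed in practice.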
As with all things DSP, you can make it arbitrarily complicated, for example by improving the quality of your envelope followers. The basic combination of rectify and filter is rather jiggly. Good luck and let us know how it goes!
I had much better results in the frequency domain for both percussive and non-percussive sounds. There are a lot of papers and implementations for onset detection using high frequency content (HFC) detection.
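To give a feel for HFC, here is a small sketch and not anyone’s production algorithm. It uses a common textbook form, the bin-index-weighted sum of squared magnitudes per frame, and flags frames where that value jumps by a chosen ratio. The naive DFT, the frame and hop sizes, and the ratio of 4 are all assumptions made to keep the example self-contained.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes (bins 0..N/2) of a Hann-windowed frame; fine for small N."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]

def hfc(frame):
    """High frequency content: squared magnitudes weighted by bin index."""
    return sum(k * m * m for k, m in enumerate(magnitude_spectrum(frame)))

def hfc_curve(samples, frame_size=64, hop=32):
    """HFC value for each overlapping frame."""
    last_start = len(samples) - frame_size
    return [hfc(samples[s:s + frame_size]) for s in range(0, last_start + 1, hop)]

def onset_frames(curve, ratio=4.0, floor=1e-9):
    """Frame indices where HFC exceeds `ratio` times the previous frame's value."""
    return [i for i in range(1, len(curve))
            if curve[i] > ratio * (curve[i - 1] + floor)]
```

Multiplying a reported frame index by the hop gives a sample position, so the result is only as precise as the hop size unless it is refined afterwards in the time domain.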
How much temporal precision can you get with this approach? Does it lead to your onset decision naturally being quantized to your FFT hop size, or are you doing additional trickery to figure out the inter-frame time of the onset? (If so, I’d love to know how.)
As far as I know, mostly from papers that discuss pitch shifter / time stretcher phase reset on transients, the spectral approach comes down to identifying sudden magnitude changes over a broad range of FFT bins, in particular the higher ones. In a magnitude spectrogram, transients tend to look like sharp vertical bars.
If you can recommend any papers on this topic that you found particularly helpful, I’d appreciate it.
Regarding time-domain detectors: Pre-filtering the signal helps there too of course.
Of course, getting back to the time domain is slightly tricky. It’s been two years since I worked on the algorithm, so my memories are slightly faint. I did combine it with a pre-filtered envelope follower, I think. But given that a small FFT frame size is acceptable for these applications, the search space is relatively small too, so local minima almost get you there already.
… the spectral approach comes down to identifying sudden magnitude changes over a broad range of FFT bins, in particular the higher ones. In a magnitude spectrogram, transients tend to look like sharp vertical bars.
Yes, this is basically the approach we took.
Hm … I can’t seem to find the paper that I liked best. Here is another one that turned up in the search: “A Tutorial on Onset Detection in Music Signals” by Juan Pablo Bello, Laurent Daudet, Samer Abdallah, Chris Duxbury, Mike Davies, and Mark B. Sandler.