I apologize for asking a question that I’m sure has been asked a billion times already, but I’d really like to see an example of a JUCE project that does per-sample processing (I understand why block-based is preferred for VSTs, but that is not my use-case). All of the tutorials show block-based audio processing but imply that per-sample processing is also possible. I’d love to see that. Is it just a matter of setting the buffer size down to 1 sample, or is there something more? I also understand that per-sample processing puts constraints on how much processing can be done, so I would like to see what the preferred threading model is to make sure the audio is handled smoothly while the UI does its thing.
A plugin doesn’t have control of the block size that is passed from the DAW; however, you can write a processBlock method that iterates over that block and performs single-sample processing on each sample. For example, if the DAW sent you 100 samples, your processBlock method could loop over those 100 samples and make 100 calls to a ‘processSingleSample()’ method.
hope that makes sense.
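To make that concrete, here is a minimal sketch. The OnePole filter and the plain-pointer signature are stand-ins of my own invention so the example compiles on its own; in a real plugin, processBlock would receive a juce::AudioBuffer<float> from the host, but the loop structure is the same.

```cpp
#include <cstddef>

// Hypothetical per-sample processor: a one-pole low-pass as a stand-in
// for whatever single-sample DSP you actually want to run.
struct OnePole
{
    float state = 0.0f;
    float coeff = 0.1f;

    float processSingleSample (float in)
    {
        state += coeff * (in - state);
        return state;
    }
};

// The shape of a processBlock that forwards one call per sample.
// Plain pointers are used here (rather than juce::AudioBuffer<float>)
// so the sketch stands alone; one OnePole instance per channel keeps
// the per-channel filter state separate.
void processBlock (OnePole* channels, float** data,
                   int numChannels, int numSamples)
{
    for (int ch = 0; ch < numChannels; ++ch)
        for (int s = 0; s < numSamples; ++s)
            data[ch][s] = channels[ch].processSingleSample (data[ch][s]);
}
```

Whatever block size the host hands you, the inner call sees exactly one sample at a time.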
It probably wouldn’t be significantly different in that case either, you would still just write your audio callback as suggested by Jeff. Even on dedicated hardware, you are likely not going to be getting the callbacks sample by sample but rather as buffers of samples. And in case the callbacks actually are sample by sample, you would know that from the documentation of the hardware.
While I appreciate the non-answers to the question I asked, I just want an example of per-sample processing in JUCE. I apologize if my question was too hard to understand. Sheesh.
I’m not sure your question is hard to understand, as they have all been good answers based on the question. In what ways are these answers not specific to what you want to understand? With block processing you iterate over the samples in a buffer, and for each sample you do the work you need to do on the sample. If you have an API that is passing in a single sample (instead of a buffer) you simply do the work you did for each sample in the buffer for the block processing version. Since the answers that explain this are not helpful to you, I am wondering what the disconnect is?
What parts of JUCE do you want to use in your hardware project? Do you know what hardware you will be using, and which parts of JUCE work on that platform? I’m not using JUCE for hardware (though I do plenty of it on desktop), but I do use a hardware platform called the Daisy Seed, from Electrosmith. Its audio subsystem has a callback mechanism very similar to a plugin’s, in which you get an input buffer and an output buffer. You specify the buffer size when you initialize the audio subsystem, and if your DSP code can handle it, you could easily set it to 1. The code would be the same for larger sizes, as you would simply iterate over the buffer, which works for a size of 1 or more.
Again, if none of these answers are what you are looking for, maybe try to explain the specifics you feel you don’t understand. Perhaps you have some JUCE code you have written that you could explain where you think it would need to be different for the hardware project?
The insides of all block based processing is sample based processing.
It’s also sample based processing if you just pass a block size of one and pretend the for loops don’t exist.
Block-based processing tends to be faster because the CPU cache, branch predictor and all the other fancy stuff work better when a little loop of code is doing the same thing over and over to the same data without using an insane number of variables.
If you’re writing for low-performance MCUs without CPU caches, or you have CPU power to spare and don’t need the performance, or you’re doing something trivial enough, then sample-by-sample is just fine.
Processing with the sample loop on the outside and the channel loop on the inside is typically just an intermediate state of a project, because some algorithms are easier to write that way initially. Then you rewrite it with the channel loop on the outside, to be better aligned with the memory layout of the buffers for performance. But that doesn’t make it any less sample accurate than the other way; it can always accomplish the same thing in practice.
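The two loop orders can be sketched side by side. Assuming a simple per-sample gain (names and sizes here are illustrative, not from any particular codebase), they produce identical output; only the memory access pattern differs:

```cpp
constexpr int numChannels = 2;
constexpr int numSamples  = 4;

// Sample loop outside, channel loop inside -- often the first version
// written, since some algorithms are easier to think about per frame.
void applyGainSampleMajor (float (&buf)[numChannels][numSamples], float gain)
{
    for (int s = 0; s < numSamples; ++s)
        for (int ch = 0; ch < numChannels; ++ch)
            buf[ch][s] *= gain;
}

// Channel loop outside, sample loop inside -- walks each channel's
// contiguous memory in order, which is usually the faster rewrite.
void applyGainChannelMajor (float (&buf)[numChannels][numSamples], float gain)
{
    for (int ch = 0; ch < numChannels; ++ch)
        for (int s = 0; s < numSamples; ++s)
            buf[ch][s] *= gain;
}
```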
Having done this recently for a JUCE-based modular synth designed for the iPad, I did run into some expected performance issues. There is no sequential buffer to be SIMD-optimised. My solution was to use a SIMD library and pack the channels into SIMD registers, which then required writing the DSP using that library. The packing and unpacking is the expensive bit, but if the DSP is non-trivial there are gains to be had, with the trade-off of not using any JUCE DSP code beyond processBlock(), plus all the head-scratching involved in learning the SIMD library.
This will depend on the hardware being used, as SIMD may not be an option, but if it is not, then the compiler would not have been able to use this for optimisation anyway.
But in summary, sample-based processing can be slow, and even with the above optimisation it is still not as quick as block processing, with a small caveat: Apple silicon seems to perform much better on small block sizes.
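The channel-packing idea can be sketched without committing to any particular SIMD library, using a plain 4-float struct as a stand-in for a real register type. (A real implementation would use something like juce::dsp::SIMDRegister or a dedicated SIMD library, and would get an actual vector multiply instead of the scalar loop shown; all names here are my own.)

```cpp
// A 4-lane "register" standing in for a real SIMD type.
struct Lane4
{
    float v[4];
};

// Pack one sample frame from four mono channels into a Lane4.
// This pack/unpack step is the expensive part mentioned above.
Lane4 pack (const float* const* chans, int s)
{
    return { chans[0][s], chans[1][s], chans[2][s], chans[3][s] };
}

// Process all four channels "at once" -- with a real SIMD type this
// would compile to a single vector multiply.
Lane4 applyGain (Lane4 x, float gain)
{
    for (auto& f : x.v)
        f *= gain;
    return x;
}

// Unpack the processed frame back into the channel buffers.
void unpack (float* const* chans, int s, const Lane4& x)
{
    for (int ch = 0; ch < 4; ++ch)
        chans[ch][s] = x.v[ch];
}
```

The per-sample loop then becomes pack, compute, unpack per frame, with the DSP written once for all lanes.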
This isn’t something you can say is true in general. Block processing has its own tradeoffs, such as requiring additional intermediate buffers and therefore potentially causing more cache misses.
The only way to know what is fastest is to implement the different versions and profile them under realistic circumstances.
I’m no optimisation expert, but I recall finding gains using the load-compute-store approach. It seems counter-intuitive, but somehow compilers prefer the intermediate variable:
for (auto s = 0; s < bufSz; ++s)
{
    /// load
    auto sample = buffer[s];
    /// compute
    sample = processSample(sample);
    /// store
    buffer[s] = sample;
}
Can be simplified further of course, but this shows the principle.