I apologize for asking a question that I’m sure has been asked a billion times already, but I’d really like to see an example of a JUCE project that does per-sample processing (I understand why block-based is preferred for VSTs, but that is not my use-case). All of the tutorials show block-based audio processing but imply that per-sample processing is also possible. I’d love to see that. Is it just a matter of setting the buffer size down to 1 sample, or is there something more? I also understand that per-sample processing puts constraints on how much processing can be done, so I would like to see what the preferred threading model is to make sure the audio is handled smoothly while the UI does its thing.
A plugin doesn’t have control of the block size that is passed from the DAW; however, you can write a processBlock method that iterates over that block and performs single-sample processing on each sample. For example, if the DAW sent you 100 samples, your processBlock method could loop over those 100 samples and make 100 calls to a ‘processSingleSample()’ method.
hope that makes sense.
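To make that concrete, here is a minimal sketch. The OnePole filter and the plain-pointer signature are stand-ins of my own invention so the example compiles on its own; in a real plugin, processBlock would receive a juce::AudioBuffer<float> from the host, but the loop structure is the same.

```cpp
#include <cstddef>

// Hypothetical per-sample processor: a one-pole low-pass as a stand-in
// for whatever single-sample DSP you actually want to run.
struct OnePole
{
    float state = 0.0f;
    float coeff = 0.1f;

    float processSingleSample (float in)
    {
        state += coeff * (in - state);
        return state;
    }
};

// The shape of a processBlock that forwards one call per sample.
// Plain pointers are used here (rather than juce::AudioBuffer<float>)
// so the sketch stands alone; one OnePole instance per channel keeps
// the per-channel filter state separate.
void processBlock (OnePole* channels, float** data,
                   int numChannels, int numSamples)
{
    for (int ch = 0; ch < numChannels; ++ch)
        for (int s = 0; s < numSamples; ++s)
            data[ch][s] = channels[ch].processSingleSample (data[ch][s]);
}
```

Whatever block size the host hands you, the inner call sees exactly one sample at a time.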
It probably wouldn’t be significantly different in that case either, you would still just write your audio callback as suggested by Jeff. Even on dedicated hardware, you are likely not going to be getting the callbacks sample by sample but rather as buffers of samples. And in case the callbacks actually are sample by sample, you would know that from the documentation of the hardware.
While I appreciate the non-answers to the question I asked, I just want an example of per-sample processing in JUCE. I apologize if my question was too hard to understand. Sheesh.
I’m not sure your question is hard to understand, as they have all been good answers based on the question. In what ways are these answers not specific to what you want to understand? With block processing you iterate over the samples in a buffer, and for each sample you do the work you need to do on the sample. If you have an API that is passing in a single sample (instead of a buffer) you simply do the work you did for each sample in the buffer for the block processing version. Since the answers that explain this are not helpful to you, I am wondering what the disconnect is?
What parts of JUCE do you want to use in your hardware project? Do you know what hardware you will be using, and which parts of JUCE work on that platform? I’m not using JUCE for hardware (though I do plenty of it on desktop), but I do use a hardware platform called the Daisy Seed, from Electrosmith. Its audio subsystem has a callback mechanism very similar to a plugin’s, in which you get an input buffer and an output buffer. You specify the buffer size when you initialize the audio subsystem, and if your DSP code can handle it, you could easily set it to 1. The code would be the same for larger sizes, as you would simply iterate over the buffer, which works for a size of 1 or more.
Again, if none of these answers are what you are looking for, maybe try to explain the specifics you feel you don’t understand. Perhaps you have some JUCE code you have written that you could explain where you think it would need to be different for the hardware project?
The insides of all block based processing is sample based processing.
It’s also sample based processing if you just pass a block size of one and pretend the for loops don’t exist.
Block-based processing tends to be faster because the CPU cache, branch predictor and all the other fancy stuff work better when a little loop of code is doing the same thing over and over to the same data without using an insane number of variables.
If you’re writing for low-performance MCUs without CPU caches, or you have CPU power to spare and don’t need the performance, or you’re doing something trivial enough, then sample-by-sample is just fine.
Processing with the sample loop on the outside and the channel loop on the inside is typically just an intermediate state of a project, because some algorithms are easier to write that way initially. Then you rewrite it with the channel loop on the outside, to be better aligned with the memory layout of the buffers for performance. But that doesn’t make it any less sample accurate than the other way; it can always accomplish the same thing in practice.
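The two loop orders can be sketched side by side. Assuming a simple per-sample gain (names and sizes here are illustrative, not from any particular codebase), they produce identical output; only the memory access pattern differs:

```cpp
constexpr int numChannels = 2;
constexpr int numSamples  = 4;

// Sample loop outside, channel loop inside -- often the first version
// written, since some algorithms are easier to think about per frame.
void applyGainSampleMajor (float (&buf)[numChannels][numSamples], float gain)
{
    for (int s = 0; s < numSamples; ++s)
        for (int ch = 0; ch < numChannels; ++ch)
            buf[ch][s] *= gain;
}

// Channel loop outside, sample loop inside -- walks each channel's
// contiguous memory in order, which is usually the faster rewrite.
void applyGainChannelMajor (float (&buf)[numChannels][numSamples], float gain)
{
    for (int ch = 0; ch < numChannels; ++ch)
        for (int s = 0; s < numSamples; ++s)
            buf[ch][s] *= gain;
}
```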
Having done this recently for a JUCE-based modular synth designed for the iPad, I did run into some expected performance issues. There is no sequential buffer to be SIMD-optimised. My solution was to use a SIMD library and pack the channels into SIMD registers, which then required writing the DSP using that library. The packing and unpacking is the expensive bit, but if the DSP is non-trivial there are gains to be had, with the trade-off of not using any JUCE DSP code beyond processBlock(), plus all the head-scratching involved in learning the SIMD library.
This will depend on the hardware being used, as SIMD may not be an option, but if it is not, then the compiler would not have been able to use this for optimisation anyway.
But in summary, sample-based processing can be slow, and even with the above optimisation it is still not as quick as block processing, with a small caveat: Apple silicon seems to perform much better on small block sizes.
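The channel-packing idea can be sketched without committing to any particular SIMD library, using a plain 4-float struct as a stand-in for a real register type. (A real implementation would use something like juce::dsp::SIMDRegister or a dedicated SIMD library, and would get an actual vector multiply instead of the scalar loop shown; all names here are my own.)

```cpp
// A 4-lane "register" standing in for a real SIMD type.
struct Lane4
{
    float v[4];
};

// Pack one sample frame from four mono channels into a Lane4.
// This pack/unpack step is the expensive part mentioned above.
Lane4 pack (const float* const* chans, int s)
{
    return { chans[0][s], chans[1][s], chans[2][s], chans[3][s] };
}

// Process all four channels "at once" -- with a real SIMD type this
// would compile to a single vector multiply.
Lane4 applyGain (Lane4 x, float gain)
{
    for (auto& f : x.v)
        f *= gain;
    return x;
}

// Unpack the processed frame back into the channel buffers.
void unpack (float* const* chans, int s, const Lane4& x)
{
    for (int ch = 0; ch < 4; ++ch)
        chans[ch][s] = x.v[ch];
}
```

The per-sample loop then becomes pack, compute, unpack per frame, with the DSP written once for all lanes.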
This isn’t something you can say is true in general. Block processing has its own tradeoffs, such as requiring additional intermediate buffers and therefore potentially causing more cache misses.
The only way to know what is fastest is to implement the different versions and profile them under realistic circumstances.
I’m no optimisation expert, but I recall finding gains using the load-compute-store approach. It seems counter-intuitive, but somehow compilers prefer the intermediate variable:
for (auto s = 0; s < bufSz; ++s)
{
    /// load
    auto sample = buffer[s];
    /// compute
    sample = processSample(sample);
    /// store
    buffer[s] = sample;
}
Can be simplified further of course, but this shows the principle.