Oscillator per-sample process method uses (slow) std::function -- why?

Hi, paying JUCE customer here. I was excited to learn of the new DSP modules – especially the idea of stringing together small modules of code (gain, etc.) without any performance overhead, thanks to C++ templates and inlining.

Looking into Oscillator, however, I was dismayed to see the heavy lifting being done by a std::function. I was hoping someone could explain why the class wasn’t simply templated on the generator functor itself, which would remove the need for an indirect call in a tight inner loop. Has anyone benchmarked the cost of the extra indirection?

What’s particularly sad is that the class helpfully offers to compute a lookup table from the user-provided function, but that table is then stuffed inside the same costly std::function. To my mind it would make much more sense to either a) templatize on the generating functor, or b) accept a std::function, but only for the purpose of calculating a lookup table which itself is not hidden behind a std::function (just call it directly!).
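
To illustrate option a), here is a minimal sketch (hypothetical class and names, not a proposal for the exact API) of an oscillator templated on its generator, so the per-sample call can be inlined:

#include <cmath>
#include <utility>

// Sketch: the generator's concrete type is a template parameter, so the call
// below is a direct, inlinable call rather than a type-erased std::function call.
template <typename SampleType, typename GeneratorFn>
class TemplatedOscillator
{
public:
    explicit TemplatedOscillator (GeneratorFn gen) : generator (std::move (gen)) {}

    SampleType processSample() noexcept
    {
        phase += increment;               // simplified phase handling, for illustration only
        return generator (phase);
    }

private:
    GeneratorFn generator;
    SampleType phase = 0, increment = SampleType (0.01);
};

// Helper so the lambda's type can be deduced:
template <typename SampleType, typename GeneratorFn>
TemplatedOscillator<SampleType, GeneratorFn> makeOscillator (GeneratorFn gen)
{
    return TemplatedOscillator<SampleType, GeneratorFn> (std::move (gen));
}

// Usage: auto osc = makeOscillator<float> ([] (float x) { return std::sin (x); });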

I hope my tone is not over-the-top, but it feels like performance is being left on the table for no reason here :slight_smile:


I had a similar thought: the whole template approach is there to remove any virtual calls, and then there’s a std::function call for every sample -- somehow inconsistent.
BTW: maybe it would be more efficient if the function contained the per-sample loop; then the function could use Intel IPP routines to compute the sines for a whole block in one go.
(Maybe faster than lookup tables.)

Hmm, yes, this does look like something that needs a bit of optimising, thanks for pointing it out.

Having the function type as a class template parameter would make it a bit syntactically clunky, but perhaps it should only use a lookup table and not offer the option to call the function in the callback. @IvanC what do you think of that as an idea?
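
i.e. the user-supplied std::function would only ever be called while filling the table, roughly along these lines (pure sketch with made-up names and no interpolation, not the real class):

#include <cstddef>
#include <functional>
#include <vector>

template <typename SampleType>
class TableOnlyOscillator
{
public:
    // The user-supplied function is evaluated once per table entry, up front...
    void initialise (const std::function<SampleType (SampleType)>& fn, std::size_t tableSize = 1024)
    {
        table.resize (tableSize);

        for (std::size_t i = 0; i < tableSize; ++i)
            table[i] = fn (SampleType (-pi + twoPi * double (i) / double (tableSize)));
    }

    // ...so the per-sample path is just an indexed read; no std::function is involved.
    // Assumes phase is in [-pi, pi); a real version would also interpolate.
    SampleType processSample (SampleType phase) const noexcept
    {
        auto normalised = (double (phase) + pi) / twoPi;
        return table[std::size_t (normalised * double (table.size())) % table.size()];
    }

private:
    static constexpr double pi = 3.141592653589793;
    static constexpr double twoPi = 2.0 * pi;
    std::vector<SampleType> table;
};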

Mhh, a lookup table often isn’t precise enough, and it might also be slower than the function itself (memory access).

A lambda which takes a read/write sample pointer and a numSamples parameter would IMHO be the best option; then the function itself can use SIMD or Intel IPP acceleration to generate sine or other waveforms.
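
A rough sketch of what I mean by that signature (hypothetical, not tied to the actual Oscillator class):

#include <cmath>
#include <functional>

// The oscillator would fill a buffer with phase values and hand the whole block to the
// generator, which is then free to vectorise internally (SIMD, Intel IPP, ...).
using BlockGenerator = std::function<void (float* data, int numSamples)>;

// Example generator: replace each phase value with its sine, in place.
// The scalar loop could be swapped for an IPP or hand-written SIMD routine.
BlockGenerator sineBlock = [] (float* data, int numSamples)
{
    for (int i = 0; i < numSamples; ++i)
        data[i] = std::sin (data[i]);
};

That way the std::function indirection is paid once per block instead of once per sample.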

I was trying to add the “use lookup” switch as a template argument, but got stuck with the syntax. Shouldn’t something like this be possible:

template <typename SampleType, bool useLookup>
class Oscillator
{
//...

    /** Returns the result of processing a single sample using a lambda. */
    SampleType JUCE_VECTOR_CALLTYPE processSample<SampleType, false> (SampleType) noexcept
    {
        jassert (isInitialised());
        auto increment = MathConstants<NumericType>::twoPi * frequency.getNextValue() / sampleRate;
        return generator (phase.advance (increment) - MathConstants<NumericType>::pi);
    }

    /** Returns the result of processing a single sample using a lookup table. */
    SampleType JUCE_VECTOR_CALLTYPE processSample<SampleType, true> (SampleType) noexcept
    {
        jassert (isInitialised());
        auto increment = MathConstants<NumericType>::twoPi * frequency.getNextValue() / sampleRate;
        return (*table) (phase.advance (increment) - MathConstants<NumericType>::pi);
    }
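
For reference, the same compile-time switch can be written without member-function specialisations, e.g. with tag dispatch or (in C++17) if constexpr. A sketch using simplified stand-in members rather than the real JUCE internals:

#include <cmath>
#include <functional>

template <typename SampleType, bool useLookup>
class SketchOscillator
{
public:
    SampleType processSample() noexcept
    {
        phase += increment;              // simplified phase handling

        if constexpr (useLookup)
            return lookupTable (phase);  // stand-in for a table lookup
        else
            return generator (phase);    // direct call to the user function
    }

private:
    std::function<SampleType (SampleType)> generator   = [] (SampleType x) { return SampleType (std::sin (x)); };
    std::function<SampleType (SampleType)> lookupTable = [] (SampleType x) { return SampleType (std::sin (x)); }; // placeholder
    SampleType phase = 0, increment = SampleType (0.01);
};

// SketchOscillator<float, true> takes the table branch, SketchOscillator<float, false>
// the direct-call branch; the branch not taken is discarded at compile time.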

BTW, out of curiosity I set up a benchmark on quick-bench (a site I just learned about):
http://quick-bench.com/QLek0OPw8FFzUk9QwbwnZGWkFt8

cool link!

just out of curiosity, why - MathConstants<NumericType>::pi ?

@daniel When I opened your link, the graphs showed the functor version as being much slower, but then I hit “clear cached results” and ran it again, and all versions came out the same!

…and actually, running it multiple times gives all kinds of random results - I think there’s a lot of jitter in the benchmarks. But on average, they all look pretty much the same.

Very confusing, so better to forget about the link :wink:
I was testing with and without optimisation, and it also made a difference whether I did the increment inside the argument…
So the first test I did was probably the most accurate, where the direct call vs. the functor came out at a ratio of about 60 to 65, which was what I expected: a small but measurable overhead…
When I then played around and added all kinds of tests, I probably spoiled the test setup.

What about the idea of two specialisations, one for the generator function and one for the lookup table?

TBH there are probably more elegant ways to make the lookup table optional than having a flag like that, e.g. we could remove all knowledge of lookup tables from the Oscillator class and let it take either a std::function or some other kind of generator object that abstracts away the lookup table. Looking at the class again now, there are better ways it could have been done…
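
For example, the “generator object” could be as small as this (sketch only, made-up type; a real version could wrap something like dsp::LookupTableTransform):

#include <cmath>

// A tiny generator object that hides the lookup table behind operator().
// It can be passed anywhere a plain lambda can, so the Oscillator itself never
// needs to know whether a table is involved.
template <typename SampleType>
struct TableGenerator
{
    // Stand-in body; a real version would read from a precomputed table
    // instead of calling std::sin.
    SampleType operator() (SampleType x) const noexcept { return std::sin (x); }
};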


Thanks for the feedback.
My idea came from wanting to move the decision of whether to use a table from run time to compile time (using a delegate is a bit like using a lambda, except that it goes through a fixed interface).
But you are right, there are a lot of options to choose from.