Oscillator per-sample process method uses (slow) std::function -- why?

dsp_module

#1

Hi, paying JUCE customer here. I was excited to learn of the new DSP modules – especially because of idea of using templates to string together small modules of code (gain, etc.) without any performance overhead, thanks to C++ templates and inlining.

Looking into Oscillator, however, I was dismayed to see the heavy lifting being done by a std::function. I was hoping if someone could provide a rationale for why the class wasn’t simply templated on the generator functor itself, thus removing the need for a virtual call in a tight inner loop. Has anyone done any benchmarks as to the cost of extra indirection?

What’s particularly sad about this is the class helpfully offers to compute a lookup table based on the user-provided function, but this is stuffed inside the same costly std::function. To my mind, it would make much more sense to either a) templatize on the generating functor, or b) accept a std::function but only for the purposes of calculating a lookup table which itself is not hidden behind an std::function (just call it directly!).

I hope my tone is not over-the-top here, but it feels like perf is left on the table for no reason here :slight_smile:


#2

I had a similar thougt, the whole template approach to remove any virtual calls, and then again a std::function for every sample - — somehow inconsequential.
BTW : maybe it would be more sufficient, if the function includes the per sample loop, than again the function can include intel ipp functions for the sin conversions in a row.
(Maybe faster than look-up tables)


#3

Hmm, yes, this does look like something that needs a bit of optimising, thanks for pointing it out.

Having the function type as a class template parameter would make it a bit syntactically clunky, but perhaps it should only use a lookup table and not offer the option to call the function in the callback. @IvanC what do you think of that as an idea?


#4

mhh, a lookup up table isn’t often precise enough, also might be slower than the function itself (memory access)


#5

a lambda which takes a read/write sample-pointer and a numSamples parameter would be imho the best option, than the function it self can use SIMD or Intel IPP accelerations to generate sinus or other kind of tables.


#6

I was trying to add the “use lookup”-switch as template argument, but got stuck with the syntax. Shouldn’t something like that be possible:

template <typename SampleType, bool useLookup>
class Oscillator
{
//...

    /** Returns the result of processing a single sample using a lambda. */
    SampleType JUCE_VECTOR_CALLTYPE processSample<SampleType, false> (SampleType) noexcept
    {
        jassert (isInitialised());
        auto increment = MathConstants<NumericType>::twoPi * frequency.getNextValue() / sampleRate;
        return generator (phase.advance (increment) - MathConstants<NumericType>::pi);
    }

    /** Returns the result of processing a single sample using a lookup table. */
    SampleType JUCE_VECTOR_CALLTYPE processSample<SampleType, true> (SampleType) noexcept
    {
        jassert (isInitialised());
        auto increment = MathConstants<NumericType>::twoPi * frequency.getNextValue() / sampleRate;
        return (*table) (phase.advance (increment) - MathConstants<NumericType>::pi);
    }

btw. out of curiosity, I used the benchmark link, I just learned:
http://quick-bench.com/QLek0OPw8FFzUk9QwbwnZGWkFt8


#7

cool link!

just out of curiosity, why - MathConstants<NumericType>::pi ?


#8

@daniel When I opened your link, the graphs showed the functor version as being much slower, but then I hit “clear cached results” and ran it again, and all versions came out the same!


#9

…and actually, running it multiple times gives all kinds of random results - I think there’s a lot of jitter in the benchmarks. But on average, they all look pretty much the same.


#10

Very confusing, so better forget about the link :wink:
I was trying with and without optimisation, and it also made a difference, if I did the increment inside the argument…
So the first test I did was probably the most accurate, when the direct call vs. functor ended up 60 to 65 ratio, which was what I expected, a little but measurable overhead…
So when playing around and adding all kind of tests I probably spoiled the test setup.

What about the idea of two specialisations, one for the generator function and one for the lookup table?


#11

TBH there are probably more elegant ways to make the lookup table optional than having a flag like that. e.g. we could have remove all knowledge of lookup tables from the Oscillator class, and let it either take a std::function or some other kind of generator object that could abstract away the lookup table. Looking at the class again now, there are better ways it could have been done…


#12

Thanks for the feedback.
My idea came from the thought to move the decision of using a table or not to the compile time rather than runtime (using a delegate is a bit like using a lambda, just that it uses a fixed interface).
But you are right, there are a lot options to choose from.