I think that, in the DSP area at least, Juce should be trying to introduce more generic code and less code that relies on overriding virtual methods.
To make sure we’re all on the same page: generic programming is C++'s version of duck typing and is built on function and class templates; object-oriented programming uses classes, inheritance and virtual methods - a bit more is here.
Generic programming is harder to write but gives significantly better results for CPU-critical code like DSP.
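To make the distinction concrete, here is a minimal sketch of the same gain stage written both ways. All of the names (Processor, VirtualGain, PlainGain, runVirtual, runGeneric) are made up for illustration and are not Juce classes.

```cpp
#include <cstddef>

// Object-oriented version: the gain stage is reached through a virtual call.
struct Processor {
    virtual ~Processor() = default;
    virtual float process(float sample) const = 0;
};

struct VirtualGain : Processor {
    float gain = 0.5f;
    float process(float sample) const override { return sample * gain; }
};

void runVirtual(const Processor& p, float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] = p.process(data[i]);  // indirect call, unless the compiler can devirtualize it
}

// Generic version: any type with a suitable process() member will do.
struct PlainGain {
    float gain = 0.5f;
    float process(float sample) const { return sample * gain; }
};

template <typename Proc>
void runGeneric(const Proc& p, float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] = p.process(data[i]);  // concrete type is known here, so the call can be inlined
}
```

Both loops compute the same thing; the difference is only in how much the compiler knows about process() when it generates the loop.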
Love Your Optimizer
Reason one is the optimizer. The stats in the last place I worked were that the optimizer sped up your “average” C++ program by a factor of 6 - that’s such a huge factor that making sure that the optimizer has enough information to do its work is critical for high-performance code.
Unfortunately, the C++ optimizer is mostly powerless before the discontinuity of a virtual method call. Unless it can prove the object's dynamic type, it has no idea what code is actually being executed at run time, so not only can’t it inline the code, but it has to assume that its caller-saved registers are trashed and that any memory the callee can reach may have changed. It is depressing when you step through sequences of virtual method calls at an instruction level and see the compiler reloading the same values into its registers at each call level.
Compare and contrast with what the optimizer can do with generic code. The optimizer knows at code-generation time exactly what C++ code is in each function or method and even has ideas as to the possible range of values that can occur. So not only can it arrange to never reload registers, to unroll loops (if the count is a template parameter) and to simply inline a large number of small functions and never have to disturb the stack or registers at all - but it can actually make code completely vanish, lots of code you wouldn’t imagine it could possibly kill.
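As a small sketch of the "count is a template parameter" case: the names dot and fir4 below are hypothetical, and whether a given compiler actually unrolls or vectorizes the loop depends on the compiler and its settings.

```cpp
#include <cstddef>

// N is a compile-time constant, so the trip count is visible to the optimizer,
// which is then free to unroll the loop completely.
template <std::size_t N>
float dot(const float* a, const float* b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < N; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Hypothetical 4-tap FIR step built on the fixed-size dot product.
inline float fir4(const float* history, const float* coeffs) {
    return dot<4>(history, coeffs);
}
```

A run-time count would work just as well functionally; the template parameter simply hands the optimizer one more fact to work with.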
Here’s a cool case I ran into fairly recently (though the explanation is modified from an example in this amazing series, which every C++ programmer should read and memorize).
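[code]struct Child { void setName(const char* name); };  // assumed, defined elsewhere
struct Foo { Child* child; };                      // assumed layout
Foo* makeFoo();           // assumed helpers, defined elsewhere
void fillMore(Foo* foo);
void printFoo(Foo* foo);

void fillChild(Foo* foo) {
    if (!foo->child) {
        // Tons of difficult code here!
    }
}

void fillFoo(Foo* foo) {
    fillChild(foo);
    fillMore(foo);
}

int main() {
    Foo* foo = makeFoo();
    foo->child->setName("kid");  // dereferences foo->child
    fillFoo(foo);
    printFoo(foo);
}[/code]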
Now, if you looked at the optimized result, you might be surprised to discover that fillChild is completely missing (under at least some compilers and optimization settings). It seems like a mistake, and you see bugs filed against the optimizer like this, but the fact is that it’s perfectly correct, and part of a large class of optimizations that can dramatically accelerate your generic code.
In this case, it hinges on the idea of undefined behavior - yes, C++ is deliberately under-defined to allow the optimizer to have a fighting chance.
Some bad actions on the part of a program have undefined results: the compiler can do “whatever it wants” if this happens - in other words, the optimizer simply acts as if those cases can never happen (because you’ve done something bad like dereference an invalid pointer or overflow a signed integer and there’s no real way to recover).
Look at the second line of main(). If foo->child is NULL, then you are attempting to dereference a NULL pointer - which has “undefined results”, probably a SEGV. That means the optimizer can quite rationally assume that if you get past that line then foo->child cannot be NULL. Then, while attempting to inline fillFoo and fillChild, it discovers that the if statement in fillChild never fires because foo->child is known to be non-NULL, and then optimizes out the whole thing.
tl;dr: the optimizer is very important for performance; use generic code to get best results.
(And if you’re compiling for MMX or any other SIMD or vector unit, this all goes treble for you. You want to be fitting as much of your work as you can into the vector pipeline, and if the code generator can see all the code, it has a far better chance of keeping that pipeline full!)
Mix-ins, extensibility and plain old data.
Classes have the issue that they tend to grow extremely large, which makes them more fragile and harder to use.
In DSP, you have a common issue that your new code needs some methods from one class and some methods from another. This is common when there are multiple similar formats coming from different versions of a standard.
If I remember the mpg123 code correctly, for example, it uses “double-dispatching” - tables with function pointers. This is an elegant solution to the problem (and the syntax is very nice because it looks like a function call).
Generic code can get the same results without the function pointer. Because generic functions only have to refer to the services and data members they actually use, they can operate on any class you like as long as you are careful to satisfy their needs. Specifically, code that’s never actually required to get your results isn’t ever compiled, i.e. “parts you don’t use don’t need valid implementations”.
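Here is a minimal sketch of “parts you don’t use don’t need valid implementations”. The types ReadOnlySource, ReadWriteSource and Channel are invented for the example. Member functions of a class template are only instantiated when something calls them, so Channel<ReadOnlySource> compiles fine as long as nobody calls clear() on it.

```cpp
#include <cstddef>

// Hypothetical sample source that only knows how to read.
struct ReadOnlySource {
    float data[4] = {0.25f, 0.5f, 0.75f, 1.0f};
    float read(std::size_t i) const { return data[i]; }
    // Deliberately no write() member.
};

// Hypothetical sample source that can read and write.
struct ReadWriteSource {
    float data[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float read(std::size_t i) const { return data[i]; }
    void write(std::size_t i, float v) { data[i] = v; }
};

template <typename Source>
class Channel {
public:
    explicit Channel(Source& s) : source(s) {}

    // Needs only read(), so it works with either source.
    float peak(std::size_t n) const {
        float p = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            float v = source.read(i);
            if (v < 0.0f) v = -v;
            if (v > p) p = v;
        }
        return p;
    }

    // Needs write(); only compiled for a given Source if clear() is actually called.
    void clear(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            source.write(i, 0.0f);
    }

private:
    Source& source;
};
```

Calling clear() on a Channel<ReadOnlySource> would be a compile-time error at the call site, which is exactly the point: the mismatch is caught when you use it, not when you define it.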
Let’s look at AudioSampleBuffer as a good example.
I have long argued that an interleaved memory format would be more efficient for it - I also need a 16-bit sample version of this for my CD buffering work, which I did by creating my own class with just the methods I need.
I think a 16-bit per sample AudioSampleBuffer would be a Generally Useful Thing - or a 24-bit AudioSampleBuffer would be very handy for recording purposes… or even an mp3-compressed AudioSampleBuffer…?
There is also the matter of the float* vs short* cast that is at the heart of AudioSampleBuffer’s ability to take either integer or floating point samples. Even though everyone believes that this “should work every time,” reading memory through a pointer to an unrelated type like this is undefined behaviour in the C++ standard (the usual advice is to use a union, although strictly speaking C++ only blesses copying the bytes, e.g. with memcpy).
Why? Wait for it… the optimizer! Basically, the optimizer needs to know that writing through a pointer-to-short over here isn’t going to change the read through a pointer-to-float over there, so it doesn’t have to keep checking, say, array values it has in a register.
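For what it’s worth, here is a small sketch of reinterpreting bytes without a pointer cast. The helper names floatBits and floatFromBits are made up, and the code assumes float is 32 bits wide (checked by the static_assert). Compilers typically turn a fixed-size memcpy like this into a plain register move, so it should not cost anything at run time.

```cpp
#include <cstdint>
#include <cstring>

// Returns the bit pattern of a float without casting float* to an integer pointer.
inline std::uint32_t floatBits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // byte copy: no aliasing rules are broken
    return bits;
}

// The inverse: rebuilds a float from a 32-bit pattern.
inline float floatFromBits(std::uint32_t bits) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "assumes a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```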
The fact is that AudioSampleBuffer is already a generic class struggling to get out - a class with two implementations, one for float samples, one for int samples, and that’s why the cast exists.
Putting a virtual method call into the memory model of AudioSampleBuffer would slow it down substantially - but if AudioSampleBuffer2 were generic on its memory model, which offered just a few simple services, then I or some other person could create our own memory model and implement just the services we need to get our results, or even just enough to run performance tests.
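To sketch what “generic on its memory model” might look like: everything below is hypothetical (GenericSampleBuffer, PlanarFloatStorage, InterleavedInt16Storage and the get/set interface are my own invention, not an existing or proposed Juce API). The buffer only asks its storage for two services, and a new storage type only has to provide those.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory model: one contiguous run of floats per channel.
class PlanarFloatStorage {
public:
    PlanarFloatStorage(std::size_t channels, std::size_t frames)
        : numFrames(frames), samples(channels * frames, 0.0f) {}
    float get(std::size_t ch, std::size_t frame) const { return samples[ch * numFrames + frame]; }
    void set(std::size_t ch, std::size_t frame, float v) { samples[ch * numFrames + frame] = v; }
private:
    std::size_t numFrames;
    std::vector<float> samples;
};

// Hypothetical memory model: interleaved 16-bit samples, converted on access.
class InterleavedInt16Storage {
public:
    InterleavedInt16Storage(std::size_t channels, std::size_t frames)
        : numChannels(channels), samples(channels * frames, 0) {}
    float get(std::size_t ch, std::size_t frame) const {
        return samples[frame * numChannels + ch] / 32768.0f;
    }
    void set(std::size_t ch, std::size_t frame, float v) {
        if (v > 1.0f) v = 1.0f;      // clamp to the nominal range first
        if (v < -1.0f) v = -1.0f;
        long scaled = static_cast<long>(v * 32768.0f);
        if (scaled > 32767) scaled = 32767;  // +1.0 does not fit in 16 bits
        samples[frame * numChannels + ch] = static_cast<std::int16_t>(scaled);
    }
private:
    std::size_t numChannels;
    std::vector<std::int16_t> samples;
};

// The buffer is generic on Storage and needs only get() and set() from it.
template <typename Storage>
class GenericSampleBuffer {
public:
    GenericSampleBuffer(std::size_t channels, std::size_t frames)
        : numChannels(channels), numFrames(frames), storage(channels, frames) {}

    float getSample(std::size_t ch, std::size_t frame) const { return storage.get(ch, frame); }
    void setSample(std::size_t ch, std::size_t frame, float v) { storage.set(ch, frame, v); }

    void applyGain(float gain) {
        for (std::size_t ch = 0; ch < numChannels; ++ch)
            for (std::size_t f = 0; f < numFrames; ++f)
                storage.set(ch, f, storage.get(ch, f) * gain);
    }

private:
    std::size_t numChannels;
    std::size_t numFrames;
    Storage storage;
};
```

Since the storage type is a template parameter, the get() and set() calls are ordinary non-virtual calls that the compiler can inline, and a 24-bit or compressed storage could be dropped in without touching the buffer class.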