Variadic templates VS. object file size

Hi everybody,

I am currently writing a system that uses variadic templates to build up DSP algorithms. It started off from the idea of the juce::dsp::ProcessorChain class template and was extended by different container types which can be nested:

using my_dsp = chain<gain, reverb, split<compressor, chain<delay, gain, feedback>>>;

// 
using fx = split<gain, my_dsp>;

fx myProcessor;

// 1, 2, 0 points to the compressor.
myProcessor.registerNode<1, 2, 0>(comp, "comp");

myProcessor.get("comp").setParameter("Threshhold", -12.0);
myProcessor.get("comp").setParameter("Ratio", 2.0);

It works fine. However, there is one problem that might become more significant over time: the amount of code this variadic template system generates. Even with the proof-of-concept templates I created for debugging (fewer than 40 nodes), I am hitting the (weird) /bigobj limit in Visual Studio, and I am afraid that if I start building a library of “meta” nodes that can then be reused, I will end up with ridiculous object file sizes (and compilation times).

How do people work around this issue when using the juce::dsp::ProcessorChain class (or similar concepts of variadic templates)?

I’ve made a bit of progress that might be interesting:

First of all, using the Pimpl idiom for “meta” classes helps. The compiler can no longer optimize across the chain boundary in these cases (plus there’s a minor performance penalty for heap access), but it’s a viable tradeoff versus exponential code generation.

// header:

class wrapped_meta
{
public:
    wrapped_meta();
    ~wrapped_meta();

    void process(ProcessData& d);

private:
    void* pimpl; // opaque pointer to the chain, whose type is only known in Source.cpp
};

// Source.cpp

using MyType = container::chain<gain, compressor, container::split<whatever, delay>>;

wrapped_meta::wrapped_meta()
{
    pimpl = new MyType();
}

wrapped_meta::~wrapped_meta()
{
    delete reinterpret_cast<MyType*>(pimpl); // Ah, the good old raw delete...
}

void wrapped_meta::process(ProcessData& d)
{
    reinterpret_cast<MyType*>(pimpl)->process(d);
}

// Usage in other file

// The compiler cannot optimize this signal chain across wrapped_meta at compile time
// because of the opaque pimpl pointer; it treats wrapped_meta as an encapsulated entity.
using MyChainUsingMeta = container::chain<gain, wrapped_meta, phaser>;

Now the most obvious flaw of this approach is the excruciating ugliness of the void* pointer and the usage of reinterpret_cast and raw new and delete calls. However the alternative would be a forward declaration of MyType in the header file, which would lead to the template instantiation in the header so we would be back at square one. Am I missing another option here?

The other thing I noticed is that there’s a significant reduction of the object file size if I choose another implementation technique for the variadic template containers:

// JUCE way (heavily simplified; the terminating specialization for the
// last processor is omitted)

template <class Processor, class SubType>
class element
{
public:
    void process(ProcessData& d)
    {
        obj.process(d);
    }

    Processor obj;
};

template <class First, class... Others>
class chain : public element<First, chain<Others...>>
{
public:
    using Base = element<First, chain<Others...>>;

    void process(ProcessData& d)
    {
        Base::process(d);
        others.process(d);
    }

    chain<Others...> others;
};

// Alternative implementation

template <class... Processors>
class AlternativeChain
{
public:
    void process(ProcessData& d)
    {
        // capture d by reference so every processor works on the same data
        for_each(processors, [&d](auto& p){ p.process(d); });
    }

    std::tuple<Processors...> processors;
};

The for_each is also heavily simplified: the real one iterates over the std::tuple at compile time using a std::index_sequence, and the lambda is actually a member function.
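For reference, a minimal C++14 sketch of what such a for_each over a std::tuple could look like. This is not the actual implementation from the project: ProcessData, gain, offset and for_each_impl are hypothetical stand-ins, and a generic lambda is used in place of the member function.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical stand-ins for the types used in the thread.
struct ProcessData { float value = 1.0f; int visited = 0; };

struct gain
{
    float factor = 0.5f;
    void process(ProcessData& d) { d.value *= factor; ++d.visited; }
};

struct offset
{
    float amount = 1.0f;
    void process(ProcessData& d) { d.value += amount; ++d.visited; }
};

// Calls f on every tuple element. The braced initializer list guarantees
// that the expanded calls are evaluated left to right.
template <class Tuple, class F, std::size_t... Is>
void for_each_impl(Tuple& t, F&& f, std::index_sequence<Is...>)
{
    using expand = int[];
    (void) expand { 0, ((void) f(std::get<Is>(t)), 0)... };
}

template <class... Ts, class F>
void for_each(std::tuple<Ts...>& t, F&& f)
{
    for_each_impl(t, f, std::index_sequence_for<Ts...>{});
}

template <class... Processors>
class AlternativeChain
{
public:
    void process(ProcessData& d)
    {
        for_each(processors, [&d](auto& p) { p.process(d); });
    }

    std::tuple<Processors...> processors;
};

// Usage sketch: gain halves the value, then offset adds one.
inline float run_alternative_chain()
{
    AlternativeChain<gain, offset> chain;
    ProcessData d;              // value starts at 1.0f
    chain.process(d);
    assert(d.visited == 2);     // both processors ran
    return d.value;
}
```

The index sequence is expanded inside a single function, so no recursive class hierarchy is needed to walk the tuple.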

Using the alternative implementation reduces the object file size by about 50% (plus the implementation gets much easier to understand). Now my variadic template skills hardly extend beyond copying and pasting stuff from Stack Overflow, but the logical explanation is that it does not have to generate classes for all the base classes of chain:

chain<gain, amp, reverb, chorus>;

=> creates classes:

class c1; // chain<chorus>
class c2; // chain<reverb, c1>
class c3; // chain<amp, c2>
class c4; // chain<gain, c3>

Instead, it creates one class containing a std::tuple and then just one process method implementation per tuple element. Assuming this is correct, is there a reason why the juce::dsp::ProcessorChain class used the other approach?

I don’t have a perfect way of doing this, but some things you can try and decide whether they’re worth it.

You can avoid the raw void*/new/delete by forward-declaring a nested struct Impl; in the header, and then defining it in the .cpp as:

struct wrapped_meta::Impl final : container::chain<gain, compressor, container::split<whatever, delay>> {};

Now you can use unique_ptr with it. The drawback is that chain has to be non-final.
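A minimal single-file sketch of that idea, with the header and .cpp parts marked by comments. The two-element container::chain and the processors are hypothetical stand-ins so that the example compiles on its own; only the shape of wrapped_meta and Impl follows the description above.

```cpp
#include <cassert>
#include <memory>

struct ProcessData { int calls = 0; };

// ---- header: no chain<...> is mentioned here ----
class wrapped_meta
{
public:
    wrapped_meta();
    ~wrapped_meta();   // defined in the .cpp, where Impl is a complete type

    void process(ProcessData& d);

private:
    struct Impl;                   // forward declaration only
    std::unique_ptr<Impl> pimpl;
};

// ---- Source.cpp: the only place where the chain is instantiated ----

// Hypothetical stand-ins for the real processors and container.
struct gain       { void process(ProcessData& d) { ++d.calls; } };
struct compressor { void process(ProcessData& d) { ++d.calls; } };

namespace container
{
    template <class A, class B>
    struct chain
    {
        void process(ProcessData& d) { a.process(d); b.process(d); }
        A a;
        B b;
    };
}

// Impl inherits from the chain, so the chain type itself must not be final.
struct wrapped_meta::Impl final : container::chain<gain, compressor> {};

wrapped_meta::wrapped_meta() : pimpl(std::make_unique<Impl>()) {}
wrapped_meta::~wrapped_meta() = default;

void wrapped_meta::process(ProcessData& d) { pimpl->process(d); }

// Usage sketch
inline int run_wrapped_meta()
{
    wrapped_meta meta;
    ProcessData d;
    meta.process(d);
    assert(d.calls > 0);
    return d.calls;
}
```

The destructor has to be defined in the .cpp even if it is defaulted, because std::unique_ptr needs the complete Impl type at the point where it deletes it.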

You can avoid the heap indirection by using std::aligned_storage for the impl. In the header, you hardcode the needed size. In the .cpp where container::chain is known, you static_assert that your hardcoded size is indeed equal to the size needed for the container::chain<...>. The drawback is that you have to hardcode the size. And you’ll probably have to hardcode different sizes for different platforms/compilers/versions. Also you’re back to (placement) new, delete and reinterpret_cast.
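A rough sketch of the in-place variant, again as a single file with the header and .cpp parts marked. It uses an alignas-qualified byte array in place of std::aligned_storage (which does the same job but is deprecated as of C++23). inplace_meta, StorageSize, StorageAlign and the processors are hypothetical names, and the hard-coded 16/8 values are assumptions that only hold for this toy MyType on typical platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct ProcessData { int calls = 0; };

// ---- header ----
class inplace_meta
{
public:
    inplace_meta();
    ~inplace_meta();
    inplace_meta(const inplace_meta&) = delete;
    inplace_meta& operator=(const inplace_meta&) = delete;

    void process(ProcessData& d);

private:
    // Hard-coded guesses; verified against the real type in the .cpp.
    static constexpr std::size_t StorageSize  = 16;
    static constexpr std::size_t StorageAlign = 8;

    alignas(StorageAlign) unsigned char storage[StorageSize];
};

// ---- Source.cpp ----

// Hypothetical stand-ins for the real processors and chain.
struct gain       { double factor = 0.5; void process(ProcessData& d) { ++d.calls; } };
struct compressor { double ratio  = 2.0; void process(ProcessData& d) { ++d.calls; } };

struct MyType
{
    void process(ProcessData& d) { g.process(d); c.process(d); }
    gain g;
    compressor c;
};

inplace_meta::inplace_meta()
{
    // Compilation fails here as soon as the hard-coded values go stale.
    static_assert(sizeof(MyType) == StorageSize, "update StorageSize in the header");
    static_assert(alignof(MyType) <= StorageAlign, "update StorageAlign in the header");

    new (storage) MyType();   // placement new into the member buffer
}

inplace_meta::~inplace_meta()
{
    reinterpret_cast<MyType*>(storage)->~MyType();   // explicit destructor call
}

void inplace_meta::process(ProcessData& d)
{
    reinterpret_cast<MyType*>(storage)->process(d);
}

// Usage sketch
inline int run_inplace_meta()
{
    inplace_meta meta;
    ProcessData d;
    meta.process(d);
    assert(d.calls > 0);
    return d.calls;
}
```

Copying is disabled here because a byte-wise copy of the buffer would bypass MyType's own copy constructor.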

Maybe you’ve considered this, but an alternative would be to use a tagged union for your processors, e.g.:

using Processor = std::variant<gain, compressor, delay, /* ... */>;

Now your chain would have a std::array<Processor, N> instead of the std::tuple. So for example, all chains of size 3 are the same class (one template instantiation), no matter what the 3 child processor types are. It doesn’t need heap access, it allows inlining, but you need a switch/if branch on every access into the variant.
And I’m not sure if the variant can do chains inside chains, maybe wrapping is needed for that.
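To make this concrete, here is a small sketch of such a chain. It needs C++17 for std::variant and std::visit; variant_chain and the three toy processors are hypothetical, and nested chains are not covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <variant>

struct ProcessData { float value = 1.0f; };

// Hypothetical toy processors.
struct gain   { float factor = 2.0f; void process(ProcessData& d) { d.value *= factor; } };
struct offset { float amount = 1.0f; void process(ProcessData& d) { d.value += amount; } };
struct invert { void process(ProcessData& d) { d.value = -d.value; } };

using Processor = std::variant<gain, offset, invert>;

// One instantiation per chain length N, regardless of which processors it holds.
template <std::size_t N>
class variant_chain
{
public:
    void process(ProcessData& d)
    {
        // std::visit dispatches on the alternative currently stored in each slot.
        for (auto& p : processors)
            std::visit([&d](auto& node) { node.process(d); }, p);
    }

    std::array<Processor, N> processors;
};

// Usage sketch: multiply by two, add one, then flip the sign.
inline float run_variant_chain()
{
    variant_chain<3> chain;
    chain.processors[0] = gain{};
    chain.processors[1] = offset{};
    chain.processors[2] = invert{};

    ProcessData d;              // value starts at 1.0f
    chain.process(d);
    assert(d.value < 0.0f);     // invert ran last
    return d.value;
}
```

The processors stored in a slot can be swapped at run time, which the tuple-based chain cannot do, at the cost of the dispatch inside std::visit.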

Hopefully some of this is helpful. I tried to keep it short, but let me know if you need more info on something.

> You can avoid the raw void*/new/delete by forward-declaring a nested struct Impl; in the header, and then defining it in the .cpp

Thanks, I’ll try that. Since chain has no virtual functions, making it non-final shouldn’t be a (performance) issue.

> The drawback is that you have to hardcode the size.

Yeah, this isn’t very feasible. I am targeting all platforms, and I did something similar using placement new in the past, which caused all kinds of problems: I ended up with 27 different object sizes (Win-Mac-Linux / x64-x86 / Debug-Release), and there’s no way to detect them without going into each configuration, writing constexpr auto i = sizeof(T), hovering over the text and typing in the value from the popup (if the IDE supports it).

Also this is just one part of the entire thing - the idea is to have a dynamic prototyping environment that spits out these classes, so in the end you don’t have to touch C++ at all.

> Maybe you’ve considered this, but an alternative would be to use a tagged union for your processors

std::variant is C++17 AFAIK and I have to stick to C++14 for now. However, it might be an idea on the horizon. Forgive me if I don’t understand it fully, but is the branching done at compile time or at run time?

Ouch! Yeah…I figured that one wouldn’t be very practical if this is a part of HISE.

std::variant is C++17, yeah. There are implementations for C++11, but then that would be a new dependency to drag in. The branching is done at runtime, unless the compiler can find a way to know the type that was used in the particular case, and optimise it away.