Passing std::vector as params - real time friendly


#1

Hi all,

I just want to sanity check something in my own mind and the more learned C++ guru’s on the forum will more than likely set me straight as per usual.

When passing a std::vector to a function am I right in saying that most compilers will treat std::vector as a type and not handle the vector in the same was as an array being passed ? i.e. take the address.

So:

void foo(std::vector myVector);

Am I right in thinking the above call passes the vector by value and thus results in a copy constructor ?

Is anyone able to clarify for me whether this results in a dynamic / heap memory allocation when copy constructing the vector and it’s elements for the local function scope ?

In other words would it be considered an unsafe approach for real time audio to call such a function from the the callback thread ?

I’m using a library in a project for audio feature extraction and it uses std::vector as an argument for pretty much all time and frequency domain buffers and containers.

EDIT: Trying to work out whether I need to change the API in the library to use const references to the vector parameters.

Cheers all.


#2

Yes, passing a std::vector by value will allocate memory when the contents of the source is passed in (via the copy constructor).

You can get around this by moving (std::move) in to the argument as long as you no longer need the std::vector after calling the method.

However, it sounds like you should be doing this feature extraction on another thread to avoid the problem (and spread CPU load if the feature extraction takes a long time) rather than editing a 3rd party library. Suppose the library needs to modify the contents of the std::vector you’ve passed in?


#3

Hi Dave,

Thanks a lot for the info.

The trouble is I need to extract the features per buffer in real time. The project involves real-time audio classification on short term audio frames so the feature extraction needs to happen on the callback thread.

As far as I can tell the contents of the vectors are never modified, the library is pretty small and consists of just a few modules.

The actual routines run quickly enough to perform on the callback thread and use the Kiss FFT and just compute things like spectral centroid and some MFCC’s.

I think it’s more a case of the author of the classes not having considered it at this point. I’ve been in touch with him once or twice so might ask his thoughts and point him to this thread for a quick read. If not maybe just roll out my own routines.

Thanks a bunch for the advice.


#4

Ok, it sounds like the library needs a bit of a tweaking then.

Thinking about it, you can’t use move semantics here because if you move in to the methods they will almost certainly have to deallocate them after use so they don’t all stack up.

Passing by const reference is probably the most sensible solution.
(Most libraries actually pass by raw const float* with a number of samples as this doesn’t tie you to a specific container type).


#5

Hi Dave,

Yeah cheers, that was my thoughts on it as well to be honest. I wonder whether the author normally calls the functions from a GUI thread as I know he uses the library in some visualisations. May be that a blocking issue has never occurred or it simply hasn’t been used from the callback thread.

The lib is declared to be a real-time feature extraction library though so hoping it’ll be of use mentioning it to him and be added or OK’d as a pull request.

Thanks for the help.

P.S. moved to Linux recently for all my JUCE work and got myself a copy of Tracktion whilst I was at it. Loving it :wink:

Josh


#6

Great stuff! Lots more cool things to come with Tracktion…


#7

Cheers @dave96

Quick question. What would be the behaviour when returning a std::vector from a function in the instance where the result was being assigned to a pre-allocated vector ?

i.e.

Function signature:

std::vector foo();

Assignment (a vector with pre-allocated heap space via resize or similar):

myPreAllocatedVec = foo();

Would this result in an element wise copy or move ?

Just wondering whether allocation/deallocation would occur as a result in this instance. I’m a little lost as to whether all compilers under C++11 will simply move the elements without any dynamic allocation for the return variable.


#8

To add some context lets take the function below from the library I am using:

template <class T>
std::vector<T> MFCC<T>::melFrequencyCepstralCoefficients (std::vector<T> magnitudeSpectrum)
{
    std::vector<T> melSpec;

    melSpec = melFrequencySpectrum (magnitudeSpectrum);

    for (int i = 0; i < melSpec.size(); i++)
    {
        melSpec[i] = log (melSpec[i] + (T)FLT_MIN);
    }

    return discreteCosineTransform (melSpec);
}

In this instance if I were to change the above function so that the vector param is passed by reference and the melSpec vector is pre-allocated would you say that no dynamic memory allocations should occur ? I’m just trying to understand if provided I assign the return value of this function to a pre-allocated vector there will be no copy/allocation as the contents of the return value will simply be moved under C++11. Would the version below be real-time safe ? It might be a more agreeable change to the author than using T pointers and returning T[] arrays on the stack.

template <class T>
std::vector<T> MFCC<T>::melFrequencyCepstralCoefficients (const std::vector<T>& magnitudeSpectrum)
{

    melSpec = melFrequencySpectrum (magnitudeSpectrum);

    for (int i = 0; i < melSpec.size(); i++)
    {
        melSpec[i] = log (melSpec[i] + (T)FLT_MIN);
    }

    return discreteCosineTransform (melSpec);
}

Any advice much appreciated. Will have a quick refresh/read up on move semantics while waiting for any replies.

EDIT: I think in this case the melSpec vector will only be returned with move semantics by explicitly calling return with std::move(melSpec) as otherwise the compiler will copy in the return statement.

Probably be a lot easier to just remove all the vectors…using const reference for params and returning via move semantics would be a non-breaking change for existing code using the library in question though I think.

Unless the author would be open to having the functions just return a const vector reference to the MFCC member vectors (melSpec etc.). Not sure this would be agreeable though so std::move might be the way to go.

Cheers


#9

It sounds like there’s a bigger problem here which is a matter of ownership.
It seems the library currently gets around this issue by just taking copies of std::vectors which means it owns the copy and just destroys it when it’s no longer needed (or passes it out of functions).
This is usually a good approach, but not in audio processing when you need to avoid memory allocations/deallocations.

It’s unclear from your example what melFrequencySpectrum is doing. E.e. does it receive by value, return by value? This is important because the other thing you need to be aware of is “return value optimisation” and “named return value optimisation”.

This is basically when you have a temporary inside a function which you’re returning, the compiler can just use the object on the stack that will be returned in to as this temporary, thus saving the temporary construction. Read up on this for more information about it but the key thing is explicitly using std::move to return the object will disable NRVO (because you’ve explicitly told the compiler to move it).
Further, even if RVO/NRVO can’t be utilised, the compiler is very good at automatically moving objects when possible so you shouldn’t have to write moves by hand when returning objects.

In essence, don’t use std::move to return. Use the correct semantics for what you mean i.e. “I want to return this object that is about to expire”, let the compiler decide the most efficient way to do this for you.

More generally, when you’re passing std::vectors around to and from libraries it’s very difficult to determine that they won’t mutate the size in some way, thus allocating. The only ‘soft’ way around this is for the library to provide comments about the behaviour.

This is why you often see either raw float* arrays with a size passed to library functions (to avoid the coupling to any specific container) or at the very least passing by reference (const for input parameters, non-const for output parameters).

I’ll admit this isn’t the cleanest approach but it’s probably the safest at the moment. Ideally, a span would be used (a wrapper around a contiguous set of data with a size). I don’t think this made it in to C++17 (but std::string_view did seem to) but you can find one in the GSL.

For now though I would stick to raw pointers, making them const if the function doesn’t mutate them.


#10

Hi @dave96,

Thanks a bunch for taking the time to explain.

So the melFrequencySpectrum function is below:

template <class T>
std::vector<T> MFCC<T>::melFrequencySpectrum (std::vector<T> magnitudeSpectrum)
{
    for (int i = 0; i < numCoefficents; i++)
    {
        double coeff = 0;

        for (int j = 0; j < magnitudeSpectrum.size(); j++)
        {
            coeff += (T)((magnitudeSpectrum[j] * magnitudeSpectrum[j]) * filterBank[i][j]);
        }

        melSpectrum[i] = coeff;
    }

    return melSpectrum;
}

The only reason is suggested std::move() was due to melSpec being a member variable. In the current version of the library melSpec is declared as a local/temporary in the melFrequencyCepstralCoefficients function as below:

template <class T>
    std::vector<T> MFCC<T>::melFrequencyCepstralCoefficients (std::vector<T> magnitudeSpectrum)
    {
        std::vector<T> melSpec;

        melSpec = melFrequencySpectrum (magnitudeSpectrum);

        for (int i = 0; i < melSpec.size(); i++)
        {
            melSpec[i] = log (melSpec[i] + (T)FLT_MIN);
        }

        return discreteCosineTransform (melSpec);
    }

As I understand it return value optimisations will work only for local temporaries and rvalues ? So for a member variable you would need to either return a reference/const reference to the member vector or call std::move(returnVector), assuming that it is not an issue for the member variables contents to be moved out.

These methods would be called once in every processBlock() call for example so the vectors returned by either const reference or via std::move() would be re-populated / modified every time these methods are called which should mean the move would OK ?

At the moment I’ve just changed things so that const vector references are returned from all functions like those above.

So things have been altered to something like the below:

//==================================================================
template <class T>
const std::vector<T>& MFCC<T>::melFrequencyCepstralCoefficients (const std::vector<T>& magnitudeSpectrum)
{
    melSpecCoefficients = melFrequencySpectrum (magnitudeSpectrum);

    for (int i = 0; i < melSpecCoefficients.size(); i++)
    {
        melSpecCoefficients[i] = log (melSpecCoefficients[i] + (T)FLT_MIN);
    }

    return discreteCosineTransform (melSpecCoefficients);
}

//==================================================================
template <class T>
const std::vector<T>& MFCC<T>::melFrequencySpectrum (const std::vector<T>& magnitudeSpectrum)
{
    for (int i = 0; i < numCoefficents; i++)
    {
        double coeff = 0;

        for (int j = 0; j < magnitudeSpectrum.size(); j++)
        {
            coeff += (T)((magnitudeSpectrum[j] * magnitudeSpectrum[j]) * filterBank[i][j]);
        }

        melSpectrum[i] = coeff;
    }

    return melSpectrum;
}


//==================================================================
template <class T>
const std::vector<T>& MFCC<T>::discreteCosineTransform (const std::vector<T>& inputSignal)
{
    T N = (T)inputSignal.size();

    T piOverN = M_PI / N;

    for (int k = 0; k < dctOutputSignal.size(); k++)
    {
        T sum = 0;
        T kVal = (T)k;

        for (int n = 0; n < inputSignal.size(); n++)
        {
            T tmp = piOverN * (((T)n) + 0.5) * kVal;

            sum += inputSignal[n] * cos (tmp);
        }

        dctOutputSignal[k] = (T)(2 * sum);
    }

    return dctOutputSignal;
}

Edit: in discreteCosineTransform the variable dctOutputSignal is also a pre-allocated member vector.

Obviously the above is probably not ideal (pretty scrappy) but would you say this approach can be safely called from the callback thread for now ? The const vector references passed as parameters are never modified or resized so everything can be passed as const. It’s possible to do this for all the methods in the library as far as I can tell so I have taken this approach throughout in my own fork. The magnitudeSpectrum vector for instance is a member variable of the MFCC<T> objects owning class, so I think I’m right in saying the only way to avoid copying is to pass it as a const reference as above ?

Basically there are a few functions in the library where vectors like melSpec are created as local temporaries inside the function scope and either have `std::vector.resize()``` called or are assigned to as per the function above. I think I’m right in saying this is all no-go behaviour for real-time audio…

As a quick and dirty solution to get things working for my plugin I basically changed vectors like melSpec to be member variables which are now sized/allocated and initialised in their owning class’s constructor (not inside any real-tme calls).

I would go with the melFrequencyCepstralCoefficients(const T* buffer, size_t bufferSize) approach also but for now I’m just trying to make sure I’ve got my head around move semantics and return value optimisations correctly. Keeping the vectors and changing to some const references was a much easier/quicker change.

I’d just like to make sure that the approach I have taken above is actually real-time safe. The author of the lib is looking into all these but is hesitant to remove the use of std::vector for the moment. However he does have a plugin using the library so I think he’ll be up for the necessary changes.

I’ll probably consider rolling out my own feature extraction routines after the prototype stage if the library has not been updated in time but for now I’d like to keep using my modified fork provided I’ve got my head around things correctly and I’m not causing any blocking behaviour.

Sorry I realise this was a pretty large post/reply. Really appreciate the lessons!

Cheers


#11

Well it’s still very unclear to me what the library adheres to.
Yes, in theory if you accept parameters as const& and return them by const& there ‘could’ be no allocation going on. But then it’s still up to the library implementation to enforce this.

Well this sounds dodgy to me. If you move out of an object you can’t use it again. I.e. you can’t call a method which uses that member again. Surely this means that you can’t process more than one block?

resize will allocate (if the size actually gets bigger). Assigning will usually allocate but it might not if the vectors are already the same size.


#12

So would it not be possible to just update the member vector that had previously been moved in the next processBlock call ? i.e. if that vector is only ever written to and accessed in the processBlock ? I had thought the move just moved out the contents of the vector but left the actual vector itself alone (i.e. keeps it’s size and does not destroy the variable) but I think from the sounds of it I’ve got that wrong! I think I’ll be keeping the const reference returns regardless.

So just to check the line:

melSpecCoefficients = melFrequencySpectrum (magnitudeSpectrum);

is assigning the returned vector to melSpecCoefficients which is the same size as the return value/vector from melFrequencySpectrum. Are we saying despite the vectors being identically sizes allocation could still occur as a result of the assignment ?

Cheers @dave96


#13

Yes, I think you’ve got this very wrong. Moving from an object essentially renders it useless. The whole point of move semantics is that you’re moving the contents of one object to the other. How can it do that (including the allocated memory) and then be ready for use next time around?
I would advise against writing things with move semantics until you’re absolutely certain about what you’re doing as you’ll get all kind of hideous bugs, exceptions and memory corruptions that will be virtually impossible to track down otherwise.

Did you read the link? I would imagine that in most cases this won’t allocate but I don’t think there are any guarantees made about this by the STL. There is at least one case in which it will re-allocate, if the std::vectors use different allocators.
Basically, it just feels a bit wrong to use the assignment operator here if you’re trying to guarantee no allocation.


#14

Right OK, I had thought it was possible that if you moved it out and the end of the processBlock and then assigned new values to all the elements at the start of the next processBlock call things would work. That’s fine, wasn’t going to be the approach I would take but just wanted to understand the actual resulting behaviour.

This link confused me a little in regards to ensuring you don’t read from a moved object until after you have set it’s value again.

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-move

In specific the following from the core guidelines threw me a little:

“Note Ideally, that moved-from should be the default value of the type. Ensure that unless there is an exceptionally good reason not to. However, not all types have a default value and for some types establishing the default value can be expensive. The standard requires only that the moved-from object can be destroyed. Often, we can easily and cheaply do better: The standard library assumes that it it possible to assign to a moved-from object. Always leave the moved-from object in some (necessarily specified) valid state.”

Agh sorry, had missed the link. It makes more sense after reading that. Thanks.

OK thanks that all makes sense. Think it’s going to have to be a case of removing the lib and writing my own routines at a smaller scale with T* style arguments and return values.

Really appreciate the explanations @dave96, thanks for taking the time to give me a proper schooling on that one!

Josh


#15

Ahhhh I guess moving out the vector and then trying to assign to it right after in the next processBlock would be a problem as the vector no longer owns it’s initially allocated heap space and is in an undefined or default state. So it would have to allocate to be used again via resize or similar…I might have that wrong but it would make sense.


#16

For anyone else reading this thread, newbies or folks like me that haven’t yet spent enough hours studying C++ nitty gritties the video below might be useful.

@dave96’s comments were spot on.

If I take a std::vector member variable and return it from a processBlock function via std::move only the first block will work.

After calling std::move the member vector will no longer refer to it’s original heap space (similar to setting the rvalue reference’s array/heap memory pointer to nullptr in the move constructor in the above video) and thus would need to be resized/have memory allocated before assigning to it once again at the start of the next processBlock…no real-timey happy.

So with a std::vector the only loosely possible real-time friendly approach in functions like those I have previous posted would be to take in a const std::vector reference and then potentially return a reference or const reference to to the relevant member vector. (Or use an non const vector reference as an output parameter - again seems messy).

However, as @dave96 has said, even when with the vector you assign to from the functions return result is the exact same size as the return variable (a const std::vector& or a std::vector&) it cannot be 100% guaranteed that dynamic memory allocation will occur during the assignment because of the possibility of a different allocator type being used for the assigned to and returned vectors. See: might

SO:

std::vector vectorAssignable;`

vectorAssignable.resize(nElements);

vectorAssignable = foo(); //Where foo() returns a const `std::vector&` reference to a class member for example

May not be guaranteed not to reallocate memory in the assignment to vectorAssignable (despite vectorAssignable being the same size and type as the vector reference returned from foo()) and thus may not be real-time safe. I think in my particular instance things would possibly be OK and not require reallocation but it’s definitely not the approach to take.

So yeah…In regards to the library code that caused me to start this thread and correct my shoddy knowledge (kind of embarrassed I hadn’t read into this more before hand)…Using T* arguments and return values with a bit of thought is a far better option. Bye bye vectors.

Thanks once again to @dave96 for pointing me in the right direction.

Hopefully I’ve not made a disaster (mistakes) explaining that for anyone confused by this thread. Might be of some use to folks new to audio and C++ etc.

Cheers.


EDIT: The more times I work everything over in my head I realise how daft the assumption that a moved-from vector keeping its heap space was…how would the moved-to vector take advantage of move semantics without taking control of the moved-from vector’s contents/heap space in the first place…donut.