juce::String and memory ownership (across DLL boundaries)

sandormatyi · April 3, 2023, 2:51pm

Hey All,

I’ve been dealing with a seemingly simple but frustrating issue for a while and I have yet to find a solution that I find satisfying.

I have two binaries, a DLL and an executable. The executable calls exported functions from the DLL. Both of them are compiled using the same version of stdlib and JUCE, but the DLL might be used with other apps later, so I cannot use juce::String as the return type of exported functions. So far, sounds pretty simple.

Here is a minimal code example outlining my issue:

Public API in the DLL

// This returns a pointer to statically allocated memory (no issues here)
__declspec(dllexport) const char* getVersion()
{
    return _VERSION;
}

// This returns dynamically allocated memory, making it the caller's responsibility to free it
__declspec(dllexport) char* getSomeDynamicData()
{
   return _strdup(state.someData.toRawUTF8());
}

On the caller’s side, I want to avoid calling free() manually and I’d love to convert the char* to a juce::String as soon as possible. However, if I’m reading the JUCE code right, juce::String calls memcpy when constructed from a char*.

Here are the solutions that I considered:

Using std::unique_ptr<char>:

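// free() matches the _strdup() allocation; get() leaves ownership with the unique_ptr
std::unique_ptr<char, decltype(&free)> someDynamicData(dll.getSomeDynamicData(), &free);
juce::String someDynamicDataString(someDynamicData.get());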

This introduces a completely redundant memcpy and I also don’t like how it looks. Other than that, it gets the job done.

Returning pointers without _strdup and copying the data on the caller’s side:

__declspec(dllexport) const char* getSomeDynamicData()
{
   return state.someData.toRawUTF8();
}

juce::String someDynamicDataString(dll.getSomeDynamicData());

This looks beautiful. The constructor of juce::String would take care of copying and deallocating the returned data. The only problem is that I assume that I cannot guarantee complete thread safety this way and it is a deal breaker for me.

The oldschool C way:

__declspec(dllexport) void getSomeDynamicData(char* buf, size_t size)
{
   // copyToUTF8 writes at most `size` bytes, including the null terminator
   state.someData.copyToUTF8(buf, size);
}

I just hate how this looks, especially when the API function has other parameters. It’s ugly on the caller side as well.
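For reference, the buffer-based style is often paired with a "query the size, then fill" convention, so the caller never has to guess the buffer length. The following is only a sketch under assumptions: the size-returning signature and the fetchSomeDynamicData() helper are hypothetical, g_someData stands in for the DLL's internal state, and std::string stands in for juce::String so the example compiles without JUCE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for the DLL's internal state (hypothetical).
static std::string g_someData = "hello from the DLL";

// Hypothetical exported function. Copies at most `size` bytes (terminator
// included) into `buf` and returns the number of bytes needed for the whole
// string, terminator included. Passing buf == nullptr only queries the size.
extern "C" std::size_t getSomeDynamicData (char* buf, std::size_t size)
{
    const std::size_t needed = g_someData.size() + 1;

    if (buf != nullptr && size > 0)
    {
        const std::size_t n = (needed < size ? needed : size) - 1;
        std::memcpy (buf, g_someData.data(), n);
        buf[n] = '\0'; // always terminate, even when truncating
    }

    return needed;
}

// Caller-side helper (hypothetical): hides the two calls behind one function.
std::string fetchSomeDynamicData()
{
    std::string result (getSomeDynamicData (nullptr, 0), '\0');
    assert (! result.empty());          // `needed` is always at least 1
    getSomeDynamicData (&result[0], result.size());
    result.pop_back();                  // drop the terminator copied into the buffer
    return result;
}
```

Note that the data could still change between the size query and the copy if another thread writes to it, so this convention does not remove the need for synchronization inside the DLL.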

Is there something I’m missing here? I get the feeling that this shouldn’t be this complicated. Is there maybe a way to initialize a juce::String and make it also take ownership of the pointer that I pass into it?

Thanks for all your input in advance!
Matyi

reuk · April 3, 2023, 6:00pm

It’s normally best to avoid passing ownership of memory across DLL boundaries. For that reason, I’d suggest going with option two.

If your concern is that the data returned by getSomeDynamicData() might be invalidated by another thread during the read operation, then you may be able to use standard thread safety techniques to avoid this case. For example, you could take a lock before reading and release it once the data has been consumed, ensuring that the background thread can’t invalidate the data during the read. Alternatively, you could use a lock-free queue.

sandormatyi · April 3, 2023, 6:26pm

@reuk Thanks for your feedback!

For example, you could take a lock before reading and release it once the data has been consumed

The DLL has no knowledge of when the data is being consumed; that depends entirely on the caller. The lock must therefore be taken by the caller as well, right? How do you suggest going about this?

reuk · April 3, 2023, 6:49pm

Probably expose a lock and an unlock function that takes a pointer to whatever entity is producing the data.

// opaque pointer to something that produces dynamic data
struct DataSource;

// attempts to get some data from the source
extern "C" char* getDynamicData (DataSource*);

// take/release mutex on DataSource
extern "C" void lock (DataSource*);
extern "C" void unlock (DataSource*);

It’s a bit difficult to say exactly what the best solution would be without knowing how the data is produced, and where the threading concerns come from.
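On the caller's side, a lock()/unlock() pair like this is usually wrapped in a small scope guard so that the unlock cannot be forgotten, even on an early return or exception. The sketch below is an illustration only: the DataSource layout, the function bodies, ScopedSourceLock and readDynamicData() are hypothetical, the functions are plain C++ here rather than extern "C" exports, and std::string stands in for juce::String.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical DLL side. A real caller would only see an opaque DataSource*.
struct DataSource
{
    std::mutex mutex;
    std::string data = "dynamic data";
};

void lock (DataSource* s)                    { s->mutex.lock(); }
void unlock (DataSource* s)                  { s->mutex.unlock(); }
const char* getDynamicData (DataSource* s)   { return s->data.c_str(); }

// Caller side: RAII guard that pairs every lock() with an unlock().
class ScopedSourceLock
{
public:
    explicit ScopedSourceLock (DataSource* s) : source (s)
    {
        assert (source != nullptr);
        lock (source);
    }

    ~ScopedSourceLock() { unlock (source); }

    ScopedSourceLock (const ScopedSourceLock&) = delete;
    ScopedSourceLock& operator= (const ScopedSourceLock&) = delete;

private:
    DataSource* source;
};

// Copies the data while the lock is held; the pointer is never used afterwards.
std::string readDynamicData (DataSource* source)
{
    const ScopedSourceLock guard (source);
    return std::string (getDynamicData (source));
}
```

The copy happens inside the guarded scope, so the pointer returned by the library is only dereferenced while the producer is locked out.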

sandormatyi · April 3, 2023, 7:27pm

@reuk I have been trying to avoid it for the following reasons:

1. We have ~30 API functions that return dynamically allocated strings. That would mean 60 additional API functions, just for locking.
2. In order to reasonably assure that the unlock() function will actually get called, I would have to implement a RAII wrapper on the caller’s side. I would have to do it in every client that wants to use this library.
3. The lock() function needs to be called before returning the pointer. This means that we have two options for how we want to call it. The first option is to call it from the caller’s side before calling the actual function. This means that it’s possible that the getDynamicData() function is called without locking the lock. We need to check for this and we need a way to indicate this error to the caller.
   The second option is to call lock() from the DLL before returning the pointer. unlock() would still need to be called by the caller. This might not be terrible, but just seems weird to me.

Considering all of this, I’m leaning towards transferring ownership, but I’d love to be convinced otherwise.
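If ownership does cross the boundary, a common arrangement is for the DLL to export a matching release function, so the memory is always freed by the same runtime that allocated it. A minimal sketch, assuming a hypothetical freeDllString() export and a hypothetical DllStringDeleter on the caller's side; std::string stands in for juce::String, and g_liveAllocations exists only to make the example checkable.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// ---- DLL side (hypothetical) ----
static int g_liveAllocations = 0; // bookkeeping for this example only

extern "C" char* getSomeDynamicData()
{
    const char* source = "dynamic data";
    const std::size_t bytes = std::strlen (source) + 1;

    char* copy = static_cast<char*> (std::malloc (bytes));
    if (copy != nullptr)
    {
        std::memcpy (copy, source, bytes);
        ++g_liveAllocations;
    }
    return copy;
}

// Releases a string returned by this DLL, using the DLL's own allocator.
extern "C" void freeDllString (char* s)
{
    if (s != nullptr)
    {
        std::free (s);
        --g_liveAllocations;
    }
}

// ---- Caller side ----
struct DllStringDeleter
{
    void operator() (char* s) const noexcept { freeDllString (s); }
};

using DllStringPtr = std::unique_ptr<char, DllStringDeleter>;

std::string fetchSomeDynamicData()
{
    DllStringPtr raw (getSomeDynamicData());                      // owns the DLL buffer
    return raw != nullptr ? std::string (raw.get()) : std::string(); // copy, then auto-free
}
```

This still costs one copy on the caller's side, but the caller never calls free() directly and the pairing of allocator and deallocator stays inside the library.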

JeffMcClintock · April 3, 2023, 9:11pm

As you know, you can’t pass a C++ string across a DLL boundary in a portable manner.

Instead, I pass an interface that represents a string. e.g. returning a string back to a caller…

This lets the caller manage the lifetime of the string in a safe manner.

From the caller side it looks like…

I think it is a fairly easy way to pass strings around in a thread-safe and portable manner, and it can also be made C compatible, so you can use it between different languages as well. I hope that makes sense.
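The general shape of such a string interface can be sketched as follows. This is not the GMPI SDK's actual declaration: IStringReceiver, StringReceiver and the setData() signature are hypothetical names chosen for illustration, and std::string stands in for juce::String.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical interface shared by the DLL and its callers.
// The DLL only ever sees this abstract type.
struct IStringReceiver
{
    virtual ~IStringReceiver() = default;
    virtual void setData (const char* utf8, std::int32_t sizeBytes) = 0;
};

// ---- DLL side (hypothetical) ----
static std::string g_someData = "hello";

// The DLL pushes its data into the receiver; no pointer to DLL-owned
// memory outlives this call.
void getSomeDynamicData (IStringReceiver* out)
{
    assert (out != nullptr);
    out->setData (g_someData.data(), static_cast<std::int32_t> (g_someData.size()));
}

// ---- Caller side ----
// The copy is made by caller-compiled code into caller-owned memory.
class StringReceiver final : public IStringReceiver
{
public:
    void setData (const char* utf8, std::int32_t sizeBytes) override
    {
        value.assign (utf8, static_cast<std::size_t> (sizeBytes));
    }

    std::string value;
};
```

Because the copy happens during the call, the DLL can hold its own lock for the duration of getSomeDynamicData() without exposing any locking functions to the caller.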

PluginPenguin · April 3, 2023, 9:34pm

You could also consider adding a thin header-only wrapper around your pure C interface that you then use to access the functions from your dynamic library. Technically, this will be compiled along with your executable that calls the exported functions, but it might be distributed along with the headers for your DLL and reduce a lot of boilerplate code. Untested, but I think it should just work like this:

// libFn is a pointer to one of the exported C functions
template <char* (*libFn) (DataSrc*)>
juce::String getFromDLL (DataSrc* src)
{
    lock (src);
    juce::String str (libFn (src)); // copy the data while the lock is held
    unlock (src);

    return str;
}

// Usage
auto someDynamicData = getFromDLL<someDynamicDataString> (dataSource);
auto someOtherData = getFromDLL<someOtherDataString> (dataSource);

sandormatyi · April 4, 2023, 6:51am

Thank you, this seems reasonable.

I assume the gmpi_sdk::MpString allocates and copies the data in its setData() method. I could use a juce::String internally in the implementation to solve all my issues.

I’m not a huge fan of returning data as an argument, but it’s prettier than the standard C style.

sandormatyi · April 4, 2023, 12:12pm

Thanks for the suggestion! The reason I’m hesitant to go down this route is that I would have to add ~30 locks in various places in the code. Also, the template seems elegant, but it would require either a major refactor, adding a layer of abstraction directly below the public API to make the current code generic enough, or writing a template specialization for each individual function.
