Sorry for the delay, more urgent tasks have diverted my attention until now.
I suggest that anyone who is going to read this grab a cup of coffee and some concentration: I fear this will be neither easy nor short.
Now, I will write this while redoing the whole thing, as a further attempt to get some inspiration for a viable solution.
So, I am working on this cross-platform plug-in mainly on a Mac, and occasionally build it on Windows for testing/debugging before releases.
During one of these dev sessions on Windows, the plug-in started to consistently hit a jassert located in the LeakedObjectDetector destructor.
The DBG message associated with the jassert is:
*** Dangling pointer deletion! Class: KeyPress
This puzzled me because none of the code that was recently modified in the project has anything to do with KeyPress instances at all.
I spent an entire day hunting for mismatched KeyPress allocations (and I use RAII a lot, so that was a strange symptom anyway), but I could find literally nothing wrong.
Out of desperation, to see if I was going mad, I tried the following code:
{
KeyPress p; // (1) an instance of KeyPress is created here
} // (2) and it is destroyed upon exit from this curly
This literally cannot trigger any leak-detection-related issue, right, RIGHT?
Wrong.
When stepping over line (2), I get that dreaded assertion.
Ok, so… something is seriously wrong with what’s happening here, and it is not due to actual mismatched allocations, double deletions or memory leaks, because the code above is certainly correct and yet it still hits the assert.
In addition, if really there were a double deletion somewhere in the code, that should trigger the assertion also on the Mac build, which shares exactly the same code. But on Mac it runs just fine without assertions.
Out of ideas on how to investigate this, I check if the assertion is consistently hit in all the targets of the Windows build, which are:
- 64-bit stand-alone app target -> no assertions hit
- 32-bit stand-alone app target -> no assertions hit
- 64-bit polymorphic plug-in target (VST2, VST3, AAX) -> no assertions hit
- 32-bit polymorphic plug-in target (VST2, VST3, AAX, RTAS) -> assertion is hit only in this config!
So, it only happens in the 32-bit plug-in build. What makes that different from all the others? RTAS.
To confirm my suspicions, if I disable the RTAS build in the offending target with:
#define JucePlugin_Build_RTAS 0
then rebuild and run, no assertions are hit with RTAS disabled.
Ok… so I have the culprit, or at least a strong clue, but RTAS support is mandatory for this project and disabling it is not an option.
Furthermore, it is still unexplained what RTAS has to do with that absurd jassert regarding KeyPress, so I went further.
I know for a fact that the assert is hit when the LeakCounter object for the KeyPress class is decremented past 0, going into the negative.
The LeakCounter instance for a given class is a singleton, obtained in all places where it is needed by calling the static method LeakedObjectDetector::getCounter().
Then, to follow what is happening to the LeakCounter for the KeyPress class, I added the printouts shown below:
static LeakCounter& getCounter() noexcept
{
static LeakCounter counter;
// ***** start code added by me
if (strcmp (getLeakedObjectClassName(), "KeyPress") == 0)
DBG (String::formatted ("counter object: %p, value: %d", &counter, counter.numObjects.get()));
// ***** end code added by me
return counter;
}
And then I watched what was being printed when running this seemingly harmless code, previously mentioned:
{
KeyPress p; // (1) an instance of KeyPress is created here
} // (2) and it is destroyed upon exit from this curly
This is the result:
counter object: 0DE6CC70, value: 0 // printed when stepping over line (1)
counter object: 0DE6BE90, value: 0 // printed when stepping over line (2)
Oh well, this means that:
- When the KeyPress instance p is created at line (1), the LeakCounter instance which is incremented has address 0DE6CC70.
- When the KeyPress instance p is destroyed at line (2), the LeakCounter instance that is decremented has a different address, 0DE6BE90, which means that it is not the same LeakCounter object that was incremented in the previous line, but another one.
This ultimately results in line (2) trying to decrement a LeakCounter instance whose value is already 0 (as shown by the printout), and that in turn triggers the jassert that I am hitting.
The problem is clearly that the two lines (1) and (2) above are using two distinct LeakCounter instances, instead of the same one for both as would be expected.
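The failure mode can be reproduced without JUCE. The following is a minimal sketch, assuming the destructor check amounts to "decrement the counter, then complain if the result is negative"; SketchCounter, onConstruct and onDestroy are made-up names for illustration and not the JUCE types:

```cpp
#include <atomic>

// Hypothetical stand-in for the per-class counter; not the JUCE type.
struct SketchCounter
{
    std::atomic<int> numObjects { 0 };
};

// What a tracked object's constructor does to the counter it is handed.
void onConstruct (SketchCounter& c)
{
    ++c.numObjects;
}

// What the destructor does: returns true when the count drops below zero,
// i.e. the situation reported as a dangling pointer deletion.
bool onDestroy (SketchCounter& c)
{
    return --c.numObjects < 0;
}
```

Pairing onConstruct and onDestroy on the same counter never reports a problem, while incrementing one counter and decrementing another one that is still at 0 is flagged immediately, even though exactly one object was created and destroyed.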
But still, it is also true that both lines (1) and (2) get their LeakCounter by invoking that same getCounter() method where I have added my printouts, otherwise I wouldn’t be getting two lines printed to the output by my DBGs.
The code of getCounter() is seen above, and it is quite simple. Without my printouts the original code is:
static LeakCounter& getCounter() noexcept
{
static LeakCounter counter;
return counter;
}
This is a common and well-known pattern for creating and returning singletons: the function constructs the static variable upon its first invocation, and then returns a reference to it for that and all following invocations.
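As a quick sanity check of that guarantee, here is a small stand-alone sketch (Counter and sameInstanceOnEveryCall are hypothetical names used only for this illustration): as long as it really is one and the same function being called, every call hands back the very same object.

```cpp
// Hypothetical type, for illustration only.
struct Counter
{
    int value = 0;
};

// The function-local static is constructed on the first call and reused afterwards.
Counter& getCounter() noexcept
{
    static Counter counter;
    return counter;
}

// Returns true when two separate calls yield the very same object.
bool sameInstanceOnEveryCall()
{
    Counter& first  = getCounter();
    Counter& second = getCounter();

    ++first.value; // must be visible through 'second' as well

    return &first == &second && first.value == second.value;
}
```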
So, how is it possible that the same getCounter() function is called, but it yields two different instances of LeakCounter in return?
That’s not possible, unless the getCounter() function being called is not really the same in both places…
As an additional check, I add to my DBG printouts the actual function pointer of the getCounter() being called, like this:
static LeakCounter& getCounter() noexcept
{
static LeakCounter counter;
if (strcmp (getLeakedObjectClassName(), "KeyPress") == 0)
DBG (String::formatted ("function pointer %p, counter object: %p, value: %d", getCounter, &counter, counter.numObjects.get ()));
return counter;
}
And the result is this:
function pointer 1017C030, counter object: 115ECC70, value: 0
function pointer 100C02A0, counter object: 115EBE90, value: 0
Oook, so this means that:
- line (1) calls a getCounter() function, whose address is 1017C030, and obtains in return a LeakCounter object whose address is 115ECC70.
- line (2) calls another getCounter() function, whose address is 100C02A0, and obtains in return a different LeakCounter object, whose address is 115EBE90.
This somehow makes some sense: each of the two getCounter() functions (1017C030 and 100C02A0) creates and returns its own static LeakCounter object (115ECC70 and 115EBE90, respectively).
But why do those two getCounter() functions even exist?
Only one should be generated by the LeakedObjectDetector class template for any given argument class (KeyPress in this case).
So, let’s search for the addresses of those function pointers inside the .map file generated by the linker, to see if it is possible to understand what they correspond to:
- 1017C030 is matched in the following line:
?getCounter@?$LeakedObjectDetector@VKeyPress@juce@@@juce@@CAAAVLeakCounter@12@XZ 1017c030 f i juce_audio_processors.obj
which makes sense, because that seems a legit decorated name for a function whose full name is LeakedObjectDetector<KeyPress>::getCounter().
- 100C02A0 is matched in the following line:
?getCounter@?$LeakedObjectDetector@VKeyPress@juce@@@juce@@CGAAVLeakCounter@12@XZ 100c02a0 f i juce_audio_plugin_client_RTAS_1.obj
The decorated name here looks surprisingly similar to the one above, with the only notable difference of one character (CGAAVLeakCounter instead of CAAAVLeakCounter).
This finding confirms my suspicions: the second function pointer corresponds to another LeakedObjectDetector<KeyPress>::getCounter(), different from the first one.
Ok, but what exactly makes them different?
Thanks to the undname.exe tool, provided with Visual Studio, I can find out exactly what each of those decorated names means:
the first corresponds to:
static LeakedObjectDetector<KeyPress>::LeakCounter& __cdecl LeakedObjectDetector<KeyPress>::getCounter()
while the second means:
static LeakedObjectDetector<KeyPress>::LeakCounter& __stdcall LeakedObjectDetector<KeyPress>::getCounter()
The difference is now clear: they are two copies of the same function, differing only in their calling convention (cdecl and stdcall respectively).
Unsurprisingly, the one which uses stdcall comes from juce_audio_plugin_client_RTAS_1.obj, which makes sense because the RTAS plug-in wrapper is in fact built with that calling convention.
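To make the effect tangible, here is a single-file stand-in for this situation; it is not the actual JUCE or RTAS code. In the real build the two copies share one name and differ only in the calling convention picked up from each translation unit's compiler settings. Within a single file that cannot be reproduced as-is, so this sketch assumes two differently named functions and two hypothetical macros (SKETCH_CDECL, SKETCH_STDCALL) that expand to the MSVC keywords only on 32-bit MSVC builds and to nothing elsewhere:

```cpp
// Calling-convention keywords are only meaningful on 32-bit MSVC-style builds;
// on every other compiler these hypothetical macros expand to nothing.
#if defined (_MSC_VER) && defined (_M_IX86)
 #define SKETCH_CDECL   __cdecl
 #define SKETCH_STDCALL __stdcall
#else
 #define SKETCH_CDECL
 #define SKETCH_STDCALL
#endif

// Hypothetical counter type, for illustration only.
struct ConvCounter
{
    int numObjects = 0;
};

// Stand-in for the copy emitted by a translation unit built as cdecl.
ConvCounter& SKETCH_CDECL getCounterCdecl() noexcept
{
    static ConvCounter counter;
    return counter;
}

// Stand-in for the copy emitted by a translation unit built as stdcall.
ConvCounter& SKETCH_STDCALL getCounterStdcall() noexcept
{
    static ConvCounter counter;
    return counter;
}

// Two distinct functions means two distinct function-local statics.
bool countersAreDistinct()
{
    return &getCounterCdecl() != &getCounterStdcall();
}
```

Each function owns its own function-local static, so an increment made through one is invisible through the other, which matches the two counter addresses seen in the printouts.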
To confirm what I have found, if I replace the stdcall calling convention with cdecl in every place where it is used (also in the PT SDK projects that build the libraries needed for RTAS), I can build the polymorphic 32-bit plug-in just fine, and it will load correctly (as a VST), showing the correct printouts:
function pointer 0CA2F8F0, counter object: 0DF5AE90, value: 0
function pointer 0CA2F8F0, counter object: 0DF5AE90, value: 1
but unfortunately, doing so results in Pro Tools 10 not seeing it as an RTAS plug-in any more (as I described in more detail in my first post).
And this, gentlemen, is the story of why I am searching for a way to build the PT SDK with the cdecl calling convention, which the SDK documentation says is entirely possible, but which I am somehow unable to put into practice.
THE END
POST CREDITS SCENE:
But wait?! Why in the world does the destruction of a KeyPress instance in my simple code (which is in a cpp file built using cdecl) end up calling a getCounter() function which is defined in some other RTAS module, and that uses an entirely different calling convention???
