Bug? File drag doesn't work in release builds

I’m getting one of these very obscure bugs.

My Components that inherit from FileDragAndDropTarget stop reacting to drag and drop events. However - that bug only happens in Windows, only in Release builds and only using CLion or CMake command line builds - not in Visual Studio.

I tried debugging it, but strangely enough the bug doesn’t happen when building the RelWithDebugInfo target.

Here’s a minimal example, using the latest develop branch:
clone the repo:

  1. Using command line, try the following commands:
cmake -G Ninja -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --target FileDragBug --config Release
  1. Drag a file to the screen - if it worked correctly the text on the label should change after dragging. You can compare with the debug build, too.

Any idea what this could be?

Edit: The actual code that’s triggering the feature bug is here:

Eyal

try public inheritance, i.e. “public juce::FileDragAndDropTarget”.

the reason is, that JUCE internally does a dynamic_cast<FileDragAndDropTarget*> on the target component (see juce_ComponentPeer.cpp) and if you omit “public”, then the C++ default it private - and if you inherit privately the C++ compiler is free fail a dynamic_cast (because this inheritance is only know to the child). And my guess is that the MSVC compiler retains type info for private inheritance in Debug builds but not in Release builds. I can’t be bothered to test this, but try running the following in Debug and Release in MSVC:

#include <iostream>
struct A{};
struct B : A {};

int main()
{
    std::clog << dynamic_cast<A*>(new B()) << std::endl;
    return 0;
}

If I’m right the release builds outputs 0 and the debug build prints the address of the leaking B.

Cheers

Thanks @PixelPacker ! IIRC, in structs the default should be public, shouldn’t it?

Anyway I double checked and in my actual code project where this issue has spawned I did use an explicit public, and just got too lazy with the demo code…

class FileDragLabel
    : public Component
    , public juce::FileDragAndDropTarget

Edit: I checked our example and I’m getting the right address in both debug and release builds, tried both Clang and MSVC

too bad! have you tried doing a clean rebuild ?

Yeah, I tried removing the cmake build folder multiple times on two different machines, and even recreated the issue in a whole different repo (which is the one I posted).

To make things more confusing:
Using the Visual Studio cmake generator works fine.
Using MinGW works fine.
Using both Clang and MSVC are broken - but only in release builds and without the debug information.

I don’t reproduce the issue. I ran the commands you wrote in the OP, then ran FileDragBug.exe, dragged some file onto the window and the label changed. I tried with different files, it worked consistently.

Some details about my environment:

  • I ran everything in “x64 Native Tools Command Prompt for VS 2019”, with cl version “19.28.29333 for x64”.
  • I added cmake version 3.16.8 and ninja version 1.9.0 to the PATH

Thank you so much for looking into it!

I just managed to get the same behavior (buggy with Ninja and works fine with the VS exporter) on a 3rd machine.

Granted - all 3 machines that display the bug are Macs running Boot Camp. My Ninja version on all computers I tested is 1.10.2 which could also be related.

Here are the two executables I’m building, one with Ninja (faulty) and one with VS (works fine):

What I also notice is that the Ninja generated exe is much smaller. So my current thinking is that it’s somewhat related how it links with the runtime library.

In my repo, I’m calling:

set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")

But I’m guessing it’s not affecting the Ninja build somehow?

Looking a bit deeper, the bug seems to be related to the target juce::juce_recommended_config_flags

Removing that target makes the code break using the Visual Studio exporter as well, and bringing it back makes the code work again (but only using the VS exporter).

I’ve updated the repo in the original post with that flag commented out.

@reuk, would you mind taking a look?

1 Like

After a lot of help from @reuk and @McMartin, I managed to track down the issue.

The ‘fix’ on the user side is rather simple, but I think that there are some modifications to the JUCE CMake implementation needed for other people to not lose their minds over this like I did.

Background:

  1. As @reuk found out, using dumpbin on the two executables shows that the one built by the Visual Studio exporter declares more dependencies than the one built with the Ninja exporter. This happens despite both of them built from the same CMake file and the same compiler/linker.

  2. The juce_recommended_config_flags target adds to the release builds the /Ox flag, which means “Full Optimizations”.
    Despite the title, @McMartin found out that that flag is actually less optimized than the default for release builds which is /O2.

Since apparently the Ninja exporter ignores that flag, it meant that with that exporter, the compiler was allowed to remove all kinds of functions from the binary, and those included functions that declare dependency on Windows API DLLs.

  1. For reasons unknown to me some computers were still able to trigger the file drag without those DLLs, and some were not, so it seems to me like the ‘safe’ way to make sure the full functionality works on every computer is to always explicitly declare those DLLs as dependencies.

So, to solve that problem, what I needed was to force the JUCE_API macro to export the symbols.

I’ve done that using:

target_compile_definitions(${TargetName} PRIVATE JUCE_DLL_BUILD=1)

Despite my app being a regular standalone app and no DLLs involved, this made it possible to use the /O2 flag on all exporters, and get the correct functionality, with the trade off of a bigger binary.

On Mac, that flag made no impact, as it looks like the API was ‘exported’ by default anyway.

I’ve tried to repro this on a couple of machines now with no luck. I’m also able to run both the demo apps you posted in the zip above with no issues on both machines, so I don’t think the issue can be entirely due to missing dependencies.

What version of Windows are you using? I’m testing with 20H2 which works fine - perhaps the issue only affects specific versions of Windows.

I’m on the very latest Windows version in all computers this appears on (three different ones) But what’s 100% certain is that the issue is just a symptom of not declaring dependencies correctly - my guess is it’s the machines that do not show the problem that have some additional dev tool installed that makes it work in all modes even without the dependency explicitly declared.

BTW, if you’re investigating the issue, I’m happy to let you do a Team Viewer session in one of the problematic machines.

The thing is, I’m fairly certain that many users would have “faulty machines” that will have the same “configuration problem” as the three machines I tested.

I also believe this wasn’t reported in the past because up until recently devs only built JUCE apps/plugins using Visual Studio.

1 Like

How have you come to that conlusion? Sorry, that’s still not obvious to me.

It is true that the two executables declare different dependencies when queried with dumpbin /dependents. However, as far as I can tell, the drag-and-drop behaviour depends on ole32.dll which is linked in both executables. Could you try checking the version of this library on your systems? If it matches the version I have, that’ll at least let us rule out that there’s a problem with this specific library.

PS C:\Windows\system32> (Get-Item C:\Windows\System32\ole32.dll).VersionInfo

ProductVersion   FileVersion      FileName
--------------   -----------      --------
10.0.19041.1     10.0.19041.1 ... C:\Windows\System32\ole32.dll

Could you try checking the version of this library on your systems?

I’m getting an identical string to what you’re getting. Same exact version.

How have you come to that conlusion? Sorry, that’s still not obvious to me.

After getting the same issue in 3 machines, and always reproducing the same identical behavior (with explicit dependencies - works, without them - fails), I think we pretty much isolated all other variables that I’m aware of.

It’s true that I have not figured out the exact mechanics of the failure. But what is obvious: All versions that do work correctly (Visual Studio builds, Mac builds, debug builds) all have a ‘fatter’ binary and export more dependencies.

Also I think the more interesting issue here isn’t the exact nature of the failure I’m getting. The interesting part is that I’d like to believe the more optimized version (O2) is how most devs would want to distribute their program, and the ‘reduced binary’ version that was achieved by the /Ox version is less desirable in every way, and bug prone.

as far as I can tell, the drag-and-drop behaviour depends on ole32.dll which is linked in both executables

You might be on the right track here.

Thanks for checking the library version, guess we can rule that out.

I thought that the differing dependencies were a side-effect of the different optimisation level (Ox vs O2). These flags may also have other side-effects that we haven’t discovered yet, so I don’t think that we can definitively say that the issue is due to the dependencies that are/aren’t included in the final build - it might be due to some other difference between the Ox and O2 builds.

I’m not sure I understand this. Previously you said that in VS, using juce_recommended_config_flags fixed the issue, and removing it caused the program to break. With the recommended flags, the build was using Ox, and without it was using O2 - the rest of the build commandline was identical. To me, this suggests that the O2 flag is somehow to blame, but here you seem to be implying that the Ox version is problematic.

I noticed in juce_win32_Windowing.cpp that we’re not checking the return value of RegisterDragDrop (hwnd, dropTarget);, so perhaps this is failing somehow in the broken program. Might you be able to add some logging (Logger::outputDebugString) to a broken version of the program to check whether this call succeeds? It might also be interesting to stick some logging in the methods of the FileDropTarget in the same file.

That’s what I thought too, but then we saw that the O2 builds work fine in all machines when JUCE_DLL_BUILD is set to on, which again lead me back into the dependency route.

What I’m saying is - I believe the dependency bug with O2 was always there. I’m gonna go out on a limb and guess that /Ox was historically added as the ‘default’ optimization setting in the Projucer after unexplainable problems with the dependencies started showing up on some users machines with O2.

So, I added logging to the following piece of code:

            std::cout << "Registering drag and drop\n";
            auto r = RegisterDragDrop (hwnd, dropTarget);
            std::cout << "Drag drop value: " << r << std::endl;

Which printed:

Registering drag and drop
Drag drop value: 0

a few times during construction.

I also added similar logging to all virtual functions:
DragEnter, DragOver and DragLeave were called, but Drop was never called.

With all that logging, dropping files were still broken on the release build. It’s important to note that during the ‘broken’ state, the system actually changes the cursor to a “no entry” zone when I’m trying to drag a file into the program window.

Two additional notes:

  1. If the optimizer was indeed removing a few functions there, I’d expect the bug to show up on every computer running the same binary, which it didn’t. It seems like a runtime bug that only happens after the OS loaded the binary.

  2. Building the previous code adjustment with the logging in Debug and RelWithDebInfo produces a similar output, with the only difference that Drop() is called, and the drag/drop functionality works.

Thanks for trying that, I really appreciate it. Given that DragEnter is being called, and the cursor is being updated, it sounds to me like DragEnter may (incorrectly) be signalling that the file drop will not be accepted. Perhaps there’s some UB or a misuse of the drag and drop API somewhere in win32_Windowing.cpp which is benign most of the time, but which triggers under certain circumstances.

Might you be able to check the value returned by DragEnter? From what you’ve described, I think there’s a chance it’s returning something other than S_OK on the broken machines.

I changed the ending of DragEnter to always print out the result:

            auto d = DragOver (grfKeyState, mousePos, pdwEffect);
            std::cout << "dragOver value was " << d << std::endl;

            return d;

And it always returned 0 (S_OK).

I also debugged DragOver itself and added the following debug codes:

        JUCE_COMRESULT DragOver (DWORD /*grfKeyState*/, POINTL mousePos, DWORD* pdwEffect) override
        {
            std::cout << "DragOver\n";

            if (peerIsDeleted)
            {
                std::cout << "DragOver failed\n";
                return S_FALSE;
            }
            
            //Original code executing here...

            std::cout << "DragOver success\n";
            return S_OK;
        }

Which always print “DragOver success”, despite the file drag/drop failing.

Perhaps there’s some UB or a misuse of the drag and drop API somewhere in win32_Windowing.cpp which is benign most of the time, but which triggers under certain circumstances.

In that case, won’t the same built binary have the same bug in every system?

FWIW, in the past I noticed that file drag and drop becomes unavailable on a window (i.e. the displayed cursor is the “forbidden” sign) if the executable is launched in admin mode.

Which makes sense from a security standpoint, so that a file picked from “user-land” doesn’t have such an easy chance of being thrown over a window that belongs to “admin-land”.

Could it be that somehow something is triggering this behaviour on the misbehaving executables that you have?