There’s a livelock bug in the WaitFreeListeners MIDI implementation that causes my app to reliably hang at shutdown after turning an attached ROLI seaboard off and on again while running. There is a simple fix and I’ve written a PR and tested it thoroughly:
master ← emezeske:midi-hang-fix
opened 02:11AM - 21 May 26 UTC
# [JUCE 8.0.13] Two bugs in Windows MIDI (legacy winmm): WaitFreeListeners::call… () race + removeConsumer typo
Disclosure: Claude helped me find these bugs and write this report. I have personally read through the full bug report and fix code and confirmed that it is all correct. I edited this bug report for clarity.
I ran into this hang myself and it is easily reproducible by turning my ROLI seaboard off and on during a standalone app session. Every single time, JUCE hangs.
## Summary
The legacy winmm MIDI input path on Windows (JUCE 8.0.13) has two bugs that
together can cause a JUCE standalone app process to hang on shutdown after a
MIDI device hot-plug event:
1. **`WaitFreeListeners::call()` has a race** that can leave the per-entry
"in-use" bit permanently set, which makes `WaitFreeListeners::remove()` spin
forever in its CAS loop. The race is triggered whenever `call()` is invoked
concurrently on the same instance. The header comment says concurrent
`call()` is forbidden — but JUCE's own winmm `midiInCallback` violates that
contract because the singleton `allInputs()` is shared across all opened
MIDI input devices, and each open device gets its own Windows-managed
callback thread.
2. **`WindowsMidiHelpers::Win32::InputDevice::removeConsumer` calls
`consumers.add(c)` instead of `consumers.remove(c)`** (copy/paste typo from
`addConsumer`). The WinRT path immediately above it is correct; only the
legacy winmm path is affected.
Files affected:
- `modules/juce_audio_devices/midi_io/juce_WaitFreeListeners.h`
- `modules/juce_audio_devices/native/juce_Midi_windows.cpp`
## Repro
Windows 11 (build 10.0.26200), JUCE 8.0.13, standalone JUCE app that opens
MIDI inputs via the default `AudioDeviceManager` path. With a MIDI device
plugged in (in our case a ROLI Seaboard), plug/unplug the device, then close
the app. The process disappears from the taskbar but a thread keeps spinning
at 100% CPU forever, never exiting.
We captured a live process minidump and walked the stack. The hot thread is
parked inside `WaitFreeListeners::remove()` (ICF-folded across the
`<Consumer>` and `<InputDevice>` template instantiations), invoked from
`InputDevice::~InputDevice` via `allInputs().remove(*this)` during
`AudioDeviceManager`'s teardown:
```
WinMainCRTStartup
invoke_main
juce::JUCEApplicationBase::main()
juce::JUCEApplicationBase::shutdownApp()
... ModalComponentManager teardown ...
juce::AudioDeviceManager::~AudioDeviceManager()
~vector<unique_ptr<juce::MidiInput>>
juce::MidiInput::Impl::~Impl()
~InputImplNative (WinmmTraits)
device->removeConsumer(*this) // bug #2: calls consumers.add()
~InputDevice // bug #1 manifests here
allInputs().remove(*this)
WaitFreeListeners::remove() // CAS loop spinning, RIP here
```
## Bug #1: race in `WaitFreeListeners::call()`
`juce_WaitFreeListeners.h` line 120-129:
```cpp
for (auto& entry : callerCopy)
{
const auto entryAsInt = entry->fetch_or (1);
auto* const entryAsPtr = reinterpret_cast<Listener*> (entryAsInt & ~(uintptr_t) 1);
if (entryAsPtr != nullptr)
callback (*entryAsPtr);
*entry = entryAsInt; // ← non-CAS restore races with a concurrent call()
}
```
Two threads A and B running `call()` on the same listener list, same entry:
| step | thread A | thread B | entry |
|------|--------------------------------|--------------------------------|--------------|
| 1 | `fetch_or(1)` → returns `ptr` | | `ptr \| 1` |
| 2 | | `fetch_or(1)` → returns `ptr\|1` | `ptr \| 1` |
| 3 | callback runs | callback runs | `ptr \| 1` |
| 4 | `*entry = ptr` | | `ptr` |
| 5 | | `*entry = ptr \| 1` | `ptr \| 1` ⚠ |
After step 5 the in-use bit is set with no `call()` actually in flight. Any
subsequent `remove()` on that entry will spin forever inside this CAS loop
(lines 99-100), because `expected` is computed once at the top of the
function with the bit cleared, and `tmp` is never 0:
```cpp
const auto expected = entry.load() & ~(uintptr_t) 1;
auto tmp = expected;
while (! entry.compare_exchange_weak (tmp, 0) && tmp != 0)
tmp = expected;
```
### Where concurrent `call()` actually happens
`juce_Midi_windows.cpp` line 2725, inside `midiInCallback`:
```cpp
static void CALLBACK midiInCallback (HMIDIIN, UINT uMsg, ...)
{
auto* collector = reinterpret_cast<InputDevice*> (dwInstance);
allInputs().call ([&] (auto& l)
{
if (collector != &l)
return;
// ...
});
}
```
`allInputs()` is a static `WaitFreeListeners<InputDevice>` — one global
listener list shared across **all** opened MIDI input devices. Each opened
`HMIDIIN` has its own Windows-managed callback thread. When multiple devices
are open (including a transiently-still-alive one during hot-plug), Windows
can deliver `midiInCallback` from different threads simultaneously, all
calling `allInputs().call(...)`. That is the contract violation that triggers
bug #1.
### Suggested fix
Replace the plain restore at the bottom of the loop with a CAS that only
clears the bit if it's still in the state we left it in:
```cpp
auto expected = entryAsInt | (uintptr_t) 1;
entry->compare_exchange_strong (expected, entryAsInt);
```
If another concurrent `call()` already restored the entry, our CAS fails
harmlessly and we don't clobber the value. This preserves the wait-free
contract for the audio thread (only `add`/`remove` are blocking) and is a
strict superset of the current behavior in the single-caller case.
Walkthrough of the race with the patch applied:
| step | thread A | thread B | entry |
|------|--------------------------------|--------------------------------|--------------|
| 1 | `fetch_or(1)` → returns `ptr` | | `ptr \| 1` |
| 2 | | `fetch_or(1)` → returns `ptr\|1` | `ptr \| 1` |
| 3 | callback runs | callback runs | `ptr \| 2` |
| 4 | CAS(`ptr\|1` → `ptr`) succeeds | | `ptr` |
| 5 | | CAS(`ptr\|1` → `ptr\|1`) fails | `ptr` ✓ |
I've verified the four interleavings (A-B, B-A, A-CAS-then-C, etc.) and the
interactions with `remove()` and "entry already 0" paths.
## Bug #2: `removeConsumer` calls `add` instead of `remove`
`juce_Midi_windows.cpp` lines 2581-2589:
```cpp
void addConsumer (ump::Consumer& c)
{
consumers.add (c);
}
void removeConsumer (ump::Consumer& c)
{
consumers.add (c); // ← should be remove(c)
}
```
The WinRT variant of the same class (line 127 in the same file) has the
correct `consumers.remove(c)`. Only the legacy winmm path is wrong.
Independent of bug #1, this is a leak (consumers accumulate in the per-device
listener list and are never removed) and a use-after-free (the
`InputImplNative` containing the `Consumer` subobject is destroyed without
removing itself from the consumer list, so subsequent `midiInCallback`
invocations on the same device will dispatch to a dangling pointer).
In our hang repro, bug #2 isn't directly on the spinning stack — but it
contributes to keeping zombie consumers attached, which makes the race in
bug #1 easier to hit.