Bug: Unresponsive wait in ReadWriteLock

I found this bug by accident while learning some formal methods for verifying concurrent programs, when I decided to model JUCE’s ReadWriteLock as practice. It is not hard to find cases where it causes delays of several hundred milliseconds in typical producer-consumer setups.

A design mistake causes threads calling ReadWriteLock::enterRead() / ReadWriteLock::enterWrite() to stall when there is a running writer thread, a waiting writer thread and a waiting reader thread. When the running writer finishes, it wakes up either the reader thread or the other writer thread. If the writer thread wakes up, everything runs fine. However, if the reader thread wakes up, it cannot take the lock while a writer is still waiting, so it goes back to waiting and the program stalls. Both waiting threads then sleep until the 100 ms timeout expires (or until the next exitRead() / exitWrite() happens).
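To make the scenario concrete, here is a minimal single-threaded sketch that replays the two possible wake-ups. It is not JUCE’s code: `LockModel`, `Woken` and `someoneProgressesAfterExit` are hypothetical names, and the model assumes a writer-preferring lock where readers back off while any writer is running or waiting, and where releasing the lock delivers exactly one wake-up.

```cpp
// Hypothetical, simplified model of a writer-preferring read/write lock.
// Only the counters matter here; real threads and events are left out.
struct LockModel
{
    int numReaders = 0;
    int numWriters = 0;
    int numWaitingWriters = 0;

    // Assumption: readers yield to running AND waiting writers.
    bool tryEnterRead()
    {
        if (numWriters + numWaitingWriters != 0)
            return false;

        ++numReaders;
        return true;
    }

    // Writers need the lock to be completely free.
    bool tryEnterWrite()
    {
        if (numReaders + numWriters != 0)
            return false;

        ++numWriters;
        return true;
    }

    void exitWrite() { --numWriters; }
};

enum class Woken { reader, writer };

// Replays the scenario: writer A holds the lock while writer B and
// reader R wait. A exits and its single wake-up goes to `woken`.
// Returns true if some thread gets the lock without waiting for a timeout.
bool someoneProgressesAfterExit (Woken woken)
{
    LockModel lock;
    lock.numWriters = 1;         // writer A is running
    lock.numWaitingWriters = 1;  // writer B has registered as waiting

    lock.exitWrite();            // A releases; one wake-up is delivered

    if (woken == Woken::writer)
    {
        --lock.numWaitingWriters;     // B stops waiting and retries
        return lock.tryEnterWrite();  // succeeds: the lock is free
    }

    // R retries, but B is still counted as waiting, so R is refused.
    // The wake-up is now used up and B is still asleep.
    return lock.tryEnterRead();
}
```

Calling `someoneProgressesAfterExit (Woken::writer)` returns true, while `someoneProgressesAfterExit (Woken::reader)` returns false: in the second case the lock is free but neither waiting thread is runnable until something else wakes them.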

Although I only tested it on Linux (JUCE 5.4.5), in theory this bug affects all platforms and all JUCE versions. It is pretty scary that it has remained hidden in the source code for so many years.

I wrote a detailed analysis here:

Also included:

  • C++ source code to reproduce the bug (and bug fix).
  • Two different solutions to fix the bug.
  • A Promela port of ReadWriteLock to show what’s going wrong.

Here is a fork of JUCE with the bug fix applied (based on the second solution):

Thanks for the detailed report! This should now be fixed in 2916812.