Deadlock/hang in WebInputStream destructor

We have recently been hitting a bug where our web requests, which open WebInputStreams, hang indefinitely, well beyond the defined idle timeout. The bug is very hard to reproduce, so we were working from theories until we recently managed to capture stack traces. Specifically, we hit a jassertfalse in the ThreadPool destructor, which complains that the request thread had to be force-killed because it was not responding to the kill notification. We audited our code for spots where we weren’t calling threadShouldExit, but nothing jumped out.

Once we finally got the stack trace, it became clear that our request was hanging because the WebInputStream it creates was blocked in its destructor. I’ve posted the relevant stack trace below. I’m going through the JUCE source now to see if I can spot the exact cause, and I’ll post here if I find anything.

I’ve partially redacted the stack trace for my team’s privacy; I’d be glad to share the full contents of my findings with the JUCE team privately. Please let me know if there’s any other information I can provide.

  thread #23, name = 'Pool'
    frame #0: 0x000000018ada53cc libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000018ade40e0 libsystem_pthread.dylib`_pthread_cond_wait + 984
    frame #2: 0x00000001083df420 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x000000010401ad94 `void std::__1::condition_variable::wait<juce::SharedSession::removeTask(NSURLSessionTask*)::'lambda'()>(this=0x0000600000caf060, __lk=0x000000016dd6a540, __pred=(unnamed class) @ 0x000000016dd6a4e0) at condition_variable.h:147:5
    frame #4: 0x000000010401acec `juce::SharedSession::removeTask(this=0x0000600000caf020, task=0x000000013670c860) at juce_Network_mac.mm:162:17
    frame #5: 0x000000010401ac20 `juce::TaskToken::~TaskToken(this=0x000000013670c538) at juce_Network_mac.mm:374:22
    frame #6: 0x000000010401aab8 `juce::TaskToken::~TaskToken(this=0x000000013670c538) at juce_Network_mac.mm:370:5
    frame #7: 0x000000010401aa0c `juce::URLConnectionState::~URLConnectionState(this=0x000000013670c470) at juce_Network_mac.mm:400:5
    frame #8: 0x0000000104014428 `juce::URLConnectionState::~URLConnectionState(this=0x000000013670c470) at juce_Network_mac.mm:398:5
    frame #9: 0x0000000104013b48 `std::__1::__optional_destruct_base<juce::URLConnectionState, false>::reset[abi:ne200100](this=0x000000013670c470) at optional:318:15
    frame #10: 0x0000000104094c54 `juce::WebInputStream::Pimpl::~Pimpl(this=0x000000013670c3f0) at juce_Network_mac.mm:845:20
    frame #11: 0x0000000104094c24 `juce::WebInputStream::Pimpl::~Pimpl(this=0x000000013670c3f0) at juce_Network_mac.mm:844:5
    frame #12: 0x0000000104094bf0 `std::__1::default_delete<juce::WebInputStream::Pimpl>::operator()[abi:ne200100](this=0x000000016dd6aa38, __ptr=0x000000013670c3f0) const at unique_ptr.h:78:5
    frame #13: 0x0000000104094bb4 `std::__1::unique_ptr<juce::WebInputStream::Pimpl, std::__1::default_delete<juce::WebInputStream::Pimpl>>::reset[abi:ne200100](this=0x000000016dd6aa38, __p=0x0000000000000000) at unique_ptr.h:300:7
    frame #14: 0x0000000104094b60 `std::__1::unique_ptr<juce::WebInputStream::Pimpl, std::__1::default_delete<juce::WebInputStream::Pimpl>>::~unique_ptr[abi:ne200100](this=0x000000016dd6aa38) at unique_ptr.h:269:71
    frame #15: 0x0000000103f7492c `std::__1::unique_ptr<juce::WebInputStream::Pimpl, std::__1::default_delete<juce::WebInputStream::Pimpl>>::~unique_ptr[abi:ne200100](this=0x000000016dd6aa38) at unique_ptr.h:269:69
    frame #16: 0x0000000103f748cc `juce::WebInputStream::~WebInputStream(this=0x000000016dd6aa28) at juce_WebInputStream.cpp:45:1
    frame #17: 0x0000000103f74958 `juce::WebInputStream::~WebInputStream(this=0x000000016dd6aa28) at juce_WebInputStream.cpp:44:1

// ... the trace continues into our own code, in the processRequest function where the WebInputStream is used.
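
To make frames #2 to #5 concrete, here is a minimal sketch of the pattern the trace suggests: a destructor path that waits on a condition variable until a task has been removed from a shared session. This is not JUCE's code. SessionSketch, addTask, taskFinished and waitForTaskRemoval are hypothetical names, and the timeout is something I added so the sketch can't hang; the trace itself shows a plain wait with a predicate, which would block forever if the predicate never becomes true.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>

// Hypothetical stand-in for a session that tracks in-flight tasks.
// It only illustrates the wait-until-removed pattern, not JUCE's internals.
class SessionSketch
{
public:
    void addTask (int id)
    {
        std::lock_guard<std::mutex> lock (mutex);
        tasks.insert (id);
    }

    // Would be called from a completion callback once the task has finished.
    void taskFinished (int id)
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            tasks.erase (id);
        }

        condition.notify_all();
    }

    // Waits until the task is no longer tracked, or until the timeout elapses.
    // Returns true if the task was removed in time, false otherwise.
    // With an unbounded wait instead of wait_for, a missing taskFinished call
    // would leave the caller (for example a destructor) blocked indefinitely.
    bool waitForTaskRemoval (int id, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock (mutex);
        return condition.wait_for (lock, timeout, [&] { return tasks.count (id) == 0; });
    }

private:
    std::mutex mutex;
    std::condition_variable condition;
    std::set<int> tasks;
};
```

If taskFinished is never called for a given id, waitForTaskRemoval returns false after the timeout, whereas an unbounded wait in the same position would never return. That would match what we see: the pool thread never gets back to a point where it could check the exit flag.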

Are you using an up-to-date version of JUCE’s develop branch? There was a fix for an issue that sounds similar at the end of June this year:

That’s good to hear! This definitely seems like the issue we’ve run into. We’re on 8.0.3. It looks like this change didn’t make it into 8.0.8, so I’ll try to post here when we bump to 8.0.9. Thank you!