JUCE window stalls for 10 sec when the Internet connection is on

opengl
windows

#1

Hello guys!

Windows 7, 32GB RAM, 4 GPUs, 3GHz 4 core CPU.
I have written a simple realtime window with an OpenGL component that shows interactive rendering of a 3D scene, plus a few buttons, controls and a status bar. I love the flexibility of JUCE, how everything can be nicely tuned. There were a few problems with OpenGL detach (he-he) but they were solved. My OpenGL window shows a picture of interactive rendering which is computed by a complex CUDA-based application (not OpenGL primitives). OpenGL is responsible only for displaying the image. The image itself is computed in a parallel thread, and the JUCE OpenGL thread just takes it ASAP inside TryEnterCriticalSection/LeaveCriticalSection and displays it at 20FPS.
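For readers following along, here is a minimal sketch of the non-blocking handoff described above. It uses standard C++ (`std::unique_lock` with `std::try_to_lock`) in place of the Win32 TryEnterCriticalSection/LeaveCriticalSection pair, and the class name `FrameMailbox` and its methods are hypothetical, not JUCE API. The render side never waits: if the producer currently holds the lock, or no new frame has arrived, it simply keeps drawing the texture it already has.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical single-slot "mailbox" between a compute thread and a render thread.
// Only the newest finished frame is kept; an unconsumed older frame is overwritten.
class FrameMailbox
{
public:
    // Compute thread: hand over a finished frame. The lock is held only for a few moves.
    void publish (std::vector<std::uint8_t>&& pixels, int width, int height)
    {
        assert (width > 0 && height > 0);
        std::lock_guard<std::mutex> lock (mutex);
        pending = std::move (pixels);
        pendingWidth = width;
        pendingHeight = height;
        hasNewFrame = true;
    }

    // Render thread: never blocks. Returns false if the lock is busy or nothing new
    // has arrived, in which case the caller just redraws its existing texture.
    bool tryTake (std::vector<std::uint8_t>& out, int& width, int& height)
    {
        std::unique_lock<std::mutex> lock (mutex, std::try_to_lock);
        if (! lock.owns_lock() || ! hasNewFrame)
            return false;

        // O(1) swap; the caller's previous buffer stays in the slot until the next publish().
        out.swap (pending);
        width = pendingWidth;
        height = pendingHeight;
        hasNewFrame = false;
        return true;
    }

private:
    std::mutex mutex;
    std::vector<std::uint8_t> pending;
    int pendingWidth = 0, pendingHeight = 0;
    bool hasNewFrame = false;
};
```

In a render callback one would call `tryTake()` once per frame and only upload pixels when it returns true.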
The owner of this JUCE window is the main window of the host 3D application, Cinema 4D (a 3D scene editor). This lets me implement the AlwaysOnTop feature relative to the owner window only (not to other windows). This is Windows-only.

Everything runs very cool and fast. Almost always. But the JUCE window, together with the host Cinema 4D app, stalls for around 10 seconds if I turn on the Internet connection and message alerts pop up, such as an email notification, a Skype notification that someone is online, or an antivirus notification. It happens once or twice during a work session in the host app while my JUCE window is running. It doesn’t crash, it just stalls. And it is not a background thread that stalls but the MAIN GUI thread. This forces the user to wait for the main host app (Cinema 4D) to respond, which is very bad. After the stall it continues running well and may stall once more about a minute later. It’s crazy strange! I can’t believe it. AND when there is no Internet connection, everything works fine without ever stalling. The stall happens somewhere inside JUCE code (maybe the message loop or whatever). I have checked all my routines; they take less than 1ms. I still haven’t found what stalls in JUCE and am continuing to dig through the code. I would be glad if someone could help me.
Btw, before JUCE we used wxWidgets, which didn’t have this stall problem (though it had other weird problems, like inflexibility and very slow initial window opening).

I’m using the Personal license for testing. I want to buy Pro if we can fix this issue, because everything else is so cool. But I need help to solve it.

Best regards, Kirill


#2

…and the stalls described above happen after the default JUCE splash screen has disappeared.


#3

Could you run it in the debugger and hit pause when it’s stalled? This might show what it’s waiting for.


#4

It seems I can’t debug it that way, since we build the JUCE window as a static lib, link it into the Cinema 4D render plugin, and then run the Cinema app as usual.

However, I have injected a lot of time evaluations and printf’s that fire if something takes very long to compute. I have also commented out, in the renderFrame() function in juce_OpenGLContext.cpp, the Locker, doWorkWhileWaitingForLock and the MessageManager lock. That said, everything in my renderOpenGL callback is now super thread-safe: only TryEnterCriticalSection and a few atomics are used for data sync. I also reimplemented the freeing of resources and tasks so that OpenGL->detach doesn’t wait for the critical section to finish (it has already finished before detach). Now it DOESN’T stall at every single popup from Internet and Facebook messengers. Open/close is fine too. I also had to avoid overlaying components on top of the OpenGL window.
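The "time evaluations and printf’s" approach can be kept tidy with a small RAII timer. This is a sketch assuming only standard C++; `ScopedSlowCallLogger` is a hypothetical name, not a JUCE class. It measures with `std::chrono::steady_clock` (monotonic, so wall-clock adjustments don’t distort it) and prints only when the enclosing scope exceeds a threshold, so it can stay in place permanently without flooding the log.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical RAII helper (not part of JUCE): measures how long the enclosing
// scope takes and prints a line to stderr only if it exceeds the threshold.
class ScopedSlowCallLogger
{
public:
    using Clock = std::chrono::steady_clock;  // monotonic clock

    ScopedSlowCallLogger (const char* label, std::chrono::milliseconds threshold)
        : name (label), limit (threshold), start (Clock::now())
    {
        assert (threshold.count() >= 0);
    }

    ~ScopedSlowCallLogger()
    {
        const auto took = elapsed();
        if (took > limit)
            std::fprintf (stderr, "[slow] %s took %lld ms\n", name,
                          static_cast<long long> (took.count()));
    }

    // Time spent since construction, truncated to whole milliseconds.
    std::chrono::milliseconds elapsed() const
    {
        return std::chrono::duration_cast<std::chrono::milliseconds> (Clock::now() - start);
    }

    ScopedSlowCallLogger (const ScopedSlowCallLogger&) = delete;
    ScopedSlowCallLogger& operator= (const ScopedSlowCallLogger&) = delete;

private:
    const char* name;
    std::chrono::milliseconds limit;
    Clock::time_point start;
};
```

One might wrap each suspect call (for example glTexSubImage2D or SwapBuffers) in its own small scope containing one of these loggers, so that the log names exactly which call was slow.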

BUT! It still stalls sometimes (quite rarely) for a few seconds when a user additionally loads a web browser with YouTube, FB or other heavy sites. The specific functions that STALL are SwapBuffers (from wgl***) and glTexImage2D (when I upload my computed pixels to the GPU inside the critical section). If these functions block a background thread for several seconds, the main GUI thread is also blocked. Why? Should it be? Is there some internal synchronization inside wgl related to GPU resource swapping? Or maybe some JUCE thread waits for runJob(){renderFrame() {renderOpenGL()}}? Please advise.


#5

If the GL swap functions are blocking while another process connects to the openGL system then it’s almost certainly your GL driver that’s the culprit. There’s really nothing we can do if a basic OS call randomly takes 10 seconds to return.


#6

Hi Jules! I would like to say thanks for your great work on JUCE GUI flexibility; it is what I was really looking for. That’s why I didn’t want to give up on the OpenGL issues.
Could you just tell me whether I have commented out anything that shouldn’t be commented out? See juce_OpenGLContext.cpp: I don’t really need multiple OpenGL windows, contexts or control overlays, and I adapted my callback functions as described in my previous message.


#7

When those lines in this code were uncommented, the stalls happened a lot more often and in random places (some not even detected). With them commented out, the situation became significantly better, so I can sleep almost calmly.


#8

Well if what’s happening (which I’m guessing only based on what you said) is that it stalls when another app connects to the GL driver, then presumably that’ll happen inside any GL call. So making fewer GL calls will reduce the chances of it happening a bit, but won’t actually fix the problem, which sounds like a driver issue.


#9

wxWidgets, with the same OpenGL calls, the same driver, the same computer and the same software ecosystem around it, didn’t have this issue at all. Could you please reply to my last questions? Mainly: which routine inside JUCE ever communicates with the thread that executes renderFrame?


#10

Well the only thread that interacts with this is the message thread. Is it just that something is blocking your message thread for that length of time?

It’d be much easier if you were to just debug it, pause it, and see clearly what’s going on! We don’t stand much chance of guessing what’s going wrong without actually seeing it!


#11

It’s not easier. In the current ecosystem it’s not possible to debug in the usual way. But if we run it standalone, separately from that ecosystem, the conditions are too easy for the results to be reliable.

Well the only thread that interacts with this is the message thread. Is it just that something is blocking your message thread for that length of time?

Please, can you hint at which place in the code? Does the message thread need anything more than component access? In my limited modification, no components are used over the OpenGL subwindow.


#12

The message thread does pretty much everything! I honestly don’t know what to suggest if you can’t debug it, or at least find a way to figure out exactly where it’s getting blocked.
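When attaching a debugger inside the host isn’t practical, one alternative is a watchdog thread that notices when the message thread stops making progress, so the stall can at least be logged with a timestamp and correlated with the other timings. This is a minimal sketch in standard C++, assuming the watched thread calls `beat()` periodically (for example from a `juce::Timer` callback on the message thread). `StallWatchdog` and its callback are hypothetical; what to do on a stall (log it, or trigger a process dump with an external tool) is left to the caller.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical watchdog (not JUCE API). The watched thread calls beat() regularly;
// a background thread wakes every 50 ms and, if no beat has arrived for longer
// than `limit`, calls `onStall` once per stall.
class StallWatchdog
{
public:
    using Clock = std::chrono::steady_clock;
    using StallCallback = std::function<void (Clock::duration)>;

    StallWatchdog (Clock::duration limit, StallCallback onStall)
        : stallLimit (limit), report (std::move (onStall))
    {
        assert (limit > Clock::duration::zero());
        beat();
        worker = std::thread ([this] { run(); });
    }

    ~StallWatchdog()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            stopping = true;
        }
        wake.notify_one();
        worker.join();
    }

    // Call from the watched thread, e.g. a timer callback on the message thread.
    void beat() { lastBeat.store (Clock::now().time_since_epoch().count()); }

private:
    void run()
    {
        bool reported = false;
        std::unique_lock<std::mutex> lock (mutex);

        // wait_for returns true only once `stopping` is set; otherwise it timed out.
        while (! wake.wait_for (lock, std::chrono::milliseconds (50), [this] { return stopping; }))
        {
            const auto since = Clock::now().time_since_epoch() - Clock::duration (lastBeat.load());

            if (since > stallLimit && ! reported)
            {
                report (since);    // runs on the watchdog thread, not the stalled one
                reported = true;
            }
            else if (since <= stallLimit)
            {
                reported = false;  // the watched thread is alive again; re-arm
            }
        }
    }

    Clock::duration stallLimit;
    StallCallback report;
    std::atomic<Clock::rep> lastBeat { 0 };
    std::mutex mutex;
    std::condition_variable wake;
    bool stopping = false;
    std::thread worker;
};
```

Because the callback runs on its own thread, it keeps working even while the message thread is blocked, which is exactly the situation a debugger pause would otherwise be needed for.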


#13

Let’s look at the renderFrame() function again:

This one really blocks the message thread, because we access components that are mostly processed by the main thread. But it isn’t needed in the OpenGL thread, because context.renderComponents is off.

Why do you need this? What does it synchronize?

Why do you enable/disable the context at the beginning/end of each loop? Does it have any effect on stability?

I have commented all of this out and now it works a lot more stably and smoothly.
I would be glad to get concrete replies, at least to this message.


#14

The NativeContext::Locker locks openGL, and is essential!

Honestly, I don’t understand why you’re suggesting commenting-out this stuff, this function has been developed over many years, and has to handle edge-cases. Removing or changing even the tiniest thing in there is almost certainly going to break something!


#15

This is what it does on Windows OS. Nothing.
As for the rest, it seems you’re suggesting I dig into it by myself. I had hoped you could help. But it seems like you’re on vacation, playing the “I don’t understand this or that” game.
You try to blame my drivers, but my PC is pretty high-end, and so are its drivers (the latest ones). And you don’t even pay attention to the fact that another framework with the same wgl*** usage for OpenGL on Windows works fine under the same conditions. I switched to JUCE to get a better interface but ran into problems with OpenGL. There are many reports here about not even being able to close/detach an OpenGL component without a hang. How did it end up like that? I strongly recommend sitting down calmly and polishing all these things again.


#16

My answers, which you’re being so critical of, are based on the fact that you said this happens when you launch other apps.

If you’re right about that, then the only thing that can cause two apps to influence each other to that extent MUST involve a driver, or other OS-level process over which we have no control. It doesn’t matter whether other apps have the same problem or not, because if it were to be e.g. a driver bug, then it’s naive to expect that all apps would behave the same way.

If you’re wrong about other processes being involved, then for us to help, you’d need to either

  • debug it and give us more clues about where it’s getting stuck
  • OR give us a decent piece of test code so that we can try to reproduce it

And maybe look at this thread:

Based on the content of this thread, no, there aren’t enough clues for me to understand what’s happening. I’m not psychic.

If you search far enough back, every juce bug ever found is in the forum somewhere. Doesn’t mean they’re still there.


#17

Jules, honestly, I didn’t want to offend you. I rate your work very highly! It’s a good example for me too, as I really like your design. I just wanted to encourage you to scan the parts you know best for potential bottlenecks in my case. E.g. maybe in my case we can remove some listeners or commands passed from the main thread to the OpenGL thread.

There is no problem when I test the JUCE window app in debug mode from VS. It’s excellent. But reality is the zoo of a complex ecosystem: 1) 3D scene editing in Cinema 4D, 2) parallel scene rendering on several GPUs, 3) parallel image processing and pixel buffer creation (also quite costly), 4) in parallel, renderOpenGL(), which just takes the pixels. Of all of these, the renderOpenGL() thread is the smallest from a compute point of view.

I have made hundreds of tests today, wrapping everything relevant with time evaluations, and it seems that most often glTexSubImage2D stalls for a few seconds (also stalling the main thread). The actual image is usually not large, ~4MB. Less frequently the stall happens in SwapBuffers(). It happens rarely and randomly, sometimes without any other app being launched, just inside our zoo. It’s not very annoying anymore, but we would like it to be ideally fluent. Btw, it was a lot more annoying, and happened a lot more often, when everything was uncommented in the renderFrame() code I listed above. Maybe you have some ideas.
The strange thing is that I call glTexSubImage2D inside renderOpenGL() to refresh the screen texture content quite rarely: once per 200ms initially, and when the user doesn’t change the scene for some time the refresh rate increases. The stall usually happens when the user does something with the mouse very actively. The stall in glTexSubImage2D() also blocks the main thread. That’s why I asked whether something internal listens to, waits for, or otherwise interacts with renderOpenGL(). I have instrumented the message processing loop with time evaluations and there was no bottleneck there.
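The refresh policy described above can be expressed as a small decision object that is testable without a GL context. This is a hypothetical sketch: it reads "the refresh rate increases" as "uploads become more frequent once the scene has been still for a while", and the class name, the idle interval and the settle time are assumptions, not values from the actual plugin. The render callback would ask `shouldUpload()` each frame and only call glTexSubImage2D when it returns true.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical upload throttle: while the scene is changing, allow at most one
// texture upload per `busy` interval; once the scene has been unchanged for
// `settle`, switch to the shorter `idle` interval.
class UploadThrottle
{
public:
    using Clock = std::chrono::steady_clock;
    using ms = std::chrono::milliseconds;

    UploadThrottle (ms busy, ms idle, ms settle)
        : busyInterval (busy), idleInterval (idle), settleTime (settle)
    {
        assert (busy > ms::zero() && idle > ms::zero());
    }

    // Call whenever the user edits the scene (camera move, object change, ...).
    void sceneChanged (Clock::time_point now) { lastChange = now; }

    // Call once per render callback; returns true if an upload should happen now.
    bool shouldUpload (Clock::time_point now)
    {
        const bool sceneIsIdle = now - lastChange >= settleTime;
        const ms interval = sceneIsIdle ? idleInterval : busyInterval;

        if (hasUploaded && now - lastUpload < interval)
            return false;

        lastUpload = now;
        hasUploaded = true;
        return true;
    }

private:
    ms busyInterval, idleInterval, settleTime;
    Clock::time_point lastChange {}, lastUpload {};
    bool hasUploaded = false;
};
```

Passing the current time in explicitly, rather than reading the clock inside, is what makes the policy easy to check with fixed timestamps.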

I need to benchmark things more and will get back to you. I’m thinking about PBOs, but that could be a problem, since they eat more memory and we support 10K image rendering for crazy people, and the image size may also change dynamically, leading to uncontrolled memory fragmentation. But I need to evaluate all this.
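One common way to contain the fragmentation worry when the image size changes dynamically is a grow-only staging buffer: the allocation is enlarged only when a frame larger than any previous one arrives, and smaller frames reuse it. This is a minimal CPU-side sketch with hypothetical names; the same policy could in principle be applied to a PBO by re-specifying its storage with glBufferData only when the required size grows, but that part is an assumption and is neither shown nor measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical grow-only staging buffer: one allocation that only ever grows,
// so frequent resolution changes don't repeatedly free and reallocate memory.
class GrowOnlyPixelBuffer
{
public:
    // Returns a pointer to at least width * height * bytesPerPixel bytes.
    std::uint8_t* prepare (int width, int height, int bytesPerPixel = 4)
    {
        assert (width >= 0 && height >= 0 && bytesPerPixel > 0);
        const std::size_t needed = static_cast<std::size_t> (width)
                                 * static_cast<std::size_t> (height)
                                 * static_cast<std::size_t> (bytesPerPixel);
        if (needed > storage.size())
        {
            storage.resize (needed);  // grow only; smaller frames reuse the storage
            ++growths;
        }
        usedBytes = needed;
        return storage.data();
    }

    std::size_t size() const { return usedBytes; }                  // bytes used by the current frame
    std::size_t allocatedBytes() const { return storage.size(); }   // bytes kept allocated
    int growthCount() const { return growths; }                     // how many times it had to grow

private:
    std::vector<std::uint8_t> storage;
    std::size_t usedBytes = 0;
    int growths = 0;
};
```

The trade-off is that peak memory stays allocated after a very large frame; an explicit "shrink" call at a quiet moment could release it if that matters for 10K renders.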


#18

…and that’s why encouraging the usage of a good bug tracking system would be appreciated, because in that you can clearly mark a bug as resolved.


#19

Why isn’t there a bug tracker for JUCE? What bug-tracker do you guys recommend or use for your own projects?