Multithreaded plugin crackles host

hi there.
i have (another) little problem.
my plugin renders its “DSP” internally with two threads (JUCE, C++, Windows).
basically, i notify two threads when entering the plugin's process() callback,
which then render along a semi-parallel rendering queue:

//main rendering thread of the host enters my plugin:
   //...some stuff happens before i enter parallel processing
   foreach(parallelStages as stage)
      foreach(threads as thread)
         thread.notify() //tells the thread to begin its work.
      //the main host thread waits here for the threads to finish the stage.
   //...some more serial stuff afterwards
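
for reference, the scheme above could be sketched in C++ roughly like this. it is only a minimal sketch under assumptions: StageRunner and runStage are hypothetical names (not JUCE API), and std::mutex / std::condition_variable stand in for whatever signalling the plugin really uses. the caller plays the role of the host's rendering thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of JUCE): runs one job per worker for each
// stage and blocks the caller until every worker has finished that stage.
class StageRunner
{
public:
    explicit StageRunner (int numWorkers)
    {
        assert (numWorkers > 0);
        for (int i = 0; i < numWorkers; ++i)
            workers.emplace_back ([this, i] { workerLoop (i); });
    }

    ~StageRunner()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            quit = true;
        }
        startCv.notify_all();
        for (auto& w : workers)
            w.join();
    }

    // Called once per parallel stage from the (host) rendering thread.
    void runStage (const std::function<void (int)>& job)
    {
        std::unique_lock<std::mutex> lock (mutex);
        currentJob = &job;
        pending = static_cast<int> (workers.size());
        ++generation;                  // marks the start of a new stage
        startCv.notify_all();          // "thread.notify()" for every worker
        doneCv.wait (lock, [this] { return pending == 0; });
        currentJob = nullptr;
    }

private:
    void workerLoop (int index)
    {
        unsigned seen = 0;             // last stage this worker has processed
        for (;;)
        {
            const std::function<void (int)>* job = nullptr;
            {
                std::unique_lock<std::mutex> lock (mutex);
                startCv.wait (lock, [&] { return quit || generation != seen; });
                if (quit)
                    return;
                seen = generation;
                job = currentJob;
            }
            (*job) (index);            // the stage's work, done without the lock
            {
                std::lock_guard<std::mutex> lock (mutex);
                if (--pending == 0)
                    doneCv.notify_one();   // last worker wakes the caller
            }
        }
    }

    std::mutex mutex;
    std::condition_variable startCv, doneCv;
    std::vector<std::thread> workers;
    const std::function<void (int)>* currentJob = nullptr;
    unsigned generation = 0;
    int pending = 0;
    bool quit = false;
};
```

note that the caller blocks on a mutex and a condition variable here, so every stage costs at least one wake-up of each worker by the OS scheduler, no matter how low the overall cpu load is.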

although the cpu load is very low (3.5%) in Reaper and elsewhere,
the host/driver produces crackles in the output when playing the plugin live.
the crackles are generated outside the plugin. i tested it with Voxengo Recorder: the signal leaves my plugin clean and intact.
so it seems my plugin blocks the host or the driver, or some other system component. i have no clue where and what to look for. i tested thread priorities from highest to time critical. in theory everything should just work fine. i have just one very short lock in the rendering path, just 5 simple type assignments or so.
the rendering path of the threads seems to work just fine.
when i turn off multithreading the path is exactly the same, and the crackling is gone. crackling occurs with one internal thread too. as soon as i use my own thread instead of the host's main rendering thread i get the crackles.

any ideas on what to look for? any tips?
seems to be neither a race condition nor a deadlock (obviously).



i also asked here:

Which version of JUCE do you use?


i think a timeout of 500 ms for a realtime thread is too much.

your threads are waiting, then your main processing loop signals those threads and waits on 2
semaphores, so the threads can wake up and start processing. when they are finished they signal those 2 semaphores
and your main processing loop can finalize. but it should not wait up to 500 ms for them to complete. i think
the timeout should be much lower here. try to measure how long your threads take to process their block.
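
a rough sketch of what i mean, in plain C++ and under assumptions: blockDuration, measure and DoneSignal are made-up names (not JUCE classes), and a condition variable stands in for your semaphores. the idea is to derive the timeout from the length of one audio block instead of a fixed 500 ms, and to time the workers so you know how much of that budget they really use.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <utility>

// Length of one audio block: the hard upper bound for all processing.
inline std::chrono::microseconds blockDuration (int numSamples, double sampleRate)
{
    assert (numSamples > 0 && sampleRate > 0.0);
    return std::chrono::microseconds (
        static_cast<long long> (1.0e6 * numSamples / sampleRate));
}

// Times a piece of work with a monotonic clock.
template <typename Fn>
std::chrono::microseconds measure (Fn&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn> (work)();
    return std::chrono::duration_cast<std::chrono::microseconds> (
        std::chrono::steady_clock::now() - start);
}

// Hypothetical stand-in for one of the semaphores: a worker calls signal()
// when it is done, the processing loop calls waitFor() with a short timeout.
class DoneSignal
{
public:
    void signal()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            done = true;
        }
        cv.notify_one();
    }

    // Returns true if the worker finished in time, false on timeout.
    bool waitFor (std::chrono::microseconds timeout)
    {
        std::unique_lock<std::mutex> lock (mutex);
        const bool finished = cv.wait_for (lock, timeout, [this] { return done; });
        if (finished)
            done = false;              // re-arm for the next block
        return finished;
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    bool done = false;
};
```

with 480 samples at 48 kHz the whole block lasts only 10 ms, so a wait of anything close to 500 ms can never be met without a dropout. if waitFor() returns false you at least know which block overran, and the measured times tell you whether the work itself or the wake-up of the threads is eating the budget.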