CUDA phase vocoder

You are probably right, Xenakios. It’s a very complicated algorithm with lots of interleaved parts that are difficult to separate, that’s for sure.
Taking this one as an example …
http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/

Inside this loop, for example:
/* main processing loop */ for (i = 0; i < numSampsToProcess; i++){

Is it possible to divide the samples up into blocks and send them out to separate processes?
In theory they could be reassembled in place in the master process.
Perhaps this would allow larger or smaller frames or buffer sizes to be processed and improve performance.
Easy to say, but it probably doesn’t help in practice and is a bugger to figure out …
Someone smarter than me might be able to parallelize some of those calculations without creating static.
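To make the question a bit more concrete, here is a rough sketch of the "split into blocks, process, reassemble in the master" idea. It is not the code from the link and it does no pitch shifting: the per-frame work is just a Hann window, and all the names (hannWindow, processFrame, splitProcessReassemble) are made up for illustration. It assumes the hop size divides the frame size and that there are at least two overlapping frames per sample. Each call to processFrame touches only its own slice of the input, so that loop is the part that could in principle be farmed out to threads or a CUDA kernel. As far as I can tell, the real phase vocoder is harder because it carries phase state from one frame to the next, so that part would not split this cleanly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: periodic Hann window of length n.
std::vector<float> hannWindow(std::size_t n) {
    const double pi = 3.14159265358979323846;
    std::vector<float> w(n);
    for (std::size_t k = 0; k < n; ++k)
        w[k] = static_cast<float>(0.5 - 0.5 * std::cos(2.0 * pi * k / n));
    return w;
}

// Work for ONE frame. It reads only its own slice of the input and
// shares no state with other frames, so frames could run in parallel.
// A real pitch shifter would do FFT -> modify bins -> inverse FFT here.
std::vector<float> processFrame(const std::vector<float>& in,
                                std::size_t start,
                                const std::vector<float>& win) {
    std::vector<float> out(win.size());
    for (std::size_t k = 0; k < win.size(); ++k)
        out[k] = in[start + k] * win[k];
    return out;
}

// "Master" side: split into overlapping frames, process, overlap-add.
std::vector<float> splitProcessReassemble(const std::vector<float>& in,
                                          std::size_t frameSize,
                                          std::size_t hop) {
    assert(hop > 0 && frameSize % hop == 0 && frameSize / hop >= 2);
    assert(in.size() >= frameSize);
    const std::vector<float> win = hannWindow(frameSize);
    const std::size_t numFrames = (in.size() - frameSize) / hop + 1;

    // Stage 1: independent per-frame work (the parallelizable part).
    std::vector<std::vector<float>> frames(numFrames);
    for (std::size_t f = 0; f < numFrames; ++f)
        frames[f] = processFrame(in, f * hop, win);

    // Stage 2: sequential overlap-add back into one output buffer.
    // Hann windows spaced hop apart sum to frameSize / (2 * hop),
    // so this gain brings the fully overlapped region back to unity.
    const float gain = 2.0f * static_cast<float>(hop) / static_cast<float>(frameSize);
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t f = 0; f < numFrames; ++f)
        for (std::size_t k = 0; k < frameSize; ++k)
            out[f * hop + k] += frames[f][k] * gain;
    return out;
}
```

With a frame size of 16 and a hop of 4, the output should match the input wherever four frames overlap. The first and last frameSize - hop samples fade in and out because fewer frames cover them.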
That would certainly be on my wish list: a simple JUCE pitch shift or pitch detection with FFT classes I could use …
Sean