You’ll need a custom audio processor that collects incoming samples until it has a fixed-size block (usually a power of two like 1024 or 2048), applies a window to that block, performs the DFT (I suggest FFTW), holds the spectral data in a custom buffer class (or just use FFTW’s complex types), applies whatever spectral algorithm you want, then inverts the transform. After that you hop forward in time by some number of samples (usually 1/4 of the window size or less) and repeat, overlap-adding the output frames. You’ll probably also want to keep the previous frame’s spectral data around (e.g. for phase tracking).
I made a phase vocoder ages ago using exactly this method, so I know it’s possible, though my implementation back then was pretty rough.
You could trigger the samples with the SamplerSound class, but as Jules said, that’s not really a part of the processing stage. Whichever way the sound comes in, you’ll have to run it through a real-time processor like the one I described above.