Hey,
I’ve built a VST3 plugin that integrates Stable Audio Open for real-time AI music generation and would love some feedback on the architecture.
What it does:
- User inputs text prompts
- LLM generates optimized audio generation parameters
- Stable Audio Open generates audio (~10s latency)
- VST handles playback, MIDI triggering (C3-B3), and tempo sync
- 8-track sampler with page switching (A/B/C/D per track); rough layout sketched below
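To make the sampler layout concrete, here is a minimal sketch of how I think about the 8 tracks with four pages each. The names are illustrative only, not taken from the actual plugin code:

```cpp
#include <array>
#include <juce_audio_basics/juce_audio_basics.h>

// Illustrative layout only: 8 tracks, each with four sample pages (A-D)
// and one active page selected for playback.
struct TrackPages
{
    static constexpr int numPages = 4;                       // pages A, B, C, D
    std::array<juce::AudioBuffer<float>, numPages> pages;    // one generated loop per page
    int activePage = 0;                                      // 0 = A ... 3 = D
};

using SamplerState = std::array<TrackPages, 8>;              // the 8-track sampler
```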
Technical stack:
- Frontend: JUCE framework (VST3)
- Backend: Python server (FastAPI) handling AI inference
- Audio processing: Real-time time-stretching to match host tempo
- Communication: REST API between VST and inference server (request sketch below)
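On the REST side, the VST-to-server call is essentially a JSON POST made from a background thread. A minimal JUCE-side sketch, assuming a local endpoint at `/generate` and placeholder field names (not the plugin's actual API):

```cpp
#include <juce_core/juce_core.h>

// Minimal sketch of the plugin-side request. Endpoint path and JSON fields
// are placeholders; the real server contract lives in the FastAPI backend.
juce::var postGenerationRequest (const juce::String& prompt, double hostBpm)
{
    auto* obj = new juce::DynamicObject();
    obj->setProperty ("prompt", prompt);
    obj->setProperty ("bpm", hostBpm);

    auto url = juce::URL ("http://127.0.0.1:8000/generate")
                   .withPOSTData (juce::JSON::toString (juce::var (obj)));

    auto options = juce::URL::InputStreamOptions (juce::URL::ParameterHandling::inPostData)
                       .withExtraHeaders ("Content-Type: application/json\r\n")
                       .withConnectionTimeoutMs (15000);

    if (auto stream = url.createInputStream (options))
        return juce::JSON::parse (stream->readEntireStreamAsString());

    return {};   // request failed; caller decides how to report it
}
```

This call blocks, so it only ever runs on a background thread (see the threading sketch further down), never on the audio or message thread.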
Architecture challenges I’m tackling:
- Latency management: Generation takes ~10s; how should I handle this UX-wise?
- Audio buffer handling: Managing generated samples in real-time playback
- Tempo sync: Stretching AI-generated audio to match host BPM without artifacts
- MIDI integration: Mapping C3-B3 to trigger samples reliably (mapping sketch below)
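For the MIDI mapping itself, each note is just an offset from the first note of the range. A sketch, assuming C3 = MIDI note 48 (hosts differ on octave naming) and a placeholder callback standing in for whatever actually starts playback of a slot:

```cpp
#include <functional>
#include <juce_audio_basics/juce_audio_basics.h>

// Map incoming note-ons in C3..B3 to the 12 sample slots.
// Assumes C3 = MIDI note 48; some hosts label note 60 as C3 instead.
void handleMidi (const juce::MidiBuffer& midi,
                 const std::function<void (int)>& triggerSlot)   // placeholder trigger callback
{
    constexpr int firstNote = 48;              // C3
    constexpr int lastNote  = firstNote + 11;  // B3

    for (const auto metadata : midi)
    {
        const auto message = metadata.getMessage();

        if (message.isNoteOn())
        {
            const int note = message.getNoteNumber();

            if (note >= firstNote && note <= lastNote)
                triggerSlot (note - firstNote);   // slot 0..11 on the active page
        }
    }
}
```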
Current approach:
- Asynchronous generation queue
- Background threads for API calls (thread hand-off sketched below)
- Local caching of generated samples
- Simple time-domain stretching, looking into a phase vocoder (naive resampling sketched below)
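On the async side, the pattern is: run the blocking HTTP call on a background thread, then hop back to the message thread to install the result. A minimal sketch of that hand-off, with `requestGeneration()` and `onSampleReady()` as placeholders for the real REST call and the code that loads the returned sample into a slot:

```cpp
#include <functional>
#include <juce_core/juce_core.h>
#include <juce_events/juce_events.h>

// Sketch of the background-request pattern; the two std::function parameters
// stand in for the real REST call and the sample-loading code.
void startGenerationJob (juce::String prompt,
                         std::function<juce::File (juce::String)> requestGeneration,
                         std::function<void (juce::File)> onSampleReady)
{
    juce::Thread::launch ([prompt, requestGeneration, onSampleReady]
    {
        auto sampleFile = requestGeneration (prompt);   // blocking HTTP call, off the message thread

        juce::MessageManager::callAsync ([sampleFile, onSampleReady]
        {
            onSampleReady (sampleFile);                 // back on the message thread
        });
    });
}
```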
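And to show what I mean by "simple time-domain stretching": the naive version just resamples the loop so it lands on the host tempo, which is why pitch shifts with it and why I'm looking into a phase vocoder. A sketch of that naive version, only to make the trade-off concrete:

```cpp
#include <juce_audio_basics/juce_audio_basics.h>

// Naive "stretch" by linear-interpolation resampling: the loop comes out the
// right length for the host tempo, but pitch moves with it.
juce::AudioBuffer<float> resampleToHostTempo (const juce::AudioBuffer<float>& source,
                                              double sourceBpm, double hostBpm)
{
    const double rate = hostBpm / sourceBpm;                     // read-speed ratio (> 1 = shorter loop)
    const int numIn   = source.getNumSamples();
    const int numOut  = (int) (numIn / rate);

    juce::AudioBuffer<float> out (source.getNumChannels(), numOut);

    for (int ch = 0; ch < source.getNumChannels(); ++ch)
    {
        const float* in = source.getReadPointer (ch);
        float* dst      = out.getWritePointer (ch);

        for (int i = 0; i < numOut; ++i)
        {
            const double pos  = i * rate;                        // fractional read position
            const int idx     = (int) pos;
            const double frac = pos - idx;
            const float a     = in[idx];
            const float b     = in[juce::jmin (idx + 1, numIn - 1)];

            dst[i] = (float) (a + frac * (b - a));               // linear interpolation
        }
    }

    return out;
}
```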
Questions for the community:
- Anyone working on similar plugin-server architectures?
- Best practices for handling long-running async operations in JUCE?
- Recommendations for high-quality time-stretching libraries?
GitHub: innermost47/ai-dj ("The sampler that dreams. AI-powered VST3 for real-time music generation. Generate tempo-synced loops, trigger via MIDI, sculpt the unexpected. 8-track sampler meets infinite sound engine. No pre-made tracks—just raw material you control. Play with AI. Stay human.")
License: AGPL v3.0 (open source)
All code is public. Happy to discuss implementation details or architecture decisions.
Thanks!
