OpenAI Whisper C++ and JUCE

There are very capable open source ML models out there that can be useful for audio projects, and the footprint is reasonably small for offline/standalone apps, even mobile. While searching for a C++ binding for OpenAI Whisper, I came across whisper.cpp.

Unless I'm missing something, it should be easy to make a JUCE module for speech recognition based on this. The tensor/ML C++ library under the hood, ggml, also looks very promising.

Together with a reasonably small open source LLM (also based on ggml), 100% voice-controlled audio apps come within reach of the JUCE community.

Has anyone tried this already?

What I’m particularly interested in is streaming: audio input is parsed continuously and a JUCE memory stream churns out recognized tokens. The audio chunk size needs to be several seconds, because words can’t be disambiguated in isolation. Keeping a context window of 10,000 tokens or so should do the trick.
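To make the idea concrete, here is a minimal sketch of what such a module's interface might look like. The class and method names (SpeechRecognizer, TokenListener, pushAudio) are purely illustrative, not an existing JUCE or whisper.cpp API:

```cpp
// Purely illustrative interface for a streaming recognizer module.
#include <juce_core/juce_core.h>

struct TokenListener
{
    virtual ~TokenListener() = default;

    // Called whenever the recognizer has produced another word/token.
    virtual void tokenRecognized (const juce::String& token,
                                  double startSeconds, double endSeconds) = 0;
};

struct SpeechRecognizer
{
    virtual ~SpeechRecognizer() = default;

    // Audio thread: append mono samples (Whisper expects 16 kHz input).
    virtual void pushAudio (const float* samples, int numSamples) = 0;

    virtual void addListener (TokenListener* listener) = 0;
};
```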

Whisper works on 30-second chunks of audio. To do this in a real-time context you’d end up with quite a bit of latency.

In "streaming mode", the technique that’s typically used is to have a sliding window of partially overlapping 30-second chunks, and then you filter out the words from the overlapping parts. See also: Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
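As a rough illustration of the chunking part (not the word filtering, which is the harder bit), here is a sketch that slices a mono recording into overlapping windows. The 30 s / 5 s figures and the function name are assumptions made for the example, not values mandated by Whisper:

```cpp
// Slice a long mono recording into 30 s windows that overlap by 5 s so that
// words at chunk boundaries are not cut in half.
#include <algorithm>
#include <cstddef>
#include <vector>

struct AudioWindow { const float* data; int numSamples; double startSeconds; };

std::vector<AudioWindow> makeOverlappingWindows (const std::vector<float>& audio,
                                                 int sampleRate     = 16000,
                                                 double windowSecs  = 30.0,
                                                 double overlapSecs = 5.0)
{
    const auto windowLen = (size_t) (windowSecs * sampleRate);
    const auto hopLen    = (size_t) ((windowSecs - overlapSecs) * sampleRate);

    std::vector<AudioWindow> windows;

    for (size_t start = 0; start < audio.size(); start += hopLen)
    {
        const auto len = std::min (windowLen, audio.size() - start);
        windows.push_back ({ audio.data() + start, (int) len, (double) start / sampleRate });

        if (start + windowLen >= audio.size())   // last (possibly shorter) window emitted
            break;
    }

    return windows;
}
```

Each window is then transcribed independently, and the transcriptions of the overlap regions are reconciled afterwards (e.g. by matching words or timestamps), which is the part the linked article describes.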

You’d need to run Whisper in a background thread: use the audio thread to fill a FIFO with 30 seconds of audio, copy it into the Whisper input buffer, and run inference on the background thread. Put the output text in a FIFO and read it from the UI thread. This only works well if Whisper runs faster than real-time, which it should.
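A minimal sketch of that layout, assuming whisper.cpp is on the include path and the input is already mono 16 kHz. The whisper_* calls follow whisper.h in recent versions of whisper.cpp, so check against the version you build; real code would also forward the text into a second FIFO for the UI rather than just logging it:

```cpp
#include <algorithm>
#include <vector>
#include <juce_audio_basics/juce_audio_basics.h>
#include <juce_core/juce_core.h>
#include "whisper.h"

class WhisperWorker : public juce::Thread
{
public:
    explicit WhisperWorker (const char* modelPath)
        : juce::Thread ("whisper"), fifo (bufferSize), buffer (1, bufferSize)
    {
        ctx = whisper_init_from_file_with_params (modelPath, whisper_context_default_params());
        startThread();
    }

    ~WhisperWorker() override
    {
        stopThread (2000);
        if (ctx != nullptr)
            whisper_free (ctx);
    }

    // Audio thread: push mono samples already resampled to 16 kHz.
    void pushAudio (const float* samples, int numSamples)
    {
        int start1, size1, start2, size2;
        fifo.prepareToWrite (numSamples, start1, size1, start2, size2);

        if (size1 > 0) buffer.copyFrom (0, start1, samples, size1);
        if (size2 > 0) buffer.copyFrom (0, start2, samples + size1, size2);

        fifo.finishedWrite (size1 + size2);
    }

private:
    void run() override
    {
        std::vector<float> chunk (chunkSize);

        while (! threadShouldExit())
        {
            if (fifo.getNumReady() < chunkSize) { wait (100); continue; }

            // Copy 30 seconds of audio out of the FIFO into a contiguous buffer.
            int start1, size1, start2, size2;
            fifo.prepareToRead (chunkSize, start1, size1, start2, size2);

            if (size1 > 0) std::copy_n (buffer.getReadPointer (0, start1), size1, chunk.data());
            if (size2 > 0) std::copy_n (buffer.getReadPointer (0, start2), size2, chunk.data() + size1);

            fifo.finishedRead (size1 + size2);

            // Run inference on this background thread.
            auto params = whisper_full_default_params (WHISPER_SAMPLING_GREEDY);

            if (whisper_full (ctx, params, chunk.data(), chunkSize) == 0)
                for (int i = 0; i < whisper_full_n_segments (ctx); ++i)
                    DBG (whisper_full_get_segment_text (ctx, i)); // forward to a text FIFO in a real app
        }
    }

    static constexpr int sampleRate = 16000;           // Whisper expects 16 kHz mono
    static constexpr int chunkSize  = 30 * sampleRate; // 30 seconds of audio
    static constexpr int bufferSize = 2 * chunkSize;

    juce::AbstractFifo fifo;
    juce::AudioBuffer<float> buffer;
    whisper_context* ctx = nullptr;
};
```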

You might want to take a look at ARA to avoid latency problems.

I’ve used whisper.cpp; it’s pretty easy to get up and running. My idea was to automatically detect filler words like "um" and "uh" and remove them, but those tokens get dropped from the transcription automatically. Anybody have an idea how to keep them in?

Thanks. Overlapping 30-second chunks this way sounds like a great plan. You can do so much more with JUCE and C++ than with Python (not to say it wouldn’t be possible, just not this straightforward and robust).

The LLM that sits behind the voice recognition is the harder part. Training it on hundreds of functions and parameters for your app is a daunting task. Not sure that’s even possible with current open source (offline) models.

If the recognized tokens come with timestamps, you can map them back to the source audio and identify the gaps between them as fillers to cut out.
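A rough sketch of that idea, assuming the audio has already been run through whisper_full(). Segment timestamps come from whisper_full_get_segment_t0/t1, which report 10 ms units in current whisper.cpp (worth double-checking in whisper.h), and the 0.5 s threshold is an arbitrary choice for the example:

```cpp
// Flag the silences between recognized segments as candidate cut regions.
#include <vector>
#include "whisper.h"

struct CutRegion { double startSeconds, endSeconds; };

std::vector<CutRegion> findGaps (whisper_context* ctx, double minGapSeconds = 0.5)
{
    std::vector<CutRegion> gaps;
    const int numSegments = whisper_full_n_segments (ctx);

    for (int i = 1; i < numSegments; ++i)
    {
        const double previousEnd  = whisper_full_get_segment_t1 (ctx, i - 1) * 0.01; // 10 ms -> s
        const double currentStart = whisper_full_get_segment_t0 (ctx, i)     * 0.01;

        if (currentStart - previousEnd >= minGapSeconds)
            gaps.push_back ({ previousEnd, currentStart });
    }

    return gaps;
}
```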

I’m interested in this area. Regarding the '30-second chunks of audio', I think this is where we should be looking for a 'real-time' implementation of Whisper:

whisper.cpp/examples/stream (real-time audio input example)

Hi RolandMR, were you able to use whisper.cpp along with JUCE? I want to create an app similar to yours for a specific client: a standalone app (or maybe a plug-in) that helps remove irrelevant words from recorded audio, specifically for podcasts. Is it very complex?

Hi, I’d like to share an open source project I’ve been working on which integrates with whisper.cpp: Introducing ReaSpeech Lite - Tech Audio

This is a VST3/ARA plugin built using JUCE 8, and it uses a WebView for its user interface. Right now it mainly targets REAPER, but I have been able to run it in Cubase as well.

Please let me know if you have any questions!
