My team-of-two company just released our first ONNX-based product built with Juce! It’s called Deprox, and its main use is to remove proxy-frequency buildup from recordings of singers and speakers.
There are several things I’d like to touch on here:
For those of you looking to integrate ONNX and JUCE, I’ve made a gist for the setup here. We’ll likely use CoreML in our next version for Mac & keep ONNX for Windows, so I’ll make sure to post our CoreML code as well.
I got slammed on Gearspace about our pricing, and TBH I’m not sure of the best way to handle it. We anchored on the price of Soothe2 spread over a year, and since we’re updating the models in the plugin monthly, I thought a subscription would be the way to go. It’s also the only way to offer a free trial via Gumroad. But clearly the price is a turnoff for some of the repliers on that thread. OTOH, the seat-based subscription model is fine for some broadcasters we’re working with here in the Nordics. I’m happy to try out any fair pricing model & still have a lot to learn, so if folks have suggestions I’d love to hear them.
I hope those of you working with vocalists have the chance to give the plugin a spin! One studio here has been using a beta copy for around a month and the results have been really nice. Some examples from those sessions are posted on the Gumroad page.
P.S. Many thanks to @ncthom for his inspiring talk on selling audio plugins
Just out of interest, how do you deal with the fact that Onnxruntime is not real-time safe? Do you offload inference to a separate thread and use a fallback strategy in case a forward pass doesn’t return in time, or do you call it directly from the audio thread and accept the memory allocations?
We use the audio thread and accept memory allocations.
We did a hyperparameter sweep of our models using the Wandb sweep feature and shortlisted only the ones that had consistently good realtime performance on a Mac M1.
These were almost always the smaller models, independent of their breadth or depth. The smallest one we use in production is 70 kB and the largest is 150 kB. Anything above 150 kB was too unstable, and we searched up into the 500 kB range. So there may be a correlation between model size and the number of allocations.
Do you expect your prospects to think “hey, I always wanted to ditch gunk!”?
(I had to look up what it actually means)
I suggest starting by selling a product outright. Consider subscriptions once you have a good track record and/or have sold enough to have a better feel for good price levels. And $180 per year is unrealistic, imho.
“proxy frequency buildup”… now, I must admit, I have never ever heard of that. also couldn’t google a page for it.
Don’t you mean “comb filter” by it? That would make your plugin an “intelligent comb-reversal filter”.
We re-released the plugin at a $49.99 price point and will offer model updates for $19.99. I’m still learning about pricing - the most important thing is that it’s fair. I’ve also learned that conversations about price and usage are completely different when I’m talking to national radios like Yle vs independent studios and singers, so I need to somehow account for that. One deal I’m currently negotiating, for example, is using it to improve archived audio when it’s requested on demand, and ofc that’s completely different from the realtime DSP world (but Juce is equally useful for it).
I also simplified the marketing text so that it gets at the essentials. No one ever says “Hey, I always wanted to ditch the gunk”, but if I had a nickel for every time I’ve said “ah crap I wish the singer had stood a bit farther away for that take”, I’d have at least $56.95.
@PaulDriessen I’m pretty sure that the proximity effect is meant. I see the plugin as a kind of AI driven dynamic equalizer, especially targeted at reducing the low frequencies introduced by the proximity effect if I get that right?
We have been using onnxruntime for a few years now in some of our AI-driven plug-ins. The model size, and especially the architecture of the model, is definitely the key factor when it comes to average performance. When profiling the code with e.g. Instruments, calls to malloc are definitely a hot path. Still, allocations are quite fast on average on modern systems, so they wouldn’t be considered a bottleneck in a non-realtime-critical context, especially in a heavily optimised framework like onnx. The problem with heap allocations, as always, is that they can occasionally block for much longer than the average case, causing nondeterministic spikes in execution time, especially on a system where other things are going on. Therefore we decided not to execute any calls to onnxruntime from the audio thread itself, and to use some tricks where realtime inference is needed, which required quite a lot of work on our part in the past.
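To make the pattern concrete (this is not the poster’s actual implementation, just a minimal sketch of the approach described): run inference on a worker thread, and on the audio thread pick up the latest finished block if one is ready, otherwise fall back to passing the dry signal through. The queue below is a standard lock-free single-producer/single-consumer ring; `processAudio` stands in for the audio callback.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Minimal lock-free single-producer/single-consumer queue used to hand
// finished inference blocks from a worker thread to the audio thread.
template <typename Block, size_t Capacity>   // Capacity must be a power of two
class SpscQueue
{
public:
    bool push (const Block& b)               // worker thread only
    {
        const auto t = tail.load (std::memory_order_relaxed);
        if (t - head.load (std::memory_order_acquire) == Capacity)
            return false;                    // full: drop this block
        ring[t & (Capacity - 1)] = b;
        tail.store (t + 1, std::memory_order_release);
        return true;
    }

    bool pop (Block& out)                    // audio thread only
    {
        const auto h = head.load (std::memory_order_relaxed);
        if (h == tail.load (std::memory_order_acquire))
            return false;                    // nothing ready yet
        out = ring[h & (Capacity - 1)];
        head.store (h + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<Block, Capacity> ring {};
    std::atomic<size_t> head { 0 }, tail { 0 };
};

// On the audio thread: take a finished block if one exists, otherwise
// fall back to the dry signal. No locks, no allocations.
using Block = std::array<float, 512>;

void processAudio (SpscQueue<Block, 4>& results, Block& ioBlock)
{
    Block wet;
    if (results.pop (wet))
        ioBlock = wet;       // inference result arrived in time
    // else: dry passthrough fallback -- ioBlock left untouched
}
```

The worker thread would call the onnxruntime forward pass and `push` the result; all allocation, and any blocking inside the runtime, then stays off the audio thread.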
I’m really interested to see how others solve these kinds of problems with AI based audio processing becoming more and more widespread.
One comment regarding pricing etc.: I don’t see any obvious option for a demo version of your plug-in. I’d say that this is a must if you want to sell any plugin that costs more than $10 or so. I personally wouldn’t spend money on software I couldn’t try out in my personal projects to see if it’s really as good as the marketing claims.
And last but not least: In my opinion, the product website should show an image of the plugin UI right at the top, since this is the visual thing that people connect to your plugin if they might have seen it somewhere in a review, on a friend’s computer, etc.
So what @PluginPenguin said about the proximity effect (I should have used that term) definitely applies. It’s AI-driven EQ. We’ll have another plugin coming out soon for AI-driven room-reflection suppression (not sure what to call that one… re-prox?).
TBH I’m struggling to define the behavior of the plugin using classical DSP terms. I can see the effect in the waveform, and it does squash frequencies from around 100–220 Hz, especially when the “Aggressive mids” setting is on. But mostly we arrived at it by setting up the data and the model to faithfully recreate the sound of a singer a bit farther away from the microphone. Here’s a video of one of the rigs we used to come up with it, it was super fun to put together:
Like most ML algos, we are optimizing for a “target”, in this case, good mic placement. It winds up having an effect on gain and the proximity effect, but the goal is mostly to reproduce the phenomenon of being “farther away”. That said, we didn’t add in the room reflections that comb when you’re farther away, so it has an artificial aspect to the farness, but one that sounds (to my ears at least!) quite pleasant.
Have you considered using CoreML? We just started our first experiments with it. It still has gaps (e.g. no PReLU for 3- or 5-dimensional tensors when the slope (α) values differ, which rules out lots of models that use Conv1D). But we’re writing hacks to paper over the gaps and trying to get it to work. If you’re interested in chatting about that, I can definitely sync with you over private messages (it’s not very JUCE-y, so this forum likely isn’t the best spot to discuss that stuff). My hope is that it solves some of the RT flakiness with ONNX.
That’s a good idea about a demo version - we’ll ship a demo mode next week, I’ll ping this thread when it’s done.
And I’ll also tweak the Gumroad page on Monday so that the gif of the plugin in action figures more prominently.
We are building plug-ins for macOS and Windows, so we prefer cross-platform solutions where possible. Still, we tried CoreML at some point to see if we could at least get better performance for the macOS versions of the plug-ins, but we found that onnx performed better in our use-cases on both Intel and ARM Macs. But of course you should do some benchmarking yourself.
By the way, although my responses here might suggest otherwise, I’m no real expert when it comes to the core AI topics. We have a separate AI team that takes care of developing and training the actual networks. They then export their work in a suitable format so that the C++ developer team, which I’m part of, can integrate it into the actual plugins. I have a rough understanding of all those different kinds of layers etc., but I cannot really comment on how to work around missing CoreML features – that is usually the job of our AI developer team. Usually I analyse the performance of various approaches on the C++ side and discuss my findings with the AI team, which then often comes up with optimised versions based on this feedback. Still, if you have any specific questions on that part, hit me up with a private message and I’ll do my best to share what I can without disclosing confidential IP.
On my close-mic’d voice you can hear it right away. One way to think of it is the “reverse podcast mic” effect. That is, the proximity effect is something that radio hosts often use to make a voice sound more boomy, but in the context of a musical mix, it’s often tricky to use.
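To picture the non-AI baseline for comparison: the static version of “pull the singer back” would just be a low-shelf cut in the proximity-effect range. A minimal first-order low-shelf sketch (not our actual model, which is learned, just the classical starting point):

```cpp
#include <cmath>

// First-order low-shelf via bilinear transform of H(s) = (s + G*wc)/(s + wc):
// gain of `dcGainDb` at DC, unity at Nyquist. E.g. -6 dB below ~200 Hz
// roughly mimics a static "reverse proximity" cut.
struct LowShelf
{
    LowShelf (double cutoffHz, double dcGainDb, double sampleRate)
    {
        constexpr double kPi = 3.14159265358979323846;
        const double G = std::pow (10.0, dcGainDb / 20.0);   // linear DC gain
        const double c = 1.0 / std::tan (kPi * cutoffHz / sampleRate);
        b0 = (c + G) / (c + 1.0);
        b1 = (G - c) / (c + 1.0);
        a1 = (1.0 - c) / (c + 1.0);
    }

    float process (float x)
    {
        const double y = b0 * x + b1 * x1 - a1 * y1;   // direct form I
        x1 = x;
        y1 = y;
        return (float) y;
    }

    double b0, b1, a1;
    double x1 = 0.0, y1 = 0.0;   // one sample of state
};
```

A model-based approach goes beyond this because the proximity boost is level- and vowel-dependent, which is why a static shelf never quite un-does it.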
@PluginPenguin just following up on this old thread - we encountered an issue recently where we’ve had to build a static version of ONNX runtime to link on Windows. Have you had to go down this route as well? In general, I’m curious if you have any tips about shipping onnx-based plugins on Windows. Thanks!
To my knowledge, onnxruntime is not really intended to be built as a static library. I remember trying to get that setup working for quite some time, but we finally decided to use a dynamically linked build of onnxruntime. This thread has some information on possible ways to deploy dynamic libraries on Windows; in particular, the latest post describes an interesting option that I’d consider if I had to do it again from scratch today. We went the system32 route and put our custom onnxruntime build there under a versioned, company-specific name to make sure that it does not interfere with other possible versions of the library.
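For reference, the delay-load option can be wired up in CMake on Windows roughly like this — a sketch, with `MyPlugin` and the versioned DLL name being hypothetical stand-ins:

```cmake
# Sketch: delay-load a renamed onnxruntime DLL with MSVC.
# "onnxruntime_acme_1_17.dll" is a hypothetical versioned, company-specific name.
if (MSVC)
    target_link_libraries(MyPlugin PRIVATE delayimp)   # MSVC delay-load helper lib
    target_link_options(MyPlugin PRIVATE "/DELAYLOAD:onnxruntime_acme_1_17.dll")
endif()
```

With delay loading, the DLL is only resolved on the first call into it, which gives the plugin a chance to adjust the search path (e.g. via `SetDllDirectory`) before onnxruntime is actually loaded.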
AFAIK Logic on Apple Silicon can only run ARA via Rosetta, and I believe ProTools’s ARA support is currently limited to specific partners. This led us to focus initially on a non-ARA plugin.
ARA is a game changer, and we’re following it closely as it becomes more widely available. That being said, constraints can lead to serendipitous outcomes, and the need to squeeze out realtime performance resulted in several engineering choices we’re pretty happy with & that we’d likely use in the ARA version as well.
getting slammed for the price is well-deserved tbh. 15€/month is a lot, unless it’s like serum where you own the full license after a while, but this seems to be an ongoing forever subscription. don’t get me wrong, i totally understand that from a technical pov it makes sense for machine-learning cloud things to be subscription-based, since you have to run those servers nonstop for everyone’s plugins to work, but the current economy just really sucks and people are trying to save money more than ever. it’s a really bad time to ask people to subscribe to something, especially when it’s just a single plugin that solves a bit of an edge-case problem in music production, even if it does so in the most novel way and better than anything before. the only real solution i see is to figure out how people can run these processes locally, so it’s their own responsibility to have a massive graphics card doing the processing nonstop or whatever it needs.
To my knowledge, onnxruntime is not really intended to be built as a static library.
That’s what I’ve gathered from reading the GitHub issues as well. In the meantime, we’ve adapted NeuralNote’s approach to static linking, although the resulting .lib is huge, which makes iterating on Windows a little more annoying. Delayed loading for the DLL is probably ideal here, thanks for the tip! Meanwhile, on the macOS build we didn’t have to wrestle with library load paths or anything like that, though I’m more accustomed to tooling in *nix systems.
Just to clarify a bit, the plugin runs locally through the CPU. There are APIs in both macOS (CoreML) and Windows (DirectML) that would allow us to make use of the GPU, although ensuring support for different hardware configurations can be a pretty daunting task.
The models are designed with real-time performance in mind (at least given our dependency on ONNX Runtime), so we try to scale them down and employ techniques like float16 inference to keep them fast on the CPU.