What is throttling my JUCE application's CPU usage?


#1

I have reached a point where I have the beginnings of a useful synthesizer application and interface from JUCE. However, I note that something is throttling the amount of CPU that this JUCE application can access.

Here are some screencaps of the synth running as a standalone build from Visual Studio. The knob seen here is a “partial count” for an additive sine wave synth. Polyphony is currently set to 6 voices, so with the partial count at 5 there are 30 sine waves running, and with it at 100 there are 600.

Case #1: Knob set to 5 (30 sine waves) running normally, using low CPU:

Case #2: Set to >100 partials, garbled output, but still only using ~32% CPU:

The odd behavior I’m noting is that it works perfectly fine up to around 29 on my laptop (desktop is down currently), but past that becomes severely glitchy in its output. Monitoring it in Task Manager and a 3rd party CPU monitor both show that no more than ~32% of each core or my total CPU is ever being recruited. It reaches around this utilization from 35 on the knob and up and never takes more CPU no matter how much higher I turn it.

I.e., something is preventing it from using as many resources as it needs to run.

I tried setting the process priority in Task Manager to “Realtime” in case Windows was throttling it there, but that didn’t help.

What ought to happen is I should be able to turn that Partial Number knob up until my CPU is completely maxed out. Yet it never lets me get there. Instead the sound starts degrading and becoming garbled way before that point (ie. around 34% CPU or >30 on the partials knob).

Is there any reason in general this might be happening? There is nothing in the synth design that specifies any limit to the number of partials, and the degree of “garbling” of the sound changes as I go up and up on partials, which tells me it’s responding to the increased partials in its own way. It’s just not getting the resources to produce them.

Thoughts?

Thanks.


#2

You don’t mention at all how you are multithreading the synth voices, so I am assuming you are not running them on multiple threads. You won’t be able to use a multicore CPU’s full capability without multithreading. If you are using the JUCE Synthesiser classes, those do not have multithreading implemented, and you can’t really add it yourself either (because of the way the synth voices are supposed to be mixed into the shared audio buffer).

Edit: you also don’t mention whether you are testing with a release build. Performance should always be investigated with release builds, not debug builds.


#3

Thanks Xenakios. I have now tried it in both debug and release mode. I also tried deactivating Intel SpeedStep in the BIOS. That raised the performance at a given CPU %, but it still can’t use more than ~30%+ of my CPU at any setting.

It sounds like a lack of multithreading as you describe is the problem, and that poses a real problem for me. I am using the JUCE Synthesiser classes.

What would be a reasonable solution, if any? Can you summarize or explain what I would need to do to fix this so I can read/learn more about the process?

This is partly sad and hilarious to me, since I initially switched to JUCE from Reaktor solely because I thought JUCE would give me greater efficiency and CPU utilization. Yet now after learning C++/JUCE just for this purpose, it is giving me dramatically worse CPU utilization.

Any pointers on where I can go from here would be appreciated, even if they’re complex or challenging.


#4

Well, instead of wanting to use more of your CPU, you could look for ways to use less CPU in order to be able to render more sines/voices with a single CPU thread. How are you generating your sine waves at the moment? Is there something else going on besides the sine generation and summing? Are you testing with an ASIO driver? What is the hardware buffer size?

I don’t think there’s a simple way to enable multithreading; the whole JUCE Synthesiser stack would need to be rewritten to support it. In any case, multithreading shouldn’t be the first thing to try when you have performance problems. (Multithreading with audio works best in situations where some prerendering can be arranged, so it’s not ideal for low latency instrument playing.)
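As one illustration of "using less CPU per sine", here is a minimal sketch of a sine oscillator that avoids calling `std::sin` for every sample by rotating a unit vector by a fixed angle instead. `RotatingSine` and `addPartial` are hypothetical names, not JUCE classes, and nothing here is known about how the original poster actually generates sines. Rounding error accumulates slowly in this form, so a long-running version would probably need to renormalise the vector now and then.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper, not part of JUCE: a sine oscillator that rotates a
// unit vector by a fixed angle each sample instead of calling std::sin.
struct RotatingSine
{
    double cosStep = 1.0, sinStep = 0.0; // rotation applied per sample
    double c = 1.0, s = 0.0;             // current cos/sin of the phase

    void setFrequency (double frequencyHz, double sampleRate)
    {
        const double twoPi = 6.283185307179586;
        const double w = twoPi * frequencyHz / sampleRate;
        cosStep = std::cos (w); // computed once per frequency change
        sinStep = std::sin (w);
    }

    // Returns the current sine value, then advances the phase by one sample.
    double nextSample()
    {
        const double out = s;
        const double nextC = c * cosStep - s * sinStep;
        const double nextS = s * cosStep + c * sinStep;
        c = nextC;
        s = nextS;
        return out;
    }
};

// Mixes one partial into every sample of `buffer` (adds, does not overwrite).
void addPartial (std::vector<double>& buffer, RotatingSine& osc, double gain)
{
    for (auto& sample : buffer)
        sample += gain * osc.nextSample();
}
```

Each sample then costs four multiplies and two additions, which is usually cheaper than a library sine call, though the actual gain depends on the compiler and CPU and is worth measuring in a release build.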


#5

Thanks Xenakios. It definitely looks like that’s the problem. I just tried disabling “Hyperthreading” in my BIOS and it allows me to get this result on my dual core laptop:

According to Task Manager, I’m now maxing out at 49% of my total CPU (i.e. one full core in single threaded mode, which is what I would expect, right?). Yet Open Hardware Monitor seems to show the load being split between both cores, with the busier core only at ~76%.

How does this add up or make sense? If it is being run in single threaded mode now and maxing out a core, shouldn’t that core show at ~100% usage?

As to the basic question of why I would be needing so much CPU: Additive synthesis is inherently very computationally expensive. Sine waves for example can be simplified and made more CPU efficient. But certain other elements like resonant bandpass filters are costly no matter what you do. Modal and additive synthesis methods are my interest. They’re what I enjoy doing. It seems unfortunate to think that in 2018 I can buy a 12 or 18 core CPU if I want, but yet I can’t render any more complex a synth than I could have a decade ago, because I’m limited to one core/thread.

Any further clarification or thoughts very appreciated.


#6

Just a comment on the load of 1 thread apparently being split between 2 CPU cores: that happens because the operating system can migrate threads between cores. In a dual core system, though, one thread can use a maximum of about 50% of the full CPU. Hyperthreading can complicate the situation further. It usually doesn’t help much in audio applications, and with older CPU models it could even hurt performance. On newer CPUs, having it enabled likely doesn’t harm performance but likely doesn’t help either.


#7

Thanks yet again Xenakios. That was helpful to clarify the CPU load metering. So in other words, with hyperthreading off, Windows allows an application to max out one processor worth of CPU (so in this case ~49% of total CPU power) and it will then juggle that maximum processing automatically across the available cores at its own discretion.

So if that’s the case, then the only potentially practical way to overcome the processor limitation would be to redesign the synthesizer class to implement multiple threads.

I’d still be interested in brainstorming solutions along these lines. Core counts keep rising every year, but the processing speed per core really isn’t rising so readily.

What would need to be done so that each voice in a current JUCE synth would be allocated to a different “thread” and thus spread to a different core if needed?

I see a basic example of threading in C++ laid out here:

Would that example then automatically allow the three threads it implements to be allocated to 3 cores if needed?

That is a simple example but could I not build something that allocates one thread per voice in a similar way, then wait for all threads to finish each sample, then run something in yet another thread to add all the voices together to create the output, and thus utilize (# voices +1) number of threads/cores when needed?
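To make the idea concrete, here is a rough sketch using only standard C++ (`std::thread`), with a hypothetical `Voice` struct standing in for a synth voice; it does not use or modify the JUCE Synthesiser classes. Two assumptions differ from the description above: the threads synchronise once per audio block rather than once per sample, since waiting on threads every sample would very likely cost more than the rendering itself, and the final sum runs on the calling thread instead of a further thread. The operating system decides which core each thread runs on, so spreading across cores is allowed but not guaranteed.

```cpp
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one synth voice: a single sine at a fixed frequency.
struct Voice
{
    double frequencyHz = 440.0;
    double sampleRate = 44100.0;
    double phase = 0.0;

    // Overwrites `out` with the next block of this voice's output.
    void render (std::vector<float>& out)
    {
        const double twoPi = 6.283185307179586;
        const double step = twoPi * frequencyHz / sampleRate;
        for (auto& sample : out)
        {
            sample = static_cast<float> (std::sin (phase));
            phase += step;
        }
    }
};

// Renders each voice on its own thread into a private buffer, waits for all
// of them, then sums the private buffers on the calling thread.
std::vector<float> renderBlock (std::vector<Voice>& voices, std::size_t blockSize)
{
    std::vector<std::vector<float>> scratch (voices.size(),
                                             std::vector<float> (blockSize, 0.0f));
    std::vector<std::thread> workers;
    workers.reserve (voices.size());

    for (std::size_t v = 0; v < voices.size(); ++v)
        workers.emplace_back ([&voices, &scratch, v] { voices[v].render (scratch[v]); });

    for (auto& worker : workers)
        worker.join(); // after this loop every voice has finished its block

    std::vector<float> mix (blockSize, 0.0f);
    for (const auto& voiceBuffer : scratch)
        for (std::size_t i = 0; i < blockSize; ++i)
            mix[i] += voiceBuffer[i];

    return mix;
}
```

This version creates threads and allocates memory on every block, which is not suitable for a real audio callback. A practical design would more likely keep a fixed pool of worker threads and preallocated buffers alive and only signal them each block.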

I’m happy for even any ideas just in the abstract at this point on how/if it could work.

Thanks


#8

I’ve thought a bit more about this problem. A CPU in hyperthreading mode can run two simultaneous threads.

Thus implementing threading in my synth would just allow the CPU core to max out with hyperthreading turned on. (Currently I can only max a CPU core with hyperthreading turned off.) But overall, this would yield no obvious improvement in how much complexity I can implement in the synth. Because either way, Windows will throttle me to one core of CPU max, right?

The only way to implement greater complexity would be to spread the processing out among the cores (ie. 1 voice per core), but that is a different matter from simply instituting threading, right?

If so then I might be stuck with the limitation of one core, and just leave hyperthreading off on my recording computer …


#9

So wait…The computer you are testing on doesn’t even have 2 physical CPU cores? What model exactly is the CPU? (OK, I see in your screenshot it’s some sort of Intel i7, so that should be a real multicore CPU…)

And Windows is probably not “throttling” anything. If you have just 1 thread of execution, you can only use 1 physical CPU core’s worth of CPU. If the machine has a real multicore CPU, the operating system can move the work from core to core, giving the impression that the work is happening on multiple cores, but the total amount of work done by 1 thread can only ever use 1 core’s worth of CPU time. So:

  • If you have a 2 core CPU, around 50% is the maximum your single threaded code can use out of the total CPU capability.

  • If you had 2 threads doing useful work and a 4 core CPU, about 50% would likewise be the maximum CPU load possible.

  • If you had 1 thread and an 8 core CPU, you could use about a measly 12.5% of the total capability of the CPU.
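The three cases above all follow from one ratio, busy threads divided by cores, which can be written as a small sketch. `maxCpuSharePercent` and `singleThreadCeilingOnThisMachine` are hypothetical helper names. Note that `std::thread::hardware_concurrency()` reports logical processors, so with hyperthreading enabled it will typically return twice the physical core count, and it may return 0 if the value cannot be determined.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: upper bound, in percent of the whole CPU, that
// `busyThreads` fully loaded threads can occupy on `cores` cores.
double maxCpuSharePercent (unsigned busyThreads, unsigned cores)
{
    if (cores == 0)
        return 0.0; // core count unknown

    // Threads beyond the number of cores cannot add any more load.
    return 100.0 * std::min (busyThreads, cores) / cores;
}

// Ceiling for single threaded code on the machine running this program,
// counted in logical processors.
double singleThreadCeilingOnThisMachine()
{
    return maxCpuSharePercent (1, std::thread::hardware_concurrency());
}
```

For example, `maxCpuSharePercent (1, 2)` gives 50, `maxCpuSharePercent (2, 4)` gives 50, and `maxCpuSharePercent (1, 8)` gives 12.5, matching the list above.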

“Spreading the processing out among CPU cores” and “multithreading” are the same thing, as far as real multicore CPUs are concerned.

That said, even with a real multicore machine, multithreading the audio processing is no magic cure and can be complicated to implement and may not allow reliable low latency operation. I would strongly suggest looking into ways to optimize the code so that it can run as efficiently as possible in just one thread/CPU core.


#10

Just to clarify Xenakios, that’s not exactly what I’ve experienced. I have a dual core i7 on my laptop I’m using for testing right now while my desktop is down.

As noted in the screencaps from my first post, when I have hyperthreading enabled, the absolute most I can get of total CPU utilization from one synth instance is around ~33%.

In the screencaps I posted later, only when I disable hyperthreading can I get up to ~49% of total CPU utilization from one synth instance (ie. the equivalent of one full core).

Either way, Windows appears to share this burden among the cores.

Disabling hyperthreading seems like an absolute necessity to maximize the CPU I’m able to allocate per synth instance given that I want the most processing power possible.

I can see building the synth to allow hyperthreading will be of no major benefit in this case. The only benefit it would offer is I could get the same ~49% CPU usage out of a single synth instance with hyperthreading still on if I rebuilt the synth to use threading. Since I can more easily accomplish this same goal by turning off hyperthreading, that is what I will do instead.

I of course will try to optimize everything, but as I said, additive synthesis is expensive, and I enjoy it regardless. High CPU requirements are just part of the deal that goes with it.

Thanks again.