Pointer/Reference syntax issue? Why is this giving me a read access violation?

I am trying to get my first attempt at Thread use off the ground as summarized in this thread. I am first trying to run a custom thread in my synth PluginProcessor.cpp and assign the basic renderNextBlock synthesiser processing to that thread.

I need to get a reference to my synth and the buffer into testThread, my Thread-derived class object, so it can process them there.

My attempt, however, gives a read access violation as soon as I try to give testThread a reference to the usual buffers. I presume this is just a minor syntax error, but I am not good enough at C++ to know where it's coming from.

Here’s what I am running in my PluginProcessor.cpp:

void AudioPlugInAudioProcessor::processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages) {
        
	buffer.clear();
	
  	//OLD RENDER METHOD: 
	//mMpeSynth.renderNextBlockCustom(buffer, midiMessages, 0, buffer.getNumSamples());
	
	//NEW RENDER METHOD:
	testThread.inputReferenceBuffer(buffer, midiMessages, 0, buffer.getNumSamples());

	while (testThread.newBufferToProcess) {
		//wait until testThread is done before finishing this processBlock
	}
}

My Thread class is:

class ThreadInherited : public Thread {

public:
	ThreadInherited(const String& threadName, MPESynthesiserInherited *mMPESynthIn, size_t threadStackSize = 0) : Thread(threadName) {
			mMpeSynthPtr = mMPESynthIn;
	}

	//this is the function that is breaking:
	void inputReferenceBuffer(AudioBuffer<float>& outputAudio, const MidiBuffer& inputMidi, int startSampleIn, int numSamplesIn) {
		*outputAudioBufPtr = outputAudio; //this line gives read access violation
		*inputMidiBufPtr = inputMidi; //this line gives read access violation
		DBG(outputAudioBufPtr->getNumSamples());  //this line gives read access violation
		startSample = startSampleIn;
		numSamples = numSamplesIn;
		newBufferToProcess = true;
	}
	void run() override {
		while (!threadShouldExit()) {
			if (newBufferToProcess) {
				//not even getting this far ...
				mMpeSynthPtr->renderNextBlockCustom(*outputAudioBufPtr, *inputMidiBufPtr, 0, outputAudioBufPtr->getNumSamples());
				newBufferToProcess = false;
			}
			else { //thread running idle
			}
		}
	}
	
	bool newBufferToProcess = false; //to tell when it needs to process or if it is done

private:
	MPESynthesiserInherited* mMpeSynthPtr;
	AudioBuffer<float>* outputAudioBufPtr;
	MidiBuffer* inputMidiBufPtr;

	int startSample = 0;
	int numSamples = 512;

};

I’ve explored the threading issues a bit in the other discussion and I know it can be complicated, but in this case it seems like I’m just not getting the buffer reference into the Thread properly, because it never even gets to the point of trying to process the buffer. It gives me a read access violation as soon as I try to put the buffer references into the thread.

So I presume this is just a syntax error of some kind regarding pointers/references, but I'm not sure what it is or why it's happening. How do I get a working pointer/reference to the buffers inside my testThread here, so I can run even simple functions on them in the Thread, like getNumSamples()?

Thanks.

Why are you even passing the buffer to your worker thread? Do you need any input audio from the host in order to do the synthesis?

If the worker thread’s job is just to generate audio, then all it really needs to know is how many samples the processor wants*. You shouldn’t really be passing AudioBuffer objects back and forth.

*and in reality, the worker thread could/should be working into the future, rather than only making one sample when the processor needs it.

What I would be tempted to try is this:

  • make a private audio buffer in the thread class that only it owns and works with, this is where it will store the synthesized samples
  • pick an arbitrary buffer size, maybe 512 or 1024
  • as soon as the thread gets started, begin generating your audio and writing samples to this buffer. Stop once the buffer is full.
  • From the processor side, take samples as you need them (this requires a FIFO)
  • in the worker thread, as soon as it sees that its storage buffer isn’t full anymore, start generating more samples again (a rough sketch of this loop follows below)
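Purely as a sketch of that last step (illustrative only: fifo is assumed to be a thread-safe sample FIFO, like the one sketched further down this thread, and scratch, blockSize, and emptyMidi are hypothetical pre-allocated members, not code from your project):

	void run() override
	{
	    while (! threadShouldExit())
	    {
	        if (fifo.getFreeSpace() >= blockSize)       // storage isn't full: render ahead
	        {
	            scratch.clear();                        // pre-allocated member AudioBuffer
	            mMpeSynthPtr->renderNextBlockCustom (scratch, emptyMidi, 0, blockSize);

	            for (int i = 0; i < blockSize; ++i)     // hand the fresh samples to the FIFO
	                fifo.push (scratch.getSample (0, i));
	        }
	        else
	        {
	            wait (1);                               // buffer is full: yield, don't spin
	        }
	    }
	}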

I know you’re using this “one main worker thread” idea as an educational experiment to understand threads a bit better, but I do want to point out that the busy-wait loop in your processBlock (while (testThread.newBufferToProcess) { }) makes all of this literally pointless. I would be shocked if this doesn’t run waaaay slower than without the threading attempt (as it’s written now, I think this would hang your computer, and maybe crash your DAW).

The only reason to introduce more threads is if you have more work you can do while that other work is being done in the background. So if your plan is literally “wait for this thread to do its work”, then multithreading is useless for that scenario.

Thanks again Ben. Very helpful as always. You’re right that the output buffer I’m passing in is empty so it doesn’t really matter about that. But I would need to pass in the midiBuffer.

I switched it to a non-pointer/reference method in the Thread and the read access violations went away.
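(In hindsight, I think the violations came from writing through outputAudioBufPtr and inputMidiBufPtr while they were still uninitialized. If I'd wanted to keep the pointer version, I believe the minimal fix would have been to store the addresses instead of assigning through the pointers, though I haven't tested this:)

	void inputReferenceBuffer(AudioBuffer<float>& outputAudio, const MidiBuffer& inputMidi, int startSampleIn, int numSamplesIn) {
		outputAudioBufPtr = &outputAudio; //take the address; don't dereference an uninitialized pointer
		inputMidiBufPtr = &inputMidi; //note: the member would need to become 'const MidiBuffer*'
		startSample = startSampleIn;
		numSamples = numSamplesIn;
		newBufferToProcess = true;
	}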

I can now render the synth successfully on its own custom thread using the following code. However, it seems terribly inefficient. I can now only run the synth at around 70% of the capacity it had when single-threaded.

My working Thread functions are:

void inputNewBuffer(AudioBuffer<float> outputAudio, const MidiBuffer inputMidi, int startSampleIn, int numSamplesIn) {
		outputAudioBuffer.makeCopyOf(outputAudio);
		inputMidiBuffer = inputMidi;
		startSample = startSampleIn;
		numSamples = numSamplesIn;
		newBufferToProcess = true;
	}

void run() override {
		while (!threadShouldExit()) {
			if (newBufferToProcess) {
				mMpeSynthPtr->renderNextBlockCustom(outputAudioBuffer, inputMidiBuffer, 0, outputAudioBuffer.getNumSamples());
				newBufferToProcess = false;
			}
			else {
			}
		}
	}

AudioBuffer<float> returnBuffer() {
		return outputAudioBuffer;
	}

In PluginProcessor, since processBlock only runs once per block, the only easy solution I could think of within the current architecture was to let my output constantly run one block behind:

void AudioPlugInAudioProcessor::processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages) {

	buffer.clear();

	AudioBuffer<float> bufferCopy;
	bufferCopy.makeCopyOf(testThread.returnBuffer()); //return the last block's output buffer

	testThread.inputNewBuffer(buffer, midiMessages, 0, buffer.getNumSamples()); 
	buffer.makeCopyOf(bufferCopy); // copy the last block's output buffer into this output buffer
}

It definitely works, with smooth audio output. But it is costing a massive amount of CPU. I had to cut my synth processing level to ~70% of the previous level to get it to render smoothly. I have the testThread set at priority 9, which is the same as JUCE’s real-time audio priority.

What is the reason for this massive cost for such a simple change? I haven’t put any expensive routines in for syncing the threads or splitting up voices or anything really. All I’ve asked is that the renderNextBlock take place on another thread. All my cores are identical speed. So what would be causing all this inefficiency?

The buffer copying is trivial. So if the renderNextBlock takes place on one core or thread vs. another, what does that actually matter? Why would that be so problematic and looking at just this simple example, would there be any solution? Or is this just the inherent expense of multithreading somehow?

Thanks again. At least you taught me to run a thread successfully within 24 hours of me first starting to try, so that’s something and I am grateful. :slight_smile:

I’m glad to hear it’s working, at least!

That’s what I’ve been trying to tell you this whole time about this approach! It will be slower, by a lot, than sticking with the main audio thread. Unless you spend years becoming a low-level C++ expert, your attempts at multithreading WILL be slow. Plain and simple.

You’ve still got allocations happening in several places in your code:

In the first line of inputNewBuffer(), makeCopyOf will resize outputAudioBuffer and potentially allocate memory. But you don’t need to be copying or resizing here. The very first thing you do in your processBlock is buffer.clear(). The buffer being sent in to inputNewBuffer() has no sound in it. And even if it did, you are not doing anything with that sound.
I think your thread class should have a prepare() method that pre-allocates its internal buffer to hold a maximum number of samples. This function should be called from the processor’s prepareToPlay().
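Something like this, as a rough sketch (prepare() is just a name I'm suggesting, not an existing JUCE method):

	void prepare (int numChannels, int maxBlockSize)
	{
	    // called from prepareToPlay(), so the allocation happens before real-time rendering starts
	    outputAudioBuffer.setSize (numChannels, maxBlockSize);
	    outputAudioBuffer.clear();
	}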

The real purpose of calling the inputNewBuffer method is to tell the thread class, “I’m ready for some more samples now”. That function should not take an AudioBuffer as an argument (especially by value!), because you don’t need it. All you need from that buffer is to know the number of samples, so make your function have an integer argument, and then send in buffer.getNumSamples() from processBlock().
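So the whole function could shrink to something like this (again, just a sketch):

	void inputNewBuffer (int numSamplesIn)
	{
	    numSamples = numSamplesIn;
	    newBufferToProcess = true;   // "I'm ready for some more samples now"
	}

The call in processBlock() then becomes testThread.inputNewBuffer (buffer.getNumSamples());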

Then there’s your processBlock() code:

Here you are creating and allocating a new audio buffer every time processBlock is called. Instead, your processor should have an AudioBuffer as a member variable, perhaps outputStorage, which gets pre-allocated in prepareToPlay. Then, in processBlock, I would do

const auto temp = testThread.returnBuffer();
outputStorage.copyFrom (0, 0, temp, 0, 0, temp.getNumSamples());

But really, this is not ideal either, because you’re still returning an entire AudioBuffer object by value with testThread.returnBuffer().

Really, what you should do is create an audio FIFO in your test thread, so that the processBlock bit becomes this:

outputStorage.clear();
for (int s = 0; s < testThread.getNumReadySamples(); ++s)
    outputStorage.setSample (0, s, testThread.popSample());
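For what it’s worth, JUCE ships a helper for building exactly this kind of structure: juce::AbstractFifo manages the lock-free read/write positions of a single-producer/single-consumer queue, and you supply the storage. A minimal single-channel sketch (SampleFifo and its members are names I’m making up for illustration; getNumReadySamples() above would just forward to fifo.getNumReady()):

	struct SampleFifo
	{
	    SampleFifo (int capacity) : fifo (capacity), storage ((size_t) capacity) {}

	    bool push (float sample)   // called only by the worker thread
	    {
	        int start1, size1, start2, size2;
	        fifo.prepareToWrite (1, start1, size1, start2, size2);
	        if (size1 + size2 < 1)
	            return false;                      // FIFO is full
	        storage[(size_t) start1] = sample;
	        fifo.finishedWrite (1);
	        return true;
	    }

	    bool pop (float& sample)   // called only by the audio thread
	    {
	        int start1, size1, start2, size2;
	        fifo.prepareToRead (1, start1, size1, start2, size2);
	        if (size1 + size2 < 1)
	            return false;                      // nothing ready yet
	        sample = storage[(size_t) start1];
	        fifo.finishedRead (1);
	        return true;
	    }

	    int getNumReady()  const { return fifo.getNumReady(); }
	    int getFreeSpace() const { return fifo.getFreeSpace(); }

	    juce::AbstractFifo fifo;
	    std::vector<float> storage;
	};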

Based on your code and the language you use, I have a sneaking suspicion that you’re thinking about all audio processing as “working on a buffer” and then that “buffer” being passed around. But in reality, what gets passed on from one thing to the next in the signal chain is the sample values – a “buffer” is just a container that can hold some sample values, and your buffers should always be preallocated ahead of time and never resized or copied by value during the real time audio rendering.

And this doesn’t even mention the other architectural problems here – right now, you’re simply calling the test thread’s inputNewBuffer method and its returnBuffer methods from the processBlock – which is now in a different thread. So at the instant you call either of those from the processBlock, the test thread itself could be in the middle of executing its rendering or performing some other logic. This will lead to data corruption and probably some fun unintended consequences.
To really get this to work properly, you’ll most likely need to implement a FIFO not just for the audio samples, but also for the actual commands getting passed from the processor to the test thread. Something like, the thread has a FIFO of messages, and the processor adds a new one to it, and in the thread’s run(), once it’s done rendering samples it executes all the messages in the FIFO.
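The same AbstractFifo pattern works for those command messages too, as long as the command type is a small, trivially copyable struct (illustrative only, not from your code):

	struct RenderRequest
	{
	    int numSamplesWanted;   // "the processor is ready for this many more samples"
	};
	// the worker drains a FIFO of these in run(), once it's done rendering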

Like I said in the other thread, this will quickly become a very very deep rabbit hole. It will not be simple to get it working optimally.

Like I told you in the other thread (ha!) the audio thread set up for you by the host is an OS-level high priority thread. It has scheduler privileges that your custom threads never will. The main audio thread is always, always, always the safest place to do real-time critical things – any other threads on your machine may be interrupted, paused, waiting, or even killed in favor of letting the audio thread do its work.

I don’t know how exactly you did that, but it sounds to me like you created a copy, which allocates.

Your approach tries to share data between threads, which is not safe without one of these strategies:

  • a FIFO data structure
  • locking (should be avoided at all cost)

Have a look at the great talk from Fabian and Dave (watch both parts if you can spare the time, it is well spent):

I haven’t seen a better summary of the realtime and multi threading problems. Even after many years I still learned a lot from this talk.


Thanks Ben. This is all interesting to me whether or not it leads to an easy solution (or any solution at all). I am still learning more about coding/threading architecture which is useful to me either way.

I recognize from your prior post I am copying and passing around a lot of empty buffers and I can clean that up in the manner you suggest. But that is also not the source of my major loss of efficiency. I can leave all those buffer copies in place and the synth still runs at good efficiency on normal threading mode.

You also mention that the thread I’m creating does not have the OS-level scheduler privileges. However, if I have loads of free cores, all of which can run at 4.5 GHz or so, and they are sitting totally idle, then I don’t think this should be the major issue either.

Regarding the possibility that the thread I’m creating desynchronizes from the main thread under load: this definitely seems to be happening when I increase the synth complexity. When I try to run the multi-threaded version at the level of performance it can usually handle single-threaded, I do get strange behaviors. Only about 50% of the MIDI data gets through to the synth. The audio comes out, but it’s choppy and garbled.

This suggests that the testThread is not finishing by the time its output is taken and it is given a new buffer by processBlock.

However, under normal single-threaded performance, the synth must also finish the processing by the time the next buffer comes through. At a 44.1 kHz sample rate and a 441-sample buffer, for example, the synth has exactly 441 / 44100 = 10 ms to finish processing each buffer either way.

This is true whether I am multithreading or not. And similarly, it would be true whether I did a FIFO approach as you describe or not. Either way, if it can’t get the buffer processed in 10 ms at this speed, I will get abnormal behaviors.

So then the issue would not be whether I am feeding it a full buffer at a time or one sample at a time (if I don’t mind running behind by one full buffer length as it is currently). The issue would be: Why is 10 ms enough for the single thread to complete the task, but not enough for testThread to complete the exact same equations?

Why is testThread so slow? Why can’t it get the exact same math done in the same amount of time?

I think I just figured out the cause now as I’ve been testing and it’s an interesting one. I can see what’s happening with my CPU monitor but I don’t know why.

Here is the evidence. Perhaps you can help me understand what it means:

My cores all idle at 0-3% maximum in Windows with no synth open (hyperthreading is disabled in BIOS so these are all true individual cores).

If I open a synth using the testThread method, it will immediately tax two cores in a balance that adds up to 100% (so one core might get 63%, the other 37%). The numbers will change, but it is clearly affecting two cores to a total of 100%. The CPU demand starts immediately as soon as the synth opens (even before I play a note and the synth is running). This suggests the testThread is running continuously from opening, and the synth in total is being throttled to exactly one core’s worth of processing power, but split across two cores. Whether a note is running or not, it uses two cores to a maximum of 100% total power summed from them both.

If I open a synth using the normal rendering method (and without letting testThread start), it opens with 0-3% CPU usage (synth voice calculations don’t start running until a note is played). When I start a note, it uses 100% of one CPU.

So both methods are actually using exactly the same amount of CPU, as the testThread method is not being allowed to exceed one total CPU’s worth of processing.

Would this suggest the problem is Windows and its thread management? It would seem that although the testThread method is splitting the processing across two cores, neither core is allowed to function fully, and there is a hard “one core” processing cap.

What do you think this represents? Thanks for any thoughts. I think this unveils the real issue here.

Do you know that for sure? Because as I just explained, you have many allocations happening in your multithreaded code. That can easily cause glitches in realtime audio even with no threading involved.

See above where I said that calling the test thread’s methods from processBlock while it may be mid-render will lead to data corruption. I’d be willing to bet that’s what’s going on here.

I really don’t think you can make that assumption. There are so many problems with your current code that the issue could be many things. It’s likely a problem with the communication between threads, and the data sharing.

That’s not actually how computing works. Just because you’re processing 441 samples doesn’t mean that your slice of time on the audio thread is equivalent to the real-world duration of those 10 ms. In reality, if your plugin is running as part of a signal chain in Ableton, your time slices on the audio thread might be a few hundred microseconds each. That’s just a guess though, I’m no expert.

I don’t know if that’s the issue. I think everything you’re seeing it do can be explained by data corruption caused by improper sharing of data resources between threads acting simultaneously.

Have you actually spent some time learning about threading, the scheduler, and how CPUs work? Here is a good place to start: https://www.youtube.com/watch?v=Jkmy2YLUbUY

Hey Ben, I don’t know if I remember adding the extra information as an edit - maybe you didn’t see it. But there is definitely some thread throttling taking place that is causing the reduced performance. As I said (though perhaps this was not there when you replied):

My cores all idle at 0-3% maximum in Windows with no synth open (hyperthreading is disabled in BIOS so these are all true individual cores).

If I open a synth using the testThread method, it will immediately tax two cores in a balance that adds up to 100% (so one core might get 63%, the other 37%). The numbers will change, but it is clearly affecting two cores to a total of 100%. The CPU demand starts immediately as soon as the synth opens (even before I play a note and the synth is running). This suggests the testThread is running continuously from opening (as I request it to), and the synth in total is being throttled to exactly one core’s worth of processing power, but split across two cores. Whether a note is running or not, it uses two cores to a maximum of 100% total power summed from them both.

If I open a synth using the normal rendering method (and without letting testThread start), it opens with 0-3% CPU usage (synth voice calculations don’t start running until a note is played). When I start a note, it uses 100% of one CPU.

So both methods are actually using exactly the same amount of CPU, as the testThread method is not being allowed to exceed one total CPU’s worth of processing.

Would this suggest the problem is Windows and its thread management? It would seem that although the testThread method is splitting the processing across two cores, neither core is allowed to function fully, and there is a hard “one core” processing cap.

What do you think this represents? Thanks for any thoughts. I think this unveils the real issue here.

It definitely seems the problem is CPU/thread throttling of some kind, though I’m not sure why it is happening.

OK, well, since you know for sure what the problem is, then I’m sure it will be easy for you to figure out :slightly_smiling_face:

Hey Daniel, thanks for chiming in. You are always very knowledgeable. I got rid of the data sharing and locking and got the custom thread working, but it won’t operate efficiently. I think I found the issue though I don’t understand why. I just replied as you were typing yourself to summarize it.

In short, with my custom thread running, I can split the processing to two cores as desired, but Windows seems to be throttling me to exactly one core worth of CPU power. (eg. Core 1 might be using 72% at any given moment, and core 2 is using 28%, constantly in flux, but always adding up to ~100%.)

By contrast without the custom thread, the synth lives on only one core which can run up to 100% all by itself.

So it’s no wonder the custom thread is inefficient: it’s not giving me access to any more CPU at all. Whether I use my custom thread or keep everything on one thread, the exact same amount of CPU is available to me. And splitting the work across two threads likely means the synth processing thread is always getting a varying amount of CPU (which likely changes more slowly than if it were all on one core).

Do you know why my synth would be throttled to one CPU’s worth of processing, even when split over two cores? I am working in standalone mode. Would it be Windows throttling the thread? Or something in my synth?

Thanks

I think @benvining summarised it all pretty well. In a DAW there are no cores “idling around”.
If you attach a debugger to a DAW you will see how many threads are at work there. The threads you spawned in your plugin (btw, how would that scale if the user has three of your plugins open?) are now battling with the DAW’s resources at the OS level. That’s why, in a DAW, a plugin’s multithreading effort is a waste 99% of the time.

If you however run your synth as a standalone, it is much different and multi threading might be a good option.

The problems you are facing (to do yet another incomplete summary):

  • concurrent access of data (need to be synchronised via lock or copying)
  • priority inversion (the audio thread needs to wait for all others to finish)

And in reply to your CPU meters:

  • CPU meters are a coarse estimate; to get real figures, use a profiler
  • did you measure a release build?

Good luck


I am speaking solely about standalone function right now. I haven’t tested anything in a DAW because as you say, that adds a load of complications to interpretation. Yes I am using a release build. I would find it useful even if the multithreading only works in standalone.

Even if the CPU meters are a coarse estimate as you say, they are clearly showing exactly two cores being utilized to a total of ~100%. It’s clearly not a coincidental number or balance that is occurring. I can sit here and watch the two cores oscillate on the meter, but they always balance out to ~100% total.

I don’t think the two threads are waiting on each other at all. Because I’m just copying a buffer into the custom thread and reading it back out at the end of each block, they are quite desynchronized if they want to be, to the point where if I push the synth settings to the max only half the midi notes even get through to the synth, and yet it all keeps running. This shows they are not waiting on each other at all but rather able to run independently to an extent.

I don’t think there’s any concurrent use of the data now either, since I’m just copying a MIDI buffer into the custom thread and an output audio buffer out of it each time processBlock runs.

I concede I obviously don’t understand all this stuff but it seems strange to me. Something is obviously aware of how much one core is worth, and that something is limiting my synth from exceeding that value in processing. I imagine that must be Windows, since the synth itself has no way to know how much a core is or to balance each thread out so the total never exceeds 100%. Only Windows could have the knowledge to do that.

I should also mention that the CPU usage is the same on my multithreaded version (100% total CPU split between two cores) even if I am not actually running any commands through the testThread. Even if I just let it run with startThread() and do nothing else with it so it never even touches a single buffer, the same limitations apply.

This should clearly prove the issue has nothing to do with what I’m running on the thread, but rather how the thread is existing on its own. (Or what Windows is choosing to do with it.)

When developing any complex software, it is important to test every small part individually before adding things together into a more complex system. That way errors are more diagnosable. Imagine if the developers of Ableton just wrote all their code and didn’t test any of it… they didn’t study DSP, or real-time audio, or threading, or anything beyond a few YouTube tutorials, and then wrote all the code that they thought “should” work for a DAW, put it all together, and ran it. And when the entire computer went haywire, said “OK, so how do I fix my DAW?”

Can you see the problem with this approach?

That’s what you are doing.

You’re looking at the entirety of a complex system made up of untested complex parts that you barely understand, and you’re making assumptions about what you think is happening.

Do you know that for sure? Do you even know what it would look like if the two threads were waiting on each other?

Do you know that for sure? Do you know what it would look like if there was concurrent use of data?

Do you know that for sure? It’s entirely possible that what’s happening is something like this (just a guess): your synth is being assigned two cores, one core for the main audio thread, and one core for testThread. Due to some sort of pattern in your code, the threads happen to “take turns” having a higher processing load, perhaps because they take turns trying to copy data the other is working on at that moment. Or something else entirely. My point is that you’re looking at a meter and then concluding, with 100% certainty, “Windows is limiting my synth”. Which may be the case. But do you really think that your multithreading code is well written, and doesn’t need any further improvements?

If that’s true, then it sounds like your synth was getting two cores all along from the OS, and testThread was never actually being run in a separate process at all.

I’m not trying to be rude, but I am slightly frustrated that I have tried to give you some very in depth advice about how your code could be improved, and you seem to be fixated on “oh no I know for sure that the only reason my synth doesn’t work is because Windows is messing it up”

Do you even know what the margin of error for a coarse CPU meter is?

If you’ve never once opened a profiler in your life, how do you know that you wouldn’t see that the numbers are actually 218% and 386%, and that’s just how far off the coarse CPU meter is? You don’t know.

Ben, I am not saying my multithreading approach or my coding in general is perfect. Far from it. Obviously it is not. However, it is important to identify the correct issue before you try to fix something. For example, if the issue was that I needed to switch to a FIFO approach, that wouldn’t explain why I am having the same CPU limitation issue even when the thread is not receiving a buffer at all.

The fact that even when the thread is not receiving or processing any buffers of any kind it is still running in this manner suggests the issue is something else, so I need to find what that is.

I am open to improvements to my coding, and I have taken a number of your suggestions already, like using the built-in method to copy the AudioBuffers rather than using an = sign. Without your help I still would not even understand what a thread is (as was the case just 1-2 days ago). I also recognize completely how silly all the empty buffer copying is.

But all that said, it is definitely true that if I just run testThread.startThread(); on initialization and testThread does nothing inside of run() except:

void run() override {
	while (!threadShouldExit()) {
	}
}

And with no buffers or any other data fed into testThread, I still get this behavior. The standalone synth is split between two cores which always add up to 100% (e.g. 75% on one, 25% on the other).

Whereas on the other hand if I don’t start the testThread, I will just get it running with one core going up to 100% utilization.

I don’t know what that means except that testThread is not getting the resources it needs to run freely as an independent process. I certainly don’t know how to fix it and I’m very much open to suggestions.

I can run the CPU profiler in Visual Studio but I don’t know how to get a per core output from it if that would help.

If you’re not even sure what this behavior means, then why are you so sure it’s a problem, and why are you using it as a metric?

With threading, many things can happen that are not a direct result of your code, but are a result of the OS/scheduler messing with you… such as context switching to other cores, balancing work time between various processes on the same core, etc.

It’s entirely possible that:

  • When you have a thread with an empty run() method, the OS figures out it’s just an endless loop doing nothing and messes with your threads, etc
  • Your synth doesn’t need to use 100% of each core when you’re in multithreading mode, so that’s why they’re each less than 100%
  • There are errors in your usage and/or implementation of the juce::Thread class

One or all of these things could be the case, it is very hard to guess without seeing your entire source code.
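One more guess while we’re at it: your run() loop never sleeps, so even when it has nothing to do, it busy-spins and will consume 100% of whatever core the scheduler gives it. That alone could produce the meter behaviour you’re describing. juce::Thread::wait() lets the worker yield while it’s idle, something like:

	void run() override {
		while (!threadShouldExit()) {
			if (newBufferToProcess) {
				// ... do the rendering ...
				newBufferToProcess = false;
			}
			else {
				wait (1); // idle: give the core back to the scheduler instead of spinning
			}
		}
	}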

Is it really an issue? Or are you just meter watching and upset that two cores aren’t each using 100%? When you run your test with the testThread that does nothing, do you get any kind of glitches in your audio output, or can the main audio thread handle the normal processing load? Because I suspect that this isn’t actually an issue, it’s just… a weird thing that happens with CPU meters on Windows.

I think it’s highly unlikely that the issues you’re having with the full-fledged multithreaded enabled test are due to Windows limiting your CPU usage. And even if that is a factor, I would be shocked if that was the sole and only reason your synth doesn’t work.

Yes, you do. You are still most likely getting data races between your two threads. I think it is highly likely that if you follow my advice from the previous posts, and fully refactor your multithreading implementation to use a non-blocking FIFO, then you will get much better performance.

That is exactly the case. The optimiser will remove the loops, working from the inside out.

Watch the video at 30:15 (or start with the example before as well). Fabian follows the steps that the optimiser might do (and did in his experiments):


Hey Ben, I found a way to prove you were likely correct and that the CPU splitting I was seeing was not the real problem. By setting the thread affinity for my testThread to a given core, I was able to lock it to one core and let it use 100% of that core’s CPU. But it was no better for processing efficiency. So your data-races theory is likely the answer. Sorry for doubting you.
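(The pinning can be done with JUCE’s Thread::setAffinityMask; the mask value here is just an example:)

	testThread.setAffinityMask (1 << 2); // each bit enables one core; this allows core 2 only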

I spent the day playing with it, trying to set up separate buffers for the thread. I built a circular MIDI buffer and a circular output audio buffer within testThread, with a static size of 4096 samples (meant to be bigger than a typical audio buffer so it won’t run out of room).

processBlock only runs once per output block from the synth, so it needs to get a full buffer out by the time it’s finished. So each time processBlock runs, it first inputs all the new MIDI data for the block. Then my testThread starts working through the processing. The processor then asks testThread for the next buffer’s worth of processed samples, to assemble the final output buffer before processBlock finishes. To give testThread a head start, I let it return 0s for the first 1024 samples requested of it.

I was able to get some audio output, though it’s completely garbled even at low synth processing requirements. The MIDI seems to be looping itself somehow too. I think the problem is that adding an empty event to the MIDI buffer doesn’t erase the existing entry, so I am just constantly adding new data to the MIDI buffer and it’s filling up.

Internally within testThread I set up three timers:

  • midiAddTimer (to keep track of where MIDI events are added to the MIDI buffer),
  • retrievalTimer (to keep track of which sample index to return from the thread’s output buffer), and
  • renderTimer (to keep track of which sample index is being rendered at a given time by the synth).

I also used two counters to keep track of what work has been done or needs to be done:

  • numSamplesToProcess to know how many samples of MIDI data need to be processed by the synth, and
  • numSamplesReadyToRetrieve to keep track of how many samples were rendered and ready to be retrieved by the processor.

In theory the idea would be that this gives the main thread and custom thread each a bit more breathing room to desynchronize. It’s not working properly of course though. I’m not sure what I’ve screwed up. Will take some time to sort through. But before I do more of that, I’m wondering: Is this essentially what you were referring to?

Here’s the PluginProcessor code:

void AudioPlugInAudioProcessor::processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages) {

	buffer.clear();
	
	//INPUT MIDI TO CUSTOM THREAD
	MidiBuffer::Iterator midiIterator(midiMessages);
	int midiMessageTime = 0;
	MidiMessage messageToInput;
	for (int i = 0; i < buffer.getNumSamples(); i++) {
		midiIterator.setNextSamplePosition(i);
		midiIterator.getNextEvent(messageToInput, midiMessageTime);
		testThread.inputNewSample(messageToInput);
	}

	//FILL THE OUTPUT BUFFER
	while (writePointer < buffer.getNumSamples()) {
		if (testThread.numSamplesReadyToRetrieve > 0) {
			buffer.addSample(0, writePointer, testThread.retrieveSample());
			writePointer++;
		}
	}
	writePointer = 0;
}

Then my full Thread class is now:

class ThreadInherited : public Thread {

public:
	ThreadInherited(const String& threadName, MPESynthesiserInherited *mMPESynthIn, size_t threadStackSize = 0) : Thread(threadName) {
			mMpeSynthPtr = mMPESynthIn;
			customOutputBuffer.setSize(1, 4096);
			for (int i = 0; i < 4096; i++) {
				customOutputBuffer.setSample(0, i, 0);
			}
	}

	void inputNewSample(MidiMessage midiMessageIn) {
		customMidiBuffer.addEvent(midiMessageIn, midiAddTimer);

		midiAddTimer++;
		if (midiAddTimer > 4095) {
			midiAddTimer = 0;
		}
		numSamplesToProcess++;
	}

	void run() override {
		while (!threadShouldExit()) {
			
			while (numSamplesToProcess > 0) {
				
				//render one sample at a time from renderTimer
				mMpeSynthPtr->renderNextBlockCustom(customOutputBuffer, customMidiBuffer, renderTimer, 1);

				numSamplesToProcess -= 1;
				numSamplesReadyToRetrieve++;
				renderTimer++;
				if (renderTimer > 4095) {
					renderTimer = 0;
				}
			}
		}
	}

	float retrieveSample() {
		float sampleToReturn = customOutputBuffer.getSample(0, retrievalTimer);
		numSamplesReadyToRetrieve -= 1;
		retrievalTimer++;
		if (retrievalTimer > 4095) {
			retrievalTimer = 0;
		}
		return sampleToReturn;
	}

	int numSamplesReadyToRetrieve = 1024;

private:
	MPESynthesiserInherited* mMpeSynthPtr;
	AudioBuffer<float> customOutputBuffer;
	MidiBuffer customMidiBuffer;
	int midiAddTimer = 0;
	int renderTimer = 0;
	int retrievalTimer = 0;
	int numSamplesToProcess = 0;
};

Maybe that’s not the right way to do it but it’s the only way I could think of that made sense. What do you think? Is that the general idea you were suggesting? Thanks again.