OpenGLRenderer Massive Performance Hit With Multiple Instances


#1

We’re encountering lots of performance issues with juce::OpenGLRenderer on Windows 10 when running multiple instances of our plugin.

After a few instances things start becoming very unresponsive, and with more instances the rendering of the OpenGLRenderer gets slower as well.

Our plugin itself uses component rendering and an OpenGLRenderer, but we encounter the same issue if we disable the component rendering.

Below is a simple PIP that exhibits the problem for us (we ran it as a Release build, as a VST in Reaper and Mixbus). Since there is no component rendering going on, we don’t need to acquire a MessageManager::Lock in OpenGLContext::CachedImage::renderFrame().

I would have thought that each thread could then run its rendering without experiencing these issues, since they seem to be more specifically related to the use of the MessageManager::Lock, but the PIP example slows down considerably when we start opening multiple instances :confused:

/*******************************************************************************
 The block below describes the properties of this PIP. A PIP is a short snippet
 of code that can be read by the Projucer and used to generate a JUCE project.

 BEGIN_JUCE_PIP_METADATA

  name:             GLTestPlugin

  dependencies:     juce_audio_basics, juce_audio_devices, juce_audio_formats, juce_audio_plugin_client, 
                    juce_audio_processors, juce_audio_utils, juce_core, juce_data_structures, juce_events, 
                    juce_graphics, juce_gui_basics, juce_gui_extra, juce_opengl
  exporters:        vs2017

  type:             AudioProcessor
  mainClass:        GLTest

 END_JUCE_PIP_METADATA

*******************************************************************************/

#pragma once

//==============================================================================
class GLTestEditor : public AudioProcessorEditor,
					 public OpenGLRenderer
{
public:
	GLTestEditor(AudioProcessor &p) : AudioProcessorEditor(p)
	{
		setSize(100, 100);

		context.setContinuousRepainting(true);
		context.setComponentPaintingEnabled(false);
		context.setRenderer(this);
		context.attachTo(*this);
	}

	~GLTestEditor()
	{

	}

	void newOpenGLContextCreated() override
	{

	}

	void renderOpenGL() override
	{
		OpenGLHelpers::clear(Colour((uint32)Random().nextInt()).withAlpha(1.0f));
	}

	void openGLContextClosing() override
	{

	}

	void paint(Graphics &g) override
	{
		g.setFont(32);
		g.setColour(Colours::white);
		g.drawText("JUCE", getLocalBounds(), Justification::centred);
	}

private:
	OpenGLContext context;

	JUCE_DECLARE_NON_COPYABLE_WITH_LEAK_DETECTOR(GLTestEditor);
};

//==============================================================================
class GLTest  : public AudioProcessor
{
public:
    //==============================================================================
    GLTest()
        : AudioProcessor (BusesProperties().withInput  ("Input",  AudioChannelSet::stereo())
                                           .withOutput ("Output", AudioChannelSet::stereo()))
    {
    }

    ~GLTest()
    {
    }

    //==============================================================================
    void prepareToPlay (double, int) override
    {
        // Use this method as the place to do any pre-playback
        // initialisation that you need..
    }

    void releaseResources() override
    {
        // When playback stops, you can use this as an opportunity to free up any
        // spare memory, etc.
    }

    void processBlock (AudioBuffer<float>& buffer, MidiBuffer&) override
    {
        ScopedNoDenormals noDenormals;
        auto totalNumInputChannels  = getTotalNumInputChannels();
        auto totalNumOutputChannels = getTotalNumOutputChannels();

        // In case we have more outputs than inputs, this code clears any output
        // channels that didn't contain input data, (because these aren't
        // guaranteed to be empty - they may contain garbage).
        // This is here to avoid people getting screaming feedback
        // when they first compile a plugin, but obviously you don't need to keep
        // this code if your algorithm always overwrites all the output channels.
        for (auto i = totalNumInputChannels; i < totalNumOutputChannels; ++i)
            buffer.clear (i, 0, buffer.getNumSamples());

        // This is the place where you'd normally do the guts of your plugin's
        // audio processing...
        // Make sure to reset the state if your inner loop is processing
        // the samples and the outer loop is handling the channels.
        // Alternatively, you can process the samples with the channels
        // interleaved by keeping the same state.
        for (int channel = 0; channel < totalNumInputChannels; ++channel)
        {
            auto* channelData = buffer.getWritePointer (channel);

            // ..do something to the data...
        }
    }

    //==============================================================================
    AudioProcessorEditor* createEditor() override          { return new GLTestEditor(*this); }
    bool hasEditor() const override                        { return true;   }

    //==============================================================================
    const String getName() const override                  { return "GLTestPlugin"; }
    bool acceptsMidi() const override                      { return false; }
    bool producesMidi() const override                     { return false; }
    double getTailLengthSeconds() const override           { return 0; }

    //==============================================================================
    int getNumPrograms() override                          { return 1; }
    int getCurrentProgram() override                       { return 0; }
    void setCurrentProgram (int) override                  {}
    const String getProgramName (int) override             { return {}; }
    void changeProgramName (int, const String&) override   {}

    //==============================================================================
    void getStateInformation (MemoryBlock& destData) override
    {
        // You should use this method to store your parameters in the memory block.
        // You could do that either as raw data, or use the XML or ValueTree classes
        // as intermediaries to make it easy to save and load complex data.
    }

    void setStateInformation (const void* data, int sizeInBytes) override
    {
        // You should use this method to restore your parameters from this memory block,
        // whose contents will have been created by the getStateInformation() call.
    }

    //==============================================================================
    bool isBusesLayoutSupported (const BusesLayout& layouts) const override
    {
        // This is the place where you check if the layout is supported.
        // In this template code we only support mono or stereo.
        if (layouts.getMainOutputChannelSet() != AudioChannelSet::mono()
            && layouts.getMainOutputChannelSet() != AudioChannelSet::stereo())
            return false;

        // This checks if the input layout matches the output layout
        if (layouts.getMainOutputChannelSet() != layouts.getMainInputChannelSet())
            return false;

        return true;
    }

private:
    //==============================================================================
    JUCE_DECLARE_NON_COPYABLE_WITH_LEAK_DETECTOR (GLTest)
};

#2

The issue seems to be that having multiple instances of these plugins in the DAW means the concurrent OpenGL calls create an interleaved set of commands - causing the GPU to have to switch contexts constantly to carry out what we need. This type of situation is described here in this post under the “Performance” section:

When I run the plugins as standalone applications (or in a host that supports running plugins as dedicated processes), the issue does not occur, but not many hosts currently support dedicated-process plugins.

I had noticed there is the NativeContext::Locker object, but the body of that class only seems to apply to macOS and not Windows or Linux. It seems that would need some sort of synchronization code implemented for these platforms?
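
To make the idea concrete, here is a rough sketch of what "some sort of synchronization" could look like, written in plain C++ so it stands alone. Everything in it (sharedRenderMutex, renderSerialised, peakConcurrentRenders) is a hypothetical name and not part of JUCE, and I have not verified that serialising the calls this way actually removes the slowdown. It only shows that a function-local static mutex is shared by all instances in one process, so their render callbacks never overlap.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical process-wide lock. Every plugin instance loaded from the same
// binary into the same host process would share this one mutex.
inline std::mutex& sharedRenderMutex()
{
    static std::mutex mutex;
    return mutex;
}

// Runs one frame's worth of rendering work while holding the shared lock.
template <typename RenderFn>
void renderSerialised (RenderFn&& render)
{
    std::lock_guard<std::mutex> lock (sharedRenderMutex());
    render();
}

// Stand-in for several editors rendering on their own threads.
// Returns the largest number of render callbacks seen running at once.
inline int peakConcurrentRenders (int numInstances, int framesPerInstance)
{
    assert (numInstances > 0 && framesPerInstance > 0);

    std::atomic<int> inFlight { 0 };
    std::atomic<int> peak { 0 };
    std::vector<std::thread> renderThreads;

    for (int i = 0; i < numInstances; ++i)
    {
        renderThreads.emplace_back ([&inFlight, &peak, framesPerInstance]
        {
            for (int frame = 0; frame < framesPerInstance; ++frame)
            {
                renderSerialised ([&inFlight, &peak]
                {
                    const int now = ++inFlight;   // the real GL calls would go here
                    int seen = peak.load();
                    while (now > seen && ! peak.compare_exchange_weak (seen, now)) {}
                    --inFlight;
                });
            }
        });
    }

    for (auto& t : renderThreads)
        t.join();

    return peak.load();
}
```

In the PIP, the body of renderOpenGL() would be what gets passed to renderSerialised(). Note that this only covers the calls made inside renderOpenGL(), not whatever the context does around it.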

Likewise, using setComponentPaintingEnabled(true) should keep these calls in the correct order - since only one context can be holding the MessageManager lock - but with it enabled we still run into the issues described here:

For reference, our Win10 test machine uses an AMD Radeon RX 480 with driver version 22.19.172.769.


#3

Not sure if this is related to your problem, but I noticed that if you use a lot of images, you can improve performance drastically by avoiding texture uploads. Yes, I profiled this :slight_smile:

The cache size set by
void setImageCacheSize (size_t cacheSizeBytes) noexcept;
in OpenGLContext is about 8MB by default, IIRC. Every Image is cached as a GL texture, and if the pixel data changes it is re-uploaded every frame (or even every drawImage() call).

This means that if you use big image strips, they can easily exceed the cache size, so textures are constantly re-uploaded for every plugin instance.
Also, if you use some form of frame clipping mechanism, via
Image getClippedImage (const Rectangle< int > &area) const;
the clipped Images are also re-uploaded every time, because the ImagePixelData pointer is detected as new pixel data. So you may want to cache your clipped Images before using g.drawImage(…).
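
To illustrate why exceeding the cache size hurts so much, here is a toy model of a byte-budgeted texture cache. It assumes least-recently-used eviction, which is my assumption for the sketch and not something I have checked against JUCE's implementation; TextureCacheModel and uploadsFor are made-up names. It counts how many "uploads" happen when the same set of images is drawn in the same order every frame.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy model only: a texture cache with a byte budget and LRU eviction.
class TextureCacheModel
{
public:
    explicit TextureCacheModel (std::size_t budgetBytes) : budget (budgetBytes)
    {
        assert (budgetBytes > 0);
    }

    // Returns true if the image was already resident, false if it had to be uploaded.
    bool use (int imageId, std::size_t sizeBytes)
    {
        auto found = index.find (imageId);

        if (found != index.end())
        {
            lru.splice (lru.begin(), lru, found->second);   // mark as most recently used
            return true;
        }

        ++uploads;
        lru.push_front ({ imageId, sizeBytes });
        index[imageId] = lru.begin();
        used += sizeBytes;

        // Evict the least recently used entries until we are back under budget.
        while (used > budget && lru.size() > 1)
        {
            used -= lru.back().bytes;
            index.erase (lru.back().id);
            lru.pop_back();
        }

        return false;
    }

    int uploads = 0;

private:
    struct Entry { int id; std::size_t bytes; };

    std::size_t budget;
    std::size_t used = 0;
    std::list<Entry> lru;
    std::unordered_map<int, std::list<Entry>::iterator> index;
};

// Draws images 0..numImages-1 in order for numFrames frames and
// returns the total number of uploads the model performed.
inline int uploadsFor (std::size_t budgetBytes, int numImages,
                       std::size_t bytesPerImage, int numFrames)
{
    TextureCacheModel cache (budgetBytes);

    for (int frame = 0; frame < numFrames; ++frame)
        for (int id = 0; id < numImages; ++id)
            cache.use (id, bytesPerImage);

    return cache.uploads;
}
```

With an 8MB budget, three 2MB images drawn over 10 frames cost 3 uploads in this model, while five 2MB images cost 50: once the working set is bigger than the budget, every single draw misses. Raising the budget with setImageCacheSize(), or shrinking the images, gets you back to the first case.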

Yes, performance could be better. It has improved a lot over the last few years, but compared to a modern 2D game, even with OpenGLContext, image performance could still be better by a big factor. I understand that’s probably due to compatibility with the software renderer, but is it really necessary to split a g.drawImage() call into many strips (one per row of the area)? Why not use a single quad? Isn’t the cost of big vertex uploads probably higher than the cost of some unnecessary fill on the GPU?


#4

Actually, I just fixed the problem: it turns out the issue was all down to the swap interval!

By default OpenGLContext sets the swap interval to 1, but once I set it to 0 everything ran fine. In our plugins I noticed no vertical tearing or other artifacts either, so we will likely end up keeping it at 0.
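
For anyone wanting to try the same change, here is a minimal sketch. The only JUCE calls it relies on are OpenGLContext::setSwapInterval() and getSwapInterval(). The helper is a template purely so the snippet compiles without JUCE, and StubContext is a hypothetical stand-in that mimics those two members. I am assuming the call only succeeds once the native context exists, which is why it would go in newOpenGLContextCreated().

```cpp
#include <cassert>

// Hypothetical stand-in so this sketch compiles without JUCE. It mirrors only
// the two juce::OpenGLContext members used here.
struct StubContext
{
    int  interval = 1;       // the default mentioned above
    bool active   = false;   // assumption: the real call needs a live context

    bool setSwapInterval (int numFramesPerSwap)
    {
        if (! active)
            return false;

        interval = numFramesPerSwap;
        return true;
    }

    int getSwapInterval() const    { return interval; }
};

// Turns off waiting for vertical sync on buffer swap.
// In the PIP this would be called as disableVSync (context) from
// newOpenGLContextCreated(), with Context = juce::OpenGLContext.
// Returns false if the platform or driver refused the setting.
template <typename Context>
bool disableVSync (Context& context)
{
    return context.setSwapInterval (0);
}
```

It is worth checking the return value, since setSwapInterval() reports whether the setting was actually applied and not every platform supports it.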

I’m not sure why it caused such an issue, but from what I’ve read while researching the problem, calling SwapBuffers(HDC) from a non-main thread may cause problems. I’ll have to look into that some more.

