drawImageAt slowness

I'm working on a plug-in and I have a frequency response sort of graph which is updated about every 50ms.  I noticed that updating the graph takes a considerable amount of CPU.  In some cases the CPU usage is much more than the DSP processing.  I see similar performance on Mac and Windows.

The bulk of CPU usage seems to be from drawImageAt.  After doing some profiling on the Mac I found most of the time is spent doing color conversion routines.  Here's the profiling info:


Running Time Self Symbol Name
1776.0ms 10.2% 0.0 juce::Graphics::drawImageAt(juce::Image const&, int, int, bool) const
1776.0ms 10.2% 0.0 juce::Graphics::drawImageTransformed(juce::Image const&, juce::AffineTransform const&, bool) const
1776.0ms 10.2% 0.0 juce::CoreGraphicsContext::drawImage(juce::Image const&, juce::AffineTransform const&)
1775.0ms 10.2% 0.0 juce::CoreGraphicsContext::drawImage(juce::Image const&, juce::AffineTransform const&, bool)
1767.0ms 10.2% 1.0 CGContextDrawImage
1766.0ms 10.2% 0.0 ripc_DrawImage
1667.0ms 9.6% 1.0 ripc_AcquireImage
1666.0ms 9.6% 0.0 CGSImageDataLock
1665.0ms 9.6% 1.0 img_data_lock
1659.0ms 9.5% 0.0 img_alphamerge_read
1482.0ms 8.5% 0.0 img_colormatch_read
1130.0ms 6.5% 0.0 CGColorTransformConvertData
1111.0ms 6.4% 0.0 CGCMSTransformConvertData
1111.0ms 6.4% 0.0 CMSTransformConvertData
1111.0ms 6.4% 0.0 CMSColorWorldConvertData
1109.0ms 6.4% 0.0 ConvertImageGeneric
1108.0ms 6.4% 0.0 ColorSyncTransformConvert
1105.0ms 6.3% 1.0 ColorSyncCMMApplyTransform
1103.0ms 6.3% 0.0 AppleCMMApplyTransform
1102.0ms 6.3% 1.0 DoApplyTransform
1096.0ms 6.3% 1.0 CMMProcessBitmap(CMMConversionParams*)
1090.0ms 6.3% 0.0 ConversionManager::ApplySequenceToBitmap(CMMConvNode*, CMMEncoDec&, CMMRuntimeInfo*, unsigned long, CMMProgressNotifier*)
1089.0ms 6.2% 2.0 long ConversionManager::DoConvert<CMM8Bits>(CMM8Bits&, CMMConvNode*, CMMEncoDec&, CMMRuntimeInfo*, unsigned long, CMMProgressNotifier*)
385.0ms 2.2% 0.0 CMM8Bit3ChanNoConvEncoder::DoEncode(CMM8Bits&, CMMRuntimeInfo*, unsigned long*, unsigned long*)
385.0ms 2.2% 385.0 CMM8Bit3ChanNoConvEncoder::InnerDoEncode(CMM8Bits const&, CMM8BitBuffer&, unsigned long*, unsigned long*)
366.0ms 2.1% 0.0 CMMConvRGBToRGB::Convert(CMM8Bits&, CMMRuntimeInfo*, unsigned int, unsigned int) const
355.0ms 2.0% 355.0 vCMMVectorConvert8BitRGBToRGB
11.0ms 0.0% 11.0 CMMConvRGBToRGB::Convert8BitMtxOnlyWithLookup(CMM3x3Type, int*, unsigned int, unsigned int, void const*, void const*) const
317.0ms 1.8% 0.0 CMM8Bit3ChanNoConvDecoder::DoDecode(CMM8Bits const&, CMMRuntimeInfo*, unsigned long)
317.0ms 1.8% 317.0 CMM8Bit3ChanNoConvDecoder::InnerDoDecode(CMM8Bits const&, CMM8BitBuffer const&, unsigned long)

 

Based on this blog (http://1014.org/?article=516) a colleague of mine made some changes that removed the color conversion overhead and seemed to make a big performance improvement.   In juce_mac_CoreGraphicsContext.mm he added the following code:

//==============================================================================
class MacColorSpace
{
public:
    static CGColorSpaceRef GetCGColorSpace ()
    {
        return MacColorSpace::instance ().colorspace;
    }
    
private:
    MacColorSpace ()
    {
        colorspace = CreateMainDisplayColorSpace ();
    }
    
    ~MacColorSpace ()
    {
        CGColorSpaceRelease (colorspace);
    }
    static MacColorSpace& instance ()
    {
        static MacColorSpace sInstance;
        return sInstance;
    }
    
    static CGColorSpaceRef CreateMainDisplayColorSpace ();
    
private:
    
    CGColorSpaceRef colorspace;
};
//==============================================================================
CGColorSpaceRef MacColorSpace::CreateMainDisplayColorSpace ()
{
#if TARGET_OS_IPHONE
    return CGColorSpaceCreateDeviceRGB ();
    
#elif MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_X_VERSION_10_7
    ColorSyncProfileRef csProfileRef = ColorSyncProfileCreateWithDisplayID (0);
    if (csProfileRef)
    {
        CGColorSpaceRef colorSpace = CGColorSpaceCreateWithPlatformColorSpace (csProfileRef);
        CFRelease (csProfileRef);
        return colorSpace;
    }
#else
    CMProfileRef sysprof = NULL;
    if (CMGetSystemProfile (&sysprof) == noErr)
    {
        CGColorSpaceRef colorSpace = CGColorSpaceCreateWithPlatformColorSpace (sysprof);
        CMCloseProfile (sysprof);
        return colorSpace;
    }
#endif
    return 0;
}

 

//In CoreGraphicsContext::CoreGraphicsContext add:
rgbColourSpace = MacColorSpace::GetCGColorSpace();

//In CoreGraphicsContext::~CoreGraphicsContext() remove:
CGColorSpaceRelease (rgbColourSpace);

 

After making these changes drawImageAt appears to be about 10 times faster on the Mac in this scenario.

144.0ms 1.0% 0.0 juce::Graphics::drawImageAt(juce::Image const&, int, int, bool) const
144.0ms 1.0% 1.0 juce::Graphics::drawImageTransformed(juce::Image const&, juce::AffineTransform const&, bool) const
142.0ms 1.0% 0.0 juce::CoreGraphicsContext::drawImage(juce::Image const&, juce::AffineTransform const&)
141.0ms 1.0% 0.0 juce::CoreGraphicsContext::drawImage(juce::Image const&, juce::AffineTransform const&, bool)
132.0ms 0.9% 0.0 CGContextDrawImage
132.0ms 0.9% 1.0 ripc_DrawImage
115.0ms 0.8% 0.0 ripc_RenderImage
114.0ms 0.8% 0.0 RIPLayerBltImage
101.0ms 0.7% 0.0 ripd_Mark
10.0ms 0.0% 0.0 ripd_Lock
2.0ms 0.0% 1.0 ripd_Unlock
1.0ms 0.0% 1.0 CGBlt_initialize
1.0ms 0.0% 0.0 <Unknown Address>
8.0ms 0.0% 1.0 ripc_GetRenderingState
4.0ms 0.0% 0.0 ripc_AcquireImage
2.0ms 0.0% 2.0 ripc_GetImageTransformation
1.0ms 0.0% 1.0 DYLD-STUB$$CGGStateGetShouldAntialias
1.0ms 0.0% 0.0 <Unknown Address>
5.0ms 0.0% 0.0 juce::CoreGraphicsImage::createImage(juce::Image const&, CGColorSpace*, bool)

 

Does this seem like a reasonable change to you guys?  Can you think of any downsides to doing it this way?  Any chance this could be incorporated into the Juce sources?  

The performance on Windows of drawImageAt also seems to be pretty slow, although I haven't profiled it yet, so I'm not sure what might be the cause.  It would be great to improve that also.

Thoughts?

 

Thanks,

Chris

 

 

Hmm, I think this must be device-dependent or something - I just tried it myself and it made absolutely no difference at all to the rendering times in the juce demo. Does the rendering demo run faster on your test platform?

(..also, it would need to be smarter than just calling ColorSyncProfileCreateWithDisplayID (0) because people do have multiple displays, and those displays may change while your program is running)

So far we've reproduced the problem on a Mac Mini (2013, i7, OSX 10.8.6) and a MacBook Pro (15-inch, Mid 2009, Mac OS X 10.9.4).

And I can repro the problem with the Juce Demo using the "2D Rendering" on the "Images: RGB" setting.  It helps to make the difference more obvious if the drawImageTransformed call in ImagesRenderingDemo in GraphicsDemo.cpp is changed to drawImageAt.  More specifically:

from: g.drawImageTransformed (image, transform, false);
to:    g.drawImageAt(image, 0, 0);

When I do that the rendering time averages about 5.0ms on my Mac Mini.  When I make the color space change the rendering time is about 0.3ms, which is comparable to the 10x difference we are seeing in the profiler with the plugin.  The difference does happen with drawImageTransformed, but can be a little bit obscured by some of the overhead in that function.
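For anyone who wants to compare numbers on their own machine, here is a minimal sketch of how the averaged timings could be taken outside the demo. The names averageRenderMillis and speedupFactor are made up for illustration and are not part of JUCE; the callable you pass in would wrap whatever drawing call you want to measure.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of JUCE): runs `render` `iterations` times
// and returns the mean wall-clock time per call in milliseconds.
double averageRenderMillis (const std::function<void()>& render, int iterations)
{
    assert (iterations > 0);

    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();

    for (int i = 0; i < iterations; ++i)
        render();

    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}

// Ratio of two averaged timings, e.g. before and after the colour space change.
double speedupFactor (double millisBefore, double millisAfter)
{
    assert (millisAfter > 0.0);
    return millisBefore / millisAfter;
}
```

Plugging in the two averages quoted above, speedupFactor (5.0, 0.3) comes out at roughly 16.7, so the demo shows at least as large a gap as the profiler did.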

What do you see when using drawImageAt?


There's no measurable difference in performance on my MBP running 10.9, regardless of which monitor I use or the way that drawimage is called.. It's always 0.2ms. Maybe mine is using hardware acceleration for colourspace conversion and yours isn't?

 

My Mac Mini has integrated Intel HD 4000 video, which is of course not the greatest in terms of performance.  I would assume your MBP has a higher performance dedicated video card.  I also tried it on a 2011 13" MBP and a 2014 MBA.  The problem did NOT occur on the Pro but did occur on the Air.  Not surprisingly the Air has Intel HD 4000 graphics, and the Pro has an NVIDIA GeForce 320M card.  

Given how common the Intel integrated graphics are on Macs these days I would love to find some sort of resolution for this issue.

Well, if you could suggest some code that would cope with different monitors and changes to the display set-up, and which still causes an improvement on your machine then I'll work with you to figure something out..

Any chance you could add an option to allow the user to draw without color space conversion if desired?  In this case any potential color rendering issues would be up to the user to deem acceptable or not.  Kind of ugly, but I don't have any better ideas.

Not sure what you mean.. In CoreGraphics it's not possible to create or draw a bitmap without specifying some kind of colour space. I was already using CGColorSpaceCreateDeviceRGB which is about the closest function to "I don't care about the colour space" as I could find.

I guess the option would be "use primary display color space", which would result in no color conversion being applied, since it would already be in the right format.  

That's exactly why I always used CGColorSpaceCreateDeviceRGB(). This is what the docs say about it:

https://developer.apple.com/library/ios/documentation/graphicsimaging/reference/CGColorSpace/Reference/reference.html#//apple_ref/c/func/CGColorSpaceCreateDeviceRGB

 

I suspect the docs are incorrect.  I found this slightly earlier version of the CGColorSpace reference which says:

http://www.filibeto.org/unix/macos/lib/dev/documentation/GraphicsImaging/Reference/CGColorSpace/CGColorSpace.pdf


CGColorSpaceCreateDeviceRGB

Discussion

In Mac OS X v10.4 and later, this color space is no longer device-dependent and is replaced by the
generic counterpart—kCGColorSpaceGenericRGB—described in “Color Space Names” (page 19).
If you use this function in Mac OS X v10.4 and later, colors are mapped to the generic color spaces.
If you want to bypass color matching, use the color space of the destination context.

Colors in a device-dependent color space are not transformed or otherwise modified when displayed
on an output device—that is, there is no attempt to maintain the visual appearance of a color. As a
consequence, colors in a device color space often appear different when displayed on different output
devices. For this reason, device color spaces are not recommended when color preservation is
important.

Availability
Available in Mac OS X v10.0 and later.

The first paragraph seems to totally contradict the second paragraph.  The first paragraph seems to match up exactly with what I'm seeing, in that color matching is always applied.  I think this is why the "hack" of using the primary display's color space improves performance.  And using the kCGColorSpaceGenericRGB color space seems to be just as slow as using CGColorSpaceCreateDeviceRGB.  
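To illustrate the reasoning, here is a toy model of what I think is going on. This is NOT how CoreGraphics is implemented; the blit function, the integer colour space ids and the 3x3 matrix are all made up for the example. The only point is that when the source and destination share a colour space the pixels can be copied straight across, and otherwise every pixel pays for a conversion.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

using Matrix3 = std::array<std::array<float, 3>, 3>;

// Clamps to the 0..255 range and rounds to the nearest byte value.
static std::uint8_t toByte (float v)
{
    const float clamped = std::max (0.0f, std::min (255.0f, v));
    return static_cast<std::uint8_t> (clamped + 0.5f);
}

// Toy blit (hypothetical, for illustration only). Returns the number of
// pixels that needed a colour conversion.
std::size_t blit (const std::vector<Rgb>& src, int srcSpaceId,
                  std::vector<Rgb>& dst, int dstSpaceId,
                  const Matrix3& srcToDst)
{
    assert (src.size() == dst.size());

    if (srcSpaceId == dstSpaceId)
    {
        // Same colour space: a plain copy, no per-pixel maths.
        std::copy (src.begin(), src.end(), dst.begin());
        return 0;
    }

    // Different colour spaces: every pixel goes through the matrix.
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        const float in[3] = { float (src[i].r), float (src[i].g), float (src[i].b) };
        float out[3] = { 0.0f, 0.0f, 0.0f };

        for (int row = 0; row < 3; ++row)
            for (int col = 0; col < 3; ++col)
                out[row] += srcToDst[row][col] * in[col];

        dst[i] = Rgb { toByte (out[0]), toByte (out[1]), toByte (out[2]) };
    }

    return src.size();
}
```

If the real code path behaves anything like this, creating the image in the destination's colour space would land in the cheap branch, which would fit the img_colormatch_read branch no longer showing up in the second profile.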

My guess is that this was a copy and paste error where the previous text was not removed, and then later someone "cleaned up" the doc by removing the new text instead of the old text.

Thoughts?

 

Really not sure about all this.

I'm reluctant to mess around with the colour space in case people are relying on the way it currently works. And since I can't seem to get the results you're seeing, and because of all this confusion over whether Apple's code actually does what it's supposed to do or not, and because I have a million other priorities right now, I'll have to leave it with you to try to suggest some code that works in your tests, and which I could use!

Makes sense.  I'll see if we can come up with anything that will work more generally.
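As a starting point, here is a rough sketch of the shape such a thing might take: a cache keyed by display, rebuilt lazily whenever the display set-up changes. It is plain C++ and untested against CoreGraphics. PerDisplayCache is a hypothetical name, Resource stands in for whatever would wrap a CGColorSpaceRef, and the factory stands in for the ColorSyncProfileCreateWithDisplayID code above called with a real display ID instead of 0. I'm assuming invalidate() could be driven from a display reconfiguration callback (CGDisplayRegisterReconfigurationCallback looks like the right hook), and working out which display a window is actually on is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical per-display cache (not JUCE code).
template <typename Resource>
class PerDisplayCache
{
public:
    using DisplayID = std::uint32_t;
    using Factory = std::function<std::shared_ptr<Resource> (DisplayID)>;

    explicit PerDisplayCache (Factory f) : create (std::move (f))
    {
        assert (create != nullptr);
    }

    // Returns the cached resource for a display, creating it on first use.
    std::shared_ptr<Resource> get (DisplayID display)
    {
        auto it = cache.find (display);

        if (it != cache.end())
            return it->second;

        auto created = create (display);

        if (created != nullptr)     // don't cache failures, so the next call retries
            cache[display] = created;

        return created;
    }

    // Call when displays are added, removed or reconfigured.
    void invalidate()           { cache.clear(); }

    std::size_t size() const    { return cache.size(); }

private:
    Factory create;
    std::unordered_map<DisplayID, std::shared_ptr<Resource>> cache;
};
```

Using shared_ptr means a context that is mid-draw keeps its colour space alive even if the cache is cleared underneath it, which seemed the safest choice given that displays can change while the program is running.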