CameraDevice listener callback performance (Mac)

Hi folks,

I’m working on a live video processing app and have been playing with the CameraDevice class a bit. I noticed something odd: adding a listener incurs a rather significant CPU and memory penalty, even if the callback does nothing. On my newish MacBook Pro (running Sierra) this means roughly 50% CPU usage just passing frames to the listener, which is far more overhead than my app can tolerate.

After doing some digging, I noticed this call in the Mac CameraDevice Pimpl:

internal->callListeners ([CIImage imageWithCVImageBuffer: videoFrame],
                         (int) CVPixelBufferGetWidth (videoFrame),
                         (int) CVPixelBufferGetHeight (videoFrame));

And then in the callListeners method, another conversion using this method:

Image juce_createImageFromCIImage (CIImage* im, int w, int h)
{
    CoreGraphicsImage* cgImage = new CoreGraphicsImage (Image::ARGB, w, h, false);

    CIContext* cic = [CIContext contextWithCGContext: cgImage->context options: nil];
    [cic drawImage: im inRect: CGRectMake (0, 0, w, h) fromRect: CGRectMake (0, 0, w, h)];
    CGContextFlush (cgImage->context);

    return Image (cgImage);
}

So effectively we’re taking a CoreVideo pixel buffer, converting it to a CIImage, drawing that into a CoreGraphics context, and then copying the data into a JUCE Image. There are at least two issues here: in my experience these context draws are comparatively slow operations, and the per-frame Objective-C object allocations are costly (quite unlike the core JUCE classes).

My question, in a roundabout way: would it not be preferable to copy the contents of the pixel buffer directly into a JUCE Image? I did a little hacking on CameraDevice and managed to get CPU usage down to nominal levels with the implementation below:

void callListeners (CVImageBufferRef frame, int w, int h)
{
    Image image (Image::ARGB, w, h, false);
    const Image::BitmapData bm (image, Image::BitmapData::writeOnly);

    CVPixelBufferLockBaseAddress (frame, 0);
    memcpy (bm.data, CVPixelBufferGetBaseAddress (frame), (size_t) w * (size_t) h * 4);
    CVPixelBufferUnlockBaseAddress (frame, 0);

    const ScopedLock sl (listenerLock);

    for (int i = listeners.size(); --i >= 0;)
    {
        CameraDevice::Listener* const l = listeners[i];

        if (l != nullptr)
            l->imageReceived (image);
    }
}
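One caveat I should flag with my own code (an assumption on my part, not something I've hit in testing yet): the single memcpy assumes CVPixelBufferGetBytesPerRow() equals w * 4, but CoreVideo is allowed to pad rows, in which case the image would come out skewed. A stride-aware, row-by-row copy would be safer; here's a minimal sketch in plain C++, with the CoreVideo and JUCE calls stubbed out as plain pointers and sizes (copyFrameRows is a hypothetical helper name):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy a BGRA frame whose rows may be padded
// (srcBytesPerRow >= w * 4, as reported by CVPixelBufferGetBytesPerRow)
// into a tightly packed destination such as a JUCE BitmapData's data pointer.
static void copyFrameRows (uint8_t* dst, const uint8_t* src,
                           int w, int h, size_t srcBytesPerRow)
{
    const size_t dstBytesPerRow = (size_t) w * 4; // tightly packed BGRA

    if (srcBytesPerRow == dstBytesPerRow)
    {
        // Fast path: no row padding, one big copy is fine.
        memcpy (dst, src, dstBytesPerRow * (size_t) h);
    }
    else
    {
        // Padded rows: copy only the meaningful bytes of each row,
        // advancing the source pointer by its real stride.
        for (int y = 0; y < h; ++y)
            memcpy (dst + (size_t) y * dstBytesPerRow,
                    src + (size_t) y * srcBytesPerRow,
                    dstBytesPerRow);
    }
}
```

In the real callback the source pointer and stride would come from CVPixelBufferGetBaseAddress() and CVPixelBufferGetBytesPerRow() between the lock/unlock calls, and on the JUCE side BitmapData::lineStride would be the destination stride if it ever differs from w * 4.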

Note that since QuickTime defaults to the 2vuy pixel format, this requires setting up the capture session to explicitly output BGRA, which avoids having to do any conversion. Otherwise everything seems to be working smoothly.
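For reference, this is roughly the capture-output configuration I mean (a sketch against the current QTKit-based implementation; `captureOutput` here stands in for whatever QTCaptureDecompressedVideoOutput the Pimpl holds, and I'm assuming kCVPixelFormatType_32BGRA matches JUCE's ARGB layout on little-endian Macs):

```objc
// Sketch: ask the capture output for BGRA frames up front, so the
// delegate callback can memcpy straight into a JUCE Image with no
// per-frame colour-space conversion.
[captureOutput setPixelBufferAttributes:
    [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithUnsignedInt: kCVPixelFormatType_32BGRA],
            (id) kCVPixelBufferPixelFormatTypeKey,
        nil]];
```

In an AVFoundation rewrite, the equivalent would be setting the same kCVPixelBufferPixelFormatTypeKey entry in AVCaptureVideoDataOutput's videoSettings dictionary.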

I’d love some feedback on this, because barring a new JUCE release that covers it I’ll probably be writing my own AVFoundation wrapper in the near future and would like to avoid similar problems. But I’m pretty new to JUCE and haven’t done a ton of QT/CG/CV stuff before. Does this look like a workable solution? If so I’d be happy to submit a PR, but if this is janky or dangerous let me know as well.




We’ve already added AVFoundation support to the develop branch. Does this improve things?

Thanks for your reply, t0m, I didn’t even think to look there. I’ll check it out and let you know how it goes.

Hi t0m, I got a chance to browse the develop branch. It looks as though AVFoundation support is only in the MoviePlayerComponent at the moment, not the CameraDevice, which at a glance looks to be the same as the current stable implementation. My issue above has less to do with AVFoundation vs QuickTime per se than with the relative performance of CIImage rendering vs pixel buffer access in the listener callback.

It’s nice to see that AVFoundation support is coming along, though, and I’m looking forward to seeing this stuff make its way into master for JUCE 5. In the meantime I’ll probably just use my own implementation, as I’ll need to do something similar for movie playback anyway in order to get access to frame data. I get the impression that these classes were not intended for my use case, and that’s fine.

Thanks for taking the time to respond!