Ok, I tried out Intel’s FilterGaussianBorder on a 500x500px image - they don’t seem to have a more efficient blur option and I was too lazy to write a custom kernel for tent convolution.
Overall, it’s slightly slower than Stack on my current machine (AMD Ryzen 9 5900HX), which is maybe to be expected since Gaussian does a lot of work…
One note: drawImageAt seemed strangely expensive on Windows (I showed a milder version here), regularly taking up to 5-10ms to draw the 500x500px image, which seemed a bit suspicious. I saw this on the Mac machines too, but there it seemed to only happen on the first paint or two, so I assumed some allocation or memory cache effect was going on. Notable, because in those cases drawing the image is 5-10x more expensive than creating it, taking the whole operation out of the “safely animatable” range of timings (I consider this to be <5-10ms on nicer machines).
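For context, this is roughly the kind of thing I mean by timing the draw - a minimal sketch, assuming a JUCE paint callback; BlurTimingDemo and blurredImage are just illustrative names, not the actual code:

#include <juce_gui_basics/juce_gui_basics.h>

struct BlurTimingDemo : public juce::Component
{
    // stand-in for the blurred 500x500 image (in the real code it comes out of the IPP blur below)
    juce::Image blurredImage { juce::Image::SingleChannel, 500, 500, true };

    void paint (juce::Graphics& g) override
    {
        auto t0 = juce::Time::getMillisecondCounterHiRes();

        g.drawImageAt (blurredImage, 0, 0);

        auto t1 = juce::Time::getMillisecondCounterHiRes();

        // the suspicious case: on Windows this draw alone was regularly hitting 5-10ms for 500x500px
        DBG ("drawImageAt took " + juce::String (t1 - t0, 3) + " ms");
    }
};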
The ippi API is pretty gross: it requires in/out vars, several calls just to prep things, and manual allocation/freeing. I had no idea what to choose for the Gaussian sigma (or what’s normal), so I tuned it by eye:
// intel calls the area being operated on roi (region of interest)
IppiSize roiSize = {(int) width, (int) height};

int specSize = 0;
int tempBufferSize = 0;
Ipp8u borderValue = 0;

// ask IPP how big the spec struct and scratch buffer need to be for this
// roi / kernel size, then allocate both with the IPP allocator
ippiFilterGaussianGetBufferSize(roiSize, radius * 2 + 1, ipp8u, 1, &specSize, &tempBufferSize);
auto pSpec = (IppFilterGaussianSpec *) ippsMalloc_8u(specSize);
auto pBuffer = ippsMalloc_8u(tempBufferSize);

// kernel size is 2 * radius + 1, sigma of 10 was tuned by eye,
// replicated edges for the border, 8-bit single channel
ippiFilterGaussianInit(roiSize, radius * 2 + 1, 10, ippBorderRepl, ipp8u, 1, pSpec, pBuffer);

// run the blur from the source pixels (data) into the destination (blurData)
auto status = ippGetStatusString(ippiFilterGaussianBorder_8u_C1R(
    (Ipp8u *) data.getLinePointer(0), data.lineStride,
    (Ipp8u *) blurData.getLinePointer(0), blurData.lineStride,
    roiSize, borderValue, pSpec, pBuffer));

ippsFree(pSpec);
ippsFree(pBuffer);
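If I keep using this, I’d probably wrap the manual ippsMalloc_8u / ippsFree pairing in something RAII-ish so nothing leaks on an early return. A minimal sketch, assuming the calls above; IppFreeDeleter and IppBuffer are names I’m making up for this example:

#include <ipp.h>
#include <memory>

// free IPP allocations with ippsFree when the unique_ptr goes out of scope
struct IppFreeDeleter
{
    void operator() (Ipp8u* p) const noexcept { ippsFree(p); }
};

using IppBuffer = std::unique_ptr<Ipp8u, IppFreeDeleter>;

// usage:
//   IppBuffer spec   { ippsMalloc_8u(specSize) };
//   IppBuffer buffer { ippsMalloc_8u(tempBufferSize) };
//   ... pass reinterpret_cast<IppFilterGaussianSpec*>(spec.get()) where the spec is needed,
//   and drop the explicit ippsFree calls at the end.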
Anyway, I went down this path originally because I thought it would be interesting to try an IPP/vDSP-optimized version of stack blur, but got distracted by these built-in functions. Pretty neat that those are included. I might eventually try writing a stack blur algo, but it’s back to work for now!