Faster Blur? || Glassmorphism UI

Try this out. Usage: blurImage<1>(image); use blurImage<2>(image) for a stronger blur. It doesn’t have many options, and I know it won’t cover all the use cases discussed above, but it’s really fast and self-contained.

When set to true, the second template argument applies a Windows Aero-like blur with enhanced contrast, which looks better if you want to overlay a semi-transparent colour on top of the blurred image.

Because both blur parameters are template arguments, the branches are elided, and the values are turned into compile-time constants.

// SPDX-License-Identifier: Zlib
// Copyright (c)2020 by George Yohng

#pragma once

#include "JuceHeader.h"

template<int blurShift, bool enhanceContrast=false>
inline void blurImage(Image &img) {
    if (!img.isValid() || !img.getWidth() || !img.getHeight()) return;
    img = img.convertedToFormat(Image::ARGB);
    Image::BitmapData bm(img, 0, 0, img.getWidth(), img.getHeight(), Image::BitmapData::readWrite);
    int h = img.getHeight();
    int w = img.getWidth();
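    // Horizontal pass: for each row and channel, run a one-pole IIR low-pass
    // forward and then backward; s is a 16.16 fixed-point accumulator.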
    for (int y = 0; y < h; y++) {
        for (int c = 0; c < 4; c++) {
            uint8 *p = bm.getLinePointer(y) + c;
            int s = p[0] << 16;
            for (int x = 0; x < w; x++, p += 4) {
                int px = int(p[0]) << 16;
                s += (px - s) >> blurShift;
                p[0] = s >> 16;
            }

            p -= 4; // step back onto the last pixel, then filter right to left
            for (int x = 0; x < w; x++, p -= 4) {
                int px = int(p[0]) << 16;
                s += (px - s) >> blurShift;
                p[0] = s >> 16;
            }
        }
    }

    // Vertical pass: the same forward/backward one-pole filter, run down each column.
    for (int x = 0; x < w; x++) {
        for (int c = 0; c < 4; c++) {
            uint8 *p = bm.getPixelPointer(x, 0) + c;
            int incr = int(bm.getPixelPointer(x, 1) - bm.getPixelPointer(x, 0)); // distance between consecutive rows (line stride)
            int s = p[0] << 16;
            for (int y = 0; y < h; y++, p += incr) {
                int px = int(p[0]) << 16;
                s += (px - s) >> blurShift;
                p[0] = s >> 16;
            }

            p -= incr; // step back onto the bottom pixel, then filter upwards
            for (int y = 0; y < h; y++, p -= incr) {
                int px = int(p[0]) << 16;
                s += (px - s) >> blurShift;
                if (enhanceContrast) {
                    px = s >> 8;
                    px = ((((98304 - px) >> 7) * px >> 16) * px >> 16); // sine clamp
                    p[0] = jlimit(0, 255, px);
                } else {
                    p[0] = s >> 16;
                }
            }
        }
    }
} // blurImage

First of all, a huge thank you to everybody here! I’ve been having trouble with the JUCE stock drop shadows.

I’m trying to use this implementation by @LukeM1.

I have a problem with the spread parameter. It seems to offset the shadow by an amount proportional to the path’s distance from the origin (top left).

Here’s the shadow rendered with no spread at (500, 500).

g.fillAll (juce::Colours::white);
auto box = juce::Rectangle<float> (500, 500, 200, 100);
auto cornerRadius = 15;
auto colour = juce::Colours::black.withAlpha (0.5f);
juce::Path path;
path.addRoundedRectangle (box, cornerRadius);
auto shadow = juce::StackShadow (colour, { 0, 0 }, 30, 0);
shadow.drawOuterShadowForPath (g, path);
g.setColour (juce::Colours::red);
g.drawRoundedRectangle (box, cornerRadius, 1);

[Screenshot: shadow rendered with no spread, box outlined at (500, 500)]
Here’s the same with spread = 10

auto shadow = juce::StackShadow (colour, { 0, 0 }, 30, 10);

[Screenshot: the same shadow with spread = 10, visibly offset from the box]
If I draw the box closer to the origin, the offset diminishes.

Am I doing something wrong, or is there a bug in this code? Is there a more recent implementation available somewhere?

I believe the problem is with the transforms, specifically the order in which they’re applied.

In drawOuterShadowForPath:

auto t = AffineTransform::translation (static_cast<float> (offset.x - area.getX()),
                                       static_cast<float> (offset.y - area.getY()));
auto s = AffineTransform::scale (1.f + (static_cast<float> (spread) / static_cast<float> (pathArea.getWidth())),
                                 1.f + (static_cast<float> (spread) / static_cast<float> (pathArea.getHeight())),
                                 area.getCentreX(),
                                 area.getCentreY());

g2.fillPath (path, t.followedBy (s));

Changing the t.followedBy (s) to s.followedBy (t), meaning scale first and translate after, seems to work.
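Using the same names as the snippet above, that one-line change would be:

g2.fillPath (path, s.followedBy (t)); // scale about the centre first, then apply the offset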


I’ve updated the pull request with the above change in case anyone else finds it useful (full disclosure, I didn’t test it myself as I don’t have time).

At some point I need to tidy up the parts of that pull request which were borrowed from other places, and rebase it onto the develop branch to make merging easier. But, you know, time!


Thanks for this! This should be in JUCE. I alone have wasted enough hours trying to make the JUCE drop shadow match the design. It took me far less time with this, even counting the time to fix the translation. There are many solutions in this thread, but this was the simplest. At least for me.

Any idea why JUCE isn’t improving the drop shadows, even though all the work has already been done by the good devs on this thread?

I opened a Feature Request asking the JUCE team to prioritize vector UI tooling and improvements. There was a recent fix for a bug with the shadower, but no communication from the team so far on planned improvements.

I wonder if anyone on the current JUCE team is building UI/products with JUCE (as they were in the past). If not, they probably aren’t experiencing the same day-to-day friction that we do. IMO the lion’s share of the work in building anything real is the UI implementation. I’d love to see the JUCE team hire a UX/UI/design team member and dogfood some first-class UI implementations so these things can receive the attention and priority they deserve.


This is very fast indeed. A quick profile showed it to be almost 4x faster than Gin Stack Blur.


This one cheats. It only does a single channel, so it works fine for simple shadows, but if you want to blur an ARGB image, it will not be able to do it. Ideally, it should detect the image type and then use the single-channel or four-channel algorithm.

It definitely works on ARGB images, but I haven’t done any tests with images that actually make use of an alpha channel.

cc:@gyohng1 @reFX

I can confirm that this works for ARGB images, and I am using alpha in those images. If you need the blur to spread even more, you can resample the image down, blur, then resample up again at high quality. You can balance the look and quality of the down/upsample against the performance gained by blurring fewer pixels. Combining these techniques has been really interesting for CPU-based blur and has saved me from the complexity of implementing OpenGL (for now).
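Roughly like this, for anyone curious (cheapWideBlur is a made-up name, the quarter-resolution factor and blur strength are just what I’d start with, and blurImage is the template posted above):

// Downsample, blur far fewer pixels, then upsample at high quality.
inline void cheapWideBlur (juce::Image& img)
{
    const int w = img.getWidth();
    const int h = img.getHeight();

    auto lowRes = img.rescaled (juce::jmax (1, w / 4), juce::jmax (1, h / 4),
                                juce::Graphics::highResamplingQuality);
    blurImage<2> (lowRes);
    img = lowRes.rescaled (w, h, juce::Graphics::highResamplingQuality);
}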

Yeah, sorry, I was looking at the wrong project. This one is worse than cheating. You would need to compile one version of the algorithm for each blur size, and then select the right one at runtime. But whatever works for you.

the Battle for the Fastest Blur continues…

Don’t the templated parameters mean that any configuration you use (blur size, contrast) will be baked in at compile time?

Depends on your use case! I don’t know the structure @reFX works within, but if portability and a sane, maintainable API for a bunch of developers to work with is your game, I have no argument with his criticism.

It’s inappropriate for an API, but in my case where I have strict control and nobody to answer to (except my future self, who, when this invariably no longer works with whatever pixel format I naively throw at it in 2024, will revisit this comment in shame), it works for now.

As for templating the pixel type, you’d have to template the color channel count and rewrite some of the code to use it. It assumes RGBA in the specific layout that works with the pointer math in the method.

It would mean that

blurImage<10> (image);
blurImage<15> (image);
blurImage<20> (image);

would all get stamped out as separate functions. Not the worst thing in the world, but I doubt it’d be a significant performance gain - I’d be interested in seeing it run in a profiler against other approaches.

I looked at templating my approach, since dynamically allocating different amounts of memory based on the blur size was very inefficient.

In the end I settled on capping the blur size to 255 and just allocating the maximum memory needed. So it’s less efficient on memory usage, but much better for CPU, which is the thing that needs addressing.

It doesn’t look like any of the other approaches make use of juce::ThreadPool, which I found to have a huge impact, so I’d be interested in seeing whether they get similar gains.
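As a rough sketch of what I mean (blurRowRange is a hypothetical helper that runs the horizontal pass over rows start..end of img, h is the image height, and the wait loop is deliberately crude):

juce::ThreadPool pool; // defaults to one thread per CPU core

const int numBands = pool.getNumThreads();

for (int band = 0; band < numBands; ++band)
{
    const int start = band * h / numBands;
    const int end = (band + 1) * h / numBands;
    pool.addJob ([&img, start, end] { blurRowRange (img, start, end); });
}

while (pool.getNumJobs() > 0) // wait until every band has been processed
    juce::Thread::sleep (1);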

@reFX the algorithm is run sequentially over all 4 channels. It works well with the default JUCE premultiplied ARGB image format.

@ImJimmi
Only values 1 and 2 make sense - anything above that would typically blur too much. This value is passed directly to the bit-shift operation. The approach in the above algorithm is not based on a box blur but is essentially a first-order IIR filter run forwards and backwards. It also does not need a temporary buffer.
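Spelled out (my notation, not from the code), each pass computes the exponential moving average

    s[n] = s[n-1] + (x[n] - s[n-1]) / 2^blurShift

so blurShift = 1 smooths with coefficient 1/2 and blurShift = 2 with coefficient 1/4; running the filter forwards and then backwards makes the smearing symmetric.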

I never needed fine control over the blur amount, and the question is always: how many different blurs do you need in your code? But it might still be possible to replace the shift constant with a fixed-point multiplication, which may not be bad for performance as long as the multiplication value is cached in a register or used as a constant. For example, replace the following code:

int px = int(p[0]) << 16;
s += (px - s) >> blurShift;
p[0] = s >> 16;

with:

int px = int(p[0]);
// use a 64-bit intermediate here: s can reach 255 * 4096, so a 32-bit
// s * otherBlurConstant would overflow
s = px * blurConstant + int ((int64 (s) * otherBlurConstant) >> 12);
p[0] = s >> 12;

where blurConstant is in the range 0 to 4095 and otherBlurConstant = 4096 - blurConstant.

(not verified, but should be similar to that)

One way I can think of to improve this code is by dropping fixed-point arithmetic and processing two channels at once with bit masking. Some vector rasteriser libraries do it this way, i.e. working with BBGGRRAA as BB00RR00 and 00GG00AA, masked via 0xFF00FF00 etc. These can handle additions and shifts with some restrictions and post-masking, which might be sufficient for a simple blur, but the blurImage(img) code worked well for me, so I didn’t bother with the mental acrobatics needed to implement the latter approach.
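Just to make the byte layout concrete, the unpack/repack step would look something like this (untested; little-endian assumed, and the per-lane filter arithmetic would still need headroom handling):

uint32 px = *reinterpret_cast<const uint32*> (p); // four packed 8-bit channels
uint32 evens = px & 0x00FF00FF;        // two channels, each in its own 16-bit lane
uint32 odds  = (px >> 8) & 0x00FF00FF; // the other two channels
// ...additions and shifts applied to both lanes at once go here...
uint32 repacked = (evens & 0x00FF00FF) | ((odds & 0x00FF00FF) << 8); // re-mask and interleave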

Thanks,
George.


This is fascinating George, thanks for sharing your knowledge!

Ahh I see, interesting! I hadn’t looked into how that worked in detail.

I’m inclined to agree - but our designers would not. We currently have this on our backlog because JUCE’s current shadow API is very slightly different from Figma’s.

So we’d need that super-fine control to get the shadows to look as close as possible to tools like Figma, Illustrator, etc.

Shadows - maybe it’s best not to use any blur for them if possible, but to draw them using linear gradients (and pixel-aligned radial gradients for the corners) or pre-rendered bitmaps (also based on gradients). What I needed the blur for is the glassmorphism overlay effect.

JUCE has a convenient screenshot function for components; the result can then be blurred and used as a background for the overlay widget. Obviously, this approach doesn’t support dynamic updates of the underlying widget, but in most cases we could live with that.
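In sketch form (contentComp and overlayBounds are placeholder names, and the blur strength is just an example):

// Grab what's behind the overlay, blur it, and paint it as the overlay's backdrop.
auto snapshot = contentComp.createComponentSnapshot (overlayBounds, false);
blurImage<2, true> (snapshot); // enhanceContrast = true for the frosted look
g.drawImageAt (snapshot, overlayBounds.getX(), overlayBounds.getY());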

/George.
