Is Path slow? What is the best tool for spectrum analyser graph?

Higher FFT sizes really only improve the low-end precision. The added precision in the high end isn’t needed, especially if you’re displaying the frequency content on a log scale.

With an FFT size of 8192 and a sample rate of 48kHz, you’ll have a frequency bin every 48000/8192 ≈ 5.86Hz (I think my math is right? Please correct me if that’s wrong). In the high end, a 6Hz difference is completely undetectable by human hearing (10kHz vs. 10.006kHz, for example), so you really don’t need that level of precision, while at the low end 6Hz is quite noticeable.

In GT Analyser I allow users to select a block size of anywhere from 1024 to 65536, but I use a fixed number of 256 points (which is actually a maximum, because repeated points are omitted). That still gives good precision across the spectrum.


That makes more sense now. So in theory you’re really only plotting as many points as the window is wide in pixels (on a log scale, obviously).


Last question: do you use an OpenGL context, or just pure JUCE rendering and Path drawing?

I experimented with simply attaching an OpenGLContext to the plugin’s UI but found the CPU usage actually increased.


I think when I’ve done it in the past I determined what the minimum distance in the x-axis is between two points, then for every point in the spectrum between those two pixels I take the maximum value to determine the y-coordinate. This then works nicely regardless of zoom level etc.
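A rough sketch of the "take the maximum per pixel" idea, under my own assumptions: a log-frequency axis from `minHz` to `maxHz`, magnitudes for bins 0..N/2, and a hypothetical function name `maxPerPixel`. It is not the code from any of the plugins mentioned here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Collapse FFT magnitudes to one value per horizontal pixel on a log-frequency
// axis, keeping the maximum of all bins that land on the same pixel.
// Pixels that receive no bin keep `emptyValue` (the caller can interpolate them).
std::vector<float> maxPerPixel(const std::vector<float>& magnitudes,
                               double sampleRate,
                               std::size_t fftSize,
                               std::size_t widthPx,
                               double minHz = 20.0,
                               double maxHz = 20000.0,
                               float emptyValue = -1.0f)
{
    std::vector<float> columns(widthPx, emptyValue);
    if (widthPx == 0 || fftSize == 0)
        return columns;

    const double binHz = sampleRate / static_cast<double>(fftSize);
    const double logSpan = std::log(maxHz / minHz);

    for (std::size_t bin = 1; bin < magnitudes.size(); ++bin) // skip the DC bin
    {
        const double hz = static_cast<double>(bin) * binHz;
        if (hz < minHz || hz > maxHz)
            continue;

        // Map the frequency to 0..1 on a log scale, then to a pixel column.
        const double norm = std::log(hz / minHz) / logSpan;
        const std::size_t px = std::min(widthPx - 1,
                                        static_cast<std::size_t>(norm * static_cast<double>(widthPx)));
        columns[px] = std::max(columns[px], magnitudes[bin]);
    }
    return columns;
}
```

The number of points handed to the Path is then bounded by the component width rather than by the FFT size, so a narrow peak in the high end still shows up even when dozens of bins share one pixel.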

As @ImJimmi says, I’ve also seen an OpenGL context increase CPU usage, but only on Windows. On macOS it generally seems to improve things. In the past we had an option in the plugin to enable the OpenGL context, and we would default that option to on for macOS and off for Windows.


I just tried Kotlin Compose, and it looks like the results are disappointing for JUCE :frowning:

Apple M1 test:
Kotlin Compose - 32768 lines [512px, 256px] with gradient ~60 fps - 33.2% CPU
Kotlin Compose - 98304 lines [512px, 256px] with gradient ~60 fps - 34.7-37.8% CPU
Juce strokePath - 32768 lines [140px, 96px] with setGradientFill ~60 fps - 64.4-72.3% CPU

As you can see, even with 3 times as many lines and a window roughly 3 times larger, the load is still several times lower for Compose. I suspect there’s Skia under the hood. Something is clearly wrong with strokePath. Any solutions to reduce the CPU load at this frame rate? P.S. I don’t want to use openGLContext.attachTo: it causes freezes and memory usage increases significantly.

[Edit] Oh, and I forgot to say that ~12% of the CPU in Compose goes to wave[i] = Random.nextFloat(), so the actual drawing load is even lower. In JUCE I just load static values from wave.

How are you handling this in JUCE?

All inside paint()? What triggers the repaint? Show the component code if you can.

I’ve been toying with this stuff and may have a bit to offer, depending on where you are starting from.

First I call repaint with an Animator. This gives me a smooth and stable 60 Hz:

juce::Animator repaintAnimator = ValueAnimatorBuilder{}
                .withValueChangedCallback([this](auto) { waveTable_.repaint(); })
                .runningInfinitely()
                .build();

Then draw lines like this:

path_.clear();

for (std::size_t lineIndex = 0; lineIndex < linesCount_; ++lineIndex) {
    const float morphRatio = static_cast<float>(lineIndex) / linesCountMinOne_;
    const auto offset = static_cast<std::size_t>(morphRatio * morphResolutionMinOne)
                        * SampleBank::PhaseLength;

    // start point
    {
        const float value = samples_[offset];
        const float y = lineY + (-value * waveHeight_[0]);
        path_.startNewSubPath(0.0F, y);
    }

    for (std::size_t x = 0; x <= widthInt; x += LineResolution) {
        if (x > widthInt) {
            x = widthInt;
        }

        const auto xFloat = static_cast<float>(x);
        const auto samplePos = static_cast<std::size_t>(portion * xFloat);
        const float value = samples_[offset + samplePos];
        const float y = lineY + (-value * waveHeight_[0]);

        // lineTo
        path_.lineTo(xFloat, y);
    }

    lineY += waveHeightDistance_;
}

g.setGradientFill(gradientColor_);
g.strokePath(path_, PathStrokeType(LineThickness, PathStrokeType::mitered, PathStrokeType::butt));

I have linesCount_ = 32 and LineResolution = 2 (i.e. I’m skipping 1 pixel to save more CPU).
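One side note on the stepping: inside the loop, the `x > widthInt` check can never be true, because the loop condition already guarantees `x <= widthInt`. So when the width is not a multiple of LineResolution, the last point stops short of the right edge. A minimal sketch of stepping that always ends on the edge, using a hypothetical helper `steppedXPositions` that is not part of the code above:

```cpp
#include <cstddef>
#include <vector>

// X positions 0, step, 2*step, ... that always finish exactly at `width`.
// (Hypothetical helper for illustration only.)
std::vector<std::size_t> steppedXPositions(std::size_t width, std::size_t step)
{
    std::vector<std::size_t> xs;
    if (step == 0)
        return xs; // avoid an endless loop on a zero step

    for (std::size_t x = 0; x < width; x += step)
        xs.push_back(x);

    xs.push_back(width); // the final point lands on the right edge
    return xs;
}
```

With a width of 5 and a step of 2 this gives 0, 2, 4, 5, whereas the plain `x += 2` loop would stop at 4.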

Basically, similar code that I have in OpenGL uses just 15% CPU, but as I said I can’t use it because:

  1. Memory usage is huge
  2. Resizing isn’t perfect
  3. It overlaps other components, and I can’t even draw a popup on top of it
  4. The background color on a secondary monitor doesn’t quite match, and it looks ugly
  5. Lines aren’t smooth

That’s why I’m sticking with paint(Graphics) only.


Whoops. I just realized that Compose on iPad runs this at 120 fps, while the JUCE Animator runs at 60 fps. So the situation is even worse.

Example with Compose:

Have you profiled to see where the time is spent?

100% the place to start

From my own experiences with profiling path rendering on macOS (using the animated waveforms on the home page of the Demo app, upping the timer rate to 1000Hz to force continuous updates), the biggest bottleneck is copying data from the juce::Path to the CoreGraphics Context.

See JUCE/modules/juce_graphics/native/juce_CoreGraphicsContext_mac.mm in the juce-framework/JUCE repository on GitHub, at commit 61a03097ec9e01693c87ac71935e97b9714cff1a.

If instead we could have a way to construct a path that adds to the context directly, we could remove this costly copying.


That being said, there are still things you can do on your end to make things more performant. For example, you should preallocate memory using juce::Path::preallocateSpace() after calling path_.clear(), so you’re not having to re-allocate as you build up the path.


I guess I’m seeing the same thing that @ImJimmi is talking about.

[Edit] Btw, I just removed setGradientFill and temporarily replaced it with setColour, and now I get 120 FPS. But the CPU load is 72% :slight_smile:

This is what I’m drawing:
[Image: IMG_7142]


Worth bearing in mind that Release builds will be much better optimised. You could try a debug build with -O3, and/or RelWithDebInfo, and profile again.


I’m already on -O3. Same results with -Ofast. In debug it would be 4000% :slight_smile:


Am I correct that you want to create and stroke 1024 * 32 paths at 60 frames a second? If so, I think you are going to find this a performance challenge no matter the tech/platform (re: seeing high memory use with OpenGL).

For context, I’ve done a lot of performance optimization, now in JUCE and before that professionally for over a decade — IMO regardless of platform you will end up needing to get creative and “cheat” to get something performing well. Basically, find ways to draw less or less often or reuse what you’ve already drawn. Even more important if others will run your app (you have a decent M1 machine and others may still be on older machines).

The “get creative” part depends 100% on your end goals and what compromises are acceptable for your use case. I wrote these based just on your vid, so might not all make sense for you:

  • Futz with details: See if you get better performance with 1 path per lineIndex vs. one big path with subpaths (make sure to preallocate enough space). See what the impact of removing the gradient fill is. What is it like if you paint to a single channel image first, then composite up to argb? It’s worth it to know all these pieces of information for your specific use case, as it will inform your options.
  • Performance goes hand in hand with UX. It seems like you are trying to display 32 realtime channels of waveform data — what should the user be able to pick out and attend to? The waveforms in the video are tiny in height and going fast, so is it mainly “vibes” or is there something technical you want to highlight? If so, focus on improving that and usually performance will follow.
  • Try draw n’ clear. Instead of every sample changing, you could draw each path starting at x=0 and drawing over 1024 frames to x=endOfScreen so it “builds up” over time. When you get to the end, you wipe the screen and start from x=0 again. This is a different vibe, but gives you opportunities to draw a very small slice at a time and might help users pick out trends.
  • Sliding Cached Image. Instead of recreating the entire image from paths each frame, you might have better luck having a cached image 1.5 or 2x wider than its component viewport, adding 32 new points to the image each frame, then translating the image left in a container. This might require also storing the last y positions so you can create a new path with just the new data. Every X frames you’ll need to copy the right side of the image to the left.
  • Data preprocessing. [Edit: relevant to vid more than screenshot]. Your LineResolution is a step in the right direction, but ideally the data would be manipulated and reduced. Think of what a DAW does and why it does it (it reduces waveform data depending on scale for performance and UX reasons).
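To make the data preprocessing point a bit more concrete, here is a minimal sketch of one common reduction: keep only the minimum and maximum sample per pixel column. I’m assuming that’s roughly the kind of thing a DAW does, and `minMaxPerColumn` is a hypothetical name, not a JUCE function.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Reduce a block of samples to one (min, max) pair per pixel column.
std::vector<std::pair<float, float>> minMaxPerColumn(const std::vector<float>& samples,
                                                     std::size_t columns)
{
    std::vector<std::pair<float, float>> out;
    if (samples.empty() || columns == 0)
        return out;

    out.reserve(columns);
    const std::size_t n = samples.size();

    for (std::size_t c = 0; c < columns; ++c)
    {
        // Sample range covered by this column, always at least one sample wide.
        const std::size_t begin = c * n / columns;
        const std::size_t end = std::max(begin + 1, (c + 1) * n / columns);

        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = samples.begin() + static_cast<std::ptrdiff_t>(end);
        const auto mm = std::minmax_element(first, last);
        out.emplace_back(*mm.first, *mm.second);
    }
    return out;
}
```

Each column then costs one vertical segment (or two path points) no matter how many samples sit behind it, so the path size scales with the component width instead of the data length.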

[Image: Logic Pro - 2024-07-24.04]

Something is clearly wrong with strokePath

g.strokePath generates a new path per call and then fills that new path. If a stroked path is reused across paint calls, it’s more efficient to call juce::PathStrokeType (2.0f).createStrokedPath yourself, cache the resulting path in the component, and fill that cached path in each paint call.

Good luck! It’s a legitimate tough problem but I think you have lots of opportunity to tame it if you focus on reusing what’s already been drawn.


@sudara
The problem is not so much finding a clever solution. The problem is that Kotlin Compose draws it with ease, and in even greater quantity. Here I use only 32 lines, but with Kotlin I even tried 120 lines and it doesn’t matter: it still has low CPU usage and runs at more than 60 FPS on the Mac M1 and 112-120 FPS on the iPad. GPU usage is also only 0.1%, which seems strange to me.

Example with Kotlin Compose:



On the same platform with the same code structure?

Regardless, my tips will be useful if you move forward with JUCE. You’ll have to manually optimize for your use case. Sliding image would be my first bet based on your last posted image.

Yes, everything is on the same platform. The code is also similar and simple. It’s just reading values from the float array. I would even say the Kotlin version is less optimized, since every frame I fill a new array with Random.nextFloat(), because I’m only testing it… But in the JUCE example I read static data that I don’t even change yet; I just call repaint().

I don’t know what they did there. I just remember trying Flutter, and it was terribly slow, with CPU usage at 90-100%.

The reason I need to update all 32 lines, rather than using setBufferedToImage(true) and drawing a secondary image on top, is that when the FM modulation changes I need to reflect it. Btw, it’s the same as in Ableton WaveTable.

[Edit] I also just tried preparing the Path in the constructor and only calling g.strokePath(path_…) in paint: nothing changed. Same CPU load. So the cost of building the path has no effect here.

just reading values from the float array

Do you know how those values are converted to filled lines on screen? My guess is custom GPU stuff? JUCE paths are very much a 1:1 wrapper around the underlying macOS CG implementation.

This is what I have at the moment.

juce::Animator repaintAnimator = ValueAnimatorBuilder{}
                .withValueChangedCallback([this](auto) { waveTable_.repaint(); })
                .runningInfinitely()
                .build();

        void paint(Graphics &g) noexcept final {
            if (path_.isEmpty()) {
                return;
            }

            g.setColour({80, 80, 80});
            g.strokePath(path_, PathStrokeType(LineThickness, PathStrokeType::mitered, PathStrokeType::butt));
        }

        void resized() noexcept final {
            prepare();

            const auto widthInt = static_cast<std::size_t>(getLocalBounds().getWidth());
            const auto heightInt = static_cast<std::size_t>(getLocalBounds().getHeight());

            if (sampleLength_ == 0UL || widthInt <= 0UL || heightInt <= 0UL) {
                return;
            }

            const auto space = static_cast<int>(linesCount_ * widthInt * 3);
            path_.clear();
            path_.preallocateSpace(space);

            // Create path
            constexpr std::size_t morphResolutionMinOne{SampleBank::MorphResolution - 1};
            const auto portion = portion_;
            float lineY = waveHeight_[0] + kLineSize;

            for (std::size_t lineIndex = 0; lineIndex < linesCount_; ++lineIndex) {
                const float morphRatio = static_cast<float>(lineIndex) / linesCountMinOne_;
                const auto offset = static_cast<std::size_t >(morphRatio * morphResolutionMinOne)
                                    * SampleBank::PhaseLength;

                // start point
                {
                    const float value = samples_[offset];
                    const float y = lineY + (-value * waveHeight_[0]);
                    path_.startNewSubPath(0.0F, y);
                }

                for (std::size_t x = 0; x <= widthInt; x += LineResolution) {
                    if (x > widthInt) {
                        x = widthInt;
                    }

                    const auto xFloat = static_cast<float>(x);
                    const auto samplePos = static_cast<std::size_t>(portion * xFloat);
                    const float value = samples_[offset + samplePos];
                    const float y = lineY + (-value * waveHeight_[0]);

                    // lineTo
                    path_.lineTo(xFloat, y);
                }

                lineY += waveHeightDistance_;
            }
        }