Direct2D resources are expensive to allocate and cheap to render. Preallocate and reuse images and gradients. For example, not only does ColourGradient use an Array internally (which allocates heap memory) but Direct2D gradients are cached internally by the renderer itself.
However, based both on experimentation and on reading the source code, I believe this is misleading about how ColourGradient is actually implemented. From what I’ve read in the source, ColourGradient’s GPU-side allocation is managed as follows:
The ColourGradient constructor does not perform any graphics allocation, or even interact with the graphics library at all.
Graphics::setGradientFill() and its corresponding call to LowLevelGraphicsContext::setFill() are where the resource allocation happens. Specifically, the fill is instantiated as an ID2D1LinearGradientBrush, which is then stored in a least-recently-used cache with 128 slots.
My issue is this: the current implementation makes it impossible for the JUCE user to actually optimize their usage of ColourGradient. It is impossible to “preallocate and reuse… gradients” per the docs, because the user has zero control over when the gradients themselves are allocated: it’s all managed by internal caches in the graphics context. That’s all well and good until the moment you exceed 128 gradients in a single paint (possible in a large, complex app), because then each paint routine is deallocating and reallocating gradients with no way for the user to control that behavior. In my opinion, “use fewer than 128 gradients” should not be the only optimization option for an end user.
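A fixed-capacity least-recently-used cache fails in exactly the way described above: once a single paint pass touches more distinct gradients than there are slots, every lookup becomes a miss. Here’s a minimal, self-contained C++ sketch of that failure mode — the 128-slot capacity matches the discussion, but the cache itself is a generic illustration, not JUCE’s actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>

// Generic LRU cache: most-recently-used entries live at the front of a list,
// and a hash map gives O(1) lookup into that list.
template <typename Key, typename Value>
class LruCache
{
public:
    explicit LruCache (std::size_t capacity) : maxSize (capacity) {}

    // Returns the cached value for `key`, creating it (and evicting the
    // least-recently-used entry if the cache is full) on a miss.
    Value& getOrCreate (const Key& key, const std::function<Value()>& makeValue)
    {
        auto it = lookup.find (key);

        if (it != lookup.end())
        {
            order.splice (order.begin(), order, it->second);  // hit: mark as MRU
            return it->second->second;
        }

        if (order.size() >= maxSize)  // miss on a full cache: evict the LRU entry
        {
            lookup.erase (order.back().first);
            order.pop_back();
        }

        ++misses;
        order.emplace_front (key, makeValue());
        lookup[key] = order.begin();
        return order.front().second;
    }

    std::size_t misses = 0;

private:
    std::list<std::pair<Key, Value>> order;  // MRU at front, LRU at back
    std::unordered_map<Key,
        typename std::list<std::pair<Key, Value>>::iterator> lookup;
    std::size_t maxSize;
};

// Simulates `frames` paint passes that each request `gradientCount` distinct
// "gradients" from a cache with `slots` slots; returns the total miss count.
inline std::size_t simulatePaints (std::size_t slots,
                                   std::size_t gradientCount,
                                   std::size_t frames)
{
    LruCache<std::size_t, int> cache (slots);

    for (std::size_t f = 0; f < frames; ++f)
        for (std::size_t g = 0; g < gradientCount; ++g)
            cache.getOrCreate (g, [] { return 0; });

    return cache.misses;
}
```

Cycling sequentially through more keys than the cache holds defeats LRU entirely: `simulatePaints (128, 200, 2)` reports 400 misses (every single access, in both frames), while `simulatePaints (128, 100, 2)` reports only 100, because the second frame is all hits.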
I understand that refactoring ColourGradient to hook into the graphics context directly to mirror the behavior described in the docs would be a breaking change and I’m not requesting that. At the same time, I’d appreciate some added clarity in the documentation and some more flexibility for optimization on the user side. Our team was about to do a huge refactor to start caching all the ColourGradients we use as component members, and it would’ve been incredibly frustrating to find out that that wasn’t having any effect.
If I’ve misunderstood the code or the wording in the docs, I apologize in advance and welcome clarification. Thank you!
Upon further investigation, it turns out I was throwing myself for a loop. Though the ColourGradients were contributing to sluggishness, it appears that using CachedComponentImage in components that are getting constantly repainted was the main culprit in slowing down our program. I’ll post a new thread if I find an issue with that, but luckily there’s a very clear solution: don’t buffer to image on components that are getting repainted each frame.
Sorry for the ambiguity and confusion. I’m responsible for placing that disclaimer in the docs, paraphrasing discussions I had with Matt and the JUCE team.
“Preallocate and reuse” is my general performance mantra for JUCE painting (basically: hold member variables). However, I see why it’s confusing here. I’m no longer clear on whether in Direct2D’s case, holding a ColourGradient as a member offers much benefit.
For now, I’ve removed that section. If a discussion ensues, I will update it with more accurate info.
I took this opportunity to clean up the doc a bit. I hope it was generally useful, despite it sending you on a wild goose chase. These sorts of docs are a new addition with JUCE 8 and it’s obvious they require a bit of tending as feedback like this comes in — thank you and let me know if anything else could use attention.
don’t buffer to image on components that are getting repainted each frame.
Definitely been there before! setBufferedToImage is already pretty unfriendly for animating components; I’ve found it’s almost always better to just paint what needs to be painted. In addition, I believe it’s a CPU-only cache (which would defeat the purpose of rendering on the GPU).
In my opinion, “use less than 128 gradients” should not be the only optimization option for an end user.
This is relevant to my interests, as I have some gradient-heavy stuff on some pages (~200+ gradients). I’ll have to check if it helps me out to enlarge the cache size…
Thanks for your thorough response and edits to the docs! I didn’t notice anything else on that page that I found misleading. In general, I’d appreciate more specific documentation regarding which resources do and do not allocate GPU memory (and when), but I think that’s outside the scope of that document and might be overkill. I just like overly thorough docs.
I may be mistaken, but I believe on D2D the CachedComponentImages do interact directly with the GPU when they’re marked dirty and have to re-render. This is why I was experiencing much worse lag on D2D than with the software renderer: the re-renders on D2D were performing a round trip to the GPU to render the cached image, then writing it back to the CPU-side cache, then sending that data back to the GPU for the final render (I could be wrong on that last step).
That process alone makes it almost worth mentioning in the docs, in my mind, but at the same time, if you follow the mantra of not using setBufferedToImage on animated/rapidly repainting components, it won’t be a problem. If anything, it might be worth adding a stipulation about this to the doxygen for setBufferedToImage. I feel a little silly for not considering that drawing to an image, then redrawing that image every frame, would be less efficient, but it seems like a common enough mistake to be worth calling out there.
I had the inkling! I definitely am not in the position to write about GPU memory, but would if I could.
I’m also a big fan of thorough documentation. I’ve found out the hard way that every individual word chosen matters (it can clarify or mislead). That’s scary and can disincentivize “writing too much” (like in this case), but in my experience the best docs are good precisely because they go through iteration, battle testing, and refinement! Because of this thread, I just learned a bunch about the more recent updates to Direct2D in 8.0.2 (which makes get/setPixelAt fast again on Direct2D), so I have more updating to do.
I feel like what the doc is missing right now is a clear presentation of the best strategies and their performance implications for current renderers.
Like, is it still faster to draw rectangles for horizontal lines instead of using lines?
Is strokePath now just as efficient as fillPath?
In what situations is setBufferedToImage still pertinent?
What image type should be favored?
And so on. What are the current best practices for the Windows and Mac renderers?
These are good questions. It’s very easy to get muddied by temporary (e.g. Direct2D at launch vs. >= 8.0.2) or specific situations that others posted about (where the full context and resolution is often not clear).
For me personally, working on my app, I would definitely avoid “walking on eggshells” around the API (i.e., I would not avoid lines or strokes in general). In other words, my take on using the API is “innocent until proven guilty with data”. I personally haven’t run into any particular drawing performance issues with Direct2D; it’s been a big performance upgrade for my non-trivial app on Windows. There are for sure going to be specific situations where one implementation outperforms another, but that seems tough to generalize about.
One thing I would feel comfortable (based on my experience) stating is “avoid setBufferedToImage for animated components, it just adds overhead, you’ll have 100% cache writes, 0% cache reads.” That probably does need a doxy update, as more and more people are animating. Beyond that usage disclaimer, it “does the expected thing” everywhere, including the Direct2D renderer — if not, it’s a bug to be reported.
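To put numbers on the “100% cache writes, 0% cache reads” point, here’s a toy model that counts the work a buffered component does per frame. The names are hypothetical (this is not JUCE API): a component that’s dirty every frame pays for a full repaint into the cached image and a draw of that image, so buffering is pure overhead compared with just painting directly.

```cpp
#include <cassert>

// Toy model of a buffered-to-image component. Each frame, if the component
// is dirty it repaints into its cached image (a cache write), and it always
// draws that cached image to the screen (a cache read).
struct BufferedComponentModel
{
    int cacheWrites = 0;  // repaints into the cached image
    int cacheReads  = 0;  // draws of the cached image

    void renderFrame (bool dirty)
    {
        if (dirty)
            ++cacheWrites;

        ++cacheReads;
    }
};

// Total operations over `frames` frames. A static component is only dirty on
// the first frame; an animated one is dirty on every frame.
inline int bufferedWork (int frames, bool animated)
{
    BufferedComponentModel model;

    for (int i = 0; i < frames; ++i)
        model.renderFrame (animated || i == 0);

    return model.cacheWrites + model.cacheReads;
}
```

For a static component over 60 frames, `bufferedWork (60, false)` is 61 operations, all but one of which are cheap cache reads — exactly what the cache is for. For an animated component, `bufferedWork (60, true)` is 120: sixty full repaints plus sixty redundant draws of the cache, versus the sixty repaints that painting directly would cost.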
Since 8.0.2, the Image API (such as get/setPixelAt) does the expected thing on Direct2D (it immediately modifies pixels in a software image, only flushing main memory to GPU memory once, when the image is read by the GPU), so my take is that worrying about image types was a temporary 8.0.0 to 8.0.1 issue.
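The deferred-flush behavior described above can be sketched like this (hypothetical names, not JUCE’s implementation): pixel writes land in a software bitmap immediately, and the upload to GPU memory happens at most once per batch of edits, the next time the GPU-side copy is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a software-backed image with a lazily synchronized GPU copy.
struct SoftwareBackedImage
{
    explicit SoftwareBackedImage (std::size_t numPixels)
        : pixels (numPixels, 0) {}

    // Writes go straight to the software bitmap; no GPU traffic yet.
    void setPixelAt (std::size_t index, std::uint32_t argb)
    {
        pixels[index] = argb;
        gpuCopyStale = true;
    }

    // Called when the renderer needs the image on the GPU: upload only if
    // the software bitmap has changed since the last upload.
    void prepareForGpu()
    {
        if (gpuCopyStale)
        {
            ++uploads;  // one flush covers any number of pixel edits
            gpuCopyStale = false;
        }
    }

    std::vector<std::uint32_t> pixels;
    int uploads = 0;
    bool gpuCopyStale = true;
};
```

Under this model, a thousand setPixelAt calls followed by one draw cost a single upload, which is the kind of behavior the 8.0.2 change is described as restoring.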