Question about Unicode Cache Allocation In Juce8

While converting a plugin to JUCE 8 (8.0.1, develop branch, on Windows), I noticed a curiously large allocation in the Visual Studio memory allocation debugger. After launching, and once the UI has been drawn, 11.7 megabytes are used by an LruCache that seems to cache Unicode string layouts. It is created in juce_Unicode.cpp around line 96:

   static Array<Codepoint> performAnalysis (const String& string, std::optional<TextDirection> textDirection = {})
   {
       if (string.isEmpty())
           return {};

       thread_local LruCache<Key, Array<Unicode::Codepoint>> cache;

       return cache.get ({ string, textDirection }, analysisCallback);
   }

My plugin only draws 30 strings, and I see 30 entries in the LruCache. Is the memory allocation debugger misreporting this, or is something strange going on? This is now by far the biggest allocation I see when running my plugin, and I don’t understand why it’s even needed.

I checked the Projucer and am seeing the same thing:

Does anyone know why this gets so large? The information contained does not seem to use that much space.

Looking at it more, it seems the memory doesn’t really stay reserved. However, it still appears that a lot of reallocating is going on when the LruCache is filled on the initial draw. Maybe juce::Array and LruCache don’t work together very well? A part of it might be the fact that juce::Array starts with size 0… so reallocation is guaranteed to happen in this case.
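To illustrate the reallocation concern above, here is a minimal sketch that counts how often a growing container has to move its storage. It uses std::vector as a stand-in for juce::Array (an assumption; the two containers may grow differently), and countReallocations is a hypothetical helper, not part of JUCE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the container's capacity changes while `count`
// elements are appended. A capacity change is used here as a proxy for a
// reallocation. std::vector stands in for juce::Array (an assumption).
std::size_t countReallocations (std::size_t count, bool reserveFirst)
{
    std::vector<int> values;

    // Reserving up front means no growth is needed while appending.
    if (reserveFirst)
        values.reserve (count);

    std::size_t reallocations = 0;
    auto lastCapacity = values.capacity();

    for (std::size_t i = 0; i < count; ++i)
    {
        values.push_back (static_cast<int> (i));

        if (values.capacity() != lastCapacity)
        {
            ++reallocations;
            lastCapacity = values.capacity();
        }
    }

    assert (values.size() == count);
    return reallocations;
}
```

Calling countReallocations (1000, false) should report several reallocations, while countReallocations (1000, true) should report none. Whether this pattern actually accounts for the numbers shown by the debugger is not something this sketch can confirm.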

Are you asking out of curiosity, or are these allocations causing problems in your project?

The cache you mentioned is used to store analysis results for Unicode strings. This analysis data is primarily used when re-flowing strings, e.g. during dynamic resize. My understanding is that recomputing this information can be costly, and may be too heavy when resizing text areas at high frame rates. With that said, I haven’t profiled this myself, so I can’t say definitively that it’s necessary in practice.
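For readers unfamiliar with the pattern, the following is a rough sketch of a least-recently-used cache with a get (key, callback) interface similar in shape to the call quoted from juce_Unicode.cpp. SimpleLruCache is a hypothetical class written for illustration; it is not JUCE's LruCache, and its eviction policy and capacity handling are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical minimal LRU cache keyed by string. Not JUCE's LruCache.
template <typename Value>
class SimpleLruCache
{
public:
    explicit SimpleLruCache (std::size_t maxEntriesIn) : maxEntries (maxEntriesIn)
    {
        assert (maxEntries > 0);
    }

    // Returns the cached value for `key`, calling `compute (key)` on a miss.
    // The returned reference stays valid until that entry is evicted.
    template <typename Compute>
    const Value& get (const std::string& key, Compute&& compute)
    {
        auto found = index.find (key);

        if (found != index.end())
        {
            // Hit: move the entry to the front, marking it most recently used.
            entries.splice (entries.begin(), entries, found->second);
            return found->second->second;
        }

        // Miss: if the cache is full, evict the least recently used entry.
        if (entries.size() >= maxEntries)
        {
            index.erase (entries.back().first);
            entries.pop_back();
        }

        entries.emplace_front (key, compute (key));
        index[key] = entries.begin();
        return entries.front().second;
    }

    std::size_t size() const { return entries.size(); }

private:
    using Entry = std::pair<std::string, Value>;

    std::size_t maxEntries;
    std::list<Entry> entries; // front = most recently used
    std::unordered_map<std::string, typename std::list<Entry>::iterator> index;
};
```

With this shape, drawing the same 30 strings repeatedly would run the expensive analysis once per string and serve later requests from the cache, at the cost of keeping one analysis result per string in memory.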

We have some changes in progress that reduce the size of Unicode::Codepoint (and therefore this cache) - perhaps we can look at removing the cache completely as part of this change.

This stack trace is pointing to a different allocation. In order to lay out Unicode text, we need to store a table of information about each Unicode character, including line-breaking characteristics, text direction, script kind, and category. This table is stored in compressed form in the binary, and then decompressed once on first read. Other Unicode-aware software also bakes in similar information, e.g. CPython has this set of tables.
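As a rough illustration of "stored compressed, decompressed once on first read", the sketch below expands a tiny run-length-encoded table the first time it is requested and reuses the result afterwards. The table contents, the run-length encoding, and the names Run, compressedTable, getTable and lookup are all hypothetical; JUCE's actual data format and compression scheme are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compressed table: each run is (length, value).
// Values here: 0 = other, 1 = ASCII uppercase, 2 = ASCII lowercase.
struct Run { std::uint16_t length; std::uint8_t value; };

static const Run compressedTable[] = { { 65, 0 }, { 26, 1 }, { 6, 0 }, { 26, 2 } };

// Counts how often the expensive expansion runs (for demonstration only).
static int decompressionCount = 0;

static std::vector<std::uint8_t> decompress()
{
    ++decompressionCount;

    std::vector<std::uint8_t> table;

    for (const auto& run : compressedTable)
        table.insert (table.end(), static_cast<std::size_t> (run.length), run.value);

    return table;
}

// The function-local static is initialised on the first call only, so the
// large allocation shows up once, at first use, and then stays alive.
const std::vector<std::uint8_t>& getTable()
{
    static const auto table = decompress();
    assert (! table.empty());
    return table;
}

// Returns the property for a codepoint, or 0 if it is outside the table.
std::uint8_t lookup (std::size_t codepoint)
{
    const auto& table = getTable();
    return codepoint < table.size() ? table[codepoint] : 0;
}
```

A one-time expansion like this would appear in a memory profiler as a single large allocation made during the first text layout, which matches the description above, though the real table is much larger than this toy example.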

It’s just me looking at how memory usage changed in JUCE 8. I found this while looking for memory issues when I got the weird native alert boxes last week.

Ah ok, that makes a lot of sense! Once again I got confused by a chain of lambda calls.

Thank you for the clarification.