Feature request: Custom Memory Allocator


#1

Hi Julian,

I am working on a commercial game engine based on JUCE, and I would like to implement a system to track and optimise memory usage in every part of the engine (similar to this: http://jfdube.wordpress.com/2011/10/06/memory-management-part-2-allocations-tracking/).

So it would be nice if JUCE added support for custom memory allocators (or at least for strings, containers and memory blocks).

 

Regards.


#2

Good request.

You do know that there are already tools that may help to track this, though? Certainly on OSX, Xcode has a very good memory analysis tool.


#3

Could this also be used by the JavascriptEngine?

Using a preallocated memory block for all runtime script allocations would make it safer for real-time stuff.


#4

Yes, there are already good memory analysis tools such as Xcode's and Intel VTune, and they do a nice job of analysing memory; however, it is not possible to optimise memory usage without custom memory allocators.

For example, when a game level is loaded, a large number of temporary objects and arrays are constructed and destroyed, which leads to memory fragmentation if the default new / delete operators are used. A possible solution is to pre-allocate a block of memory and assign it to those temporary objects and arrays, and a custom memory allocator makes this easy to do.
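To illustrate, here is a minimal sketch of the kind of pre-allocated ("arena") allocator I have in mind for level-loading temporaries - the names are made up, and it ignores alignment and thread-safety for brevity:

// Minimal linear ("arena") allocator sketch: grab one big block up-front,
// hand out bump-pointer allocations, and release everything at once when the
// level has finished loading, so the general heap never fragments.
#include <cassert>
#include <cstddef>
#include <cstdlib>

class LoadingArena
{
public:
    explicit LoadingArena (size_t capacityBytes)
        : buffer (static_cast<char*> (std::malloc (capacityBytes))),
          capacity (capacityBytes), used (0)
    {
    }

    ~LoadingArena()     { std::free (buffer); }

    void* allocate (size_t numBytes)
    {
        assert (used + numBytes <= capacity);    // no growing in this sketch
        void* p = buffer + used;
        used += numBytes;
        return p;
    }

    void reset()        { used = 0; }            // releases *all* temporaries at once

private:
    char* buffer;
    size_t capacity, used;
};

Temporary level-loading objects would be placement-constructed inside the arena, and the whole block is simply reused for the next level.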

Actually, I can contribute this feature if you are currently busy with other tasks :-D

 


#5

Thanks - feel free to have a go!

The tricky bit would be to find a way to add a custom allocator without compromising the existing classes like String, Array, etc., either by breaking existing code that uses them or by bloating them with extra internal pointers that most people will never use.


#6

Well, is this a real problem in practice? Did you measure it?

Memory fragmentation? Mmhhh...

One thing I noticed is that in multi-threaded applications on Windows, all threads use the same CriticalSection for locking the heap memory...

 

Maybe you want to have a look at:

https://github.com/jchkn/ckjucetools/blob/master/CachingAllocaterMultiThreaded/CachingAllocaterMultiThreaded.cpp
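The general idea is roughly this (just a rough sketch of the shape, not the code in that link): each thread keeps its own cache of freed fixed-size blocks, so most allocations never touch the heap and its lock.

// Rough sketch only: a per-thread free-list of recycled fixed-size blocks,
// so the common case avoids the globally-locked heap entirely.
#include <cstdlib>

struct FreeNode { FreeNode* next; };

template <size_t blockSize>
class ThreadLocalCache
{
public:
    void* allocate()
    {
        if (FreeNode* n = head)        // fast path: no lock, no heap call
        {
            head = n->next;
            return n;
        }

        return std::malloc (blockSize > sizeof (FreeNode) ? blockSize : sizeof (FreeNode));
    }

    void deallocate (void* p)
    {
        FreeNode* n = static_cast<FreeNode*> (p);
        n->next = head;                // recycle into this thread's cache
        head = n;
    }

private:
    FreeNode* head = nullptr;
};

// one cache per thread, e.g. for 64-byte blocks:
thread_local ThreadLocalCache<64> localCache64;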

 


#7

Chkn - does overloading the global new and delete catch all allocations on the free store?  

I was after a way of checking that I'd managed to banish all free-store stuff from the audio thread, so it'd be very useful if that's the case... I could add a little logger...
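Something like this is what I had in mind - a minimal sketch, assuming that replacing the global operators catches the plain new/delete expressions (though presumably not malloc() calls or class-specific operator new overloads):

// Sketch: replacement global operator new / delete that log every free-store
// allocation. Logging uses fprintf rather than anything that might allocate,
// to avoid re-entering operator new.
#include <cstdio>
#include <cstdlib>
#include <new>

void* operator new (std::size_t size)
{
    void* p = std::malloc (size);
    if (p == nullptr)
        throw std::bad_alloc();

    // to catch audio-thread allocations, this could compare
    // Thread::getCurrentThreadId() against the audio thread's id
    std::fprintf (stderr, "alloc %lu bytes\n", (unsigned long) size);
    return p;
}

void operator delete (void* p) noexcept
{
    std::fprintf (stderr, "free %p\n", p);
    std::free (p);
}

// operator new[] / operator delete[] would need the same treatment.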


#8

Dear all,

After some study of the JUCE library's source, here is my draft implementation:

https://github.com/ming4883/JUCE/compare/julianstorer:master...master

with the following changes:

  • A new option, JUCE_ENABLE_CUSTOM_ALLOCATOR (disabled by default), is added to the juce_core module.
  • A new class, CustomAllocator, is added to store the context and function pointers of the custom allocator. Please note that in addition to numOfBytes, the allocate() function also takes flags and desc arguments, which provide more information about the allocation (e.g. which module it comes from, or temporary vs. long-term allocation) - sketched below.
  • Three macros, jmalloc(), jfree() and jnew(), are added.
  • Overrides of the global operator new / delete are added to intercept all non-jnew allocations.
  • The HeapBlock class is modified to use the jmalloc() macro.
  • juce_Component.cpp is modified to test and demonstrate the usage of the jnew() macro.
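To give an idea of the shape of the draft, here is a simplified sketch of those pieces (not the actual patch - see the diff for the real code; the exact types of flags and desc here are just illustrative):

// Simplified sketch of the draft's pieces (illustrative only, not the patch).
// CustomAllocator holds a user context plus function pointers; the macros
// route allocations through the currently-installed allocator.
#include <cstddef>
#include <cstdlib>
#include <new>

struct CustomAllocator
{
    void* context;
    void* (*allocate)   (void* context, size_t numBytes, int flags, const char* desc);
    void  (*deallocate) (void* context, void* ptr);
};

// The default allocator just forwards to the CRT; an engine would install its own.
static void* defaultAllocate   (void*, size_t numBytes, int, const char*)  { return std::malloc (numBytes); }
static void  defaultDeallocate (void*, void* ptr)                          { std::free (ptr); }

static CustomAllocator currentAllocator = { nullptr, defaultAllocate, defaultDeallocate };

#define jmalloc(numBytes, desc)  currentAllocator.allocate (currentAllocator.context, (numBytes), 0, (desc))
#define jfree(ptr)               currentAllocator.deallocate (currentAllocator.context, (ptr))
// jnew placement-constructs into jmalloc'd memory; a matching jdelete would have
// to call the destructor explicitly and then jfree the memory.
#define jnew(Type, desc)         (new (jmalloc (sizeof (Type), (desc))) Type())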

Todo:

  • Modify all sources to use the jnew() macro.
  • Add a default custom allocator which can save the allocation report to a CSV file.

#9

I appreciate the effort, but I'm afraid that I can't see anything in that implementation that I could use, even as a starting point. Using macros for malloc or using a macroised form of 'new' are definitely out of the question.

There's a lot of work going on in the C++ standard for allocators in the std library, so it would make more sense to use those classes, or at least read about it and use the same approach, since a lot of very smart people have spent a lot of time thinking about it.


#10

Never mind, I know it is really challenging to find a good approach for this topic :-)

While studying the source, I noticed that most of the JUCE classes use HeapBlock for memory allocation. What about modifying HeapBlock to accept one more template argument, Allocator?


struct HeapBlockDefaultAllocator {
  static void* malloc (size_t sz) { return std::malloc (sz); }
  static void free (void* p)      { std::free (p); }
};

template <class ElementType, bool throwOnFailure = false, class Allocator = HeapBlockDefaultAllocator>
class HeapBlock
{
public:
  explicit HeapBlock (const size_t numElements)
    : data (static_cast<ElementType*> (Allocator::malloc (numElements * sizeof (ElementType))))
  {
    throwOnAllocationFailure();
  }
  // other implementations...
};

If users would like to hook most of the memory allocations, they can simply change the default Allocator argument in their own copy of the repository (and override the global new / delete operators), with the following pros (a usage sketch follows the list):

 

  • It is easy to merge with further updates from the original JUCE repository.
  • The change is minimal and doesn't break any existing API.
  • The use of std::malloc (which cannot be intercepted) becomes optional.
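As a usage illustration (assuming the extra Allocator template parameter above), a tracking allocator could then be dropped in like this - the counter is just an example of what could be hooked:

// Example policy allocator that counts live allocations; it matches the
// static malloc/free interface of HeapBlockDefaultAllocator above.
#include <atomic>
#include <cstdlib>

struct TrackingAllocator
{
    static std::atomic<long> liveAllocations;

    static void* malloc (size_t sz)
    {
        void* p = std::malloc (sz);
        if (p != nullptr)
            ++liveAllocations;             // a real tracker could also record sizes, call-sites, etc.
        return p;
    }

    static void free (void* p)
    {
        if (p != nullptr)
            --liveAllocations;
        std::free (p);
    }
};

std::atomic<long> TrackingAllocator::liveAllocations (0);

// Hypothetical usage with the modified HeapBlock:
// HeapBlock<float, false, TrackingAllocator> block (1024);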

#11

That's certainly better, but it would involve adding another template parameter to all classes that use a HeapBlock and want to provide a custom allocator.

Perhaps there's something that could be done using template specialisations, e.g. having a global class like

template <typename Type>
struct DefaultAllocator
{
    static void* allocate (size_t numBytes)    { return std::malloc (numBytes); }
    static void deallocate (void* ptr)         { std::free (ptr); }
    // ...etc
};

Then if classes like Array called DefaultAllocator<Array>::allocate(), users could declare their own special cases before including the juce headers, e.g.

template <typename ElementType>
struct DefaultAllocator<Array<ElementType>>
{
    static void* allocate (size_t numBytes)    { /* ...special case... */ }
    // ...etc
};

But honestly, have a read about the way the std library is doing this - the most sensible approach is probably just to use the C++11 classes rather than re-inventing the wheel.
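For reference, a minimal C++11-style allocator is roughly this shape (std::allocator_traits fills in the rest) - just a sketch with a placeholder tracking hook:

// Minimal C++11-conforming allocator: value_type, allocate, deallocate and the
// equality operators are enough - std::allocator_traits supplies the rest.
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

template <typename T>
struct CountingAllocator
{
    typedef T value_type;

    CountingAllocator() noexcept {}
    template <typename U> CountingAllocator (const CountingAllocator<U>&) noexcept {}

    T* allocate (std::size_t n)
    {
        void* p = std::malloc (n * sizeof (T));     // a tracking hook would go here
        if (p == nullptr)
            throw std::bad_alloc();
        return static_cast<T*> (p);
    }

    void deallocate (T* p, std::size_t) noexcept    { std::free (p); }
};

template <typename T, typename U>
bool operator== (const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept  { return true; }
template <typename T, typename U>
bool operator!= (const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept  { return false; }

// usage: std::vector<int, CountingAllocator<int>> countedInts;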

 


#12

Sorry for this late reply.

I agree that we should avoid re-inventing the wheel; std::allocator is a good design, and the C++11 standard supports scoped allocators. However, its interface doesn't fit the usage of HeapBlock very well, so I came up with the following design:

The goal of this design is to add flexible memory allocator support to any HeapBlock-related class, with:

  • a scoped allocator model similar to the one in the C++11 standard (http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2008/n2554.pdf)
  • maximised compiler compatibility (keeping the use of C++ templates as simple as possible)

First we start with the HeapBlockAllocator class:

// This class is copyable and comparable (==, !=)
class HeapBlockAllocator
{
public:
    typedef void* (*AllocateCallback)   (void* /*context*/, size_t /*numOfBytes*/, bool /*initialiseToZero*/);
    typedef void* (*ReallocateCallback) (void* /*context*/, void* /*ptr*/, size_t /*newNumOfBytes*/);
    typedef void  (*DeallocateCallback) (void* /*context*/, void* /*ptr*/);

private:
    void* context;
    AllocateCallback allocateCallback;
    ReallocateCallback reallocateCallback;
    DeallocateCallback deallocateCallback;

public:
    // other helper functions used by HeapBlock and related-containers
};

HeapBlockAllocator stores a void* context pointer and three callback function pointers, which makes it copyable and comparable (operator == and !=). This is an important property for implementing the scoped-allocator feature.
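For example, the default crt() allocator used below could be wired up from three trivial callbacks along these lines (illustrative only - not necessarily the code in the branch):

// Possible wiring for a CRT-backed HeapBlockAllocator (illustrative only).
#include <cstdlib>

static void* crtAllocate (void*, size_t numBytes, bool initialiseToZero)
{
    return initialiseToZero ? std::calloc (numBytes, 1)
                            : std::malloc (numBytes);
}

static void* crtReallocate (void*, void* ptr, size_t newNumBytes)    { return std::realloc (ptr, newNumBytes); }
static void  crtDeallocate (void*, void* ptr)                        { std::free (ptr); }

// HeapBlockAllocator::crt() would return an instance holding a null context and
// these three callbacks; a pool allocator would instead pass its pool object as
// the context and route the callbacks into the pool.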

Then we modify the HeapBlock template to store an instance of HeapBlockAllocator, and modify its constructors to accept any existing instance of HeapBlockAllocator.

template <class ElementType, bool throwOnFailure = false >
class HeapBlock
{
public:
    HeapBlock(const HeapBlockAllocator& alloc = HeapBlockAllocator::crt()) noexcept  : allocator (alloc), data (nullptr)
    {
    }
    // const HeapBlockAllocator& alloc = HeapBlockAllocator::crt() are also added to other constructors of HeapBlock
    
private:
    HeapBlockAllocator allocator;
    ElementType* data;
};

Since std::is_constructible<> is not available in older compilers, we instead use a type trait, UseHeapBlockAllocator<>, to distinguish whether or not a type supports scoped allocators:

template<typename T>
struct UseHeapBlockAllocator { static const bool value = false; };

Any container or object that would like to support scoped allocators should specialise the type trait:

template<typename T, bool B>
struct UseHeapBlockAllocator< HeapBlock<T, B> > { static const bool value = true; };

template<typename T>
struct UseHeapBlockAllocator< Array<T> > { static const bool value = true; };
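A container can then use the trait at compile time to decide whether to pass its allocator down to the elements it constructs; a minimal tag-dispatch sketch (the helper names here are made up):

// Illustrative scoped-allocator construction helper: if the element type opts
// in via UseHeapBlockAllocator, it is constructed with the container's
// allocator, otherwise it is default-constructed.
#include <new>

template <bool usesAllocator> struct AllocatorTag {};

template <typename T>
void constructElement (T* storage, const HeapBlockAllocator&, AllocatorTag<false>)
{
    new (storage) T();                    // element doesn't take an allocator
}

template <typename T>
void constructElement (T* storage, const HeapBlockAllocator& alloc, AllocatorTag<true>)
{
    new (storage) T (alloc);              // element receives the container's allocator
}

template <typename T>
void constructElement (T* storage, const HeapBlockAllocator& alloc)
{
    constructElement (storage, alloc, AllocatorTag<UseHeapBlockAllocator<T>::value>());
}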

The implementation details are available here: https://github.com/ming4883/JUCE/compare/julianstorer:master...master

For testing and demonstration purposes, I have modified juce::Array<> to support HeapBlockAllocator, and added a simple unit test for the scoped allocator.

Comments are welcome.


#13

Very nice, this discussion (which is beyond my C++ knowledge), but, er, have you actually done any proof of concept, in the sense of showing that another memory manager might give any significant improvement? Why have this discussion without any hint of what you think you can accomplish with it? Please give some POC results first.


#14

Hi Peter,

Thanks a lot for your advice.

Here are what I am planning to do:

  • Implement a memory-pool allocator.
  • Add allocator support to some of the JUCE containers (potentially HashMap and SortedSet) and conduct performance tests.

I will post the results here once I have completed those tests.


#15

I appreciate the effort, but before you go too far down this path and think this will end up in the main repo, there's really no way I can bloat the HeapBlock class with any extra member variables. I've resisted even adding a size member to that class - it's strictly a wrapper around a pointer.

Allocators are a tricky balancing act between adding flexibility and adding unnecessary bloat for the 99% of code that doesn't need them. And a better approach might be to create alternatives to HeapBlock and allow it to be replaced by them rather than trying to augment HeapBlock itself. But to be honest, I think you might find that it's all a waste of time - in a class like Array for example, why would you expect your own allocator to do a better job of allocating randomly-sized blocks than malloc does? A more common use-case for an allocator would be in the overloaded new/delete of a class where you have a smart pool that can recycle instances of that specific object, in cases where they're being very heavily created and destroyed.
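To illustrate that last use-case, a class-level pool is roughly this shape (just a sketch: not thread-safe, and it assumes no derived classes):

// Sketch of the "smart pool" use-case: the class overloads its own operator
// new/delete and recycles dead instances through a free-list instead of
// returning them to the heap.
#include <cstdlib>
#include <new>

struct PooledNode { PooledNode* next; };

class Particle
{
public:
    void* operator new (std::size_t size)
    {
        if (PooledNode* n = pool)          // recycle a previously-deleted instance
        {
            pool = n->next;
            return n;
        }

        if (void* p = std::malloc (size))  // sketch assumes size == sizeof (Particle)
            return p;

        throw std::bad_alloc();
    }

    void operator delete (void* p) noexcept
    {
        if (p == nullptr)
            return;

        PooledNode* n = static_cast<PooledNode*> (p);
        n->next = pool;                    // push onto the free-list rather than freeing
        pool = n;
    }

private:
    static PooledNode* pool;
    float position[3], velocity[3];        // payload is comfortably larger than a PooledNode
};

PooledNode* Particle::pool = nullptr;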


#16

Hi jules,

As I would like to write a game engine based on JUCE, the following blog posts explain why I am so eager to have custom allocator support in strings and containers:

  • http://bitsquid.blogspot.hk/2010/09/custom-memory-allocation-in-c.html
  • http://molecularmusings.wordpress.com/2013/01/29/memory-allocation-strategies-a-growing-stack-like-lifo-allocator/

Besides, there are other reasons, like aligned memory allocations to fit SIMD pipelines, data-oriented design, etc., which are really important in modern game / high-performance application development.
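(To illustrate the alignment point, a typical over-allocate-and-align helper that a custom allocator could provide looks something like this sketch:)

// Over-allocate, align the returned pointer, and stash the original pointer
// just before the aligned block so it can be freed later. Sketch only;
// alignment must be a power of two.
#include <cstdint>
#include <cstdlib>

void* alignedMalloc (size_t numBytes, size_t alignment)
{
    void* raw = std::malloc (numBytes + alignment + sizeof (void*));
    if (raw == nullptr)
        return nullptr;

    std::uintptr_t aligned = (reinterpret_cast<std::uintptr_t> (raw) + sizeof (void*) + alignment - 1)
                                 & ~static_cast<std::uintptr_t> (alignment - 1);

    reinterpret_cast<void**> (aligned)[-1] = raw;    // remember the original pointer
    return reinterpret_cast<void*> (aligned);
}

void alignedFree (void* p)
{
    if (p != nullptr)
        std::free (reinterpret_cast<void**> (p)[-1]);
}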

Since JUCE already comes with different kinds of well-tested containers (arrays, hash maps, etc.) and string functions (and many other facilities), I really don't want to re-implement them all just because of the lack of custom allocator support.


#17

Having done a lot of realtime apps and learned a lot about games programming, I think you're heavily into premature optimisation territory with your thinking on this.

By far the biggest issue for performance with games on modern CPUs is efficient use of the cache, and that's not something that you can magically fix by changing an allocator in your existing code - you need to re-structure your data model entirely to get patterns that are optimised for sequential cache access. And even in a games engine, the majority of your code isn't going to present a problem and won't get any measurable benefit from a faster allocator, and the bits that would benefit are often highly customised classes, not just an Array. My gut feeling would be that if you profile your app and find that some allocations in Array are a real bottleneck, then you almost certainly need to fix the whole algorithm, not blame the allocator itself.
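(As a trivial illustration of what re-structuring the data model means in practice - iterating one hot field is far friendlier to the cache when the data is laid out structure-of-arrays rather than array-of-structures:)

#include <vector>

// Array-of-structures: updating positions drags each particle's cold fields
// (colour, age, mass...) through the cache as well.
struct ParticleAoS { float position[3], velocity[3], colour[4], age, mass; };
// std::vector<ParticleAoS> particles;

// Structure-of-arrays: each hot field is contiguous in memory, so a position
// update only touches the bytes it actually needs.
struct ParticlesSoA
{
    std::vector<float> posX, posY, posZ;
    std::vector<float> velX, velY, velZ;
    std::vector<float> colourR, colourG, colourB, colourA, age, mass;
};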

I saw a good presentation at CppCon this year that might be helpful:

http://www.youtube.com/watch?v=rX0ItVEVjHc


#18

Thanks a lot for your information and advices :-)


#19

Boring start. Amazingly useful middle.  Aggressive end. 

That was good.

Any other top talks? 


#20

There's still a lot to go up but some of the good ones I can remember are:

 

- Herb Sutter - "Lock-Free Programming (or, Juggling Razor Blades)"

https://www.youtube.com/watch?v=CmxkPChOcvw&list=UUMlGfpWw-RUdWX_JbLCukXg

Gives a pretty good overview of Lock-free basics. (For a more in-depth example also take a look at Tony Van Eerd's "Lock-free by Example" which doesn't fully solve the problem he discusses but highlights why locks aren't actually so bad)

 

- Andrei Alexandrescu  - "Optimization Tips - Mo' Hustle Mo' Problems"

https://www.youtube.com/watch?v=Qq_WaiwzOtI&list=UUMlGfpWw-RUdWX_JbLCukXg&index=7

Quite low-level tips, but some interesting good-practice ones nonetheless (use of 0 and top-loading class members are the two that spring to mind)

 

- Pablo Halpern - "Decomposing a Problem for Parallel Execution"

https://www.youtube.com/watch?v=Ej97699t-G0&list=UUMlGfpWw-RUdWX_JbLCukXg

Not really for everyone but probably the best talk on parallelism I saw. Some interesting things to come in future years..

 

- Herb Sutter - "Back to the Basics! Essentials of Modern C++ Style"

https://www.youtube.com/watch?v=xnqTKD8uD64&index=46&list=UUMlGfpWw-RUdWX_JbLCukXg

This is quite a high-level round up talk but makes some good points. It also finally converted me to auto.

 

- Scott Meyers - "Type Deduction and Why You Care"

https://www.youtube.com/watch?v=wQxj20X-tIU&list=UUMlGfpWw-RUdWX_JbLCukXg&index=41

This was the first talk of the conference and, although quite a dry subject, is probably important for C++11/14/17. There is also a very important poll in there!

 

Two that stick in my mind but aren't up yet are "Lawrence Crowl - The Implementation of Value Types" and Chandler Carruth's. Chandler's is a bit ranty (in a fun way) but has some solid points for performance, especially why using supposedly high-performance data structures may actually be a hindrance. To be honest though, his 2013 BoostCon "Optimizing the Emergent Structures of C++" talk goes into a bit more detail of why all these value types are good.

 

There were over 100 sessions though so if anyone else watches any good ones please post!