JUCE slower on Mac/Linux?

Hi, I've written a program that draws musical scores. The drawing is done in frame buffers with the AGG library, and the GUI is built with JUCE.

On Windows the program is very fast, but on Mac/Linux it's slow. The code is the same on every platform, and I don't think the problem comes from AGG, because it is pure C++ and doesn't call any OS-specific functions.

So I wonder whether other users have experienced this problem. Is JUCE known to be slower on Mac/Linux? The program blits an image buffer to the screen very frequently; perhaps the problem comes from the different OS blitting functions?

I've also tried changing the program's niceness on Linux, but it isn't noticeably faster.

So, how can I optimize speed on Linux/Mac?

Thanks.

Bouba

P.S. Sorry if I'm hard to understand, I'm French :wink:

JUCE is also pure C++, of course, so in that respect it's no different from AGG. The Mac is a bit slower at updating the screen, but not much different, as you can see from the demo app. More likely it's some aspect of your build settings.

You're not trying to run a PPC binary on an Intel Mac, are you? That'd be slow.

P.S. Why did you choose AGG to draw musical notes? JUCE's path rendering could do that perfectly easily, and could give you lots of handy shortcuts like using Drawable objects to store all your note shapes, etc.

I can see why people might use AGG if they need some fancy feature that JUCE doesn't have, but for simple black-and-white shapes it seems like doing things the hard way!

On Mac OS X I'm running the program on a PPC processor; I haven't tested the code on an Intel Mac yet.

I chose AGG because it specialises in vector drawing and implements features that JUCE doesn't have (I'm using texturing, alpha blending, multiple buffer formats, dash generation, text drawn along curved paths, etc.).
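To put the cost of alpha blending in perspective: a software renderer such as AGG does a few multiplies per channel for every partially covered pixel. Below is a minimal sketch of 8-bit "source over" blending onto an opaque frame buffer. The `Rgba8` struct and both functions are hypothetical illustrations, not AGG's API; AGG's own pixel-format classes do this internally with their own rounding.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative pixel layout (an assumption, not AGG's own type):
// straight (non-premultiplied) RGBA, 8 bits per channel.
struct Rgba8 {
    std::uint8_t r, g, b, a;
};

// Linear interpolation of one channel with rounding:
// result = (src * alpha + dst * (255 - alpha) + 127) / 255
inline std::uint8_t blendChannel(std::uint8_t dst, std::uint8_t src, unsigned alpha)
{
    assert(alpha <= 255u);
    return static_cast<std::uint8_t>((src * alpha + dst * (255u - alpha) + 127u) / 255u);
}

// "Source over" blend of src onto an opaque destination pixel.
// coverage (0..255) stands in for the anti-aliasing coverage a scanline
// rasterizer computes per pixel; it scales the source alpha.
inline void blendPixel(Rgba8& dst, const Rgba8& src, std::uint8_t coverage)
{
    const unsigned alpha = (src.a * static_cast<unsigned>(coverage) + 127u) / 255u;
    dst.r = blendChannel(dst.r, src.r, alpha);
    dst.g = blendChannel(dst.g, src.g, alpha);
    dst.b = blendChannel(dst.b, src.b, alpha);
    dst.a = 255;  // the destination frame buffer is assumed to be opaque
}
```

Because this runs once per covered pixel, per frame, rendering like this is CPU-bound, which is one reason compiler optimisation settings make a large difference for it.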

Also, I've benchmarked several vector drawing APIs, and AGG seemed to be the fastest; as the Anti-Grain website says, its quality is very good too. So I preferred a library that specialises in vector drawing and lets me build my rendering pipeline the way I want.

As for my problems on Mac/Linux, perhaps it's just that processes aren't scheduled the same way as on Windows. The program uses a lot of CPU when I move the score around or zoom in on it.

I see. Well the rendering process is exactly the same on all platforms, the only difference is the blitting to the screen that happens at the end.

PPC Macs are very slow compared to Intel ones, though… Are you testing both platforms with an optimised release build?
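A quick way to be sure which kind of build is being timed is to compile that information into the binary. A minimal sketch, assuming GCC or Clang: `buildDescription()` is a hypothetical helper that checks the compiler's predefined `__OPTIMIZE__` macro (defined in optimising compilations such as -O1, -O2, -O3 or -Os) and the conventional `NDEBUG` macro. It only describes the translation unit it is compiled in, so the JUCE library itself still has to be built optimised separately.

```cpp
#include <string>

// Hypothetical helper: reports how this translation unit was compiled.
// __OPTIMIZE__ is predefined by GCC and Clang in optimising compilations;
// NDEBUG is conventionally defined for release builds and disables assert().
inline std::string buildDescription()
{
    std::string desc;
#if defined(__OPTIMIZE__)
    desc += "optimised";
#else
    desc += "unoptimised";
#endif
#if defined(NDEBUG)
    desc += ", NDEBUG (release)";
#else
    desc += ", asserts enabled (debug)";
#endif
    return desc;
}
```

Logging this string at startup makes it obvious when a debug build has slipped into a cross-platform comparison.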

You're right, I'm not sure. I'll reboot into Linux right away to check.

OK, I added optimization flags and it's better now, but still less fluid than on Windows.

These are the flags I'm using:

MAKEOPTS="-j3"
CPPFLAGS = -march=nocona -O2 -pipe -D "LINUX=1" \
    -I $(AGG_INCLUDE_DIR) -I $(AGG_FREETYPE_INCLUDE_DIR) \
    -I $(JUCE_INCLUDE_DIR) -I $(FREE_TYTPE_INCLUDE_DIR) \
    -I $(CIMG_INCLUDE_DIR)
LDFLAGS = -L$(LIBDIR) -s -L "/usr/X11R6/lib/" -lfreetype -lpthread -lX11 -lasound -ljuce -lXinerama

There's no visible difference between -O2 and -O3, and I've read that -O3 only improves speed for very specific binaries, so I prefer -O2.
The JUCE library I'm linking against is the release build.

Do you know of any other ways to optimize the binary on UNIX?

I’m not really much of a gcc expert, so can’t think of anything to help there…

OK, anyway, your help has been very useful: the program is faster than before with these new flags. I've also realized that the kernel I'm using might not be the right one for a dual-core CPU. I'll test with an SMP kernel; perhaps performance will be better on Linux.
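A simple way to check whether the running kernel actually exposes both cores is to ask the C++ runtime. A minimal sketch; `visibleCpuCount()` is a hypothetical wrapper around the standard `std::thread::hardware_concurrency()`, which may return 0 if the count is unknown. On Linux a uniprocessor (non-SMP) kernel would typically report 1 here even on a dual-core CPU.

```cpp
#include <thread>

// Hypothetical wrapper: how many hardware threads the runtime reports.
// On Linux this reflects the CPUs the running kernel exposes, so a
// non-SMP kernel on a dual-core machine would typically return 1.
// The standard allows 0 when the value cannot be determined.
inline unsigned visibleCpuCount()
{
    return std::thread::hardware_concurrency();
}
```

Note that if all the drawing happens on a single thread, a second core mainly helps by absorbing other processes' load, so an SMP kernel on its own is not expected to make the rendering itself much faster.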

Thank you very much for the help.

Bouba

It's me again. I've tested the code with an SMP kernel, but the program isn't any more fluid. So I suppose the blitting functions are just a bit slower on Mac (HIViewDrawCGImage()) and Linux (XPutImage()). Also, I've been reading the blitToContext() function in juce_mac_Windowing.cpp: is this the function that actually draws the image on Mac OS?


Yes, that’s what draws it. I wouldn’t have thought it made much difference though, as blitting is likely to take up far less time than rendering, especially if you’re drawing complex stuff.
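This can be checked without a profiler by timing the phases directly. A minimal sketch using `std::chrono::steady_clock`; `ScopedTimer` is a hypothetical helper, and where to put the timers (for instance around the AGG rendering call in a component's paint routine) is an assumption about how the program is structured. In JUCE the final copy to the window may happen after the paint routine returns, so a timer inside paint would not capture the OS blit itself.

```cpp
#include <chrono>

// Hypothetical helper: measures the wall-clock time of the enclosing scope
// and adds it (in milliseconds) to an accumulator owned by the caller.
class ScopedTimer {
public:
    explicit ScopedTimer(double& accumulatedMs)
        : total(accumulatedMs), start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        const auto elapsed = std::chrono::steady_clock::now() - start;
        total += std::chrono::duration<double, std::milli>(elapsed).count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& total;
    std::chrono::steady_clock::time_point start;
};
```

Accumulating the rendering time over, say, a hundred frames and comparing it with the time per frame shows how much of each frame is left for everything else, including the blit.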

OK, I think (though I'm not totally sure) that the blitting could be accelerated using the QuickTime blitting functions.
In JUCE, it's HIViewDrawCGImage() that draws the image. The documentation (http://developer.apple.com/documentation/Carbon/Conceptual/HIViewDoc/HIView_tasks/chapter_2_section_11.html) mentions that this function uses QuickDraw to draw, so I think it indirectly calls CopyBits.

I've found an article that explains that CopyBits is a bit slower than the QuickTime blitting functions (http://developer.apple.com/quicktime/icefloe/dispatch008.html). So perhaps drawing on Mac OS could be accelerated using those QuickTime functions.
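For context, when source and destination share a pixel format and nothing is scaled, a blit is essentially a row-by-row memory copy; the extra cost in real blitting functions usually comes from pixel-format conversion, scaling, clipping or, on X11, transferring the data to the server. A minimal sketch, assuming two distinct buffers in the same format; `FrameBuffer` and `blitRect` are hypothetical names, not QuickDraw, QuickTime or JUCE APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical frame-buffer description: rows may be padded, so strideBytes
// (bytes from one row to the next) can exceed width * bytesPerPixel.
struct FrameBuffer {
    std::uint8_t* pixels;
    int width, height;
    int bytesPerPixel;
    std::size_t strideBytes;
};

// Copies a w x h rectangle from src (at sx, sy) to dst (at dx, dy).
// Both buffers must use the same pixel format and must not overlap
// (memcpy is undefined for overlapping ranges). One memcpy per row.
inline void blitRect(const FrameBuffer& src, int sx, int sy,
                     FrameBuffer& dst, int dx, int dy, int w, int h)
{
    assert(src.bytesPerPixel == dst.bytesPerPixel);
    assert(sx >= 0 && sy >= 0 && sx + w <= src.width && sy + h <= src.height);
    assert(dx >= 0 && dy >= 0 && dx + w <= dst.width && dy + h <= dst.height);

    const std::size_t rowBytes = static_cast<std::size_t>(w) * src.bytesPerPixel;
    for (int row = 0; row < h; ++row) {
        const std::uint8_t* from = src.pixels + (sy + row) * src.strideBytes
                                   + static_cast<std::size_t>(sx) * src.bytesPerPixel;
        std::uint8_t* to = dst.pixels + (dy + row) * dst.strideBytes
                           + static_cast<std::size_t>(dx) * dst.bytesPerPixel;
        std::memcpy(to, from, rowBytes);
    }
}
```

If the buffer's pixel layout differs from what the display expects, each row needs a per-pixel conversion instead of a single copy.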

Before spending days fine-tuning the blitting, have you actually tried using a tool like Shark to see where the CPU is being used?

No, you're right. As I don't come from the Mac world, I didn't know about Shark; I'll try it.

I don't know whether it would make any difference, but on Linux the blitting can be sped up by using the MIT shared memory extension (MIT-SHM) for X. I think this would avoid sending the whole pixmap through a socket for each blit…
http://pantransit.reptiles.org/prog/mit-shm.html

[quote="bouba"]There's no visible difference between -O2 and -O3, and I've read that -O3 only improves speed for very specific binaries, so I prefer -O2.
The JUCE library I'm linking against is the release build.

Do you know of any other ways to optimize the binary on UNIX?[/quote]

You can try using -O3 and passing
-funit-at-a-time -funroll-all-loops -fpeel-loops
to the compiler. There's a bit more to gain from the compilation step, but it's certainly better to find out where the bottlenecks in your implementation are.