Compiler auto-optimizations on Windows and Mac?

Guys, my latest project is running great on both Windows and Mac, but I need to optimize the code somehow: in some simple presets, a single note eats 30% of the CPU on most machines, and that's not good...

So, my options:

1) better compiler settings on MSVC 2013 Express and Xcode 5

2) SSE cross-platform code?

3) what else?

Any ideas?

Thank you all in advance!

Best Regards, WilliamK


MSVC 2013 Express doesn't come with the profiler tools. :-( I have to check Xcode, but sadly my Mac is having some problems right now...

VerySleepy works well for Windows.

As much as I like your suggestions, I already do manual profiling, so I don't think a profiler will benefit me much further. If you want, you could take a look at the Single OSC code (it is open source). The main problem is how to really speed it up by using some other methods. I'm still thinking about SSE, but I could be wrong...


SSE is already cross-platform if you only plan to support Windows and OS X.

Just use intrinsics to avoid issues with inline assembly, which is not supported in LLVM.
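To make the intrinsics suggestion concrete, here is a minimal sketch of a gain loop written with SSE intrinsics from `<xmmintrin.h>`, which both MSVC and Clang provide. The function name `applyGain` and the `EXAMPLE_HAS_SSE` macro are made up for illustration and are not from the Single OSC code; the scalar loop handles the tail and also serves as a fallback on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>

// Enable the SSE path only where the compiler reports SSE support.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>   // SSE intrinsics header shared by MSVC and Clang
  #define EXAMPLE_HAS_SSE 1
#else
  #define EXAMPLE_HAS_SSE 0
#endif

// Hypothetical helper: multiplies every sample in `buffer` by `gain`, in place.
void applyGain(float* buffer, std::size_t numSamples, float gain)
{
    assert(buffer != nullptr || numSamples == 0);

    std::size_t i = 0;
#if EXAMPLE_HAS_SSE
    const __m128 g = _mm_set1_ps(gain);            // broadcast gain to all 4 lanes
    for (; i + 4 <= numSamples; i += 4)
    {
        const __m128 x = _mm_loadu_ps(buffer + i); // unaligned load of 4 floats
        _mm_storeu_ps(buffer + i, _mm_mul_ps(x, g));
    }
#endif
    // Scalar tail for the last 0..3 samples (or everything, without SSE).
    for (; i < numSamples; ++i)
        buffer[i] *= gain;
}
```

The unaligned load/store variants are used here so the sketch works on any buffer; if you can guarantee 16-byte alignment, `_mm_load_ps` and `_mm_store_ps` are the aligned equivalents. Whether this actually beats what the optimizer already generates is something only a profiler run will tell you.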


Check IPP (Intel Integrated Performance Primitives).

Gotcha, thanks bud. :-)

What do you mean by "manual profiling"?

There really is no substitute for a proper profiler; there's no way you could add performance counters to every method of your code. The whole point of profiling is to show you exactly where CPU time is eaten up. This is very often in a place you wouldn't think to look, or it could show up some simple mistake you made. Improving profiler hotspots is really the only way to improve performance. How else would you know that you've actually improved it?

Just glancing at your code, the first thing I would do is DRY it up a bit ("globalInfo->events[counter]" is repeated 15 times in about 20 lines). At least then you will have less code to look at and get more of a sense where time is being spent.
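A rough sketch of what DRYing that up could look like: bind the repeated element to a local reference once and work through that. The `Event` and `GlobalInfo` types and their fields below are hypothetical stand-ins, since I'm only assuming the general shape of the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real types; the actual fields will differ.
struct Event      { int note; float velocity; bool active; };
struct GlobalInfo { std::vector<Event> events; };

// Before: the full globalInfo->events[counter] expression is repeated each time.
float noteGainRepeated(const GlobalInfo* globalInfo, std::size_t counter)
{
    assert(counter < globalInfo->events.size());
    if (!globalInfo->events[counter].active)
        return 0.0f;
    return globalInfo->events[counter].velocity
         * (globalInfo->events[counter].note / 127.0f);
}

// After: look the element up once and give it a short name.
float noteGainDry(const GlobalInfo* globalInfo, std::size_t counter)
{
    assert(counter < globalInfo->events.size());
    const Event& ev = globalInfo->events[counter];
    if (!ev.active)
        return 0.0f;
    return ev.velocity * (ev.note / 127.0f);
}
```

An optimizing compiler may well hoist the repeated lookup on its own, so treat this mainly as a readability change that makes the hot path easier to see, not as a guaranteed speed-up.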

As a simple example, I noticed one of my list boxes was a bit sluggish on iOS, so I thought I'd improve the graphics performance a bit. I launched it with a profiler attached, scrolled the list backwards and forwards a couple of times, and then reviewed the result. 50% of the time was spent drawing buttons. I realised the buttons were actually behind some opaque labels that I'd forgotten to hide. 3 buttons per row over about 20 rows made for a lot of button repaints every frame. I made them show and hide themselves when necessary and got all that CPU back. It took me about 5 minutes in total. I probably wouldn't have noticed that for ages otherwise.

Indeed, thanks bud. I will check this out more tomorrow, I'm in a bit of a bad mood today. ;-)

+1 to what Dave said. Compilers and CPUs are too complex to guess what they're doing; the only way to know is to measure it with a real profiler.

If you don't profile, or if you only profile parts of your app, then you WILL get it wrong.

Thanks for the VerySleepy tip Jules! Great tool!

And another thumbs up for IPP, great for FFTs! (a bit expensive if you need it for both OSX and Win, but a good investment).

The main problem I see right now is the module call to process a single sample. Since Wusik 4000 is modular, modules can feed back into one another, so we can't simply process multiple samples per module call. One idea is to use a 2- or even 4-sample block per module call and handle the feedback somehow. I did this for the Wusik Station V1 SSE code and it works great: it removes a lot of the overhead of calling each tick/nextframe multiple times per sample block...
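For anyone following along, the small-block idea could be sketched roughly like this. The `Module` interface, `OnePole` filter, and `kBlockSize` are hypothetical names for illustration only, not the Wusik 4000 API; the point is just that one virtual call covers 4 samples and the filter state stays in a local variable for the whole block.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBlockSize = 4;   // assumed block size (2 would work too)

// Hypothetical module interface, not the actual Wusik one.
struct Module
{
    virtual ~Module() {}

    // Per-sample path: one virtual call for every sample.
    virtual float tick(float in) = 0;

    // Block path: one virtual call per kBlockSize samples.
    // The default simply loops over tick(); modules can override it.
    virtual void processBlock(const float* in, float* out)
    {
        assert(in != nullptr && out != nullptr);
        for (std::size_t i = 0; i < kBlockSize; ++i)
            out[i] = tick(in[i]);
    }
};

// Example module: a one-pole low-pass filter.
struct OnePole : Module
{
    float coeff;
    float state;

    explicit OnePole(float c) : coeff(c), state(0.0f) {}

    float tick(float in) override
    {
        state += coeff * (in - state);
        return state;
    }

    void processBlock(const float* in, float* out) override
    {
        assert(in != nullptr && out != nullptr);
        float s = state;                 // keep the state in a local for the block
        for (std::size_t i = 0; i < kBlockSize; ++i)
        {
            s += coeff * (in[i] - s);
            out[i] = s;
        }
        state = s;                       // write the state back once
    }
};
```

The trade-off, as far as I can tell, is that a feedback path between two modules then sees up to one block of delay instead of a single sample, so the block size has to stay small enough for that to be acceptable.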