Cross-Platform SIMD / Parallel?


Not sure how to explain this, but my old Wusik Station code is mostly assembly with SIMD instructions. I process blocks of 4 voices on each call, as the 128-bit SIMD registers (SSE) can hold four 32-bit floats at once. But I wonder: is there a better way to handle this, so my code is cross-platform even on machines without SSE/SIMD or compiler support for inline assembly?

I don’t know exactly what I’m looking for, so please, be patient with me on this. :oops:

I remember reading about parallelization, but I couldn’t understand how to make it work with my 4-voice block code. Would that be the trick to writing code that is not tied to one type of CPU/compiler?

I could share my assembly filter code here if that helps.

Thanks for now.


Well, any machine has SSE nowadays so I wouldn’t bother much.

However, you should look at intrinsics instead of inline assembly, as you can’t use inline assembly when building a 64-bit version with Visual C++, for example.

In any case, you can check this

SSE and multithreading (to use multiple cores) are really two different things.


Thanks, I mean SSE, not multi-core. But what I wonder is whether the code will also compile for targets without SSE, for instance the iPhone. That’s what I was trying to explain but couldn’t. :wink:


You may want to look into creating your own vector library, where you use conditional compilation to choose how each platform implements SIMD.

For example, I have a lightweight vector library that calls the Apple Accelerate framework on OS X. Right now, it defaults to unoptimized C++ on Windows, but I am going to wrap the relevant Intel Performance Primitives in the next few days. This will give me vector operations that work on Macs (G4, G5 and Intel) and Windows.
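A minimal sketch of the conditional-compilation idea, not the library described above: a 4-float wrapper that uses SSE intrinsics where available, NEON intrinsics on ARM, and plain C++ everywhere else. The names `Vec4`, `vec4_load`, `vec4_store`, `vec4_splat`, `vec4_add` and `vec4_mul` are hypothetical; the preprocessor checks assume the macros that GCC, Clang and Visual C++ normally define for these targets.

```cpp
// Pick a backend at compile time. Vec4 and the vec4_* functions are
// hypothetical names used only for this illustration.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>   // SSE intrinsics
  #define VEC4_BACKEND_SSE 1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__) || defined(_M_ARM64)
  #include <arm_neon.h>    // NEON intrinsics
  #define VEC4_BACKEND_NEON 1
#else
  #define VEC4_BACKEND_SCALAR 1
#endif

// Four floats, e.g. one sample for each of four voices.
struct Vec4 {
#if defined(VEC4_BACKEND_SSE)
    __m128 v;
#elif defined(VEC4_BACKEND_NEON)
    float32x4_t v;
#else
    float v[4];
#endif
};

// Load four floats from memory (no alignment requirement).
inline Vec4 vec4_load(const float* p) {
    Vec4 r;
#if defined(VEC4_BACKEND_SSE)
    r.v = _mm_loadu_ps(p);
#elif defined(VEC4_BACKEND_NEON)
    r.v = vld1q_f32(p);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
#endif
    return r;
}

// Store four floats to memory (no alignment requirement).
inline void vec4_store(float* p, Vec4 a) {
#if defined(VEC4_BACKEND_SSE)
    _mm_storeu_ps(p, a.v);
#elif defined(VEC4_BACKEND_NEON)
    vst1q_f32(p, a.v);
#else
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
#endif
}

// Copy one value into all four lanes.
inline Vec4 vec4_splat(float x) {
    Vec4 r;
#if defined(VEC4_BACKEND_SSE)
    r.v = _mm_set1_ps(x);
#elif defined(VEC4_BACKEND_NEON)
    r.v = vdupq_n_f32(x);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = x;
#endif
    return r;
}

// Lane-wise addition.
inline Vec4 vec4_add(Vec4 a, Vec4 b) {
    Vec4 r;
#if defined(VEC4_BACKEND_SSE)
    r.v = _mm_add_ps(a.v, b.v);
#elif defined(VEC4_BACKEND_NEON)
    r.v = vaddq_f32(a.v, b.v);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

// Lane-wise multiplication.
inline Vec4 vec4_mul(Vec4 a, Vec4 b) {
    Vec4 r;
#if defined(VEC4_BACKEND_SSE)
    r.v = _mm_mul_ps(a.v, b.v);
#elif defined(VEC4_BACKEND_NEON)
    r.v = vmulq_f32(a.v, b.v);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
#endif
    return r;
}
```

The DSP code then only ever calls the `vec4_*` functions, for example `vec4_store(out, vec4_add(vec4_load(a), vec4_mul(vec4_load(b), vec4_splat(0.5f))))`, so the same source builds on every target and the scalar branch acts as the fallback for CPUs with no SIMD support.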

Sean Costello


I wonder if anyone has any news on this. :wink:



What sort of news?

As an update: I ended up not using the Intel Performance Primitives. I didn’t have a lot of operations, and it ended up costing less time (and money) to use the Microsoft intrinsics in Visual Studio, and keep my Accelerate framework code for the Mac.

With regards to your original question, having 4 parallel voices is probably the smartest way of using SIMD on the PC and Mac. I don’t know about the parallelization of the ARM used in the iPhone - it might be 2-way SIMD. It gets trickier if you have parallel signal processing blocks, and you also want to process blocks of samples at a time.
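To illustrate the 4-parallel-voices layout, here is a rough sketch in plain C++ of a one-pole lowpass run on four voices over a block of samples. The filter recursion runs along time (each output depends on the previous one), so the independent work is across the voices: the inner loop over `v` is what a single 4-wide SIMD operation would replace. The names `OnePoleBank` and `processBlock`, the choice of filter, and the voice-interleaved buffer layout are all assumptions made for the example.

```cpp
#include <cstddef>

constexpr std::size_t kVoices = 4;

// Per-voice coefficients and state stored side by side, so each array
// could be held in one 4-float SIMD register (hypothetical layout).
struct OnePoleBank {
    float coeff[kVoices];  // smoothing coefficient per voice, 0..1
    float state[kVoices];  // previous output per voice
};

// in/out are interleaved by voice: the sample for voice v in frame n
// lives at index n * kVoices + v.
inline void processBlock(OnePoleBank& f, const float* in, float* out,
                         std::size_t numFrames) {
    for (std::size_t n = 0; n < numFrames; ++n) {       // sequential in time
        const float* x = in + n * kVoices;
        float* y = out + n * kVoices;
        for (std::size_t v = 0; v < kVoices; ++v) {     // independent lanes
            // y[n] = y[n-1] + coeff * (x[n] - y[n-1])
            f.state[v] += f.coeff[v] * (x[v] - f.state[v]);
            y[v] = f.state[v];
        }
    }
}
```

Written this way the code compiles on any target, and the inner loop can be swapped for intrinsics (or left to the compiler's auto-vectorizer) where SIMD is available.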

Sean Costello