Intel IPP - free Community licence


#1

Hi everybody,

I just discovered the Community licence option for all Intel DSP libraries.

If I understand it correctly, it means you get the latest version for free and are allowed to use it commercially (but maybe someone with more knowledge can chime in on this).

This should shorten the wait for the juce_dsp module...


#2

I didn't know about this. I'm going to have a look, thanks for the info!


#3

OK, I registered with Intel and replaced the FFT routine in the WDL convolution engine with the IPP version, and got an incredible performance boost on Windows.

I would suggest enabling drop-in replacement of routines with their IPP versions via preprocessor defines in the juce_dsp module, whenever it's finished :)


#4

Awesome find, chrisboy! Looks truly free to me, it just comes without premium support. I've always wanted to use these awesome libraries, but the price was just too high in the past. Hopefully no one finds some hidden legal problem.


#5

Well, the only drawback I can find so far is that you only have access to the latest version, but I can't think of a scenario where that would become a problem.

But this FFT really rocks. It transforms the WDL convolution from unusable to ... usable :)


#6

Hi,

Could you explain how to use the Intel IPP libraries and integrate them into JUCE? Do they come as DLLs or something similar?

I have checked Intel's site, but it says they are not free...

Regards,

Carlos


#7

Yes, they are not free; you have to buy them from Intel. Once compiled, they are just DLL files with names something like ipp****.dll

A quick check of the intel site reveals this comprehensive guide:

https://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-for-windows-deploying-applications-with-intel-ipp-dlls

The only difference is that you would specify the include paths/library paths in the Introjucer project instead of the Visual Studio property pages.


#8

https://software.intel.com/en-us/articles/free_ipp

Purchasing is only necessary if you want access to Intel® Premier Support (direct 1:1 private support from Intel), older versions of the library or access to other tools in Intel® Parallel Studio XE or Intel® System Studio.

And you can also statically link the IPP library if you want to avoid keeping track of another DLL. If you install it properly, you get a drop-down option in VS2013 (in the Project Properties window) with different options (multithreaded / single-threaded / dynamic linking / static linking).

I didn't even have to fiddle around with include paths; it was a strange experience that everything worked out of the box :)


#9

Awesome, I didn't know that! Last time I came across IPP (many years ago) it was less easy to use.


#10

Even Intel resellers didn't know it was free! I phoned an Intel reseller in the UK a couple of weeks ago and they said it was around £1K plus 20% per year for maintenance. Then someone here pointed out the Community licence!

It's super easy to use. The Math Kernel Library is "free" too (so you can do other vectorised maths ops).

And yes the FFT stuff is fast. I've been using Apple's vDSP for years and IPP is about the same speed.


#11

How does this work with Xcode and VS2015? Has anyone tried that?

Cheers,

Rail


#12

Yes -- works fine with both! :)

Best,
Stian


#13

Very fine with both


#14

Thanks guys!

Rail


#15

Hi,
ippsFIRSR kills every other FFT approach I've tried for convolving a stream with a fixed impulse response. The maths functions are super fast.
The hardest thing about IPP is setting it up.
As a novice, I think a simple written or video tutorial on how to install IPP on Mac and PC would be great, e.g. step 1: install IPP; step 2: open a terminal and type… This would really help users.
Initialising an FFT is a little awkward, but simple to follow with an example.
I would love IPP as part of the JUCE maths and FFT code. I am on IPP v8. If someone could list all the functions you want to replace or add, I am happy to give you the IPP equivalent. A #define that swaps in IPP for some parts of the JUCE code would be great. For troubleshooting it would also be better to be able to turn the IPP #define on and off, because when IPP crashes there is not much debugging info.
A real programmer can check my work, but could you give us a structure if you would like us to start writing some #define replacements for:
zero (memset),
add,
addproduct,
mul,
div,
mulc,
divc,
addc,
rms average,
fftforward,
FIRSR (FFT with a set impulse),
fftforwardCCS to magnitude with a Hz-to-bin address system,
fftinverse,
windowing functions,
sampleup (add zeros),
sampledown,
dotproduct,
Min,
Max,
Threshold (change a value to a constant above or below a threshold),
set (like memset, but sets to a constant).
The list goes on. This would really boost JUCE.
Thanks
lpb


#16

The #define would also avoid licensing problems: you would only turn IPP on if you had downloaded and installed it yourself.
Someone with experience could give the IPP replacements for biquads and random noise; I have not used those yet.


#17

It would indeed be really nice to have the FloatVectorOperations class choose the IPP routines if the IPP library is available. On OS X there won't be a big performance leap because of vDSP, but on Windows it would make a difference.

Jules, if we supply you with the code (it's boring boilerplate stuff), would that be something you would incorporate into JUCE?


#18

Sure, if it's straightforward to do that. We'd need to make sure there's no runtime overhead in choosing IPP, but if it could be selected at compile time, that'd be something we could probably use.
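A minimal sketch of what compile-time selection could look like, assuming a hypothetical `USE_INTEL_IPP` project macro (not an existing JUCE flag). When the macro is 0 or undefined the IPP header is never included, so builds without IPP are unaffected and there is no runtime branch. It also shows the difference in calling style: IPP returns an `IppStatus` and writes the result through a pointer.

```cpp
#include <algorithm>
#include <cassert>

#ifndef USE_INTEL_IPP        // hypothetical project-level switch, off by default
 #define USE_INTEL_IPP 0
#endif

#if USE_INTEL_IPP
 #include <ipps.h>
#endif

// Returns the largest element; the implementation is chosen at compile time.
inline float findMaximum (const float* values, int numValues)
{
    assert (values != nullptr && numValues > 0);

   #if USE_INTEL_IPP
    Ipp32f result = 0;
    // IPP writes the result through a pointer and returns a status code.
    const IppStatus status = ippsMax_32f (values, numValues, &result);
    assert (status == ippStsNoErr);
    (void) status;
    return result;
   #else
    // Portable fallback used when IPP is not available.
    return *std::max_element (values, values + numValues);
   #endif
}
```

Because the switch is a single macro, turning it off is also an easy way to rule IPP in or out while debugging a crash.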


#19

I ran some quick tests to benchmark the IPP routines against FloatVectorOperations, using the loops below, which contain some arbitrary vector operations.

The juce::FloatVectorOperations loop:

for (int i = 0; i < LOOP_LENGTH; i++)
{
    FloatVectorOperations::fill(d, initialisers[i], NUM_SAMPLES);
    FloatVectorOperations::multiply(d, multipliers[i], NUM_SAMPLES);
    FloatVectorOperations::add(d, 2.0f, NUM_SAMPLES);
    result[i] = FloatVectorOperations::findMaximum(d, NUM_SAMPLES);
}

The IPP loop:

for (int i = 0; i < LOOP_LENGTH; i++)
{
    ippsSet_32f(initialisers[i], d, NUM_SAMPLES);
    ippsMulC_32f_I(multipliers[i], d, NUM_SAMPLES);
    ippsAddC_32f_I(2.0f, d, NUM_SAMPLES);
    ippsMax_32f(d, NUM_SAMPLES, result+i);
}

You can see that it looks pretty much the same (the only semantic difference is that the result is not a return value, because every IPP function returns an IppStatus code and writes its result through a pointer).

initialisers, multipliers and result are float arrays allocated on the stack (the result is logged to make sure the compiler doesn't optimise the loop away).

d is the data buffer. I ran tests with NUM_SAMPLES = 88200 and LOOP_LENGTH = 2049, using both (presumably unaligned) data allocated from an AudioSampleBuffer and specially aligned data from the IPP allocator.

My machine is a 2012 MacBook Pro (i7, 2.3 GHz) running Win7 under Boot Camp (OS X isn't interesting here, because this is not supposed to be a vDSP vs IPP shootout). Testing was done with an x64 Release build.
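For anyone who wants to repeat the measurement, here is a rough timing harness for the same fill / multiply / add / max sequence, written as plain scalar C++ with `std::chrono`. The function name and the way the initialisers and multipliers are filled are assumptions for illustration; swap the loop body for the JUCE or IPP calls above to compare them.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs the fill / multiply / add / max loop with scalar code and
// returns the elapsed wall-clock time in milliseconds. 'results' receives one maximum
// per iteration so the compiler cannot discard the work.
double timeScalarLoop (int numSamples, int loopLength, std::vector<float>& results)
{
    assert (numSamples > 0 && loopLength > 0);

    std::vector<float> d (static_cast<std::size_t> (numSamples));
    std::vector<float> initialisers (static_cast<std::size_t> (loopLength));
    std::vector<float> multipliers (static_cast<std::size_t> (loopLength));
    results.assign (static_cast<std::size_t> (loopLength), 0.0f);

    for (int i = 0; i < loopLength; ++i)
    {
        initialisers[i] = static_cast<float> (i % 7);
        multipliers[i]  = 0.5f + static_cast<float> (i % 3);
    }

    const auto start = std::chrono::steady_clock::now();

    for (int i = 0; i < loopLength; ++i)
    {
        std::fill (d.begin(), d.end(), initialisers[i]);    // fill
        for (auto& x : d) x *= multipliers[i];               // multiply by constant
        for (auto& x : d) x += 2.0f;                         // add constant
        results[i] = *std::max_element (d.begin(), d.end()); // find maximum
    }

    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli> (end - start).count();
}
```

Calling `timeScalarLoop (88200, 2049, results)` matches the sizes used here; build in Release, since Debug timings are not meaningful.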

Results

Unaligned Data IPP: 134 ms
Aligned Data IPP: 98 ms

Unaligned Data JUCE: 130 ms
Aligned Data JUCE: 130 ms

So using IPP-allocated data with the IPP routines yields a performance gain of about 25% (98 ms vs 130 ms; it may be higher for more complex operations than multiply and add). On an unaligned data buffer, IPP is a little slower than JUCE (134 ms vs 130 ms), while the JUCE operations don't seem to care how the data was allocated.

Conclusion:

To really benefit from IPP, we need to make sure that the data is allocated by IPP. Using the IPP routines on AudioSampleBuffer data is only useful if you allocate the data elsewhere and point the buffer at it with AudioBuffer::setDataToReferTo.
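A portable sketch of "allocate elsewhere, then refer to it". The class is hypothetical and uses C++17 aligned `operator new` as a stand-in for IPP's own allocator (`ippsMalloc_32f` / `ippsFree`). The 64-byte alignment is an assumption, and each channel's length is rounded up so that every channel starts on an aligned boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical owner of one zeroed, 64-byte-aligned block holding numChannels x numSamples floats.
class AlignedChannelData
{
public:
    static constexpr std::size_t alignment = 64;   // assumption: covers SSE/AVX/AVX-512

    AlignedChannelData (int numChannels, int numSamples)
        : stride (roundUp (static_cast<std::size_t> (numSamples))),
          totalFloats (stride * static_cast<std::size_t> (numChannels)),
          data (static_cast<float*> (::operator new (totalFloats * sizeof (float),
                                                     std::align_val_t (alignment))))
    {
        assert (numChannels >= 0 && numSamples >= 0);
        std::fill (data, data + totalFloats, 0.0f);

        // Each channel starts a whole number of aligned strides into the block.
        for (int ch = 0; ch < numChannels; ++ch)
            channels.push_back (data + static_cast<std::size_t> (ch) * stride);
    }

    ~AlignedChannelData() { ::operator delete (data, std::align_val_t (alignment)); }

    AlignedChannelData (const AlignedChannelData&) = delete;
    AlignedChannelData& operator= (const AlignedChannelData&) = delete;

    // Array of channel pointers, the shape a float** channel-pointer API expects.
    float** getChannelPointers() { return channels.data(); }

private:
    // Rounds a sample count up to a whole number of aligned blocks.
    static std::size_t roundUp (std::size_t n)
    {
        const std::size_t k = alignment / sizeof (float);
        return (n + k - 1) / k * k;
    }

    std::size_t stride, totalFloats;
    float* data;
    std::vector<float*> channels;
};

inline bool isAligned (const void* p, std::size_t a)
{
    return reinterpret_cast<std::uintptr_t> (p) % a == 0;
}
```

In JUCE you would then call something like `buffer.setDataToReferTo (storage.getChannelPointers(), numChannels, numSamples)`. The storage object must outlive the buffer, because the buffer does not take ownership.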


#20

I assume the code could go alongside the non-IPP/AMD alternative using #if HAS_IPP. Here is some code that can be replaced. The 32s suffix means a 32-bit signed integer (32f is float, 64f is double).
———————————————————
Type findMinimum (const Type* data, int numValues)
could be

ippsMin_32s(data, numValues, &result);
ippsMin_32f(data, numValues, &result);
ippsMin_64f(data, numValues, &result);

———————————————————
findMaximum (const Type* values, int numValues)
could be

ippsMax_32s(values, numValues, &result);
ippsMax_32f(values, numValues, &result);
ippsMax_64f(values, numValues, &result);

For peaks there is also MaxAbs, e.g. MaxAbs({8.0, -15.0, 3.0}) = 15.0.
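A scalar reference for the MaxAbs-style peak search, usable as the non-IPP fallback. The function name is hypothetical; it returns the largest absolute value in the buffer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fallback: largest |x| in the buffer (peak magnitude).
inline float findPeakMagnitude (const float* values, int numValues)
{
    assert (values != nullptr && numValues > 0);
    float peak = std::fabs (values[0]);
    for (int i = 1; i < numValues; ++i)
        peak = std::fmax (peak, std::fabs (values[i]));
    return peak;
}
```

With the example above, `{8.0f, -15.0f, 3.0f}` gives 15.0f.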
———————————————————
findMinAndMax (const Type* values, int numValues, Type& lowest, Type& highest)
is
ippsMinMax_32f(const Ipp32f* pSrc, int len, Ipp32f* pMin, Ipp32f* pMax);
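A single-pass scalar fallback with the same parameter shape as findMinAndMax, so it could sit in the non-IPP branch next to the ippsMinMax_32f call. This is a sketch, not JUCE's actual implementation.

```cpp
#include <cassert>

// One read of the buffer yields both extremes.
template <typename Type>
void findMinAndMax (const Type* values, int numValues, Type& lowest, Type& highest)
{
    assert (values != nullptr && numValues > 0);
    lowest = highest = values[0];
    for (int i = 1; i < numValues; ++i)
    {
        if (values[i] < lowest)  lowest  = values[i];
        if (values[i] > highest) highest = values[i];
    }
}
```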
———————————————————
There are a whole lot of IPP threshold functions that are very different from the JUCE threshold tests. Anyone got any thoughts? I love the IPP threshold functions.
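To give the discussion something concrete, here are scalar sketches of the two basic threshold behaviours (replace values below, or above, a level with a constant), matching the "Threshold" item in the earlier list. The function names are hypothetical, and these are not IPP signatures; check the IPP reference for the exact variants it offers.

```cpp
#include <cassert>

// Hypothetical: every sample below 'level' is replaced with 'value' (in place).
inline void thresholdBelow (float* data, int len, float level, float value)
{
    assert (data != nullptr && len >= 0);
    for (int i = 0; i < len; ++i)
        if (data[i] < level)
            data[i] = value;
}

// Hypothetical: every sample above 'level' is replaced with 'value' (in place).
inline void thresholdAbove (float* data, int len, float level, float value)
{
    assert (data != nullptr && len >= 0);
    for (int i = 0; i < len; ++i)
        if (data[i] > level)
            data[i] = value;
}
```

Calling both with value equal to level gives a simple clamp; a different value gives gating-style behaviour.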
———————————————————

For anyone happy to contribute, some other obviously useful non-FFT functions are below. It would be great to have non-IPP/AMD alternatives for them too.

ippsNorm_L2_32f(const Ipp32f* pSrc, int len, Ipp32f* pNorm);
computes the L2 norm of pSrc over len samples (the square root of the sum of squares) and writes it to pNorm; dividing it by sqrt(len) gives the RMS.
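A scalar sketch of the relationship between the L2 norm and the RMS, as a possible non-IPP fallback. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// L2 norm: square root of the sum of squares.
inline float normL2 (const float* src, int len)
{
    assert (src != nullptr && len > 0);
    double sumOfSquares = 0.0;   // accumulate in double to limit rounding error
    for (int i = 0; i < len; ++i)
        sumOfSquares += static_cast<double> (src[i]) * src[i];
    return static_cast<float> (std::sqrt (sumOfSquares));
}

// RMS over the same range: the L2 norm divided by sqrt(len).
inline float rms (const float* src, int len)
{
    return normL2 (src, len) / std::sqrt (static_cast<float> (len));
}
```

For example, `{3, 4}` has an L2 norm of 5 and an RMS of 5 / sqrt(2).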
———————————————————
ippsDotProd_32f(const Ipp32f* pSrc1, const Ipp32f* pSrc2, int len, Ipp32f* pDp);
computes the dot product of pSrc1 and pSrc2 over len samples and writes it to pDp.
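The scalar equivalent is a short loop. This hypothetical fallback returns the value instead of writing it through a pointer.

```cpp
#include <cassert>

// Sum of src1[i] * src2[i] over len elements.
inline float dotProduct (const float* src1, const float* src2, int len)
{
    assert (src1 != nullptr && src2 != nullptr && len >= 0);
    double sum = 0.0;   // double accumulator to limit rounding error
    for (int i = 0; i < len; ++i)
        sum += static_cast<double> (src1[i]) * src2[i];
    return static_cast<float> (sum);
}
```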
———————————————————

ippsSampleUp_32f (const Ipp32f* pSrc, int srcLen, Ipp32f* pDst, int* pDstLen, int factor, int* pPhase);
factor - 1 zeros are inserted for each source sample.
pPhase points to the phase, i.e. the position within each group of factor output samples where the source sample is written; 0 is the first position, like an array index.
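A scalar sketch of the zero-stuffing idea with the phase semantics described above. The function is hypothetical. Unlike the IPP call, it takes the phase by value and does not carry any phase state across successive blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each input sample occupies slot 'phase' of a group of 'factor' outputs;
// the other factor - 1 slots are zero.
inline std::vector<float> sampleUp (const std::vector<float>& src, int factor, int phase)
{
    assert (factor >= 1 && phase >= 0 && phase < factor);
    const auto f = static_cast<std::size_t> (factor);
    std::vector<float> dst (src.size() * f, 0.0f);
    for (std::size_t k = 0; k < src.size(); ++k)
        dst[k * f + static_cast<std::size_t> (phase)] = src[k];
    return dst;
}
```

For example, `{1, 2}` with factor 3 and phase 0 becomes `{1, 0, 0, 2, 0, 0}`, and with phase 1 becomes `{0, 1, 0, 0, 2, 0}`.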
———————————————————
ippsSampleDown_32f (const Ipp32f* pSrc, int srcLen, Ipp32f* pDst, int* pDstLen, int factor, int* pPhase);
factor - 1 out of every factor samples are removed.
pPhase points to the phase, i.e. the position within each group of factor source samples that is kept; 0 keeps the first one, like an array index.
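The matching scalar sketch for downsampling keeps one sample from each group of factor samples, starting at the phase. As before, this is a hypothetical helper, and it does not reproduce how IPP updates the phase between calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps src[phase], src[phase + factor], src[phase + 2 * factor], ...
inline std::vector<float> sampleDown (const std::vector<float>& src, int factor, int phase)
{
    assert (factor >= 1 && phase >= 0 && phase < factor);
    std::vector<float> dst;
    for (auto i = static_cast<std::size_t> (phase); i < src.size(); i += static_cast<std::size_t> (factor))
        dst.push_back (src[i]);
    return dst;
}
```

For example, `{0, 1, 2, 3, 4, 5, 6}` with factor 3 gives `{0, 3, 6}` at phase 0 and `{1, 4}` at phase 1.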
———————————————————
I can provide an FFTforwardCCSToMagnitudeDb function that takes pSrc, an input length and an array of frequencies (Hz), and returns the magnitude in dB at each of those frequencies. I thought the input could be the bin indices directly, defined as HzValueRequired[index] = HzToBin(inputHz). dB output could be a flag. The Hz-to-bin conversion would be better done once in the constructor, unless it's needed elsewhere.

The FFT FIRSR is easy, but it would be better if the setup (the spec, etc.) were hidden in the constructor, which would then only take the impulse and the impulse size. I think the only processing inputs should be the stream block and the block size. I can provide working code for FIRSR, but not in this neat format.

FFT, inverse FFT, PolarToCart and CartToPolar are easy too, but again it would be good to hide the spec, etc.
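A sketch of the "magnitude in dB at requested frequencies" idea, assuming the usual mapping bin = round(hz * fftSize / sampleRate). All names are hypothetical. To stay self-contained, it computes each requested bin with a direct DFT sum (O(N) per bin) instead of reading it out of an IPP CCS-format FFT result, so it only illustrates the interface and the maths.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest FFT bin for a frequency in Hz.
inline int hzToBin (double hz, int fftSize, double sampleRate)
{
    assert (fftSize > 0 && sampleRate > 0.0 && hz >= 0.0 && hz <= sampleRate / 2.0);
    return static_cast<int> (std::lround (hz * fftSize / sampleRate));
}

// Magnitude of one DFT bin in dB, computed directly from the time-domain block.
inline double binMagnitudeDb (const std::vector<float>& block, int bin)
{
    const double twoPi = 6.283185307179586;
    const double n = static_cast<double> (block.size());
    double re = 0.0, im = 0.0;
    for (std::size_t i = 0; i < block.size(); ++i)
    {
        const double angle = twoPi * bin * static_cast<double> (i) / n;
        re += block[i] * std::cos (angle);
        im -= block[i] * std::sin (angle);
    }
    const double magnitude = std::sqrt (re * re + im * im);
    return 20.0 * std::log10 (magnitude + 1e-12);   // small offset avoids log10(0)
}

// One dB value per requested frequency; the Hz-to-bin mapping could be precomputed once.
inline std::vector<double> magnitudesDbAt (const std::vector<float>& block,
                                           const std::vector<double>& hzValues,
                                           double sampleRate)
{
    const int fftSize = static_cast<int> (block.size());
    std::vector<double> result;
    for (double hz : hzValues)
        result.push_back (binMagnitudeDb (block, hzToBin (hz, fftSize, sampleRate)));
    return result;
}
```

In a real class, the vector of bin indices would be built once in the constructor, matching the suggestion above.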
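A sketch of the interface shape suggested above: the impulse is given once to the constructor, then process() is called per block while the filter state carries over between blocks. The class is hypothetical and uses plain direct-form convolution, not FFT or IPP. An IPP version would keep the ippsFIRSR spec and buffers as private members behind the same two calls.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class StreamingFir
{
public:
    explicit StreamingFir (std::vector<float> impulse)
        : taps (std::move (impulse)), history (taps.size(), 0.0f)
    {
        assert (! taps.empty());
    }

    // Filters 'block' in place: y[n] = sum_k taps[k] * x[n - k].
    void process (float* block, int blockSize)
    {
        assert (block != nullptr && blockSize >= 0);
        for (int n = 0; n < blockSize; ++n)
        {
            // history is a circular delay line of the most recent taps.size() inputs.
            history[writePos] = block[n];
            float acc = 0.0f;
            std::size_t idx = writePos;
            for (std::size_t k = 0; k < taps.size(); ++k)
            {
                acc += taps[k] * history[idx];
                idx = (idx == 0 ? history.size() : idx) - 1;   // step back one sample
            }
            block[n] = acc;
            writePos = (writePos + 1) % history.size();
        }
    }

private:
    std::vector<float> taps, history;
    std::size_t writePos = 0;
};
```

With the impulse `{1.0f, 0.5f}`, an input of 1 followed by zeros in the next block produces 1, then 0.5, which shows that the state survives the block boundary.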
Thanks