It seems that it is important to specify the enhanced instruction set when using

At least for AVX2: enabling it roughly doubled the speed for me.

In Projucer you can do this by adding /arch:AVX2 to the extra compiler flags (for Visual Studio).
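
One way to confirm that the flag actually reached the compiler is to look at the predefined macros: MSVC defines `__AVX__` under /arch:AVX and both `__AVX__` and `__AVX2__` under /arch:AVX2 (GCC and Clang do the same for -mavx and -mavx2). A minimal sketch; `compiledSimdLevel` is a made-up helper name, not part of JUCE or Projucer:

```cpp
// Reports which instruction set the compiler was told to target.
// This is a compile-time property of the binary; it says nothing about
// the CPU the program ends up running on.
constexpr const char* compiledSimdLevel()
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "baseline";
#endif
}
```

Logging `compiledSimdLevel()` once at startup should be enough to see whether the Projucer setting was picked up.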

Given how important it is, maybe it should be offered as an option in Projucer. In any case, I’m writing this post in case someone finds it useful (or in case someone shows up saying: “Noooo, never do that” :joy:)

You should be aware that there are CPUs out there that don’t support AVX2 or AVX. If you compile your code with that flag, it will simply crash on those machines.

More recent CPUs will probably support it, and there are applications and plugins out there that simply require it. That’s a valid business decision in the end. But you probably want your installer to check the capabilities of the machine, so that users don’t install software that will crash.
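
For the check itself, something along these lines could run at startup or in a small helper that the installer launches. It is only a sketch and assumes an x86/x64 build with MSVC or GCC/Clang; `SimdSupport` and `detectSimdSupport` are hypothetical names. The code doing the check must itself be compiled without /arch:AVX or /arch:AVX2, otherwise it may crash before it gets a chance to report anything. In a JUCE project, `juce::SystemStats::hasAVX()` and `juce::SystemStats::hasAVX2()` answer the same question.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>     // __cpuid, __cpuidex
  #include <immintrin.h>  // _xgetbv
#endif

struct SimdSupport
{
    bool avx  = false;
    bool avx2 = false;
};

inline SimdSupport detectSimdSupport()
{
    SimdSupport s;

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = { 0, 0, 0, 0 };   // EAX, EBX, ECX, EDX

    __cpuid (regs, 0);
    const int maxLeaf = regs[0];
    if (maxLeaf < 1)
        return s;

    __cpuid (regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;  // OS has enabled XSAVE/XGETBV
    const bool cpuAvx  = (regs[2] & (1 << 28)) != 0;  // CPU implements AVX

    // The OS must also save/restore XMM and YMM state (XCR0 bits 1 and 2).
    // Short-circuiting ensures _xgetbv only runs when OSXSAVE is set.
    s.avx = osxsave && cpuAvx && ((_xgetbv (0) & 0x6) == 0x6);

    if (s.avx && maxLeaf >= 7)
    {
        __cpuidex (regs, 7, 0);
        s.avx2 = (regs[1] & (1 << 5)) != 0;           // CPUID leaf 7, EBX bit 5
    }
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    s.avx  = __builtin_cpu_supports ("avx") != 0;
    s.avx2 = s.avx && (__builtin_cpu_supports ("avx2") != 0);
#endif
    // Any other compiler or architecture falls through and reports no AVX.

    assert (! s.avx2 || s.avx);  // by construction, AVX2 is only reported together with AVX
    return s;
}
```

If `detectSimdSupport().avx` comes back false, the program can show a requirements message and exit instead of crashing on the first AVX instruction.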

One last note: AVX2 mainly adds 256-bit integer operations, while AVX already supplies the 256-bit floating-point instructions. So AVX should be fine for most DSP applications.

True, AVX is enough. This is experimental and I don’t plan to release it at the moment. If necessary I’ll add a check and a message or a note about the requirements, but that’s all, since this will necessarily use the AVX instructions.

I found this interesting article. It is more than 10 years old, so maybe this is already solved; I don’t think a current processor will have problems.

You can’t just throw /arch:AVX to speed up your program

While searching around for some AVX docs, I happened to find a blog post on Intel’s website describing how to optimize an image processing routine. The gist of the article was that you could get big gains just by throwing some VC++ compiler switches such as /arch:SSE2 or /arch:AVX to tell the compiler to use vector instructions. Presto, your code magically gets faster with less than an hour of work and without having to modify the algorithm!

Of course, my next thought was: “Yeah, until QA gives you an A-class bug the next day saying that the code now crashes on an Athlon XP or Core i7.”

The documentation for the Visual C++ compiler /arch compiler switch is labeled “Minimum CPU Architecture,” but should probably emphasize the ramifications of this switch. If you use this switch, your code will crash on any CPU that doesn’t support the required instruction set. Unlike the Intel compiler, which has options to auto-dispatch to different code paths depending on the available instruction set, the VC++ compiler will simply blindly generate code for the target CPU. Therefore, you can also reinterpret the switches as follows:

  • /arch:SSE: Generate code that crashes on an Athlon.
  • /arch:SSE2: Generate code that crashes on an Athlon XP.
  • /arch:AVX: Generate code that crashes on a Core i7.

This is not to say that the /arch switch is bad, as the compiler does actually generate faster code when it can use vector instructions. The problem is that unless you can absolutely guarantee that your EXE or DLL will never run on a CPU lower than the specified tier, you can’t use those switches. Okay, so /arch:SSE is probably pretty safe at this point, and you may be able to justify /arch:SSE2. You’d be insane to throw /arch:AVX on your whole app unless you really want to require a Sandy Bridge or Bulldozer CPU (which, as of today, only one of which has shipped).

Windows 7 dropped support for CPUs without SSE2.
SSE2 support is required for Windows 8 and newer.

Section 3.0 - Minimum hardware requirements for Windows 10 for desktop editions

This section provides detailed hardware requirements that apply to any device that runs Windows 10 for desktop editions. See Table 2 for the list of devices that can run Windows 10 for desktop editions. For additional component requirements that may also apply, see section 6.0.

Note: Throughout this specification, all requirements for Windows 10 for desktop editions also apply to Windows 10 Enterprise.

3.1 Processor

Devices that run Windows 10 for desktop editions require a 1 GHz or faster processor or SoC that meets the following requirements:

• Compatible with the x86* or x64 instruction set.
• Supports PAE, NX and SSE2.
• Supports CMPXCHG16b, LAHF/SAHF, and PrefetchW for 64-bit OS installation

Source: Windows minimum hardware requirements

I’m pretty sure that setting the flag for SSE2 shouldn’t cause any problems. Anyway, I’m going to continue with AVX. In the worst case you could release two versions of the DLL or EXE, each built for a different instruction set, and problem solved.
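
A related option, closer to the auto-dispatch the article credits to the Intel compiler, is to keep a single binary built for the baseline and pick the fast path at run time for the hot loops only. A rough sketch, assuming MSVC or GCC/Clang; every name here (`applyGainScalar`, `applyGainAvx`, `cpuHasAvx`, `chooseApplyGain`, the `EXAMPLE_*` macros) is made up for illustration:

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  #define EXAMPLE_X86 1
  #include <immintrin.h>
  #if defined(_MSC_VER)
    #include <intrin.h>
  #endif
#else
  #define EXAMPLE_X86 0
#endif

// GCC/Clang need a per-function target attribute to use AVX intrinsics
// without -mavx; MSVC accepts the intrinsics regardless of /arch.
#if defined(__GNUC__) || defined(__clang__)
  #define EXAMPLE_TARGET_AVX __attribute__ ((target ("avx")))
#else
  #define EXAMPLE_TARGET_AVX
#endif

// Portable fallback, safe on every CPU.
inline void applyGainScalar (float* data, std::size_t n, float gain)
{
    assert (data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= gain;
}

#if EXAMPLE_X86
// AVX path: 8 floats per iteration, scalar loop for the remainder.
EXAMPLE_TARGET_AVX
inline void applyGainAvx (float* data, std::size_t n, float gain)
{
    const __m256 g = _mm256_set1_ps (gain);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps (data + i, _mm256_mul_ps (_mm256_loadu_ps (data + i), g));
    for (; i < n; ++i)
        data[i] *= gain;
}

// True only if both the CPU and the OS support AVX.
inline bool cpuHasAvx()
{
  #if defined(_MSC_VER)
    int regs[4] = { 0, 0, 0, 0 };
    __cpuid (regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;
    const bool cpuAvx  = (regs[2] & (1 << 28)) != 0;
    return osxsave && cpuAvx && ((_xgetbv (0) & 0x6) == 0x6);
  #elif defined(__GNUC__) || defined(__clang__)
    __builtin_cpu_init();
    return __builtin_cpu_supports ("avx") != 0;
  #else
    return false;
  #endif
}
#endif

using GainFn = void (*) (float*, std::size_t, float);

// Picks the implementation based on what the running CPU supports.
inline GainFn chooseApplyGain()
{
#if EXAMPLE_X86
    if (cpuHasAvx())
        return applyGainAvx;
#endif
    return applyGainScalar;
}
```

Calling `chooseApplyGain()` once and storing the returned pointer keeps the CPU check out of the audio loop. The trade-off compared with /arch:AVX is that only the hand-written kernels get faster, since the compiler still generates baseline code everywhere else.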

The speed-up from enabling the flag is so big that I definitely don’t want to ever disable it :grin:

I think you’ll be ok. Bulldozer and Sandy Bridge are 11 years old. There are lots of things that would run awfully on something older than that.