Yes, this is with /fp:fast. I think what happens is that the compiler optimises away a full section of my intrinsics code, where I load -0.f and then do bitwise ops. It is easily fixed by loading uint constants instead, but as this is the only platform where I see this happening, I wanted to share this info as it was very hard to figure out where my problems came from because this only happens on fully optimised release builds.
I do not think the fast floating point mode should treat -0.f and 0.f the same, so maybe this is just an optimizer bug in MSVC 17.12.4 that will be fixed in the future.
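For anyone hitting the same thing, here is a minimal sketch of the workaround described above: the sign mask comes from an unsigned integer bit pattern rather than from the float literal -0.f, so nothing depends on how the optimiser treats negative zero. It is written as portable scalar code; the helper names (`toBits`, `fromBits`, `absViaMask`, `copySignViaMask`) are made up for illustration and are not taken from the original plugin code.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// Sign bit of an IEEE-754 single, given as an integer constant (not as -0.f).
constexpr std::uint32_t kSignBit = 0x80000000u;

// Reinterpret a float as its raw bits and back without undefined behaviour.
inline std::uint32_t toBits(float x)
{
    std::uint32_t b;
    std::memcpy(&b, &x, sizeof b);
    return b;
}

inline float fromBits(std::uint32_t b)
{
    float x;
    std::memcpy(&x, &b, sizeof x);
    return x;
}

// Clear the sign bit: |x|.
inline float absViaMask(float x)
{
    return fromBits(toBits(x) & ~kSignBit);
}

// Take the magnitude of one value and the sign of another.
inline float copySignViaMask(float magnitude, float sign)
{
    return fromBits((toBits(magnitude) & ~kSignBit) | (toBits(sign) & kSignBit));
}
```

With SSE intrinsics the same idea would be to build the mask with `_mm_castsi128_ps(_mm_set1_epi32(int(0x80000000)))` instead of `_mm_set1_ps(-0.f)`.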
Again, a new problem has appeared while trying to bring one of my plugins to Windows on Arm. This time it is about Reaper and the .vst3 folder structure. Reaper is one of the few hosts with an arm64ec build available and I am able to load and run my arm64ec vst3 builds just fine - as long as I don't use the .vst3 folder structure with the different binaries inside. But I'd very much like to use it.
I find that when the arm64 Reaper scans new plugins that come as vst3 folders, it attempts to scan the binary inside the x86_64-win subfolder - even if there is an arm64ec-win folder + binary present.
One of my plugins requires AVX on Intel CPUs and therefore cannot be loaded using the current Prism x64 emulation on an Arm machine. This makes Reaper think my plugin is broken. Did anyone else run into this? The plugin runs and validates fine if I have only an arm64ec-win subfolder, btw.
I wonder whether this is just a bug in Reaper or whether this will happen in other hosts as well.
I don't understand how the correct folder is chosen if a .vst3 folder structure plugin is loaded. Does it happen in the SDK or does the host need to implement it?
Hmm, looking at how a .vst3 folder is loaded in the SDK, I see that the SDK does the right thing, first looking for arm64ec-win, then arm64x-win and finally for x86_64-win - if the SDK is compiled as arm64ec.
However, it seems that this does not happen if the host is running under Prism emulation. Btw, a Prism emulated host can load a single file arm64ec .vst3 plugin just fine. But if a plugin comes as .vst3 folder, it will never even look for an arm64ec-win sub-folder/binary.
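To make the lookup order concrete, here is a rough sketch of the behaviour described above. It is not the SDK's code: `HostKind`, `archFolders` and `candidateBinaries` are hypothetical names. The `X64EmulationAware` case is purely an illustration of what an x86_64 host would have to do to find an arm64ec binary under Prism. How such a host would detect emulation is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical host flavours, for illustration only.
enum class HostKind
{
    Arm64EC,           // order as observed in the SDK when built as arm64ec
    X64,               // a plain x86_64 build only looks at x86_64-win
    X64EmulationAware  // NOT existing behaviour: what an emulated host would need
};

// Architecture subfolders to try, in order of preference.
inline std::vector<std::string> archFolders(HostKind host)
{
    switch (host)
    {
        case HostKind::Arm64EC:
        case HostKind::X64EmulationAware:
            return { "arm64ec-win", "arm64x-win", "x86_64-win" };
        case HostKind::X64:
            break;
    }
    return { "x86_64-win" };
}

// "C:/VST3/MyPlugin.vst3" -> "MyPlugin.vst3"
inline std::string bundleFileName(const std::string& bundlePath)
{
    const auto pos = bundlePath.find_last_of("/\\");
    return pos == std::string::npos ? bundlePath : bundlePath.substr(pos + 1);
}

// Candidate binaries inside the bundle: <bundle>/Contents/<arch>/<name>.vst3
inline std::vector<std::string> candidateBinaries(const std::string& bundlePath,
                                                  HostKind host)
{
    assert(!bundlePath.empty());
    std::vector<std::string> result;
    for (const auto& folder : archFolders(host))
        result.push_back(bundlePath + "/Contents/" + folder + "/"
                         + bundleFileName(bundlePath));
    return result;
}
```

A loader would try each candidate in turn and use the first one that exists and loads. With `HostKind::X64` the arm64ec-win folder is never in the list, which matches what I see under emulation.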
Conclusions:
There is likely a bug in the arm64 Reaper plugin scanner resulting in first trying to scan the x86_64 binary.
The .vst3 folder structure is fundamentally broken for the current Prism situation, and if I want users to be able to load my arm64ec plugin into emulated x86_64 hosts (which is very desirable to me because of my AVX requirement), I'll have to ship a single-file .vst3 using a separate installer step :(.
It seems unlikely that the VST3 SDK will adapt to this situation, as it would mean the x86_64 code path would always have to check whether it is running under Prism emulation before loading plugins.
This whole ARM situation with plugins on Windows is frustrating. I think we would also need to be able to offer arm64x builds sooner or later.
Bitwig is a native arm64 application and can load native arm64 plugins (they also support arm64ec plugins). According to them, arm64ec builds are up to 20% slower than normal arm64 builds on Snapdragon X:
Installers:
I like the idea of one installer and how it works with vst3. We can't expect that users know about the different arm builds. I think it should be the host's responsibility to choose the right binary. I would ignore the use case of x86_64 hosts on ARM. Users with that setup don't have optimal performance anyway.
CLAP and other formats do not have any folder structure. In this case, I would install both formats without asking the user. I think the best solution would be a triple universal binary that contains arm64x (arm64, arm64ec) and the x64 plugin. Not sure if that is possible somehow. It would make things simpler, also for the end user.
Unfortunately, I don't think a universal binary with x64 and any form of arm is going to happen.
I'm still hoping to get arm64ec to work. Providing multiple extra binaries for just one platform is a testing nightmare and not worth it for a few % of users.
The 20% performance penalty you mention seems rather high to me. Checking the Bitwig site, it says "We have internally benchmarked Arm64 to be up to 20% faster than Arm64EC on Snapdragon X for certain algorithms."
Without knowing what "certain algorithms" are, the 20% is a useless number.
The differences between arm64 and arm64ec are in calling conventions and the binary format of some structs.
For heavily inlined DSP code, I don't see how this can make a 20% difference - I for one see no measurable performance difference between arm64 and arm64ec for my builds using JUCE - once I had worked around the pitfall of that terrible SSE emulation library MS came up with (softintrin.lib). As it comes as a static library, it is only applied during linking and kills speed by preventing proper inlining.
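One thing that makes it easy to end up in softintrin.lib by accident is that MSVC defines `_M_X64` for arm64ec targets as well, so code that only checks for x64 picks the SSE path. The sketch below shows one possible way to order the checks, testing `_M_ARM64EC` first. The function name `simdBackend` and the returned labels are invented for illustration, and this is not necessarily the workaround referred to above.

```cpp
// Pick a SIMD code path at compile time. The order of the checks matters:
// on arm64ec, MSVC defines _M_X64 in addition to _M_ARM64EC, so the
// arm64ec test has to come before the x64 test.
constexpr const char* simdBackend()
{
#if defined(_M_ARM64EC)
    return "neon (arm64ec)";   // avoid the emulated SSE path here
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "neon (arm64)";
#elif defined(_M_X64) || defined(__x86_64__)
    return "sse (x64)";
#else
    return "scalar";
#endif
}
```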
How are you planning to achieve this? Wouldn't they overwrite each other? As you know, x64 binaries and arm64… binaries are both located in "Program Files" by default.
I will add a postfix. Probably a "_arm64x" postfix to the CLAP/VST2 binary filename. Like we did for the x86 / x64 transition.
But I have this project on hold. We didn't receive many arm requests in the last months. I hope JUCE / CMake supports a simple way to create arm64x in the future. I want to avoid the confusion that will happen when we go from arm64ec to arm64x.
Yes, we don't know what these algorithms are. Nevertheless, this is not optimal.