Allocation in FloatVectorHelpers::multiply

I have been running some tests using JUCE 7.0.7.

On macOS Ventura, there appears to be an allocation happening in FloatVectorHelpers::multiply, with this stack trace:

plugin_base_tests  0x0000000104d6a8c8 _ZN8hello10mallocHookEP14_malloc_zone_tm + 64
dyld               0x00000001ab315fe4 _ZN5dyld412RuntimeState16_instantiateTLVsEm + 216
libdyld.dylib      0x00000001ab667ecc tlv_get_addr + 108
libBLAS.dylib      0x00000001ac105118 BLASStateRetain + 32
libvDSP.dylib      0x00000001b6241b60 vDSP_viclipD + 10908
plugin_base_tests  0x000000010567feec _ZN4juce18FloatVectorHelpers12_GLOBAL__N_18multiplyIiEEvPffT_ + 52
plugin_base_tests  0x000000010567feac _ZN4juce25FloatVectorOperationsBaseIfiE8multiplyEPffi + 40
plugin_base_tests  0x0000000104f6eff0 _ZN4juce17SmoothedValueBaseINS_13SmoothedValueIfNS_19ValueSmoothingTypes6LinearEEEE9applyGainEPfi + 248
plugin_base_tests  0x0000000104f64c64 _ZZN20CommonAudioProcessor12processBlockERN4juce11AudioBufferIfEERNS0_10MidiBufferEENK4$_22clES3_ii + 312
plugin_base_tests  0x0000000104f62e2c _ZN20CommonAudioProcessor12processBlockERN4juce11AudioBufferIfEERNS0_10MidiBufferE + 2016

With macOS Sonoma I don’t see the same thing. Any idea what is going on?

This looks suspiciously like a thread-local variable being initialised in one of Apple's system libraries. If that is what's going on, I'm not sure how you could work around it: the initialisation will need to happen at some point on each thread that calls this system function.

As for not seeing the same thing on macOS Sonoma: perhaps the new OS has an updated dyld implementation.
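To illustrate the lazy, per-thread initialisation described above, here is a minimal sketch using plain C++ `thread_local`. It is an analogy under stated assumptions, not Apple's actual vDSP/BLAS code: `libraryRoutine` and `PerThreadState` are hypothetical stand-ins. A block-scope `thread_local` object is constructed the first time control reaches its declaration on each thread, so the first call on every new thread pays the construction cost (which, in a real library, could include an allocation), while later calls on that thread do not.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many times the per-thread state has been constructed.
static std::atomic<int> stateInitCount { 0 };

// Hypothetical stand-in for per-thread state kept inside a system library.
struct PerThreadState
{
    // In a real library, this constructor is where an allocation could happen.
    PerThreadState() { ++stateInitCount; }
    int callsOnThisThread = 0;
};

// Hypothetical stand-in for a library routine such as a vDSP function.
void libraryRoutine()
{
    // Initialised the first time control passes through this declaration
    // on each thread, not at program start-up.
    thread_local PerThreadState state;
    ++state.callsOnThisThread;
}

// Calls libraryRoutine() repeatedly on several fresh threads and returns how
// many times the per-thread state was constructed.
int countPerThreadInits (int numThreads, int callsPerThread)
{
    assert (numThreads > 0 && callsPerThread > 0);
    stateInitCount = 0;

    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back ([callsPerThread]
        {
            for (int i = 0; i < callsPerThread; ++i)
                libraryRoutine();
        });

    for (auto& th : threads)
        th.join();

    return stateInitCount.load();
}
```

Calling `countPerThreadInits (4, 100)` returns 4: one initialisation per thread, regardless of how many calls each thread makes.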

OK, thanks for the quick investigation.

Maybe JUCE could call FloatVectorHelpers::multiply once when the audio thread starts up, before doing real-time work such as calling processBlock. What do you think?
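A minimal sketch of the warm-up idea, assuming you control code that runs on the audio thread before real-time processing begins (as the reply points out, a plugin usually does not). `warmUpThisThreadOnce` and `countWarmUpRuns` are hypothetical helpers, not JUCE APIs. In practice the warm-up callable might make a single tiny call such as `juce::FloatVectorOperations::multiply (&x, 1.0f, 1)`; here it is a plain lambda so the sketch stays self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: runs warmUp at most once per thread (per callable type,
// since each template instantiation has its own thread_local flag).
// Returns true if the warm-up actually ran on this call.
template <typename Fn>
bool warmUpThisThreadOnce (Fn&& warmUp)
{
    thread_local bool done = false;
    if (done)
        return false;

    warmUp(); // e.g. a first vDSP call that triggers lazy per-thread setup
    done = true;
    return true;
}

// Usage demo: counts how often the warm-up body really runs when the helper
// is called twice on each of numThreads fresh threads.
int countWarmUpRuns (int numThreads)
{
    assert (numThreads > 0);
    std::atomic<int> runs { 0 };
    auto warmUp = [&runs] { ++runs; };

    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back ([&warmUp]
        {
            warmUpThisThreadOnce (warmUp); // runs the warm-up on this thread
            warmUpThisThreadOnce (warmUp); // no-op: this thread is already warm
        });

    for (auto& th : threads)
        th.join();

    return runs.load();
}
```

The thread-local flag keeps the check cheap after the first call, but the first call still has to happen somewhere on the real-time thread, which is exactly the opportunity a plugin may not have.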

I don't think that's feasible: in a plugin, JUCE doesn't create the real-time thread, so there's no "safe" opportunity to call this function. I'm also not sure it makes sense for us to add this in non-plugin contexts, as it would introduce overhead for users who might never call any vDSP functions. It would also set a bad precedent: there's no way we can preemptively call every system function that might happen to use thread-local variables.
