[solved] Why does my Abs function crash but not std::abs?

I hope it’s OK to ask generic C++ stuff here…

I’m playing around with various optimizations and approximizations, just having fun and trying to squeeze the odd microsecond out here and there. So I wrote my own approach to a function that turns a value into its absolute (positive) value, basically to measure it against std::abs(). This is my implementation:

namespace Math {
    template <typename TYPE>
    inline const TYPE Abs (const TYPE& Value)
    {
        return Value + (Value < static_cast<TYPE>(0)) * static_cast<TYPE>(-1);
    }
}

I have measured and timed it, my own abs() is slower in a debug build, but usually more than 2x faster in a release build, so I really want to keep it in my framework.

02

In this simple little wavefolder snippet, I tried to use my own Math::Abs instead of std::abs, obviously to get the performance benefit:

const double sign = Math::Sign(Sample);
double value = Math::Abs(Sample);  // <--------------------- this call works fine

while ((value > 1.0) or (value < 0.0))
{
    const double positive = Math::Abs(value);  // <--------- this call halts the host
    
    value = (value > 1.0) * (1.0 - (value - 1.0)) + (value < 0.0) * positive;
}

So when I use my Math::Abs to get the absolute value of the current sample at the beginning, it works just fine and does what it’s supposed to. But as soon as I use it inside the while loop, the host just halts and stops responding when the project loads.

If I replace the Math::Abs call inside the while loop with a call to std::abs, everything works fine again, host loads project, sound is there. Swap std::abs for Math::Abs again, host stops responding on project load and I have to kill it.

I tried playing around with everything about the Math::Abs implementation, making it non inline, non const, making the argument non const, non reference, even using a dedicated variable before the return. I even put the result of the Math::Abs call into a variable inside the while loop, to make sure there’s no trouble with assigning and referencing the same variable in one line.

But nothing helps, it just keeps halting the host.
Anyone got a clue why that would be the case?

Oh wow, I had a really, really, REALLY dumb bug in that abs code… :skull:

From the initial tests, I still had that last factor as -1 in there… so that basically always kept the loop going… i.e. host hangs.

The correct Abs function looks like this:

template <typename TYPE>
inline const TYPE Abs (const TYPE& Value)
{
    return Value + (Value < static_cast<TYPE>(0)) * (static_cast<TYPE>(-2) * Value);
}

Even with that additional multiplication at the back, it’s still more than 2x faster than the regular std::abs, so good enough for me.

Thank you to those who took the time to read, and sorry for wasting your collective time. :slight_smile:

Interesting … did you try benchmarking against a simple:

template <class T>
constexpr T abs(T x) noexcept
{
  return (x > static_cast<T>(0) ? x : (-x); 
}

I see no apparent reason why this should be slower than yours (1 comparison + 1 multiplication / sign reversal), plus it doesn’t suffer from potential overflow / sign issues.
Also keep in mind std::abs() correctly handles NaN and Inf value which might explain the additional cost.

1 Like

I did that right now, after reading your reply.

Here are the functions I’m benchmarking:

For the sake of completion, so you see I didn’t tamper with it :wink: (except for closing that missing bracket):
37

And here are the results:

Hm, looking at those for loops, I’m wondering why Xcode didn’t warn me about the >-1 comparison against an unsigned int… it usually would. I guess the number of loop runs is then +1. But I doubt that would change the outcome. :wink:

Micro-benchmarking is a tricky business.
The way you’re doing it with a fixed input number won’t give accurate results. You really should use an array with varying values, as otherwise the compiler will most probably optimize away things.
And possibly also use (pseudo-) random numbers, if you want to factor out the CPU branch prediction.

Inserting the code below in http://quick-bench.com/W93ApP8u5bMSC0SmnQqBwqqvs8s

#include <cmath>

using T = float;

static T make() { return static_cast<T>(rand() - 0.5 * RAND_MAX); }

static inline T absf(T x) noexcept
{
  return (x >= 0) ? x : (-x);
}

template <typename TYPE>
inline const TYPE Abs (const TYPE& Value)
{
    return Value + (Value < static_cast<TYPE>(0)) * (static_cast<TYPE>(-2) * Value);
}

static void BenchAbs3(benchmark::State& state) {
  // Code before the loop is not measured
  T x = make();
  for (auto _ : state) {
    // Make sure the variable is not optimized away by compiler
    benchmark::DoNotOptimize(absf(x));
  }
}
BENCHMARK(BenchAbs3);

static void BenchAbs(benchmark::State& state) {
  T x = make();
  for (auto _ : state) {
    benchmark::DoNotOptimize(Abs(x));
  }
}
BENCHMARK(BenchAbs);

static void BenchfAbs(benchmark::State& state) {
  T x = make();
  for (auto _ : state) {
    benchmark::DoNotOptimize(std::fabs(x));
  }
}
BENCHMARK(BenchfAbs);

you get this

The first two produce the same exact assembly too :wink:

3 Likes

What’s with the DoNotOptimize ?

A compiler would know, that the result is not used, and hence not execute the whole call.
See here: https://github.com/google/benchmark#preventing-optimisation