What exactly does “X bit depth” mean?

I am not sure I understand the meaning of “bit depth” in audio.

As I see in JUCE projects, in processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages) the audio sample values (when not distorted) are between minValue = -1.0f and maxValue = 1.0f.

I need to know the smallest possible nonzero value of one sample, i.e. the value next to 0.0f.

I suppose “x bit depth” means that one sample can take 2^x different values. Am I right?

So does that mean the smallest possible change in sample value is (maxValue - minValue) / pow(2.0f, x)? So for 16 bit depth is it 2.0f / pow(2.0f, 16.0f), which is 3.0517578125e-05?

I am confused, because I even have a problem with a simple float variable. As far as I know it has 32 bits. So the biggest value should be 2^31 - 1 (with one bit for the sign), which is 2147483647. For int that works, but how is it possible that the max float value is 3.40282e+38, which is much more than 2147483647?

So let’s imagine I have audio with 32 bit depth. Is the smallest possible sample value change then 2/2147483647, or maybe 2/3.40282e+38?

Another issue: should I really divide (maxValue - minValue) / pow(2.0f, x), which would mean bit depth relates to amplitude? Or does it relate to gain, which would mean I should calculate it as 1.0f / pow(2.0f, x)?

Please help me understand this, because my math calculations (a lot of log() and pow()) give me a lot of “NaN” and “inf”. So I need to narrow the possible inputs and outputs to avoid such errors. But first I need to understand how to do that.

Many thanks in advance for any help.
Best Regards

Floating point numbers work similar to our normal numbers, see here:

So there is no fixed minimum step (epsilon); it changes depending on how big the number is.
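
To make that concrete, here is a small sketch that measures the gap between a float and the next representable float above it. The helper name spacingAbove is made up for this post; std::nextafter and std::numeric_limits are standard C++11.

```cpp
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
// (Hypothetical helper, for illustration only.)
inline float spacingAbove (float x)
{
    return std::nextafter (x, std::numeric_limits<float>::infinity()) - x;
}
```

With IEEE 754 single precision, spacingAbove (1.0f) is 2^-23 (about 1.19e-07), spacingAbove (1024.0f) is 2^-13 (about 1.22e-04), and spacingAbove (0.0f) is the smallest denormal, about 1.4e-45. The last figure assumes denormals are not being flushed to zero, which audio code often does on purpose.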

When using float in audio, full scale (0 dBFS) is represented by 1.0. That means you can exceed 0 dBFS, but you need to take care of that if you manually convert the signal back into an integer format.
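
As a rough sketch of what "taking care" could look like when converting by hand, the function below clamps a float sample before scaling it to 16 bit. The function name and the scale factor 32767 are my assumptions (other conventions scale by 32768), and it expects a finite input.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Convert one float sample (nominal range -1.0 .. 1.0) to 16-bit integer PCM.
// Anything beyond full scale is clamped first; without the clamp the cast
// to int16_t would overflow. Assumes the input is finite (no NaN/inf).
inline std::int16_t floatToInt16 (float sample)
{
    const float clamped = std::max (-1.0f, std::min (1.0f, sample));
    return static_cast<std::int16_t> (std::lround (clamped * 32767.0f));
}
```

So a sample of 1.5f (above 0 dBFS) comes out as 32767 instead of wrapping around.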

You don’t get to know the actual source bit depth of the incoming audio from the host anyway, so you can forget about whether it’s 16/24/32 bit etc. (That is, if there’s for example a 16 bit audio file on the host DAW’s audio track, your plugin doesn’t really know about that; the host converts the audio to 32 or 64 bit floating point before it comes into your plugin. The host may also be doing other processing like resampling, time stretching or pitch shifting before feeding the audio into plugins, which would destroy the knowledge about the original bit depth of the source audio.)

Why is it important for you to know this stuff? Are you trying to avoid divisions by zero or something like that?

OK, many thanks. That’s why I want to design my plugin to handle 64 bit depth samples. But the question was not how to recognise what bit depth I receive from the host, but how to calculate the smallest possible value next to zero for a specified bit depth, which in my case is 64 bit.

C++11 has the std::nextafter etc functions, which might be useful, but maybe you should redesign your code so that you don’t have to be dealing with the whole thing anyway? Why do you need to know “the smallest possible value next to zero”?


As I said, I have a lot of math algorithms that use a lot of log() and pow(), which in some scenarios (I can’t foresee all of them) give me “NaN” or “inf”. I need to avoid that, which is why I want to narrow the range of possible inputs.

For example, I think I don’t need to go down to 1.17549e-38 (the smallest normalized float value) if my audio bit depth can’t even represent such small values.

Where am I wrong?

I think, there is some confusion.
Fixed-point arithmetic and floating-point arithmetic have different properties. Originally floating point numbers were very expensive; they even needed an extra processor to be installed. That is one of the reasons why there are audio file formats using integer numbers (not to be confused with the C++ integer type).
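
For the integer (fixed-point) case the step between neighbouring sample values is constant, which is what the "2 divided by 2^x" idea from the question describes. A minimal sketch, assuming signed PCM that is scaled so full scale maps to the range -1.0 .. 1.0 (the function name pcmStepSize is made up):

```cpp
#include <cmath>

// Step between adjacent sample values of signed integer PCM with the given
// bit depth, after scaling to -1.0 .. 1.0: 2 / 2^bitDepth = 1 / 2^(bitDepth - 1).
// (Hypothetical helper, for illustration only.)
inline double pcmStepSize (int bitDepth)
{
    return std::ldexp (1.0, -(bitDepth - 1));
}
```

For 16 bit that is 1/32768 (about 3.05e-05) and for 24 bit it is 1/8388608 (about 1.19e-07). Note this only describes integer formats; it says nothing about the float or double samples a plugin actually receives.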

Nowadays, for processing it is much more common to use floating point numbers, since the artefacts in integer numbers, once you exceed the numeric limits, are horrible.

If you say you want 64 bit processing, that information alone doesn’t mean anything. Most of the time, when you see advertisements for 64 bit processing, they are in fact still using floating point numbers, just with the C++ type “double”.

In JUCE you can support that by adding a processBlock overload to your processor:

void processBlock (AudioBuffer<double>& buffer, MidiBuffer& midi) override;

and you make supportsDoublePrecisionProcessing() return true.
Now, if you are lucky and the host implements that as well, your algorithms will use 64 bit floating point processing.

It is very well defined when the result of functions like log is undefined. That won’t change, no matter how many bits you throw at it :wink:
You have to design your algorithms so that you never feed them such values, e.g. log(0).
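
One common way to do that for level calculations is to put a floor under the result and never let zero or negative numbers reach the logarithm. A minimal sketch; the function name and the -100 dB floor are my assumptions, pick whatever floor makes sense for your plugin:

```cpp
#include <algorithm>
#include <cmath>

// Convert a linear gain to decibels without ever calling log10 on zero
// or on a negative number. minusInfinityDb is an assumed floor that
// stands in for "silence".
inline float gainToDb (float gain, float minusInfinityDb = -100.0f)
{
    if (gain <= 0.0f)
        return minusInfinityDb;

    return std::max (minusInfinityDb, 20.0f * std::log10 (gain));
}
```

As far as I remember, JUCE’s Decibels::gainToDecibels works along the same lines, so you may not need to write this yourself.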

OK, so let’s look at that. I have a method that performs this operation:
return pow(10.0f, -24.0f * (1.0f - someInputValue) / 20.0f);

In some crazy scenarios someInputValue is, for example, 100.0f. Then that method returns “inf”.
I need to avoid that, so I need to set a max value for the input such that my method returns a number not greater than the float max, which is 3.40282e+38. Am I right?

But unfortunately that’s only half of the truth, because later I divide the return value of my method by the audio sample envelope (abs(sample)), which can take values from 0.0f to 1.0f. So I have this situation:
pow(10.0f, -24.0f* (1.0f - someInputValue) / 20.0f ) / abs(sample);

Of course, if abs(sample) == 0.0f I can skip the operation simply with:

if ( abs(sample) != 0.0f )
       pow(10.0f,  -24.0f* (1.0f - someInputValue) / 20.0f )  /  abs(sample);

But let’s say I limited someInputValue such that my method gives a value not greater than 3.40282e+38. And remember, the smallest possible abs(sample) is given by the smallest possible float, which is 1.17549e-38. So I can end up with a situation like this:
3.40282e+38 / 1.17549e-38
And that again gives me “inf”.

So to avoid that, I need to limit someInputValue in such a way that my final operation:
pow(10.0f, -24.0f * (1.0f - someInputValue) / 20.0f) / 1.17549e-38 is not greater than 3.40282e+38.

But I suppose I don’t need to use the float minimum (1.17549e-38); it would be enough to use the smallest possible sample value, which I think is defined by the bit depth. That’s why I need to define the smallest possible value for the maximum possible bit depth: to find a useful limit for someInputValue.

Does that make sense?
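
For reference, one way the operation above could be kept finite is to clamp both the input and the envelope before doing the math. This is only a sketch under assumptions: the function name and all three limits below are hypothetical values picked for illustration, not something derived from a bit depth.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical limits, chosen for illustration only.
constexpr float kMinInput      = -10.0f;
constexpr float kMaxInput      =  10.0f;
constexpr float kEnvelopeFloor = 1.0e-6f;   // -120 dB

// pow (10, -24 * (1 - input) / 20) / abs (sample), with the input clamped
// and the envelope never allowed to fall below a small positive floor.
inline float safeRatio (float someInputValue, float sample)
{
    const float input    = std::max (kMinInput, std::min (kMaxInput, someInputValue));
    const float envelope = std::max (kEnvelopeFloor, std::abs (sample));
    const float gain     = std::pow (10.0f, -24.0f * (1.0f - input) / 20.0f);

    return gain / envelope;
}
```

With these limits the largest possible result is about 10^10.8 / 1e-6, roughly 6.3e+16, which is far below the float maximum, so safeRatio (100.0f, 0.0f) stays finite.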