Does JUCE support Dolby Atmos?

Hey guys,
I’m trying to wrap my head around Dolby Atmos and its support in JUCE.
I understood the format the way Daniel describes it in this thread:

He wrote:

As I understood Dolby Atmos, you are not rendering to separate channels but rather use three channels to define the position in space, plus the actual signal. The matrix deciding which actual speaker is fed is applied at the theatre. So channels 24-27 (ambisonicX, ambisonicY, ambisonicZ, and ambisonicW for the actual wave) should be used…
These channels are in addition to the “normal” 5.1 surround beds.

But Justin responds:

there’s nothing ambisonic about it.

I see in the documentation that JUCE offers a 7.1.2 channel set (https://docs.juce.com/master/classAudioChannelSet.html#ac37d2d92a7dc1983d06c085109cee237). But that is just for home theaters and not the “real deal”, is it? How can this be achieved? Can I just render 3D positions to the ambisonic channels and be done with it?

Best
Thomas

As I understand it, we got hung up on terminology here…

Wikipedia treats ambisonics as a term for surround formats that assign a spatial 3D position to monophonic sound sources. In that regard Dolby Atmos would fall under the term.

However, I have also heard a different notion of ambisonics as a specific speaker setup, with opposite pairs of speakers driven by inverted signals so that waves can appear at any position driven from both sides, which is obviously different from Dolby Atmos.

Apart from that line, I don’t know Justin’s thought process…

Just quickly jumping in here:

Dolby Atmos is not Ambisonics. Ambisonics is a scene-based spatial audio format, where the signals are encoded in a specific way (spherical harmonics). In order to play it back, it has to be decoded/rendered to headphones (binaurally) or to loudspeaker arrays (in any configuration you want).

Atmos is a proprietary format, which - afaik - combines object-based and channel-based audio. I am not sure if they support scene-based (e.g. Ambisonics).

Also not true. I think you are referring to Ambiophonics, however there are also no inverted signals.


Thank you for your comments. I think I’ll leave it here, since I realise I haven’t read deeply enough into the documentation.

To me the Atmos specs looked like they simply use three linear values per object to define its position in 3D space. How the rendering in the venue is done is a completely separate issue from what JUCE would need to deliver to the (cinema) processor for rendering.

But never mind.

That might be the case… it was a talk I heard a while ago, so I might have confused the terms… thanks for clearing that up!

To understand the differences between Atmos beds, Atmos objects and how the Atmos RMU is used in production contexts, I’d advise taking a little time to watch the two videos linked here. You should come out the other side with a much better understanding of how it all hangs together.


Do you mind telling a bit about scene-based in that context? I would like to understand :slight_smile:

Let’s divide the production and playback process into two steps:
Encoding/Panning: bringing signal into that format
Decoding/Rendering: extracting loudspeaker feeds from that format

So basically there are three kinds of audio formats when it comes to spatial audio:

  • Channel-based
    Each audio channel holds the signal for a specific loudspeaker; you can play it back directly on your loudspeaker setup, as long as it’s the right one. So stereo, 5.1, 7.1.4, etc. are all channel-based formats.
    Encoding: Bringing sources into that format needs a panner, e.g. a stereo panner which distributes the source signal to the loudspeaker audio channels. For something like 7.1.4 you need a more sophisticated panner; however, that’s up to the producer side, e.g. with VBAP, VBIP or some other panning method.
    Decoding: Playback is fairly simple: send channel one to loudspeaker one and so on. However, if your loudspeaker setup doesn’t match the channel-based format, the results won’t be as intended.

  • Object-based
    With the object-based format, you get a bunch of audio channels which hold isolated source signals, and you also get some meta information about where those audio objects should be positioned during playback. This meta information can hold x, y, z position information or direction information like azimuth and elevation.
    Encoding: Bringing sources into that format basically just needs a meta-information writer, which adds positional information to the format. How this is done depends on the format itself. Afaik, MPEG-H has a dedicated audio channel which carries all the meta-data.
    Decoding: Playing back object-based audio needs a renderer, which fuses source signals and positional information for the desired loudspeaker setup. The loudspeaker setup doesn’t need to be a standardized format, as a good renderer might be able to render to custom setups.

  • Scene-based
    The only scene-based format I know of is Ambisonics. Ambisonics has a long history and there’s a lot of research going on at many universities, including the one I work at. It’s called scene-based because a whole scene, consisting of any number of sources, is stored within the audio data, completely independently of the playback system. It is stored using spherical harmonics (let’s vaguely describe it as a Fourier transform not over time but over the surface of a surrounding sphere). A nice property of spherical harmonics is that the scene can be rotated very efficiently, which is an important feature when it comes to VR applications. Ambisonics is used by FB360, YouTube 360-degree videos, and many many other VR platforms. The number of channels needed depends on the spatial resolution you want to achieve. The lowest - first-order Ambisonics - needs 4 channels, higher orders need more: order N -> (N+1)^2 channels.
    So third order would need 16 channels.
    Encoding: Same as channel-based, you use a panner; however, this one calculates the spherical harmonics needed to represent the source at the desired position. The output is an Ambisonic signal.
    Decoding: You also need a renderer (we say decoder), and there are quite a few ways to calculate them. One method is called AllRAD; with it you can create a good decoder for any arbitrary loudspeaker setup. You can also listen to Ambisonics with headphones, using binaural decoding, simulating what you would hear if you were inside that sphere, surrounded by those encoded sources.
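To make the channel-based “Encoding” step above concrete, here is a minimal sketch of a constant-power stereo panner in plain C++ (no JUCE types; `stereoPanGains` is a name I made up for illustration):

```cpp
#include <array>
#include <cmath>

// Constant-power stereo pan law: pan in [-1, 1], -1 = hard left, +1 = hard right.
// Returns {leftGain, rightGain}; the gains satisfy L^2 + R^2 == 1, so perceived
// loudness stays roughly constant as the source moves across the stereo field.
std::array<float, 2> stereoPanGains(float pan)
{
    constexpr float pi = 3.14159265358979f;
    const float angle = (pan + 1.0f) * 0.25f * pi; // maps [-1, 1] to [0, pi/2]
    return { std::cos(angle), std::sin(angle) };
}
```

A 7.1.4 panner works on the same principle, just distributing the source over twelve channels instead of two (e.g. via VBAP, as mentioned above).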
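The (N+1)^2 channel-count rule and the scene-based “Encoding” step can likewise be sketched in a few lines. This assumes the common AmbiX convention (ACN channel order, SN3D normalisation); the function names are mine, not from any library:

```cpp
#include <array>
#include <cmath>

// Number of Ambisonic channels for order N: (N+1)^2.
// First order -> 4 channels, third order -> 16 channels.
int ambisonicChannels(int order) { return (order + 1) * (order + 1); }

// First-order encoding gains for a source at the given azimuth/elevation
// (radians), AmbiX channel order: W, Y, Z, X. Multiplying the mono source
// signal by these four gains yields the first-order Ambisonic signal.
std::array<float, 4> encodeFirstOrder(float azimuth, float elevation)
{
    const float cosEl = std::cos(elevation);
    return {
        1.0f,                       // W: omnidirectional component
        std::sin(azimuth) * cosEl,  // Y: left-right
        std::sin(elevation),        // Z: up-down
        std::cos(azimuth) * cosEl   // X: front-back
    };
}
```

Higher orders add further spherical-harmonic terms in the same fashion, which is where the extra spatial resolution comes from.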

I wrote a bunch of Ambisonic plug-ins with JUCE; here’s a small binaural demo of what the encoding sounds like: https://www.youtube.com/watch?v=FVeqpChrX-o Excuse my German accent :wink:

However, I guess we should get back to the main topic again :slight_smile:


Thank you for explaining, that makes a lot more sense now.
Seems I picked up a much too simplified explanation along the way…

nice work @danielrudrich.
Have you had a crack at proper surround plugins using the recent JUCE versions?
My surround plugins are way back on JUCE 4.1 (?) because too much was broken in the surround implementation after that. I’ve been hacking new channel layouts into it ever since.
Justin

If I understand this correctly, Pro Tools supports Dolby Atmos 7.1.2 (*), which is a channel layout JUCE supports (**). So that’s probably as good as it gets, right?

(*) https://www.avid.com/press-room/2017/04/pro-tools-dolby
(**) https://docs.juce.com/master/classAudioChannelSet.html#ac37d2d92a7dc1983d06c085109cee237

Yes, which Atmos refers to as a 7.1.2 bed. It’s 7.1, plus 2 height channels that are typically used as a stereo strip along the ceiling, front to back. The bed is a standard channel-based layout, just like quad, 5.1, 7.1 etc. always were. Anything else in the ceiling is positioned via the Atmos RMU as a point source and is referred to as an object. Pro Tools also supports a hardware RMU (and a software version of it), which you’d use to position (typically) mono sends with x/y metadata as objects. Which ceiling channel an object actually ends up being played out of depends entirely on the setup in the playback space (the idea being that the mix engineer doesn’t need to know in advance what that is, so theatres can scale the setup nicely and retain accurate placement precision without phasing, irrespective of the number of speakers).

Generally speaking, the score and other static things are mixed into the bed; the objects are used for stuff that’s moving around to immerse the audience in the action (a bullet spray, flyover, etc.). Some people advocate supplementing the score in the bed with some ceiling objects for a bit of extra depth (e.g. extra decorrelated reverb channels), but that’s all down to the mixing engineer.
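The bed-versus-object distinction described above can be sketched as plain data: an object is just a mono signal plus positional metadata, with the speaker assignment deferred to the renderer at playback time. All names here are hypothetical, not any real Atmos or JUCE API:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: positional metadata carried alongside an audio object.
// Coordinates are normalised room positions, e.g. in [-1, 1] on each axis.
struct ObjectMetadata
{
    float x, y, z;
};

// An Atmos-style object: an isolated mono source plus its position. Which
// physical speaker it plays from is decided by the renderer in the playback
// space, not baked into the channels at mix time (unlike a 7.1.2 bed).
struct AudioObject
{
    std::string name;
    std::vector<float> monoSignal;
    ObjectMetadata position;
};
```

A bed, by contrast, would just be a fixed block of 10 channels (for 7.1.2) with no metadata at all.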