HOLLYWOOD, CA—For decades, whatever the loudspeaker configuration, multichannel surround audio production was strictly the domain of the major film studios. The arrival of optical disc-based delivery formats democratized the process through a proliferation of new tools, followed by adoption in television broadcast. In contrast, the newer channel- and object-based immersive production and delivery formats that add a height layer, such as Auro-3D, Dolby Atmos, DTS MDA and MPEG-H Audio (the last co-developed and championed by Fraunhofer), have arrived on the market with software and hardware readily available.
But exactly what tools and workflow modifications does it take to produce these new formats? The key to working in many of them—not surprisingly, given its ubiquity—is Avid’s Pro Tools|HDX DAW.
“In order to mix a Dolby Atmos project, a mixing facility needs a Pro Tools|HDX system with the Dolby Atmos Panner plug-in, and 128 channels of MADI I/O in order to connect the Pro Tools rig to Dolby’s Rendering and Mastering Unit [RMU],” says Brett Crockett, senior director of sound technology research, Dolby. “The Pro Tools computer also needs the standalone Dolby Atmos Monitor application, which displays and interprets the object automation data from the Pro Tools session.”
RMU firmware comes in two flavors, respectively optimized for cinema and home presentation. “The home Atmos RMU writes the Dolby Atmos Master File based on the mixed audio and automation data it receives via MADI from the Pro Tools system and the Dolby Atmos Monitor,” says Crockett. “The RMU is also capable of emulating the performance of a variety of consumer AV receivers so that the mixer can audition channel-based downmixes in addition to the full consumer Atmos presentation.”
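The channel-based downmixes that the RMU lets a mixer audition follow the same logic as any fold-down: surround and center channels are scaled and summed into fewer outputs. As a minimal sketch (the coefficients here are the common ITU-style values, not Dolby's actual RMU implementation):

```python
# Illustrative 5.1 -> stereo fold-down, the kind of channel-based downmix
# an AV-receiver emulation might audition. Coefficients are the widely
# used ITU-style values (-3 dB on center and surrounds), purely as an
# example; the LFE channel is commonly omitted from a stereo downmix.

def downmix_51_to_stereo(l, r, c, lfe, ls, rs,
                         center_gain=0.7071, surround_gain=0.7071):
    """Fold one 5.1 sample frame down to a stereo pair."""
    lo = l + center_gain * c + surround_gain * ls
    ro = r + center_gain * c + surround_gain * rs
    return lo, ro
```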
The tools are already migrating into smaller facilities, according to Crockett. “We’ve seen several consumer Dolby Atmos mixes, especially re-mixes from channel-based theatrical mixes, prepped in traditional 7.1 rooms, with object tracks edited and automation roughed in, and then final mixes performed on a proper consumer Dolby Atmos dub stage.”
“As MDACreator is an AAX 32/64 plug-in, any Pro Tools 11 system is capable of mixing an MDA project,” according to Brian Slack, manager, Advanced Cinema and Professional Audio Solutions at DTS. “The primary requirement would be hardware outputs from the Pro Tools system. The advantage of mixing in MDA is the ability to create content in any speaker configuration, and to play that content back with optimal results on the same, or any other, speaker configuration. If the mix room has a 7.1, 11.1 or higher speaker count, the Pro Tools system would have to have that many outputs. However, MDACreator is capable of playing back content on as small as a stereo configuration.”
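The idea of authoring once and rendering to whatever speakers are present is the essence of object-based mixing. A toy illustration, assuming pairwise constant-power panning between the two nearest speakers (this is not DTS's actual MDA rendering algorithm, just the general technique):

```python
import math

# Toy object renderer: the same object (an azimuth in degrees) is rendered
# to whatever loudspeaker layout is present, from stereo on up. Pairwise
# constant-power panning between the two nearest speakers; purely a sketch
# of the speaker-agnostic idea, not DTS's MDA algorithm.

def render_object(azimuth, speaker_azimuths):
    """Return one gain per speaker for a point object at `azimuth`."""
    n = len(speaker_azimuths)
    gains = [0.0] * n

    def delta(a, b):
        # signed angular distance, wrapped to [-180, 180)
        return (a - b + 180.0) % 360.0 - 180.0

    # sort speakers by angular distance from the object
    order = sorted(range(n), key=lambda k: abs(delta(azimuth, speaker_azimuths[k])))
    i, j = order[0], order[1] if n > 1 else order[0]
    if n == 1 or delta(azimuth, speaker_azimuths[i]) == 0.0:
        gains[i] = 1.0          # object sits exactly on a speaker
        return gains
    di = abs(delta(azimuth, speaker_azimuths[i]))
    dj = abs(delta(azimuth, speaker_azimuths[j]))
    frac = di / (di + dj)       # 0 -> entirely on the nearest speaker
    gains[i] = math.cos(frac * math.pi / 2)
    gains[j] = math.sin(frac * math.pi / 2)
    return gains

# The same object plays back on stereo or on a five-speaker ring:
stereo = render_object(10.0, [-30.0, 30.0])
ring5 = render_object(10.0, [-110.0, -30.0, 0.0, 30.0, 110.0])
```

Because the panning is constant-power, the object's overall loudness stays consistent across layouts even though the per-speaker gains differ.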
MDA’s scalability means that the tools can easily move upstream. “MDA is not a prescriptive system, so the exact speaker configuration is up to the client. This being said, 7.1 and 11.1, or 7.1+4, are both popular configurations for edit bays and smaller mix stages,” says Slack.
Auro Technologies, too, currently offers its software tools only in AAX 32- and 64-bit for Mac and Windows. The plug-ins, collectively called the Auro-3D Creative Tool Suite, include the Mixing Engine, Auro-Panner, Auro-Matic Pro upmixer, Auro-Codec Encoder and reference Decoder, and Auro-Headphones. VST and AU versions are expected in Q2 2015.
“When we started with our format two years ago, everything was basically limited to eight-channel-wide bus structures,” explains Sven Mevissen, director of content production for Auro Technologies. To overcome that limitation (Pro Tools, for instance, offers eight monitor channels but Auro-3D requires at least 10 for its 9.1 format, or a dozen for its 11.1 configuration), he continues, “We have our own bus and routing structure that happens in the background, takes the audio and metadata, does the rendering, and brings it back into the host application.”
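The workaround Mevissen describes amounts to carrying a wide channel layout across multiple DAW-sized buses and reassembling it behind the scenes. A minimal sketch, assuming an 11.1 (12-channel) Auro-3D layout and an eight-channel bus limit (the channel names and the exact split are illustrative, not Auro's internal routing):

```python
# Sketch of background routing for a DAW whose widest bus is 8 channels:
# a 12-channel Auro-3D 11.1 layout is split across two buses and
# reassembled by the plug-in. Channel naming is illustrative.

AURO_11_1 = ["L", "R", "C", "LFE", "Ls", "Rs",   # 5.1 base layer
             "HL", "HR", "HLs", "HRs",           # height layer
             "HC", "T"]                          # height center, top

BUS_WIDTH = 8

def split_across_buses(channels, width=BUS_WIDTH):
    """Split a wide channel list into DAW-sized buses."""
    return [channels[i:i + width] for i in range(0, len(channels), width)]

buses = split_across_buses(AURO_11_1)
# buses[0] carries the first eight channels, buses[1] the remaining four
```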
The company’s roadmap includes a couple of other plug-ins expected for release later this year, including a beta version of software that will enable Auro’s tools to be driven by a hardware controller rather than from the computer screen. “For example, solo, mute and level changes, you have to do in our tools,” says Mevissen, so the new software “makes for much better integration of our tools into existing workstations.”
Just as digital television broadcast standards worldwide adopted 5.1 surround sound from the cinema, groups developing the next-generation broadcast standards are already eyeing immersive formats. The Advanced Television Systems Committee (ATSC), for example, has begun a technical review of three immersive audio delivery proposals—from Dolby (Dolby Audio, also known as AC-4), DTS (DTS:X) and an alliance of Fraunhofer, Qualcomm and Technicolor (MPEG-H Audio)—for its ATSC 3.0 standard.
As Robert Bleidt, general manager, Audio and Multimedia Division, Fraunhofer USA Digital Media Technologies, has noted during presentations over the past 12 months, his company’s real-time encoding and playback solutions offer an upgrade path from current broadcast workflows and a transition to an immersive channel- or object-based delivery format. Fraunhofer’s tools include a real-time encoder that enables outside broadcast (OB) contribution; a real-time encoder for emission to consumers, for web streaming or over-the-air trials during the ATSC 3.0 evaluations; and a professional decoder to recover the OB uncompressed audio for further editing and mixing, and to monitor the emission encoder’s output.
MPEG-H Audio accommodates formats from stereo to eight-channel surround plus four height channels (described by Fraunhofer as 7.1+4H), as well as objects, which might be alternate languages or sound effects, for example. It also handles HOA, or higher-order ambisonics.
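Higher-order ambisonics represents a sound field as spherical-harmonic components rather than speaker feeds. As a minimal illustration of the idea, first-order (B-format) encoding of a mono source at a given direction; higher orders add more components, and real systems differ in normalization and channel-ordering conventions:

```python
import math

# First-order ambisonic (traditional B-format) encoding of one mono
# sample arriving from a given azimuth/elevation. A sketch of the HOA
# idea only; MPEG-H's actual HOA carriage uses higher orders and its
# own normalization and ordering conventions.

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into W, X, Y, Z ambisonic components."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample * (1.0 / math.sqrt(2.0))          # omnidirectional
    x = sample * math.cos(az) * math.cos(el)     # front-back
    y = sample * math.sin(az) * math.cos(el)     # left-right
    z = sample * math.sin(el)                    # up-down
    return w, x, y, z
```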
For the future broadcast scheme, says Bleidt, “The system that you will find most of the companies developing new audio systems are proposing for immersive sound production is the 7.1 Blu-ray surround sound configuration and then four speakers in the top layer. That’s probably what you will see first trials with. That’s not the only configuration—the MPEG-H system is quite flexible; you could use 5.1+4H, or other configurations.”
According to Bleidt, facilities can make the transition to full immersive production with object-based elements one step at a time. Initially, current surround sound formats and metadata can be transmitted using MPEG-H, with the advantage that its coding efficiency reportedly offers a 50 percent bit rate reduction compared to the current AC-3 codec. In the next step, interactive elements, such as adjustments to announcer or sound effect levels, might be added. From there, height channels could be added, including the use of HOA and, in a final step, a broadcaster might add dynamic objects.
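To put the reported efficiency figure in concrete terms: ATSC broadcasts commonly carry 5.1 AC-3 at 384 kbps, so a 50 percent reduction would halve that rate for the same program. The arithmetic below uses that common figure purely as an illustration, not a measured comparison:

```python
# Applying the reported 50 percent coding-efficiency gain to a common
# broadcast figure (5.1 AC-3 at 384 kbps). Illustrative arithmetic only.

AC3_51_KBPS = 384
REPORTED_REDUCTION = 0.50

mpegh_kbps = AC3_51_KBPS * (1 - REPORTED_REDUCTION)
# half the bit rate for the same 5.1 program
```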
In terms of production workflow, says Bleidt, interactive elements are easily generated with traditional mixing consoles. “We would like to take, as direct outputs from whatever console you have, the things that you want to have as interactive objects. We want to send a mix-minus that doesn’t have those separate, independent sources in it. You feed those into our monitoring unit, so you can hear what it sounds like at home. You feed them into a contribution encoder in order to get them back to the network, if you need to. That’s it.”
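The mix-minus Bleidt describes is simply the full mix with the object stems subtracted out, so that bed plus stems reconstructs the program and the listener can re-balance the stems at home. A minimal sketch, with hypothetical stem names and a plain sample-list representation:

```python
# Sketch of a mix-minus feed: the console's full mix minus the stems kept
# as interactive objects (e.g. an announcer). Stem names and the
# sample-list representation are illustrative.

def mix_minus(full_mix, object_stems):
    """Subtract each object stem, sample by sample, from the full mix."""
    out = list(full_mix)
    for stem in object_stems:
        out = [m - s for m, s in zip(out, stem)]
    return out

bed = mix_minus(full_mix=[1.0, 0.5, 0.25],
                object_stems=[[0.2, 0.1, 0.0],    # announcer (hypothetical)
                              [0.3, 0.0, 0.05]])  # effects (hypothetical)
# bed plus the stems sums back to the original mix; at home, the stems
# can be re-leveled before that final sum
```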
As for an immersive mix, “You need to have microphones that capture height information for the top layer speakers. There are many ways to do that,” he says, from various mic “tree” arrangements to an ambisonic microphone.