HOLLYWOOD, CA—The Society of Motion Picture and Television Engineers (SMPTE), currently celebrating its centennial year, hosted its annual three-day technical conference and exhibition in Hollywood at the end of October. While much of the focus was on visual technology—specifically, ultra-high-definition (UHD) and 4K/8K resolution, higher frame rates (HFR), high-dynamic-range (HDR) and wider color gamut (WCG)—various sessions investigated issues associated with delivering audio for Next Gen TV, streaming VR audio content over the internet and delivering a satisfactorily immersive object-based audio experience in the movie theater.
The conference’s audio program chair, Chris Witham, from Walt Disney Studios, noted that processing and delivering immersive audio in the home and in theaters is still a daunting challenge. Fraunhofer’s Robert Bleidt got the panel started with his presentation, “Building the World’s Most Complex TV Network,” describing the testing of MPEG-H Audio, developed by his company in collaboration with Technicolor and Qualcomm. MPEG-H is now part of the ATSC 3.0 A/342 standard and is planned for broadcasts in Korea beginning in 2017, with TV sets going on sale in the country in the first quarter of the year.
MPEG-H Audio’s three principal features—interactivity, immersive sound and universal delivery—have been demonstrated on a testbed that replicated a live sports mixing workflow from remote truck to the home, at NAB 2015 and at an ATSC event later that year. In the tests, MPEG-H Audio supported 13 different formats. Why so many? “That’s the future,” said Bleidt. “We had to figure out ways to make that work.”
Fraunhofer had to develop a variety of solutions, among them new codecs; capabilities to carry metadata through the HD-SDI plant and enable it to survive editing; monitoring, loudness metering and mixing solutions; and a 3D soundbar for consumers. The company turned to Junger Audio for a custom accessory unit that could be added to a Calrec mixing console and manage the necessary monitoring, authoring and panning capabilities beyond 5.1.
Worldwide, MPEG-H is currently a planned Next Generation Audio (NGA) standard for Korea—which will use it for live transmission of its 2018 Winter Olympics—and it will be in the next edition of Europe’s DVB standard. Countries that have adopted Japan’s ISDB standard may adopt MPEG-H for future NGA schemes to add the interactive functionality lacking in the current 22.2-channel AAC format, suggested Bleidt.
Google’s Dr. Jan Skoglund shared his company’s progress with an audio compression scheme for streaming VR content. After investigating AAC, Vorbis and other schemes, Google decided to extend Opus, an open source codec initially developed for speech and music. Mobile devices typically support stereo and sometimes 5.1, and Google is therefore working to get the codec onto every platform on which the company operates, Skoglund said.
The first-order ambisonics microphones currently being used for field VR audio capture require transport of just four channels. As Skoglund noted, only a handful of such mics are currently available, the cheapest costing about $1,000.
But, Skoglund also reported, Google is aiming to eventually handle third-order, which requires 16 channels, and has funded research in New Zealand on a mic with 160 capsules—approximately fifth-order—using off-the-shelf components, a reference design that might retail for around $2,000. “Higher-order mics are coming,” Skoglund predicted.
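The channel counts Skoglund cited follow the standard relationship for full-sphere ambisonics, in which an order-N signal carries (N + 1)² channels. The short Python sketch below illustrates that arithmetic only; the function name is hypothetical and is not drawn from Google's or the Opus project's code.

```python
def ambisonic_channels(order: int) -> int:
    """Channel count for full-sphere ambisonics of a given order.

    Uses the standard relationship (order + 1) ** 2.
    The function name is illustrative, not part of any real library.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


if __name__ == "__main__":
    # First-order and third-order are the two cases mentioned in the talk.
    for order in (1, 2, 3):
        print(f"order {order}: {ambisonic_channels(order)} channels")
```

First-order yields the four channels carried by today's field microphones, and third-order yields the 16 channels Google is targeting.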
In its first round of trials, Google selected the Opus codec at 192 kb/second over AAC for further development, he reported, even though AAC offered slightly better directivity. Google has now made various online tools available for content creators to upload VR projects to platforms such as YouTube. In the coming months, Google will add support for non-diegetic stereo tracks (those with no visible source) that will not be head-rotated, enabling voiceovers and music soundtracks to be included.
In the final paper of the session, “Loudspeaker Requirements in Object-based Cinema,” Paul Peace from Harman’s JBL Professional took a deep dive into the significant differences between the mix room and the movie theater. As Peace has discovered, due to the differences in scale between the two environments, and because theater owners have simply upgraded their existing 5.1/7.1 systems to handle the new generation of immersive formats, there are currently major challenges to delivering a satisfactory object-based audio experience.
Peace has analyzed the differences in directivity, level, frequency response and timing between speakers in typical examples of these environments. The screen channels, and especially the center channel, perform best, he discovered, “because they haven’t changed over the years.” But theater owners are using the same arrays of surround speakers they installed to support 5.1/7.1 playback for the new immersive soundtracks and expect them to deliver the same quality by letting the rendering engine do the best that it can, he said. Yet the two formats are very different.
Any speaker in a movie theater must compensate for the inverse square law, but the scale of a typical cinema makes uniform coverage from every channel impossible. For instance, there can be a 24 dB swing in distribution from an overhead speaker across the audience, Peace reported. To compensate, mixers may gang several overhead speaker channels together to achieve more even coverage. “But that defeats the purpose of immersion,” he pointed out.
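To give a sense of scale for a swing of that size, the Python sketch below applies the textbook inverse-square relationship for an idealized point source in free field, where level falls by 20·log10 of the distance ratio. This is a simplification and not Peace's analysis: it ignores loudspeaker directivity and room reflections, both of which affect real coverage, and the function names are hypothetical.

```python
import math


def level_difference_db(near_m: float, far_m: float) -> float:
    """Level drop in dB between a near and a far listener.

    Assumes an ideal point source in free field (inverse-square behavior);
    real cinema loudspeakers also lose level off-axis.
    """
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far_m / near_m)


def distance_ratio_for_swing(swing_db: float) -> float:
    """Far-to-near distance ratio that alone would produce the given swing."""
    return 10.0 ** (swing_db / 20.0)


if __name__ == "__main__":
    # Doubling the distance costs about 6 dB under this model.
    print(f"{level_difference_db(1.0, 2.0):.2f} dB per doubling of distance")
    # Distance ratio implied if a 24 dB swing came from distance alone.
    print(f"ratio for 24 dB: {distance_ratio_for_swing(24.0):.1f}")
```

Under this idealized model, distance alone would need to differ by a factor of roughly 16 to account for a 24 dB swing, which suggests that in a real room off-axis directivity losses contribute as well.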
The way forward, Peace believes, is improved loudspeaker design—hence his research. “We can get away with fewer speakers once they do a better job,” he said. “But we can’t test until we have these better speakers.”