HOLLYWOOD, CA—The 2015 annual Society of Motion Picture and Television Engineers (SMPTE) Technical Conference and Exhibition attracted its highest attendance in more than a decade, drawing 2,328 registered attendees from more than 35 countries. Co-chaired by SMPTE Fellows Paul Chapman and Jim DeFilippis, SMPTE 2015 offered a program that focused predominantly on topics related to Ultra HD video, Virtual Reality (VR) and Augmented Reality (AR), and also turned the spotlight on immersive sound and personalized audio.
According to Chapman, “In so many ways, this was a banner year for the SMPTE annual Technical Conference and Exhibition. For starters, so many high-quality papers were submitted that we literally had to make room for additional presentations. We also had to add space to accommodate more than 90 exhibitors. That, coupled with the record number of attendees, indicates that the industry is on a real upswing.”
Addressing the T in SMPTE, Peter Poers, Junger Audio’s managing director of marketing and sales, advocated for immersive audio to accompany the adoption of 4K, high dynamic range (HDR), wide color gamut (WCG) and high frame rate (HFR) video. The timeframe for adoption will be set by the codec developers and uptake into consumer products, he said.
The best way to give these NGA, or next-generation audio, formats a fighting chance for adoption is to keep cost and effort to a minimum, he suggested: “The use of existing digital production infrastructure is essential to start content creation for new formats soon and within the next two years.”
But there will need to be new tools and workflows. A critical component in production and post will be the Multichannel Monitoring and Authoring unit (MMA), bringing together audio interfacing and computing, along with metadata authoring, said Poers. The MMA will enable the delivery of personalized audio—consumer selection of alternate languages or commentary tracks, or adjustment of levels of certain audio elements—and will incorporate stereo-to-surround upmixing and provide loudness control.
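The personalization Poers describes can be sketched in miniature: each audio object carries metadata that the consumer device consults when mixing. The object fields, labels and renderer below are illustrative assumptions, not any broadcast standard’s actual schema.

```python
# Hypothetical sketch of metadata-driven personalized audio: each
# "object" carries metadata (label, kind, language, default gain), and
# the playback device mixes only the objects matching the viewer's
# preferences, applying any per-object gain adjustments.

def render_personalized(objects, language="en", gain_overrides=None):
    """Sum the samples of the selected objects, applying per-object gains."""
    gain_overrides = gain_overrides or {}
    n = len(objects[0]["samples"])
    mix = [0.0] * n
    for obj in objects:
        # Skip dialogue/commentary tracks in languages the viewer did not select.
        if obj["kind"] == "dialogue" and obj["language"] != language:
            continue
        gain = gain_overrides.get(obj["label"], obj["default_gain"])
        for i, sample in enumerate(obj["samples"]):
            mix[i] += gain * sample
    return mix

objects = [
    {"label": "crowd", "kind": "effects", "language": None,
     "default_gain": 1.0, "samples": [0.2, 0.2, 0.2]},
    {"label": "commentary_en", "kind": "dialogue", "language": "en",
     "default_gain": 1.0, "samples": [0.5, 0.0, 0.5]},
    {"label": "commentary_de", "kind": "dialogue", "language": "de",
     "default_gain": 1.0, "samples": [0.4, 0.1, 0.4]},
]

# Viewer picks English commentary and turns the crowd down by half.
print(render_personalized(objects, "en", {"crowd": 0.5}))
```

The point of the sketch is that the content itself is unchanged; only the metadata and the viewer’s choices determine the final mix, which is what makes authoring that metadata (the MMA’s job) the critical production step.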
The next generation of codecs will likely combine audio channels, audio objects and, in the case of Fraunhofer’s MPEG-H, Higher Order Ambisonics (HOA), according to Jan Nordmann, senior director of business development at Fraunhofer USA. “We need to binaurally render that immersive audio experience over headphones,” he said.
“If you want to do immersive audio over headphones for VR, you need head tracking, you need to be able to render audio elements from every direction and distance, and if you want to target mass consumer platforms, you need to have resource-optimized implementation,” he added.
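To illustrate the head-tracking requirement (a minimal sketch, not Fraunhofer’s renderer): with a scene-based first-order Ambisonic (B-format) signal, compensating for listener head yaw before binaural rendering reduces to a 2-D rotation of the X/Y components, which is part of why scene-based formats suit head-tracked VR playback.

```python
import math

# Illustrative sketch: rotate a first-order Ambisonic frame (W, X, Y, Z)
# by -yaw so the sound scene stays fixed in the room as the listener's
# head turns. Sign conventions vary between Ambisonic toolchains; this
# assumes X = front, Y = left, Z = up.

def rotate_yaw(w, x, y, z, yaw_rad):
    """Counter-rotate the horizontal components by the head's yaw angle."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    # W (omnidirectional) and Z (height) are unaffected by yaw.
    return w, c * x + s * y, -s * x + c * y, z

# A source hard left (Y = 1): after the listener turns 90 degrees left,
# the rotated scene places it dead ahead (X = 1).
print(rotate_yaw(1.0, 0.0, 1.0, 0.0, math.pi / 2))
```

Per-frame rotations like this are cheap, which speaks to Nordmann’s point about resource-optimized implementations for mass consumer platforms.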
VR cameras are coming onto the market and there are a few microphones that have been used in a broadcast context, including a 3D mic tree used by Fraunhofer in its experiments. “These are usually expensive or difficult to use. Making 3D audio capture more convenient is a topic of ongoing research,” he commented.
“If you do immersive audio right in VR, it really helps you to tell stories in a new, different way,” said Nordmann. But, he cautioned, “Immersive audio is quite new. People will experiment; they will often fail.”
Fraunhofer is introducing new tools for immersive audio, including a VST reverb plug-in that enables dynamic placement of channel beds or sound sources in virtual space. Known for its development work on mp3 and AAC, the company is proffering MPEG-H, he added. “It provides immersive audio, meaning 3D height information, and it’s flexible, in terms of listening environments or consumer devices.”
In summary, said Nordmann, “I’m glad SMPTE has a VR symposium for the very first time and we have a talk on audio. Experiment and learn; use the tools that are out there to create great content.”
Nils Peters, senior staff research engineer at Qualcomm, discussed why scene-based audio, in particular HOA, is a practical and elegant solution for creating and transmitting immersive content for next-generation audio services. In MPEG-H, scene-based audio may be coded and transmitted with high efficiency and rendered specifically to a consumer’s personal reproduction environment.
Humans tend to focus on a small number of important sounds at a time, with less important sounds fading into a diffuse and reverberant background. This concept allows efficient compression of immersive audio for broadcast, as the HOA order of the less critical sounds may be reduced without significant perceptual impact, he said.
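The savings Peters describes are easy to quantify: an HOA scene of order N carries (N + 1)² coefficient channels, so truncating the less critical, diffuse sounds to a lower order sharply cuts the data to be coded. A back-of-envelope sketch (simple arithmetic, not MPEG-H bitrate figures):

```python
# Back-of-envelope sketch of HOA order truncation: the channel count
# grows quadratically with order, so rendering background sounds at a
# reduced order saves most of their coefficient channels.

def hoa_channels(order):
    """Number of coefficient channels for a full-sphere HOA scene of a given order."""
    return (order + 1) ** 2

for order in (1, 2, 3, 4, 6):
    print(f"order {order}: {hoa_channels(order)} channels")

# E.g. carrying background sounds at order 1 (4 channels) instead of
# order 4 (25 channels) drops 21 channels' worth of coefficients.
print(hoa_channels(4) - hoa_channels(1))
```

Because the foreground sounds keep their full order, the perceptually important directional cues survive the reduction, which is the basis of the claim that the truncation has little perceptual impact.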
The impact of object-based audio (OBA), with its capabilities to enhance the listening experience and for personalization, will be as revolutionary as sound was to motion pictures in the 1930s, according to Steven Silva, VP of technology and strategy at 21st Century Fox. Silva offered various mix strategies and workflows for OBA in television broadcast, including deployment at live events of a “sidecar mixer” to insert metadata for those audio sources designated as objects.
Immersive sound will be combined with Ultra HD television in the next generation of encoders, said Silva. “OBA and immersive sound will create innovative commercial ventures for programmers, broadcasters and multichannel video programming distributors.”