The acronyms with the most buzz this year must be VR and AR: virtual reality and augmented reality, respectively. They reached mass-market awareness through those TV commercials featuring awestruck consumers wearing HMDs (head-mounted displays/devices) for VR, and through Pokemon Go for AR. The ability to quickly track, process and utilize the participant’s head movements (and those of other sensor-equipped body parts) allows an interactive experience. Within a fully 360-degree immersive (virtual) environment, the participant can move about as visual and aural perspectives change in real time. And, when visuals correspond seamlessly with 3D audio, the illusion of reality can be convincing and breathtaking.
You might ask, “What is our contribution to the VR experience as audio content creators and engineers?” Ambisonics enables us to provide user-controllable multi-channel audio that is independent of the number of speaker channels (or playback medium). First developed in the 1970s under the auspices of the UK’s National Research Development Corporation, this surround sound reproduction method seeks to capture and recreate a fully spherical aural environment, including information on the X-axis (front/back), Y-axis (left/right) and Z-axis (up/down). This method captures such information “isotropically,” that is, treating sound from all directions equally. This capture of “a complete set of information” can be sculpted into stereo views aimed in a particular direction, a folded-down binaural version of the entire sound field, multi-channel surround sound (like Dolby 5.1), or a four-channel first-order Ambisonic file that can be played back with user interaction.
Such Ambisonic audio, which some describe as a three-dimensional extension of the M/S technique, can be captured in a number of ways. By adding and subtracting the signals captured by the microphone capsules, soundstage width and directional dominance can be adjusted during playback.
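To make the M/S comparison concrete, here is a minimal Python sketch of how adding and subtracting first-order Ambisonic signals can aim a "virtual microphone" after the fact. It assumes the four signals are an omnidirectional component W plus three figure-eight components X (front), Y (left) and Z (up), that W is carried at unity gain (conventions differ between formats), and that each value is a single audio sample. The function names and the pattern parameter are hypothetical, chosen for illustration only.

```python
import math

def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Aim a virtual first-order microphone into the sound field.

    pattern = 1.0 gives omni, 0.5 cardioid, 0.0 figure-eight.
    Assumes W is at unity gain; X = front, Y = left, Z = up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector pointing where the virtual microphone is aimed.
    dx = math.cos(az) * math.cos(el)
    dy = math.sin(az) * math.cos(el)
    dz = math.sin(el)
    # Like M/S: blend the "mid" (omni) with the directional components.
    return pattern * w + (1.0 - pattern) * (x * dx + y * dy + z * dz)

def stereo_pair(w, x, y, z, half_angle_deg=45.0, pattern=0.5):
    """Derive a coincident stereo pair aimed left and right of front."""
    left = virtual_mic(w, x, y, z, +half_angle_deg, 0.0, pattern)
    right = virtual_mic(w, x, y, z, -half_angle_deg, 0.0, pattern)
    return left, right

# A source hard left: strong in the left virtual mic, weak in the right.
left, right = stereo_pair(1.0, 0.0, 1.0, 0.0)
print(left, right)
```

Widening half_angle_deg or lowering pattern toward figure-eight broadens the apparent soundstage, which is the same lever an M/S decoder offers in two dimensions.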
A tetrahedral microphone array is typically utilized for such capture today; Sennheiser’s Ambeo VR microphone is an ideal example. It consists of four directional cardioid capsules arranged as coincidently as possible and aimed outwards, each of them at 45 degrees off-axis from any of the X, Y or Z axes. These four raw A-format signals can be Ambisonically processed into a B-format 3-D sound field where the W channel equates to the mono-sum signal of all directions and the three X/Y/Z signals represent the directional axes. All four B-format signals work together in the VR playback environment. They are monitored in ever-changing proportions, following the consumer’s rotating and tilting head movements and the video/animation via HMDs such as Facebook’s Oculus Rift or Samsung’s Oculus-based Gear VR.
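The A-format to B-format step, and the head-tracked rotation that follows it, can be sketched in a few lines of Python. This is an idealized illustration, not any manufacturer's converter: it assumes the capsules are labeled front-left-up (FLU), front-right-down (FRD), back-left-down (BLD) and back-right-up (BRU), uses an arbitrary 0.5 scale factor, and leaves out the capsule-spacing correction filters that real conversion software applies. The function names are hypothetical.

```python
import math

def a_to_b(flu, frd, bld, bru):
    """Convert one sample of A-format capsule signals to B-format.

    Assumed capsule labels: FLU, FRD, BLD, BRU. The 0.5 gain is an
    arbitrary choice; real converters also apply correction filters.
    """
    w = 0.5 * (flu + frd + bld + bru)   # mono sum of all directions
    x = 0.5 * (flu + frd - bld - bru)   # front minus back
    y = 0.5 * (flu - frd + bld - bru)   # left minus right
    z = 0.5 * (flu - frd - bld + bru)   # up minus down
    return w, x, y, z

def rotate_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate the sound field for a head turned left by head_yaw_rad."""
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    # W (omni) and Z (height) are unaffected by a turn about the vertical axis.
    return w, x * c + y * s, y * c - x * s, z

# Sound arriving only at the two front capsules ends up on W and X.
b_format = a_to_b(1.0, 1.0, 0.0, 0.0)
# Turn the head 90 degrees left: the frontal source moves to the right (-Y).
print(rotate_yaw(*b_format, math.pi / 2))
```

Because rotation is only a re-mix of X and Y, a player can apply it continuously as the head tracker reports new angles; tilt and roll would need the analogous rotations involving Z.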
The intriguing part for prospective VR content creators is the widespread availability of the processing software, which is often non-proprietary, or freeware, or both. Numerous platforms and programs exist for free, or for very affordable rates, as major players like Facebook and Google are eager to provide the freeware, sell the hardware and then showcase your content, as they clearly see personal VR content to be the coming wave of monetization for the future.
VR signal processing is doing loads of heavy lifting, as optimizing the believability of movement within an audio sound stage is no simple task. The key here is to utilize inter-aural time differences (ITDs, or the slight time differences between arrival at our ears of incoming soundwaves); inter-aural level differences (ILDs, or the different levels of similarly timed waves); and spectral color (the frequency/intensity balance of sound arrivals); and to account for the HRTF (head-related transfer function, which describes how sound is filtered, reflected, diffused and diffracted by our heads) as our consumers merrily bounce their way through their virtual environments.
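To get a feel for the size of an ITD, here is a rough Python estimate using the classic spherical-head (Woodworth) approximation. This is a textbook simplification brought in for illustration, not the method of any particular VR tool; the head radius of 8.75 cm and the speed of sound of 343 m/s are assumed values, and the function name is hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at roughly room temperature

def itd_seconds(azimuth_deg):
    """Estimate the inter-aural time difference for a distant source.

    Uses the spherical-head approximation ITD = (r / c) * (theta + sin(theta)),
    valid for azimuths from -90 (hard right) to +90 (hard left) degrees.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between -90 and +90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))

# Delay in microseconds for a few source angles.
for angle in (0.0, 30.0, 60.0, 90.0):
    print(angle, round(itd_seconds(angle) * 1e6))
```

Even for a source directly to one side the difference stays well under a millisecond, which is why timing and capsule matching matter so much in the capture chain.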
For associated software to function properly, rather precise sound capture is essential. Microphone diaphragms have to be free of disturbance from shock, wind or excessive humidity. The capsules have to be very closely matched for frequency response and sensitivity. And the capsules have to be as coincident as possible to minimize phase distortion and maintain solid directionality/imaging.
The traditional leader in this field has been Soundfield, which provides a hardware control unit with its Ambisonic mic systems. This allows level control and monitoring aids, but commands a comparatively high price, with complete systems (including wind protection, mounts, cabling and cartage) at around $6,000. By contrast, Soundfield’s SPS200 mic system, which I reviewed for Pro Audio Review back in 2014, is software-based and clocks in at around $3,500. [Read the full review via prosoundnetwork.com/july2017 - Ed.]
Meanwhile, Audeze makes its Planar VR mics, with large surface-area rectangular diaphragms capturing big sound from all directions. Its Tetrahedral model sells for $3,995 complete with software.
Newer player Core Sound provides the TetraMic, a system that relies on reference-quality capsules, a no-frills package and proprietary free software for decoding into any format, for around $1,000.
Still hovering at Kickstarter level, Embrace Cinema Gear’s Brahma Mic is an internal mic upgrade for Zoom’s popular H2n portable recorder, which retails at a mere $159 and offers a dual-M/S four-channel output, easily convertible to Ambisonics via third party software apps.
And such apps abound, as the battle for industry-wide norms is being fought but not yet won. Facebook has its 360 Spatial Workstation technology to feed its Facebook Spaces VR platform; Google has its Jump platform/suite-of-tools; and Samsung has its Gear VR. And of course multi-channel pioneer Dolby is in this space with the Dolby VR platform aimed at big-budget big-studio creators.
Amongst the legions of smaller creators, often-affordable software enables B-format creation and/or decoding for various end-user formats. Noisemakers, Blue Ripple Sound, VV Audio, Harpex and Two Big Ears’ 3Dception (owned by Facebook, makers of both Cinematic and Game versions) are all notable players in this sphere. Keep in mind that A-format audio can’t simply be panned into a stereo fold-down or directly applied to VR playback; certain adjustments are necessary to optimize raw A-format for consumption, and developments in such software are at the cutting edge of Ambisonic application.
To fully utilize the power of Ambisonics in the mix stage, content creators should have the ability to monitor more than just stereo fold-downs or surround sound approximations. Note that a 5.1 system cannot convey any element of height; only larger surround systems, such as 10.2, which matrix two height signals from LF and RF, allow Z-axis representation.
Content creators should also have an Ambisonic monitoring environment. A more realistic and immersive environment can be recreated (at least in a real-world achievable half-sphere) with numerous subwoofers and an array of speakers suspended around and above the sweet spot. A “dual rings of 8” arrangement (one ring mounted low, one mounted high) with four subwoofers, making a 16:4 system, is a favorite configuration.
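As a rough idea of how a B-format signal might feed such a ring, the Python sketch below decodes the horizontal components to a single, evenly spaced ring of eight speakers using a basic projection decoder. It is a simplified illustration under stated assumptions: W at unity gain, speaker 0 directly in front with the rest counted counterclockwise, and no height ring, subwoofers or psychoacoustic weighting. Production decoders are considerably more refined, and the function name is hypothetical.

```python
import math

def decode_to_ring(w, x, y, num_speakers=8):
    """Decode horizontal B-format (W, X, Y) to an evenly spaced speaker ring.

    Speaker 0 is assumed to sit at the front, the rest counterclockwise.
    Basic projection decode; height (Z) is ignored in this sketch.
    """
    feeds = []
    for i in range(num_speakers):
        phi = 2.0 * math.pi * i / num_speakers  # this speaker's azimuth
        # Omni part plus the directional part projected onto the speaker.
        gain = w + 2.0 * (x * math.cos(phi) + y * math.sin(phi))
        feeds.append(gain / num_speakers)
    return feeds

# A frontal source: the front speaker gets the most signal, the rear
# speaker a small out-of-phase contribution.
for index, feed in enumerate(decode_to_ring(1.0, 1.0, 0.0)):
    print(index, round(feed, 3))
```

The same four recorded signals could be decoded to six, eight or more speakers simply by changing num_speakers, which is the channel-count independence that makes Ambisonics attractive in the first place.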
As seen in the 3D rendering of Aurelia Soundworks, studio partner Chris Timpson designed the facility’s studio as a 19:4 configuration (featuring three rings of six full-range monitors, with a “voice of God” above, and four subwoofers). Such an environment is ideal for mixing for sound installations, music features and experimental apps where hearing the “air” and the open space is quite necessary.
For final delivery via HMDs/headphones, mixing on headphones (with the use of very precise binaural fold-down practices) is the ideal methodology. It is the accuracy of the software’s HRTF processing that defines the potency of the mixdown’s imaging. Load the binaural encoding software with a custom IR (impulse response) that exactly matches your own head’s dimensions and enjoy ideal response. Numerous HRTF standards are available, which reveals the depth of the artistry required for Ambisonic engineers to push their craft forward.
It is in such a mix environment, one might say, that the critics of Ambisonics find the limitations of the medium. Their criticism is that sound localization and imaging are not as exacting and dramatic as one might desire. To be more precise, we are so used to the hard and harsh results of pan-pot panning that we psychoacoustically desire “extreme” localization to achieve our super-real results (e.g., the demands of pinpoint localization in ultra-dense video game environments). And even though software can offer variable dominance controls to aim at specific sonic sources in the sound field, such software cannot remove a sound source entirely; adding ADR to a combination of nats (natural sounds) and dialogue from Ambisonic capture would be very difficult, or more likely impossible.
In my experiences with surround and stereo broadcast audio (and my very limited newbie-engineer experiences with VR audio), I find the Ambisonic method to be the absolute ultimate when it comes to capturing environments and natural sound (especially of the constant and all-around-you type found in stadiums, crowds, forests and the like). Further, it is every bit as “steerable” in post production as one might imagine, yet with the addition of localized, mono sources in post, engineers can achieve the modern ideal of realistic immersion, coupled with exact placement and the potential for precise movement of a mono source within an Ambisonic field.
A perfect example of this can be found in an interactive clip, a trailer for NBC’s action series Blindspot, produced by John Hendicott at his Aurelia Soundworks studio. It uses Sennheiser’s Ambeo VR mic (pole-mounted and just outside the thick of the action) along with carefully hidden spot RF lavaliers for close dialogue pickup, plus music and SFX.
This clip is viewable in a compromised VR fashion; thanks to YouTube, viewers can steer the audio and video via mouse during the action, getting pretty close to the complete VR experience for being only web-based. Note that you may require Google’s Chrome browser or at least Internet Explorer; it won’t work on Safari (for Mac-heads like me), as it’s still the Wild West out there and the cowboys fight it out for supremacy.
Also worthy of note, Facebook’s platform allows panning, with second-order Ambisonics, plus a fixed stereo track that folds down to binaural for stereo headphones via iOS, Android, Facebook News Feed, Google Chrome and the Samsung Gear VR headset. This allows a crucial feature: The entirety of the Ambisonic content is pannable, whilst the music remains static, maximizing the user experience!
If I know cynical, investment-wary and hype-weary audio engineers (and believe me, I do), I know they are not likely eager to invest in bleeding-edge standards, new tools and uncertain markets. Most of us still work in stereo, and the few of us who have ventured into surround work are well aware of multi-channel production infrastructure costs and limited markets. That said, this might be a different world, one that requires mostly software tools to create seemingly endless content to prime and lubricate a myriad of social networks.
Today, much audio excitement lies within 3D films, VR tents, bulky headsets, fun games and a wow factor. Tomorrow, even once the “wow” is gone, VR’s real estate will include virtual tours, virtual visits with the doctor, virtual museum tours, virtual dating and more, plus live feeds from concerts, plays and recitals. Rest assured, where there’s demand for content, there will ultimately be monetization. And to be sure, one day the whole process of creating VR content will be automated, cheap and used by consumers just like Photoshop or Word.
Yet for now, specialized tools, software and considerable expertise are needed to decipher a veritable alphabet-stew of terms, formats and apps for VR content creation. Even so, if you’re still curious about audio capture and processing, looking for a new direction in your work, or are simply a DIY content creator, look into Ambisonics and VR audio. You might just find yourself aiming in a new direction.