San Francisco, CA (September 23, 2019)—Podcast producers and media editors have a new option when editing audio and video content with Descript Podcast Studio, a multitrack podcast production studio that integrates with Overdub, an AI-generated text-to-speech feature that models the speaker’s voice.
Descript’s platform is intended to simplify the podcast production process by transcribing recorded voices (Descript also offers standalone transcription services) and then allowing the media to be edited from the text. Like Google Docs, it works as a cloud-based, synchronized, multi-user collaboration tool. Any collaborator can edit the text to remove a word, a phrase or a sentence, and Descript cuts the corresponding audio from the timeline and applies automatic crossfades.
Podcast Studio also includes a non-destructive editor with features familiar to anyone who has used a DAW, such as multitrack editing, crossfades and volume automation. The finished timeline may be exported to various platforms, including Pro Tools, Reaper and Adobe Audition. The platform can also edit video through text editing with export to Final Cut Pro and Adobe Premiere. The Descript solution was developed with input from beta testers including the NPR teams working on podcasts such as Rough Translation and Planet Money.
Earlier this year, Descript acquired Lyrebird, a Montreal-based startup founded two years ago by four PhD students from MILA (Montreal Institute for Learning Algorithms) with the mission of creating realistic voices using AI. Using just a few minutes of an uploaded vocal sample, Lyrebird can recognize what the company calls the DNA of a voice. With Overdub, which is still in closed beta, a Descript user can type new text into the transcription of a voice recording, and the algorithm will synthesize that text in the user’s voice and insert it into the audio timeline. An example offered by the company is correcting a number, “twenty,” in the recording to “thirty-five.”
Descript stresses that its Overdub technology may only be used for the user’s own voice. “We built this feature to save you the tedium of re-recording/splicing time every time you make an editorial change, not as a way [to] make deep fakes,” writes Descript founder Andrew Mason, former founder and CEO of Groupon, in an announcement on Medium. According to an ethics statement on the developer’s website, “Descript uses a process for training speech models that depends on real-time verbal feedback, ensuring that individuals can only create a text-to-speech model of their own voice.”
Descript is free for up to three hours of voice content. Beyond that, it costs $10 per month.
Descript • www.descript.com