A new technology developed at Princeton University lets users edit human voices in audio recordings much as they would edit written words in word-processing software.
The software, called VoCo, makes it easy to replace or add a word in an audio recording of a human voice by editing the recording’s transcript. It automatically synthesizes new words in the speaker's voice, even when those words do not appear anywhere in the recording.
VoCo's user interface resembles that of other familiar audio editing programs. It offers a waveform visualization of the audio track, along with editing tools for cutting, copying and pasting. Unlike other software, however, VoCo augments the waveform with the track’s text transcript and lets the user insert or replace words simply by typing in the transcript. VoCo then updates the audio track, automatically synthesizing the new word by stitching together snippets of audio from other parts of the narration.
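The transcript-driven workflow described above can be illustrated with a minimal sketch. It is not VoCo's actual implementation: it assumes the recording has already been aligned to its transcript as a list of `(word, start_sec, end_sec)` tuples, and the function name `transcript_edits` and the output fields are hypothetical. The sketch only finds which time spans of the audio a transcript edit touches and which new words would need to be synthesized there; the synthesis itself is not shown.

```python
from difflib import SequenceMatcher


def transcript_edits(aligned_words, new_text):
    """Map a transcript edit onto time spans of the audio track.

    aligned_words: list of (word, start_sec, end_sec) tuples (assumed input;
                   a real system would obtain these from forced alignment).
    new_text:      the transcript after the user has typed their changes.
    Returns a list of dicts describing each changed span.
    """
    old_words = [word for word, _, _ in aligned_words]
    new_words = new_text.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # unchanged words keep their original audio
        if i1 < i2:
            # Replacement or deletion: covers the audio of old words i1..i2-1.
            start = aligned_words[i1][1]
            end = aligned_words[i2 - 1][2]
        else:
            # Pure insertion: a zero-length span at the boundary between words.
            if i1 < len(aligned_words):
                start = end = aligned_words[i1][1]
            elif aligned_words:
                start = end = aligned_words[-1][2]
            else:
                start = end = 0.0
        edits.append(
            {"op": op, "start": start, "end": end, "new_words": new_words[j1:j2]}
        )
    return edits


if __name__ == "__main__":
    narration = [("i", 0.0, 0.2), ("like", 0.2, 0.5), ("cats", 0.5, 1.0)]
    for edit in transcript_edits(narration, "i love cats"):
        print(edit)
```

In this sketch, each returned entry marks a span of the track to overwrite and the words to place there. A system like the one described would then have to generate audio for those words in the speaker's voice, for example from snippets found elsewhere in the narration.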
The system uses an advanced algorithm to learn and reproduce the sound of a specific voice. In the future, this could make it easier to edit narration for podcasts and videos. The technology could also serve as a starting point for producing natural-sounding, personalized robotic voices.