Thursday, June 1, 2023

A look at startups like WellSaid Labs and VocaliD, which are building custom AI voice actors for digital assistants, video game characters, and corporate videos

In the past few years, artificial intelligence has reshaped multiple industries, and it is now creeping into every facet of our lives. From understanding our shopping preferences to translating documents faster than a human translator, AI is changing the world.

Many of us are so focused on the technology itself that we overlook how it affects people’s work. One area that AI is starting to move into heavily is voice acting.

Today, I’m going to take a look at two startups that are using AI to build voice-over talent for voice assistants, video game characters, and corporate videos.

WellSaid Labs uses AI to develop custom voice-over talent for corporate videos and commercials. Its initial product is a service in which customers provide scripts and the company returns a recording of an AI-generated voice actor reading the script in natural-sounding English. So far, the company has built 1,500 voice actors for its customers.

VocaliD is a Y Combinator-backed startup that builds custom voice-over talent for digital assistants like Apple’s Siri or Amazon’s Alexa. Its goal is to let companies build their own voices for digital assistants, including character voices, celebrity voices, and even voices that sound like you.

The idea behind both of these startups is to use AI to automate a process that is typically performed by real people.

Let’s take a look at how companies do this now and explore what potential AI brings to the table.

How Companies Create Voice-Over Talent Today

Companies like Apple, Google, and Amazon provide the digital assistants used across many of the devices in our lives: Siri on the iPhone, Alexa on the Amazon Echo and Dot, and Google Assistant on Google Home. In addition, software platforms like Unity and Unreal have created easy-to-use tools that let developers build their own digital assistants for games. Meanwhile, non-gaming companies are also stepping into the voice-over space by building custom voices for mobile assistants like those made by Invoke and Nuance.

All of these companies have access to voice actors who can provide their customers with high-quality audio recordings. However, this approach has a couple of limitations.

To provide high-quality recordings for every request, companies need a large number of voice actors on hand. The approach is also expensive: the voice actors have to be paid, and staff must be in place to handle the recording process from start to finish.

These limitations become harder to overcome as people use voice assistants more and more. Countless people now talk with a digital assistant at home, yet producing recordings still requires actors to be physically present, so the supply of new recordings cannot keep pace with that usage.

Demand for high-quality recordings is also increasing, which substantially raises the cost of maintaining such a workforce.

Together, these limitations slow the pace of innovation in the voice-over industry and limit its growth.

The Solution: AI

The solution to these limitations is to automate the process. The aim is to build a system that can generate high-quality recordings automatically, without any human involvement. In other words, the goal is to take this process away from humans and put it under AI control.

The first issue companies face is convincing developers to use such a system. WellSaid and VocaliD are working to solve this problem by creating more visible value for developers and their users in exchange for relying on AI voice-over talent.

Another issue is that, today, AI voice actors don’t sound human. The reason is simple: creating a human-sounding actor requires recording in person, which in turn requires a large number of actors and a lot of time.