AI voice actors sound more human than ever, and they're ready for hire

The company’s blog post brims with the enthusiasm of a ’90s US infomercial. WellSaid Labs proudly announces what its clients can expect from “eight new digital voice actors!” Tobin is “energetic and insightful.” Paige is “poised and expressive.” Ava is “polished, self-assured, and professional.”
Each one is based on a real voice actor, whose likeness (with permission) has been preserved using AI. Companies can now license these voices to say whatever they need. They simply feed some text into the voice engine, and out comes a crisp audio clip of a natural-sounding performance.
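WellSaid hasn’t published the engine behind that workflow, but the type-in-text, get-back-audio loop it describes looks much the same in open-source tooling. Here is a minimal sketch using the Coqui TTS library, chosen purely as a public analogue (the model name and output path are arbitrary; this is not WellSaid’s product):

```python
# Illustrative "type text, get a clip" workflow using the open-source
# Coqui TTS library (pip install TTS). This stands in for the kind of
# engine described above; it is not WellSaid Labs' system.
from TTS.api import TTS

# Load a pretrained single-speaker English text-to-speech model.
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

# Feed in a script; get back a natural-sounding audio clip.
tts.tts_to_file(
    text="Welcome back! Here is what's new this week.",
    file_path="promo_clip.wav",
)
```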
WellSaid Labs, a Seattle-based startup that spun out of research at the nonprofit Allen Institute for Artificial Intelligence, is the latest company to offer clients AI voices. For now, it specializes in voices for corporate e-learning videos. Other startups make voices for digital assistants, call center operators, and even video game characters.
Not long ago, such deepfake voices had a lousy reputation for their use in scam calls and internet trickery. But their improving quality has since piqued the interest of a growing number of companies. Recent breakthroughs in deep learning have made it possible to replicate many of the subtleties of human speech. These voices pause and breathe in all the right places. They can change their style or emotion. You can spot the trick if they speak for too long, but in short audio clips, some have become indistinguishable from humans.
AI voices are also cheap, scalable, and easy to work with. Unlike a recording of a human voice actor, a synthetic voice can have its script updated in real time, opening up new opportunities to personalize advertising.
But the rise of hyperrealistic fake voices is not without consequences. Human voice actors, in particular, have been left wondering what this means for their livelihoods.
How to fake a voice
Synthetic voices have been around for a while. But the old ones, including the voices of the original Siri and Alexa, simply glued words and sounds together to achieve a clunky, robotic effect. Getting them to sound any more natural was a laborious manual task.
Deep learning changed that. Voice developers no longer need to dictate the exact pacing, pronunciation, or intonation of the generated speech. Instead, they can feed a few hours of audio into an algorithm and let the algorithm learn those patterns on its own.
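To make that shift concrete, here is a toy sketch of what “learning the patterns on its own” means in practice: the engine trains on pairs of transcripts and recordings, and gradient descent, rather than a human engineer, discovers the rhythm and intonation. Everything here (the dataset shapes, model, and loss) is a stand-in, not any vendor’s pipeline:

```python
# Toy illustration of learning speech patterns from data rather than
# hand-coding them. Real systems train for days on hours of studio audio.
import torch
import torch.nn as nn

# Pretend dataset: integer-encoded characters paired with mel-spectrogram
# targets extracted from a voice actor's recordings (random stand-ins here).
texts = torch.randint(0, 256, (32, 40))   # 32 clips, 40 characters each
target_mels = torch.randn(32, 40, 80)     # 80 mel bins per output frame

model = nn.Sequential(
    nn.Embedding(256, 128),               # characters -> feature vectors
    nn.Linear(128, 80),                   # feature vectors -> mel frames
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    pred = model(texts)
    # The only supervision is "match the recordings": timing, pitch, and
    # intonation are absorbed into the weights rather than written as rules.
    loss = nn.functional.mse_loss(pred, target_mels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```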
“If I’m Pizza Hut, I certainly can’t sound like Domino’s, and I certainly can’t sound like Papa John’s.”
Rupal Patel, founder and CEO of VocaliD
Over the years, researchers have used this basic idea to build voice engines that are more and more sophisticated. The one built by WellSaid Labs, for example, uses two primary deep-learning models. The first predicts, from a passage of text, the broad strokes of what a speaker will sound like, including accent, pitch, and timbre. The second fills in the details, including breaths and the way the voice resonates in its environment.
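A toy sketch of that two-stage design, assuming the common split between an acoustic model (text to mel spectrogram, where accent, pitch, and timbre live) and a vocoder (spectrogram to waveform, where breaths and room tone live). The class names and sizes are illustrative placeholders, not WellSaid’s actual architecture:

```python
# Minimal two-stage neural TTS skeleton. Production systems use far
# larger models (e.g. Tacotron 2-style acoustic models and neural
# vocoders); this only shows how the two stages connect.
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Stage 1: characters -> mel-spectrogram frames (the broad strokes)."""
    def __init__(self, vocab_size=256, mel_bins=80, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_mel = nn.Linear(hidden, mel_bins)

    def forward(self, char_ids):                 # (batch, chars)
        x, _ = self.encoder(self.embed(char_ids))
        return self.to_mel(x)                    # (batch, frames, mel_bins)

class Vocoder(nn.Module):
    """Stage 2: mel frames -> raw audio samples (the fine detail)."""
    def __init__(self, mel_bins=80, samples_per_frame=256):
        super().__init__()
        self.upsample = nn.Linear(mel_bins, samples_per_frame)

    def forward(self, mels):                     # (batch, frames, mel_bins)
        return self.upsample(mels).flatten(1)    # (batch, audio samples)

text = "Welcome to our store."
char_ids = torch.tensor([[min(ord(c), 255) for c in text]])
mels = AcousticModel()(char_ids)   # stage 1: predict the spectral envelope
audio = Vocoder()(mels)            # stage 2: render the waveform
print(audio.shape)                 # untrained weights, so the output is noise
```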
Making a convincing synthetic voice takes more than just pressing a button, however. Part of what makes a human voice so human is its inconsistency, expressiveness, and ability to deliver the same lines in completely different styles, depending on the context.
Capturing these nuances involves finding the right voice actors to supply the right training data and fine-tuning the deep-learning models. WellSaid says the process requires at least an hour or two of audio and a few weeks of labor to develop a realistic-sounding synthetic replica.
AI voices are particularly popular among brands looking to maintain a consistent sound across millions of interactions with customers. With the ubiquity of smart speakers, and the rise of automated customer service agents and of digital assistants embedded in cars and smart devices, brands may need to produce upwards of a hundred hours of audio a month. But they no longer want to use the generic voices offered by traditional text-to-speech technology, a trend that accelerated during the pandemic as more and more customers skipped in-store interactions to engage with companies virtually.
“If I’m Pizza Hut, I certainly can’t sound like Domino’s, and I certainly can’t sound like Papa John’s,” says Rupal Patel, a professor at Northeastern University and founder and CEO of VocaliD, which promises to build custom voices that match a company’s brand identity. “These brands have thought about their colors. They’ve thought about their fonts. Now they’ve got to start thinking about the way their voice sounds as well.”
Whereas companies once had to hire different voice actors for different markets (the Northeastern versus the Southern United States, say, or France versus Mexico), some voice AI firms can manipulate a single voice’s accent or switch its language in a number of different ways. That opens up the possibility of adapting ads on streaming platforms to whoever is listening, changing not just the characteristics of the voice but also the words being spoken. A beer ad could tell a listener to stop by a different pub depending on whether it’s playing in New York or Toronto. Resemble.ai, which designs voices for ads and smart assistants, says it is already working with clients to launch such personalized audio ads on Spotify and Pandora.
The gaming and entertainment industries also see the benefits. Sonantic, a firm that specializes in emotive voices that can laugh and cry or whisper and shout, works with video game makers and animation studios to supply the voice-overs for their characters. Many of its clients use the synthesized voices only in pre-production and switch to real voice actors for the final production. But Sonantic says a few have begun using them throughout the process, perhaps for characters with fewer lines. Resemble.ai and others have also worked with film and TV shows to patch up actors’ performances when words get garbled or mispronounced.