Synthetic speech can be a fearful object these days when paired with deepfakes and other AI deceptions, but it’s also an indispensable tool for anyone who can no longer speak on their own. Acapela Group has these folks squarely in mind with its new service, which lets anyone bank their voice for free.
Acapela has been in the text-to-speech space for around 25 years, and was recently acquired by tech accessibility giant Tobii Dynavox, though they still operate independently.
Like many industries, accessibility has been heavily influenced by the advent of consumer-scale machine learning processes. Seven or eight years ago, recalled Acapela co-founder Remy Cadic, customizing a synthetic voice for yourself was not just tedious; the results weren’t particularly good either.
“It was very time consuming — the patient had to train for 8 hours. Now we can bank a voice with just 50 sentences recorded; it takes about 10 minutes and the voice is ready the next day,” he said. “There’s definitely a revolution going on with neural text-to-speech techniques.”
They weren’t kidding about how quick and easy it is: I went through the new “my own voice” process myself, and it really was just 50 short sentences, drawn from a (random, it seemed) corpus of novels, recipe books, and articles. The recording interface was simple and easy to navigate, and sure enough, a day or so later my voice was ready to use. The quality is fine: not uncanny like some models out there can be, but clearly my own voice (as advertised) and able to handle any sentence I threw at it on the demo page.
Now that it’s there, if I ever need it I can go and download it for a fee to use on any compatible speech generation system. Obviously this includes Tobii Dynavox’s TD Talk and devices, which are getting pretty sleek.
And that’s the real point of all this — it’s not a technical demonstration of the power of neural voice tech or a demo that lets anyone feed it a celebrity voice to clone. It’s a tool made specifically for people who until recently may have had no options or at best a difficult, complex process if they wanted to preserve their voice.
Many facing degenerative conditions, cancers, or certain procedures know that within a few months or years they may not be able to speak well or at all any more. Making the process of banking their voice as easy as possible is a service many will appreciate.
“One big advantage is we also customize for children — we’ve made the recording script easier to read, and tuned the system to make the quality of children’s synthetic voices better. We were the first in the world to do that, and we’re still going in this direction,” said Cadic.
Being able to record and re-record or artificially age the banked voice is a new and challenging capability, but one that seems to be getting results.
The compatibility with offline devices that don’t have the latest neural processing chip is a key differentiator as well. “There are online solutions where it’s easy to create a voice, but it’s only available via the cloud, and that’s just not practical,” he said.
The company has also found that diversity and thoughtfulness in the training process are as important as in other AI applications. An issue Cadic pointed out with some super-fast training techniques is that “it will pretty much just try to find the speaker in the training material that’s closest to the user. But if there isn’t a speaker in the training close to the original voice, it just won’t sound like it.”
Acapela product manager Nicolas Mazars added that, like many AI problems with their root in insufficient training data, this one is not evenly distributed: “That process works well for the average 50-year-old white guy, but not if you’re an African-American man, or you don’t speak English well. We work in 23 languages, and have many users who have disabilities. We try to rely on user feedback and develop something for them, by them.”
The recording and banking process is free, and you can be training your own synthetic voice in minutes. You only pay if you want to download and install it on a device.