There are 2.5 million people in the U.S. with severe speech disorders, and some 40 percent of them use speech devices to express themselves. But those devices offer an extremely limited selection of computerized voices.
“We wouldn’t dream of fitting a little girl with the limb of a grown man — so, why then, a prosthetic voice?” asked Northeastern University professor Rupal Patel in a talk at the TEDWomen conference in San Francisco last week.
Over the past six years, Patel has developed a process for morphing voices in which she combines samples of a patient’s speech — however limited it may be — with the voice of a donor who shares similar age, gender, size and location.
From the recipient, she extracts pitch, loudness and sibilance. As little as a single vowel may be enough. From the donor, she records a list of hundreds of utterances, which in the lab can be broken down to individual phonemes.
The synthesized result is a reverse-engineered voice that approximates what a person might sound like if he or she weren’t limited by speech disorders.
So far, the process has been completed three times. The results have been emotionally powerful, Patel said.
When a 9-year-old patient named William first received his prosthetic voice, his response was, “Never heard me before.”
“Imagine carrying around someone else’s voice for nine years, and then finding your own,” Patel said.
In an interview after her talk, Patel said she is deciding whether to turn her research into a for-profit or nonprofit venture, which will likely be called VocaliD (pronounced “vocality”). She has set up a site, VocaliD.org, to solicit voice donors and recipients.
After years of refining and tweaking the process, Patel said that she and her team realized it was time to get the project out the door. “It’s actually relatively easy to make a voice. It takes literally a few minutes, so it should be scalable,” she said.
The VocaliD software currently runs only on Windows, Patel noted, so she would like to port it to iOS and Android to make it more accessible. For the time being, recipients will likely still need to buy specialized devices, known as augmentative and alternative communication systems, to output their new voices. Those devices can cost $8,000 or more.
As it happens, Patel is quite familiar with the process of turning original language research into a startup.
In 2005, she and Deb Roy — a professor at MIT — had their first child, and the couple planted video cameras all over their home to record his first two years so they could study child language development. Dealing with massive quantities of conversational data led Roy to found Bluefin Labs to make a business around analyzing social media conversations about television. Twitter bought the company for about $90 million earlier this year.
Roy also gave a TED Talk about the experience. Patel’s talk has not yet been posted online.
Article from AllThingsDigital