November 17, 2019
"Speech2Braille is a wearable device that transcribes auditory information (like speech) into a non-visual and non-auditory output." - Jacky Zhao
Interview
Q: What did you work on?
A: My Speech2Braille project holds a special place in my heart. I was first inspired to work on this project when I met Beth, a woman who is deaf, on the bus. After she told me about her struggles, I set out to create a solution to help her. She said that she often had trouble reading lips, could not use a hearing aid, and found cochlear implants too expensive. After further research at local hearing clinics and hospitals, it became apparent that her situation was not unique. I focused on developing a wearable device that would be able to transcribe auditory information (like speech) into a non-visual and non-auditory output. Two main components had to be developed for this device to work: 1) a neural network for understanding speech and 2) a tactile braille display for the user to understand the output. In prototyping, cost had to be kept low in order to keep the device accessible.
Q: How did you go about doing that?
A: In essence, what I needed to create was an algorithm to transcribe speech to text. Unfortunately, it turns out that this kind of task is really difficult for computers. Speech is extremely variable -- different people speak in different ways. Artificial neural networks would allow us to tackle a lot of these problems by mimicking a biological neural network, allowing the system to 'learn' how to deal with issues like variation in accents, pauses, tone, and volume.
After a lot of research and trial and error, I decided on using a type of artificial neural network called a Long Short-Term Memory (LSTM) network. This type of network is really good at operating on time-series data because it has the ability to 'remember' data. As with any type of neural network, we also need a cost function -- a function to tell the network what it did wrong. Here, I used something called Connectionist Temporal Classification (CTC), a cost function that excels at tasks like speech recognition, where timing is variable. A CTC-trained network outputs a probability distribution over all the possible labels at each timestep, which means that slight changes in timing matter less than the actual content of the predictions.
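To make the CTC idea concrete, here is a minimal sketch of greedy CTC decoding: at each timestep we pick the most probable label, then collapse consecutive repeats and drop the special blank symbol. The label set and probabilities below are made up for illustration; this is not the project's actual decoder.

```python
BLANK = "_"  # CTC's special blank symbol, meaning "no new label here"

def ctc_greedy_decode(frame_probs, labels):
    """frame_probs: one probability row per timestep, over `labels`."""
    # 1. Pick the most probable label at each timestep.
    best = [labels[max(range(len(row)), key=row.__getitem__)]
            for row in frame_probs]
    # 2. Collapse consecutive repeats, so timing shifts matter less.
    collapsed = [c for i, c in enumerate(best) if i == 0 or c != best[i - 1]]
    # 3. Remove blanks to get the final transcription.
    return "".join(c for c in collapsed if c != BLANK)

labels = [BLANK, "h", "i"]
probs = [
    [0.1, 0.8, 0.1],  # "h"
    [0.1, 0.7, 0.2],  # "h" again -- collapses with the previous frame
    [0.8, 0.1, 0.1],  # blank
    [0.1, 0.1, 0.8],  # "i"
]
print(ctc_greedy_decode(probs, labels))  # -> "hi"
```

This collapsing rule is what lets the network stretch or shrink a label over several timesteps without changing the decoded text.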
Another important part of a neural network is the optimizer. This algorithm takes the output of the cost function and modifies the neural network so that it improves over time. Here, I used RMSProp, a popular variant of the Adagrad optimizer that adapts the learning rate based on how frequently parameters are being updated.
Last but not least, all neural networks require data to learn from. For this model, I used the LibriSpeech corpus, a speech dataset in which many different speakers read excerpts from books.
Q: How effective was the model for your dataset?
A: After training, the model attains a final word error rate of 74.88% on the training set and 71.50% on the test set, which is within 92% of the state-of-the-art word error rate on this dataset!
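For readers unfamiliar with the metric quoted above: word error rate is the edit (Levenshtein) distance between the predicted and reference word sequences, divided by the number of reference words. A minimal sketch, with a made-up example transcription:

```python
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three reference words -> WER of 1/3.
print(word_error_rate("the cat sat", "the cat sang"))
```

Lower is better: a WER of 0% means a perfect transcription, while values above 100% are possible when the hypothesis inserts many extra words.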
Q: Why is this work important to you?
A: This was my first time working on a project for such an extended period of time and at such an advanced level, and as a result, it challenged my patience, grit, and time management. However, I've also learned a lot about my passions and how I tackle problems. When I shared my first prototype with Beth, I saw the excitement in her eyes as she felt the dots on her skin. Seeing the impact that I created is what brings meaning to my work.
Interviewee Bio:
Jacky Zhao is a first-year student at The University of British Columbia. He is interested in solving problems in the world with tech, especially in regards to social impact and improving the lives of others. He has done projects related to helping provide financial tools to the unbanked, and has worked on projects related to improving drone use in search and rescue.
You can learn more about Jacky's project here.
Thank you for sharing your awesome project with us, Jacky! If you are currently a student and also doing research in the field of AI, reach out to me at charmaine.lee@aiforanyone.org and let’s chat! We would love to feature your work.
This blog post was written to accompany our All About AI newsletter, which contains news from the world of artificial intelligence as well as research papers to help you learn more about AI. Subscribe here for more content like this!