Two Ideas for Improving Automatic Speech Recognition: One Elegant, and One Very Useful! – talk by Sanjeev Khudanpur, Johns Hopkins University
December 29, 2019 @ 12:00 pm - 1:00 pm IST
Sanjeev Khudanpur, Johns Hopkins University
The Kaldi tools for automatic speech recognition (ASR) are widely used, both for research and in large-scale deployments. Many innovations, large and small, have gone into keeping Kaldi up to date in this fast-moving field. I will describe how two of those innovations went from conceptualization to execution and evaluation. One was the use of adversarial examples to improve the training of deep neural networks. The other was GPU acceleration of the inference engine (i.e., the Viterbi decoder). They illustrate two different metrics of success. The adversarial training solution turns out to be very elegant: it can be viewed either as a correction for sample bias in the gradient estimate used by stochastic gradient descent (SGD), or as an application of the leave-one-out estimate. The GPU acceleration solution turns out to be engineering of immense practical significance: while the neural network computations are already addressed by developers of generic GPU software, the Viterbi search, which accounts for half of the decoding time, requires exploiting the graph structures in ASR that specify the necessary computation. Time permitting, I will outline other ongoing research threads in Kaldi, such as optical character recognition and speaker recognition.
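For readers unfamiliar with the Viterbi search mentioned in the abstract, the following is a minimal, textbook-style sketch of frame-synchronous Viterbi decoding over a toy state graph, written in plain Python on the CPU. It is not Kaldi's decoder and not the GPU implementation discussed in the talk; the function name `viterbi`, the dictionary-of-arcs graph representation, and the toy probabilities in the usage note are all hypothetical choices made for illustration. The point it is meant to show is that the per-frame work is a loop over the arcs of the graph, so the graph's structure determines what has to be computed.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy graph representation: arcs[(src, dst)] = log transition probability.
Arcs = Dict[Tuple[int, int], float]


def viterbi(log_init: List[float], arcs: Arcs,
            log_emit: List[List[float]]) -> Tuple[float, List[int]]:
    """Return (best log score, best state path) given per-frame emission log-probs.

    log_emit[t][s] is the log-likelihood of frame t under state s (illustrative input).
    """
    n_states = len(log_init)
    # Frame 0: initial log-probability plus the emission score of the first frame.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_score = [-math.inf] * n_states
        bp = [-1] * n_states
        # Relax every arc of the graph; this loop is the work the graph structure dictates.
        for (src, dst), log_p in arcs.items():
            cand = score[src] + log_p + log_emit[t][dst]
            if cand > new_score[dst]:
                new_score[dst] = cand
                bp[dst] = src
        score = new_score
        backptrs.append(bp)
    # Pick the best final state and follow back-pointers to recover the path.
    best = max(range(n_states), key=lambda s: score[s])
    path = [best]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return score[best], path
```

As a usage example, a two-state left-to-right graph with arcs `{(0, 0): log 0.5, (0, 1): log 0.5, (1, 1): log 1.0}`, starting in state 0, and three frames whose emissions favor states 0, 0, then 1 yields the path `[0, 0, 1]`. In a production ASR decoder the graph is typically a large weighted finite-state transducer and only a pruned beam of states is kept active per frame, which is part of what makes parallelizing this search on a GPU harder than parallelizing the neural network computations.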