Professor Carol Espy-Wilson
Institute for Systems Research
Department of Electrical and Computer Engineering
The ability to talk to computers the way we talk to each other would revolutionize human-computer interaction. Speech not only provides a fast mode of communication and eliminates the need for physical interaction with the computer, but also opens the door to other revolutionary technologies such as universal language translation: software that recognizes and understands phrases in one language and translates them appropriately into any other language, thereby enabling effective communication between people from different cultures (listed as the top emerging technology in MIT Technology Review, Jan. 2004). In fact, Google just announced that it is working to put speech-to-speech translation on mobile phones in the next few years. Such a breakthrough will require considerable advances in speech recognition technology, which generates a string of words or sounds from the speech signal, and in natural language processing technology, which gleans meaning from the output of the recognizer. This talk will focus on research in the Speech Communication Group that addresses two problems in current speech recognition technology: (1) understanding how the many differences that exist across speakers (physiological, psychological, behavioral, cultural and regional) are manifested in the speech signal, and developing methods to deal with this variability; and (2) making speech recognition work in practical environments. Our approach integrates knowledge from engineering, speech production, speech perception, acoustic phonetics, speech science, linguistics and neuroscience.
Specifically, the talk will describe our research into the relationship between the speech production system and the physical characteristics of the speech signal, the development of a speech signal representation that captures the linguistic information in the speech signal and diminishes speaker characteristics (for use in speaker-independent speech recognition), the development of a speech signal representation that emphasizes speaker differences (for use in speaker recognition), and new paradigms for speech recognition and speaker recognition.