Expert Stuff: Google's Mike Cohen

What's the difference between a computational linguist and a speech technologist?

Wow. That's a good question, because the boundaries really have blurred. I mean, these days, we all work side by side and do similar things. Twenty or 30 years ago, there were sort of two camps. There were linguists who were trying to build speech recognizers by explicitly programming up knowledge about the structure of language, and then there were engineers who came along and said, "Language is so complex, nobody understands it well enough, and there's just too much there to ever be able to explicitly program it, so instead, we'll build these big statistical models, feed them data, and then let them learn." For a while, the engineers were winning, but nobody was doing a great job.

So more recently, like, in the last 25 years, those communities came together and we learned certain things from the linguists about the structure of speech, like the fact that I mentioned earlier, which is that the production of any particular phoneme is heavily influenced by the phonemes that surround it. Linguists have been publishing on that, calling it co-articulation, for years. Finally, the statisticians or engineers took that to heart and built models that are context-dependent, so that they can learn a separate model for "ah" as it occurs following an "mm" versus a "duh," and so on and so forth.
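As a rough illustration of the context-dependent idea described above, here is a minimal sketch, not anything from an actual recognizer. It labels each phoneme with the phoneme that precedes it, so "ah" after "m" and "ah" after "d" become separate units that could each get their own model. The function name `left_context_units`, the `<s>` start marker, and the toy phoneme sequences are all assumptions made up for this example.

```python
from collections import Counter


def left_context_units(phonemes):
    """Label each phoneme with the phoneme that precedes it."""
    units = []
    prev = "<s>"  # hypothetical start-of-utterance marker
    for p in phonemes:
        units.append(f"{prev}-{p}")
        prev = p
    return units


# Toy phoneme sequences, invented for illustration only.
utterances = [
    ["m", "ah"],
    ["d", "ah"],
    ["m", "ah", "m", "ah"],
]

# Count how often each context-dependent unit occurs in the toy data.
counts = Counter()
for utt in utterances:
    counts.update(left_context_units(utt))

for unit, n in sorted(counts.items()):
    print(unit, n)
```

In this toy data, "ah" following "m" and "ah" following "d" are counted as different units, which is the sense in which a context-dependent system keeps a separate model for each.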

Those communities really came together, and so -- maybe I've thrown these terms around too loosely referring to speech technologists versus computational linguists. We all work on the boundary of trying to understand language, the structure of language, trying to develop algorithms, machine-learning-style algorithms where we figure out how to come up with a better model that can better capture the structure of speech, and then have an algorithm such that we feed that model lots and lots of data, and the model both changes its structure and alters its internal parameters to become a better, richer model of language, given the data that's being fed to it.