Expert Stuff: Google's Mike Cohen

What are grammars?

Yeah, that word has been used loosely, and it has meant a couple of different things over time. In the most general sense, you could think of it as a description of which word strings we expect to occur. In some systems, and this was very true for a lot of call-center systems, we would have a reasonably good idea of what people were likely to say, right? Say you have a system that is a menu: do you want A, B, or C? You might expect most people will say either "A," "B," or "C," or they might say "I want A" or "B please," things that, because of the application, were fairly predictable.

So there were languages in which people could specify "here are the rules, or the set of strings, that people might say in this particular context." That would be a case where the recognizer was very limited. It would only recognize a certain number of variations in how you might say things. Let's say, "do you want your account balance or to make a transfer?" It's not like people will mimic exactly those words, but it's reasonably predictable, so somebody with experience, and after listening to some of the data, could have a reasonable chance of writing an explicit grammar that said, "Here are 50 variations in how people might make that two-way choice."
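
[Editor's illustration, not from the interview.] A minimal sketch of what an explicit grammar for that two-way choice could look like. The slot layout, the phrasings, and the names `GRAMMAR`, `expand`, and `recognize` are all hypothetical; production systems used dedicated grammar languages and matched against audio rather than typed text.

```python
from itertools import product

# Hypothetical rules for the "account balance or transfer" prompt.
# Each choice is a sequence of slots; each slot lists alternative
# phrasings, and "" marks a slot that may be skipped.
GRAMMAR = {
    "balance": [
        ["", "i want", "i'd like", "give me"],
        ["", "my"],
        ["balance", "account balance"],
        ["", "please"],
    ],
    "transfer": [
        ["", "i want to", "i'd like to"],
        ["transfer", "make a transfer"],
        ["", "please"],
    ],
}


def expand(slots):
    """Enumerate every word string that a slot sequence allows."""
    return {" ".join(w for w in combo if w) for combo in product(*slots)}


# Flatten the rules into a lookup table: allowed phrase -> choice.
PHRASES = {
    phrase: choice
    for choice, slots in GRAMMAR.items()
    for phrase in expand(slots)
}


def recognize(utterance):
    """Return the matched choice, or None if the utterance is out of grammar."""
    normalized = " ".join(utterance.lower().replace(",", " ").split())
    return PHRASES.get(normalized)
```

Calling `recognize("My account balance, please")` returns `"balance"`, while anything outside the enumerated strings, such as `recognize("what's the weather")`, returns `None`. That second case is the limitation described above: the recognizer only accepts the variations someone wrote down.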

Whereas, as you get to more difficult applications like, for example, voice search, it's way more difficult to predict all of those different strings of words that people might utter. So instead, the grammar becomes what's called a statistical grammar, or what we often call a statistical language model. That would be something more in the form of, given the last two words were A, B, here are the probabilities across all of the words in my language of what might happen next.
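
[Editor's illustration, not from the interview.] A minimal sketch of the "given the last two words, what comes next" idea, using raw trigram counts. The toy corpus and the names `train_trigram` and `next_word_probs` are assumptions made up for this example. It also only assigns probability to words it has seen after a given pair; a real statistical language model would smooth the estimates so that every word in the vocabulary gets some probability.

```python
from collections import Counter, defaultdict


def train_trigram(sentences):
    """Count which word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start markers so the first real word has a two-word history.
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[(a, b)][c] += 1
    return counts


def next_word_probs(counts, a, b):
    """Unsmoothed estimate of P(next word | last two words were a, b)."""
    following = counts.get((a, b))
    if not following:
        return {}  # history never seen in training
    total = sum(following.values())
    return {word: n / total for word, n in following.items()}


# Hypothetical voice-search queries standing in for training data.
CORPUS = [
    "pizza near me",
    "pizza near downtown",
    "pizza near me open now",
    "coffee near me",
]

model = train_trigram(CORPUS)
probs = next_word_probs(model, "pizza", "near")
```

In this toy corpus "pizza near" is followed by "me" twice and "downtown" once, so `probs` assigns "me" a probability of 2/3 and "downtown" 1/3. Nobody had to predict those strings in advance; the probabilities come from the data.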