So we're building an acoustic model for U.S. English, and we have models for "ah," and "uh," and "buh," and "tuh," and "mm," and "nn," and so on, for all of the basic sounds of the language. Actually, it's a little more complicated than that. Take the "aa" sound in English: the "aa" in the word "math" versus the "aa" in the word "tap." They're produced somewhat differently, and they sound a bit different, so we actually need different models for the "aa" sound depending on whether it follows an M or a T. The production of those fundamental sounds, or phonemes, varies depending on their context.
So we have many, many models for the "aa" sound: one model if the predecessor is "mm," a different one if it's "tuh," for example. That's the first piece of the model, the acoustic model: a model of all of the fundamental sounds given their context.
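To make the idea of context-dependent models concrete, here is a minimal sketch that keys each model by the pair (predecessor, phone), the way the talk describes "aa" after "mm" versus "aa" after "tuh." The phone labels, the two-word `lexicon`, the `"sil"` boundary marker, and the helper `left_context_units` are all hypothetical illustrations, not taken from any particular recognizer. It only builds the inventory of model slots; it does not train anything.

```python
from collections import defaultdict


def left_context_units(phones, boundary="sil"):
    """Pair each phone with its predecessor.

    The first phone in a word has no predecessor, so it is paired with a
    hypothetical boundary marker (here "sil", i.e. silence).
    """
    units = []
    prev = boundary
    for phone in phones:
        units.append((prev, phone))
        prev = phone
    return units


# Hypothetical pronunciations, using the talk's informal phone labels.
lexicon = {
    "math": ["m", "aa", "th"],
    "tap": ["t", "aa", "p"],
}

# Every distinct (predecessor, phone) pair gets its own model slot, so the
# "aa" in "math" and the "aa" in "tap" end up in different slots.
models = defaultdict(list)
for word, phones in lexicon.items():
    for unit in left_context_units(phones):
        models[unit].append(word)

for (prev, phone), words in sorted(models.items()):
    print(f"{prev}-{phone}: {words}")
```

Running it lists "m-aa" and "t-aa" as separate entries, one model for each context of the same vowel. Conditioning only on the predecessor follows the example in the talk; many systems also condition on the following phone (triphones), which multiplies the number of models further.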