How do you take into account accents and dialects when designing speech recognizers?
One of the fundamental things, given the data-driven approach we take, is that we try to have very large, broad training sets. We have large amounts of data coming in from all kinds of people with all kinds of accents, saying all kinds of things, and the most important thing is to have good coverage in your training set of whatever is coming in. For example, enough people from Brooklyn have spoken to our systems (and not just me) that we do a good job when someone with a Brooklyn accent talks to them.
On the other hand, if somebody came along with very peculiar and unusual ways of pronouncing things that were not well covered in our data, we'd have more trouble recognizing them.
Sometimes pronunciations are different enough, let's say between U.K. English and U.S. English, that we may build a separate model, or a partially blended model. That's an area of research: when should we build separate models versus combine everything into one big model, or some compromise in between? That variation is one of a number of big challenges that make the field more difficult. Having good training sets, with broad coverage of all the variation that actually occurs, is one of the ways we deal with it.