How many words are in the Google voice search database?
Let me put it this way. For English, the number of different words in our vocabulary is roughly a million, and that evolves over time because, obviously, new words enter the language, new names come along, and so on, so the vocabulary gets rediscovered from time to time and the new words get added. Then, those words can be put together in any imaginable order, and in word strings of any length. So you might come up with a 10-word query, picking randomly from those million words, and the number of possible queries turns out to be astronomically large. However, by using the kind of statistical language model I just mentioned, and training it on lots and lots of queries, hundreds of billions of queries, we end up with reasonable predictive power about what's likely.
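To make the arithmetic concrete, here is a minimal Python sketch. It first counts the 10-word strings that can be drawn from a million-word vocabulary, then shows how counting word pairs in observed queries gives some predictive power about what is likely. The bigram model, the three training queries, and the names `training_queries` and `next_word_probability` are assumptions made up for illustration; the interview does not say which kind of model or data Google actually uses.

```python
from collections import Counter

VOCAB_SIZE = 1_000_000  # "roughly a million" words, per the interview
QUERY_LENGTH = 10       # the 10-word query from the example

# Number of distinct 10-word strings if any word may follow any other.
possible_queries = VOCAB_SIZE ** QUERY_LENGTH  # equals 10**60

# Toy bigram model. These queries are invented for illustration only.
training_queries = [
    "weather in new york",
    "pizza in new york",
    "weather in boston",
]

bigram_counts = Counter()   # how often each (previous, current) pair occurs
context_counts = Counter()  # how often each word occurs as the previous word
for query in training_queries:
    words = ["<s>"] + query.split()  # "<s>" marks the start of a query
    for prev, cur in zip(words, words[1:]):
        bigram_counts[(prev, cur)] += 1
        context_counts[prev] += 1


def next_word_probability(prev, cur):
    """Relative-frequency estimate of P(cur | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]
```

In this toy data, "york" always follows "new", so `next_word_probability("new", "york")` is 1.0, while "in" is followed by "new" twice and "boston" once. A production system would need smoothing for unseen word pairs and far more data; this sketch only shows why a trained model narrows an astronomically large space down to a small set of likely word strings.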