Hidden in Plain Site
The deep Web is enormous in comparison to the surface Web. Today's Web has more than 555 million registered domains. Each of those domains can have dozens, hundreds or even thousands of sub-pages, many of which aren't cataloged and thus fall into the category of deep Web.
Although nobody really knows for sure, the deep Web may be 400 to 500 times bigger than the surface Web [source: BrightPlanet]. And both the surface and deep Web grow bigger every day.
To understand why so much information is out of sight of search engines, it helps to have a bit of background on searching technologies. You can read all about it in How Internet Search Engines Work, but we'll give you a quick rundown here.
Search engines generally create an index of data by finding information that's stored on Web sites and other online resources. To do this, they use automated programs called spiders or crawlers, which start at known domains and then follow hyperlinks from page to page and domain to domain, like an arachnid following the silky tendrils of a web. In a sense, they create a sprawling map of the Web.
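To make the crawling idea concrete, here's a minimal sketch of a breadth-first crawler. It assumes a tiny, hypothetical in-memory "Web" (the `TOY_WEB` dictionary and the `crawl` function are made up for illustration) instead of fetching real pages over HTTP. Note that the `search-results` page is never reached: no other page links to it, which is exactly the kind of content that ends up in the deep Web.

```python
from collections import deque

# Hypothetical toy Web: each page maps to the pages it links to.
# "search-results" only exists when someone submits a query,
# so no other page links to it.
TOY_WEB = {
    "home": ["about", "products"],
    "about": ["home", "contact"],
    "products": ["widget", "gadget"],
    "contact": [],
    "widget": ["products"],
    "gadget": ["products"],
    "search-results": ["widget"],
}

def crawl(seed, links):
    """Breadth-first crawl: visit a page, then queue every unseen page it links to."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

if __name__ == "__main__":
    visited = crawl("home", TOY_WEB)
    print(visited)                      # pages the crawler found by following links
    print(set(TOY_WEB) - set(visited))  # pages it never saw
```

Running it prints the six linked pages in the order they were discovered, followed by `{'search-results'}`, the one page that no link leads to. Real crawlers also deal with politeness rules, duplicate URLs and failed requests, which this sketch ignores.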
This index or map is your key to finding specific data that's relevant to your needs. Each time you enter a keyword search, results appear almost instantly thanks to that index. Without it, the search engine would literally have to start searching billions of pages from scratch every time someone wanted information, a process that would be both unwieldy and exasperating.
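One common way to build such an index is an inverted index, which maps each word to the pages that contain it. The sketch below assumes a handful of hypothetical pages (`PAGES`) that a crawler has already fetched. It is a simplified illustration, not how any particular search engine actually stores its index. A query is answered by intersecting small sets of page IDs rather than rescanning every page's text.

```python
from collections import defaultdict

# Hypothetical pages a crawler has already fetched (page ID -> text).
PAGES = {
    "page1": "deep web data hidden from search engines",
    "page2": "search engines build an index",
    "page3": "crawlers follow links across the web",
}

def build_index(pages):
    """Map each lowercase word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the page IDs containing every query word, using only the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

if __name__ == "__main__":
    idx = build_index(PAGES)
    print(sorted(search(idx, "search engines")))  # ['page1', 'page2']
    print(sorted(search(idx, "web")))             # ['page1', 'page3']
```

The expensive work (reading every page) happens once, in `build_index`; each later search only looks up a few words, which is why results can appear almost instantly.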
But search engines can't see data stored in the deep Web. There are data incompatibilities and technical hurdles that complicate indexing efforts. There are private Web sites that require login passwords before you can access the contents. Crawlers can't reach content that only appears when you run a keyword search on a single, specific Web site. There are timed-access sites that no longer allow public views once a certain time limit has passed.
All of those challenges, and a whole lot of others, make data much harder for search engines to find and index. Keep reading to see more about what separates the surface and deep Web.