Google Equipment

Back in 1998, Google's equipment was relatively modest. Co-founders Larry Page and Sergey Brin ran the search engine on a mix of Stanford equipment and donated machines. The equipment at that time included:

  • Two 300 megahertz (MHz) Dual Pentium II servers with 512 megabytes (MB) of memory
  • A four-processor F50 IBM RS/6000 computer with 512 MB of memory
  • A dual-processor Sun Ultra II computer with 256 MB of memory
  • Several hard drives (some of which were housed in a box covered in LEGO bricks) ranging from four to nine gigabytes (GB) for a total of more than 350 GB of storage space [source: Google Stanford Hardware]

Today, Google uses many thousands of servers to provide services to its users. Google's strategy is to use relatively inexpensive machines running a customized version of Linux. A distributed file system called the Google File System (GFS) manages the data on Google's servers [source: Google Cluster Architecture].

You Got Served
How many servers does Google have? The company is quiet about the subject, but estimates range from 200,000 to more than 450,000 machines.
Google assigns its servers to different tasks. Google's Web servers receive and process user queries, passing each request on to the appropriate next server. Index servers store Google's indexes and search results. Google uses document servers to store search summaries, user information, Gmail messages and Google Docs files. Ad servers store the advertisements Google displays on search pages.
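To make this division of labor concrete, here's a toy, single-process sketch in Python of how a Web server might hand a query off to the other kinds of servers. It's an illustration under simplifying assumptions, not Google's actual code: the dictionaries standing in for the index, document and ad servers, their contents, and the `handle_query` function are all hypothetical, and a real system spreads each role across many networked machines.

```python
# Toy, in-memory model of the server roles described above.
# All names and data here are hypothetical.

INDEX_SERVER = {      # index: search term -> IDs of documents containing it
    "google": [1, 2],
    "servers": [2, 3],
}
DOCUMENT_SERVER = {   # document store: document ID -> search summary
    1: "History of Google",
    2: "How Google's servers work",
    3: "Choosing a Web server",
}
AD_SERVER = {         # ads keyed by the query term they target
    "servers": "Ad: rack-mount hardware",
}


def handle_query(query: str) -> dict:
    """Act as the Web server: pass the query along to the other server types."""
    terms = query.lower().split()
    # 1. Ask the index server which documents contain every term.
    doc_sets = [set(INDEX_SERVER.get(t, [])) for t in terms]
    matches = sorted(set.intersection(*doc_sets)) if doc_sets else []
    # 2. Ask the document server for a summary of each matching document.
    results = [DOCUMENT_SERVER[d] for d in matches]
    # 3. Ask the ad server for ads that target any of the query terms.
    ads = [AD_SERVER[t] for t in terms if t in AD_SERVER]
    return {"results": results, "ads": ads}


print(handle_query("Google servers"))
```

Keeping each role behind its own lookup mirrors the idea in the text: the Web server doesn't hold the data itself, it just knows which kind of server to ask next.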

Google divides the information on each index server into 64 MB blocks. There are three copies of each block of data, and each copy is stored on a different server that runs on a separate power strip. The blocks are distributed semi-randomly so that no two servers hold exactly the same collection of blocks. That way, if one server has a problem, the data still exists on other machines. Using multiple copies of data to prevent an interruption in service is called redundancy.
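The block scheme can be sketched in a few lines of Python. This is a simplified model, not Google's implementation: the server names, the power-strip labels and the helpers `split_into_blocks` and `place_replicas` are hypothetical. The code cuts data into fixed-size blocks and, for each block, picks three servers at random while making sure no two copies share a power strip.

```python
import random

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB per block, as described in the text
REPLICAS = 3                   # three copies of every block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, servers: dict[str, str],
                   rng: random.Random) -> dict[int, list[str]]:
    """Pick REPLICAS servers for each block, each on a different power strip.

    `servers` maps a (hypothetical) server name to its power strip.
    """
    placement = {}
    for block_id in range(num_blocks):
        candidates = list(servers)
        rng.shuffle(candidates)  # semi-random spread of blocks across servers
        chosen, used_strips = [], set()
        for name in candidates:
            if servers[name] not in used_strips:
                chosen.append(name)
                used_strips.add(servers[name])
                if len(chosen) == REPLICAS:
                    break
        if len(chosen) < REPLICAS:
            raise ValueError("need servers on at least 3 separate power strips")
        placement[block_id] = chosen
    return placement


# Usage with a tiny block size so the example runs instantly.
servers = {"s1": "strip-A", "s2": "strip-A", "s3": "strip-B",
           "s4": "strip-C", "s5": "strip-D"}
blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']
print(place_replicas(len(blocks), servers, random.Random(42)))
```

Shuffling the candidate servers separately for each block tends to give different servers different mixes of blocks, though this simple version doesn't strictly guarantee that no two servers end up identical.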

A master computer manages each set of servers. Its job is to keep track of which servers hold each block of data, so it always knows where to find another copy in the event of a catastrophe. If one server goes down, the master computer redirects all traffic to the other servers containing the same data.
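A master like this can be modeled as a lookup table from block IDs to the servers holding copies of each block. In this Python sketch, the `Master` class and its method names are illustrative assumptions rather than a real Google API; marking a server as down makes the master send requests for that server's blocks to the next surviving copy.

```python
class Master:
    """Hypothetical master that records which servers hold each block."""

    def __init__(self, placement: dict[int, list[str]]):
        # block ID -> servers holding a copy of that block
        self.placement = {b: list(s) for b, s in placement.items()}
        self.down: set[str] = set()  # servers known to have failed

    def mark_down(self, server: str) -> None:
        """Record that a server has failed."""
        self.down.add(server)

    def locate(self, block_id: int) -> str:
        """Return the first live server holding the block."""
        for server in self.placement[block_id]:
            if server not in self.down:
                return server
        raise LookupError(f"no live copy of block {block_id}")


master = Master({0: ["s1", "s3", "s4"], 1: ["s2", "s4", "s5"]})
print(master.locate(0))  # s1 is up, so it serves block 0
master.mark_down("s1")
print(master.locate(0))  # traffic for block 0 now goes to s3
```

Because each block has three copies, the master only runs out of options if all three servers holding that block fail at once.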

Google and Bandwidth
Some webmasters feel that Google's spiders consume too much of their monthly bandwidth. Every time a spider follows a link to a Web page, it uses up bandwidth, and most Web hosting services charge webmasters for bandwidth consumption. If a webmaster feels that Google's spiders are a liability, he or she can create a robots.txt file in the root directory of the Web site that tells the spiders to ignore the site (or specific parts of it).
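Here's what such a file might look like, checked with Python's standard `urllib.robotparser` module, which reads robots.txt rules the way a well-behaved crawler would. The site address `example.com`, the bot name `OtherBot` and the rules themselves are hypothetical; `Googlebot` is the name Google's crawler identifies itself by. The rules ask Googlebot to skip the whole site while asking every other crawler to skip only the `/private/` directory.

```python
from urllib.robotparser import RobotFileParser

# Contents of a hypothetical robots.txt served at https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is asked to stay away from every page on the site.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))
# Other crawlers may fetch ordinary pages...
print(parser.can_fetch("OtherBot", "https://example.com/index.html"))
# ...but are asked to skip anything under /private/.
print(parser.can_fetch("OtherBot", "https://example.com/private/a.html"))
```

Keep in mind that robots.txt is a request, not a lock: reputable crawlers such as Googlebot honor it, but it doesn't technically block anyone.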

In the next section, we'll learn more about Google's corporate culture.