Google says little about the hardware it currently uses to run the GFS, other than that it's a collection of cheap, off-the-shelf Linux servers. But in an official GFS report, Google revealed the specifications of the equipment it used to run some benchmarking tests on GFS performance. While the test equipment might not be a true representation of the current GFS hardware, it gives you an idea of the sort of computers Google uses to handle the massive amounts of data it stores and manipulates.
The test equipment included one master server, two master replicas, 16 clients and 16 chunkservers. All 35 machines used identical hardware and ran Linux. Each had dual 1.4 gigahertz Pentium III processors, 2 GB of memory and two 80 GB hard drives. In comparison, several vendors currently offer consumer PCs that are more than twice as powerful as the servers Google used in its tests. Google developers showed that the GFS could work efficiently using modest equipment.
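To put those drive sizes in perspective, here is a rough back-of-the-envelope sketch in Python of how much storage the 16 chunkservers offered. The machine and drive counts come from the test setup above. The replication factor of three is an assumption that this article does not state, and the function names are made up for illustration.

```python
# Figures taken from the test setup described above.
CHUNKSERVERS = 16
DRIVES_PER_MACHINE = 2
DRIVE_GB = 80

# Assumption (not stated in this article): every chunk is stored three times.
REPLICAS = 3


def raw_capacity_gb(machines, drives, drive_gb):
    """Total disk space across all machines, before any replication."""
    return machines * drives * drive_gb


def usable_capacity_gb(raw_gb, replicas):
    """Space left for unique data if every chunk is stored `replicas` times."""
    return raw_gb / replicas


raw = raw_capacity_gb(CHUNKSERVERS, DRIVES_PER_MACHINE, DRIVE_GB)
usable = usable_capacity_gb(raw, REPLICAS)

print(f"Raw capacity:    {raw} GB")
print(f"Usable capacity: {usable:.0f} GB (assuming {REPLICAS} replicas)")
```

Under that assumption, the test cluster's 2,560 GB of raw disk space would hold well under a terabyte of unique data.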
Each machine had a 100 megabit-per-second (Mbps) full-duplex Ethernet connection to one of two Hewlett-Packard 2524 network switches. The GFS developers connected the 16 client machines to one switch and the other 19 machines to the other switch. They linked the two switches together with a one gigabit-per-second (Gbps) connection.
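The link speeds put a hard ceiling on how fast the clients could move data. The short Python sketch below works out those ceilings, taking Mbps and Gbps to mean megabits and gigabits per second and using decimal units (1 Gbps = 1,000 Mbps). It ignores protocol overhead, so real throughput would be somewhat lower. The variable and function names are illustrative only.

```python
BITS_PER_BYTE = 8

# Figures taken from the network layout described above.
CLIENTS = 16
CLIENT_LINK_MBPS = 100      # each machine's Ethernet connection
INTER_SWITCH_MBPS = 1000    # the 1 Gbps link between the two switches


def megabits_to_megabytes(mbps):
    """Convert megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE


# The most a single client can send or receive.
per_client = megabits_to_megabytes(CLIENT_LINK_MBPS)

# The most that can cross between the two switches at once.
inter_switch = megabits_to_megabytes(INTER_SWITCH_MBPS)

# What all 16 clients could ask for if each one ran at full speed.
total_demand = CLIENTS * per_client

# Clients and chunkservers sit on different switches, so the
# switch-to-switch link caps the combined client throughput.
aggregate_ceiling = min(total_demand, inter_switch)

print(f"Per-client ceiling: {per_client} MB/s")
print(f"Aggregate ceiling:  {aggregate_ceiling} MB/s")
```

Because the 16 clients together could demand 200 MB/s but the link between the switches carries only 125 MB/s, that single connection is the bottleneck whenever every client is busy.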
By lagging behind the leading edge of hardware technology, Google can purchase equipment and components at bargain prices. The structure of the GFS is such that it's easy to add more machines at any time. If a cluster begins to approach full capacity, Google can add more cheap hardware to the system and rebalance the workload. If a master server's memory is overtaxed, Google can upgrade the master server with more memory. The system is truly scalable.
How did Google decide to use this system? Some credit Google's hiring policy. Google has a reputation for hiring computer science majors right out of graduate school and giving them the resources and space they need to experiment with systems like the GFS. Others say it comes from a "do what you can with what you have" mentality that many computer system developers (including Google's founders) seem to possess. In the end, Google probably chose the GFS because it's geared to handle the kinds of processes that help the company pursue its stated goal of organizing the world's information.
More Great Links
- Ghemawat, Sanjay, Gobioff, Howard and Leung, Shun-Tak. "The Google File System." Google. 2003.
- Harris, Robin. "Google File System Evaluation." StorageMojo. June 13, 2006. http://storagemojo.com/?page_id=152