The Anatomy of a Cloud
Google's approach to cloud computing may seem perplexing at first. You might think a huge corporation worth billions of dollars would have data centers packed with state-of-the-art, high-tech servers and machines that go ping. Wouldn't Google executives want the best equipment?
But Google's approach is more pragmatic. The company purchases mid-range servers for its data centers, and for good reason: should something break, it's relatively easy and inexpensive to get a replacement. Repair and maintenance can be huge costs for a data center -- each building may house thousands of machines. To ensure services remain online, Google dedicates several servers to the same function. That way, should one server malfunction, another can take its place with minimal interruption in service. In other words, Google builds redundancy into the system.
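The redundancy idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Google's actual software: the `Server` class, the `serve_with_failover` function and the server names are all made up for the example. Several interchangeable servers provide the same function, and a request moves on to the next one when a machine has failed.

```python
class Server:
    """A hypothetical server that can be marked healthy or failed."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def serve_with_failover(servers, request):
    """Try each redundant server in turn until one answers."""
    for server in servers:
        try:
            return server.handle(request)
        except ConnectionError:
            continue  # this machine failed; move on to the next one
    raise RuntimeError("all redundant servers are down")


# Three servers dedicated to the same function (names are illustrative).
pool = [Server("server-a"), Server("server-b"), Server("server-c")]
pool[0].healthy = False  # simulate a hardware failure on the first machine
result = serve_with_failover(pool, "search?q=cloud")
```

Here the request is answered by the second server, so the failure of the first goes unnoticed by whoever sent the request.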
Google's philosophy is to keep the back-end system as simple as possible. As systems become more complex, the opportunity for problems to arise increases. Simplifying a system reduces the chance for problems even if the system itself is enormous. The Google cloud's foundation is the Google File System. This is a distributed file system that handles information requests through basic file commands like open, read and write.
The entire file system consists of networks called clusters. The Google File System relies on master servers to coordinate data requests -- each cluster has a single master server. When you interact with information stored on the cloud, your actions translate into data requests. A request may be something simple, like viewing a file, or may involve more complex actions, such as formatting or writing new data. Your computer acts as a client -- a machine that sends data requests to other machines. Ultimately, a master server takes the request and tells the client which Google machine houses the data -- Google calls these machines chunkservers. The client then contacts that chunkserver, which sends the data directly to the client -- the information never passes through the master server.
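A toy model may help make the read path concrete. The sketch below follows the flow in the published Google File System design, in which the master hands back a location and the client then fetches the data from the chunkserver itself. Every name in it (`Master`, `Chunkserver`, `client_read`, the file path and the chunk ID) is hypothetical and chosen only for illustration.

```python
class Chunkserver:
    """Stores the actual file data, keyed by chunk ID."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk ID -> bytes

    def read(self, chunk_id):
        return self.chunks[chunk_id]


class Master:
    """Holds only metadata: which chunkservers store which file."""

    def __init__(self):
        self.locations = {}  # filename -> (chunk ID, list of chunkservers)
        self.lookups = 0     # counts metadata requests only

    def locate(self, filename):
        self.lookups += 1
        return self.locations[filename]


def client_read(master, filename):
    # Step 1: ask the master where the data lives.
    chunk_id, servers = master.locate(filename)
    # Step 2: fetch the data straight from a chunkserver.
    return servers[0].read(chunk_id)


server = Chunkserver("chunkserver-1")
server.chunks["chunk-0001"] = b"hello from the cloud"
master = Master()
master.locations["/docs/report.txt"] = ("chunk-0001", [server])
data = client_read(master, "/docs/report.txt")
```

Note that the `Master` object never touches the file contents. It answers one small lookup, and the bytes travel from the chunkserver to the client.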
Because Google stores several copies of each piece of information for the sake of redundancy, making changes to data in the cloud is a little complicated. First, your write request goes to a master server. The master server chooses one chunkserver storing the appropriate data to respond to your request -- this becomes the primary replica chunkserver. The master server then tells the client the location of all replica chunkservers storing your file. When you make changes, those changes go to the first replica chunkserver to which your computer can connect. From there, the new data moves through the system to all the replica chunkservers, including the primary replica. The primary replica makes the actual change to the data and then sends a message to all other replica chunkservers to do the same. Once the primary replica receives confirmation that all copies of the data have changed, it sends a notification to the client.
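The write sequence above can also be sketched as a small Python model. It is a simplification under stated assumptions rather than the real protocol: the class and function names (`Replica`, `master_assign`, `client_write`) are invented for the example, the master simply picks the first replica as primary, and the network is replaced by ordinary method calls.

```python
class Replica:
    """A chunkserver holding one copy of a chunk."""

    def __init__(self, name):
        self.name = name
        self.staged = None  # data received but not yet applied
        self.data = b""     # committed contents of the chunk

    def receive(self, payload):
        self.staged = payload

    def apply(self):
        self.data += self.staged
        self.staged = None
        return True  # confirmation that this copy has changed

    def commit_as_primary(self, secondaries):
        # The primary changes its own copy first, then tells the others.
        self.apply()
        confirmations = [replica.apply() for replica in secondaries]
        return all(confirmations)


def master_assign(replicas):
    """Master's role: choose a primary and report every replica location."""
    primary, *secondaries = replicas  # assumption: first replica is primary
    return primary, secondaries


def client_write(replicas, payload):
    primary, secondaries = master_assign(replicas)
    # The new data reaches every replica, including the primary.
    for replica in [*secondaries, primary]:
        replica.receive(payload)
    # The primary applies the change and collects confirmations.
    return primary.commit_as_primary(secondaries)


replicas = [Replica("cs-1"), Replica("cs-2"), Replica("cs-3")]
notified = client_write(replicas, b"new paragraph")
```

The value returned to the client plays the part of the final notification: it is true only when every copy has confirmed the change.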
Now that we have the technical details out of the way, let's take a look at some of the things you can do with the Google cloud.