How the Google File System Works

Google File System Basics

Google developers routinely deal with large files that can be difficult to manipulate using a traditional computer file system. The size of the files drove many of the decisions programmers had to make for the GFS's design. Another big concern was scalability. A system is scalable if it's easy to add capacity and its performance doesn't suffer as it grows. Google requires a very large network of computers to handle all of its files, so scalability is a top concern.

Because the network is so huge, monitoring and maintaining it is a challenging task. While developing the GFS, programmers decided to automate as many as possible of the administrative duties required to keep the system running. This is a key principle of autonomic computing, a concept in which computers are able to diagnose problems and solve them in real time without the need for human intervention. The challenge for the GFS team was not only to create an automatic monitoring system, but also to design it so that it could work across a huge network of computers.

The key to the team's designs was the concept of simplification. They came to the conclusion that as systems grow more complex, problems arise more often. A simple approach is easier to control, even when the scale of the system is huge.

Based on that philosophy, the GFS team decided that users would have access to basic file commands: open, create, read, write and close. The team also included a couple of specialized commands, append and snapshot, which they created based on Google's needs. Append allows clients to add information to an existing file without overwriting previously written data. Snapshot creates a quick copy of a file or directory tree.
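To make the difference between write, append and snapshot concrete, here is a minimal sketch using a toy in-memory file. It is not the GFS client interface: the ToyFile class and its method names are hypothetical, and the "snapshot" here is a plain copy of a single file, chosen only to show that later writes leave the copy untouched.

```python
class ToyFile:
    """Hypothetical in-memory stand-in for a file; not a real GFS API."""

    def __init__(self):
        self.data = bytearray()

    def write(self, offset, payload):
        """Write at a caller-chosen offset, overwriting whatever is there."""
        end = offset + len(payload)
        if end > len(self.data):
            # Pad with zero bytes so the write fits.
            self.data.extend(b"\x00" * (end - len(self.data)))
        self.data[offset:end] = payload

    def append(self, payload):
        """Add data at the current end of the file; existing bytes are untouched.

        Returns the offset at which the payload was stored.
        """
        offset = len(self.data)
        self.data.extend(payload)
        return offset

    def snapshot(self):
        """Return an independent copy of the file's current contents."""
        copy = ToyFile()
        copy.data = bytearray(self.data)
        return copy


log = ToyFile()
log.append(b"abc")       # stored at offset 0
log.append(b"def")       # stored at offset 3, nothing overwritten
saved = log.snapshot()   # copy taken before the overwrite below
log.write(0, b"XY")      # overwrites the first two bytes of the original only
```

In this sketch the caller picks the offset for write, while append always lands at the end of the file, which is why it can never clobber earlier data.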

Files on the GFS tend to be very large, usually in the multi-gigabyte (GB) range. Accessing and manipulating files that large would take up a lot of the network's bandwidth. Bandwidth is the capacity of a system to move data from one location to another. The GFS addresses this problem by breaking files up into chunks of 64 megabytes (MB) each. Every chunk receives a unique 64-bit identification number called a chunk handle. While the GFS can process smaller files, its developers didn't optimize the system for those kinds of tasks.
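The arithmetic behind fixed-size chunks is simple enough to sketch. The example below assumes only the 64 MB chunk size stated above; the function names are hypothetical and chosen for illustration.

```python
from typing import Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the text


def chunk_count(file_size: int) -> int:
    """Number of chunks needed to hold file_size bytes (ceiling division).

    The last chunk may be only partially filled.
    """
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def locate(offset: int) -> Tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    return divmod(offset, CHUNK_SIZE)


five_gb = 5 * 1024 ** 3
print(chunk_count(five_gb))     # 80
print(locate(CHUNK_SIZE + 10))  # (1, 10)
```

A 5 GB file therefore splits into 80 chunks, and a reader that wants a particular byte only has to touch the one chunk that contains it rather than the whole file.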

By requiring all the file chunks to be the same size, the GFS simplifies resource allocation. It's easy to see which computers in the system are near capacity and which are underused. It's also easy to move chunks from one computer to another to balance the workload across the system.
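Because every chunk is the same size, counting chunks is enough to compare how full two machines are. The sketch below illustrates that idea with a naive rebalancing loop. It is an assumption-laden toy, not the GFS's actual placement policy: the rebalance function, the server names and the "move one chunk from the fullest to the emptiest machine" rule are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rebalance(servers: Dict[str, Set[int]]) -> List[Tuple[int, str, str]]:
    """Even out chunk counts across servers, modifying `servers` in place.

    `servers` maps a server name to the set of chunk handles it stores.
    Returns the list of moves made as (chunk handle, from server, to server).
    """
    moves = []
    while True:
        busiest = max(servers, key=lambda name: len(servers[name]))
        idlest = min(servers, key=lambda name: len(servers[name]))
        # Equal-sized chunks mean a count difference of at most one is balanced.
        if len(servers[busiest]) - len(servers[idlest]) <= 1:
            return moves
        handle = min(servers[busiest])  # deterministic choice for the example
        servers[busiest].remove(handle)
        servers[idlest].add(handle)
        moves.append((handle, busiest, idlest))


cluster = {"a": {1, 2, 3, 4, 5}, "b": {6}, "c": set()}
moves = rebalance(cluster)
print(len(moves))                                         # 3
print(sorted(len(chunks) for chunks in cluster.values()))  # [2, 2, 2]
```

With variable-sized pieces the same loop would need to track bytes per machine and decide which piece to move; uniform chunks reduce the problem to counting.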

What's the actual design for the GFS? Keep reading to find out.