Apart from the basic services the GFS provides, there are a few special functions that help keep the system running smoothly. While designing the system, the GFS developers knew that certain issues were bound to pop up based on the system's architecture. They chose to use cheap hardware, which made building a large system cost-effective, but it also meant that the individual computers in the system wouldn't always be reliable. The cheap price tag went hand-in-hand with computers that had a tendency to fail.
The GFS developers built functions into the system to compensate for the inherent unreliability of individual components. Those functions include master and chunk replication, a streamlined recovery process, rebalancing, stale replica detection, garbage collection and checksumming.
While there's only one active master server per GFS cluster, copies of the master server exist on other machines. Some copies, called shadow masters, provide limited services even when the primary master server is active. Those services are limited to read requests, since those requests don't alter data in any way. The shadow master servers always lag a little behind the primary master server, but it's usually only a matter of fractions of a second. The master server replicas keep themselves current by reading a copy of the primary's operation log and polling chunkservers to keep track of data. If the primary master server fails and cannot restart, a secondary master server can take its place.
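This division of labor between the primary and its shadows can be sketched as a simple routing rule. The function name, the "log entries applied" staleness measure and the server names below are invented for illustration; they aren't part of GFS's actual interface:

```python
# Hypothetical sketch: route a client request to the primary master or a
# read-only shadow master. Shadows are modeled as a dict mapping each
# shadow's name to how many operation-log entries it has applied.

def pick_server(op, primary_alive, shadows):
    """Decide which master should handle an operation.

    op: "read" or "mutate".
    shadows: dict mapping shadow name -> operation-log entries applied.
    """
    if op == "mutate":
        # Only the primary may change metadata; shadows are read-only.
        return "primary" if primary_alive else None
    # Reads can go to the freshest shadow, which may lag the primary by
    # only a fraction of a second, spreading the read load.
    if shadows:
        return max(shadows, key=shadows.get)
    return "primary" if primary_alive else None
```

The payoff of this arrangement is visible in the failure case: if the primary goes down, reads keep flowing through the shadows while mutations wait for a replacement primary.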
The GFS replicates chunks to ensure that data is available even if hardware fails. It stores replicas on different machines across different racks. That way, if an entire rack were to fail, the data would still exist in an accessible format on another machine. Each replica is identified by its chunk's unique handle; if the master server discovers that a replica is missing or invalid, it creates a new replica and assigns it to a chunkserver.
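The rack-spreading idea can be illustrated with a short sketch. The server and rack names are made up, and real placement decisions also weigh factors like disk utilization; this only shows the "at most one replica per rack when possible" rule:

```python
import random
from collections import defaultdict

# Illustrative sketch of rack-aware replica placement. Server and rack
# names are hypothetical; the goal is that a single rack failure can
# never take out every copy of a chunk.

def place_replicas(chunkservers, num_replicas=3):
    """Pick chunkservers for a chunk's replicas, spread across racks."""
    by_rack = defaultdict(list)
    for server, rack in chunkservers.items():
        by_rack[rack].append(server)
    racks = list(by_rack)
    random.shuffle(racks)
    chosen = []
    # First pass: at most one replica per rack.
    for rack in racks:
        if len(chosen) == num_replicas:
            break
        chosen.append(random.choice(by_rack[rack]))
    # Second pass: if there are fewer racks than replicas, reuse racks.
    while len(chosen) < num_replicas:
        candidates = [s for s in chunkservers if s not in chosen]
        if not candidates:
            break
        chosen.append(random.choice(candidates))
    return chosen

servers = {"cs1": "rackA", "cs2": "rackA", "cs3": "rackB", "cs4": "rackC"}
replicas = place_replicas(servers)  # three servers, one per rack
```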
The master server also monitors the cluster as a whole and periodically rebalances the workload by shifting chunks from one chunkserver to another. Rebalancing keeps disk use roughly even across the cluster and prevents any chunkserver from filling up completely. The master server also verifies that each replica is current. Every chunk carries a version number; if a replica's version number falls behind the chunk's current version, the master server designates it as a stale replica. The stale replica becomes garbage. After three days, the master server can delete a garbage chunk. This delay is a safety measure -- a user can check on a garbage chunk before it is deleted permanently and undo an unwanted deletion.
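Stale-replica detection and the three-day grace period can be sketched together. This assumes each chunk carries a version number that the master compares against each replica's; the class and method names are invented for this example:

```python
import time

# Hypothetical sketch of stale-replica detection and deferred garbage
# collection on the master. The three-day grace period follows the
# description above; everything else is an illustrative assumption.

GRACE_PERIOD = 3 * 24 * 3600  # three days, in seconds

class MasterState:
    def __init__(self):
        self.current_version = {}  # chunk handle -> latest version number
        self.garbage = {}          # (handle, server) -> time marked stale

    def report_replica(self, handle, server, version, now=None):
        """A chunkserver reports a replica; mark it stale if outdated."""
        now = time.time() if now is None else now
        if version < self.current_version.get(handle, 0):
            self.garbage[(handle, server)] = now  # schedule for deletion

    def collect_garbage(self, now=None):
        """Return replicas whose grace period has expired; keep the rest."""
        now = time.time() if now is None else now
        expired = [k for k, t in self.garbage.items()
                   if now - t >= GRACE_PERIOD]
        for k in expired:
            del self.garbage[k]
        return expired
```

A replica reported with an old version number lands in the garbage map, but `collect_garbage` refuses to hand it back for deletion until the full grace period has elapsed, which is what leaves room to undo a mistake.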
To prevent data corruption, the GFS uses a system called checksumming. The system breaks each 64 MB chunk into blocks of 64 kilobytes (KB). Each block within a chunk has its own 32-bit checksum, which is sort of like a fingerprint. Each chunkserver keeps the checksums for its own blocks and compares the stored checksum against the data whenever that data is read. If a block no longer matches its checksum, the chunkserver reports the corruption to the master server, which creates a new replica from a valid copy and then has the corrupted replica deleted.
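Here is a minimal sketch of that block-level checksumming, assuming CRC-32 as the 32-bit checksum (the actual checksum function GFS uses isn't specified here):

```python
import zlib

# Illustrative block-level checksumming: one 32-bit checksum (CRC-32,
# an assumption for this sketch) per 64 KB block of a chunk.

BLOCK_SIZE = 64 * 1024  # 64 KB blocks within each 64 MB chunk

def compute_checksums(chunk_data):
    """Return one 32-bit checksum per 64 KB block of the chunk."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify(chunk_data, stored_checksums):
    """Return the indices of blocks whose data no longer matches."""
    return [i for i, c in enumerate(compute_checksums(chunk_data))
            if c != stored_checksums[i]]

data = bytearray(b"x" * (3 * BLOCK_SIZE))
sums = compute_checksums(bytes(data))
data[BLOCK_SIZE] ^= 0xFF           # corrupt one byte in the second block
bad = verify(bytes(data), sums)    # -> [1]
```

Because each block is fingerprinted separately, a single flipped byte pins the corruption down to one 64 KB block rather than condemning the whole 64 MB chunk.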
What kind of hardware does Google use in its GFS? Find out in the next section.