
How Google Books Works


Google Book Scanning and Strategy
Google has plans to scan and index entire libraries, such as the one at Stanford University.
Justin Sullivan/Getty Images News

It goes without saying that scanning millions of books is a gargantuan undertaking. The technical challenges alone are significant. Traditional scanning equipment uses a glass plate that completely flattens each page, which helps OCR (optical character recognition) software identify the letters and numbers printed on the pages being digitized. Once the software has recognized those characters, the text can be edited and searched with a computer.
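Making recognized text searchable usually means building an index. The sketch below shows a generic inverted index, a standard search technique rather than anything Google has described for Google Books: it maps each word to the pages it appears on, so a query only has to intersect a few small sets instead of rereading every page. The sample pages and the function names `build_index` and `search` are illustrative assumptions.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")  # simple tokenizer: runs of letters, digits, apostrophes


def build_index(pages: list[str]) -> dict[str, set[int]]:
    """Map each lowercase word to the set of 1-based page numbers it appears on."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in WORD.findall(text.lower()):
            index[word].add(page_no)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return the sorted page numbers that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    # Intersect the posting sets; an unknown word contributes an empty set.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical OCR output for a three-page book.
    pages = ["The quick brown fox", "A brown dog", "Quick thinking"]
    index = build_index(pages)
    print(search(index, "brown"))        # [1, 2]
    print(search(index, "quick brown"))  # [1]
```

Real search engines add ranking, phrase matching and OCR-error tolerance on top of this, but the word-to-page mapping is the core idea.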

To eliminate the need for glass plates and reduce the risk of damaging the books it wants to preserve, Google patented a new book-scanning process. Workers simply place each book on an open scanner that has no glass plate or any other equipment that would flatten it. Google's software then accounts for the curvature of the pages so that the curve doesn't degrade character recognition. The scanners work at a rate of about 1,000 pages per hour.
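The article doesn't say how the curvature correction works. One common idea in page "dewarping" can be sketched as follows, assuming (hypothetically) that the scanner can measure the height of the page surface at points across a line of text: treat that slice of the page as a curve and use the cumulative arc length along it as the flattened horizontal position. This is an illustration of the general technique, not Google's patented method, and `flatten_positions` and `camera_x_for` are hypothetical names.

```python
import bisect
import math


def flatten_positions(xs: list[float], zs: list[float]) -> list[float]:
    """Flattened coordinate of each sample = cumulative arc length along the curve.

    xs: horizontal positions as seen by the camera (strictly increasing)
    zs: measured page height at each of those positions (assumed sensor data)
    """
    if len(xs) != len(zs) or not xs:
        raise ValueError("xs and zs must be non-empty and the same length")
    flat = [0.0]
    for i in range(1, len(xs)):
        # Length of the straight segment between neighbouring samples.
        flat.append(flat[-1] + math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1]))
    return flat


def camera_x_for(u: float, xs: list[float], flat: list[float]) -> float:
    """Inverse map: which camera x to sample for position u on the flattened page."""
    if not flat[0] <= u <= flat[-1]:
        raise ValueError("u lies outside the page")
    j = bisect.bisect_left(flat, u)
    if j == 0:
        return xs[0]
    # Linear interpolation within the segment that contains u.
    t = (u - flat[j - 1]) / (flat[j] - flat[j - 1])
    return xs[j - 1] + t * (xs[j] - xs[j - 1])


if __name__ == "__main__":
    # A slice that looks 6 units wide to the camera but bulges 4 units upward.
    xs = [0.0, 3.0, 6.0]
    zs = [0.0, 4.0, 0.0]
    flat = flatten_positions(xs, zs)
    print(flat)  # [0.0, 5.0, 10.0]: the paper is really 10 units long
    # Camera columns to sample for 5 evenly spaced columns of the flat page.
    print([camera_x_for(flat[-1] * k / 4, xs, flat) for k in range(5)])
```

Resampling the image at those camera positions stretches the compressed regions near the fold back to their true width, which is what keeps letters undistorted for OCR.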

Google struck agreements with major libraries to launch the project. The New York Public Library and the university libraries at Harvard, Michigan and Stanford all agreed to let Google scan their collections. With the help of these institutions, Google has already scanned around 12 million books [source: von Lohmann].

Because the project is so expansive, its greatest promise is giving people access to books they would otherwise never see. A student in Florida can browse a special Native American collection on the other side of the country. People who can't afford to travel to see ancient texts in France might browse those tomes from their living rooms. And thanks to additional accessibility work by Google, a visually impaired person can view books on enlarged displays, use Braille equipment, or listen to documents through read-aloud technology.

Initially, Google Books planned to digitize only works in the public domain, which made up about 20 percent of all books [source: Toobin]. In the United States, copyright on most works created since 1978 lasts until 70 years after the author's death; after that, the books enter the public domain and are no longer protected by copyright.
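The life-plus-70 rule reduces to simple arithmetic once you know that U.S. copyright terms run to the end of the calendar year in which they would otherwise expire. The sketch below assumes only that simplified rule; it deliberately ignores works published before 1978, works made for hire, anonymous works and other cases with different terms, and the function names are hypothetical.

```python
def public_domain_year(author_death_year: int, term_years: int = 70) -> int:
    """First calendar year in which a life-plus-70 work is in the public domain.

    Protection lasts through December 31 of the year that falls term_years
    after the author's death, so the work is free on January 1 of the next year.
    """
    return author_death_year + term_years + 1


def is_public_domain(author_death_year: int, current_year: int, term_years: int = 70) -> bool:
    """Simplified check that applies only the life-plus-70 rule."""
    return current_year >= public_domain_year(author_death_year, term_years)


if __name__ == "__main__":
    # An author who died in 1950: protected through 2020, public domain in 2021.
    print(public_domain_year(1950))      # 2021
    print(is_public_domain(1950, 2020))  # False
```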

As the project progressed, however, Google began digitizing copyrighted books as well. It didn't put these books online in their entirety; instead, it limited what users could see online to about 20 percent of each book. Google argued that this was a fair use of copyrighted material.

Others strongly disagreed. The Authors Guild filed a class-action lawsuit against Google, and a group of major publishers backed by the Association of American Publishers filed a separate suit, fueling controversy about Google Books in the United States and around the world.