How Google Books Works


If Google has its way, you'll eventually be able to use a keyword search to find text from almost every book ever written.
Screenshot by HowStuffWorks.com

When Sergey Brin and Larry Page launched Google, they created an Internet juggernaut that made information easier to find. But they realized that without the information contained in humankind's decidedly analog books, there would always be a gaping hole in online information.

To bridge this gap, Google Print (now called Google Books) was born, driven by a goal of digitizing entire libraries of books. With these books online, anyone with an Internet connection could use keyword searches to locate information spanning the entire history of publishing. The implications of this project are profound in myriad ways.

For example, scholars could use the service to access a rare manuscript in Cairo, Egypt. Medical researchers might scroll through studies from all over the world in weeks instead of years, drastically reducing research times. Scientific studies of every kind could be completed on expedited timelines, too. And of course, high school and college students could tear through research papers at warp speed, with better citations and higher-quality information.

Google Books proponents also argue that the world's treasure trove of books will be safer once it has been digitized. Natural disasters such as fires and earthquakes, which have destroyed swaths of written history before, wouldn't ruin a database with redundant copies of files stored in multiple locations. An online repository would also be better suited to resist war and political upheaval. And then there's the simple fact that as paper ages, it becomes brittle. Librarians already have to take special care of some works to keep them from falling apart.

In short, Google Books could mean better access to more information for more people than ever before. It could revolutionize the Internet in ways that we can't yet imagine.

But as with all revolutions, the Google Books project is not without controversy. Citizens, politicians and companies from around the world have justifiable concerns about privacy, copyright law and antitrust issues related to Google Books. Keep reading to see how Google quickly scans millions upon millions of pages of books, and how some people are doing everything they can to handicap this daring project.

Google Book Scanning and Strategy

Google has plans to scan and index entire libraries, such as the one at Stanford University.
Justin Sullivan/Getty Images News

It goes without saying that scanning millions of books is a gargantuan undertaking. The technical challenges alone are significant. Traditional scanning equipment uses a glass plate that completely flattens each page, ensuring that OCR (optical character recognition) software is able to identify the letters and numbers printed on the pages being digitized. Once scanned, those characters can be edited and searched with a computer.
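
To make the idea of searchable OCR text concrete, here is a minimal sketch of a keyword index built over recognized page text. It is not Google's actual system; the sample pages, the `build_index` and `search` function names, and the simple word-splitting rule are all assumptions made for illustration.

```python
import re
from collections import defaultdict

WORD_PATTERN = r"[a-z0-9']+"  # assumed tokenizer: lowercase letters, digits, apostrophes

def build_index(pages):
    """Map each lowercase word to the set of (book, page) locations containing it."""
    index = defaultdict(set)
    for (book, page_no), text in pages.items():
        for word in re.findall(WORD_PATTERN, text.lower()):
            index[word].add((book, page_no))
    return index

def search(index, query):
    """Return the sorted locations that contain every word in the query."""
    words = re.findall(WORD_PATTERN, query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical OCR output: (book title, page number) -> recognized text
ocr_pages = {
    ("Moby-Dick", 1): "Call me Ishmael. Some years ago...",
    ("Moby-Dick", 2): "the watery part of the world",
    ("Walden", 1): "I went to the woods because I wished to live deliberately",
}

index = build_index(ocr_pages)
print(search(index, "the world"))
```

Looking up each word in a prebuilt index, rather than rereading every page for every query, is what keeps a search fast as the collection grows.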

To eliminate the need for glass plates and reduce the possibility of damage to the books it wants to preserve, Google patented a new book scanning process. Workers simply place the book on an open book scanner that has neither a glass plate nor any other equipment that would flatten a book. Google's advanced software scans the book and accounts for curvature of the pages, meaning there's no degradation of character recognition. The scanners work at a rate of about 1,000 pages per hour.

Google developed agreements with major libraries to start the project. The New York Public Library, as well as university libraries at Harvard, Michigan and Stanford, all agreed to let Google scan their volumes. With the help of these institutions, Google has already scanned around 12 million books [source: von Lohmann].
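
For a rough sense of scale, the two figures above (about 1,000 pages per hour and around 12 million books) can be combined in a back-of-the-envelope estimate. The average of 300 pages per book is an assumption, not a number from the sources, so treat the result as an order-of-magnitude guess only.

```python
PAGES_PER_HOUR = 1_000        # scanner rate quoted in the article
BOOKS_SCANNED = 12_000_000    # book count quoted in the article
AVG_PAGES_PER_BOOK = 300      # ASSUMPTION: illustrative average, not from the article

def scanner_hours(books, pages_per_book, pages_per_hour):
    """Total hours of scanning needed at a fixed per-scanner rate."""
    return books * pages_per_book / pages_per_hour

hours = scanner_hours(BOOKS_SCANNED, AVG_PAGES_PER_BOOK, PAGES_PER_HOUR)
scanner_years = hours / (24 * 365)  # one scanner running around the clock

print(f"{hours:,.0f} scanner-hours, about {scanner_years:,.0f} scanner-years")
# prints: 3,600,000 scanner-hours, about 411 scanner-years
```

Under that assumption, a single scanner running nonstop would need roughly four centuries, which is why a project of this size depends on many scanners working in parallel.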

The project's expansiveness means that its greatest promise is granting access to books that people would otherwise never see. A student in Florida can access a special Native American collection on the other side of the country. People who can't afford to travel to see ancient texts in France might browse those tomes from their living rooms. And thanks to Google's extra efforts, a visually impaired person can view books on enlarged displays, use Braille equipment, or listen to documents through read-aloud technology.

Initially, Google Books planned to digitize only works in the public domain, which made up about 20 percent of all books [source: Toobin]. In the United States, a book generally enters the public domain 70 years after the author's death; once in the public domain, it is no longer protected by copyright.
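
The life-plus-70 rule described above can be written as a simple check. This sketch covers only that simplified rule; actual U.S. copyright terms also depend on factors such as publication date and whether a work was made for hire, none of which are modeled here. The function name and sample years are illustrative.

```python
COPYRIGHT_TERM_YEARS = 70  # life-plus-70 rule as stated in the article

def is_public_domain(author_death_year, current_year):
    """True if the current year falls after the 70th calendar year
    following the author's death (simplified rule only)."""
    return current_year - author_death_year > COPYRIGHT_TERM_YEARS

print(is_public_domain(1930, 2009))  # author died 79 years earlier
print(is_public_domain(1950, 2009))  # author died 59 years earlier
```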

However, as Google scanned, it began digitizing copyrighted texts as well. The company didn't put copyrighted materials online in their entirety, instead limiting what it displayed online to about 20 percent of each book. Google claimed this was a fair use of copyrighted materials.
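
As a small worked example, the 20 percent figure translates into a page cap like the one below. How Google actually measured or enforced the limit isn't described in the sources, so the page-based rounding rule and the function name here are assumptions.

```python
PREVIEW_PERCENT = 20  # "about 20 percent", per the article

def max_preview_pages(total_pages, percent=PREVIEW_PERCENT):
    """Largest whole number of pages that stays within the preview limit."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_pages * percent // 100

print(max_preview_pages(300))  # 60
print(max_preview_pages(412))  # 82
```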

Others strongly disagreed. The Authors Guild and the Association of American Publishers filed a class-action lawsuit, fueling controversy about Google Books in the United States and around the world.

Google Books Controversy and Proposed Settlements

Copyright, access and profit issues are at the center of the Google Books debate. Rights holders want more control over distribution of their work, and they also want part of the profits that Google generates from its digital archive. Google, on the other hand, wants more control over the information it is digitizing -- with better control, Google Books would not only become the world's biggest library, it could be the world's biggest bookstore, too.

In an initial settlement with the Authors Guild and the Association of American Publishers, Google agreed to pay $125 million to the plaintiffs and to make some changes in the way it uses its Google Books database. Google also agreed to create a Book Rights Registry, where authors and publishers can settle copyright claims [source: Metz].

Using the registry, rights holders can opt out of the Google Books project by refusing to let Google display their work. Of course, if you're an author or publisher from another country and you don't understand the registry, it would be easy to miss the opt-out deadline, meaning Google Books would automatically begin including your work in its search results.

In addition to the registry, the first settlement would've given Google an exclusive license to scan and post pages of orphan works. These are books that are still under copyright but whose rights owners cannot be tracked down. Google could also sell digital downloads of the books and set its own prices, using the registry as a guide.

Concerned parties questioned the fairness of the settlement. They argued that Google's blatant copyright infringement had sparked a lawsuit that subsequently granted the offending company even more power over the materials it copied. The U.S. Department of Justice weighed in as well, encouraging the parties to replace the settlement with a fairer version.

In a revised version of the settlement, Google Books agreed to remove all books published outside the United States, United Kingdom, Canada and Australia. The revision also creates a trustee to manage royalties earned from access to orphan texts. So, instead of lining Google's coffers, this revenue may wind up in the hands of copyright holders who are eventually found -- if not, the proceeds could fund charities promoting literacy [source: Samuelson].

An additional change addresses issues with Google's exclusive license to use orphan works for profit. The newer settlement should, in theory, give other companies a better shot at competing with Google Books.

Why All the Controversy over Google Books?

A lot of people object to Google taking pictures of streets and homes all over the world. How would you feel about Google tracking your reading habits?
Harold Cunningham/Getty Images News

Google Books definitely treads on dangerous ground when it comes to copyright issues. Here's just one question that any settlement is unlikely to decide: What gives a U.S. court the right to speak for millions of rights holders who don't know or care about Google Books? But for many opponents, copyright infringement is just one troubling aspect of the project.

Other opponents are more worried about privacy issues. For example, in spite of Google Books' privacy policy, it's possible that Google could track what you read, right down to specific pages, with dates and times included.

Because Google is a for-profit company, it makes sense that it would generate revenue from its ever-growing index of books and the tracking data gathered from users. When Google displays snippets from public domain and copyrighted books, it also shows adjacent advertisements related to the book and its subject, offering to sell you products with related content. This kind of targeted marketing is a reliable revenue generator. And if Google can exploit that kind of detailed data for commercial gain, it could use it for more nefarious purposes, too.

Profit issues are at stake, too. Authors and publishers witnessed Google displaying their work and profiting from their texts, so they fought back with a lawsuit. They claimed Google was clearly committing copyright infringement on an incredible scale, and in the process, profiting from its actions. And although Google didn't show the entire contents of copyrighted books, what would stop the company from doing so at a later date?

On a technical and philosophical level, who would stop Google from censoring parts of books, or from eliminating whole texts? And because the legal settlement lets authors and publishers opt out of the Book Rights Registry database, there's a potential for a form of self-censorship on the part of rights holders, too.

And what if a growing dependence on Google Books' authority actually caused an information gap? Once people began assuming that Google had scanned every book, it seems logical that they'd also assume that if the information wasn't on Google Books, it simply didn't exist.

What's more, what if Google Books constitutes a monopoly? With Google as the digital hub of the world's books, the company would control access to knowledge. Google could then charge immense fees to organizations that want to tap into the Google Books database.

Google Books Under Fire

Google continues to scan books, rapidly building its database and leveraging the contents for its own purposes. In the meantime, competitors, privacy advocates and federal authorities are closely monitoring the project.

It remains to be seen whether Google Books will stand the test of time. Will Google's enterprising project increase knowledge and understanding for everyone with computer access? Or will the company consolidate knowledge as power, build a massive monopoly and then charge a premium for access to its holdings?

Will Google Books take great care in protecting the privacy of its users? Or will it sell detailed tracking information to a corporation that's only too willing to exploit private information for every possible financial gain?

Will scientists harness the power of Google Books to solve some of humankind's most pressing problems? With more knowledge at their fingertips, perhaps they'll collaborate to end world hunger, cure awful diseases and advance technology to fantastic heights, all in a matter of a few years. Or will they be stymied by a database that's so large and unwieldy that it hampers the people it's supposed to help?

In short, when it comes to Google Books and its potential impact on humanity, there are more questions than answers. The scale of the project is so immense and the possible outcomes are so far-reaching that no one really knows where this path will lead.

Many pundits agree that no matter what happens with upcoming rounds of legal action, the fight over Google Books is just beginning, with battlefields developing both in the United States and abroad. A French judge recently sided with publishers who sued Google, and the company had to remove all copyrighted French materials from its database and pay damages for infringement.

Although it's a confusing and complicated issue riddled with esoteric legal and economic jargon, the Google Books fight is one worth watching. You may be a witness to the birth of one of the most powerful knowledge-sharing networks ever created.

Sources

  • Bartz, Diane. "Google Wants To Sell Books To Kindle Users, Too." Reuters. Dec. 11, 2009. (Dec. 19, 2009) http://www.reuters.com/article/idUSTRE5BB0DH20091212
  • Boulton, Clint. "Google Bows To FTC, Creates Privacy For Google Books." Eweek. Sept. 4, 2009. (Dec. 19, 2009) http://www.eweek.com/c/a/Search-Engines/Google-Bows-to-FTC-Creates-Privacy-Policy-For-Google-Books-763554/
  • Boulton, Clint. "Maybe Google Should Give Up The Google Search Book Ghost." Eweek. Nov. 20, 2009. (Dec. 19, 2009) http://googlewatch.eweek.com/content/google_book_search/maybe_google_should_give_up_the_google_book_search_ghost.html
  • Clements, Maureen. "The Secrets Of Google's Book Scanning Machine Revealed." National Public Radio. April 30, 2009. (Dec. 19, 2009) http://www.npr.org/blogs/library/2009/04/the_granting_of_patent_7508978.html
  • Crumley, Bruce. "Europe vs. Google: The Next Chapter." Time. Dec. 11, 2009. (Dec. 19, 2009) http://www.time.com/time/world/article/0,8599,1946920,00.html
  • Deahl, Rachael. "U Mich Pres to AAP: Google Is Good." Publishers Weekly Daily. Feb. 8, 2006. (Dec. 19, 2009) http://www.publishersweekly.com/article/CA6305725.html
  • Eckersley, Peter. "Google Book Search Settlement: Foster Competition, Escrow The Scans." Electronic Frontier Foundation blog. June 11, 2009. (Dec. 19, 2009) http://www.eff.org/deeplinks/2009/06/should-google-have-s
  • Faure, Gaelle. "French Court Shuts Down Google Books Project." Los Angeles Times. Dec. 19, 2009. (Dec. 19, 2009) http://www.latimes.com/news/nation-and-world/la-fg-france-google19-2009dec19,0,548537.story
  • Fister, Barbara. "Unsettled: Questions About The Google Book Search Settlement." Library Journal. Dec. 10, 2009. (Dec. 19, 2009) http://www.libraryjournal.com/article/CA6711187.html
  • Frommer, Dan. "How Google Scans Books." Silicon Alley Insider. May 3, 2009. (Dec. 19, 2009) http://www.businessinsider.com/how-google-scans-books-2009-5
  • Google. "Google Book Settlement." (Dec. 19, 2009) http://books.google.com/booksrightsholders/
  • Kahle, Brewster. "A Book Grab By Google." The Washington Post. May 19, 2009. (Dec. 19, 2009) http://www.washingtonpost.com/wp-dyn/content/article/2009/05/18/AR2009051802637.html
  • Kang, Cecilia. "Post Tech Explains Google's Revised Book Settlement: Video." The Washington Post. Nov. 24, 2009. (Dec. 19, 2009) http://voices.washingtonpost.com/posttech/2009/11/post_tech_explains_googles_rev.html
  • Toobin, Jeffrey. "Google's Moon Shot." The New Yorker. Feb. 5, 2007. (Dec. 19, 2009) http://www.newyorker.com/reporting/2007/02/05/070205fa_fact_toobin
  • MacMillan, Douglas. "Google Books: Scan First, Ask Questions Later." BusinessWeek. Nov. 14, 2009. (Dec. 19, 2009) http://www.businessweek.com/the_thread/techbeat/archives/2009/11/google_books_sc.html
  • Metz, Cade. "Google Books: Is It The Last Library?" The Register. Aug. 29, 2009. (Dec. 19, 2009) http://www.theregister.co.uk/2009/08/29/google_books/
  • Metz, Cade. "Google Turns Up Nose At Ebook Monopoly Claims." The Register. Aug. 5, 2009. (Dec. 19, 2009) http://www.theregister.co.uk/2009/08/05/google_book_defense/
  • Metz, Cade. "Google Settles Book Search Suit For $125M." The Register. Oct. 28, 2008. (Dec. 19, 2009) http://www.theregister.co.uk/2008/10/28/google_settles_book_suit/
  • Oder, Norman. "Samuelson Says She Has Same Pricing, Privacy Concerns About Google Settlement." Library Journal. Nov. 18, 2009. (Dec. 19, 2009) http://www.libraryjournal.com/article/CA6707799.html
  • Samuelson, Pamela. "Legally Speaking: The Dead Souls of the Google Booksearch Settlement." O'Reilly Radar. April 17, 2009. (Dec. 19, 2009) http://radar.oreilly.com/2009/04/legally-speaking-the-dead-soul.html
  • Schonfeld, Erick. "Scan Your Books And Search Them On Google." TechCrunch. June 7, 2009. (Dec. 19, 2009) http://www.techcrunch.com/2009/06/07/scan-your-books-and-search-them-on-google/
  • Singel, Ryan. "The Fight Over The Google Of All Libraries: A Wired.com FAQ." Wired. April 30, 2009. (Dec. 19, 2009) http://www.wired.com/epicenter/2009/04/the-fight-over-the-worlds-greatest-library-the-wiredcom-faq/
  • Von Lohmann, Fred. "Google Books Settlement 2.0: Evaluating Access." Electronic Frontier Foundation blog. Nov. 17, 2009. (Dec. 19, 2009) http://www.eff.org/deeplinks/2009/08/google-book-search-settlement-access
  • Vyas, Ravi. "The World Within One's Grasp." The Telegraph. Dec. 11, 2009. (Dec. 19, 2009) http://www.telegraphindia.com/1091211/jsp/opinion/story_11843045.jsp