What is ‘big data’?

Every time you make a purchase online, you're adding to the data stream.
©Monkey Business/Thinkstock

In a way, big data is exactly what it sounds like -- a lot of data. Since the advent of the Internet, we've been producing data in staggering amounts. It's been estimated that in all the time leading up to the year 2003, only 5 exabytes of data were generated -- that's equal to 5 billion gigabytes. But from 2003 to 2012, the amount reached around 2.7 zettabytes (or 2,700 exabytes, or 2.7 trillion gigabytes) [sources: Intel, Lund]. According to Berkeley researchers, we are now producing roughly 5 quintillion bytes of data every two days -- that's 5 exabytes in the decimal units used above, or around 4.3 exbibytes in binary units [source: Romanov].
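The unit conversions behind these figures can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and it assumes the decimal (SI) definitions of gigabyte, exabyte and zettabyte, plus the binary exbibyte (2^60 bytes), which is where a figure of about 4.3 for 5 quintillion bytes comes from.

```python
# Decimal (SI) byte units, as used for the gigabyte/exabyte/zettabyte figures.
GB = 10**9    # gigabyte
EB = 10**18   # exabyte
ZB = 10**21   # zettabyte

# Binary (IEC) unit, assumed here to explain the "4.3" figure.
EIB = 2**60   # exbibyte

# Hypothetical names for the three estimates quoted in the text, in bytes.
pre_2003_bytes = 5 * EB          # everything up to 2003
by_2012_bytes = 27 * 10**20      # 2.7 zettabytes, kept as an exact integer
two_day_bytes = 5 * 10**18       # 5 quintillion bytes

print(pre_2003_bytes // GB)            # gigabytes generated before 2003
print(by_2012_bytes // EB)             # exabytes reached by 2012
print(by_2012_bytes // GB)             # gigabytes reached by 2012
print(two_day_bytes // EB)             # exabytes (decimal) every two days
print(round(two_day_bytes / EIB, 1))   # exbibytes (binary) every two days
```

Run as written, the first three lines reproduce the 5 billion gigabytes, 2,700 exabytes and 2.7 trillion gigabytes quoted above, and the last two show that 5 quintillion bytes is 5 decimal exabytes but only about 4.3 when measured in binary units.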

The term 'big data' is usually used to refer to massive, rapidly expanding, varied and often unstructured sets of digitized data that are difficult to maintain using traditional databases. It can include all the digital information floating around out there in the ether of the Internet, the proprietary information of companies with whom we've done business and official government records, among a great many other things. There's also the implication that the data is being analyzed for some purpose.

We've generated lots of it ourselves by making online purchases and participating in social media, but that is just the tip of the iceberg. Big data can include digitized documents, photographs, videos, audio files, tweets and other social networking posts, e-mails, text messages, phone records, search engine queries, RFID tag and barcode scans and financial transaction records, though those aren't the only sources. You're producing data every time you do anything online, leaving a digital trail that others can come along and mine for useful information.

The numbers and types of devices that produce data have been proliferating as well. Besides home computers and retailers' point-of-sale systems, we have Internet-connected smartphones, WiFi-enabled scales that tweet our weight, fitness sensors that track and sometimes share health-related data, cameras that can automatically post photos and videos online and Global Positioning System (GPS) devices that can pinpoint our location on the globe, to name a few. Don't forget weather and traffic sensors, surveillance cameras, sensors in cars and airplanes and other things not connected with individuals that are constantly collecting data. The large numbers of electronic devices that generate and upload data have given rise to the term "the Internet of things."

You'll find multiple definitions of big data out there, and not everyone agrees on exactly what is included, but it can be any information someone might want to know that can be subjected to computer analysis. And these large, unwieldy sets of data require new methods to collect, store, process and analyze them.