How Data Integration Works

Data Integration Basics

This simple table shows customer purchases.

Data integration focuses mainly on databases. A database is an organized collection of data. It's similar to a file system, which is an organizational structure for files so they're easy to find, access and manipulate.

There are different ways to categorize databases. Some people prefer to classify them according to the kind of data they store. For example, you might classify a database as a media database if all the information stored there is contained in video or sound files.

Another classification method looks at how the databases organize data. A database's organizational arrangement is called a schema. A common organizational technique is to use tables to show the relationship between different data points. Tables are like spreadsheets. Columns define categories of data, while rows are records. A database using this approach is a relational database.
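To make the table idea concrete, here is a minimal sketch of a relational table using Python's built-in `sqlite3` module. The table name `purchases`, its three columns and the sample rows are all hypothetical, invented for illustration; the point is only that the schema fixes the columns (categories of data) and each inserted row is one record.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema: each column is one category of data (hypothetical names).
# amount is the purchase total in dollars.
conn.execute(
    """
    CREATE TABLE purchases (
        customer TEXT NOT NULL,
        product  TEXT NOT NULL,
        amount   REAL NOT NULL
    )
    """
)

# Each row is one record. These values are made up for illustration.
rows = [
    ("Alice", "Headphones", 120.00),
    ("Bob", "Notebook", 15.50),
    ("Carmen", "Monitor", 210.00),
]
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)

# After a SELECT, the cursor's description lists the column names.
cursor = conn.execute("SELECT * FROM purchases")
columns = [col[0] for col in cursor.description]
records = cursor.fetchall()

print(columns)  # ['customer', 'product', 'amount']
for record in records:
    print(record)
```

SQLite is used here only because it ships with Python; any relational database would express the same schema with a similar `CREATE TABLE` statement.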

Object-oriented databases, which are modeled on object-oriented programming (OOP), take a different approach to organizing data. OOP is a departure from traditional procedural programming, which follows the pattern of feeding data into a set of instructions and then producing output. OOP focuses instead on defining data as objects and then determining how different objects relate and interact with one another.

To create an OOP database, you'd first define all the objects you plan to store in the database. Then you'd define how each object relates to the other objects in the database. After you identify an object, you put it into a class, or set of similar objects. To define a class, you determine what data each object in that class must have and which procedures, called methods, can act on those objects. Objects communicate with you and with other objects by sending messages, which ask the receiving object to run one of its methods.

It's easier to understand with an example. Let's say you're building a database containing information about American sports. You decide to start by defining baseball teams. Once you've defined what a baseball team is, that definition becomes a class within the database. The Atlanta Braves would be a specific instance of that class, also known as an object. The baseball team class is in turn a subclass of a broader superclass, American sports teams, which would also include other classes such as football and soccer teams.
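The sports example maps directly onto ordinary object-oriented code. The sketch below uses plain Python classes, not an actual object database (a real one would also store these objects persistently); the class names, attributes and methods are hypothetical choices made for illustration.

```python
class SportsTeam:
    """Superclass: data and methods shared by every American sports team."""

    def __init__(self, city: str, name: str) -> None:
        # Data that each object in the class must have.
        self.city = city
        self.name = name

    def full_name(self) -> str:
        # A method: logic that acts on the object's own data.
        return f"{self.city} {self.name}"

    def sport(self) -> str:
        return "unspecified"


class BaseballTeam(SportsTeam):
    """Subclass: inherits city, name and full_name() from SportsTeam."""

    def sport(self) -> str:
        return "baseball"


class FootballTeam(SportsTeam):
    def sport(self) -> str:
        return "football"


class SoccerTeam(SportsTeam):
    def sport(self) -> str:
        return "soccer"


# An object: one specific instance of the BaseballTeam class.
braves = BaseballTeam("Atlanta", "Braves")

# Calling a method is Python's way of sending a message to an object.
print(braves.full_name())              # Atlanta Braves
print(braves.sport())                  # baseball
print(isinstance(braves, SportsTeam))  # True
```

Because `BaseballTeam` inherits from `SportsTeam`, the Braves object is also a sports team, which mirrors the class and superclass relationship described above.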

To access information within a database (no matter how it organizes data), you use a query. A query is simply a request for information. Both people and applications can submit queries to databases. A database responds to a query by sending back the data that meets the request's parameters. Queries are written in special computer languages such as Structured Query Language (SQL). If you've ever used an Internet search engine, you've submitted a query: your search terms.
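Here is what a simple SQL query looks like when an application submits it, sketched with Python's built-in `sqlite3` module. The `purchases` table and its rows are hypothetical. The `?` placeholder carries the request's parameter (here, a customer name), and the database returns only the rows that match it.

```python
import sqlite3

# Hypothetical table of purchases, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Alice", "Headphones", 120.00),
        ("Bob", "Notebook", 15.50),
        ("Alice", "Cable", 9.99),
    ],
)

# The query: a request for Alice's purchases. The parameter value is passed
# separately from the SQL text rather than pasted into the string.
query = "SELECT product, amount FROM purchases WHERE customer = ? ORDER BY product"
results = conn.execute(query, ("Alice",)).fetchall()

print(results)  # [('Cable', 9.99), ('Headphones', 120.0)]
```

Passing parameters through placeholders, rather than building the SQL string by hand, is the approach the `sqlite3` documentation recommends because it prevents SQL injection.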

The database responds to queries by creating a view of data. A view is a specific way of displaying data. In a data integration system, the returned view shows only the data directly related to the original query. In our table example, if you submitted a query asking for all the customers who bought more than $100 worth of products, you'd get this result:

This view shows only the data relevant to the query "customers who purchased more than $100 in products."

Notice that the view doesn't show what kind of products were purchased, nor does it include customers who spent $100 or less.
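In SQL, a view like this can be saved as a named, stored query. The sketch below assumes "purchased more than $100 in products" means a customer's total spending across all purchases (not any single purchase); the table, the view name `big_spenders` and the rows are hypothetical. The view keeps only the customer and the total, so the product column never appears in the result.

```python
import sqlite3

# Hypothetical purchases table, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Alice", "Headphones", 80.00),
        ("Alice", "Keyboard", 45.00),
        ("Bob", "Notebook", 15.50),
        ("Carmen", "Monitor", 210.00),
        ("Dev", "Mouse", 100.00),
    ],
)

# The view totals each customer's spending and keeps only totals over $100.
# Product names are deliberately left out of the view.
conn.execute(
    """
    CREATE VIEW big_spenders AS
    SELECT customer, SUM(amount) AS total
    FROM purchases
    GROUP BY customer
    HAVING SUM(amount) > 100
    """
)

results = conn.execute(
    "SELECT customer, total FROM big_spenders ORDER BY customer"
).fetchall()

print(results)  # [('Alice', 125.0), ('Carmen', 210.0)]
```

Dev, who spent exactly $100, and Bob are both left out, matching the "more than $100" condition. A view stores the query rather than a copy of the data, so new purchases show up the next time the view is read.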

What are the different approaches to data integration? Find out in the next section.