Meta? I hardly knew her!

Descriptions of data are called metadata. Metadata is useful for naming and defining data as well as describing the relationship of one set of data to other sets. Data integration systems use metadata to locate the information relevant to queries.

Data Warehouses

As we saw earlier, a data warehouse is a database that stores information from other databases using a common format. That's about as specific as you can get when describing data warehouses. There's no unified definition that dictates what data warehouses are or how designers should build them. As a result, there are several different ways to create data warehouses, and one data warehouse might look and behave very differently from another.

In general, queries to a data warehouse take very little time to resolve. That's because the data warehouse has already done the major work of extracting, converting and combining data. The user's side of a data warehouse is called the front end, so from a front-end standpoint, data warehousing is an efficient way to get integrated data.

From the back-end perspective, it's a different story. Database managers must put a lot of thought into a data warehouse system to make it effective and efficient. Converting the data gathered from different sources into a common format can be particularly difficult. The system requires a consistent approach to describing and encoding the data.

The warehouse must have a database large enough to store data gathered from multiple sources. Some data warehouses include an additional step called a data mart. The data warehouse takes over the duties of aggregating data, while the data mart responds to user queries by retrieving and combining the appropriate data from the warehouse.

One problem with data warehouses is that the information in them isn't always current. That's because of the way data warehouses work -- they pull information from other databases periodically. If the data in those databases changes between extractions, queries to the data warehouse won't result in the most current and accurate views. If the data in a system rarely changes, this isn't a big deal. For other applications, though, it's problematic.

Going back to our example from before with the traffic report and map, you can see how this would be a problem. While the town's map might not require frequent updates, traffic conditions can change dramatically in a relatively short amount of time. A data warehouse might not extract data very frequently, which means time-sensitive information may not be reliable. For those sort of applications, it's better to take a different data integration approach.

What's the alternative to data warehousing? Find out in the next section.