Are data mining and data warehousing related?

Both data mining and data warehousing are business intelligence tools that are used to turn information (or data) into actionable knowledge. The important distinctions between the two tools are the methods and processes each uses to achieve this goal.

Data mining is a process of statistical analysis. Analysts use technical tools to query and sort through terabytes of data looking for patterns. Usually, the analyst will develop a hypothesis, such as "customers who buy product X usually buy product Y within six months." Running a query on the relevant data to test this hypothesis is data mining. Businesses then use the results to make better decisions based on what they learn about their customers' and suppliers' behaviors.
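To make the hypothesis above concrete, here is a minimal sketch in Python of how such a check might look. The purchase records, the `follow_on_rate` function name, and the choice of 183 days as "six months" are all assumptions made for illustration; a real analysis would run against far larger data and would likely use a database or statistics package.

```python
from datetime import date, timedelta

# Hypothetical purchase records: (customer_id, product, purchase_date).
PURCHASES = [
    ("c1", "X", date(2023, 1, 10)),
    ("c1", "Y", date(2023, 3, 5)),
    ("c2", "X", date(2023, 2, 1)),
    ("c2", "Y", date(2023, 12, 20)),
    ("c3", "X", date(2023, 4, 15)),
    ("c4", "Y", date(2023, 5, 1)),
]

# Assumption: "within six months" is approximated as 183 days.
WINDOW = timedelta(days=183)


def follow_on_rate(purchases, first="X", second="Y", window=WINDOW):
    """Share of buyers of `first` who bought `second` within `window` afterwards."""
    first_dates = {}   # customer -> earliest purchase date of `first`
    second_dates = {}  # customer -> all purchase dates of `second`
    for customer, product, when in purchases:
        if product == first:
            if customer not in first_dates or when < first_dates[customer]:
                first_dates[customer] = when
        elif product == second:
            second_dates.setdefault(customer, []).append(when)

    if not first_dates:
        return 0.0

    # Count buyers of `first` with at least one later purchase of `second`
    # that falls inside the window.
    hits = sum(
        1
        for customer, start in first_dates.items()
        if any(start <= d <= start + window for d in second_dates.get(customer, []))
    )
    return hits / len(first_dates)


rate = follow_on_rate(PURCHASES)
print(f"Share of X buyers who bought Y within the window: {rate:.0%}")
```

In this toy data set only one of the three X buyers goes on to buy Y inside the window, so the hypothesis would not hold up. The point is the shape of the question, not the numbers.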


Data warehousing is the process of designing how data is stored so that reporting and analysis become easier. Data warehouse experts treat the various stores of data as connected and related to each other, conceptually as well as physically. A business's data is usually stored across a number of databases. To analyze the broadest range of data, however, these databases need to be connected in some way. This means that the data within them needs a way of being related to other relevant data, and that the physical databases themselves need a connection so their data can be examined together for reporting purposes.

So the crux of the relationship between data mining and data warehousing is that data, properly warehoused, is easier to mine. If a data mining query has to run through terabytes of data spread across multiple databases that sit on different physical networks, the query will not be efficient and getting results will take a long time. However, if the data warehouse expert designs a data storage system that closely connects relevant data in different databases, the data miner can run much more meaningful and efficient queries to improve the business.
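As a rough illustration of that point, the sketch below uses Python's built-in `sqlite3` module to stand in for a warehouse. The two source lists (`CRM_CUSTOMERS`, `SALES_ORDERS`), the table layout, and the function names are hypothetical; the idea is only that once data from separate systems shares a key (`customer_id`) and lives in one place, a single query can answer a question that would otherwise span several databases.

```python
import sqlite3

# Hypothetical extracts from two separate source systems.
CRM_CUSTOMERS = [(1, "North"), (2, "South"), (3, "North")]          # (customer_id, region)
SALES_ORDERS = [(101, 1, 250.0), (102, 2, 90.0),
                (103, 1, 40.0), (104, 3, 120.0)]                   # (order_id, customer_id, amount)


def build_warehouse():
    """Load both sources into one in-memory database with a shared key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(customer_id), "
        "amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", CRM_CUSTOMERS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", SALES_ORDERS)
    conn.commit()
    return conn


def revenue_by_region(conn):
    """One query relates orders to customers through the shared customer_id."""
    rows = conn.execute(
        "SELECT c.region, SUM(o.amount) "
        "FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id "
        "GROUP BY c.region ORDER BY c.region"
    ).fetchall()
    return dict(rows)


conn = build_warehouse()
totals = revenue_by_region(conn)
print(totals)
```

Without the shared key and the common store, the same question would mean pulling data from each system separately and matching it up by hand before any mining could begin.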