Data Mining

Data mining is the process of analyzing and evaluating sets of large data by looking for patterns and ways to summarize the data. By analyzing data, patterns will emerge that would not have been seen otherwise. This data in turn can be used to draw statistics and notice trends. A common example of data mining is that supermarkets notice that when men buy diapers, they often accompany this by purchasing beer as well. Most of the data is usually found in a data warehouse, a database that has a lot of information that is used for analysis. Data mining can be used in healthcare to notice patient trends, connections between prescriptions, pharmaceutical reps, and doctors, and many other things. It can also be used outside of healthcare; such as noticing credit card and shopping trends to suggest services and products based on a shopper's likes and dislikes. By utilizing the mass amounts of data that is available to us, we can then draw conclusions and gain more knowledge. Currently, we are "data rich" but "knowledge poor." By analyzing the data we can learn more about different trends that can be useful in the future. There are, however, some challenges faced with data mining. Some challenges include whether the data is accurate, such as patients lying on forms or not telling the truth, skewing the data. Also, data storage can be difficult because there is so much of it.


An important application for data mining that is highly topical in modern healthcare is the analysis of Big Data. Big Data in healthcare refers to the massive amount of individual datum points that medical institutions gather every day. These data come from every patient in every institution. Every bit of information related to someone's prescriptions, checkups, diagnoses, treatments, chronic conditions, family history, individual blood or body type, negative or positive reactions, and hundreds/thousands of others, fall under the umbrella of Big Data. The implications of harnessing these data effectively are staggering, and data mining is the key to harnessing them. When we have collected historical data on individual patients, groups of related patients, or even all patients as a whole, we can analyze them to find trends and patterns regarding treatments, drug doses, blood sugar spikes, cardiovascular irregularities, etc. Analyzing these trends and patterns electronically and administering the relevant results via decision support systems can improve the quality of physicians' care exponentially. It can save time and resources, and it can allow physicians to provide individualized care and predict complications before they occur.

external image kdd-data-mining.gif

Below is a video describing a famous case in which data mining was being conducted so rigorously and non-discriminately, that it heavily backfired on Target Corp.

Randeree Slides/Lectures

- What is Data Mining and Big Data?
- Data Mining Applications in Healthcare
- The Big Data Revolution in Healthcare