Data mining and machine learning
Valérie Bécaert
November 26

Artificial intelligence and the broader field of data science are rapidly evolving areas of study, and the lines between different technologies and terms are often blurred. Many overlap in their methods and applications, and some are subsets of broader ideas. Data mining and machine learning are two terms that are sometimes used interchangeably, but there are significant differences that are important to understand.

Mining for meaning

Data mining is the general term for discovering hidden patterns in large datasets, using methods that include machine learning. It covers approaches such as cluster analysis, which automatically groups items in a dataset according to shared properties, as well as anomaly detection and other correlative techniques. It also analyzes the relationships between numerous variables and constants within the data to make predictions. It is often confused with data analysis, which is used to test models and hypotheses on a dataset.
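To make cluster analysis concrete, here is a minimal sketch of k-means, one common clustering method (others exist, and this is only an illustration, not a production approach). The records, the field meanings, and the choice to seed the centroids with the first k points are all assumptions made for the example. Real data would normally be scaled first so that one field does not dominate the distance.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters with plain k-means (Lloyd's algorithm)."""
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical records: (customer age, claims per year)
records = [(22, 3), (61, 0), (24, 4), (58, 1), (21, 3), (63, 0)]
centroids, clusters = k_means(records, k=2)
for group in clusters:
    print(group)
```

Nobody told the program which customers belong together. The two groups (younger customers with more claims, older customers with fewer) emerge from the shared properties of the records themselves.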

Taking all of this into consideration, you may well conclude that the term data mining is actually a misnomer. After all, gold mining is named for the substance being extracted, not the place it is taken from. By that logic, data mining should mean digging for data, yet the data is not what is being mined: understanding is mined from the data. A more literal term would perhaps be insight mining or meaning mining.

That’s why data mining is sometimes also known as “knowledge discovery.” It’s the search for a needle of knowledge in the haystack of a huge dataset — a task that’s perfect for a fast, accurate, and tireless computer to assist with.

Learning and improving

Data mining techniques look for patterns in data, analyzing it and extracting insights. Before the advent of AI, they relied on traditional approaches to software: you had to tell the program exactly what to do and what to look for.

Machine learning works with data, too, but in a fundamentally different way. It is a subset of artificial intelligence concerned with the ability of an AI system to analyze information and draw out insights. So far, so familiar. But what sets machine learning apart is the ability to make decisions, learn from them, and apply those lessons to make better decisions in the future.

In order to gain a deeper level of insight or categorize large quantities of data accurately, supervised or semi-supervised machine learning models are trained on selected and labelled data so that they’ll know what they’re looking for. An AI model can learn to discern certain defining features within the training data, and build a frame of reference with which to handle further data.
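As a rough illustration of training on labelled data, the sketch below builds a nearest-centroid classifier, one of the simplest supervised methods. It is not a description of any particular product or library. The feature choices, labels, and function names (train, predict) are hypothetical and exist only for this example.

```python
from collections import defaultdict
from math import dist

def train(labelled_examples):
    """Learn one centroid (average feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in labelled_examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Assign the label whose centroid is closest to the new example."""
    return min(model, key=lambda label: dist(features, model[label]))

# Hypothetical labelled data: (claim amount in $k, days taken to report) -> label
training = [
    ((1.0, 2), "routine"), ((1.5, 1), "routine"), ((0.8, 3), "routine"),
    ((9.0, 30), "review"), ((12.0, 45), "review"), ((10.5, 38), "review"),
]
model = train(training)

print(predict(model, (1.2, 2)))    # routine
print(predict(model, (11.0, 40)))  # review
```

The labelled examples give the model its frame of reference. Once the centroids are learned, new examples that carry no label can be categorized by comparing them with what the model has already seen.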

This expertise, gained from experience, improves these machine learning models: they become more accurate and effective as they accumulate knowledge from the data they analyze and the tasks they perform. And once they have been trained, they may be able to work with new, unlabelled data, because they can then recognize relevant features without needing them to be pre-labelled.

Searching for trends

One prominent use of data mining is in insurance, where trends can be found in large quantities of claims. This allows insurance companies to adjust the conditions of their coverage and premiums accordingly.

A subset of data mining called text mining is often used for this purpose. In its simplest form, this is a keyword search, like the Find command in an application. But today’s text mining can use decision logic and natural language processing to establish relationships between items of data, such as customers’ locations and ages. This more sophisticated text mining can even analyze customers’ sentiments, such as dissatisfaction, which allows the insurer to plan policies and services that are better targeted to their needs.

For instance, if many customers under the age of 20 are using negative language about automobile insurance pricing, text mining can surface this, and the insurance company can set more affordable premiums for young customers, who will then be less likely to take their insurance needs elsewhere. That’s only one example of the kinds of insights data mining can bring, and the benefits it offers businesses.
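The simplest version of this idea can be sketched in a few lines. The example below only matches words against a small list of negative terms and groups the results by age, which is far cruder than the natural language processing a real text mining system would use. The comments, the word list, and the age cutoff are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical lexicon of negative terms; a real system would use NLP models.
NEGATIVE_TERMS = {"expensive", "overpriced", "unfair", "unhappy"}

def negative_share_by_age_group(comments, cutoff=20):
    """Return the share of comments containing a negative term, per age group."""
    totals, negatives = Counter(), Counter()
    for age, text in comments:
        group = "under_20" if age < cutoff else "20_and_over"
        words = set(re.findall(r"[a-z']+", text.lower()))
        totals[group] += 1
        if words & NEGATIVE_TERMS:
            negatives[group] += 1
    return {group: negatives[group] / totals[group] for group in totals}

# Hypothetical customer feedback: (age, comment)
comments = [
    (18, "My car premium is way too expensive."),
    (19, "Overpriced and unfair for new drivers!"),
    (17, "Happy with the claims service."),
    (45, "The price seems reasonable."),
    (52, "Quick response, thanks."),
]
shares = negative_share_by_age_group(comments)
print(shares)
```

In this made-up sample, two of the three comments from customers under 20 contain a negative term and none of the others do. That kind of gap is the signal an insurer would then investigate further.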

Understanding the market’s past and future

Financial services have been quick to benefit from machine learning. A 2019 study by Refinitiv (formerly the Financial & Risk business of Thomson Reuters) found that 90% of respondents had already deployed machine learning within their business. What’s more, 78% said that machine learning was a core component of their business strategy. The report also concluded that machine learning would be the greatest enabler of competitive advantage in financial services.

Financial businesses are eager to make the most of machine learning’s ability to analyze current market conditions and learn from historical data in order to gain trading insights, judge performance, and find areas of potential risk. The industry has always been an enthusiastic adopter of technology, from the abacus to early automated trading systems to today’s AI-driven innovations.

Digging even deeper

With even more advanced forms of machine learning appearing, such as deep learning, data mining techniques are now able to gain unprecedented levels of insight and accuracy, and be used for applications previously unimagined. What will be the next revolutionary insight they unearth?