The Geomblog: Data Mining, machine learning and statistics.

Thursday, March 20, 2014

Data Mining, machine learning and statistics.

How does one tell data mining, machine learning and statistics apart?

If you spend enough time wandering the increasingly crowded landscape of Big Data-istan, you'll come across the warring tribes of DataMine, MachLearn and Stat, whose constant bickering will make you think fondly of the People's Front of Judea.

Cosma Shalizi has what I think is a useful delineation of the three tribes, one that isn't prejudicial to any of them ("Stats is just inefficient learning!", "MachLearn is just the reinvention of statistics!", "DataMine is a series of hacks!"). It goes something like this:

Data mining is the art of finding patterns in data.
Statistics is the mathematical science associated with drawing reliable inferences from noisy data.
Machine learning is [the branch of computer science] that develops technology for automated inference (his original characterization was as a branch of engineering).

I like this characterization because it emphasizes the different focus of each field: data mining is driven by applications, machine learning by algorithms, and statistics by mathematical foundations.

This is not to say that the foci don't overlap: there's a lot of algorithm design in data mining and plenty of mathematics in ML. And of course applied stats work is an art as much as a science.

But the primary driving force behind each field is captured well.
