Date posted: 24-Jun-2015
Uploaded by: xiaming-chen
A BRIEF TUTORIAL ON DATA MINING
Xiaming Chen, OMNI-Lab 2014-07
OUTLINE
• What's Data Mining?
• A Hands-on Practice
WHAT'S DATA MINING
On textbooks:
• Science: probability, statistics, graph theory, etc.
• Techniques: clustering, classification, regression, prediction, etc.
• A way to think about the world.
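Clustering, the first technique listed above, can be made concrete with a short sketch. The block below is plain Lloyd's k-means in stdlib Python (3.8+ for `math.dist`). The deterministic farthest-first initialization is a simplification chosen here so the result is reproducible; library implementations usually seed with random points or k-means++. The data points are toy values.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    # Deterministic farthest-first initialization (a simplification, not k-means++):
    # start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Two well-separated toy groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Farthest-first seeding works well on cleanly separated data like this, but it is sensitive to outliers, which is one reason randomized seeding is the usual default.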
WHAT'S DATA MINING
In reality:
• Science? Maybe.
[Figure: research on social networks]
WHAT'S DATA MINING
In reality:
• Prediction? Yes!
[Figures: "The Highest Creature Intelligence (100%)"; "Anti-Prediction"; "US Election, Bayes Selection!"]
WHAT'S DATA MINING
In reality:
• The world, thinking? Spying!
[Figure: "Illegal SPYING below!"]
WHAT'S DATA MINING
• You need it, you learn it, you become an expert.
[Figure: Insights, Thinking, Programming]
HANDS-ON PRACTICE
• Tools to facilitate your data analysis
  • Commercial
    • SAS
    • IBM SPSS
    • MATLAB, etc.
  • Free/open source
    • RapidMiner + Weka
    • R (my favorite)
    • Python + SciPy + scikit-learn
    • Hadoop/Spark, etc.
HANDS-ON PRACTICE
• Example: RapidMiner + StoneFlakes
  http://archive.ics.uci.edu/ml/datasets/StoneFlakes
HANDS-ON PRACTICE
• RapidMiner (not an ad)
  • A Java-based IDE for machine learning, data mining, text mining, etc.
  • Modular design, graphical interface, no coding required
  • Complete process logic: data ETL, visualization, modeling, prediction, reporting, etc.
  • Growing extension marketplace
  • CLI and API for use from other programs
  • Can call Weka and R functions
  Download: http://www.rapidminer.com/
HANDS-ON PRACTICE
• StoneFlakes
  • StoneFlakes.csv: flake attribute information
  • annotation.csv: inventory properties
  Formatted: http://io.hsiamin.com/data/StoneFlakes.tar.gz
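Before any modeling, the two StoneFlakes files have to be combined into one table. A minimal stdlib-Python sketch follows. It assumes that both CSVs have a header row and share an `ID` column, and that missing values are written as `?`. These are assumptions, so check them against your copy of the files. The `LBI`/`RTI`/`group` columns and all values in the toy input are hypothetical placeholders, not the real data.

```python
import csv
import io


def load_rows(f):
    """Read a CSV file object into a list of dicts keyed by header names."""
    return list(csv.DictReader(f))


def join_on_key(flakes, annotations, key="ID"):
    """Inner-join two lists of row dicts on a shared key column.

    Flakes whose key has no matching annotation row are dropped.
    """
    by_key = {row[key]: row for row in annotations}
    joined = []
    for row in flakes:
        ann = by_key.get(row[key])
        if ann is not None:
            joined.append({**row, **ann})  # merge flake attributes with inventory properties
    return joined


def numeric_features(rows, columns, missing="?"):
    """Turn selected columns into float tuples, skipping rows with a missing value.

    The "?" missing-value marker is an assumption about the file format.
    """
    out = []
    for row in rows:
        values = [row[c] for c in columns]
        if missing in values:
            continue
        out.append(tuple(float(v) for v in values))
    return out


# Toy stand-ins for StoneFlakes.csv and annotation.csv (hypothetical columns and values).
flakes_csv = io.StringIO("ID,LBI,RTI\nar,1.23,27.0\nbi,1.07,?\nco,1.31,25.5\n")
annotation_csv = io.StringIO("ID,group\nar,1\nbi,2\nco,3\n")

rows = join_on_key(load_rows(flakes_csv), load_rows(annotation_csv))
features = numeric_features(rows, ["LBI", "RTI"])
print(len(rows), features)  # 3 [(1.23, 27.0), (1.31, 25.5)]
```

To read the real files, pass `open("StoneFlakes.csv", newline="")` in place of the `io.StringIO` objects. The resulting feature tuples can then feed a clustering or classification step.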
HANDS-ON PRACTICE
• Demo
SUMMER COURSE
• Spatio-temporal Data Analysis
  • Yu Zheng, MSR
  • July 1 to 31, 2014
  • Tuesdays and Thursdays, 2:00 to 5:40 pm
  • Minhang campus, Shangyuan Building, Room 316