Date posted: 21-Mar-2017
Category: Data & Analytics
Uploaded by: yousef-fadila
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING
Fall 2016 - Project 1
By: Yousef Fadila, ML Tlachac, Francisco Guerrero
Filling in the missing value
Discretize: ? = “unknown”
Manually filling in the data: ? = (Germany GDPPC + Switzerland GDPPC) / 2 = 31.35
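Regression imputation:

$$
\begin{aligned}
\text{GDPPC} = {} & 2.1069 \cdot \text{LIFE-EXP} + 0.1911 \cdot \text{AC-S-ED} \\
& - 40.4882 \cdot \mathbf{1}\big[\text{SWL} \in \{[175,200), [125,150), [200,225), [225,250), [250,275)\}\big] \\
& - 16.6881 \cdot \mathbf{1}\big[\text{SWL} \in \{[200,225), [225,250), [250,275)\}\big] - 100.3841
\end{aligned}
$$

where $\mathbf{1}[\cdot]$ is 1 if SWL falls in one of the listed intervals and 0 otherwise. For the USA:

$$\text{GDPPC}_{\text{USA}} = 2.1069 \cdot 77.4 + 0.1911 \cdot 94.6 - 40.4882 \cdot 1 - 16.6881 \cdot 1 - 100.3841 = 23.59$$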
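The USA imputation can be re-checked by evaluating the fitted linear model directly. This is a minimal sketch: the function name `predict_gdppc` and the representation of SWL bins as strings are assumptions for illustration, and the specific SWL bin used for the USA is hypothetical (the slide only states that both indicator terms equal 1, so any bin in the second group gives the same result).

```python
# Coefficients copied from the regression model on the slide.
# SWL enters the model through two 0/1 indicator terms.
SWL_GROUP_1 = {"[125-150)", "[175-200)", "[200-225)", "[225-250)", "[250-275)"}
SWL_GROUP_2 = {"[200-225)", "[225-250)", "[250-275)"}


def predict_gdppc(life_exp: float, ac_s_ed: float, swl_bin: str) -> float:
    """Impute GDPPC with the linear model (hypothetical helper name)."""
    in_group_1 = 1 if swl_bin in SWL_GROUP_1 else 0
    in_group_2 = 1 if swl_bin in SWL_GROUP_2 else 0
    return (2.1069 * life_exp
            + 0.1911 * ac_s_ed
            - 40.4882 * in_group_1
            - 16.6881 * in_group_2
            - 100.3841)


# Hypothetical SWL bin for the USA; any bin in SWL_GROUP_2 sets both indicators to 1.
usa = predict_gdppc(life_exp=77.4, ac_s_ed=94.6, swl_bin="[200-225)")
print(round(usa, 2))  # 23.59
```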
Transforming the COUNTRY attribute

| COUNTRY | HDI category |
|---|---|
| Ethiopia | LOW |
| India | MEDIUM |
| Mexico | HIGH |
| Thailand | HIGH |
| Russia | HIGH |
| USA | VERY-HIGH |
| Switzerland | VERY-HIGH |
| Germany | VERY-HIGH |
| Japan | VERY-HIGH |
| Canada | VERY-HIGH |
| Brazil | HIGH |
| France | VERY-HIGH |
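The transformation replaces each COUNTRY value with its HDI category from the table. A minimal sketch of that lookup follows; the ordinal code (0 = LOW up to 3 = VERY-HIGH) is an added assumption, not something the slide specifies, and the names `COUNTRY_TO_HDI` and `transform_country` are hypothetical.

```python
# HDI categories, ordered from lowest to highest (ordering is an assumption).
HDI_LEVELS = ["LOW", "MEDIUM", "HIGH", "VERY-HIGH"]

# Mapping taken from the COUNTRY / HDI category table.
COUNTRY_TO_HDI = {
    "Ethiopia": "LOW", "India": "MEDIUM", "Mexico": "HIGH", "Thailand": "HIGH",
    "Russia": "HIGH", "USA": "VERY-HIGH", "Switzerland": "VERY-HIGH",
    "Germany": "VERY-HIGH", "Japan": "VERY-HIGH", "Canada": "VERY-HIGH",
    "Brazil": "HIGH", "France": "VERY-HIGH",
}


def transform_country(country: str) -> tuple:
    """Return the HDI category of a country and a hypothetical ordinal code."""
    level = COUNTRY_TO_HDI[country]
    return level, HDI_LEVELS.index(level)


for c in ["Ethiopia", "India", "Brazil", "Japan"]:
    print(c, transform_country(c))
```

Collapsing twelve country values into four ordered categories reduces the number of distinct values a learner has to handle, at the cost of treating countries in the same category as equivalent.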
Discretizing AC-S-ED
Equal width
Equal frequency
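The two discretization schemes differ in what they hold constant: equal width fixes the interval length, equal frequency fixes (roughly) the number of values per bin. The sketch below illustrates both on hypothetical AC-S-ED values (not the project's data). The function names are hypothetical, and the rank-based equal-frequency variant is a simple illustration, not necessarily how Weka places bin boundaries, especially when values are tied.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value would land in bin k, so clamp it into the last bin.
        bins.append(min(idx, k - 1))
    return bins


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins by rank, so bins hold about n/k values each."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical AC-S-ED values for illustration only.
ac_s_ed = [30.0, 45.0, 50.0, 60.0, 85.0, 90.0, 94.6, 99.0]
print(equal_width_bins(ac_s_ed, 3))      # [0, 0, 0, 1, 2, 2, 2, 2]
print(equal_frequency_bins(ac_s_ed, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With skewed data, equal width can leave some bins nearly empty, while equal frequency keeps bin sizes balanced but produces intervals of very different lengths.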
CfsSubsetEval algorithm
Merit
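The CfsSubsetEval formula used to calculate the merit of a two-attribute subset is

$$\text{Merit} = \frac{\sum_{j} \operatorname{corr}(a_j, t)}{\sqrt{\sum_{j} \sigma(a_j)^2 + 2\,\operatorname{corr}(a_{j_1}, a_{j_2}) \prod_{j} \sigma(a_j)}}$$

where t is the target attribute (play) and a_j are the selected attributes (outlook and humidity). With σ(a_j) = 1 for both attributes:

$$\text{Merit} = \frac{\operatorname{corr}(\text{outlook}, \text{play}) + \operatorname{corr}(\text{humidity}, \text{play})}{\sqrt{1^2 + 1^2 + 2\operatorname{corr}(\text{humidity}, \text{outlook})(1)(1)}}$$

$$= \frac{0.1960 + 0.1565}{\sqrt{1 + 1 + 2(0.01610)}} = \frac{0.3525}{\sqrt{2.0322}} \approx 0.2473$$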
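The merit computation can be reproduced in a few lines. This sketch assumes standardized attributes (every σ = 1), in which case the general CFS merit for k attributes, k·mean(r_cf) / sqrt(k + k(k−1)·mean(r_ff)), simplifies to sum(r_cf) / sqrt(k + 2·Σ_{i<j} r_ij) and, for k = 2, matches the two-attribute formula above. The function name `cfs_merit` and its argument layout are hypothetical; this is not Weka's source code.

```python
import math
from itertools import combinations


def cfs_merit(corr_with_target, corr_between):
    """CFS merit of k standardized attributes (all sigma = 1).

    corr_with_target: list of corr(a_j, t) for the k selected attributes.
    corr_between: dict mapping index pairs (i, j), i < j, to corr(a_i, a_j).
    """
    k = len(corr_with_target)
    numerator = sum(corr_with_target)
    inter = sum(corr_between[(i, j)] for i, j in combinations(range(k), 2))
    return numerator / math.sqrt(k + 2 * inter)


# Attribute 0 = outlook, attribute 1 = humidity, target t = play.
merit = cfs_merit([0.1960, 0.1565], {(0, 1): 0.01610})
print(round(merit, 4))  # 0.2473
```

A subset scores well when its attributes correlate strongly with the target (large numerator) but weakly with each other (small denominator), which is why redundant attribute pairs lower the merit.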
Observing the Data
Correlation Matrix
Remove: numbUrban & medFamIncome
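The slide does not state the criterion used to drop numbUrban and medFamIncome. One common approach, assumed here, is to scan the correlation matrix for attribute pairs with high absolute Pearson correlation and keep only one attribute from each pair. The sketch below uses hypothetical attribute names, toy values (not the project's dataset), and an arbitrary threshold of 0.9.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (attr, attr, r) for pairs whose |r| exceeds the (hypothetical) threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Toy columns for illustration only.
columns = {
    "attr_a": [1, 2, 3, 4, 5],
    "attr_b": [2, 4, 6, 8, 10.5],
    "attr_c": [5, 3, 4, 1, 2],
}
print(highly_correlated_pairs(columns))
```

Here attr_a and attr_b are nearly linear in each other, so one of them could be removed with little loss of information.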
Multidimensional arrays and OLAP operations
Operations:
1. Roll-up time from day to year
2. Slice: year == 2014
3. Roll-up patients from individual patients to all
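The three operations can be illustrated on a tiny fact table. This is a minimal sketch: the records, the patient IDs, and the "visits" measure are all hypothetical, since the slide does not show the underlying data or measure.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact records: (day, patient, visits).
facts = [
    (date(2013, 5, 2), "P1", 1),
    (date(2014, 1, 10), "P1", 2),
    (date(2014, 3, 7), "P2", 1),
    (date(2014, 3, 8), "P1", 1),
    (date(2015, 6, 1), "P2", 3),
]

# 1. Roll-up time from day to year: sum the measure per (year, patient).
by_year = defaultdict(int)
for day, patient, visits in facts:
    by_year[(day.year, patient)] += visits

# 2. Slice year == 2014: fix the time dimension to a single value.
year_2014 = {patient: v for (year, patient), v in by_year.items() if year == 2014}

# 3. Roll-up patients from individual patients to "all".
total_2014 = sum(year_2014.values())

print(dict(by_year))
print(year_2014)
print(total_2014)  # 4
```

Each step shrinks the cube: the roll-ups remove detail along one dimension by aggregating, while the slice removes a dimension by fixing its value.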
OLAP operations on car sales data
1. Rolling-up
2. Drilling-down
3. Slicing
4. Dicing
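The four operations listed above can be contrasted on a small set of sales records. In this sketch the dimensions (a country → city location hierarchy, quarter, model), the "units" measure, and all values are hypothetical, chosen only to show what each operation does.

```python
from collections import defaultdict

# Hypothetical car-sales facts: (country, city, quarter, model, units).
sales = [
    ("USA", "Boston", "Q1", "Sedan", 10),
    ("USA", "Boston", "Q2", "SUV", 7),
    ("USA", "Chicago", "Q1", "SUV", 5),
    ("Canada", "Toronto", "Q1", "Sedan", 4),
    ("Canada", "Toronto", "Q2", "Sedan", 6),
]


def aggregate(rows, key):
    """Sum units over rows, grouped by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Roll-up: climb the location hierarchy from city to country.
by_country = aggregate(sales, key=lambda r: r[0])
# Drill-down: return to the finer city level (here within the USA).
usa_by_city = aggregate([r for r in sales if r[0] == "USA"], key=lambda r: r[1])
# Slice: fix one dimension to a single value (quarter == "Q1").
q1 = [r for r in sales if r[2] == "Q1"]
# Dice: restrict two or more dimensions at once (city and quarter).
dice = [r for r in sales if r[1] in {"Boston", "Toronto"} and r[2] == "Q1"]

print(by_country)          # {'USA': 22, 'Canada': 10}
print(usa_by_city)         # {'Boston': 17, 'Chicago': 5}
print(len(q1), len(dice))  # 3 2
```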
Thank you! Questions?