Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some Age data missing; analyze Gender only
Statistics & Analysis: 74% of women survived, 19% of men
Submit Predictions: 320 / 418 ≈ 76.6%
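The gender-only rule in this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's actual method: the function names `predict_gender_only` and `accuracy` are hypothetical, and the sample rows are made up for demonstration rather than taken from the Titanic data.

```python
# Gender-only baseline: predict survival for women, death for men.
# The passenger rows below are made-up illustration data, not Titanic records.

def predict_gender_only(sex):
    """Return 1 (survived) for female passengers, 0 otherwise."""
    return 1 if sex.strip().lower() == "female" else 0

def accuracy(predictions, actual):
    """Fraction of predictions that match the actual outcomes."""
    matches = sum(p == a for p, a in zip(predictions, actual))
    return matches / len(actual)

# Hypothetical (sex, survived) pairs for demonstration only.
sample = [("female", 1), ("male", 0), ("female", 1), ("male", 1), ("female", 0)]
preds = [predict_gender_only(sex) for sex, _ in sample]
score = accuracy(preds, [survived for _, survived in sample])
print(f"toy accuracy: {score:.1%}")

# The slide's submission score, recomputed from its own counts.
print(f"320 / 418 = {320 / 418:.3%}")
```

The same accuracy calculation applies to the submission figure: 320 correct predictions out of 418 test passengers.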
Predictor Variables

Variable   Description                          Type                   Hypothesis
pclass     Passenger Class                      Categorical, Ordinal   1st class 3rd
name       Name                                 Text
Sex        Sex                                  Categorical
age        Age                                  Numeric
sibsp      Number of Siblings/Spouses Aboard    Integer
parch      Number of Parents/Children Aboard    Integer
ticket     Ticket Number                        Text
fare       Passenger Fare                       Numeric
cabin      Cabin                                Text
embarked   Port of Embarkation                  Categorical
Age
All: N = 891
Missing: N = 177
Data: N = 714
[Figure: histogram of Age, 0 to 90, by Survived / Not]
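Counts like these come from a simple tally of missing versus available values. The sketch below assumes missing ages are represented as `None`; the helper name `summarize_missing` and the sample list are hypothetical.

```python
# Count missing vs. available values in a column before analysing it.
# `ages` is a made-up list; None marks a missing value.

def summarize_missing(values):
    """Return (total, missing, available) counts for a list of values."""
    total = len(values)
    missing = sum(1 for v in values if v is None)
    return total, missing, total - missing

ages = [22.0, None, 38.0, 26.0, None, 35.0]  # hypothetical sample
total, missing, available = summarize_missing(ages)
print(f"All N = {total}, Missing N = {missing}, Data N = {available}")

# Consistency check on the slide's own counts: 891 total, 177 missing.
assert 891 - 177 == 714
```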
Decision Trees
• Dependent variable (Y)
  • Continuous
  • Categorical
• Independent variables (X's)
  • Continuous
  • Categorical
At each node, the decision tree looks for the split of the sample that leads to the most differentiation on Y.
Example: split Survived by Age less than X vs. Age greater than X
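The search for a split point X on Age can be sketched as follows. This is an illustration under stated assumptions: "differentiation" is measured here as the absolute difference in survival rate between the two groups, which is a simplification (tree libraries commonly use Gini impurity or entropy instead). The function name `best_split` and the toy data are hypothetical.

```python
# Try every candidate age threshold and keep the one with the largest
# gap in survival rate between "age < X" and "age >= X".
# Assumption: differentiation = absolute difference in survival rates.

def best_split(ages, survived):
    """Return (threshold, delta) for the best age cut, or None if no cut exists."""
    candidates = sorted(set(ages))
    best = None
    # Skip the smallest age: "age < minimum" would be an empty group.
    for x in candidates[1:]:
        below = [s for a, s in zip(ages, survived) if a < x]
        above = [s for a, s in zip(ages, survived) if a >= x]
        rate_below = sum(below) / len(below)
        rate_above = sum(above) / len(above)
        delta = abs(rate_below - rate_above)
        if best is None or delta > best[1]:
            best = (x, delta)
    return best

# Made-up passengers: two children who survived, four adults with mixed outcomes.
ages = [4, 8, 25, 30, 40, 60]
survived = [1, 1, 0, 0, 1, 0]
threshold, delta = best_split(ages, survived)
print(f"best split: age < {threshold} (difference in survival rate {delta:.2f})")
```

An unweighted difference can favour very small groups, so in practice the group sizes (N) should be considered alongside the difference.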
Age
[Figure: series A, B, Delta, and N; x-axis 1 to 15, axes 0% to 100% and 0 to 50]
[Figure: histogram; x-axis 0 to 100, y-axis 0 to 20]
Prediction and Missing Values
Variable   Description
pclass     Passenger Class
name       Name
Sex        Sex
age        Age
sibsp      Number of Siblings/Spouses Aboard
parch      Number of Parents/Children Aboard
ticket     Ticket Number
fare       Passenger Fare
cabin      Cabin
embarked   Port of Embarkation
Correlation or association of Age with other variables?
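One way to explore this question is to compute a correlation coefficient between Age and another variable, using only the rows where Age is present. The sketch below uses Pearson correlation as an assumption; for an ordinal variable such as pclass it is only a rough indicator. The helper names and the sample rows are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_with_age(rows, other):
    """Correlate age with another column, skipping rows where age is missing."""
    complete = [(r["age"], r[other]) for r in rows if r["age"] is not None]
    ages, values = zip(*complete)
    return pearson(ages, values)

# Made-up rows for illustration; None marks a missing age.
rows = [
    {"age": 50.0, "pclass": 1},
    {"age": None, "pclass": 3},
    {"age": 40.0, "pclass": 2},
    {"age": 30.0, "pclass": 3},
]
r = correlation_with_age(rows, "pclass")
print(f"correlation of age with pclass: {r:.2f}")
```

If a variable turns out to be strongly associated with Age, it could be used to estimate the missing ages rather than dropping those passengers.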
Gender and Age
• The tree grows by optimizing only the split at the current node rather than optimizing the entire tree
• The tree stops growing when further splits become ineffective
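The two points above (greedy, node-by-node splitting and a stopping rule) can be sketched as a small recursive procedure. This is an illustrative sketch, not the presenter's implementation: it assumes Gini impurity as the split criterion and a hypothetical `min_gain` threshold as the definition of "ineffective". Rows are made-up `(is_female, age)` tuples.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Find the (feature, threshold, gain) that most reduces impurity at this node."""
    parent = gini(labels)
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        # Skip the smallest value so the "less than" side is never empty.
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            if gain > best[2]:
                best = (f, t, gain)
    return best

def grow(rows, labels, min_gain=0.01):
    """Grow a tree greedily; stop when the best split gains less than min_gain."""
    feature, threshold, gain = best_split(rows, labels)
    if feature is None or gain < min_gain:
        # Leaf: predict the majority class (ties go to 0).
        return {"predict": int(sum(labels) * 2 > len(labels))}
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] < threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": grow([r for r, _ in left], [y for _, y in left], min_gain),
        "right": grow([r for r, _ in right], [y for _, y in right], min_gain),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its prediction."""
    while "predict" not in tree:
        branch = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["predict"]

# Made-up passengers: (is_female, age) with survival labels.
rows = [(1, 30), (1, 8), (0, 6), (0, 40), (0, 25), (1, 50)]
labels = [1, 1, 1, 0, 0, 1]
tree = grow(rows, labels)
print(tree)
```

Each call to `grow` considers only the node it is given, which is what makes the procedure greedy: a split that looks best locally is never revisited, even if a different choice higher up would produce a better tree overall.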
[Figure: Female Survival %; x-axis 0 to 70, y-axis 0% to 100%]
Age + Gender