
Submit Predictions

Date post: 22-Feb-2016
Upload: osman
Description:
Goal: Predict who survived the Titanic disaster. Hypotheses: Women and children first. Get Data: Read dataset into Excel, R, etc. Data Management: Some age data missing; analyze gender only. Statistics & Analysis: 74% of women survived, 19% of men. Submit Predictions: 320 / 418 = 76.5%. (PowerPoint presentation)
Transcript

Goal: Predict who survived the Titanic disaster.
Hypotheses: Women and children first.
Get Data: Read the dataset into Excel, R, etc.
Data Management: Some age data are missing, so analyze gender only.
Statistics & Analysis: 74% of women survived, versus 19% of men.
Submit Predictions: 320 / 418 = 76.5% predicted correctly.
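The gender-only step can be sketched as a tiny baseline: predict survival for every woman and death for every man, then score the predictions. This is a minimal illustration, not the author's workflow (which used Excel/R); the dictionary field names `sex` and `survived`, and the sample rows, are assumptions made up for the example.

```python
from __future__ import annotations

# Gender-only baseline ("women first"): predict that every female passenger
# survived and every male passenger did not. The field names "sex" and
# "survived" are assumed to match the slide's lowercase variable names.

def predict_gender_only(passenger: dict) -> int:
    """Return 1 (survived) for females, 0 (did not survive) otherwise."""
    return 1 if passenger["sex"] == "female" else 0


def survival_rate_by_sex(passengers: list[dict]) -> dict[str, float]:
    """Fraction of passengers who survived, computed separately for each sex."""
    totals: dict[str, int] = {}
    survivors: dict[str, int] = {}
    for p in passengers:
        totals[p["sex"]] = totals.get(p["sex"], 0) + 1
        survivors[p["sex"]] = survivors.get(p["sex"], 0) + p["survived"]
    return {sex: survivors[sex] / totals[sex] for sex in totals}


def accuracy(correct: int, total: int) -> float:
    """Share of predictions that were correct."""
    return correct / total


# Made-up rows, for illustration only.
sample = [
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
    {"sex": "male", "survived": 0},
]
print([predict_gender_only(p) for p in sample])  # [1, 1, 0, 0, 0]
print(survival_rate_by_sex(sample))
# The slide's submission score: 320 correct out of 418 test passengers.
print(round(accuracy(320, 418), 5))  # 0.76555
```

Applied to the real training rows, `survival_rate_by_sex` is what yields the 74% / 19% split quoted above; the 320 / 418 figure is the slide's reported score, which the slide rounds down to 76.5%.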

Predictor Variables

Variable   Description                          Type                   Hypothesis
pclass     Passenger Class                      Categorical, Ordinal   1st class > 3rd
name       Name                                 Text
sex        Sex                                  Categorical
age        Age                                  Numeric
sibsp      Number of Siblings/Spouses Aboard    Integer
parch      Number of Parents/Children Aboard    Integer
ticket     Ticket Number                        Text
fare       Passenger Fare                       Numeric
cabin      Cabin                                Text
embarked   Port of Embarkation                  Categorical
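One way to "read the dataset" so that each column gets the type listed in the table is a small CSV loader that turns empty fields into `None`. This is a sketch under assumptions: the lowercase column names follow the slide, the `survived` outcome column and the sample row are hypothetical, and only the standard library `csv` module is used.

```python
import csv
import io
from typing import Callable, Dict, List, Optional

# Column converters following the Type column of the table; empty fields
# (missing values, e.g. age) become None. The lowercase column names and the
# "survived" column (the outcome Y) are assumptions; check them against
# the header of the actual data file.

def _missing(s: Optional[str]) -> bool:
    return s is None or s.strip() == ""

def as_int(s: Optional[str]) -> Optional[int]:
    return None if _missing(s) else int(s)

def as_float(s: Optional[str]) -> Optional[float]:
    return None if _missing(s) else float(s)

def as_text(s: Optional[str]) -> Optional[str]:
    return None if _missing(s) else s

CONVERTERS: Dict[str, Callable[[Optional[str]], object]] = {
    "survived": as_int,
    "pclass": as_int,      # categorical, ordinal: 1 = 1st class
    "name": as_text,
    "sex": as_text,        # categorical
    "age": as_float,       # numeric, sometimes missing
    "sibsp": as_int,
    "parch": as_int,
    "ticket": as_text,
    "fare": as_float,
    "cabin": as_text,
    "embarked": as_text,   # categorical port code
}

def load_passengers(stream) -> List[dict]:
    """Read CSV rows and convert each column to its declared type."""
    return [
        {col: CONVERTERS.get(col, as_text)(val) for col, val in row.items()}
        for row in csv.DictReader(stream)
    ]

# One made-up row with a missing age and cabin.
sample = io.StringIO(
    "survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n"
    '0,3,"Doe, Mr. John",male,,0,0,A/5 0000,7.25,,S\n'
)
passengers = load_passengers(sample)
print(passengers[0]["name"], passengers[0]["age"])  # Doe, Mr. John None
```

For a real file, pass an open file handle (`open(path, newline="")`) instead of the `StringIO` sample; unknown columns fall back to plain text.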

[Figure: Age histogram for all passengers (N = 891; age missing N = 177; age recorded N = 714), counts by age (0 to 90) split into Survived and Not survived.]
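The counts behind this histogram can be sketched as below: one function tallies missing versus recorded ages, the other bins recorded ages by survival. This is an illustrative sketch; the 10-year `bin_width` is an assumption (the slide does not state its bin size), and the rows are made up. If run on the same training file as the slide, `age_summary` should report N = 891, missing = 177, data = 714.

```python
from collections import Counter

# Rows are assumed to be dicts with "age" (None when missing) and
# "survived" (1 or 0), e.g. as produced by a CSV loader.

def age_summary(passengers):
    """Count all passengers, those with a missing age, and those with an age."""
    total = len(passengers)
    missing = sum(1 for p in passengers if p["age"] is None)
    return {"N": total, "missing": missing, "data": total - missing}


def age_histogram(passengers, bin_width=10):
    """Count passengers per (age bin start, survived) pair, skipping missing ages."""
    counts = Counter()
    for p in passengers:
        if p["age"] is None:
            continue
        bin_start = int(p["age"] // bin_width) * bin_width
        counts[(bin_start, p["survived"])] += 1
    return counts


sample = [  # made-up rows
    {"age": 22.0, "survived": 0},
    {"age": None, "survived": 1},
    {"age": 38.0, "survived": 1},
    {"age": 4.0, "survived": 1},
]
print(age_summary(sample))  # {'N': 4, 'missing': 1, 'data': 3}
print(sorted(age_histogram(sample).items()))  # [((0, 1), 1), ((20, 0), 1), ((30, 1), 1)]
```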

Decision Trees

• Dependent variable (Y): continuous or categorical
• Independent variables (X's): continuous or categorical

At each node, the decision tree looks for the split of the sample that leads to the most differentiation in Y.

[Diagram: the node "Survived" is split into "Age less than X" and "Age greater than X".]
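To make the diagram concrete, a fitted tree can be represented as nested nodes and used for prediction by walking from the root to a leaf. This is a hypothetical representation chosen for illustration (plain dicts, rows with a value below the threshold go left); the threshold X = 10 and the leaf predictions are made up, not taken from the slide.

```python
# A tree is either a leaf {"predict": 0 or 1} or an internal node
# {"feature": name, "threshold": x, "left": subtree, "right": subtree}.
# Rows with feature < threshold go left, the rest go right.

def predict(tree: dict, passenger: dict) -> int:
    """Walk from the root to a leaf and return its prediction (1 = survived)."""
    node = tree
    while "predict" not in node:
        branch = "left" if passenger[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["predict"]


# Hypothetical one-split tree mirroring the diagram: "Age less than X" vs
# "Age greater than X", with X = 10 chosen only for illustration.
age_tree = {"feature": "age", "threshold": 10.0,
            "left": {"predict": 1}, "right": {"predict": 0}}

print(predict(age_tree, {"age": 4.0}))   # 1
print(predict(age_tree, {"age": 35.0}))  # 0
```

Note that a passenger whose age is missing (`None`) cannot be routed by this split, which is exactly the missing-value problem raised later in the deck.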

Decision Trees

• Split criterion: maximize data likelihood (minimize deviance).

[Figure: split search over Age (1 to 15) plotting series A, B, Delta and N, on a percentage axis (0% to 100%) and a count axis (0 to 50); second panel: age histogram (0 to 100).]
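The "minimize deviance" criterion can be sketched for a single numeric split on a 0/1 outcome: each child node predicts its own survival rate, its binomial deviance is -2 times its log-likelihood, and the chosen threshold minimizes the summed child deviance. This is a minimal sketch; trying midpoints between neighboring distinct values is an assumption about how candidates are generated, and missing values must be dropped first.

```python
import math

def node_deviance(outcomes):
    """Binomial deviance (-2 x log-likelihood) of a node that predicts
    its own survival rate; 0 for an empty or pure node."""
    n, k = len(outcomes), sum(outcomes)
    dev = 0.0
    if k > 0:
        dev -= 2 * k * math.log(k / n)
    if k < n:
        dev -= 2 * (n - k) * math.log((n - k) / n)
    return dev


def best_split(values, outcomes):
    """Return (threshold, total child deviance) for the split value < threshold
    that minimizes deviance; threshold is None if no split beats the parent."""
    best_t, best_dev = None, node_deviance(outcomes)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate midway between neighboring values
        left = [y for x, y in zip(values, outcomes) if x < t]
        right = [y for x, y in zip(values, outcomes) if x >= t]
        dev = node_deviance(left) + node_deviance(right)
        if dev < best_dev:
            best_t, best_dev = t, dev
    return best_t, best_dev


ages = [2, 5, 8, 30, 40, 50]      # made-up, no missing values
survived = [1, 1, 1, 0, 0, 0]
print(best_split(ages, survived))  # (19.0, 0.0)
```

Minimizing deviance is the same as maximizing the likelihood of the data under the two child survival rates, which is why the slide lists the two together.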

Prediction and Missing Values

Variables: pclass (Passenger Class), name (Name), sex (Sex), age (Age), sibsp (Number of Siblings/Spouses Aboard), parch (Number of Parents/Children Aboard), ticket (Ticket Number), fare (Passenger Fare), cabin (Cabin), embarked (Port of Embarkation).

Is age correlated or associated with the other variables?
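A first check of that question, for numeric or ordinal variables, is a Pearson correlation computed over complete cases only (rows where both values are present). This is a sketch, not the author's analysis: treating `pclass` as a number is an assumption, the values are made up, and for categorical variables such as sex or embarked, comparing mean age across groups would be the more natural check.

```python
import math

def pearson_complete_cases(xs, ys):
    """Pearson correlation of xs and ys, using only pairs where both values
    are present (None marks a missing value)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Made-up values: age (one missing) against passenger class.
ages = [22.0, 38.0, None, 35.0, 54.0, 2.0]
pclass = [3, 1, 3, 1, 1, 3]
print(round(pearson_complete_cases(ages, pclass), 3))
```

A strong association would suggest that a missing age could be predicted from the other variables rather than dropping age altogether.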


Prediction: Gender + Age

Panels: Gender; Gender and Age.

• The tree grows by optimizing only the split at the current node, rather than optimizing the entire tree.
• The tree stops growing when a further split becomes ineffective.

[Figure: Female survival % by age (0 to 70).]
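The two bullets describe greedy recursive partitioning, which can be sketched as follows: at each node, pick the single split with the largest deviance reduction, then grow each child independently. The slide does not say what "ineffective" means, so the stopping parameters `min_gain`, `min_leaf` and `max_depth` are hypothetical choices; the `female` indicator column and the rows are made up for illustration.

```python
import math

def deviance(ys):
    """Binomial deviance of a node that predicts its own survival rate."""
    n, k = len(ys), sum(ys)
    d = 0.0
    if k > 0:
        d -= 2 * k * math.log(k / n)
    if k < n:
        d -= 2 * (n - k) * math.log((n - k) / n)
    return d


def grow(rows, ys, features, min_gain=1.0, min_leaf=2, depth=0, max_depth=3):
    """Greedy recursive partitioning: choose the best split at this node only,
    then grow each child independently."""
    parent = deviance(ys)
    best = None  # (deviance reduction, feature, threshold)
    if depth < max_depth:
        for f in features:
            values = sorted({r[f] for r in rows})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = [y for r, y in zip(rows, ys) if r[f] < t]
                right = [y for r, y in zip(rows, ys) if r[f] >= t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                gain = parent - deviance(left) - deviance(right)
                if best is None or gain > best[0]:
                    best = (gain, f, t)
    # Stopping rule: no allowed split reduces deviance by at least min_gain.
    if best is None or best[0] < min_gain:
        return {"predict": 1 if 2 * sum(ys) >= len(ys) else 0}  # majority, ties -> 1
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] < t]
    right_idx = [i for i, r in enumerate(rows) if r[f] >= t]

    def child(idx):
        return grow([rows[i] for i in idx], [ys[i] for i in idx],
                    features, min_gain, min_leaf, depth + 1, max_depth)

    return {"feature": f, "threshold": t,
            "left": child(left_idx), "right": child(right_idx)}


# Made-up rows: "female" is 1 for women, 0 for men.
rows = [
    {"female": 1, "age": 20}, {"female": 1, "age": 35},
    {"female": 1, "age": 50}, {"female": 1, "age": 70},
    {"female": 0, "age": 4}, {"female": 0, "age": 6},
    {"female": 0, "age": 25}, {"female": 0, "age": 40},
    {"female": 0, "age": 60},
]
ys = [1, 1, 1, 1, 1, 1, 0, 0, 0]
tree = grow(rows, ys, ["female", "age"])
print(tree)
```

On these rows the root splits on `female`, and only the male branch splits again on age (threshold 15.5), a toy version of "Gender and Age". Because each split is chosen locally, the result is not guaranteed to be the best tree overall.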


Age + Gender

Kitchen Sink


