A data driven nomogram for breast cancer survival

Date posted: 17-Jan-2017
Uploaded by: lisa-federer

Capstone by Lisa Federer, Bret Gaulin, Michal Haskell, Andy Pollack, and Carmen Smiley

1 in 8 US women will be diagnosed with breast cancer during their lifetime.

Our hypothesis

Certain demographic, diagnostic, and treatment parameters can reliably predict survival time for women with breast cancer. Using this knowledge, we could build a "calculator" to estimate survival time for individual patients.

The data

Surveillance, Epidemiology, and End Results (SEER) Program, National Cancer Institute: http://seer.cancer.gov/

Data exploration and variable selection

Of the 146 variables in the SEER database, we selected 13 variables of interest:

• Survival time (months)
• Age at diagnosis
• Year of birth
• Race
• Origin (Hispanic recode)
• Stage
• Histology
• Tumor extent
• Number of primary tumors
• Laterality
• ER (estrogen receptor) status
• PR (progesterone receptor) status
• Radiation therapy
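Below is a minimal Python sketch of the selection step. It assumes the SEER extract has been loaded as one dict per case, for example with `csv.DictReader`. The column names in `SELECTED` are hypothetical stand-ins for the 13 variables above, not actual SEER field names. Dropping incomplete cases is also an assumed choice, not necessarily what the team did.

```python
# Hypothetical column names standing in for the 13 variables of interest;
# real SEER extracts use their own field names and codings.
SELECTED = [
    "survival_months", "age_at_diagnosis", "year_of_birth", "race",
    "hispanic_origin", "stage", "histology", "tumor_extent",
    "n_primary_tumors", "laterality", "er_status", "pr_status", "radiation",
]


def select_variables(records, columns=SELECTED):
    """Keep only the chosen columns and drop records missing any of them.

    `records` is an iterable of dicts (e.g. rows from csv.DictReader).
    Complete-case filtering is assumed here for illustration only.
    """
    selected = []
    for row in records:
        subset = {col: row.get(col) for col in columns}
        # Treat absent keys, None, and empty strings as missing values.
        if all(value not in (None, "") for value in subset.values()):
            selected.append(subset)
    return selected
```

Complete-case filtering is the simplest option, but it shrinks the sample whenever a variable is often missing. Imputation is a common alternative.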

Classification by survival time
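The slides do not give the survival-time boundaries for the classes, so the cut points below (12 and 60 months) are hypothetical. The sketch shows how a survival time in months could be mapped to the class labels 0, 1, and 2 that appear in the classification results.

```python
import bisect

# Hypothetical cut points in months; the actual class boundaries are not stated.
CUTOFFS = (12, 60)


def survival_class(months, cutoffs=CUTOFFS):
    """Map a survival time in months to a class label 0, 1, 2, ...

    With cutoffs (12, 60): under 12 -> 0, 12 to 59 -> 1, 60 or more -> 2.
    """
    if months < 0:
        raise ValueError("survival time cannot be negative")
    # bisect_right counts how many cut points are <= months.
    return bisect.bisect_right(cutoffs, months)
```

One caveat applies to this framing. Patients still alive at last follow-up have censored survival times, and binning them as if the event occurred at that time biases the classes. This is one motivation for dedicated survival models.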

Classification model results (classes 0, 1, and 2 are the survival-time classes; support is the number of test cases in each class)

Model                 Class    Precision  Recall  F-score  Support
Logistic Regression   0        0.65       0.43    0.23     146
                      1        0.55       0.37    0.44     416
                      2        0.55       0.76    0.63     471
                      Average  0.56       0.56    0.54     1033
Naïve Bayes           0        0.39       0.16    0.23     162
                      1        0.63       0.3     0.4      402
                      2        0.5        0.84    0.63     466
                      Average  0.54       0.84    0.63     1030
Decision Tree         0        0.3        0.39    0.34     131
                      1        0.76       0.75    0.75     416
                      2        0.8        0.75    0.78     485
                      Average  0.72       0.7     0.71     1032
Random Forest         0        0.58       0.43    0.5      183
                      1        0.73       0.75    0.74     414
                      2        0.71       0.78    0.74     440
                      Average  0.7        0.7     0.7      1037
K-Nearest Neighbors   0        0.53       0.33    0.41     150
                      1        0.83       0.82    0.82     425
                      2        0.77       0.88    0.82     472
                      Average  0.76       0.77    0.76     1047
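Each model in the table is scored per class on four measures:

• Precision: the share of cases predicted as a class that truly belong to it.
• Recall: the share of a class's true cases that were predicted correctly.
• F-score: the harmonic mean of precision and recall.
• Support: the number of test cases in the class.

The column layout matches the output of scikit-learn's `classification_report`, but the slides do not say which tool produced the numbers. The standard-library sketch below recomputes these metrics from true and predicted labels. Treating the "Average" row as a support-weighted mean is an assumption.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Per-class (precision, recall, F-score, support) plus a weighted average.

    The average weights each class by its support (an assumed convention).
    """
    if not y_true:
        raise ValueError("need at least one labelled case")
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)          # true cases per class
    predicted = Counter(y_pred)        # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    rows = {}
    for label in labels:
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        rows[label] = (precision, recall, f_score, support[label])

    total = len(y_true)
    average = tuple(
        sum(rows[label][i] * rows[label][3] for label in labels) / total
        for i in range(3)
    ) + (total,)
    return rows, average
```

For example, `classification_summary([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])` gives class 1 a precision of 2/3 and a recall of 1.0, because all of its true cases were caught but one class 0 case was also labelled as class 1.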

Regression and survival models

Cox proportional hazards model

Aalen’s additive model
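For reference, the standard forms of the two models are given below. Here x is a patient's covariate vector (for example age at diagnosis, stage, or ER status) and the β terms are estimated from the data. The Cox model assumes covariates multiply a shared baseline hazard h₀(t) (proportional hazards). Aalen's model instead adds covariate effects to the hazard and lets each effect vary over time. The survival function follows from either hazard.

```latex
% Cox proportional hazards model
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}\right),
\qquad
S(t \mid \mathbf{x}) = S_0(t)^{\exp\left(\boldsymbol{\beta}^\top \mathbf{x}\right)}

% Aalen's additive model
h(t \mid \mathbf{x}) = \beta_0(t) + \sum_{j=1}^{p} \beta_j(t)\,x_j,
\qquad
S(t \mid \mathbf{x}) = \exp\!\left(-\int_0^t h(u \mid \mathbf{x})\,du\right)
```

For fitting these models in Python, the lifelines library provides `CoxPHFitter` and `AalenAdditiveFitter`. The slides do not say which software the team used.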

Nomogram demonstration
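A nomogram is a graphical form of a fitted model. Each variable contributes points, the points add up to a total, and the total maps to a predicted outcome. Assuming the nomogram is built on the Cox model, the summed contributions form the linear predictor β·x, and predicted survival at time t is S₀(t) raised to exp(β·x). The sketch below performs that calculation directly. The variable names, coefficients, and baseline survival are hypothetical placeholders, not values from the team's model.

```python
import math

# Hypothetical Cox coefficients (log hazard ratios); a real model would be
# fitted to the SEER data.
COEFFICIENTS = {"age_at_diagnosis": 0.03, "stage": 0.6, "er_positive": -0.4}

# Hypothetical baseline survival S0(t) at 60 months for a reference patient
# whose covariates are all zero (covariates are often centered in practice).
BASELINE_SURVIVAL_60M = 0.9


def predicted_survival(patient, coefficients=COEFFICIENTS,
                       baseline=BASELINE_SURVIVAL_60M):
    """Return S(t | x) = S0(t) ** exp(beta . x) for one patient."""
    # The nomogram's "total points" correspond to this linear predictor.
    linear_predictor = sum(coefficients[name] * patient[name]
                           for name in coefficients)
    return baseline ** math.exp(linear_predictor)
```

Under these placeholder values, a patient with a higher stage has a larger linear predictor and therefore a lower predicted 60-month survival. A paper nomogram lets a reader do the same arithmetic with a ruler.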

Limitations

Not a substitute for medical advice

Limitations

Missing or incomplete data in the SEER database

Questions?