
Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning

Date post: 07-Jan-2016

Jian Zhang

Supervised by: Karen Petrie


Background

Cancer research has become an extremely data-rich environment.

Many analysis packages are available for analyzing these data.

Before analysis, the raw data must be preprocessed.


Rich data environment


• Breast cancer data involve many factors


Raw clinical data sample

Yes-No data:

yes: yes, Yes, Ye, yed, yef …

no: No, n, not …

null: don’t know, no data, waiting for lab

Positive-Negative data:

Positive: +, ++, p, p++ …

Negative: -, n, neg, n--- …

Null: no data, ruined sample, waiting for lab
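A naive way to clean such a column is a hand-written lookup table that maps each known variant to a clean value. The Python sketch below is a hypothetical illustration: the names YES_NO, POS_NEG and normalize are ours, and the tables contain only the variants listed above, since the slide's "…" lists are incomplete. It also shows the brittleness that motivates automation: any variant not in the table, such as a new typo, is simply not recognized. Because "n" means "no" in one column and "negative" in the other, each column needs its own table.

```python
from typing import Dict, Optional

# Hypothetical per-column lookup tables, limited to the variants shown on the slide.
# Keys are lower-cased, so "Yes" and "No" are covered by "yes" and "no".
YES_NO: Dict[str, str] = {
    "yes": "yes", "ye": "yes", "yed": "yes", "yef": "yes",
    "no": "no", "n": "no", "not": "no",
    "don't know": "null", "don’t know": "null",
    "no data": "null", "waiting for lab": "null",
}
POS_NEG: Dict[str, str] = {
    "+": "positive", "++": "positive", "p": "positive", "p++": "positive",
    "-": "negative", "n": "negative", "neg": "negative", "n---": "negative",
    "no data": "null", "ruined sample": "null", "waiting for lab": "null",
}

def normalize(raw: str, table: Dict[str, str]) -> Optional[str]:
    """Return the clean value for a raw entry, or None if the variant is not in the table."""
    return table.get(raw.strip().lower())

print(normalize("Yes", YES_NO))    # yes
print(normalize("n", POS_NEG))     # negative
print(normalize("yess", YES_NO))   # None: an unlisted typo is not recognized
```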


Basic version


Question?

Can this process be automated?


Introduction

Decision tree learning

Weka


Decision Tree Learning

Decision tree learning is a method for approximating discrete-valued target functions, and it is one of the most widely used inductive learning algorithms.
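Learners in the ID3 family grow a tree top-down, choosing at each node the attribute that best separates the classes. ID3 measures this with information gain, the reduction in entropy after a split; C4.5, which Weka implements as J48, uses the related gain ratio. For a set S of entries with class proportions p_c and an attribute A, where S_v is the subset of S with A = v:

$$H(S) = -\sum_{c} p_c \log_2 p_c, \qquad \mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$

As a worked example with hypothetical attributes, take S = {yes, Ye, no, not} with clean labels yes, yes, no, no, so H(S) = 1 bit. Splitting on the first letter gives the pure subsets {yes, Ye} and {no, not}; splitting on "length ≥ 3" gives {yes, not} and {Ye, no}, each with entropy 1:

$$\mathrm{Gain}(S, \text{first letter}) = 1 - \left(\tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 0\right) = 1, \qquad \mathrm{Gain}(S, \text{length} \ge 3) = 1 - \left(\tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot 1\right) = 0$$

so the first letter would be chosen as the root test.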


Decision tree sample
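The sample tree from this slide is not reproduced in the transcript. As an illustration only, the Python sketch below grows an ID3-style tree (not one of the Weka learners used in the experiments) on the positive/negative variants from the raw data slide. The character features first_char, has_plus and has_minus are hypothetical, not the features used in the talk. When an entry produces an attribute value never seen during training, the sketch falls back to the majority label at that node, which is one simple way predictions for unknown entries can go wrong.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def features(raw):
    """Hypothetical character features of a raw entry (illustrative only)."""
    s = raw.strip().lower()
    return {"first_char": s[:1], "has_plus": "+" in s, "has_minus": "-" in s}

def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Grow a tree: a leaf is a label, an inner node is (attr, branches, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches, majority)

def predict(tree, row):
    """Follow the branches; an unseen attribute value falls back to the node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree

# Training data: the positive/negative variants listed on the raw data slide.
RAW = ["+", "++", "p", "p++", "-", "n", "neg", "n---"]
LABELS = ["positive"] * 4 + ["negative"] * 4
tree = id3([features(r) for r in RAW], LABELS, ["first_char", "has_plus", "has_minus"])

for raw in ["Pos", "Neg.", "x"]:
    print(raw, "->", predict(tree, features(raw)))
```

On this toy data the first character alone separates the classes, so the root tests first_char; the entry "x" starts with a character never seen in training and therefore receives the root's majority label rather than a meaningful prediction.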


Weka

Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, which contains a collection of algorithms for data analysis and predictive modeling.
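Weka loads data from its ARFF text format: an @RELATION line, one @ATTRIBUTE declaration per column (NUMERIC, a nominal set such as {positive,negative}, or STRING), then the @DATA rows. Tree learners such as J48 do not accept STRING attributes directly (a filter such as StringToWordVector would be needed first), so the Python sketch below encodes each raw entry as numeric features before writing ARFF. The feature set and the names FEATURES, extract and to_arff are hypothetical; the talk does not say how entries were encoded for Weka.

```python
from typing import List, Sequence

# Hypothetical numeric features; the talk does not say how entries were encoded.
FEATURES = ("has_plus", "has_minus", "starts_with_y", "starts_with_n", "length")

def extract(raw: str) -> List[int]:
    """Encode one raw entry as the hypothetical numeric features above."""
    s = raw.strip().lower()
    return [int("+" in s), int("-" in s), int(s.startswith("y")),
            int(s.startswith("n")), len(s)]

def to_arff(relation: str, entries: Sequence[str], labels: Sequence[str],
            classes: Sequence[str]) -> str:
    """Render entries and their clean labels as ARFF text that Weka can load."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in FEATURES]
    # The last attribute is the nominal class; Weka's Explorer uses it as the class by default.
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for raw, label in zip(entries, labels):
        lines.append(",".join(str(v) for v in extract(raw)) + f",{label}")
    return "\n".join(lines) + "\n"

print(to_arff("pos_neg", ["+", "neg"], ["positive", "negative"], ["positive", "negative"]))
```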


Experiment

Data: training dataset with 100 instances

Test dataset with 100 instances, containing 17 values that do not appear in the training dataset

Tool: Weka
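The slide does not say exactly how the 17 differing values were counted. Assuming raw values are compared verbatim (so "No" and "no" count as different), the Python sketch below counts both the distinct unseen values and the test instances that carry them; the function names and the toy data are hypothetical.

```python
from typing import Iterable, List

def unseen_values(train: Iterable[str], test: Iterable[str]) -> List[str]:
    """Distinct raw test values that never occur verbatim in the training data."""
    seen = set(train)
    return sorted(set(test) - seen)

def unseen_instances(train: Iterable[str], test: Iterable[str]) -> int:
    """Number of test instances whose raw value never occurs in the training data."""
    seen = set(train)
    return sum(1 for value in test if value not in seen)

train = ["yes", "Yes", "no", "n"]
test = ["yes", "yess", "No", "no", "No"]
print(unseen_values(train, test))     # ['No', 'yess']
print(unseen_instances(train, test))  # 3
```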


Experiment

Experiment 1: training dataset

Experiment 2: training dataset and test dataset
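The slides do not state how the "Correctly Classified Instances" column was measured. Its values are identical in both experiments, which suggests it was computed on the training data alone, for example with cross-validation, the default test option in Weka's Explorer (10 folds). This is our reading, not something the slides confirm. The Python sketch below shows how k-fold splits partition the instances; the function name is ours, and unlike Weka's cross-validation it neither shuffles nor stratifies by class.

```python
from typing import List, Tuple

def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold serves once as the test set.

    Unlike Weka, this sketch neither shuffles nor stratifies by class.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

# With 100 training instances and k = 10, each fold holds out 10 instances.
folds = k_fold_indices(100, 10)
print([len(test) for _, test in folds])
```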


Experiment 1

Name of tree | Correctly classified instances (%) | Testing (%) | Root mean squared error

BFTree | 89 | 99 | 0.0588
DecisionStump | 47 | 55 | 0.422
FT | 87 | 98 | 0.1698
J48 | 82 | 98 | 0.0976
J48graft | 82 | 98 | 0.0976
LADTree | 81 | 90 | 0.2317
LMT | 84 | 91 | 0.2344
NBTree | 80 | 98 | 0.2326
RandomForest | 83 | 100 | 0.0781
RandomTree | 83 | 100 | 0.0447
REPTree | 82 | 98 | 0.0985
SimpleCart | 89 | 96 | 0.1511
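For a nominal class, Weka reports root mean squared error over the predicted class-probability distributions rather than over hard labels. As we understand Weka's Evaluation class, each instance contributes the squared differences between its predicted probabilities and the 0/1 indicator of the true class, divided by the number of classes, and the RMSE is the square root of the mean over instances. This would explain how RandomForest and RandomTree reach 100% testing accuracy yet still have non-zero RMSE. A Python sketch of that computation (the function name is ours):

```python
import math
from typing import Sequence

def class_prob_rmse(probs: Sequence[Sequence[float]], actual: Sequence[int]) -> float:
    """RMSE between predicted class distributions and one-hot true classes.

    Each instance's squared error is averaged over the classes, then the
    square root of the mean over instances is returned.
    """
    total = 0.0
    for dist, true_class in zip(probs, actual):
        sq = sum((p - (1.0 if c == true_class else 0.0)) ** 2 for c, p in enumerate(dist))
        total += sq / len(dist)
    return math.sqrt(total / len(actual))

# A correct but uncertain prediction still adds error: (0.2**2 + 0.2**2) / 2 = 0.04.
print(class_prob_rmse([[0.8, 0.2]], [0]))                 # about 0.2
print(class_prob_rmse([[1.0, 0.0], [0.0, 1.0]], [0, 1]))  # 0.0
```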


Experiment 2

Name of tree | Correctly classified instances (%) | Testing (%) | Root mean squared error

BFTree | 89 | 88 | 0.2813
DecisionStump | 47 | 49 | 0.4318
FT | 87 | 90 | 0.2194
J48 | 82 | 88 | 0.2098
J48graft | 82 | 88 | 0.2098
LADTree | 81 | 89 | 0.2494
LMT | 84 | 89 | 0.234
NBTree | 80 | 88 | 0.2569
RandomForest | 83 | 88 | 0.2095
RandomTree | 83 | 88 | 0.209
REPTree | 82 | 88 | 0.2098
SimpleCart | 89 | 87 | 0.2848


Result

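The results show that the decision trees classify and predict entries already present in the training data well (most reach 98 to 100% testing accuracy in Experiment 1), but their predictions for the test dataset, which contains entries unseen in training, are worse than expected (87 to 90% in Experiment 2, apart from DecisionStump at 49%).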
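To quantify this gap, the Python sketch below subtracts each tree's Experiment 2 testing accuracy from its Experiment 1 testing accuracy, using the percentages reported in the two experiment tables; the dictionary and function names are ours.

```python
# Testing accuracy (%) per tree, taken from the Experiment 1 and Experiment 2 tables.
EXP1_TESTING = {
    "BFTree": 99, "DecisionStump": 55, "FT": 98, "J48": 98, "J48graft": 98,
    "LADTree": 90, "LMT": 91, "NBTree": 98, "RandomForest": 100,
    "RandomTree": 100, "REPTree": 98, "SimpleCart": 96,
}
EXP2_TESTING = {
    "BFTree": 88, "DecisionStump": 49, "FT": 90, "J48": 88, "J48graft": 88,
    "LADTree": 89, "LMT": 89, "NBTree": 88, "RandomForest": 88,
    "RandomTree": 88, "REPTree": 88, "SimpleCart": 87,
}

def accuracy_drop(before, after):
    """Percentage points lost per tree when moving from `before` to `after`."""
    return {name: before[name] - after[name] for name in before}

drops = accuracy_drop(EXP1_TESTING, EXP2_TESTING)
for name, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14} {drop:>3} points")
```

The trees that fit the training data most closely (RandomForest and RandomTree, both at 100%) lose the most, 12 points each, while LADTree and LMT lose only 1 and 2 points.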


Future work

Detect and correct incorrect predictions during the transformation process

Automate the transformation of unknown entries
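The talk leaves both items open. One simple option for unknown entries, not taken from the talk, is to fall back to fuzzy string matching against the known variants with Python's standard difflib.get_close_matches, and to flag every non-exact result for manual review. In the sketch below, the table, the 0.8 similarity cutoff and the function name are hypothetical.

```python
import difflib
from typing import Dict, Optional, Tuple

# Hypothetical yes/no table; the 0.8 cutoff below is also an assumption.
YES_NO: Dict[str, str] = {
    "yes": "yes", "ye": "yes", "yed": "yes", "yef": "yes",
    "no": "no", "n": "no", "not": "no",
    "no data": "null", "waiting for lab": "null",
}

def normalize_with_fallback(raw: str, table: Dict[str, str],
                            cutoff: float = 0.8) -> Tuple[Optional[str], str]:
    """Return (clean value, how it was found): "exact", "fuzzy" or "unknown"."""
    key = raw.strip().lower()
    if key in table:
        return table[key], "exact"
    # Closest known variant by difflib's similarity ratio, if it clears the cutoff.
    match = difflib.get_close_matches(key, list(table), n=1, cutoff=cutoff)
    if match:
        return table[match[0]], "fuzzy"  # worth flagging for manual review
    return None, "unknown"  # route to manual review

for raw in ["No", "yess", "maybe"]:
    print(raw, normalize_with_fallback(raw, YES_NO))
```

Keeping the match type alongside the value addresses the first item as well: every "fuzzy" or "unknown" result is a candidate incorrect prediction that a person can check.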


Thank you!
