Date posted: 30-Dec-2015
Uploaded by: dalton-gaines
Data Mining and Its Applications in Medicine
In the name of God
Student: Babak Razaghi, Student ID: 85233510
Supervisor: Dr. Tohidkhah (Seminar for the course "Applications of Information Technology in Medicine")
Necessity is the mother of invention
Huge amounts of data
Electronic records of our decisions
  Choices in the supermarket
  Financial records
  Our comings and goings
We swipe our way through the world: every swipe is a record in a database
Data rich, but information poor
Lying hidden in all this data is information!
Extracting or “mining” knowledge from large amounts of data
Data-driven discovery and modeling of hidden patterns in large volumes of data
Extraction of implicit, previously unknown and unexpected, potentially extremely useful information from data
Large database
Data mining
Data visualization
Ways of seeing patterns in large data sets, exploiting the efficiency of human pattern recognition
Gold mining
Knowledge mining from databases
Knowledge extraction
Data/pattern analysis
Knowledge Discovery in Databases, or KDD
[Figure: Knowledge Discovery Process. Raw Data → Integration, Understanding → Data Warehouse → Selection & Cleaning → Target Data → Transformation → Transformed Data → Data Mining → Patterns and Rules → Interpretation & Evaluation → Knowledge]
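The knowledge discovery process can be sketched as a chain of small functions, one per stage. This is a minimal illustration under stated assumptions: the patient records, the field names, the "complete records only" cleaning rule, the age-group transformation and the "death rate per group" mining step are all hypothetical choices, and a plain Python list stands in for the integrated data warehouse.

```python
# Minimal, illustrative sketch of the KDD stages on toy records.
# All records, field names and rules below are hypothetical.
from collections import defaultdict

raw_data = [
    {"age": 45, "smoker": "yes", "outcome": "survived"},
    {"age": 70, "smoker": "yes", "outcome": "died"},
    {"age": None, "smoker": "no", "outcome": "survived"},  # incomplete record
    {"age": 66, "smoker": "no", "outcome": "died"},
    {"age": 38, "smoker": "no", "outcome": "survived"},
]

def select_and_clean(records):
    """Selection & cleaning: keep only records with every field present."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: add a coarse age group derived from the exact age."""
    return [{**r, "age_group": "older" if r["age"] > 60 else "younger"} for r in records]

def mine(records):
    """Data mining: death rate per age group, used here as simple 'patterns'."""
    counts = defaultdict(lambda: [0, 0])  # age group -> [deaths, total]
    for r in records:
        counts[r["age_group"]][1] += 1
        if r["outcome"] == "died":
            counts[r["age_group"]][0] += 1
    return {group: deaths / total for group, (deaths, total) in counts.items()}

def evaluate(patterns, min_rate=0.5):
    """Interpretation & evaluation: keep only patterns at or above a threshold."""
    return {group: rate for group, rate in patterns.items() if rate >= min_rate}

target = select_and_clean(raw_data)
knowledge = evaluate(mine(transform(target)))
print(knowledge)  # {'older': 1.0}
```

With only four usable records, a rate of 1.0 is exactly the kind of pattern that may be due to randomness, which is why the evaluation stage matters.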
Find true patterns and avoid overfitting (false patterns due to randomness)
Classification: predicting an item class
Clustering: finding clusters in data
Associations: e.g. A & B & C occur frequently
Visualization: to facilitate human discovery
Summarization: describing a group
Estimation: predicting a continuous value
Deviation detection: finding changes
Link analysis: finding relationships
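The association task ("A & B & C occur frequently") rests on counting support: the fraction of transactions that contain a given itemset. A minimal sketch, assuming hypothetical transactions and a hypothetical helper `frequent_itemsets`; it counts every itemset exhaustively rather than using a pruning algorithm such as Apriori, so it only scales to small item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (e.g. items bought together, or findings
# recorded together for one patient).
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C", "D"},
    {"A", "B", "C"},
]

def frequent_itemsets(transactions, size, min_support):
    """Return itemsets of the given size whose support (fraction of
    transactions containing them) is at least min_support."""
    counts = Counter()
    for t in transactions:
        # sorted() gives each itemset one canonical tuple form
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

print(frequent_itemsets(transactions, 3, min_support=0.6))  # {('A', 'B', 'C'): 0.6}
```

Enumerating every combination grows combinatorially with the number of items, one concrete case of the computational cost of investigating all possibilities.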
Computationally expensive to investigate all possibilities
Dealing with noise/missing information and errors in data
Choosing appropriate attributes/input representation
Finding the minimal attribute space
Finding adequate evaluation function(s)
Extracting meaningful information
Avoiding overfitting
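One simple way of dealing with missing information is mean imputation: fill each gap with the mean of the observed values. A minimal sketch, assuming hypothetical blood-pressure readings where `None` marks a missing measurement; the helper `mean_impute` is hypothetical and also returns a flag per entry, since in clinical data the fact that a value is missing can itself be informative.

```python
def mean_impute(values):
    """Return (filled, was_missing): None entries are replaced by the mean
    of the observed entries, and was_missing flags which were filled."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing

systolic_bp = [120, None, 95, 140, None, 88]  # hypothetical readings
filled, was_missing = mean_impute(systolic_bp)
print(filled)       # [120, 110.75, 95, 140, 110.75, 88]
print(was_missing)  # [False, True, False, False, True, False]
```

Mean imputation is only a baseline: it shrinks the variance of the attribute and ignores relationships with other attributes.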
Insightful Miner
Angoss Knowledge ACCESS
ARMiner
Eudaptics Viscovery SOMine
Goal TV
MDR
SPSS
Science: chemistry, physics
Bioscience:
  Sequence-based analysis
  Protein structure and function prediction
  Protein family classification
  Microarray gene expression
Financial industry: banks, businesses, e-commerce
  Stock and investment analysis
Pharmaceutical companies
Health care
Sports and entertainment
Clinical data mining processes:
  Digital format for all pertinent data
  Create structure
  Obtain coded information
  Natural language understanding
  Create a widely accessible repository
Minimum systolic blood pressure over the 24-hour period following admission to the hospital?
  <= 91: Class 2: Early death
  > 91: Age of patient?
    <= 62.5: Class 1: Survivors
    > 62.5: Was there sinus tachycardia?
      YES: Class 2: Early death
      NO: Class 1: Survivors
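A decision tree like the heart-attack tree on this slide is just a sequence of nested tests. The sketch below assumes the branch layout: minimum systolic blood pressure <= 91 leads to early death; otherwise age <= 62.5 leads to survival; otherwise sinus tachycardia present leads to early death and absent leads to survival. The function name and parameter names are hypothetical, and blood pressure is assumed to be in mmHg.

```python
def classify_heart_attack_patient(min_systolic_bp, age, sinus_tachycardia):
    """Apply the three-test decision tree (hypothetical helper).

    min_systolic_bp: minimum systolic blood pressure (mmHg) over the
        24 hours following admission
    age: patient age in years
    sinus_tachycardia: True if sinus tachycardia was present
    """
    if min_systolic_bp <= 91:
        return "Class 2: Early death"
    if age <= 62.5:
        return "Class 1: Survivors"
    if sinus_tachycardia:
        return "Class 2: Early death"
    return "Class 1: Survivors"

print(classify_heart_attack_patient(85, 50, False))   # Class 2: Early death
print(classify_heart_attack_patient(120, 70, False))  # Class 1: Survivors
```

Each path from the root to a leaf reads as an explicit rule, which is one reason trees are easy to explain to clinicians.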
An organism’s genome is the “program” for making the organism, encoded in DNA
Human DNA has about 30,000-35,000 genes (an early estimate; current estimates are about 20,000 protein-coding genes)
A gene is a segment of DNA that specifies how to make a protein
Cells are different because of differential gene expression
About 40% of human genes are expressed at one time
Microarray devices measure gene expression
[Figure: microarray scanner, enlarged section of the raw image, and the resulting raw data]
Gene               Value
D26528_at            193
D26561_cds1_at       -70
D26561_cds2_at       144
D26561_cds3_at        33
D26579_at            318
D26598_at           1764
D26599_at           1537
D26600_at           1204
D28114_at            707
New and better molecular diagnostics
New molecular targets for therapy: few new drugs, large pipeline, …
Outcome depends on genetic signature: best treatment?
Fundamental biological discovery: finding and refining biological pathways
Personalized medicine?!
Avoiding false positives, which arise from too few records (samples), usually < 100, and too many columns (genes), usually > 1,000
The model needs to be robust in the presence of noise
For reliability, large gene sets are needed; for diagnostics or drug targets, small gene sets are needed
Estimate class probability
The model needs to be explainable to biologists
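The false-positive risk of having few samples and many genes can be shown with a small simulation: generate pure-noise "expression" values with random class labels, then look for the single gene whose best threshold rule separates the classes. A minimal sketch, assuming 20 samples and 1,000 genes drawn from a standard normal distribution; the helper `best_single_gene_accuracy` is hypothetical. No gene carries any real information, so any training accuracy well above 0.5 is a false pattern due to randomness.

```python
import random

def best_single_gene_accuracy(n_samples=20, n_genes=1000, seed=0):
    """Best training accuracy of any one-gene threshold rule on noise data."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # balanced classes 0 and 1
    best = 0.0
    for _ in range(n_genes):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for t in values:  # try each observed value as the threshold
            predicted = [1 if v > t else 0 for v in values]
            correct = sum(p == y for p, y in zip(predicted, labels))
            # max(...) allows the rule in either direction (above = class 1 or 0)
            best = max(best, max(correct, n_samples - correct) / n_samples)
    return best

print(best_single_gene_accuracy())
```

Because the best rule is chosen after looking at the same samples it is scored on, its accuracy says nothing about new patients; held-out samples or cross-validation are needed to expose such false positives.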
20
21
22
23
24
25
26
27
28
29
Discover useful relationships in data
Discover information otherwise overlooked
Provide intelligence to improve various phases
Intellectual property
Competitive advantages:
  Getting more out of your data
  Finding other relevant information faster
  Exploratory, hypothesis-generating analyses
Increase productivity: less time and money spent