Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005.

Post on 20-Jan-2016


Artificial Intelligence Project #3: Diagnosis Using Bayesian Networks

May 19, 2005

(c) 2005 SNU CSE Biointelligence Lab


Goals of the Project

Analysis of the influence of network size and data size on structural learning of Bayesian networks
  Six Bayesian networks of various sizes are given.
  Generate data examples from each Bayesian network.
  Learn Bayesian network structures from the generated data.
  Analyze the learned results according to the network size and the data size.

Classification using Bayesian networks
  A microarray dataset consisting of two classes of samples is given.
  Learn Bayesian network classifiers from the dataset.
  Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.


Given Bayesian Networks

Randomly generated
  Network structure: scale-free and modular
  # of variables: 10, 30, and 45
  All variables are binary
Network file format: *.dsc for MSBNX (http://research.microsoft.com/adapt/MSBNx/)


Example Bayesian Network Structure I


Example Bayesian Network Structure II


*.dsc Files

[Figure: annotated *.dsc file with callouts for node name, possible states, parents, child, and conditional probability distribution]


Data Generation

[Figure: example network with edges X1 → X3, X1 → X4, X2 → X4, X3 → X5, X4 → X6]

1. Sample X1 from P(X1)

2. Sample X2 from P(X2)

3. Sample X3 from P(X3| X1)

4. Sample X4 from P(X4| X1, X2)

5. Sample X5 from P(X5| X3)

6. Sample X6 from P(X6| X4)
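The six steps above amount to sampling each variable after its parents (ancestral sampling). Below is a minimal Python sketch of that idea for the example network. The conditional probability values in `CPT` are made up for illustration and are not taken from any of the provided .dsc files; the function names are likewise hypothetical and unrelated to the provided data_generator tool.

```python
import random

# Parents of each node, listed so that every parent appears before its children.
PARENTS = {
    "X1": [],
    "X2": [],
    "X3": ["X1"],
    "X4": ["X1", "X2"],
    "X5": ["X3"],
    "X6": ["X4"],
}

# Hypothetical CPTs: P(node = 1 | parent values), keyed by the tuple of parent values.
CPT = {
    "X1": {(): 0.3},
    "X2": {(): 0.6},
    "X3": {(0,): 0.2, (1,): 0.7},
    "X4": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "X5": {(0,): 0.25, (1,): 0.8},
    "X6": {(0,): 0.15, (1,): 0.65},
}


def sample_once(rng):
    """Draw one complete example, visiting nodes in the order of PARENTS."""
    values = {}
    for node, parents in PARENTS.items():
        key = tuple(values[p] for p in parents)  # already-sampled parent values
        values[node] = 1 if rng.random() < CPT[node][key] else 0
    return values


def generate(n, seed=0):
    """Generate n independent examples with a reproducible random seed."""
    rng = random.Random(seed)
    return [sample_once(rng) for _ in range(n)]


if __name__ == "__main__":
    for row in generate(5):
        print(row)
```

Because every node is binary, storing only P(node = 1 | parents) is enough; the order of keys in `PARENTS` doubles as the sampling order, so it must stay topological.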


Data Generation Tool

data_generator
  Usage: data_generator [network file style] [# of nodes] [# of data samples] [input file] [output file]...


Structural Learning of Bayesian Networks

Using WEKA software (http://www.cs.waikato.ac.nz/ml/weka/)


Learning Example

The original network structure

Learned network structure


Materials for the First One

Given
  Bayesian networks: sf_10.dsc, sf_30.dsc, sf_45.dsc, md_10.dsc, md_30.dsc, md_45.dsc
  Data generation tool: data_generator.exe [for Windows], data_generator [for Linux]

Downloadable
  MSBNX (http://research.microsoft.com/adapt/MSBNx/)
  WEKA (http://www.cs.waikato.ac.nz/ml/weka/)

You should write your own code for comparing Bayesian network structures.
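One possible starting point for such comparison code is sketched below in Python. It assumes both structures have already been extracted as lists of directed edges (parent, child); parsing .dsc files or WEKA output is not covered. The function name and the choice of metrics (precision, recall, and a structural Hamming distance that counts missing, extra, and reversed edges) are illustrative assumptions, not requirements of the project.

```python
def compare_structures(true_edges, learned_edges):
    """Compare two DAGs given as iterables of (parent, child) pairs."""
    true_set = set(true_edges)
    learned_set = set(learned_edges)

    # Edges present in both graphs with the same direction.
    correct = true_set & learned_set
    # Learned edges whose opposite direction exists in the true graph.
    reversed_edges = {(a, b) for (a, b) in learned_set - true_set
                      if (b, a) in true_set}
    # Learned edges with no counterpart (in either direction) in the true graph.
    extra = learned_set - true_set - reversed_edges
    # True edges absent from the learned graph in both directions.
    missing = {(a, b) for (a, b) in true_set - learned_set
               if (b, a) not in learned_set}

    precision = len(correct) / len(learned_set) if learned_set else 0.0
    recall = len(correct) / len(true_set) if true_set else 0.0
    return {
        "correct": len(correct),
        "reversed": len(reversed_edges),
        "extra": len(extra),
        "missing": len(missing),
        "precision": precision,
        "recall": recall,
        "shd": len(missing) + len(extra) + len(reversed_edges),
    }


if __name__ == "__main__":
    true_edges = [("X1", "X3"), ("X1", "X4"), ("X2", "X4"),
                  ("X3", "X5"), ("X4", "X6")]
    # Hypothetical learned structure, for illustration only.
    learned_edges = [("X1", "X3"), ("X4", "X1"), ("X2", "X4"),
                     ("X3", "X5"), ("X5", "X6")]
    print(compare_structures(true_edges, learned_edges))
```

Counting reversed edges separately can be useful because edge direction is often not identifiable from data alone, so a reversed edge is arguably a milder error than a missing or spurious one.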


Study

Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 34, 2003.

60 leukemia patients → Bone marrow samples → Affymetrix GeneChip arrays → Gene expression data


Gene Expression Data

# of data examples: 120 (60 before treatment, 60 after treatment)

# of genes measured: 12600 (Affymetrix HG-U95A array)

Task: classification between “before treatment” and “after treatment” based on gene expression pattern


Affymetrix GeneChip Arrays

Use short oligos to detect gene expression level.
Each gene is probed by a set of short oligos.
Each gene expression level is summarized by
  Signal: numerical value describing the abundance of mRNA
  A/P call: denotes the statistical significance of signal


Preprocessing

Remove the genes having more than 60 ‘A’ calls
  # of genes: 12600 → 3190

Discretization of gene expression level
  Criterion: median gene expression value of each sample
  0 (low) and 1 (high)
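The two preprocessing steps can be sketched in Python as follows. The provided data2.txt is already preprocessed, so this is only an illustration of the idea. It assumes the calls and signals are held as gene-by-sample nested lists, that a value strictly above its sample's median maps to 1 (ties go to 0), and that the median is taken over whichever genes are passed in; none of these details are stated on the slide, and the function names are hypothetical.

```python
from statistics import median


def filter_by_absent_calls(calls, max_absent=60):
    """Return indices of genes with at most max_absent 'A' (absent) calls.

    calls[g][s] is the A/P call of gene g in sample s.
    """
    return [g for g, row in enumerate(calls)
            if sum(c == "A" for c in row) <= max_absent]


def discretize_by_sample_median(signal):
    """Binarize signal[g][s]: 1 if above the median of sample s, else 0."""
    n_genes = len(signal)
    n_samples = len(signal[0])
    # One threshold per sample (column), computed over all genes given.
    medians = [median(signal[g][s] for g in range(n_genes))
               for s in range(n_samples)]
    return [[1 if signal[g][s] > medians[s] else 0 for s in range(n_samples)]
            for g in range(n_genes)]


if __name__ == "__main__":
    # Tiny made-up example: 3 genes x 4 samples.
    calls = [["P", "A", "P", "P"],
             ["A", "A", "A", "P"],
             ["P", "P", "A", "A"]]
    signal = [[10, 200, 30, 5],
              [50, 100, 60, 40],
              [90, 20, 10, 80]]
    print(filter_by_absent_calls(calls, max_absent=2))
    print(discretize_by_sample_median(signal))
```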


Gene Filtering

Using mutual information

$$I(G; C) = \sum_{G, C} P(G, C) \log \frac{P(G, C)}{P(G)\, P(C)}$$

  Estimated probabilities were used.
  # of genes: 3190 → 1000

Final dataset
  # of attributes: 1001 (one for the class)
  Class: 0 (after treatment), 1 (before treatment)
  # of data examples: 120
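As an illustration of mutual-information filtering, the Python sketch below estimates I(G; C) from relative frequencies of a discretized gene G and the class C, then keeps the k highest-scoring genes. The logarithm base is not given on the slide; base 2 is assumed here, which does not affect the ranking. The function names and the tie-breaking rule (lower gene index first) are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(gene, labels):
    """Estimate I(G; C) in bits from two equal-length discrete sequences."""
    n = len(labels)
    joint = Counter(zip(gene, labels))
    count_g = Counter(gene)
    count_c = Counter(labels)
    mi = 0.0
    # Only observed (g, c) pairs are visited, so 0 * log 0 terms never arise.
    for (g, c), count in joint.items():
        p_gc = count / n
        mi += p_gc * log2(p_gc / ((count_g[g] / n) * (count_c[c] / n)))
    return mi


def top_genes(genes, labels, k):
    """Return indices of the k genes with the highest estimated I(G; C)."""
    scores = [(mutual_information(values, labels), i)
              for i, values in enumerate(genes)]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scores[:k]]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    genes = [[0, 1, 0, 1],   # independent of the class
             [0, 0, 1, 1]]   # identical to the class
    print([mutual_information(g, labels) for g in genes])
    print(top_genes(genes, labels, k=1))
```

With 120 examples and binary variables, each probability is estimated from at most four joint cells, so the scores are cheap to compute for all 3190 genes.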


Final Dataset

[Figure: data matrix of 120 samples × 1000 genes]


Materials for the Second One

Given
  Preprocessed microarray data file: data2.txt

Downloadable
  WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
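For orientation, the simplest Bayesian network classifier is naive Bayes, in which the class node is the only parent of every attribute. The Python sketch below trains such a classifier on binary attributes and estimates accuracy by leave-one-out cross-validation. It is not a substitute for the WEKA experiments; it assumes the data are already loaded as lists of 0/1 values (the layout of data2.txt is not described here), and the Laplace smoothing constant and function names are illustrative choices.

```python
from math import log


def train_naive_bayes(X, y, alpha=1.0):
    """Fit P(class) and P(attribute_j = 1 | class) with Laplace smoothing."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(y)
        p1 = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
              for j in range(n_features)]
        model[c] = (log(prior), p1)
    return model


def predict(model, x):
    """Return the class with the highest log posterior score for example x."""
    def score(c):
        log_prior, p1 = model[c]
        return log_prior + sum(log(p if v == 1 else 1.0 - p)
                               for v, p in zip(x, p1))
    return max(model, key=score)


def leave_one_out_accuracy(X, y):
    """Train on all examples but one, test on the held-out one, and average."""
    hits = 0
    for i in range(len(X)):
        model = train_naive_bayes(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)


if __name__ == "__main__":
    # Made-up toy data: both attributes copy the class label.
    X = [[0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]
    y = [0, 0, 0, 1, 1, 1]
    print(leave_one_out_accuracy(X, y))
```

The same held-out evaluation protocol should be applied to every classifier being compared, so that accuracy differences reflect the models and not the data split.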


Due: June 16, 2005

Analysis of the influence of network size and data size on structural learning of Bayesian networks
  Six Bayesian networks of various sizes are given.
  Generate data examples from each Bayesian network.
  Learn Bayesian network structures from the generated data.
  Analyze the learned results according to the network size and the data size.

Classification using Bayesian networks
  A microarray dataset consisting of two classes of samples is given.
  Learn Bayesian network classifiers from the dataset.
  Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.