Phenotype Prediction by Integrative Network Analysis of SNP and Gene Expression Microarrays

Post on 11-Jan-2016

Phenotype Prediction by Integrative Network Analysis of SNP and Gene Expression Microarrays. Hsun-Hsien Chang1, Michael McGeachie1,2. 1 Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School

1

Phenotype Prediction by Integrative Network Analysis of SNP and Gene Expression Microarrays

Hsun-Hsien Chang1, Michael McGeachie1,2

1 Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School

2 Channing Lab, Brigham and Women's Hospital

September 3, 2011

2

Genetic Information Flows from DNA to RNA

• Central dogma of molecular biology.

• Research goals:
  – Decipher how genetic variants influence RNA transcript expression and thereby lead to disease.
  – Create clinical tools to support diagnosis and prognosis, design treatment strategies, etc.

3

Measure Genetic Variants and RNA Abundance by Microarrays

• Genetic variants are measured as single nucleotide polymorphisms (SNPs).
  – Modeled by discrete (multinomial) random variables.
  – Microarrays can assess 500K SNPs in parallel.

• RNA abundance is measured as transcript expression levels.
  – Modeled by continuous (log-normal) random variables.
  – Microarrays can assess 50K transcripts in parallel.
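The slide states the two distributional assumptions but not how the models are fitted. As an illustration only, the Python sketch below computes the maximum-likelihood log-likelihood of one SNP under a multinomial model and of one transcript under a log-normal model. The function names and the 0/1/2 genotype coding are hypothetical choices, not taken from the presentation.

```python
import math
from collections import Counter


def snp_log_likelihood(genotypes):
    """Log-likelihood of one SNP's genotype calls under a multinomial
    (categorical) model with maximum-likelihood frequencies.

    genotypes: list of discrete calls, e.g. 0/1/2 minor-allele counts
    (a hypothetical coding).
    """
    n = len(genotypes)
    counts = Counter(genotypes)
    # Each observed category contributes count * log(estimated frequency).
    return sum(c * math.log(c / n) for c in counts.values())


def expression_log_likelihood(levels):
    """Log-likelihood of one transcript's positive expression levels under
    a log-normal model with maximum-likelihood parameters.

    Requires at least two distinct values so the variance is non-zero.
    """
    logs = [math.log(x) for x in levels]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((y - mu) ** 2 for y in logs) / n  # MLE (biased) variance
    # Log-normal log-density: normal log-density of log(x) minus log(x).
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var) - y
        for y in logs
    )


print(snp_log_likelihood([0, 1, 1, 2]))
print(expression_log_likelihood([120.0, 95.5, 210.3, 180.0]))
```

Working on log(x) is what makes the continuous model log-normal: it is an ordinary normal model for log expression plus the change-of-variables term.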

4

Identify SNP-Transcript Dependence

[Figure: transcript expression levels labeled high, medium, and low]

• Challenges:
  – Need an efficient method to compare SNP-transcript pairs: 500K SNPs × 50K transcripts gives about 25 billion pairs.
  – Need a network analysis to capture molecular interactions between SNPs and transcripts.

5

Reduce Dimensionality by Phenotypes

[Diagram: each data set is filtered separately by phenotype]

• SNP microarrays (discrete variables) → filter by phenotypes (Bayes factor)

• Expression microarrays (continuous variables) → filter by phenotypes (Bayes factor)
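The slides name the Bayes factor as the filter but give neither the priors nor the cutoff. The sketch below is a minimal version under two hypothetical choices: a symmetric Dirichlet prior on SNP genotype frequencies and a conjugate Normal-Gamma prior on log expression. For each feature, the Bayes factor compares a model in which its distribution differs between phenotype groups with one in which all samples share a single distribution. All names and default hyperparameters are assumptions.

```python
import math
from collections import Counter


def _log_dirichlet_categorical(values, categories, alpha):
    """Log marginal likelihood of a sequence of discrete values under a
    categorical model with a symmetric Dirichlet(alpha) prior.
    `categories` must cover every value that can occur."""
    counts = Counter(values)
    k, n = len(categories), len(values)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in categories:
        out += math.lgamma(alpha + counts[c]) - math.lgamma(alpha)
    return out


def _log_normal_gamma(xs, mu0, kappa0, a0, b0):
    """Log marginal likelihood of real values under a normal model with a
    conjugate Normal-Gamma prior on (mean, precision)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    kappa_n = kappa0 + n
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2 * kappa_n)
    return (math.lgamma(a_n) - math.lgamma(a0)
            + a0 * math.log(b0) - a_n * math.log(b_n)
            + 0.5 * math.log(kappa0 / kappa_n)
            - 0.5 * n * math.log(2 * math.pi))


def _group(values, phenotypes):
    """Split values into one list per phenotype label."""
    groups = {}
    for v, p in zip(values, phenotypes):
        groups.setdefault(p, []).append(v)
    return list(groups.values())


def snp_log_bayes_factor(genotypes, phenotypes, categories=(0, 1, 2), alpha=1.0):
    """log BF: genotype frequencies differ by phenotype vs. are shared."""
    dependent = sum(_log_dirichlet_categorical(g, categories, alpha)
                    for g in _group(genotypes, phenotypes))
    return dependent - _log_dirichlet_categorical(genotypes, categories, alpha)


def expression_log_bayes_factor(levels, phenotypes,
                                mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
    """log BF: log expression differs by phenotype vs. is shared."""
    logs = [math.log(x) for x in levels]
    dependent = sum(_log_normal_gamma(g, mu0, kappa0, a0, b0)
                    for g in _group(logs, phenotypes))
    return dependent - _log_normal_gamma(logs, mu0, kappa0, a0, b0)
```

Under this sketch, a feature would be kept when its log Bayes factor exceeds a chosen cutoff. The cutoff and the hyperparameters (alpha, mu0, kappa0, a0, b0) are assumptions to be tuned. The Normal-Gamma defaults suit log expression values of moderate scale.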

6

Model SNP-Transcript Dependence

[Diagram: reduced SNP data as nodes S1, …, SM; reduced expression data as nodes G1, …, GN]

A SNP can be influenced by other SNPs.

A transcript can be influenced by SNPs and other transcripts.
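The two dependence rules above can be encoded as a constraint on which variables may act as modulators of a given variable. This is a small sketch. The type labels "snp" and "transcript" and the helper names are hypothetical.

```python
def is_allowed_edge(parent_type, child_type):
    """Return True if a variable of parent_type may modulate a variable of
    child_type: SNPs are influenced only by SNPs; transcripts by SNPs or
    other transcripts."""
    if child_type == "snp":
        return parent_type == "snp"
    if child_type == "transcript":
        return parent_type in ("snp", "transcript")
    raise ValueError(f"unknown variable type: {child_type!r}")


def candidate_parents(child, variable_types):
    """List the variables allowed to modulate `child`.

    variable_types: dict mapping variable name -> "snp" or "transcript".
    """
    child_type = variable_types[child]
    return [name for name, t in variable_types.items()
            if name != child and is_allowed_edge(t, child_type)]


types = {"S1": "snp", "S2": "snp", "G1": "transcript", "G2": "transcript"}
print(candidate_parents("S1", types))  # → ['S2']
print(candidate_parents("G1", types))  # → ['S1', 'S2', 'G2']
```

Restricting candidates this way never lets a transcript modulate a SNP. This is consistent with the DNA-to-RNA direction of information flow and also shrinks the search space.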

7

Interplay of Phenotypes, SNPs, and Transcripts

• Network analysis is performed on the reduced data set.

• For each variable, find the set of modulating variables with the highest likelihood.

• Implement a greedy search algorithm to find the best network.

[Diagram: SNP-transcript network including a phenotype node, "Pheno"]
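The per-variable greedy search can be sketched as forward selection of modulators. This Python sketch rests on several assumptions that the slides do not state. All variables are treated as discrete, with transcripts binned into low, medium, and high levels. The score is the log-likelihood with a BIC penalty, so that adding modulators does not always raise the score. Each search is capped at a hypothetical max_parents. The sketch does not check that the assembled network is acyclic.

```python
import math
from collections import Counter


def family_score(child, parents, data):
    """BIC-penalized log-likelihood of a discrete child given discrete parents.

    data: dict mapping variable name -> list of discrete values (one per sample).
    """
    n = len(data[child])
    # Parent configuration of each sample (empty tuple if no parents).
    configs = list(zip(*(data[p] for p in parents))) if parents else [()] * n
    joint = Counter(zip(configs, data[child]))
    config_counts = Counter(configs)
    loglik = sum(c * math.log(c / config_counts[cfg])
                 for (cfg, _), c in joint.items())
    # Free parameters: (child states - 1) per possible parent configuration.
    n_configs = math.prod(len(set(data[p])) for p in parents)
    n_params = n_configs * (len(set(data[child])) - 1)
    return loglik - 0.5 * n_params * math.log(n)


def greedy_parents(child, candidates, data, max_parents=3):
    """Forward greedy search: repeatedly add the candidate that most improves
    the score; stop when no addition helps or max_parents is reached."""
    chosen = []
    best = family_score(child, chosen, data)
    while len(chosen) < max_parents:
        scored = [(family_score(child, chosen + [c], data), c)
                  for c in candidates if c not in chosen]
        if not scored:
            break
        score, cand = max(scored)
        if score <= best:
            break
        chosen.append(cand)
        best = score
    return chosen, best


# Toy data (made up): G1's binned level tracks S1 exactly; S2 is unrelated.
s1 = [0, 1, 2] * 6
s2 = [0, 0, 0, 1, 1, 1, 2, 2, 2] * 2
g1 = [["low", "medium", "high"][v] for v in s1]
data = {"S1": s1, "S2": s2, "G1": g1}
print(greedy_parents("G1", ["S1", "S2"], data))
```

In a full run, the candidate list for each variable would come from the SNP/transcript rules on the previous slide. A continuous score could replace the discrete one for expression data.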

8

Pediatric Acute Lymphoblastic Leukemia (ALL)

• Mutation of lymphoblasts leads to acute lymphoblastic leukemia (ALL).

• The two types of ALL respond differently to chemotherapy:
  – B-cell precursor ALL (BCP-ALL)
  – common ALL (C-ALL)

9

A SNP-Transcript Network Distinguishes Pediatric Acute Lymphoblastic Leukemia

• Data from the Gene Expression Omnibus (GEO), accession number GSE10792.

• 28 patients; 8 with BCP-ALL and 20 with C-ALL.

• Genotyped at 100K SNPs using Affymetrix Human Mapping 100K Set microarrays.

• Expression of 50K transcripts profiled using Affymetrix HG-U133 Plus 2.0 arrays.

• 96% phenotype classification accuracy.
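The slides do not say how the 96% accuracy was estimated. One possible illustration is leave-one-out cross-validation. With 28 patients, each patient is held out in turn, its phenotype is predicted from the remaining 27, and the fraction predicted correctly is reported. The majority-vote predictor below is a hypothetical placeholder, not the classifier used in the presentation. The toy data are made up.

```python
from collections import Counter


def predict_phenotype(train_rows, train_labels, row):
    """Hypothetical placeholder classifier: majority phenotype among training
    samples whose signature values equal `row`, else the overall majority."""
    matches = [lab for r, lab in zip(train_rows, train_labels) if r == row]
    pool = matches if matches else train_labels
    return Counter(pool).most_common(1)[0][0]


def loocv_accuracy(rows, labels, predict=predict_phenotype):
    """Leave-one-out accuracy: hold out each sample, predict it from the rest."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if predict(train_rows, train_labels, rows[i]) == labels[i]:
            correct += 1
    return correct / len(rows)


# Toy signature: one SNP genotype per patient (made-up values).
rows = [(0,), (0,), (0,), (2,), (2,), (2,)]
labels = ["BCP-ALL"] * 3 + ["C-ALL"] * 3
print(loocv_accuracy(rows, labels))  # → 1.0
```

Because any predictor with the same call signature can be passed as `predict`, a network-based classifier could be substituted without changing the evaluation loop.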

10

Functional Analysis of Signature Genes

SNP/Gene Symbol   Chromosome Location   Function
MAP1B             5q13                  Cell signaling, Cell morphology, Cellular assembly
C8orf84           8q21.11               Cancer, Genetic disorder
SEMA6D            15q21.1               Cellular movement
ID4               6p22-p21              Cellular growth
CDH2              18q11.2               Cell morphology, Cellular assembly, Cellular movement
CHRNA1            2q24-q32              Cell morphology
MYO3A             10p11.1               Genetic disorder
NID2              14q21-q22             Cell signaling

11

Conclusions

• Use phenotypes to reduce data dimensionality.

• Capture the flow of genetic information by modeling SNP-transcript dependence networks.

• Create phenotype-dependent SNP-transcript networks.

• Apply the analysis to pediatric acute lymphoblastic leukemia.