
Large-scale mining of gene expression patterns

Date post: 16-Jan-2016
Description:
Large-scale mining of gene expression patterns. Paul Pavlidis, [email protected]. VanBUG, September 2007. Students: Leon French, Meeta Mistry, Vaneet Lotay. Postdoc: Jesse Gillis. Undergraduates: Raymond Lim, Suzanne Lane. Programmers: Kelsey Hamer, Luke McCarthy.
Transcript
Page 1: Large-scale mining of gene expression patterns

Large-scale mining of gene expression patterns

Paul Pavlidis
[email protected] (bioinformatics.ubc.ca)

VanBUG September 2007

Page 2: Large-scale mining of gene expression patterns

Students: Leon French, Meeta Mistry, Vaneet Lotay

Postdoc: Jesse Gillis

Undergraduates: Raymond Lim, Suzanne Lane

Programmers: Kelsey Hamer, Luke McCarthy

Page 3: Large-scale mining of gene expression patterns

Synapse Genome

Signal transduction

Synaptic modulation

Injury

Stress

Disease

Aging

Development

Page 4: Large-scale mining of gene expression patterns
Page 5: Large-scale mining of gene expression patterns

Topics

• Connectivity database and analysis
• Gene expression data re-use system
• Scaling up gene coexpression analysis
• Applications and ongoing work

Page 6: Large-scale mining of gene expression patterns

Another ‘ome

Page 7: Large-scale mining of gene expression patterns

Leon French, Suzanne Lane

Page 8: Large-scale mining of gene expression patterns
Page 9: Large-scale mining of gene expression patterns

Growth of GEO

[Figure: submissions (y-axis, 0 to 120,000) versus date (x-axis, Dec-99 to Feb-08).]

Page 10: Large-scale mining of gene expression patterns

[Figure labels: Age, Genes, Samples.]

With JJ Mann, V Arango, E Sibille et al.

Page 12: Large-scale mining of gene expression patterns

[Figure labels: Age, Genes, Samples.]

Data from http://national_databank.mclean.harvard.edu/

Page 13: Large-scale mining of gene expression patterns
Page 14: Large-scale mining of gene expression patterns

GEO

Page 15: Large-scale mining of gene expression patterns

Goals for a system

• Researchers should be able to put their new expression data in a wider context of previous studies without extraordinary effort.

• Move the analysis of multiple microarray data sets from a niche activity to the mainstream.

• Integrate other data types and domain-specific information.

Page 16: Large-scale mining of gene expression patterns

Coexpression

Differential expression

Public data sources

Page 17: Large-scale mining of gene expression patterns

Challenges to comparing data sets

• Need to match genes/transcripts across platforms
• Data from third parties not always easy to handle
• Varying scales, normalization, etc.
• Varying data quality
• Varying levels of “raw data” available
• Selecting appropriate data to compare
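One way to approach the first challenge, matching genes across platforms, is to collapse probe-level values to gene-level values and then compare only the genes that every data set measures. The sketch below is illustrative only: the probe IDs, the mapping tables, and the choice of averaging probes per gene are assumptions, not the method described in the talk.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_genes(probe_values, probe_to_gene):
    """Average probe-level values per gene; probes without a mapping are dropped."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


def shared_genes(*gene_level_datasets):
    """Genes present in every data set, i.e. the ones that can be compared."""
    common = set(gene_level_datasets[0])
    for dataset in gene_level_datasets[1:]:
        common &= set(dataset)
    return common


# Hypothetical probe IDs and values on two different platforms.
platform_a = {"a_001": 7.0, "a_002": 9.0, "a_003": 5.5}
map_a = {"a_001": "GRIN1", "a_002": "GRIN1", "a_003": "PLD3"}
platform_b = {"b_17": 6.1, "b_42": 8.3}
map_b = {"b_17": "GRIN1", "b_42": "LYAR"}

genes_a = collapse_to_genes(platform_a, map_a)
genes_b = collapse_to_genes(platform_b, map_b)
comparable = shared_genes(genes_a, genes_b)
```

Averaging is the simplest choice; it ignores probe quality, which is one reason probe specificity matters.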

Page 18: Large-scale mining of gene expression patterns

With Cincinnati Children’s Hospital (D.Glass, M. Barnes et al.)

Page 19: Large-scale mining of gene expression patterns

Probe specificity (or lack thereof)

[Figure: two histograms, each with frequency on the y-axis. Panel 1 x-axis: fraction of probes with alignments (0.0 to 1.0). Panel 2 x-axis: fraction of non-specific probes (0.0 to 1.0).]
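The two quantities in the histograms can be computed per array design once each probe has been aligned to the genome. This is a minimal sketch under stated assumptions: the input structure is hypothetical, a probe counts as "non-specific" when it aligns to more than one gene, and the non-specific fraction is taken over aligned probes. The talk does not give the exact definitions used.

```python
def probe_specificity(alignments):
    """Return (fraction of probes aligned, fraction of aligned probes that are non-specific).

    alignments maps a probe ID to the set of gene symbols its sequence aligns to.
    """
    total = len(alignments)
    aligned = [genes for genes in alignments.values() if genes]
    non_specific = [genes for genes in aligned if len(genes) > 1]
    frac_aligned = len(aligned) / total if total else 0.0
    frac_non_specific = len(non_specific) / len(aligned) if aligned else 0.0
    return frac_aligned, frac_non_specific


# Hypothetical array design with four probes.
design = {
    "p1": {"GRIN1"},
    "p2": {"PLD3", "ATP6V0A1"},  # aligns to two genes: non-specific
    "p3": set(),                 # no alignment found
    "p4": {"LYAR"},
}
frac_aligned, frac_non_specific = probe_specificity(design)
```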

Page 20: Large-scale mining of gene expression patterns

Which data sets are reasonable to compare?

All mouse data sets

Mouse brain data sets

Mouse neocortex data sets

Mouse neocortex data sets examining stress

Mouse neocortex data sets examining hypoxic stress

Mouse neocortex data sets examining hypoxic stress after 3 hours of hypoxia

Too general, but lots of power

Very specific, low power

Page 21: Large-scale mining of gene expression patterns
Page 22: Large-scale mining of gene expression patterns
Page 23: Large-scale mining of gene expression patterns

• Expression experiments: 519
  • Mus musculus: 254
  • Homo sapiens: 203
  • Rattus norvegicus: 62
• Array designs: 178
• Assays (i.e., chips): 20,837
• Coexpression links (probe-level): >100 million

Page 24: Large-scale mining of gene expression patterns

Scaling up analysis of gene coexpression

• Genes that are coexpressed tend to have related function
• Needed at the same place at the same time
• “Guilt by association”
• Reasonable to compare across studies

[Figure: expression (y-axis) across samples (x-axis) for two ribosomal protein genes. Eisen et al., 1998 PNAS.]
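Coexpression between two genes is commonly scored by correlating their expression profiles across the samples of one data set. A minimal sketch using the Pearson correlation follows; the profiles are made-up numbers, and the talk does not state here which correlation measure is used.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical profiles over five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]  # rises with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]  # mirror image of gene_a

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```

Here `r_ab` is close to +1 (coexpressed) and `r_ac` is -1 (a negative correlation).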

Page 25: Large-scale mining of gene expression patterns

Biological noise

• Induced gene expression effects are often small.
• Gene expression varies between “replicates” in biologically-meaningful ways.
• Allows us to repurpose data

[Figure label: Sample type.]

Page 26: Large-scale mining of gene expression patterns

Functional coexpression should be (somewhat) generalized

• If two genes are coexpressed under one condition, they will probably be coexpressed under at least some other conditions (or data sets).
• Coexpression seen “only once” needs special care in interpretation.
• We shouldn’t expect coexpression to be perfectly reproducible (for biological and technical reasons).

[Figure: two panels, each labeled Correlation.]

Page 27: Large-scale mining of gene expression patterns

Genome Research, June 2004

A simple approach: count recurring patterns
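Counting recurring patterns can be sketched as follows: each data set contributes a list of coexpression links, and a link's support is the number of data sets in which it appears. The data structures, function names, and example links are hypothetical; how links are selected within each data set is not shown here.

```python
from collections import Counter


def count_support(links_per_dataset):
    """Count, for every gene pair, the number of data sets in which it is linked."""
    support = Counter()
    for links in links_per_dataset:
        # frozenset treats (A, B) and (B, A) as the same link; building a set
        # first ensures a link is counted at most once per data set.
        unique = {frozenset(pair) for pair in links if pair[0] != pair[1]}
        support.update(unique)
    return support


def recurring_links(support, min_datasets):
    """Links seen in at least min_datasets data sets."""
    return {link for link, n in support.items() if n >= min_datasets}


# Hypothetical links from three data sets.
datasets = [
    [("RPL3", "RPS4X"), ("GRIN1", "PLD3")],
    [("RPS4X", "RPL3"), ("GRIN1", "LYAR")],
    [("RPL3", "RPS4X")],
]
support = count_support(datasets)
replicated = recurring_links(support, min_datasets=2)
```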

Page 28: Large-scale mining of gene expression patterns

Pipeline for one dataset

Page 29: Large-scale mining of gene expression patterns
Page 30: Large-scale mining of gene expression patterns

Proof of concept analysis

• 60 human data sets, 15700 RefSeq genes.
• 70% cancer data
• 11 million “links”
• About 9.7 million different links

Page 31: Large-scale mining of gene expression patterns

Many links are replicated across studies

[Figure: number of links (y-axis, log scale, 1.E+00 to 1.E+07) versus minimum number of data sets a link is seen in (x-axis, log scale, 1 to 100). Series: observed; shuffled database (mean).]
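The comparison against a shuffled database can be illustrated with a simple permutation scheme. The sketch below assumes that gene labels are permuted independently within each data set and that the number of links reaching a given support is averaged over repeated shuffles; the talk does not say how its shuffling was done, and all names and data are hypothetical.

```python
import random
from collections import Counter


def support_counts(links_per_dataset):
    """Number of data sets in which each (unordered) gene pair is linked."""
    support = Counter()
    for links in links_per_dataset:
        support.update({frozenset(pair) for pair in links})
    return support


def links_at_least(support, k):
    """Number of distinct links seen in at least k data sets."""
    return sum(1 for n in support.values() if n >= k)


def shuffle_labels(links, genes, rng):
    """Relabel genes with a random permutation, keeping the link structure."""
    permuted = genes[:]
    rng.shuffle(permuted)
    relabel = dict(zip(genes, permuted))
    return [(relabel[a], relabel[b]) for a, b in links]


def shuffled_mean(links_per_dataset, genes, k, n_shuffles=100, seed=0):
    """Mean number of links with support >= k after shuffling each data set."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_shuffles):
        shuffled = [shuffle_labels(links, genes, rng) for links in links_per_dataset]
        total += links_at_least(support_counts(shuffled), k)
    return total / n_shuffles


# Hypothetical example: 20 genes, 5 data sets that all share the link g0-g1.
genes = [f"g{i}" for i in range(20)]
datasets = [
    [("g0", "g1"), (f"g{2 + i}", f"g{10 + i}"), (f"g{3 + i}", f"g{12 + i}")]
    for i in range(5)
]
observed = links_at_least(support_counts(datasets), k=5)
null_mean = shuffled_mean(datasets, genes, k=5)
```

An observed count well above the shuffled mean suggests that replication across data sets is not explained by chance overlap alone.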

Page 32: Large-scale mining of gene expression patterns

Evaluation on biological grounds

Page 33: Large-scale mining of gene expression patterns

Cluster involving NMDAR1 (GRIN1)

Page 34: Large-scale mining of gene expression patterns

ATP6V0A1

PLD3

GRIN1

Allen Brain Institute

Page 35: Large-scale mining of gene expression patterns

Application: analysis of imprinted genes

Laurent Journot, INSERM – Universités Montpellier

Page 36: Large-scale mining of gene expression patterns

LYAR interacting proteins

[Figure: correlation p-value for LYAR-interactors. Ewing et al., 2007 Molecular Systems Biology.]

Page 37: Large-scale mining of gene expression patterns

Vote counting limitations

• Weak evidence distributed across data sets will not be picked up.

• This example meets strict “vote counting” criteria in only 2/23 data sets

Correlation

Page 38: Large-scale mining of gene expression patterns

[Figure: global effect size and global correlation (y-axes, -1.0 to 1.0) plotted against support, the number of datasets (x-axis, 2 to 14).]
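A global effect size can pick up weak evidence spread across data sets that vote counting misses. One standard way to compute such a value, shown here purely as an illustration, is a fixed-effect combination of per-data-set correlations using Fisher's z-transform with weights of n - 3. The talk does not state that this is the method used; the function name and inputs are assumptions.

```python
import math


def combined_correlation(correlations, sample_sizes):
    """Combine Pearson correlations across data sets via Fisher's z-transform.

    Returns (global correlation, z-score for the null hypothesis of no correlation).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r, n in zip(correlations, sample_sizes):
        if n <= 3:
            raise ValueError("each data set needs more than 3 samples")
        if not -1.0 < r < 1.0:
            raise ValueError("correlations must lie strictly between -1 and 1")
        weight = n - 3  # inverse of the variance of Fisher's z
        weighted_sum += weight * math.atanh(r)
        weight_total += weight
    z_mean = weighted_sum / weight_total
    z_score = z_mean * math.sqrt(weight_total)
    return math.tanh(z_mean), z_score


# Hypothetical gene pair: a modest correlation of 0.3 in each of 10 data sets
# of 20 samples. No single data set is convincing, but the combined evidence is.
r_global, z_global = combined_correlation([0.3] * 10, [20] * 10)
_, z_single = combined_correlation([0.3], [20])
```

In this example the single-data-set z-score is about 1.3, while the combined z-score is about 4.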

Page 39: Large-scale mining of gene expression patterns

[Figure: axes labeled gene pairs and datasets.]

Related work: Zhou XJ et al., Nat. Biotech 2005

Page 40: Large-scale mining of gene expression patterns

Summary

• Reuse of public data: ‘adding value’
• Meta-analysis of coexpression
• Some applications
  • Functional prediction
  • Candidate identification
  • Platform evaluation

Page 41: Large-scale mining of gene expression patterns

Ongoing and future work

• Applications and analyses
  • Protein interactions and hubs
  • Prediction of gene function at the synapse
  • Differential expression analysis
    • Regionalization
    • Mouse models of brain injury
    • Mouse models of psychosis
• Expanding our public database and software
  http://www.bioinformatics.ubc.ca/Gemma
  Web-based tools for biologists; web services coming soon
• Integration with other information sources

Page 42: Large-scale mining of gene expression patterns

Thanks

Gemma: Xiang Wan, Kelsey Hamer, Luke McCarthy, Kiran Keshav, Suzanne Lane, Meeta Mistry, Jesse Gillis

Joseph Santos, Gozde Cozen, David Quigley, Anshu Sinha, Spiro Pantazatos, Wei-Keat Lim

Tmm: Homin Lee, Amy Hsu, Jon Sajdak, Jie Qin, Tzu-Lin Hsaio

And to:

NCBI GEO team

Groups who made data available

Collaborators who provided data prior to publication

Conrad Gilliam

Abraham Palmer

Andreas Kottmann

Etienne Sibille

Collaborators: Barclay Morrison, Joseph Gogos, Michael Hayden, Blair Leavitt, Tony Blau, Panos Papapanou

Page 43: Large-scale mining of gene expression patterns

Answers to FAQs

• No, they don’t have to be time course experiments.
• Yes, we’re using cDNA as well as Affymetrix etc.
• Yes, we see reproducible negative correlations.
• Yes, we’re interested in finding differences as well as similarities between data sets.
• No, we aren’t necessarily inferring regulatory relationships.
• Yes, we know that RNA is just one way of measuring cell state.
• No, we don’t have {worm,fly,yeast…} data, but we’d like to.

