
Romualdi Chiara

Page 1: Romualdi Chiara

Romualdi Chiara

Improved detection of differentially expressed genes in microarray experiments through multiple scanning and image integration

NETTAB 2003 Workshop

Bioinformatics for the management, analysis and interpretation of microarray data

CRIBI Biotechnology Centre

Università di Padova, Italy

Genomic Research University of Padova

Page 2: Romualdi Chiara

Microarray variability

1. Inter-experiment variability
   - Gene probes deposited in replicates
   - Replicates are deposited in different regions of the chip

2. Intra-experiment variability
   - Swap of Cy3 and Cy5
   - Replicate of the experiment

3. Hybridisation, labelling, amplification … variability
   - Global, local and surface normalization

Page 3: Romualdi Chiara

… and image variability?

Each microarray is scanned with a single laser run for each fluorochrome, and the intensity values of the spots are then calculated.

If a single microarray undergoes multiple scanning runs, however, the DNA spot images obtained are not exactly superimposable.

[Figure: laser scan, 16-bit TIFFs, Log2(ch1/ch2)]
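As a minimal sketch of the Log2(ch1/ch2) quantity named on this slide, the snippet below computes the log ratio for a few spot intensity pairs. The function name `log_ratio` and the intensity values are illustrative assumptions, not taken from the presentation or from the scanner software.

```python
import math


def log_ratio(ch1, ch2):
    """Return log2(ch1/ch2) for one spot (hypothetical helper)."""
    if ch1 <= 0 or ch2 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(ch1 / ch2)


# Invented (ch1, ch2) intensity pairs, for illustration only.
pairs = [(2000, 1000), (500, 500), (250, 1000)]
ratios = [log_ratio(a, b) for a, b in pairs]
```

A value of 0 means equal signal in the two channels; positive and negative values indicate a stronger signal in channel 1 or channel 2 respectively.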

Page 4: Romualdi Chiara

DNA spot images obtained from multiple scanning runs are not exactly superimposable

[Figure: the same spots in scans I, II, III and IV]

Page 5: Romualdi Chiara

Serial scans

[Figure: spots A, B and C in serial scans I to X]

A = weakly expressed

B = moderately expressed

C = highly expressed

Differences in pixel intensities

Page 6: Romualdi Chiara

Pixel intensity differences

Probably only a portion of the fluorochromes is excitable by the laser beam and measurable by the photomultiplier, while the confocal scanning system detects the fluorescence of a spot subregion.

Image variability → quantification output variability → different microarray results

4% FP

Page 7: Romualdi Chiara

Novel software for image integration: http://muscle.cribi.unipd.it/microarrays/spot/

1) ΣPOT superimposes n TIFF images (input microarray images I1, I2, I3, I4, …)

VP = (pixel11, pixel12, … , pixel1n)

2) It calculates, for each pixel vector of the n images:

- Pixel intensity mean (mean of VP)

- Pixel intensity maximum, with exclusion of saturated pixels (max of VP)

3) It builds a virtual TIFF image that summarizes the n input ones
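The three steps above can be sketched in a few lines. This is not the ΣPOT source (which the slides describe as C with libtiff); it is an illustrative Python version that takes images as nested lists rather than TIFF files. The name `integrate`, the saturation value 65535, the integer mean, and the fallback used when every scan is saturated are all assumptions.

```python
SATURATED = 65535  # assumption: the 16-bit maximum marks a saturated pixel


def integrate(images, saturated=SATURATED):
    """Summarise n equally sized images, given as lists of pixel rows.

    Returns (mean_image, max_image), both built pixel by pixel.
    """
    n = len(images)
    mean_image, max_image = [], []
    for r in range(len(images[0])):
        mean_row, max_row = [], []
        for c in range(len(images[0][0])):
            # Pixel vector VP: the same position taken from each scan.
            vp = [img[r][c] for img in images]
            mean_row.append(sum(vp) // n)  # integer mean of VP
            # Maximum of VP, ignoring saturated values when possible.
            unsaturated = [v for v in vp if v < saturated]
            max_row.append(max(unsaturated) if unsaturated else saturated)
        mean_image.append(mean_row)
        max_image.append(max_row)
    return mean_image, max_image


# Three invented 1x2 "scans"; the second pixel saturates in two of them.
scans = [
    [[100, 65535]],
    [[110, 60000]],
    [[120, 65535]],
]
mean_image, max_image = integrate(scans)
```

In this toy input the first pixel gives a mean of 110 and a maximum of 120, while the second pixel keeps 60000 as its maximum because the two saturated readings are excluded.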

Page 8: Romualdi Chiara

Resulting virtual image after ten serial scans

[Figure: spots A, B and C in serial scans I to X, followed by the Max and Mean virtual images]

A = weakly expressed

B = moderately expressed

C = highly expressed

Page 9: Romualdi Chiara

Resulting virtual image after ten serial scans:

entire microarray

Page 10: Romualdi Chiara

Serial scans and image integration improve spot (A) and background (B) uniformity

[Figure: panels A (spot) and B (background), uniformity 1 - (Imax - Imin)/range plotted against number of scans (1, 2, 4, 6, 8, 10)]

Image uniformity improves spot detection and quantification

Page 11: Romualdi Chiara

Serial scans and image integration improve reliability of microarray results

4% False Positives

< 1 % False Positives

Page 12: Romualdi Chiara

Competitive hybridisation with the same mRNA

Two experiments in which two equal aliquots of skeletal muscle RNA (A) and of heart muscle RNA (B) were labelled with Cy3 and Cy5 and challenged in competitive hybridisation.

In these cases, all the Cy3/Cy5 ratios of spot intensities should lie around 1.

Due to experimental variability, a portion of the spot intensity ratios is far from 1.

The number of outliers decreases with image integration.

[Figure: panels A and B, percentage of decrease of outlier spots (0 to 100) against number of scans (1, 2, 4, 6, 8, 10), for mean and max integration]
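A rough sketch of how the decrease in outlier spots could be counted when the same RNA is labelled with both dyes. The two-fold cut-off, the helper names and the ratio values are assumptions made for illustration; the slides do not state the outlier criterion that was actually used.

```python
import math


def count_outliers(ratios, fold=2.0):
    """Count Cy3/Cy5 ratios further than `fold` from 1 (assumed criterion)."""
    limit = math.log2(fold)
    return sum(1 for r in ratios if abs(math.log2(r)) > limit)


def percent_decrease(before, after):
    """Percentage by which the outlier count dropped."""
    return 100.0 * (before - after) / before


# Invented ratios for the same eight spots, for illustration only.
single_scan = [1.0, 2.5, 0.3, 1.1, 0.9, 3.0, 0.45, 1.2]
integrated = [1.0, 1.8, 0.6, 1.1, 0.9, 2.2, 0.7, 1.2]

n_single = count_outliers(single_scan)
n_integrated = count_outliers(integrated)
decrease = percent_decrease(n_single, n_integrated)
```

Working on the log scale treats a ratio of 0.5 and a ratio of 2 as equally distant from the expected value of 1.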

Page 13: Romualdi Chiara

Variation of spot signal intensity with incremental number of scans

[Figure: two panels, A and B (spot intensity ~40,000 units and spot intensity ~500 units), with five numbered curves (1 to 5) plotted against number of scans (2 to 14) on a vertical axis from 0.4 to 1.6. Labelling methods in the legend: RT-labelling of total RNA, amino-allyl, RT-labelling of aRNA, DNA dendrimer probe, TSA]

Page 14: Romualdi Chiara

Quantification of the efficacy of the multi-scan approach in detecting differentially expressed genes

We performed and analysed two microarray experiments hybridised with targets made with RT labelling and with TSA methodology:

1. In the first experiment, we challenged RNAs of skeletal and heart muscle in competitive hybridisation.

2. In the second one, we compared RNAs of dystrophic (facioscapulohumeral muscular dystrophy) and normal muscle.

2 replicates for each experiment with dye swapping (4 spot replicates)

SNOMAD web tool (global and local options) for data normalization

SAM for identification of differentially expressed genes

Page 15: Romualdi Chiara

Evaluation efficacy approach

1. Identify differentially expressed genes in 1-scan experiments.

2. Integrate the first 2, 4, 6, 8 and 10 serial scans and, for each integration, find the differentially expressed genes.

3. CFP: genes found to be differentially expressed with 1 scan but not with all the serial integrated images.
   CFN: genes found to be differentially expressed with the serial integrated images but not with 1 scan.

4. NFP: genes found to be differentially expressed with the i-th integration but not with all the subsequent ones.
   NFN: genes found to be differentially expressed with the i-th integration but not with the previous ones.
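The CFP, CFN and NFN definitions above map naturally onto set operations. The sketch below is one possible reading, not the authors' code: CFP is taken as genes called with 1 scan and found in none of the integrated images, CFN as genes found in every integrated image but not with 1 scan, and NFN as genes first appearing at a given integration. The function names and gene labels are hypothetical.

```python
def consistent_fp(single, integrations):
    """Genes called with 1 scan that no integrated image confirms."""
    return single - set().union(*integrations)


def consistent_fn(single, integrations):
    """Genes called by every integrated image but missed with 1 scan."""
    return set.intersection(*integrations) - single


def novel_fn(single, integrations, i):
    """Genes called by integration i but by nothing before it."""
    earlier = single.union(*integrations[:i])
    return integrations[i] - earlier


# Invented gene calls: one set for 1 scan, then one per integration.
single = {"g1", "g2", "g3"}
integrations = [
    {"g1", "g2", "g4"},
    {"g1", "g2", "g4", "g5"},
    {"g1", "g4", "g5"},
]

cfp = consistent_fp(single, integrations)
cfn = consistent_fn(single, integrations)
nfn = [novel_fn(single, integrations, i) for i in range(len(integrations))]
```

Here `g3` is the only consistent false positive, `g4` the only consistent false negative, and the novel false negatives per integration are `g4`, then `g5`, then none.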

Page 16: Romualdi Chiara

Increase of the number of differentially expressed genes: skeletal muscle vs. heart (RT labelling)

1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle

CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-i ones

[Figure: two panels (overexpressed genes, underexpressed genes), percentage increase of CFP and CFN against number of scans (2, 4, 6, 8)]

Page 17: Romualdi Chiara

Increase of the number of differentially expressed genes: skeletal muscle vs. heart (RT labelling)

1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle

NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image

[Figure: two panels (overexpressed genes, underexpressed genes), percentage increase of NFP and NFN against number of scans (2, 4, 6, 8, 10)]

Page 18: Romualdi Chiara

Increase of the number of differentially expressed genes: FSHD vs. normal (TSA)

With 1 scan: 149 overexpressed and 107 underexpressed in normal muscle

CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-1 ones

[Figure: two panels (overexpressed genes, underexpressed genes), percentage increase of CFP and CFN (0 to 300) against serial scans (2, 4, 6, 8)]

Page 19: Romualdi Chiara

Increase of the number of differentially expressed genes: FSHD vs. normal (TSA)

With 1 scan: 149 overexpressed and 107 underexpressed in normal muscle

NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image

[Figure: two panels (overexpressed genes, underexpressed genes), percentage increase of NFP and NFN against serial scans (2, 4, 6, 8, 10)]

Page 20: Romualdi Chiara

Relationship between CFN and their spot intensities

[Figure: two panels (dystrophic vs. normal muscle, skeletal muscle vs. heart), frequency of CFN against spot intensity, for Cy5 and Cy3]

The greatest improvement in differentially expressed genes revealed by the multi-scan approach concerns weakly expressed genes.

Page 21: Romualdi Chiara

ΣPOT results validation with semi-quantitative RT-PCR

[Figure: RT-PCR results for sk. muscle and heart]

CFN, over expressed in sk. muscle

(1) myosin-binding protein C, fast type

(2) titin

(3) human DNA sequence

(4) human DNA sequence

(5) H.sapiens mRNA for striate muscle-specific hypothetical protein (ORF1), clone 00275

(6) human DNA sequence

(7) H.sapiens acetyl-coenzyme A transporter

(8) human autoantigen small nuclear ribonucleoprotein Sm-D

CFN, underexpressed in sk. muscle

(9) troponin T2, cardiac

(10) alpha-actin, cardiac muscle

(11) myosin-binding protein C, cardiac

(12) H.sapiens heat shock 90 kDa protein 1, alpha

(13) H.sapiens haplotype M*2 mitochondrion

(14) H.sapiens chromosome 5, BAC

(15) H.sapiens macrophage migration inhibitory factor (glycosylation-inhibiting factor)

(16) H.sapiens ring finger protein 28

CFP

(17) H.sapiens clone alpha_est218/52C1

(18) H.sapiens CD27-binding (Siva) protein transcript variant 1

(19) human skeletal muscle 1.3 kb mRNA for tropomyosin;

(20) H.sapiens cathepsin H

Page 22: Romualdi Chiara

Conclusions

RT-labelling:
- Many FP (~10% of the differentially expressed genes found with 1 scan)
- Many FN (~ +50% of the differentially expressed genes found with 1 scan)

TSA-labelling:
- Small number of FP
- Large increase in FN (~ +200%)

4-6 scans seem to be the best number of scans for a satisfactory improvement in detecting differentially expressed genes

Maximum and mean results overlap for 80% of FP and FN transcripts

Future work

Integration of ΣPOT into scanner software

Page 23: Romualdi Chiara

Technical details

ΣPOT is written in C with the libtiff library and runs on UNIX systems

SAM http://www-stat.stanford.edu/~tibs/SAM/index.html

SNOMAD http://pevsnerlab.kennedykrieger.org/snomadinput.html

Spotting device: GenePackArray 21 with 16 stealth micro pins

Scanner: Perkin Elmer LITE dual confocal laser scanner with software Scan Array

Image analysis software: QuantArray

HumanMuscleArray: http://muscle.cribi.unipd.it/microarrays/human.html

Page 24: Romualdi Chiara

Acknowledgements

Gerolamo Lanfranchi project supervisor

Microarray Team

Silvia Trevisan, Barbara Celegato,

Bioinformatics Team

Germano Costa, Micky Del Favero

Reference

Romualdi Chiara et al. (2003) Nucl. Acids. Res. 31: e149.

Web sites

http://muscle.cribi.unipd.it/microarrays/

http://muscle.cribi.unipd.it/microarrays/spot/

Genomic Research University of Padova

http://grup.cribi.unipd.it/

Page 25: Romualdi Chiara

Increase of the number of differentially expressed genes: skeletal muscle vs. heart (RT labelling)

1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle

      2 scans               4 scans               6 scans               8 scans               10 scans
      Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Overexpressed
FP    26 (13)    19 (10)    21 (11)    21 (11)    24 (12)    19 (10)    20 (10)    30 (15)    24 (12)    24 (12)
FN    18 (9)     41 (21)    37 (19)    36 (18)    36 (18)    53 (27)    50 (25)    18 (9)     34 (17)    29 (15)
Underexpressed
FP    6 (19)     7 (22)     2 (6)      5 (16)     2 (6)      3 (9)      4 (13)     3 (9)      4 (13)     1 (3)
FN    7 (22)     12 (38)    13 (41)    15 (47)    14 (44)    18 (56)    23 (72)    15 (47)    20 (63)    20 (63)

FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others

FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan
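The values in parentheses in the table appear to be the counts expressed as a percentage of the transcripts found with 1 scan. As a hedged check, the sketch below recomputes the overexpressed "Mean" columns against the stated baseline of 200. The helper name and the round-half-up rule are assumptions; the slides do not say how the percentages were rounded.

```python
def percent_of_baseline(count, baseline):
    """Whole-number percentage of `baseline`, with halves rounded up."""
    return (200 * count + baseline) // (2 * baseline)


BASELINE_OVER = 200  # overexpressed transcripts found with 1 scan

# Overexpressed counts from the "Mean" columns (2, 4, 6, 8, 10 scans).
fp_mean = [26, 21, 24, 20, 24]
fn_mean = [18, 37, 36, 50, 34]

fp_pct = [percent_of_baseline(c, BASELINE_OVER) for c in fp_mean]
fn_pct = [percent_of_baseline(c, BASELINE_OVER) for c in fn_mean]
```

Under these assumptions the computed percentages match the parenthesised values in the overexpressed FP and FN rows for the mean integration.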

Page 26: Romualdi Chiara

Increase of the number of differentially expressed genes: skeletal muscle vs. heart (RT labelling)

1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle

FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others

FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan

[Figure: two panels (overexpressed genes, underexpressed genes), percentage increase of FP and FN (0 to 80) against number of scans (2, 4, 6, 8, 10)]

Page 27: Romualdi Chiara

Increase of the number of differentially expressed genes: skeletal muscle vs. heart (RT labelling)

1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle

      2 scans               4 scans               6 scans               8 scans               10 scans
      Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Overexpressed
FP    26 (13)    19 (10)    21 (11)    21 (11)    24 (12)    19 (10)    20 (10)    30 (15)    24 (12)    24 (12)
CFP   -          -          19         15         18         13         15         13         16         14
FN    18 (9)     41 (21)    37 (19)    36 (18)    36 (18)    53 (27)    50 (25)    18 (9)     34 (17)    29 (15)
CFN   -          -          15         20         22         20         24         17         34         15
Underexpressed
FP    6 (19)     7 (22)     2 (6)      5 (16)     2 (6)      3 (9)      4 (13)     3 (9)      4 (13)     1 (3)
CFP   -          -          2          4          2          3          2          2          2          1
FN    7 (22)     12 (38)    13 (41)    15 (47)    14 (44)    18 (56)    23 (72)    15 (47)    20 (63)    20 (63)
CFN   -          -          4          7          10         9          13         11         17         14

CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-1 ones

Page 28: Romualdi Chiara

Increase of the number of differentially expressed genes: skeletal muscle vs. heart (RT labelling)

1 scan: 200 transcripts overexpressed and 31 underexpressed in the muscle

      2 scans               4 scans               6 scans               8 scans               10 scans
      Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Overexpressed
NFP   26         19         2          6          3          2          1          7          2          2
NFN   18         41         10         5          4          10         4          0          0          3
Underexpressed
NFP   6          7          0          1          1          0          1          0          1          1
NFN   7          12         9          8          3          8          7          3          2          3

NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image

Page 29: Romualdi Chiara

Increase of the number of differentially expressed genes: FSHD vs. normal (TSA)

With 1 scan: 149 overexpressed and 107 underexpressed in normal muscle

      2 scans               4 scans               6 scans               8 scans               10 scans
      Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Overexpressed
FP    0 (0)      0 (0)      0 (0)      0 (0)      1 (1)      1 (1)      0 (0)      1 (1)      0 (0)      0 (0)
FN    110 (74)   90 (61)    154 (104)  131 (89)   184 (124)  137 (93)   198 (134)  158 (107)  207 (140)  169 (114)
Underexpressed
FP    2 (2)      2 (2)      2 (2)      2 (2)      1 (1)      2 (2)      2 (2)      2 (2)      2 (2)      2 (2)
FN    175 (164)  157 (147)  214 (200)  198 (185)  229 (214)  191 (179)  263 (246)  212 (198)  255 (238)  244 (228)

FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others

FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan

Page 30: Romualdi Chiara

Increase of the number of differentially expressed genes: FSHD vs. normal (TSA)

With 1 scan: 149 overexpressed and 107 underexpressed in normal muscle

FP (false positives) = genes found to be differentially expressed with 1 scan but not confirmed with the integration of the others

FN (false negatives) = genes found to be differentially expressed with the integration of additional scans but not with 1 scan

[Figure: two panels (overexpressed genes, underexpressed genes), percentage increase of FP and FN (0 to 300) against serial scans (2, 4, 6, 8, 10)]

Page 31: Romualdi Chiara

Increase of the number of differentially expressed genes: FSHD vs. normal (TSA)

With 1 scan: 149 overexpressed and 107 underexpressed in normal muscle

      2 scans               4 scans               6 scans               8 scans               10 scans
      Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Overexpressed
FP    0 (0)      0 (0)      0 (0)      0 (0)      1 (1)      1 (1)      0 (0)      1 (1)      0 (0)      0 (0)
CFP   -          -          0          0          0          0          0          1          0          0
FN    110 (74)   90 (61)    154 (104)  131 (89)   184 (124)  137 (93)   198 (134)  158 (107)  207 (140)  169 (114)
CFN   -          -          107        85         152        117        175        131        189        150
Underexpressed
FP    2 (2)      2 (2)      2 (2)      2 (2)      1 (1)      2 (2)      2 (2)      2 (2)      2 (2)      2 (2)
CFP   -          -          2          2          0          2          1          2          1          2
FN    175 (164)  157 (147)  214 (200)  198 (185)  229 (214)  191 (179)  263 (246)  212 (198)  255 (238)  244 (228)
CFN   -          -          170        149        203        173        223        179        241        197

CFP and CFN (consistent false positives and negatives) = genes found to be differentially expressed with the integration of n scans and confirmed by all the n-1 ones

Page 32: Romualdi Chiara

Increase of the number of differentially expressed genes: FSHD vs. normal (TSA)

With 1 scan: 149 overexpressed and 107 underexpressed in normal muscle

      2 scans               4 scans               6 scans               8 scans               10 scans
      Mean       Max        Mean       Max        Mean       Max        Mean       Max        Mean       Max
Overexpressed
NFP   0          0          0          0          1          1          0          0          0          0
NFN   110        90         47         46         31         17         21         16         14         14
Underexpressed
NFP   2          2          0          0          0          0          0          0          0          0
NFN   175        157        44         49         22         17         32         23         6          21

NFP and NFN (novel false positives and negatives) = real improvement achieved by the inclusion of each additional serial microarray image

