
Date post: 05-Jan-2016
Page 1: Henrik Bengtsson hb@maths.lth.se Bioinformatics Group

Henrik Bengtsson, hb@maths.lth.se

Bioinformatics Group

Mathematical Statistics, Centre for Mathematical Sciences

Lund University

cDNA Microarrays - an introduction

Page 2

Outline

• The Genomic Code

• The Central Dogma of Biology

• The cDNA Microarray Technique

• Data Analysis of cDNA Microarray Data

• Statistical Problems

• Take-home message

Page 3

The Genomic Code

3 180 000 000 bp

120 000 genes? 80 000 genes? 35 000 genes? Or?

22+1 chromosome pairs

Page 4

The Central Dogma of Biology

DNA (CCTGAGCCAACTATTGATGAA)

-> transcription ->

RNA (CCUGAGCCAACUAUUGAUGAA)

-> translation ->

Protein (PEPTIDE)
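The example sequences on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `transcribe` and `translate` are illustrative, and the codon table is deliberately partial, covering just the seven codons that occur in the slide's sequence rather than the full genetic code.

```python
# Partial codon table (assumption: only the codons used in the slide's example).
CODON_TABLE = {
    "CCU": "P",  # proline
    "GAG": "E",  # glutamic acid
    "CCA": "P",  # proline
    "ACU": "T",  # threonine
    "AUU": "I",  # isoleucine
    "GAU": "D",  # aspartic acid
    "GAA": "E",  # glutamic acid
}


def transcribe(dna):
    """Write a coding-strand DNA sequence as RNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA codon by codon using the partial table above."""
    usable = len(rna) - len(rna) % 3  # ignore an incomplete trailing codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


dna = "CCTGAGCCAACTATTGATGAA"
rna = transcribe(dna)
protein = translate(rna)
print(rna)      # CCUGAGCCAACUAUUGAUGAA
print(protein)  # PEPTIDE
```

A codon outside the partial table raises a `KeyError`, which is intentional here: the sketch is meant to reproduce the slide's example, not to translate arbitrary sequences.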

Page 5

The cDNA Microarray Technique

• High-throughput measuring
- 5000-20000 gene expressions at the same time

• Identify genes that behave differently in different cell populations
- tumor cells vs healthy cells
- brain cells vs liver cells
- same tissue, different organisms

• Time series experiments
- gene expressions over time after treatment

• ...

Page 6

Example of a cDNA Microarray

Page 7

Overview

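Array preparation: cDNA clones (probes) -> PCR product amplification and purification -> printing (0.1 nl / spot) -> microarray

Sample preparation: Tumor sample -> RNA -> cDNA; Reference sample -> RNA -> cDNA

Hybridize: cDNA from both samples on the microarray

Scanning: excitation with red laser and green laser; emission

Analysis: overlay images and normalise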

Page 8

Creating the slides

Page 9

Page 10

RNA Extraction & Hybridization

Tumor sample -> RNA -> cDNA

Reference sample -> RNA -> cDNA

Hybridize

Page 11

Scanning & Image Analysis

Page 12

Data Output

Page 13

Biological question (differentially expressed genes, sample class prediction etc.)

-> Experimental design

-> Microarray experiment (output: 16-bit TIFF files)

-> Image analysis (output: (Rfg, Rbg), (Gfg, Gbg))

-> Normalization (output: R, G)

-> Estimation, Testing, Clustering, Discrimination

-> Biological verification and interpretation

Page 14

Data Transformation

“Observed” data $\{(R,G)\}_{n=1..5184}$:

R = red channel signal
G = green channel signal

(background corrected or not)

Transformed data $\{(M,A)\}_{n=1..5184}$:

$M = \log_2(R/G)$ (ratio),

$A = \log_2\left((R \cdot G)^{1/2}\right) = \tfrac{1}{2} \log_2(R \cdot G)$ (intensity signal)

$R = \left(2^{2A+M}\right)^{1/2}$, $G = \left(2^{2A-M}\right)^{1/2}$
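The transformation between (R, G) and (M, A) and its inverse can be written out directly from the formulas on this slide. The sketch below uses only the standard library; the function names `to_ma` and `from_ma` and the example spot values are hypothetical, chosen so that the arithmetic is exact.

```python
import math


def to_ma(r, g):
    """Map channel signals (R, G) to log-ratio M and log-intensity A."""
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a


def from_ma(m, a):
    """Invert the transform: R = sqrt(2^(2A+M)), G = sqrt(2^(2A-M))."""
    r = math.sqrt(2.0 ** (2.0 * a + m))
    g = math.sqrt(2.0 ** (2.0 * a - m))
    return r, g


# Hypothetical spot: the red signal is four times the green signal.
m, a = to_ma(4096.0, 1024.0)
r, g = from_ma(m, a)
print(m, a)  # 2.0 11.0
print(r, g)  # 4096.0 1024.0
```

Both signals must be strictly positive for the logarithms to be defined, which matters when background correction produces zero or negative values.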

Page 15

Normalization

Biased towards the green channel & intensity-dependent artifacts
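One way to picture what removing an intensity-dependent bias means: estimate the typical M at each intensity level and subtract it. The sketch below is a crude stand-in for a smooth curve fit and is not the method used in the slides. It sorts spots by A, splits them into bins, and subtracts each bin's median M. The function name, the bin count, and the data are all hypothetical.

```python
from statistics import median


def normalize_by_intensity(m_values, a_values, n_bins=4):
    """Subtract from each M the median M of its intensity (A) bin."""
    order = sorted(range(len(m_values)), key=lambda i: a_values[i])
    normalized = [0.0] * len(m_values)
    bin_size = -(-len(order) // n_bins)  # ceiling division
    for start in range(0, len(order), bin_size):
        bin_idx = order[start:start + bin_size]
        bin_median = median(m_values[i] for i in bin_idx)
        for i in bin_idx:
            normalized[i] = m_values[i] - bin_median
    return normalized


# Hypothetical spots: M is shifted towards green (negative), and the shift
# is larger at low intensity A than at high intensity.
a = [8.0, 8.5, 9.0, 9.5, 12.0, 12.5, 13.0, 13.5]
m = [-1.2, -1.0, -0.8, -1.0, -0.3, -0.1, -0.2, -0.2]
norm = normalize_by_intensity(m, a, n_bins=2)
print(median(norm[:4]), median(norm[4:]))
```

After the correction the median M is zero within each intensity bin, so neither the overall green bias nor its dependence on intensity remains in this toy data set.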

Page 16

Replicated measurements

Scaled print-tip normalization

Median Absolute Deviation (MAD) Scaling

Averaging
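The slide names MAD scaling without giving a formula. As a minimal sketch under an assumed definition, each print-tip group's M values can be divided by that group's MAD relative to the geometric mean of all group MADs, so that every group ends up with the same spread. The function names and the data below are hypothetical, and the MAD is used without any consistency constant.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation from the median (no consistency constant)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def mad_scale(groups):
    """Rescale each group's M values so that all groups share the same MAD.

    Assumed form: divide each group by (its MAD / geometric mean of all MADs).
    Every group must have a MAD greater than zero.
    """
    mads = [mad(g) for g in groups]
    geo_mean = math.exp(sum(math.log(s) for s in mads) / len(mads))
    return [[v * geo_mean / s for v in g] for g, s in zip(groups, mads)]


# Hypothetical log-ratios from two print-tip groups, already centered at zero.
tip1 = [-0.5, -0.25, 0.0, 0.25, 0.5]
tip2 = [-2.0, -1.0, 0.0, 1.0, 2.0]
scaled = mad_scale([tip1, tip2])
print([mad(g) for g in scaled])
```

In this example the two groups start with MADs of 0.25 and 1.0 and both end up close to 0.5, the geometric mean of the two.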

Page 17

Identification of differentially expressed genes

Extreme in T values?

Extreme in M values?

...or extreme in some other statistic?
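The difference between ranking by M and ranking by T can be seen with two made-up genes. The slides do not define T, so the sketch below assumes the ordinary one-sample t-statistic for the replicated log-ratios of one gene, that is, the mean M divided by its standard error. The function name and the replicate values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic(m_replicates):
    """One-sample t-statistic for the hypothesis that the mean log-ratio is 0."""
    n = len(m_replicates)
    se = stdev(m_replicates) / math.sqrt(n)  # standard error of the mean
    return mean(m_replicates) / se


# Two hypothetical genes with the same average M (-0.8) over four replicates.
steady = [-0.75, -0.85, -0.80, -0.80]  # small spread between replicates
noisy = [-2.9, 0.3, -1.5, 0.9]         # large spread between replicates

print(mean(steady), t_statistic(steady))
print(mean(noisy), t_statistic(noisy))
```

Both genes look equally extreme in average M, but only the first has a large |T|, because its replicates agree with each other. Which statistic to rank by is exactly the question the slide raises.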

Page 18

List of genes that the biologist can understand and verify with other experiments

Gene   Mavg   Aavg      T     SE
2341  -0.86   10.9  -18.0  0.125
6412  -0.75   11.1  -14.7  0.102
6123  -0.70    9.8  -12.2  0.121
 102   0.65   10.3  -14.5  0.136
2020   0.64    9.3  -11.9  0.118
3132   0.62    9.9  -14.4  0.090
4439  -0.62    9.7  -14.6  0.088
2031  -0.61   10.7  -13.7  0.087
 657  -0.60    9.2  -13.6  0.094
 502   0.58   10.0  -12.7  0.101
1239  -0.58    9.8  -11.4  0.103
5392  -0.57    9.9  -20.7  0.057
3921   0.52   11.3   13.5  0.083
...

Page 19

Time Course Gene Expression Profiles

Page 20

Statistical Problems

1. Image analysis
- what is foreground?
- what is background?

2. Quality
- which spots can we trust?
- which slides can we trust?

3. Artifacts from preparing the RNA, the printing, the scanning etc.

4. Data cleanup

5. Normalization within an experiment:
- when few genes change.
- when many genes change.
- dye-swap to minimize dye effects.

6. Normalization between experiments:
- location and scale effects.

7. What is noise and what is variability?

10. Which genes are actually up- and down-regulated?

11. P-values.

12. Planning of experiments:
- what is the best design?
- what is the optimal sample size?

13. Classification:
- of samples.
- of genes.

14. Clustering:
- of samples.
- of genes.

15. Time course experiments.

16. Gene networks.
- identification of pathways

17. ...

Page 21

Total microarray articles indexed in Medline

[Figure: bar chart. x-axis: Year, 1995 to 2001; y-axis: Number of papers, 0 to 600; one value is marked "(projected)".]

Page 22

Acknowledgments/Collaborators

Statistics Dept, UC Berkeley:

Sandrine Dudoit

Terry Speed

Yee Hwa Yang

CSIRO Image Analysis Group, Melbourne:

Michael Buckley

Oncology Dept, Lund University:

Pär-Ola Bendahl

Åke Borg

Johan Vallon-Christersson

Lawrence Berkeley National Laboratory:

Saira Mian

Matt Callow

Endocrinology, Lund University, Malmö:

Leif Groop

Peter Almgren

Mathematical Statistics, Chalmers University:

Olle Nerman

Staffan Nilsson

Dragi Anevski

Ernest Gallo Research Inst., California:

Monica Moore

Karen Berger

Page 23

Take-home message

• Bioinformatics is the future!

• More educated people are needed!

• Statistics is fun when it is applied!

• Master’s thesis project? Talk to us!

http://www.maths.lth.se/matstat/bioinformatics/

Page 24

Finding genes in DNA sequence

“This is one of the most challenging and interesting problems in computational biology at the moment. With so many genomes being sequenced so rapidly, it remains important to begin by identifying genes computationally.” – Terry Speed.

Page 25

The Central Dogma of Biology

Challenges:

DNA
- Sequencing
- Fragment assembly
- Gene finding
- Linkage analysis etc.
- Homology searches
- Annotation

-> transcription ->

RNA
- Isolation
- Sequencing
- RNA structure prediction
- Gene expression: microarrays etc.

-> translation ->

Protein
- Protein structure prediction
- Protein folding
- Homology searches
- Functional pathways
- Annotation

