
Microarray Data Analysis

Date post: 06-Jan-2016
Upload: beulah
Page 1: Microarray Data Analysis

Microarray Data Analysis

The Bioinformatics side of the bench

Page 2: Microarray Data Analysis

The anatomy of your data files from Affymetrix array analysis

• .DAT= image file (10^7 pixels)

• .CEL= measured cell intensities

• .CDF= cell description file (identifies probe sets and probe pairs)

• .CHP= calculated probe set data

• .RPT= report generated from .CHP

Page 3: Microarray Data Analysis

Quality Control (QC) of the chip – visual inspection

• Look at the .DAT file or the .CHP file image

– Scratches? Spots?

– Corners and outside border checkerboard appearance (B2 oligo)

• Positive hybridization control

• Used by software to place grid over image

– Array name is written out in oligos!

Page 5: Microarray Data Analysis

Chip defects

Page 6: Microarray Data Analysis

Internal controls

• B. subtilis genes (added poly-A tails)

– Assessment of quality of sample preparation

– Also as hybridization controls

• Hybridization controls (bioB, bioC, bioD, cre)

– E. coli and P1 bacteriophage biotin-labeled cRNAs

– Spiked into the hybridization cocktail

– Assess hybridization efficiency

• Actin and GAPDH assess RNA sample/assay quality

– Compare signal values from 3’ end to signal values from 5’ end

– The 3’/5’ ratio generally should not exceed 3

• Percent genes present (%P)

– Replicate samples should have similar %P values
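
The 3’/5’ check on the Actin and GAPDH controls can be sketched in a few lines of Python. This is only an illustration: the function names and signal values are made up, and the threshold of 3 is the rule of thumb from the slide.

```python
def three_to_five_ratio(signal_3prime, signal_5prime):
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    if signal_5prime <= 0:
        raise ValueError("5' signal must be positive")
    return signal_3prime / signal_5prime


def flag_degraded(controls, max_ratio=3.0):
    """Return the control genes whose 3'/5' ratio exceeds max_ratio."""
    return [
        name
        for name, (sig3, sig5) in controls.items()
        if three_to_five_ratio(sig3, sig5) > max_ratio
    ]


# Hypothetical (3' signal, 5' signal) values for one chip.
controls = {"GAPDH": (1200.0, 1000.0), "Actin": (2400.0, 600.0)}
print(flag_degraded(controls))
```

A flagged control suggests degraded RNA or inefficient labeling, so the chip deserves a closer look before it goes into the analysis.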

Page 7: Microarray Data Analysis

Microarray Data Process/Outline

1. Experimental Design

2. Image Analysis – scan to intensity measures (raw data)

3. Normalization – “clean” data

4. More “low level” analysis – fold change, ANOVA, data filtering

5. Data mining – how to interpret > 6000 measures

– Databases

– Software

– Techniques – clustering, pattern recognition, etc.

– Comparing to prior studies, across platforms?

6. Validation

Page 8: Microarray Data Analysis

Experimental Design

A good microarray design has 4 elements

1. A clearly defined biological question or hypothesis

2. Treatment, perturbation and observation of biological materials should minimize systematic bias

3. Simple and statistically sound arrangement that minimizes cost and gains maximal information

4. Compliance with MIAME (Minimum Information About a Microarray Experiment)

• The goal of statistics is to find signals in a sea of noise

• The goal of experimental design is to reduce the noise so signals can be found with as small a sample size as possible

Page 9: Microarray Data Analysis

Observational Study vs. Designed Experiment

• Observational study

– Investigator is a passive observer who measures variables of interest, but does not attempt to influence the responses

• Designed experiment

– Investigator intervenes in the natural course of events

What type is our DMSO experiment?

Page 10: Microarray Data Analysis

Experimental Replicates

• Why?

– In any experimental system there is a certain amount of noise, so even 2 identical processes yield slightly different results

– Sources?

– In order to understand how much variation there is, it is necessary to repeat an experiment a number of independent times

– Replicates allow us to use statistical tests to ascertain whether the differences we see are real
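
To make the role of replicates concrete, here is a minimal sketch of a Welch t statistic for one gene, using only the Python standard library. The function name and the log2 signal values are illustrative assumptions, and the choice of Welch's test is ours, not something the slides specify.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic: difference in means over its standard error.

    variance() needs at least two values per group, which is exactly why
    a single chip per condition cannot be tested this way.
    """
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical log2 signals for one gene, three replicates per condition.
control = [8.1, 8.3, 7.9]
dmso = [9.0, 9.4, 9.2]
print(welch_t(dmso, control))
```

The larger the statistic in absolute value, the less likely the difference is due to noise alone. Converting it to a p-value requires the t distribution, which a statistics package would provide.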

Page 12: Microarray Data Analysis

Technical vs. Biological Replicates

As we progress from the starting material to the scanned image, we move from a system dominated by biological effects to one dominated by chemical and physical noise

Within the Affy platform the dominant variation is usually biological, so the best strategy is to produce replicates as high up the experimental tree as possible

Page 13: Microarray Data Analysis

Low level data analysis / pre-processing

• Varying biological or cellular composition among sample types

• Differences in sample preparation, labeling or hybridization

• Non-specific cross-hybridization of target to probes

These lead to systematic differences between individual arrays

• Raw Data Quality Control

• Scaling

• Normalization and filtering

Page 14: Microarray Data Analysis

Image Analysis - Raw Data

Page 15: Microarray Data Analysis

From probe level signals to gene abundance estimates

The job of the expression summary algorithm is to take a set of Perfect Match (PM) and Mis-Match (MM) probes, and use these to generate a single value representing the estimated amount of transcript in solution, as measured by that probeset.

To do this, .DAT files containing array images are first processed to produce a .CEL file, which contains measured intensities for each probe on the array.

It is the .CEL files that are analyzed by the expression calling algorithm.

Page 16: Microarray Data Analysis

MAS 5.0 output files

• For each transcript (gene) on the chip:

– signal intensity

– a “present” or “absent” call (presence call)

– p-value (significance value) for making that call

• Each gene associated with GenBank accession number (NCBI database)

Page 17: Microarray Data Analysis

How are transcripts determined to be present or absent?

• Probe pair (PM vs. MM) intensities – generate a detection p-value

• assign “Present”, “Absent”, or “Marginal” call for transcript

• Every probe pair in a probe SET has a potential “vote” for presence call
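
The voting idea can be sketched as follows. The sketch assumes the commonly described MAS 5.0 approach: each probe pair contributes a discrimination score (PM - MM) / (PM + MM), and a one-sided signed-rank test asks whether the scores sit above a small threshold tau. The values tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06 are widely quoted defaults, but treat them as assumptions and check the vendor documentation. All function names are hypothetical.

```python
def signed_rank_p_value(values):
    """Exact one-sided Wilcoxon signed-rank p-value (H1: median > 0)."""
    values = [v for v in values if v != 0]
    n = len(values)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(values[i]))
    doubled_rank = [0] * n  # ranks are doubled so tie averages stay integers
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(values[order[j + 1]]) == abs(values[order[i]]):
            j += 1
        for k in range(i, j + 1):
            doubled_rank[order[k]] = i + j + 2  # 2 * mean of ranks i+1..j+1
        i = j + 1
    observed = sum(r for r, v in zip(doubled_rank, values) if v > 0)
    total = sum(doubled_rank)
    # Count sign assignments by their positive-rank sum (subset-sum DP).
    counts = [1] + [0] * total
    for r in doubled_rank:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[observed:]) / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return 'P', 'M' or 'A' for one probe set (intensities must be > 0)."""
    shifted = [(p - m) / (p + m) - tau for p, m in zip(pm, mm)]
    p_value = signed_rank_p_value(shifted)
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


# Hypothetical probe set with 12 probe pairs, PM clearly above MM.
print(detection_call([200.0] * 12, [100.0] * 12))
```

Each probe pair "votes" through the sign and rank of its shifted score, so a few noisy pairs cannot flip the call on their own.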

Page 18: Microarray Data Analysis

PM and MM Probes

• The purpose of each MM probe is to provide a direct measure of background and stray-signal (perhaps due to cross-hybridization) for its perfect-match partner. In most situations the signal from each probe-pair is simply the difference PM - MM.

• For some probe-pairs, however, the MM signal is greater than the PM value; we have an apparently impossible measure of background.
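
A short sketch shows how the "impossible" background arises. The intensities below are invented and the function names are illustrative only.

```python
def probe_pair_differences(pm, mm):
    """Naive per-pair signal: PM minus MM."""
    return [p - m for p, m in zip(pm, mm)]


def negative_pairs(pm, mm):
    """Indices of probe pairs where MM exceeds PM (negative naive signal)."""
    return [i for i, d in enumerate(probe_pair_differences(pm, mm)) if d < 0]


# Hypothetical intensities for three probe pairs of one probe set.
pm = [500.0, 300.0, 120.0]
mm = [100.0, 350.0, 80.0]
print(probe_pair_differences(pm, mm))
print(negative_pairs(pm, mm))
```

Pairs returned by `negative_pairs` are the ones an expression summary algorithm has to treat specially, since a negative signal cannot be log-transformed or interpreted as an amount of transcript.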

Page 19: Microarray Data Analysis

Thank goodness for software!!!

• MAS 5.0 does these calculations for you – .CHP file

• Basic analysis in MAS 5.0, but it won’t handle replicates

• Import MAS 5.0 (.CHP) data into other software: GeneSifter, GCOS, Spotfire, and many others

Page 20: Microarray Data Analysis

Signal Intensity

• Following these calculations, the MAS 5.0 algorithm now has a measure of the signal for each probe in a probeset.

• Other algorithms (e.g. RMA, GCRMA, dChip, PLIER) have been developed by academic teams to improve the precision and accuracy of this calculation

• In our experiment we will use RMA and GCRMA

Page 21: Microarray Data Analysis

How do we want to analyze this data?

• Pairwise analysis is most appropriate

– Control vs. DMSO

• List of genes that are “upregulated” or “downregulated”

• Determine fold up or down cutoffs

– What is significant?

• 1.5 fold up/down? • 2 fold up/down? • 10 fold up/down?
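
How the choice of cutoff changes the gene list can be seen in a small sketch. The gene names, signal values and function name are hypothetical; only the cutoffs come from the slide.

```python
def classify_fold_change(control, treated, cutoff=2.0):
    """Label a gene 'up', 'down' or 'unchanged' for a given fold cutoff."""
    ratio = treated / control
    if ratio >= cutoff:
        return "up"
    if ratio <= 1.0 / cutoff:
        return "down"
    return "unchanged"


# Hypothetical (control, DMSO) signals for three genes.
signals = {"geneA": (100.0, 250.0), "geneB": (100.0, 150.0), "geneC": (100.0, 40.0)}
for cutoff in (1.5, 2.0, 10.0):
    calls = {g: classify_fold_change(c, t, cutoff) for g, (c, t) in signals.items()}
    print(cutoff, calls)
```

A loose cutoff produces a long list with more false positives, and a strict one produces a short list that may miss real but modest changes. A fold cutoff alone says nothing about statistical significance, which is why it is usually combined with a test on replicates.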

Page 22: Microarray Data Analysis

Normalization - “clean” data

• “Normalizing” data allows comparisons ACROSS different chips

– Intensity of fluorescent markers might be different from one batch to the other

– Normalization allows us to compare those chips without altering the interpretation of changes in GENE EXPRESSION

Page 23: Microarray Data Analysis

Why Normalize Data?

•The experimental goal is to identify biological variation (expression changes between samples)

•Technical variation can hide the real data

•Unavoidable systematic bias should be recognized and corrected

•Normalization is necessary to effectively make comparisons between chips, and sometimes within a single chip.

•There are different methods of normalization; the assumptions about where variation exists will determine the normalization technique used.

•Always look at data before and after normalization

•Spike in controls can help show which method may be best
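
As one concrete example of a normalization method, here is a minimal sketch of quantile normalization, the approach used in the normalization step of RMA. It assumes every chip has the same probes in the same order and it ignores ties. The function name and data are illustrative.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists, one list of intensities per chip.
    Each value is replaced by the mean, across chips, of the values that
    hold the same rank on their own chip.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical chips with three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

The underlying assumption is that the overall distribution of intensities should be the same on every chip. If that assumption is doubtful for a given experiment, a different method may be more appropriate, which is the point of looking at the data before and after.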

Page 24: Microarray Data Analysis

Caveat…

• There is NO standard way to analyze microarray data

• Still figuring out how to get the “best” answers from microarray experiments

• Best to combine knowledge of biology, statistics, and computers to get answers

Page 25: Microarray Data Analysis

Venn Diagrams

[Figure: Venn diagrams of MAS 5.0, RMA and GCRMA results]

Page 26: Microarray Data Analysis

Data processing is completed. Now what?

Fold change, ANOVA, Data filtering

Page 35: Microarray Data Analysis

Where are we now?

• Ran analysis, output is a GENE LIST

– List indicates what genes are up or down regulated

– p values for t-test

– Graphs of signal levels

• Absolute numbers not as important here as the trends you see

– Now what????

Page 36: Microarray Data Analysis

What is the first set of genes on our chips that will be “filtered” out?

Page 37: Microarray Data Analysis

Follow the links

• Click on a gene

• Find links to other databases

• Follow links to discover what the protein does

• Now the fun part begins….

Page 38: Microarray Data Analysis

Back to Biology

• Do the changes you see in gene expression make sense BIOLOGICALLY?

• If they don’t make sense, can you hypothesize as to why those genes might be changing?

• Leads to many, many more experiments

Page 39: Microarray Data Analysis

The Gene Ontologies

A Common Language for Annotation of Genes from Yeast, Flies and Mice

…and Plants and Worms

…and Humans

…and anything else!

Page 40: Microarray Data Analysis

Gene Ontology Objectives

• GO represents concepts used to classify specific parts of our biological knowledge:

– Biological Process

– Molecular Function

– Cellular Component

• GO develops a common language applicable to any organism

• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

Page 41: Microarray Data Analysis

Srinija Srinivasan, Chief Ontologist, Yahoo!

The ontology. Dividing human knowledge into a clean set of categories is a lot like trying to figure out where to find that suspenseful black comedy at your corner video store. Questions inevitably come up, like are Movies part of Art or Entertainment? (Yahoo! lists them under the latter.) -Wired Magazine, May 1996

Page 42: Microarray Data Analysis

The 3 Gene Ontologies

• Molecular Function = elemental activity/task

– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

• Biological Process = biological goal or objective

– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

• Cellular Component = location or complex

– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Page 43: Microarray Data Analysis

Example: Gene Product = hammer

Function (what)              Process (why)
Drive nail (into wood)       Carpentry
Drive stake (into soil)      Gardening
Smash roach                  Pest Control
Clown’s juggling object      Entertainment

Page 44: Microarray Data Analysis

Biological Examples

[Figure: examples of Molecular Function, Biological Process and Cellular Component]

Page 45: Microarray Data Analysis

Validation

• Not enough to just do microarrays

• Usually “validate” microarray results via some other technique

– RT-PCR

– TaqMan

– Northern analysis

– Protein level analysis

• No technique is perfect…

Page 46: Microarray Data Analysis

Yeast Genome and Data Mining

Page 47: Microarray Data Analysis

Dynamic Nature of Yeast Genome

eORF= essential

kORF= known

hORF= homology identified

shORF= short

tORF= transposon identified

qORF= questionable

dORF= disabled

First published sequence claimed 6274 genes

– a number that has been revised many times. Why?

Page 48: Microarray Data Analysis

The Affy detection oligonucleotide sequences are frozen at the time of synthesis. How does this impact downstream data analysis?

6603

4373

1410

820

Page 49: Microarray Data Analysis

term: MAPKKK cascade (mating sensu Saccharomyces)

goid: GO:0007244

definition: MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces

definition_reference: PMID:9561267

Terms, Definitions, IDs
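
The term record above maps naturally onto a small data structure. The sketch below is one possible representation, not an official GO format: the class name is hypothetical, the field names follow the slide, and the only validation is the "GO:" plus seven digits identifier pattern.

```python
import re
from dataclasses import dataclass

# GO identifiers are "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOTerm:
    term: str
    goid: str
    definition: str
    definition_reference: str

    def __post_init__(self):
        if not GO_ID_PATTERN.match(self.goid):
            raise ValueError(f"not a valid GO identifier: {self.goid!r}")


mating_cascade = GOTerm(
    term="MAPKKK cascade (mating sensu Saccharomyces)",
    goid="GO:0007244",
    definition=(
        "MAPKKK cascade involved in transduction of mating pheromone "
        "signal, as described in Saccharomyces"
    ),
    definition_reference="PMID:9561267",
)
print(mating_cascade.goid, mating_cascade.term)
```

Because the identifier is stable while the term name and definition can be revised, gene lists are best annotated and compared by `goid` rather than by the text of the term.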

Page 50: Microarray Data Analysis

SGD

Page 53: Microarray Data Analysis

SGD microarray data sets available for public query

Page 54: Microarray Data Analysis

Homework

1. Go to http://www.yeastgenome.org/ and find 3 candidate genes of known function and one of undefined function that you might predict to be altered by DMSO treatment

2. What GO biological processes and molecular mechanisms are associated with your candidate genes?

3. Where in the cell does the protein reside?

4. What other proteins are known or inferred to interact with yours? How was this interaction determined? Is this a genetic or physical interaction?

5. Find the expression of at least one of your known genes in another publicly deposited microarray data set.

   1. Name of data set and how you found it?

   2. What is the largest fold change observed for this gene in the public study?

6. Now that you are microarray technology experts, can you give me 3 reasons why the observed transcript level difference may not be confirmed through a second technology like RT-qPCR?

Page 55: Microarray Data Analysis

Suggested Reading

