
Biostatistics Role in Microarray Analysis

Date post: 07-Apr-2018
Upload: geronimo-maldonado-martinez
Transcript
  • 8/3/2019 Biostatistics Role in Microarray Analysis

    1/44

    Define the elemental concept of microarrays.

    Describe the utility of the analysis of microarrays.

    Describe the different sources of variability in the analysis of microarrays.

    Describe the linear technique used to normalize microarray data.

    Describe the role of statistics in the normalization techniques described today.

    Describe the different transformation techniques mentioned today.

    Distinguish between the different transformation techniques described today.

    Describe the different pairwise comparison techniques used to test for independence among genes.

    In layman's terms:

    A DNA microarray (also commonly known as a gene chip, DNA chip, or biochip) is a collection of microscopic DNA spots attached to a solid surface.

    Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome.

    http://www.sciencedaily.com/articles/d/dna_microarray.htm

    Which genes are related?

    Which genes cause a certain disease?

    What subcategories of disease X are there?

    How certain can we be about this?

    Don't expect it to fix bad data!

    Microarray data are inherently highly variable. You are measuring mRNA levels.

    Some of this variability is relevant, since it corresponds to the differential expression of genes.

    Unfortunately, a large portion of it consists of undesirable biases introduced during the many technical steps of the experimental procedure.

    Biological variability

    RNA extraction

    Probe labeling. Ex: dye differences

    Printing. Ex: print order, plate order, clone variation

    Hybridization. Ex: temperature, time, mixing technique

    Human. Ex: variation between lab researchers

    Scanning. Ex: laser & detector, chemistry of the fluorescent label

    Image analysis. Ex: identification, quantification, background methods

    Raw exploration

    Normalization

    Logarithmic transformation (adjustment of variances)

    M vs. A plot (rotation of the logarithmic transformation). This method adjusts the median of the differences to 0.

    Background transformation (the RMA background approach, used for linear scenarios, to minimize the noise in the observed plot)

    Averaging normalization techniques

    After normalizing all of the spots in the microarray chip, we average them to obtain a more stable master slide.

    Establish the cutoff points

    Naive approach (establish cutoff points by log ratios)

    Justifiable approach (establish cutoff points by t-statistic)

    Statistical analysis. For each gene i we have the hypothesis test:

    Null (neutral) hypothesis H0,i: Mi = 0

    Alternative hypothesis H1,i: Mi ≠ 0

    Post hoc pairwise comparisons

    Minimize false positives

    At first, your data would probably look like this:

    Large numbers are very heavy to work with, so we need a more suitable way to play with them.

    Observed data (R, G):

    R = signal in the red channel

    G = signal in the green channel

    Not to be confused with normalization in statistical procedures, where the purpose is to transform the data so that they follow a normal (Gaussian) distribution.

    Normalization of microarray data aims to correct for systematic measurement errors and bias in the observed data.

    The process of normalization can be classified into linear and non-linear normalization.

    Linear: applied to selected genes or to all genes (global). The process is quite suitable for consistent data.

    Non-linear: highly precise for data at extreme values, but requires a reference gene set.

    The purpose of both methods is to bring each image in the microarray data to the same average brightness using statistical modeling.

    Expectation: Most genes are non-differentially expressed, i.e. most of the data points should be around M = 0.

    Idea: Do various exploratory plots to see if this assumption is met. For example, M vs. A plots, spatial plots, density plots, boxplots, print-order plots, etc.

    Result: We commonly observe something like this: Measured value = real value + systematic errors + noise

    Correction: If so, normalize the data to get rid of errors & noise: Corrected value = real value, with the systematic errors and noise terms removed
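
The expectation that most log ratios sit around M = 0 suggests the simplest linear (global) correction: shift every M by the slide's median. The sketch below illustrates only that idea; the function name and the sample values are hypothetical, and real pipelines usually do more than this.

```python
import statistics

def median_center(m_values):
    """Shift log ratios so that their median is 0 (a global, linear correction)."""
    shift = statistics.median(m_values)
    return [m - shift for m in m_values]

# Hypothetical log ratios from one slide, with a systematic bias of about +0.5
m_raw = [0.4, 0.6, 0.5, 2.5, 0.3, -1.5, 0.5]
m_norm = median_center(m_raw)

print("median before:", statistics.median(m_raw))
print("median after: ", statistics.median(m_norm))
```

Genes far from the bulk (here 2.5 and -1.5) stay far from 0 after the shift, which is the point: only the shared offset is removed.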

    Logarithmic Transformation

    Why Log2??...

    log2R=log2G

    M vs. A is basically a rotation of the log2(R) vs. log2(G) scatter plot.

    Now the quantity of interest, i.e. the fold change, is contained in one variable, namely M!

    Transformed data (M, A):

    M = log2(R) - log2(G) (log ratio)

    A = ½ [log2(R) + log2(G)] (log intensity)
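
As a small illustration of the transformation, the sketch below computes M and A for a few spots. It takes A as the average of the two log intensities, which is the usual MA-plot convention; the function name and the (R, G) values are made up for the example.

```python
import math

def ma_transform(red, green):
    """Return (M, A) for one spot from its red and green channel signals."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g          # log ratio: fold change on the log2 scale
    a = 0.5 * (log_r + log_g)  # average log intensity
    return m, a

# Hypothetical (R, G) signal pairs
spots = [(1024.0, 256.0), (500.0, 500.0), (128.0, 512.0)]
for r, g in spots:
    m, a = ma_transform(r, g)
    print(f"R={r:.0f} G={g:.0f} -> M={m:+.2f} A={a:.2f}")
```

A spot with R = 1024 and G = 256 gives M = 2 (a four-fold change) and A = 9, while equal channels give M = 0 regardless of how bright the spot is.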

    Plot panels: R vs. G; log(R) vs. log(G); M vs. A

    R = red channel signal

    G = green channel signal

    M = log2(R/G), aka log ratio

    A = ½ log2(RG), aka log intensity

    RMA stands for Robust Multichip Average (Irizarry et al., 2003).

    More robust than the Lowess (aka Loess) technique.

    Mostly used with Affymetrix microarray data.

    It is biologically sound to assume that fluorescence intensities from a microarray experiment are composed of both signal and noise, and that the noise is omnipresent throughout the entire signal distribution.

    A convolution model of a signal distribution and a noise distribution is a good choice in such a situation.

    Convolution is a mathematical operation (http://en.wikipedia.org/wiki/Operation_(mathematics)) on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.

    Figure labels: fluorescent signal; background noise; observed data

    The equation of the RMA method, E(Si | Xi = xi), will be used as the background intensity correction for gene i (it is applied to all genes in the microarray in order to minimize the noise in the observed signal).
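
The slides do not spell out the distributions behind E(Si | Xi = xi). The sketch below assumes the model usually associated with RMA: observed X = S + B, with signal S exponential (rate alpha) and background B normal (mean mu, standard deviation sigma). Under that assumption the conditional expectation has a closed form. The parameter values are invented; in practice they are estimated from the data, which this sketch does not attempt.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rma_background_correct(x, mu, sigma, alpha):
    """E(S | X = x) for X = S + B, S ~ Exponential(rate alpha), B ~ Normal(mu, sigma^2).

    The distributional assumptions are not stated in the slides.
    """
    a = x - mu - sigma * sigma * alpha
    b = sigma
    numerator = normal_pdf(a / b) - normal_pdf((x - a) / b)
    denominator = normal_cdf(a / b) + normal_cdf((x - a) / b) - 1.0
    return a + b * numerator / denominator

# Hypothetical background and signal parameters
mu, sigma, alpha = 100.0, 20.0, 0.01
for x in (50.0, 150.0, 1000.0):
    print(x, round(rma_background_correct(x, mu, sigma, alpha), 2))
```

Bright spots are reduced by roughly the background mean, while spots at or below the background level are mapped to small positive values rather than to negative ones, which keeps a later log transformation well defined.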

    Useful when there are different segmentations of the same gene.

    Combines all segmentations of the same gene into a single averaged, transformed unit.

    Can apply a t-test to work out whether the mean of the data is the same or different between two conditions.

    Can apply ANOVA to work out whether the mean of the data is the same or different across two or more conditions.

    Figure labels: normalization; average slide

    Naive approach

    Establish cutoff points by log ratios.

    This has to be done after the M vs. A transformation and background correction.

    The top and bottom 0.5 of the absolute M values have to be shaved off.
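
A log-ratio cutoff can be sketched in a few lines. The threshold of |M| >= 1 (a two-fold change) used here is an assumption chosen for illustration, not the cutoff from the slide, and the gene names and values are hypothetical.

```python
def naive_cutoff(log_ratios, threshold=1.0):
    """Return the genes whose absolute log ratio |M| meets the threshold."""
    return {gene: m for gene, m in log_ratios.items() if abs(m) >= threshold}

# Hypothetical normalized log ratios, one per gene
m_by_gene = {"gene_a": 2.1, "gene_b": -0.2, "gene_c": -1.4, "gene_d": 0.7}
selected = naive_cutoff(m_by_gene)
print(sorted(selected))
```

Note that this rule looks at a single number per gene and ignores how much the replicates disagree, which is the weakness the t-statistic approach addresses.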

    Justifiable approach

    Establish cutoff points using the t-statistic via Significance Analysis of Microarrays*.

    For replicated data, i.e. multiple measurements of the same thing, we trust this approach more if the deviation (std. dev.) is small.

    T = mean(x) / SE(x), where SE(x) is the standard error of the mean of x.

    The M axis is the only one to be transformed by T.

    If the deviation is large, we do not trust it that much (stick with the naive approach).

    *R package / Excel Add-In
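
The statistic T = mean(x) / SE(x) can be computed directly from the replicate log ratios of one gene, taking SE(x) as the sample standard deviation divided by the square root of the number of replicates. This is a sketch of the plain one-sample t-statistic only; it is not the modified statistic that the SAM package computes, and the replicate values are hypothetical.

```python
import math
import statistics

def t_statistic(m_replicates):
    """One-sample t-statistic for H0: mean M = 0, from replicate log ratios."""
    n = len(m_replicates)
    mean = statistics.mean(m_replicates)
    se = statistics.stdev(m_replicates) / math.sqrt(n)  # standard error of the mean
    return mean / se

# Two hypothetical genes with the same mean log ratio (1.0) but different spread
consistent = [1.0, 1.2, 0.8, 1.0]
noisy = [3.0, -1.0, 2.5, -0.5]

print("consistent:", round(t_statistic(consistent), 2))
print("noisy:     ", round(t_statistic(noisy), 2))
```

Both genes would look identical under a log-ratio cutoff, but the consistent gene gets a much larger T because its replicates agree.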

    For each gene i we have the hypothesis test:

    Which genes or groups are (most) differentially expressed?

    H0,i: Mi = 0

    H1,i: Mi ≠ 0

    α = 5%

    CI = 95%

    Thousands of tests, i.e. each gene is tested against H0: T = 0.

    False positives are a serious threat.

    Need to adjust p-values.

    Different adjustment procedures

    Pairwise comparisons post hoc test

    Bonferroni (best in linear situations)

    Tukey

    Sidak

    Duncan

    Holm
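
Of the procedures listed, Holm's is easy to sketch: sort the p-values, compare the smallest against alpha/m, the next against alpha/(m-1), and so on, stopping at the first failure. The function name and the p-values below are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: return one True/False rejection flag per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical p-values for four genes
p = [0.001, 0.04, 0.015, 0.3]
print(holm_reject(p))
```

With these values the third gene (p = 0.015) is rejected because its threshold has relaxed to 0.05/3, whereas a single fixed threshold of 0.05/4 would not reject it.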

    Multiple tests: a family of tests

    They compared a list of significant genes.

    Then family-wise error (FWE) = 0.05

    Bonferroni correction: set k = p/m

    Where: k = new p-value cutoff; p = original α; m = number of post hoc tests performed.
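
The Bonferroni rule amounts to one division. The sketch below applies it to a hypothetical chip of 10,000 genes and to a short list of made-up p-values; the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag the p-values that stay significant after Bonferroni correction."""
    k = bonferroni_threshold(alpha, len(p_values))
    return [p <= k for p in p_values]

# With 10,000 genes tested at a family-wise level of 0.05
print(bonferroni_threshold(0.05, 10_000))

# Hypothetical p-values for four genes: the cutoff is 0.05 / 4 = 0.0125
print(bonferroni_significant([0.001, 0.04, 0.015, 0.3]))
```

With thousands of genes the per-test cutoff becomes very small, which is why the correction protects well against false positives but can miss genuinely changed genes.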

    To sort and rank data.

    To reduce a data set of 1000s of genes to 10s or 100s (via averaging normalization techniques).

    As a guide in selecting which genes to validate more precisely and which not to.

    Filter out bad spots.

    Adjust low intensities.

    Normalize background noise and raw data.

    Calculate average ratios and statistical significance values per gene.

    Perform pairwise post hoc comparisons to minimize false positives.

    There are many different statistical significance metrics.

    T-test (p-values), SAM (T values), Wilcoxon RST, ANOVA (F-statistics), and many more.

    Just many variations on a theme!

    Choose one (or more!) wisely.

    BUT: don't let it make decisions for you!

    There will always be false positives (there's no post hoc test that can eliminate them all!).

    The most accurate tool in validating the results is the researcher's judgment, with the help of the keen point of view of a biostatistician, of course!

    You need replication and statistics to find real differences between genes.

    In most cases the naive approach (cutoff points by log ratios) is not enough.

    Cutoff points by t-statistics are a much wiser decision.

    Look out for false positives.

    Multiple testing = must adjust the p-values.

    dChip

    Affymetrix

    R

    Bioconductor

    BRB-ArrayTools (NCI Biometric Research Branch)

    Matlab Bioinformatics Toolbox

    GeneSpring

    Partek

    For further reading regarding the non-linear normalization of microarrays, please visit:

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC126873/pdf/gb-2002-3-9-research0048.pdf

    1. Good image analysis is essential. Some software is obsolete and not that good.

    2. Normalization is needed. We understand more now than a few years ago.

    3. Use at least the t-statistic to identify differentially expressed genes. Do not rely exclusively on log ratios.

    4. Multiple testing must be considered for false positives; adjust your p-values.

    5. Talk to a biostatistician before doing the experiments! They too have a family to feed, thanks to your work!

    Analysis of Microarray Data. Henrik Bengtsson, [email protected]

    Brown, S. (2009). Microarray Data Analysis. Retrieved September 8, 2011, from http://www.docstoc.com/docs/5822653/Microarray-Data-Analysis

    Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31:e15.

    The Use of Statistics in Microarray Studies (Dr. Ernst Wit). http://www.stats.gla.ac.uk/~microarray

    Wikipedia. MA plots. Retrieved September 8, 2011, from http://en.wikipedia.org/wiki/MA_plot