
Presentation July15 08

Date post: 30-May-2018
  • 8/14/2019 Presentation July15 08


    MeV: An overview of contributions/work accomplished

    Prepared by: Sarita Nair


    Outline

    o Annotation model integration

    o Revamped MeV file loaders

    o Support for two new GEO file formats in MeV

    o GSEA Algorithm and software development status


    What exactly is this annotation model, and do we really care about it?

    o MeV provided a bare-bones option for loading gene annotations.

    o Users could upload any information they deemed useful, as long as it satisfied MeV's annotation file format.

    This worked fine UNTIL we saw the growing popularity of Affymetrix as the preferred microarray platform among MeV users. SO...

    We came up with a solution: provide the LATEST annotation information through our popular web-based resource, RESOURCERER.


    q User-friendly interface

    o Simple, step-by-step instructions

    o Just click and go

    o Shows you where to find your file

    o Shows the chip types corresponding to the selected organism


    After you load the annotations, individual gene profiles can be viewed in the Spot Information dialog box.

    Future scope: link each of these labels to its respective web resource.


    New!! GEO file loaders

    q Rewrote the existing GEO (Gene Expression Omnibus) file loaders to make them compatible with the new NCBI file format

    q Added support for two new GEO file formats: Series Matrix and GDS
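    As a rough illustration of what a Series Matrix loader has to do, here is a minimal Python sketch (not MeV's actual Java loader). It assumes the standard GEO layout, in which the expression table sits between `!series_matrix_table_begin` and `!series_matrix_table_end` and starts with an `ID_REF` header row followed by the sample (GSM) columns. The function name `parse_series_matrix` and the treatment of blank or `null` cells as missing values are assumptions.

```python
import csv
import io


def parse_series_matrix(text):
    """Parse the expression table of a GEO Series Matrix file.

    Returns (sample_ids, data) where data maps each probe ID (ID_REF)
    to a list of per-sample values; blank or "null" cells become None.
    """
    rows = []
    in_table = False
    # Lines outside the table (!Series_title, !Sample_title, ...) are metadata and skipped.
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            break
        elif in_table and line.strip():
            rows.append(line)
    if not rows:
        raise ValueError("no series matrix table found")
    # Tab-delimited with double-quoted identifiers; csv removes the quotes.
    reader = csv.reader(io.StringIO("\n".join(rows)), delimiter="\t")
    header = next(reader)  # ["ID_REF", "GSM...", ...]
    sample_ids = header[1:]
    data = {}
    for fields in reader:
        data[fields[0]] = [
            None if v.strip() in ("", "null") else float(v) for v in fields[1:]
        ]
    return sample_ids, data
```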


    GEO files can contain multiple samples; the loaders are designed so that the order of the probes in the samples within a data file DOES NOT matter.

    Flags an error to the user if probes are missing from any of the samples.

    Does not require the user to load the platform information provided by GEO separately.

    Users can load files containing data only, or platform information + data, using the GEO SOFT Two Color and Affymetrix file loaders.
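    The order-independence and the missing-probe check described above can be sketched by keying each sample's values on probe ID. This is a hypothetical Python helper (`merge_samples`), not the MeV implementation: it takes the first sample's probe order as the reference row order and raises an error if any other sample has missing or extra probes.

```python
def merge_samples(samples):
    """Align GEO samples on probe ID, whatever order each sample lists its probes in.

    samples: dict sample ID -> dict probe ID -> value
    Returns (probe_ids, sample_ids, matrix), where matrix[i][j] is the value of
    probe_ids[i] in sample_ids[j]. Raises ValueError if the samples' probe sets differ.
    """
    sample_ids = list(samples)
    if not sample_ids:
        return [], [], []
    # The first sample's probe order becomes the row order of the matrix.
    probe_ids = list(samples[sample_ids[0]])
    reference = set(probe_ids)
    for sid in sample_ids[1:]:
        probes = set(samples[sid])
        missing, extra = reference - probes, probes - reference
        if missing or extra:
            raise ValueError(
                f"sample {sid}: missing probes {sorted(missing)}, unexpected probes {sorted(extra)}"
            )
    matrix = [[samples[sid][p] for sid in sample_ids] for p in probe_ids]
    return probe_ids, sample_ids, matrix
```

    Because every lookup goes through the probe ID, two samples that list the same probes in different orders still produce the same aligned rows.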


    Now for the revamped loaders


    Better organized, with an easy-to-understand flow

    [Screenshots: NEW format vs. OLD format]


    No more guessing the experiment platform

    Separate panels (File, Annotation, Expression Table, etc.) == easy navigation + better UI


    Spell check, people!? "Affymatrix": never heard of that before!

    Versus


    Finally: an accessible color scheme

    q Added a color scheme accessible to people with color blindness

    q Available under the Display Color Scheme menu

    Color palette adapted from the web-based resource: http://jfly.iam.u-tokyo.ac.jp/color/index.html
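    The jfly page presents the Okabe-Ito "Color Universal Design" palette. Below is a minimal sketch of how such colors could drive an expression heat map. The blue-black-yellow gradient and the function name `expression_color` are illustrative assumptions, not necessarily the exact scheme MeV ships.

```python
# Okabe-Ito palette from the jfly "Color Universal Design" page (RGB, 0-255).
OKABE_ITO = {
    "black": (0, 0, 0),
    "orange": (230, 159, 0),
    "sky_blue": (86, 180, 233),
    "bluish_green": (0, 158, 115),
    "yellow": (240, 228, 66),
    "blue": (0, 114, 178),
    "vermillion": (213, 94, 0),
    "reddish_purple": (204, 121, 167),
}


def _lerp(c1, c2, f):
    """Linear interpolation between two RGB triples, f in [0, 1]."""
    return tuple(round(a + (b - a) * f) for a, b in zip(c1, c2))


def expression_color(value, vmin, vmax,
                     low=OKABE_ITO["blue"], mid=OKABE_ITO["black"], high=OKABE_ITO["yellow"]):
    """Map an expression value onto a low -> mid -> high diverging gradient (assumed endpoints)."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp to [vmin, vmax] and rescale to [0, 1].
    f = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    if f <= 0.5:
        return _lerp(low, mid, f / 0.5)
    return _lerp(mid, high, (f - 0.5) / 0.5)
```

    Blue and yellow stay distinguishable for the common forms of red-green color blindness, which is why this sketch avoids the classic red-green pairing.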

    Gene Set Enrichment Analysis

    (Jiang Z, Gentleman R. Extensions to gene set enrichment. Bioinformatics. 2007 Feb 1;23(3):306-13.)


    What is the goal of GSEA?

    o GSEA is designed to capture changes in the expression of a predefined set of genes (as described by Zhen Jiang and Robert Gentleman)

    o GSEA aggregates per-gene statistics across the genes within a gene set

    Are there any advantages to using GSEA over traditional gene-centric analysis?


    Advantages (contd.)

    GSEA has been found to perform reasonably well in situations where:

    o All genes in a predefined gene set change in a small but significant way.

    o The effect is due to large changes in relatively few genes.


    GSEA (Gene Set Enrichment Analysis): Method

    1) Start with a data set of N probes and k samples.

    2) Collapse the probes to genes. If multiple probes map to a gene, use the max or the median.

    I have used max here.
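    A minimal Python sketch of this collapsing step, assuming a probe-to-gene-symbol mapping is already available from the chip annotations. `collapse_probes` is a hypothetical name. It combines probes sample by sample (the max or median of each column), which is one common reading of "use max or median", and it skips probes without a gene symbol.

```python
from statistics import median


def collapse_probes(expr, probe_to_symbol, method="max"):
    """Collapse probe-level rows to one row per gene symbol.

    expr:            dict probe ID -> list of per-sample values
    probe_to_symbol: dict probe ID -> gene symbol (None or "" if unannotated)
    method:          "max" or "median", applied sample by sample
    """
    if method not in ("max", "median"):
        raise ValueError("method must be 'max' or 'median'")
    combine = max if method == "max" else median
    rows_by_gene = {}
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:
            continue  # probes without a gene symbol are dropped
        rows_by_gene.setdefault(symbol, []).append(values)
    # zip(*rows) walks the samples (columns), combining all probes for that gene.
    return {gene: [combine(col) for col in zip(*rows)] for gene, rows in rows_by_gene.items()}
```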


    GSEA (Gene Set Enrichment Analysis) Method (contd.)

    3) Remove probes with no associated gene symbols.

    4) Filter out the genes in your gene sets that are NOT present in the data.

    ...which brings us to: what on earth are gene sets?


    GSEA (Gene Set Enrichment Analysis) Method (contd.)

    Gene sets can be, for example, lists of genes that are:

    A) Present on a chromosome

    B) Annotated by the same GO term

    C) Present in the pathway of your interest

    D) Significant in another experiment, etc.

    The Broad Institute of MIT and Harvard provides a publicly available collection of gene sets (the Molecular Signatures Database, MSigDB).

    ...or you can create your own gene sets relevant to the process you are investigating.


    GSEA (Gene Set Enrichment Analysis) Method (contd.)

    5) Calculate the per-gene statistic using a linear model (just as you would calculate a t-test statistic on a gene-by-gene basis).

    6) Calculate the per-gene-set statistic.

    ...If you are already scratching your head, wait till you see the ugly formula!
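    For a two-group comparison, the t statistic of the group coefficient in the linear model `y = b0 + b1 * group + error` equals the pooled-variance two-sample t statistic, so the per-gene step can be sketched without a regression library. This hypothetical helper (`per_gene_t`) handles exactly two groups and reports group 2 minus group 1, with the groups taken in sorted label order; MeV's actual linear-model code may differ.

```python
from math import sqrt
from statistics import mean


def per_gene_t(expr, labels):
    """Pooled-variance two-sample t statistic for every gene.

    expr:   dict gene -> list of expression values, one per sample
    labels: list of group labels (exactly two distinct values), one per sample
    Returns dict gene -> t = (mean of group 2 - mean of group 1) / standard error,
    where groups 1 and 2 are the two labels in sorted order.
    """
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two groups")
    g1, g2 = groups
    stats = {}
    for gene, values in expr.items():
        x = [v for v, lab in zip(values, labels) if lab == g1]
        y = [v for v, lab in zip(values, labels) if lab == g2]
        mx, my = mean(x), mean(y)
        # Residual variance pooled over both groups (df = n1 + n2 - 2), as in the linear model.
        ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
        s2 = ss / (len(x) + len(y) - 2)
        se = sqrt(s2 * (1 / len(x) + 1 / len(y)))
        stats[gene] = (my - mx) / se  # ZeroDivisionError if a gene has zero variance
    return stats
```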


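    Per-gene-set statistic. Let A be the association matrix (rows: gene sets, columns: genes; $A_{kj} = 1$ if gene $j$ belongs to gene set $k$, else 0) and $t$ the vector of per-gene statistics calculated earlier. The row sum of A is the number of genes in each gene set, and the vector of gene-set statistics $T$ is

    $$T_k = \frac{\sum_j A_{kj}\, t_j}{\sqrt{\sum_j A_{kj}}}$$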
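    A minimal sketch of the per-gene-set statistic, assuming it follows Jiang and Gentleman's approach: multiply the gene-set-by-gene association matrix A by the vector of per-gene statistics, then divide each gene set's sum by the square root of its size (the row sum of A). `gene_set_stats` is a hypothetical name.

```python
from math import sqrt


def gene_set_stats(per_gene, gene_sets):
    """Per-gene-set statistic T_k = (A t)_k / sqrt(row sum of A).

    per_gene:  dict gene -> per-gene statistic (the vector t)
    gene_sets: dict gene set name -> list of member genes
    Gene sets with no genes in per_gene are skipped.
    """
    genes = list(per_gene)
    t = [per_gene[g] for g in genes]
    result = {}
    for name, members in gene_sets.items():
        member_set = set(members)
        # One row of the association matrix A: 1 if the gene is in this set, else 0.
        row = [1 if g in member_set else 0 for g in genes]
        size = sum(row)  # row sum of A = number of genes in the set
        if size == 0:
            continue
        result[name] = sum(a * tj for a, tj in zip(row, t)) / sqrt(size)
    return result
```

    Dividing by the square root of the set size keeps large and small gene sets on a comparable scale: if the per-gene statistics were independent with unit variance, each $T_k$ would also have unit variance.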


    GSEA (Gene Set Enrichment Analysis) Method (contd.)

    7) Estimate significance:

    o Permute the factor labels

    o Compute the per-gene statistic for every permutation

    o Compute the gene-set statistic

    o Calculate and report the unadjusted p-values
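    The permutation procedure can be sketched end to end in plain Python. This is an illustrative assumption, not MeV's code: it reuses a two-group pooled t statistic and the sum-over-sqrt(size) gene-set statistic, shuffles the group labels `n_perm` times, and reports two unadjusted p-values per gene set (lower tail, upper tail). The `+1` correction, which counts the observed labelling as one of the permutations, is also an assumed convention.

```python
import random
from math import sqrt
from statistics import mean


def _t_stats(expr, labels):
    """Pooled two-sample t per gene (group 2 minus group 1, labels in sorted order)."""
    g1, g2 = sorted(set(labels))
    out = {}
    for gene, values in expr.items():
        x = [v for v, lab in zip(values, labels) if lab == g1]
        y = [v for v, lab in zip(values, labels) if lab == g2]
        mx, my = mean(x), mean(y)
        s2 = (sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)) / (len(x) + len(y) - 2)
        out[gene] = (my - mx) / sqrt(s2 * (1 / len(x) + 1 / len(y)))
    return out


def _set_stats(t, gene_sets):
    """Sum of member t statistics divided by sqrt(number of members present)."""
    out = {}
    for name, members in gene_sets.items():
        present = [g for g in dict.fromkeys(members) if g in t]
        if present:
            out[name] = sum(t[g] for g in present) / sqrt(len(present))
    return out


def permutation_pvalues(expr, labels, gene_sets, n_perm=1000, seed=0):
    """Unadjusted (lower-tail, upper-tail) permutation p-values per gene set."""
    observed = _set_stats(_t_stats(expr, labels), gene_sets)
    rng = random.Random(seed)
    lower = dict.fromkeys(observed, 0)
    upper = dict.fromkeys(observed, 0)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute the group (factor) labels
        null = _set_stats(_t_stats(expr, perm), gene_sets)
        for name, obs in observed.items():
            lower[name] += null[name] <= obs
            upper[name] += null[name] >= obs
    # +1 in numerator and denominator counts the observed labelling itself.
    return {name: ((lower[name] + 1) / (n_perm + 1), (upper[name] + 1) / (n_perm + 1))
            for name in observed}
```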


    GSEA software implementation


    What do you need to run GSEA?

    o Expression data

    o Group assignments

    o Gene sets

    o Chip annotations


    GSEA: Expression data

    o Expression data should be in an MeV-compatible format.

    o Probes are collapsed to genes. The current implementation uses gene symbols as the gene identifier.

    o Probes with no associated gene symbols are removed.

    o GSEA does NOT impute missing values or filter out genes with too many missing values.

    o GSEA does NOT filter out genes with low expression values.


    Group assignments

    o Categorical labels: currently, GSEA assumes that all the group labels are categorical.


    GSEA Gene Sets Database

    o Available from the MSigDB database (Broad Institute of MIT and Harvard)

    o 3000 gene sets available:

      - C1: Cytogenetic bands
      - C2: Curated gene sets
      - C3: Motif gene sets
      - C4: Computational gene sets

    o GSEA removes the genes in a gene set that are not in the expression data
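    Removing absent genes from each gene set can be sketched as a small filter. `filter_gene_sets` and its `min_size` cut-off are illustrative assumptions (the slides do not say whether MeV drops small or empty sets). With the default `min_size=1`, only sets that end up empty are discarded.

```python
def filter_gene_sets(gene_sets, genes_in_data, min_size=1):
    """Keep only gene-set members that appear in the expression data.

    gene_sets:     dict gene set name -> list of genes (e.g. read from MSigDB)
    genes_in_data: iterable of gene symbols present after collapsing probes
    min_size:      hypothetical cut-off; sets left with fewer genes are dropped
    """
    present = set(genes_in_data)
    filtered = {}
    for name, members in gene_sets.items():
        # dict.fromkeys removes duplicate members while keeping their order.
        kept = [g for g in dict.fromkeys(members) if g in present]
        if len(kept) >= min_size:
            filtered[name] = kept
    return filtered
```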


    File extension: .GMX (tab-delimited format)

    MeV recognizes this format
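    A .GMX file stores one gene set per column: the first row holds the gene set names, the second row holds descriptions, and the remaining rows list the member genes (columns can have different lengths). The hypothetical `parse_gmx` below is a minimal Python sketch of reading that layout, not MeV's parser.

```python
import csv
import io


def parse_gmx(text):
    """Read a .GMX file: one gene set per column.

    Row 1 holds gene set names, row 2 holds descriptions, and the remaining
    rows hold member genes; shorter columns are padded with empty cells.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    if len(rows) < 2:
        raise ValueError("a GMX file needs a name row and a description row")
    gene_sets = {}
    for col, name in enumerate(rows[0]):
        if not name.strip():
            continue
        # Skip the description row; ignore blank cells and short rows.
        genes = [r[col].strip() for r in rows[2:] if col < len(r) and r[col].strip()]
        gene_sets[name.strip()] = genes
    return gene_sets
```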


    File extension: .GMT (tab-delimited)

    And this one, too
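    .GMT is the row-wise counterpart of .GMX: each line holds a gene set name, a description, and then the member genes, all tab-separated. A minimal Python sketch, with `parse_gmt` as a hypothetical name:

```python
def parse_gmt(text):
    """Read a .GMT file: one gene set per tab-delimited line.

    Each line is: gene set name, description, gene 1, gene 2, ...
    """
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"malformed GMT line: {line!r}")
        name, genes = fields[0], fields[2:]  # fields[1] is the description
        gene_sets[name] = [g.strip() for g in genes if g.strip()]
    return gene_sets
```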


    GSEA Annotation file

    o By default, MeV uses annotations provided by RESOURCERER

    o Users can create their own file and load it, as long as it complies with the standard annotation file format used by MeV.
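    The slides do not spell out MeV's annotation file format, so the sketch below assumes only a tab-delimited file with a header row. The column names `PROBE_ID` and `GENE_SYMBOL` are hypothetical placeholders. The function builds the probe-to-gene-symbol mapping that the probe-collapsing step needs, mapping blank symbols to `None` so those probes can be dropped.

```python
import csv
import io


def load_probe_symbols(text, probe_col="PROBE_ID", symbol_col="GENE_SYMBOL"):
    """Build a probe ID -> gene symbol mapping from a tab-delimited annotation file.

    probe_col and symbol_col are hypothetical column names; adjust them to the real file.
    Probes with a blank symbol map to None.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = {probe_col, symbol_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"annotation file lacks columns: {sorted(missing)}")
    mapping = {}
    for row in reader:
        symbol = (row[symbol_col] or "").strip()
        mapping[row[probe_col]] = symbol or None
    return mapping
```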


    Existing screen shots of GSEA


    Screen shots of GSEA


    Current status of the project

    o Have finished implementing all the math

    o Currently implementing the viewers. The plan is a basic table viewer with gene set names as rows and two columns containing p-values

    o Plan to release a working version soon


    References

    o Jiang Z, Gentleman R. Extensions to gene set enrichment. Bioinformatics. 2007 Feb 1;23(3):306-13.

    o GSEA User Guide: http://www.broad.mit.edu/cancer/software/gsea/doc/GSEAUserGuideFrame.html
