Date post: | 30-May-2018 |
Upload: | saritanair |
8/14/2019 Presentation July15 08
MeV: An overview of contributions/work accomplished
Prepared by: Sarita Nair
Outline
o Annotation model integration
o Revamped MeV file loaders
o Support for two new GEO file formats in MeV
o GSEA Algorithm and software development status
What exactly is this annotation model, and do we really care about it?
o MeV provided a bare-bones option for loading gene annotations.
o Users could upload any information deemed useful, as long as it satisfied MeV's annotation file format.
This worked fine TILL we saw the growing popularity of Affymetrix as a preferred platform among MeV users for microarray analysis. SO...
We came up with a solution that provides the LATEST annotation information through our popular web-based resource, RESOURCERER.
User-friendly interface
o Simple, step-by-step instructions: just click and go
o Shows you where to find your file
o Shows the chip types corresponding to the organism selected
After you load the annotations...
Individual gene profiles can be viewed in the Spot Information dialog box.
Future scope: plan to link each of these labels to its respective web resource.
New!! GEO file loaders
o Rewrote the existing GEO (Gene Expression Omnibus) file loaders to make them compatible with the new NCBI file format
o Added support for two new GEO file formats: Series Matrix and GDS
GEO files can contain multiple samples; the loaders are designed so that the order of the probes in the samples within a data file DOES NOT matter.
The loaders flag an error to the user if probes are found to be missing in any of the samples.
The user is not required to load the platform information provided by GEO separately.
Users can load files that contain data only or platform information + data, using the GEO SOFT Two Color and Affymetrix file loaders.
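As a rough illustration of the order-independence and missing-probe check described here, the following Python sketch aligns samples by probe ID. It is not MeV's loader code; the function name `align_samples_by_probe` and the input layout (a dict of sample name to a list of probe/value pairs) are assumptions made for the example.

```python
def align_samples_by_probe(samples):
    """Align samples on probe ID so that per-sample probe order is irrelevant.

    samples: dict mapping sample name -> list of (probe_id, value) pairs,
             in any order (hypothetical layout, not the GEO file format).
    Returns (probe_ids, matrix): matrix rows follow probe_ids and matrix
    columns follow the iteration order of `samples`.
    Raises ValueError if any sample lacks a probe seen in another sample.
    """
    lookups = {name: dict(rows) for name, rows in samples.items()}

    # Reference probe order: first appearance across all samples.
    probe_ids, seen = [], set()
    for rows in samples.values():
        for probe_id, _ in rows:
            if probe_id not in seen:
                seen.add(probe_id)
                probe_ids.append(probe_id)

    # Flag samples that are missing one or more probes.
    missing = {}
    for name, lookup in lookups.items():
        absent = sorted(seen.difference(lookup))
        if absent:
            missing[name] = absent
    if missing:
        raise ValueError(f"probes missing from samples: {missing}")

    matrix = [[lookups[name][probe] for name in samples] for probe in probe_ids]
    return probe_ids, matrix
```

Looking values up by probe ID, rather than by row position, is what makes the probe order within each sample irrelevant.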
Now for the revamped loaders
Better organized, with an easy-to-follow flow
NEW Format
OLD Format
click here Look here
No more guessing the experiment platform
Separate panels (File, Annotation, Expression Table, etc.) = easy navigation + better UI
Spell check, people!? "Affymatrix": never heard of that before!
Versus
Finally: an accessible color scheme
o Added a color scheme accessible to people with color blindness
o Available under the Display Color Scheme menu
Color palette adapted from the web-based resource: http://jfly.iam.u-tokyo.ac.jp/color/index.html
Gene Set Enrichment Analysis
(Jiang, Z., Gentleman, R. (2007). Extensions to gene set enrichment. Bioinformatics 23(3):306-13.)
What is the goal of GSEA?
o GSEA was developed to capture changes in the expression of pre-defined sets of genes (Zhen Jiang and Robert Gentleman)
o GSEA aggregates per-gene statistics across the genes within a gene set
Are there any advantages to using GSEA over traditional gene-centric analysis?
Advantages, contd.
GSEA has been found to perform reasonably well in situations where:
o All genes in a predefined gene set change in a small but significant way
o The effect is due to large changes in relatively few genes
GSEA (Gene Set Enrichment Analysis): Method
1) Start from a data set with N probes and k samples.
2) Collapse the probes to genes. If multiple probes map to a gene, use the max or the median. I have used max here.
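A minimal Python sketch of the collapse step, under one possible reading of "max": for each gene and each sample, keep the largest value among the probes that map to that gene. The function name and the dict-based inputs are hypothetical and do not reflect MeV's internal data structures. Probes without a gene symbol are skipped here as well, since the method also removes them.

```python
from collections import defaultdict


def collapse_probes_to_genes(probe_values, probe_to_symbol):
    """Collapse probe-level rows to gene-level rows with a per-sample maximum.

    probe_values:    dict probe_id -> list of k expression values (one per sample)
    probe_to_symbol: dict probe_id -> gene symbol ("" or None if unannotated)
    Returns a dict gene symbol -> list of k values.
    """
    rows_by_gene = defaultdict(list)
    for probe_id, values in probe_values.items():
        symbol = (probe_to_symbol.get(probe_id) or "").strip()
        if not symbol:
            continue  # probe has no gene symbol: drop it
        rows_by_gene[symbol].append(values)

    # zip(*rows) walks the samples; max() picks the largest probe value per sample.
    return {symbol: [max(column) for column in zip(*rows)]
            for symbol, rows in rows_by_gene.items()}
```

Swapping `max` for `statistics.median` would give the median variant mentioned on the slide.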
GSEA (Gene Set Enrichment Analysis): Method, contd.
3) Remove probes with no associated gene symbol.
4) Filter out the genes in your gene set that are NOT present in the data.
...which brings us to: what on earth are gene sets?
GSEA (Gene Set Enrichment Analysis): Method, contd.
Gene sets can be, say, a list of genes which are:
o Present on a chromosome
o Annotated by the same GO term
o Present in the pathway of your interest
o Significant genes from another experiment, etc.
MIT provides a publicly available collection of gene sets (Molecular Signatures Database, MSigDB).
You can also create your own gene sets relevant to the process you are investigating.
GSEA (Gene Set Enrichment Analysis): Method, contd.
5) Calculate the per-gene statistic using a linear model (just as you would calculate a t-test statistic on a gene-by-gene basis).
6) Calculate the per-gene-set statistic.
If you are already scratching your head... wait till you see the ugly formula!
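Figure: computing the gene-set statistic from two matrices.
o A: the association matrix, with gene sets as rows and genes as columns. The row sum of A = the number of genes in that gene set.
o The matrix containing the per-gene statistic, calculated earlier.
o Result: the matrix containing the gene-set statistic.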
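The formula itself did not survive in the slide text. One form consistent with the labels above, and with the gene-set statistic described by Jiang and Gentleman (2007), is sketched below. The symbols t (vector of per-gene statistics), z (vector of gene-set statistics) and K (a gene set) are notation assumed here, not taken from the slide.

```latex
% Assumed notation: A_{Kg} = 1 if gene g belongs to gene set K, else 0;
% t_g is the per-gene statistic; |K| is the row sum of A for gene set K.
z_K \;=\; \frac{\sum_{g} A_{Kg}\, t_g}{\sqrt{\sum_{g} A_{Kg}}}
    \;=\; \frac{1}{\sqrt{|K|}} \sum_{g \in K} t_g
```

Dividing by the square root of the set size keeps the statistic on a comparable scale for gene sets of different sizes.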
GSEA (Gene Set Enrichment Analysis): Method, contd.
7) Estimate significance:
o Permute the factor labels
o Compute the per-gene statistic for every permutation
o Compute the gene-set statistic
o Calculate and report the unadjusted p-values
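The permutation procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the MeV implementation: it assumes two groups coded 0 and 1, uses a pooled-variance t statistic as a stand-in for the linear-model statistic, scales each gene-set sum by the square root of the set size, and reports two-sided p-values. All function names are hypothetical, and gene sets are assumed to be already restricted to genes present in the data.

```python
import math
import random
import statistics


def t_statistic(values, labels):
    """Pooled-variance two-sample t statistic for one gene (labels are 0 or 1)."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    pooled = (((len(a) - 1) * statistics.variance(a)
               + (len(b) - 1) * statistics.variance(b))
              / (len(a) + len(b) - 2))
    se = math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
    return (statistics.mean(b) - statistics.mean(a)) / se if se > 0 else 0.0


def gene_set_statistics(expr, labels, gene_sets):
    """Sum per-gene t statistics within each set, scaled by sqrt(set size)."""
    t = {gene: t_statistic(values, labels) for gene, values in expr.items()}
    return {name: sum(t[g] for g in genes) / math.sqrt(len(genes))
            for name, genes in gene_sets.items()}


def permutation_p_values(expr, labels, gene_sets, n_perm=200, seed=0):
    """Unadjusted two-sided p-values obtained by permuting the group labels."""
    rng = random.Random(seed)
    observed = gene_set_statistics(expr, labels, gene_sets)
    hits = {name: 0 for name in gene_sets}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the factor labels
        permuted = gene_set_statistics(expr, shuffled, gene_sets)
        for name in gene_sets:
            if abs(permuted[name]) >= abs(observed[name]):
                hits[name] += 1
    # Add-one correction so that no p-value is reported as exactly zero.
    return {name: (hits[name] + 1) / (n_perm + 1) for name in gene_sets}
```

Each group needs at least two samples, because the variance of a single value is undefined. Permuting the labels rather than the genes preserves the correlation between genes within a set.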
GSEA software implementation
What do you need to run GSEA?
o Expression data
o Group assignments
o Gene sets
o Chip annotations
GSEA: Expression data
o Expression data should be in an MeV-compatible format.
o Probes are collapsed to genes. The current implementation uses gene symbols as the gene identifier.
o Probes with no associated gene symbol are removed.
o GSEA does NOT impute missing values or filter out genes with too many missing values.
o GSEA does NOT filter out genes with low expression values.
Group assignments
o Categorical labels: currently, GSEA assumes that all group labels are categorical.
GSEA Gene Sets Database
o Available from the MIT MSigDB database
o 3000 gene sets available:
- C1: Cytogenetic bands
- C2: Curated gene sets
- C3: Motif gene sets
- C4: Computational gene sets
o GSEA removes the genes in a gene set that are not in the expression data
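The last point can be sketched as a small Python helper. The name `filter_gene_sets` is hypothetical, and dropping gene sets that end up empty is an assumption added here (an empty set has no statistic to compute), not something the slides state.

```python
def filter_gene_sets(gene_sets, expressed_genes):
    """Keep only the genes of each gene set that occur in the expression data.

    gene_sets:       dict gene-set name -> list of gene symbols
    expressed_genes: iterable of gene symbols present in the expression data
    """
    expressed = set(expressed_genes)
    filtered = {}
    for name, genes in gene_sets.items():
        kept = [gene for gene in genes if gene in expressed]
        if kept:  # assumption: gene sets left with no genes are dropped
            filtered[name] = kept
    return filtered
```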
File extension: .GMX (tab-delimited format)
MeV recognizes this format
File extension: .GMT (tab-delimited format)
And this one, too
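For readers unfamiliar with the two layouts, here is a minimal Python sketch of parsers for both, based on the commonly documented GSEA conventions: in GMT each row is one gene set (name, description, then genes), while in GMX each column is one gene set (row 1 names, row 2 descriptions, remaining rows genes). The function names are hypothetical and this is not MeV's loader code.

```python
def parse_gmt(text):
    """Parse GMT text: one gene set per row (name, description, genes...)."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        name = fields[0].strip()
        if len(fields) < 2 or not name:
            continue  # skip blank or malformed rows
        gene_sets[name] = [gene.strip() for gene in fields[2:] if gene.strip()]
    return gene_sets


def parse_gmx(text):
    """Parse GMX text: one gene set per column (name, description, genes...)."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if len(rows) < 2:
        return {}
    gene_sets = {}
    for col, name in enumerate(rows[0]):
        name = name.strip()
        if not name:
            continue
        # Columns may have different lengths, so shorter ones contain empty cells.
        gene_sets[name] = [row[col].strip() for row in rows[2:]
                           if col < len(row) and row[col].strip()]
    return gene_sets
```

Both functions return the same structure, a dict from gene-set name to a list of gene symbols, so downstream code does not need to know which format was loaded.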
GSEA Annotation file
o By default, MeV uses annotations provided by Resourcerer.
o Users can create and load their own file, as long as it complies with the standard annotation file format used by MeV.
Existing screen shots of GSEA
Screen shots of GSEA
Current status of the project
Have completed implementing all the math.
Currently implementing the viewers. Plan to implement a basic table viewer with gene set names as rows and two columns containing p-values.
Plan to release a working version soon.
References
o Jiang, Z., Gentleman, R. (2007). Extensions to gene set enrichment. Bioinformatics 23(3):306-13.
o GSEA User Guide: http://www.broad.mit.edu/cancer/software/gsea/doc/GSEAUserGuideFrame.html