Date post: | 21-Jan-2016 |
Upload: | rosamund-chandler |
A literature network of human genes for high-throughput analysis of gene expression
Speaker: Shih-Te, Yang
Advisor: Ueng-Cheng, Yang
The Institute of Biochemistry, NYMU
Bioinformatics program and core lab
Tor-Kristian Jenssen, Astrid Laegreid, Jan Komorowski & Eivind Hovig
Nature Genetics. Volume 28. May 2001
Goals for systems biology
(Cell, 100(1):57–70, Review, 2000; PNAS, Vol. 95, 14863-14868)
How to Find Biologically Significant Events Using Microarray Tech?
Fitting to current knowledge
Sifting out variations
Mapping Gene Expression Data to KEGG Pathways
Linking Molecular Information to Phenotypes Can Provide Insights into Biological Processes
Pathways: metabolic, signal transduction, etc.
Phenotype: angiogenesis, metastasis
Information Hidden in Literature
Molecular functions
•Protein-protein interactions
•Protein-DNA (RNA) interactions
Phenotypic information
•Physiological and pathological processes (e.g. angiogenesis, tumor metastasis)
•Drug and chemical response
No Efficient Way to Find Genes Related to Angiogenesis
http://www3.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=0&form=1&term=angiogenesis
Strategies of Literature Mining
•Keyword indexing (a single gene): protein annotation
•Semantics (several genes): protein binding and interaction
•Keyword co-occurrence (terms and genes): biomedical terms vs genes -> biological processes
Medicine and Related Subjects from MeSH Classified by NLM
http://wwwcf.nlm.nih.gov/class/schedule.html
Gene Ontology (GO) Can Provide Links between Biological Processes and Genes
Approach to construct the literature network (part one)
Step One: gene-to-term association through a common set of articles
[Diagram: Gene -(index)-> Articles <-(index)- Term annotation (MeSH, Gene Ontology)]
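Step One can be illustrated with a minimal sketch, assuming each article has already been indexed with the genes it mentions and the terms assigned to it. The article IDs, gene names, and the function name `gene_term_counts` are hypothetical placeholders, not data or code from PubGene.

```python
from collections import defaultdict

# Hypothetical toy indexes (not PubGene data):
# article ID -> genes mentioned, article ID -> annotation terms assigned.
gene_index = {
    "a1": {"GENE_A", "GENE_B"},
    "a2": {"GENE_A"},
    "a3": {"GENE_C"},
}
term_index = {
    "a1": {"Angiogenesis"},
    "a2": {"Angiogenesis", "Metastasis"},
    "a3": {"Metastasis"},
}

def gene_term_counts(gene_index, term_index):
    """Count, for each (gene, term) pair, the articles indexed with both."""
    counts = defaultdict(int)
    for article, genes in gene_index.items():
        # Articles without any term annotation contribute nothing.
        for term in term_index.get(article, ()):
            for gene in genes:
                counts[(gene, term)] += 1
    return dict(counts)

counts = gene_term_counts(gene_index, term_index)
```

A gene and a term are linked whenever at least one article carries both; the count can serve as a simple weight for the link.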
Approach to construct the literature network (part two)
Step Two: gene-to-gene co-citation (co-mention, co-occurrence)
[Diagram: Gene A and Gene B are each indexed to Articles; shared articles suggest a biological relation. Global approach.]
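Step Two can be sketched in the same spirit: count how many articles mention each pair of genes together. The input mapping and the name `cocitation_counts` are illustrative assumptions; the real index is built from MEDLINE records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy index: article ID -> set of genes mentioned in it.
articles = {
    "a1": {"GENE_A", "GENE_B"},
    "a2": {"GENE_A", "GENE_B", "GENE_C"},
    "a3": {"GENE_C"},
}

def cocitation_counts(articles):
    """Count the articles in which each unordered gene pair co-occurs."""
    pair_counts = Counter()
    for genes in articles.values():
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(genes), 2):
            pair_counts[pair] += 1
    return pair_counts

pairs = cocitation_counts(articles)
```

Each pair with a non-zero count becomes an edge in the gene-gene network, and the count is a natural edge weight.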
Network Extension and Expansion
Linking gene-gene, gene-term, and term-term relations
[Diagram: network linking Term 1 (Angiogenesis), Term 2 (Metastasis) and Genes 1-5]
Research design, step by step
•Mapping/matching symbols to genes
•Filtering procedure
•Gene-articles index and term-articles index (MeSH, Gene Ontology)
•Gene-gene network and gene-term network
•PubGene Database
•Gene network browser (Internet)
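The first two steps of the design (symbol matching, then filtering) can be sketched as follows. This is only an illustration under stated assumptions: the symbol table, the stoplist of ambiguous symbols, the minimum symbol length, and the function name `match_genes` are all hypothetical, and the actual PubGene filtering rules are not given on the slide.

```python
import re

# Hypothetical symbol table: symbol or alias -> gene identifier.
symbol_to_gene = {
    "GENEA": "gene_a",
    "GA1": "gene_a",    # alternative symbol for the same gene
    "CELL": "gene_b",   # coincides with a common English word
    "T": "gene_c",      # very short symbol
}
# Assumed stoplist of symbols judged too ambiguous to index.
ambiguous = {"CELL"}

def match_genes(text, symbol_to_gene, stoplist, min_len=2):
    """Return the genes whose symbols occur as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9-]+", text))
    found = set()
    for token in tokens:
        # Filtering: drop very short symbols and stoplisted acronyms.
        if len(token) < min_len or token in stoplist:
            continue
        gene = symbol_to_gene.get(token)  # case-sensitive lookup
        if gene is not None:
            found.add(gene)
    return found

hits = match_genes("GA1 is expressed in T CELL lines", symbol_to_gene, ambiguous)
```

The lookup here is case-sensitive, which avoids some false matches but misses case variants of a symbol; a real system would have to choose how to trade these off.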
PubGene(TM) Gene Database and Tools
http://www.pubgene.org/
Automated indexing of named human genes
Gene nomenclature database (13712), with source databases:
•HUGO (9722)
•LocusLink (2729)
•GENATLAS (1239)
•GDB (358)
Fields: primary symbol, gene name, alternative symbol
[Figure labels: 63, 63, 352; 14048; 13570 (142)]
Contribution to the gene-to-article index over time
The total number of gene occurrences
MEDLINE records from before 1975 do not contain abstracts
More articles from 1999 and 2000 were expected to be included in MEDLINE
Distribution of genes with respect to the number of articles found to be relevant
Distribution of genes with respect to the number of gene neighbors
•The histogram shows ‘smoothed’ values.
•The distribution of genes by number of article references decreases almost exponentially.
Genes tended to be mentioned in triplets almost as much as
for the ref.
Types of gene relationships found in PubGene
To examine over-represented or incorrectly assigned relationships
(40%) (29%)
•Symbols belonging to more than one gene
•Very general symbols coinciding with general acronyms
•Very short gene names
DIP: C(171,2)
OMIM: C(6404,2)
? 8643?
•DIP: “Number of actual links” “Number of genes”
•OMIM: “Number of genes” “Number of actual links”
•“Number of actual links” “PubGene” “Number of actual links found in PubGene”
•“Number of possible links” “PubGene” “Number of all links found in PubGene”
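The notation C(n,2) above is the number of unordered pairs among n genes, i.e. the number of possible links. A minimal sketch of that calculation follows; the helper names are hypothetical, and only the gene counts 171 and 6404 come from the slide.

```python
from math import comb  # available in Python 3.8+

def possible_links(n_genes):
    """Number of unordered gene pairs among n_genes: C(n, 2) = n(n-1)/2."""
    return comb(n_genes, 2)

def link_fraction(actual_links, n_genes):
    """Fraction of all possible pairs that are actual links."""
    return actual_links / possible_links(n_genes)

dip_possible = possible_links(171)    # 171 * 170 / 2
omim_possible = possible_links(6404)  # 6404 * 6403 / 2
```

With these values, the DIP gene set allows 14535 possible links and the OMIM gene set 20502406, which shows how sparse any curated set of actual links is relative to the space of possible pairs.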
Comparison of PubGene with manually curated databases
To examine the under-represented gene pairs
(51%) (45%)
Reasons for under-representation of DIP-derived gene pairs
(a) insufficient synonym lists
(b) synonym case variation
(c) complex gene families with immature or complex naming conventions
Summary of the verification against DIP and OMIM
The number of DIP and OMIM interactions found in PubGene shows that PubGene captures a substantial amount of the existing biological information on protein-protein interactions and on gene mapping and disease.
Linking relations to expression profiles (microarray, proteomics, etc.)
[Diagram: network linking Term 1 (Angiogenesis), Term 2 (Metastasis) and Genes 1-5]
Time series, expression levels, patterns, etc.
Verify the applicability of the tools by analyzing two publicly available microarray data sets
•Discrimination analysis: literature associations highlight background knowledge for signature genes in patient sample data.
•Kinetic & mechanism study: detection of complex co-regulatory patterns between biologically related genes.
The “signature gene cluster” from unsupervised hierarchical clustering analysis
(Nature. 403, 503-511)
•Cell type
•Biological process
To explore the correlation between unsupervised clustering and the supervised PubGene approach
(Nature. 403, 503-511)
•4062 clones -> 1032 symbols (PubGene) -> 50 (up/down regulated)
•(7+14)/50 = 42%
•6% (1302, 50) B-cell signature
•42/6 = 7 times more than expected at random
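The arithmetic on this slide can be reproduced as a short worked example. It only restates the numbers shown (7, 14, 50, and the 6% random expectation); the variable names are illustrative, and the slide does not say what the 7 and the 14 each count.

```python
from fractions import Fraction

# Observed fraction among the 50 up/down-regulated genes: (7 + 14) / 50.
observed = Fraction(7 + 14, 50)

# Fraction expected at random, as stated on the slide: 6%.
expected = Fraction(6, 100)

# Fold enrichment of the observed fraction over the random expectation.
fold_enrichment = observed / expected

observed_percent = float(observed * 100)
```

Exact fractions are used so that the ratio comes out as exactly 7 rather than a floating-point approximation.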
Network of the genes in the GC-B signature
•GC-B signature: 25 genes, of which only 20 map to the network, plus the most important neighbors
•Underlying biological relationships between these genes
•Link signature genes to disease MeSH terms: Fragile X, Angelman syndrome, lymphoma, leukaemia,…
•Link signature genes to Gene Ontology: transcriptional regulator
[Figure labels: Translocation in lymphomas; Immunoglobulin recombination]
To visualize complex co-regulatory patterns of gene expression and simultaneously highlight biological relationships
[Figure panels: 1 hour, 8 hour; Transcription factors]
(from Science. 283, 83-87)
8613 clones -> 517 clones -> 340 genes; 1-hour expression levels superimposed onto a sub-network of PubGene
Angiogenesis
Rapid profiling of genes through the distribution of MeSH terms
[Figure panels: 6 hour, 1 hour; MeSH index]
•MeSH indexing: identification of strong associations between genes and biological processes
•Linking the literature network to MeSH terms
•‘angiogenesis’ 10/12 (highest fraction)
(from Science. 283, 83-87)
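The idea of profiling a gene group by its MeSH term distribution can be sketched as follows: for each term, compute the fraction of genes in the group that are associated with it, then rank the terms. The gene names, term assignments, and the function name `term_fractions` are hypothetical toy values, not the data behind the 10/12 figure.

```python
from collections import Counter

# Hypothetical gene -> associated MeSH terms (toy data, not from PubGene).
gene_terms = {
    "GENE_1": {"Angiogenesis", "Metastasis"},
    "GENE_2": {"Angiogenesis"},
    "GENE_3": {"Angiogenesis"},
    "GENE_4": {"Metastasis"},
}

def term_fractions(gene_set, gene_terms):
    """For each term, the fraction of genes in gene_set associated with it."""
    counts = Counter()
    for gene in gene_set:
        for term in gene_terms.get(gene, ()):
            counts[term] += 1
    n = len(gene_set)
    return {term: count / n for term, count in counts.items()}

cluster = {"GENE_1", "GENE_2", "GENE_3", "GENE_4"}
fractions = term_fractions(cluster, gene_terms)
top_term = max(fractions, key=fractions.get)
```

The term with the highest fraction is a candidate label for the biological process the gene group takes part in, in the same way that 'angiogenesis' stands out on the slide.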
Summary
With its indexing strategy (gene-gene and gene-term co-citation), rich and varied information content, and analytical flexibility, PubGene can incorporate more of the available biological knowledge into high-throughput gene expression analysis than any other analytical tool available.
A web-based solution with multiple-gene queries gives end users a global, systematic view of the literature behind their microarray data.