Date post: | 11-Jan-2016 |
The Bytes of Biological Data
Artemis G. Hatzigeorgiou
Professor of Bioinformatics
Department of Electrical and Computer Engineering, University of Thessaly
Hellenic Institute Pasteur
“Athena” Research Center
What is Bioinformatics?
• Bioinformatics is generally defined as the analysis, prediction, modeling and storage of biological data with the help of computers
Next Generation Sequencing
[Figure: costs, split 90% / 10%]
The central dogma
What are microRNAs (miRNAs)?
[Figure: Gene B; DNA (transcription) → RNA (translation) → protein]
miRNAs are about 22 nt long RNAs.
They post-transcriptionally regulate protein coding gene expression
MicroRNAs are involved in …
Development, stem cell proliferation, division, differentiation
Regulation of innate & adaptive immunity
Apoptosis, cell signaling, metabolism
Human pathologies:
Cancer, viral infections, cardiovascular diseases, metabolic disorders, neurological pathologies, psychiatric disorders, renal disease, hepatological conditions, autoimmune diseases, gastroenterological conditions, obesity, reproductive disorders, musculoskeletal disorders, periodontal pathologies
Superlinear Increase of known miRNAs and relevant Research
Active Pathway Visualization
Citation: Wang D, Yan K-K, Sisu C, Cheng C, Rozowsky J, Meyerson W, et al. (2015) Loregic: A Method to Characterize the Cooperative Logic of Regulatory Factors. PLoS Comput Biol 11(4): e1004132. doi:10.1371/journal.pcbi.1004132
Location of miRNAs
[Figure: two gene layouts, each with a promoter, Pol2 and miRs, one of them drawn between exons; the two cases are labelled 70% and 30%]
Why are the pri-miRNA genes not annotated?
Fast degradation in the nucleus
Megraw, M., Baev, V., Rusinov, V., Jensen, S.T., Kalantidis, K., Hatzigeorgiou, A.G. MicroRNA promoter element discovery in Arabidopsis (2006) RNA, 12 (9), pp. 1612-1619.
Recognition of Transcription Start Sites
for pri-microRNA genes
• Weight matrices of transcription factors
• ChIP-Seq data of Pol II occupancy
• ChIP-Seq data of histone modifications (H3K4me3)
• Cap Analysis of Gene Expression (CAGE)
ChIP Sequencing Visualization
[Figure: ChIP-seq tracks for H3K4me3 and Pol2]
Drawback: wide range of predictions
Experimental identification of miRNA TSSs
A Drosha null/conditional-null (DroshaLacZ/e4COIN) mouse model was generated using the conditional by inversion (COIN) methodology of Aris Economides (Regeneron Pharmaceuticals).
Economides, A.N. et al. Conditionals by inversion provide a universal method for the generation of conditional alleles. Proceedings of the National Academy of Sciences Aug 20;110(34):E3179-88 (2013).
[Figure: RNA-seq coverage over the Mir17hg lncRNA locus (scale label: 8,856 bp; annotated miRNAs: Mir17, Mir18, Mir19a, Mir20a, Mir19b-1, Mir92-1). Y-axis: normalized read count. Tracks: GSM973235 WT mESCs, 180M reads; Drosha -/- mESCs, 27M reads; Drosha +/+ mESCs, 19M reads.]
RNA-seq read depth is essential!
…but (deep RNA-seq is) not enough
[Figure: miRNAs and several putative TSSs shown over the RNA-seq coverage track]
Which one is correct?
ChIP-seq information can effectively reduce the number of putative TSSs
[Figure: miRNAs and putative TSSs with RNA-seq coverage, H3K4me3, Pol2 and TF footprint tracks]
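The idea of using ChIP-seq signal to narrow down putative TSSs can be illustrated with a minimal sketch. It assumes that candidates are genomic coordinates and that peaks are half-open (start, end) intervals. The function names and the rule "keep a candidate only if it lies in both an H3K4me3 and a Pol2 peak" are illustrative assumptions, not the criterion used in the presented work.

```python
def in_any_peak(pos, peaks):
    """True if pos lies inside at least one half-open (start, end) peak."""
    return any(start <= pos < end for start, end in peaks)


def filter_by_chip(candidates, h3k4me3_peaks, pol2_peaks):
    """Keep candidate TSS positions supported by both marks (assumed rule)."""
    return [
        pos for pos in candidates
        if in_any_peak(pos, h3k4me3_peaks) and in_any_peak(pos, pol2_peaks)
    ]


# Toy coordinates, invented for illustration only.
candidates = [100, 500, 900]
h3k4me3_peaks = [(450, 600), (850, 950)]
pol2_peaks = [(480, 520)]
supported = filter_by_chip(candidates, h3k4me3_peaks, pol2_peaks)
print(supported)
```

In this toy case only the candidate at position 500 overlaps both kinds of peak, so the list of putative TSSs shrinks from three to one.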
Algorithm - First step: identify candidate TSSs
[Figure: mm10 genome browser views with miRNA and coding gene tracks and putative TSSs]
Apply a sliding window around miRNAs
Filter the candidate transcription start sites
Raw RNA-seq reads
Map reads on the reference genome (mm10)
Reads tend to cluster over the expressed genomic regions
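The first step can be sketched roughly as follows, assuming a per-base coverage list for a plus-strand region and a known miRNA start coordinate. The helper names, the depth cutoff and the choice of "island start" as the candidate position are hypothetical simplifications for illustration, not the microTSS implementation.

```python
def coverage_islands(coverage, min_depth=1):
    """Return half-open (start, end) intervals where depth >= min_depth."""
    islands, start = [], None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos                      # an expressed region begins
        elif depth < min_depth and start is not None:
            islands.append((start, pos))     # the expressed region ends
            start = None
    if start is not None:                    # region runs to the end
        islands.append((start, len(coverage)))
    return islands


def candidate_tss(coverage, mirna_start, window, min_depth=1):
    """Island starts lying within `window` bp upstream of the miRNA."""
    lower = max(0, mirna_start - window)
    return [
        start for start, _ in coverage_islands(coverage, min_depth)
        if lower <= start <= mirna_start
    ]


# Toy coverage vector: two clusters of mapped reads.
coverage = [0, 0, 3, 4, 0, 0, 5, 5, 5, 0]
print(coverage_islands(coverage))
print(candidate_tss(coverage, mirna_start=8, window=3))
print(candidate_tss(coverage, mirna_start=8, window=10))
```

A wider window admits more distant read clusters as candidates, which is why a later filtering step is needed.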
Algorithm - second step: Training of SVMs
An algorithm that can learn from examples: machine learning.
Here we used Support Vector Machines, a supervised machine learning approach.
Training with:
• positive examples (protein coding TSS)
• negative examples (random intergenic locations, flanking positions)
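To make the training idea concrete, here is a minimal sketch of a linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss. The two features (stand-ins for, say, H3K4me3 and Pol2 signal), the toy values, and the hyperparameters are all assumptions made for illustration. The kernel, features and software actually used for microTSS are not given in these slides.

```python
def train_linear_svm(samples, labels, lr=0.05, reg=0.01, epochs=500):
    """Minimise reg/2 * |w|^2 + mean hinge loss by subgradient descent."""
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """+1 for a TSS-like feature vector, -1 otherwise."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical feature vectors, invented for illustration.
positives = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]   # e.g. protein coding TSSs
negatives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # e.g. intergenic locations
samples = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_linear_svm(samples, labels)
print(predict(w, b, (3.5, 3.5)), predict(w, b, (0.2, 0.2)))
```

Once trained on labelled examples, the same decision function can be applied to candidate positions whose label is unknown, which is the role the SVMs play in the second step.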
Algorithm overview
First step
Second step
Final step
Comparison between microTSS and available algorithms
[Figure: precision versus distance threshold for Marson et al., S-Peaker, PROmiRNA and microTSS]
Algorithms’ precision and sensitivity at 1 kbp distance threshold from validated TSSs in mESCs (N=47)

Algorithm       Sensitivity      Precision
Marson et al.   54% (20/37)      64.5% (20/31)
PROmiRNA        78.7% (37/47)    25.4% (95/373)
S-Peaker        76.5% (36/47)    18.8% (77/409)
microTSS        93.6% (44/47)    100% (44/44)
• No prediction filtering based on distance
• Predictions located less than 1,000 bp from the validated TSS are considered True Positives and the rest are considered False Positives.
• Precision = TP / (TP+FP)
• Sensitivity = Correct Predictions / Total Correct
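The definitions above translate directly into a few lines of code. The function names are illustrative. The counts used in the example are those reported for microTSS in the comparison (44 predictions, all within 1,000 bp of one of the 47 validated TSSs).

```python
def is_true_positive(predicted_tss, validated_tss, threshold_bp=1000):
    """A prediction counts as TP if it lies less than threshold_bp away."""
    return abs(predicted_tss - validated_tss) < threshold_bp


def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(correct_predictions, total_validated):
    """Sensitivity = correct predictions / total validated TSSs."""
    return correct_predictions / total_validated


# microTSS counts from the comparison: 44 TP, 0 FP, 47 validated TSSs.
print(f"precision   = {precision(44, 0):.1%}")
print(f"sensitivity = {sensitivity(44, 47):.1%}")
```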
Software on microRNA.gr
Maragkakis M, Vergoulis T, Alexiou P, Reczko M et al. DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association. Nucleic Acids Research, 2011.
• miRNA target predictions (microT)
• miRNA validated targets (TarBase)
• miRNA genomics (miRGen)
• miRNA experimentally supported targets on protein coding genes (TarBase)
• miRNA experimentally supported targets on long non-coding genes (LncBase)
• miRNA genomics (miRGen)
• KEGG pathways analysis (mirPath)
• miRNA targets gene enrichment analysis (mirExTra)
• miRNA to disease associations
• automatic bibliographic searches
• miRNA naming history analysis
• extended connectivity to online databases
Primary data
Meta analysis
Other projects of DIANA lab on microrna.gr
Database of experimentally supported targets: DIANA-TarBase
• Initially released in 2006
– The first database to catalog published experimentally validated miRNA:gene interactions
• With more than 500,000 entries, the largest repository of experimentally validated miRNA:gene interactions
• Last update: DIANA-TarBase v7, http://www.microrna.gr/tarbase
S. Vlachos, M. D. Paraskevopoulou, D. Karagkouni, G. Georgakilas, T. Vergoulis, I. Kanellos, I-L. Anastasopoulos, S. Maniou, K. Karathanou, D. Kalfakakou, A. Fevgas, T. Dalamagas and A. G. Hatzigeorgiou. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucl. Acids Res. (2014)
Semi-Automatic Curation Pipeline
• Automatic detection of microRNA related articles
• Formation of XML-based efficient tree-like structures
• Detection of microRNA mentions
• Detection of gene mentions
• Detection of miRNA-gene-interaction triplets
• Text scoring
• Meta-data insertion and mark-up
• Score-based ranking and search capabilities
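The mention and triplet detection stages of such a pipeline can be sketched as below. This is only a toy illustration: the regular expression, the interaction keyword list and the same-sentence co-occurrence rule are assumptions chosen for brevity, and they are not the method used to curate TarBase.

```python
import re
from itertools import product

# Matches names such as miR-21, let-7a or hsa-miR-155-5p (assumed pattern).
MIRNA_RE = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b",
    re.IGNORECASE,
)
# Illustrative interaction keywords, not an exhaustive vocabulary.
INTERACTION_WORDS = ("targets", "represses", "regulates", "inhibits")


def candidate_triplets(text, gene_symbols):
    """Return (miRNA, keyword, gene) triplets co-occurring in a sentence."""
    triplets = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mirnas = MIRNA_RE.findall(sentence)
        genes = [
            g for g in gene_symbols
            if re.search(rf"\b{re.escape(g)}\b", sentence)
        ]
        words = [
            w for w in INTERACTION_WORDS
            if re.search(rf"\b{w}\b", sentence, re.IGNORECASE)
        ]
        triplets.extend(product(mirnas, words, genes))
    return triplets


# Invented example sentence and gene list.
abstract = "miR-21 directly targets PTEN. Expression was measured by qPCR."
triplets = candidate_triplets(abstract, ["PTEN", "TP53"])
print(triplets)
```

Candidates found this way would still need scoring and review by a curator, which is what makes the pipeline semi-automatic.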
Growth of interactions per method
Evaluation in Poster # 66
Integration in ENSEMBL, the European Browser for Genomes in EBI
Long Non Coding RNAs
LncBase (http://www.microrna.gr/LncBase) is the largest available repository of miRNA-lncRNA interactions
• The Experimental Module contains more than 5,000 interactions between 2,958 lncRNAs and 120 miRNAs.
• The Prediction Module contains detailed information for more than 10 million interactions, between 56,097 lncRNAs and 3,078 miRNAs.
Integration into RNAcentral (EBI)
Paraskevopoulou, M.D., Georgakilas, G., Kostoulas, N., Reczko, M., Maragkakis, M., Dalamagas, T.M., Hatzigeorgiou, A.G. DIANA-LncBase: Experimentally verified and computationally predicted microRNA targets on long non-coding RNAs (2013) Nucleic Acids Research, 41 (D1), pp. D239-D245.
miRBase
• Also interconnects entries with external resources
DIANA-Tools Visit us @ www.microrna.gr!
More than 130,000 visits per year, based on Google Analytics!
Integration of microT & TarBase in miRBase
First release
Discussion
Check the citations of databases / web servers before publishing.
For example, a question could be added for reviewers: Has the researcher properly cited the data used?
Are the data used for training and testing available?
Can the data be reproduced?
Availability of databases through time: diachronic data
Credibility for diachronic databases / web services
Funding: Project “TOM” that is implemented under the "ARISTEIA" Action of the "OPERATIONAL PROGRAMME EDUCATION AND LIFELONG LEARNING" and is co-funded by the European Social Fund (ESF) and National Resources.