Date post: 11-Jan-2016
The Bytes of biological Data Artemis G. Hatzigeorgiou Professor of Bioinformatics Department of Electrical and Computer Engineering University of Thessaly Hellenic Institute Pasteur “Athena” Research Center

  The Bytes of biological Data

Artemis G. Hatzigeorgiou

Professor of Bioinformatics

Department of Electrical and Computer Engineering University of Thessaly

Hellenic Institute Pasteur

“Athena” Research Center


What is Bioinformatics?

• Bioinformatics is generally defined as the analysis, prediction, modeling and storage of biological data with the help of computers


Next Generation Sequencing


COSTS

[Chart: cost split, 90% versus 10%]

Comment: the costs related to bioinformatics studies are roughly one to ten once salaries for good bioinformaticians are included.

The central dogma


What are microRNAs (miRNAs)?

[Figure: Gene B. DNA → (transcription) → RNA → (translation) → protein]

miRNAs are about 22 nt long RNAs.

They post-transcriptionally regulate protein coding gene expression


MicroRNAs are involved in …

• development, stem cell proliferation, division, differentiation
• regulation of innate & adaptive immunity
• apoptosis, cell signaling, metabolism

Human pathologies

• cancer, viral infections, cardiovascular diseases, metabolic disorders, neurological pathologies
• psychiatric disorders, renal disease, hepatological conditions
• autoimmune diseases, gastroenterological conditions
• obesity, reproductive disorders
• musculoskeletal disorders, periodontal pathologies


Superlinear Increase of known miRNAs and relevant Research


Active Pathway Visualization


Citation: Wang D, Yan K-K, Sisu C, Cheng C, Rozowsky J, Meyerson W, et al. (2015) Loregic: A Method to Characterize the Cooperative Logic of Regulatory Factors. PLoS Comput Biol 11(4): e1004132. doi:10.1371/journal.pcbi.1004132


Location of miRNAs

[Figure: two genomic arrangements of miRNAs, each drawn with a promoter and Pol2, one of them alongside the exons of a gene. Labels: 70% and 30%.]


Why are the pri-miRNA genes not annotated?

Fast degradation in the nucleus

Megraw, M., Baev, V., Rusinov, V., Jensen, S.T., Kalantidis, K., Hatzigeorgiou, A.G. MicroRNA promoter element discovery in Arabidopsis (2006) RNA, 12 (9), pp. 1612-1619.


Recognition of Transcription Start Sites

For pri-microRNA genes

• Weight matrices of transcription factors
• ChIP-seq data of Pol II occupancy
• ChIP-seq data of histone modifications (H3K4me3)
• Cap Analysis of Gene Expression (CAGE)
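
As a rough illustration of the first evidence source listed above, a transcription factor weight matrix can be slid along a sequence and scored at every offset. The sketch below is a minimal, hypothetical example: the matrix values, the uniform background, the function names `score_window` and `best_hit`, and the toy sequence are all assumptions made for illustration, not the matrices used in this work.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (one dict per position).
# These numbers are invented for illustration only.
FREQS = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # assumption: uniform base composition

# Convert frequencies to log-odds weights against the background.
PWM = [{base: math.log2(p / BACKGROUND) for base, p in col.items()} for col in FREQS]


def score_window(window):
    """Sum the log-odds weight of each base in a window of motif length."""
    return sum(col[base] for col, base in zip(PWM, window))


def best_hit(sequence):
    """Return (offset, score) of the highest-scoring window in the sequence."""
    width = len(PWM)
    hits = [
        (i, score_window(sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits, key=lambda hit: hit[1])


# Toy sequence containing the motif "TATA" at offset 4.
offset, score = best_hit("GGCGTATACGCC")
```

A positive score means the window looks more like the motif than like background; in practice a score threshold would decide which hits count as evidence.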


ChIP Sequencing Visualization

H3K4me3

Pol2

Drawback: wide range of predictions


Experimental identification of miRNA TSSs

A Drosha null/conditional-null (DroshaLacZ/e4COIN) mouse model was generated using the conditional-by-inversion (COIN) methodology of Aris Economides (Regeneron Pharmaceuticals).

Economides, A.N. et al. Conditionals by inversion provide a universal method for the generation of conditional alleles. Proceedings of the National Academy of Sciences Aug 20;110(34):E3179-88 (2013).


RNA-seq coverage over the Mir17hg lncRNA locus (8,856 bp)

[Figure: normalized read count across Mir17hg, which hosts Mir17, Mir18, Mir19a, Mir20a, Mir19b-1 and Mir92-1. Tracks: GSM973235 WT mESCs (180M reads); Drosha -/- mESCs (27M reads); Drosha +/+ mESCs (19M reads).]

RNA-seq read depth is essential!


…but (deep RNA-seq is) not enough

[Figure tracks: miRNAs, putative TSSs, RNA-seq coverage]

Which one is correct?


ChIP-seq information can effectively reduce the number of putative TSSs

[Figure tracks: miRNAs, putative TSSs, RNA-seq coverage, H3K4me3, Pol2, TF footprints]


Algorithm - First step: identify candidate TSSs

• Raw RNA-seq reads
• Map reads on the reference genome (mm10)
• Reads tend to cluster over the expressed genomic regions
• Apply a sliding window around miRNAs
• Filter the candidate transcription start sites to obtain putative TSSs

[Figure: mm10 genome tracks with miRNA and coding gene annotation at each stage]
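
The sliding-window idea on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the microTSS implementation: the input is taken to be per-base read depth ordered 5' to 3' and ending at the miRNA, and the window size, both thresholds, and the function names `candidate_tss` and `strongest_per_run` are hypothetical.

```python
def candidate_tss(coverage, window=5, min_mean=3.0, max_upstream_mean=1.0):
    """Return (position, jump) pairs where coverage rises from near zero.

    A position i qualifies when the `window` bases upstream of i are almost
    uncovered and the `window` bases starting at i are covered.
    """
    scored = []
    for i in range(window, len(coverage) - window + 1):
        up_mean = sum(coverage[i - window:i]) / window
        down_mean = sum(coverage[i:i + window]) / window
        if up_mean <= max_upstream_mean and down_mean >= min_mean:
            scored.append((i, down_mean - up_mean))
    return scored


def strongest_per_run(scored):
    """For each run of consecutive positions keep the one with the largest jump."""
    best = []
    last_pos = None
    for pos, jump in scored:
        if last_pos is not None and pos == last_pos + 1:
            # Same run as the previous candidate: keep the sharper onset.
            if jump > best[-1][1]:
                best[-1] = (pos, jump)
        else:
            best.append((pos, jump))
        last_pos = pos
    return [pos for pos, _ in best]


# Toy coverage: two expressed blocks starting at positions 10 and 30.
coverage = [0] * 10 + [6] * 10 + [0] * 10 + [8] * 10
putative = strongest_per_run(candidate_tss(coverage))
```

On real data the candidates produced this way would still be numerous, which is why a later filtering step with additional evidence is needed.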


Algorithm - Second step: training of SVMs

An algorithm that can learn from examples: machine learning. Here we used Support Vector Machines (SVMs), a supervised machine learning approach.

Training with:

• positive examples (protein coding TSS)

• negative examples (random intergenic locations, flanking positions)
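
To make the training setup concrete, here is a minimal from-scratch sketch of a linear SVM fitted by subgradient descent on the regularized hinge loss. It is an illustration only: the slides do not state the kernel, features, or library used, so the two features (standardized H3K4me3 and Pol2 signal), the toy values, the hyperparameters, and the function names are all assumptions.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit weights w and bias b by subgradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks w on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Sample lies inside the margin: move the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (TSS-like) or -1 (background) for a feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, already standardized features: (H3K4me3 signal, Pol2 signal).
positives = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0)]          # protein coding TSSs
negatives = [(-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0), (-3.0, -3.0)]  # intergenic / flanking
samples = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_linear_svm(samples, labels)
```

Training on protein coding TSSs and then applying the model to miRNA candidates relies on the assumption that both kinds of promoter carry similar chromatin and Pol2 signatures.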


Algorithm overview

First step

Second step

Final step


Comparison between microTSS and available algorithms

[Figure: precision versus distance threshold for Marson et al, S-Peaker, PROmiRNA and microTSS]

Algorithms' precision and sensitivity at a 1 kbp distance threshold from validated TSSs in mESCs

mESCs (N=47)

Algorithm       Sensitivity      Precision
Marson et al    54% (20/37)      64.5% (20/31)
PROmiRNA        78.7% (37/47)    25.4% (95/373)
S-Peaker        76.5% (36/47)    18.8% (77/409)
microTSS        93.6% (44/47)    100% (44/44)

• No prediction filtering based on distance
• Predictions located less than 1,000 bp from the validated TSS are considered True Positives and the rest are considered False Positives.
• Precision = TP / (TP+FP)
• Sensitivity = Correct Predictions / Total Correct
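
The two definitions above can be turned into a short calculation. The sketch below is one possible reading, offered as an assumption: precision is counted over predictions, while sensitivity is counted over validated TSSs that have at least one prediction within the threshold. Positions are assumed to lie on a single chromosome and strand, and the function name `evaluate` and the toy coordinates are hypothetical.

```python
def evaluate(predictions, validated, max_distance=1000):
    """Return (precision, sensitivity) under a distance threshold in bp."""
    # A prediction is a true positive if it lies closer than max_distance
    # to any validated TSS; everything else is a false positive.
    true_positives = [
        p for p in predictions
        if any(abs(p - v) < max_distance for v in validated)
    ]
    # A validated TSS is recovered if at least one prediction lies nearby.
    recovered = [
        v for v in validated
        if any(abs(p - v) < max_distance for p in predictions)
    ]
    precision = len(true_positives) / len(predictions) if predictions else 0.0
    sensitivity = len(recovered) / len(validated) if validated else 0.0
    return precision, sensitivity


# Toy coordinates, invented for illustration.
validated = [10_000, 50_000, 90_000]
predictions = [10_200, 10_900, 52_000, 89_500]
precision, sensitivity = evaluate(predictions, validated)
```

Here three of the four predictions fall within 1,000 bp of a validated TSS and two of the three validated TSSs are recovered, so an algorithm that emits many predictions per locus can score well on sensitivity while scoring poorly on precision.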


Software on microRNA.gr


Maragkakis M, Vergoulis T, Alexiou P, Reczko M et al. DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association. Nucleic Acids Research, 2011.

• miRNA target predictions (microT)

• miRNA validated targets (TarBase)

• miRNA genomics (miRGen)

• miRNA experimentally supported targets on protein coding genes (TarBase)

• miRNA experimentally supported targets on long non-coding genes (LncBase)

• miRNA genomics (miRGen)

• KEGG pathways analysis (mirPath)

• miRNA targets gene enrichment analysis (mirExTra)

• miRNA to disease associations

• automatic bibliographic searches

• miRNA naming history analysis

• extended connectivity to online databases

Primary data

Meta analysis

Other projects of DIANA lab on microrna.gr


Database of experimentally supported targets: DIANA-TarBase

• Initially released in 2006
  – The first database to catalog published, experimentally validated miRNA:gene interactions
• With more than 500,000 entries, the largest repository of experimentally validated miRNA:gene interactions
• Latest update: DIANA-TarBase v7, http://www.microrna.gr/tarbase

S. Vlachos, M. D. Paraskevopoulou, D. Karagkouni, G. Georgakilas, T. Vergoulis, I. Kanellos, I-L. Anastasopoulos, S. Maniou, K. Karathanou, D. Kalfakakou, A. Fevgas, T. Dalamagas and A. G. Hatzigeorgiou. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucl. Acids Res. (2014)


Semi-Automatic Curation Pipeline

• Automatic detection of microRNA related articles
• Formation of XML-based efficient tree-like structures
• Detection of microRNA mentions
• Detection of gene mentions
• Detection of miRNA-gene-interaction triplets
• Text scoring
• Meta-data insertion and mark-up
• Score-based ranking and search capabilities
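
The mention-detection and triplet steps of the pipeline above can be illustrated with a small sketch. It is not the DIANA curation code: the regular expression for miRNA names, the tiny gene dictionary, the list of interaction cue words, and the function name `find_triplets` are all assumptions chosen to keep the example short.

```python
import re

# Assumed pattern for common miRNA names such as "miR-21", "hsa-miR-155-5p", "let-7a".
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b"
)
# Hypothetical gene dictionary; a real pipeline would use a curated lexicon.
GENE_NAMES = {"PTEN", "TP53", "KRAS"}
# Hypothetical cue words that suggest a regulatory interaction.
INTERACTION_CUES = ("targets", "represses", "downregulates", "binds")


def find_triplets(sentence):
    """Return (miRNA, cue, gene) triplets that co-occur in one sentence."""
    mirnas = MIRNA_PATTERN.findall(sentence)
    tokens = re.findall(r"\b[A-Z0-9]{3,}\b", sentence)
    genes = [token for token in tokens if token in GENE_NAMES]
    cues = [cue for cue in INTERACTION_CUES if cue in sentence]
    return [(m, cue, g) for m in mirnas for cue in cues for g in genes]


first = find_triplets("We show that hsa-miR-21-5p directly targets PTEN in glioma cells.")
second = find_triplets("In this model let-7a represses KRAS.")
```

Co-occurrence in a sentence only flags a candidate interaction; scoring and manual review, as listed in the pipeline, would decide whether it enters the database.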


Growth of interactions per method

Evaluation in Poster # 66


http://www.microrna.gr/tarbase


Integration into Ensembl, the European genome browser at EBI


Long Non Coding RNAs

LncBase (http://www.microrna.gr/LncBase) is the largest available repository of miRNA-lncRNA interactions

• The Experimental Module contains more than 5,000 interactions between 2,958 lncRNAs and 120 miRNAs.

• The Prediction Module contains detailed information for more than 10 million interactions, between 56,097 lncRNAs and 3,078 miRNAs.

Integration into RNAcentral (EBI)

Paraskevopoulou, M.D., Georgakilas, G., Kostoulas, N., Reczko, M., Maragkakis, M., Dalamagas, T.M., Hatzigeorgiou, A.G. DIANA-LncBase: Experimentally verified and computationally predicted microRNA targets on long non-coding RNAs (2013) Nucleic Acids Research, 41 (D1), pp. D239-D245.


miRBase

• Interconnects also entries with external resources:


DIANA-Tools Visit us @ www.microrna.gr!

More than 130,000 visits per year, based on Google Analytics!

Integration of microT & TarBase in miRBase

First release


Discussion

• Check the citations of databases / web servers before publishing. For example, a question could be added for reviewers: have the researchers properly cited the data used?
• Are the data used for training and testing available?
• Can the data be reproduced?
• Availability of databases through time: diachronic data
• Credibility for diachronic databases / web services

Funding: Project “TOM” that is implemented under the "ARISTEIA" Action of the "OPERATIONAL PROGRAMME EDUCATION AND LIFELONG LEARNING" and is co-funded by the European Social Fund (ESF) and National Resources.

