Bioinformatics Topics Not Covered in this Course
BMI 730 Kun Huang
Department of Biomedical Informatics
Ohio State University
Non-coding RNA
  MicroRNA Related Bioinformatics Issues
    MicroRNA prediction and recognition
    Secondary structure prediction
    Target prediction
Microbial Related Bioinformatics
  Metagenomics
Other Omics
Other Informatics
Non-coding RNA
• Non-coding DNA
• Junk DNA
• Pseudogenes
• Retrotransposons - Human Endogenous Retroviruses (HERVs)
• C-value enigma (e.g., the Amoeba dubia genome has more than 670 billion bases; the pufferfish genome is 1/10 the size of the human genome)
• Findings from ENCODE – nearly the entire genome is transcribed
Non-coding RNA (ncRNA)
• Any RNA molecule that is not translated into a protein.
• Also called sRNA, npcRNA, nmRNA, snmRNA, fRNA
• Includes tRNA, rRNA, snoRNA, microRNA (miRNA), siRNA, piRNA, long ncRNA (e.g., Xist), shRNA
• Note the difference between siRNA and miRNA
Non-coding RNA (ncRNA)
• RNA-induced silencing complex (RISC)
• RNA-induced transcriptional silencing (RITS)
MicroRNA (miRNA)
• Another level of regulation
[Figure, panels a-c: Myc, E2F (E2F1, E2F2, E2F3), and the mir-17-92 cluster (17-5p, 17-3p, 18a, 19a, 20a, 19b, 92-1)]
Reviewed by: Coller et al. (2008), PLoS Genet 3(8): e146
Figures from Dr. Baltz Agula
MicroRNA (miRNA)
Non-coding RNA
MicroRNA Related Bioinformatics Issues
• Secondary structure prediction
• MicroRNA prediction and recognition
• Target prediction
Databases
Secondary structure prediction
• Applications
  • RNA folding dynamics
  • ncRNA discovery
  • Microarray probe validation/comparison
Wang et al. Genome Biology 2004 5:R65
Secondary structure prediction
- Physics-based models
  - Minimize free energy using dynamic programming or other optimization schemes
  - Parameters come from empirical studies of RNA structural energetics (e.g., nearest-neighbor interactions in stacking base pairs, measured on synthesized oligonucleotides)
  - Limited by what the experimental procedures can measure
  - Scoring models are used
  - Most ignore the sequence dependence of hairpin, bulge, internal, and multi-branch loop energies
  - Multi-branch loop energies rely on ad hoc scores
  - Still the top performers
  - Mfold, ViennaRNA, PKnots, RDfold, etc.
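The dynamic programming idea can be sketched with the Nussinov algorithm. This is a deliberate simplification: it maximizes the number of base pairs instead of minimizing a nearest-neighbor free energy, so it is not what Mfold or ViennaRNA actually compute. The function name `nussinov` and the `min_loop` parameter are illustrative choices, not part of any tool named above.

```python
def nussinov(seq, min_loop=3):
    """Toy folding: maximize base pairs (NOT a free-energy model).

    Returns (number_of_pairs, dot_bracket_structure).
    min_loop is the minimum number of unpaired bases inside a hairpin.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # dp[i][j] = max number of pairs in seq[i..j]; short spans stay 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case 1: j is unpaired
            for k in range(i, j - min_loop):         # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs:
                left = dp[i][k - 1] if k > i else 0
                if left + 1 + dp[k + 1][j - 1] == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


n_pairs, fold = nussinov("GGGAAAUCCC")
print(n_pairs, fold)
```

Energy-based tools use the same fill-then-traceback pattern, but replace the "+1 per pair" score with stacking and loop energy terms.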
Secondary structure prediction
- Probabilistic approach
  - Stochastic context-free grammars (SCFG) – e.g., QRNA
    - Specify grammar rules that induce a joint probability distribution over possible RNA structures and sequences
    - Parameters are easily learned without experiments
    - Parameters may not have physical meanings
    - Performance is inferior to physics-based methods
  - Extensions: Conditional log-linear model (CLLM) – e.g., CONTRAfold
    - Integrates the learning procedure with energy-based scoring systems
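To make "grammar rules that induce a joint probability distribution" concrete, here is a minimal sketch with a three-rule toy grammar: S → x S y S (pair), S → x S (unpaired base), S → ε (end). The grammar and every probability below are invented for illustration; they are not the grammar or parameters of QRNA or CONTRAfold.

```python
import math

# Hypothetical rule and emission probabilities (each table sums to 1)
RULE_P = {"pair": 0.25, "single": 0.60, "end": 0.15}
SINGLE_P = {b: 0.25 for b in "ACGU"}
PAIR_P = {"GC": 0.25, "CG": 0.25, "AU": 0.17,
          "UA": 0.17, "GU": 0.08, "UG": 0.08}


def match_table(structure):
    """Map each '(' index to the index of its matching ')'."""
    partner, stack = {}, []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            partner[stack.pop()] = idx
    return partner


def log_joint(seq, structure):
    """log P(sequence, structure) under the toy grammar above."""
    partner = match_table(structure)

    def derive(i, end):
        # Every S eventually terminates with the 'end' rule
        total = math.log(RULE_P["end"])
        while i < end:
            if structure[i] == ".":
                total += math.log(RULE_P["single"] * SINGLE_P[seq[i]])
                i += 1
            else:
                j = partner[i]
                p_pair = PAIR_P.get(seq[i] + seq[j], 0.0)
                if p_pair == 0.0:          # pair not allowed by this toy model
                    return float("-inf")
                total += math.log(RULE_P["pair"] * p_pair) + derive(i + 1, j)
                i = j + 1
        return total

    return derive(0, len(seq))


print(log_joint("GAAAC", "(...)"))
print(log_joint("GAAAC", "....."))
```

In a real SCFG the probabilities are estimated from sequences with known structures, and prediction searches over all structures (e.g., with the CYK algorithm) instead of scoring one given structure.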
Secondary structure prediction
[Figure panels: CONTRAfold, PKnotRG]
Secondary structure prediction
- Comparative approach
  - Single-sequence prediction methods (physics-based, SCFG) have difficulty searching all configurations
  - Structures that have been conserved by evolution are far more likely to be the functional form
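One common way to detect evolutionarily conserved structure is covariation: two alignment columns that mutate together while preserving base pairing. The sketch below scores a column pair by mutual information. The four-sequence alignment is made up for illustration, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    col_i = [s[i] for s in alignment]
    col_j = [s[j] for s in alignment]
    n = len(alignment)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 4 always form a complementary pair,
# while columns 1-3 are invariant.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(column_mutual_information(toy_alignment, 0, 4))  # covarying columns
print(column_mutual_information(toy_alignment, 1, 2))  # invariant columns
```

High mutual information between two columns suggests a conserved base pair; invariant columns carry no covariation signal even if they are paired.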
MicroRNA prediction and discovery
- Experimental approach
  - Cloning
  - MicroRNA array (OSU microarray facility)
  - Massive sequencing
    - Select segments in the range of 20-25 nt
    - Sequence using a Solexa/SOLiD sequencer
    - Map to the genome
    - Enrichment analysis / peak calling
    - Experimental validation
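The size selection, mapping, and enrichment steps can be sketched on in-memory strings. This is a toy under stated assumptions: exact string matching stands in for a real short-read aligner, a simple read-count threshold stands in for enrichment analysis, and the function names, `min_count` threshold, reads, and genome are all invented for the example.

```python
from collections import Counter


def candidate_small_rnas(reads, min_len=20, max_len=25, min_count=2):
    """Keep reads of 20-25 nt and collapse identical reads into counts."""
    counts = Counter(r for r in reads if min_len <= len(r) <= max_len)
    return [(seq, c) for seq, c in counts.most_common() if c >= min_count]


def map_exact(genome, read):
    """Return every 0-based start position where read matches genome exactly."""
    positions = []
    pos = genome.find(read)
    while pos != -1:
        positions.append(pos)
        pos = genome.find(read, pos + 1)
    return positions


# Made-up data: one 22-nt read seen three times, plus reads outside the size range
read_a = "TAGCTTATCAGACTGATGTTGA"
reads = [read_a] * 3 + ["ACGT", "A" * 30, "C" * 21]
genome = "GG" + read_a + "CC" + read_a

for seq, count in candidate_small_rnas(reads):
    print(seq, count, map_exact(genome, seq))
```

Real pipelines tolerate mismatches, handle reads that map to many loci, and test enrichment statistically before any candidate goes to experimental validation.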
MicroRNA prediction and discovery
- Bioinformatics / machine learning approach
Wang et al. Genome Biology 2004 5:R65
MicroRNA prediction and discovery
- Bioinformatics / machine learning approach
  - Using evolutionary information
Nam, J.-W. et al. Nucl. Acids Res. 2005 33:3570-3581; doi:10.1093/nar/gki668
MicroRNA prediction and discovery
- Bioinformatics / machine learning approach
  - Support vector machine (requires features)
  • Features:
    • Sequence features
      • Nucleotide frequency counts
      • Total G/C content
    • Folding features
      • Pairing propensity
      • Minimum free energy (MFE)
    • Topological features
      • Packing ratio
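A feature vector for one candidate hairpin might be assembled as below. This is a sketch with several assumptions: the structure and MFE are taken as inputs (they would come from a folding tool), "pairing propensity" is approximated here as the fraction of paired bases, and the packing ratio is omitted because the slides do not define it. The function and feature names are hypothetical.

```python
def hairpin_features(seq, structure, mfe):
    """Build a small feature dictionary for one candidate pre-miRNA hairpin.

    seq        RNA sequence (A/C/G/U)
    structure  dot-bracket string of the same length, from a folding tool
    mfe        minimum free energy in kcal/mol, from a folding tool
    """
    n = len(seq)
    # Sequence features: nucleotide frequencies and total G/C content
    features = {f"freq_{base}": seq.count(base) / n for base in "ACGU"}
    features["gc_content"] = (seq.count("G") + seq.count("C")) / n
    # Folding features: paired fraction (a stand-in for pairing propensity)
    # and MFE normalized by length so hairpins of different sizes compare
    features["paired_fraction"] = (structure.count("(") + structure.count(")")) / n
    features["mfe_per_nt"] = mfe / n
    return features


# Made-up hairpin and energy value, for illustration only
example = hairpin_features("GGGAAAUCCC", "(((....)))", mfe=-3.0)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Vectors like this, computed for known miRNA hairpins and for negative examples, are what a support vector machine would be trained on.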
MicroRNA target prediction
- Experimental / bioinformatics approach
  - BLAST can identify thousands of potential targets – how to pin down the real ones?
MicroRNA target prediction
- Computational / bioinformatics approach
  - Mutually exclusive transcription pattern between a miRNA and its targets
  - Microarray screening
  - Existence of complementary sequence
  - Context score – features
  - Machine learning approaches (e.g., SVM, regression, etc.)
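The complementary-sequence criterion is often reduced to a seed-match search: look in the target's 3' UTR for the reverse complement of miRNA nucleotides 2-8. The sketch below assumes that seed definition, which the slides do not spell out. The miRNA is the let-7a sequence; the UTR string and the function names are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr):
    """Find perfect matches to the miRNA seed (nucleotides 2-8) in a UTR.

    Returns (site_sequence, list_of_0_based_start_positions).
    """
    seed = mirna[1:8]                    # nucleotides 2-8, counted from the 5' end
    site = reverse_complement(seed)      # what the target mRNA must contain
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return site, positions


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAAAGGGCUACCUCA"     # invented UTR with two seed sites
print(seed_sites(let7a, toy_utr))
```

A seed match alone yields many false positives, which is why conservation, context scores, and expression data are layered on top of it.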
David P. Bartel. MicroRNAs: Target Recognition and Regulatory Functions. Cell, Volume 136, Issue 2, 215-233, 23 January 2009
Databases
• MicroRNA.org: http://www.microrna.org/microrna/getMirnaForm.do
• MirBase: http://microrna.sanger.ac.uk
• …
Target prediction
• MIRDB
• TargetScan (http://targetscan.org)
• PicTar (http://pictar.bio.nyu.edu)
• miRanda (part of Sanger database)
• MirTarget
• …
Software
• List at http://en.wikipedia.org/wiki/List_of_RNA_structure_prediction_software
Metagenomics
• Study of genetic material recovered directly from environmental samples
• A community of species – e.g., microbes from the stomach of a cow
• Challenges:
  • Who is there? How many?
  • 16S rRNA – universal primers, highly conserved, used for profiling
    forward: AGA GTT TGA TCC TGG CTC AG
    reverse: ACG GCT ACC TTG TTA CGA CTT
  • Next generation sequencing – more genes (chicken-and-egg)
  • Community metabolism – identify metabolic pathways within the community
  • New challenges: comparative study
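How the two universal primers bracket the 16S region can be sketched as an in-silico PCR on a string. The primer sequences are the ones listed above; everything else is an assumption for illustration: exact matching only (no mismatches or degenerate bases), a single strand, and an invented template. The function names are hypothetical.

```python
FORWARD_PRIMER = "AGAGTTTGATCCTGGCTCAG"
REVERSE_PRIMER = "ACGGCTACCTTGTTACGACTT"

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def extract_amplicon(template):
    """Return the region from the forward primer through the reverse primer site.

    The reverse primer anneals to the opposite strand, so on the given strand
    we search for its reverse complement. Returns None if either site is absent.
    """
    start = template.find(FORWARD_PRIMER)
    if start == -1:
        return None
    reverse_site = reverse_complement(REVERSE_PRIMER)
    end = template.find(reverse_site, start + len(FORWARD_PRIMER))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Invented template: flanking bases, the two primer sites, and a short insert
toy_template = ("TTT" + FORWARD_PRIMER + "ACGTACGT"
                + reverse_complement(REVERSE_PRIMER) + "GGG")
amplicon = extract_amplicon(toy_template)
print(len(amplicon), amplicon)
```

In practice the amplified 16S sequences are then compared against a reference database to answer "who is there", and their relative counts give a rough answer to "how many".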