Date posted: 27-Dec-2015
Uploaded by: felicia-barnett
Bio + Informatics
AAACTGCTGACCGGTAACTGAGGCCTGCCTGCAATTGCTTAACTTGGC
An Overview
WWW.IBP.IR (Iranian Bioinformatics Portal)
Outline
• Introduction
• DNA
• Definitions
• Problems in bioinformatics
• Conclusion
“Sciences reach a point where they become mathematized!”
(Leonard Adleman)
Computing Devices
• Computers → electronic components (transistors, …)
• Brains → biological components (neurons, …)
• Cells → biomolecular components (DNA, …)
DNA
• DNA: deoxyribonucleic acid
• Four nucleotides (bases), or building blocks: A, T, G, C
• Zips itself up into a double helix using base pairs:
  → A with T
  → G with C
DNA is essentially digital
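With only four symbols, each base fits in 2 bits, and the A-T / G-C pairing rule lets one strand be computed from the other by complementing and reversing it. A minimal sketch, assuming an arbitrary A=0, C=1, G=2, T=3 code; the function names are hypothetical, not part of the slides.

```python
# Illustrative sketch: DNA as 2-bit digital data plus the base-pairing rule.
# The A=0, C=1, G=2, T=3 assignment is an arbitrary, hypothetical choice.
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A<->T, G<->C


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def reverse_complement(seq: str) -> str:
    """Sequence of the paired strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


print(encode("ACGT"))                # 27
print(decode(27, 4))                 # ACGT
print(reverse_complement("AAACTG"))  # CAGTTT
```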
Bioinformatics
• Biomolecular computation
  → idea: use biomolecules and biochemical processes for solving computational problems
• Computational molecular biology
  → goal: understand/explain biomolecular systems and mechanisms
After going through an age of specialization, the sciences are now reuniting into a common mode of inquiry. “The next generation could produce a scientist in the old sense, a real generalist.”
(Leonard Adleman)
Biomolecular Computation
• Idea: use biomolecules and biochemical processes for solving computational problems
• Starting point: Leonard Adleman, 1994
  → solving the Hamiltonian Path Problem using liquid-phase DNA chemistry
• Advantages:
  → fast (massive parallelism)
  → efficient in energy consumption
  → great storage capabilities
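Adleman's experiment followed a generate-and-filter scheme: strands encoding vertices and edges ligated into many candidate paths at once, and laboratory steps discarded candidates that did not start and end at the right vertices or did not visit every vertex exactly once. The sketch below simulates that filter sequentially in Python on a hypothetical small directed graph. Because it checks candidates one by one, it takes exponential time, which is the cost the test tube was meant to absorb through parallelism.

```python
from itertools import permutations


def hamiltonian_path(edges, start, end):
    """Return a directed Hamiltonian path from start to end, or None.

    Each ordering of the remaining vertices plays the role of one candidate
    strand; a candidate survives only if every consecutive step is an edge.
    """
    vertices = {v for edge in edges for v in edge}
    edge_set = set(edges)
    for middle in permutations(vertices - {start, end}):
        path = (start, *middle, end)
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            return list(path)
    return None


# Hypothetical 4-vertex directed graph.
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(hamiltonian_path(edges, 0, 3))  # [0, 1, 2, 3]
```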
Computational Molecular Biology
• Goal: understand/explain biomolecular systems and mechanisms
• Application of computer technology to the management of biological information
• Using computers to gather, store, analyze and integrate biological and genetic information
Bioinformatics
Problems in Bioinformatics
Sequencing Genomes
GAGGGAACACAGTCTGCACACTCCTTCCGATAT
GAGGGAACACA
GTCTGCACACT
CCTTCCGATAT
Sequencing Genomes
GAGGGAACACAGTCTGCACACTCCTTCCGATAT
GAGGGAACACAGT
AGTCTGCACACTC
CTCCTTCCGATAT
Sequencing Genomes
• Concrete problem: sequence assembly
  → given: fragments of a large DNA sequence with overlaps (multiple coverage)
  → want: the entire sequence
• Complicating factors
  → computational complexity: can be seen as a variation of the shortest common superstring problem, which is known to be NP-hard
  → incorrect/missing nucleotides in fragment data
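Since the exact shortest common superstring is NP-hard, a common practical fallback is a greedy heuristic: repeatedly merge the two fragments with the longest suffix/prefix overlap. This is a minimal sketch of that heuristic, not necessarily the method behind these slides. It assumes error-free reads, so it ignores the missing/incorrect nucleotides mentioned above, and it is not guaranteed to find the shortest superstring. The example reads are the three overlapping fragments from the previous slide.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str]) -> str:
    """Greedy superstring heuristic: repeatedly merge the most-overlapping pair."""
    # Drop exact duplicates, then fragments contained inside another fragment.
    unique = list(dict.fromkeys(fragments))
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


reads = ["GAGGGAACACAGT", "AGTCTGCACACTC", "CTCCTTCCGATAT"]
print(greedy_assemble(reads))  # GAGGGAACACAGTCTGCACACTCCTTCCGATAT
```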
Relationships between Organisms
• Concrete problem: phylogenetic tree inference
  → given: homologous DNA sequences from multiple species
  → want: an evolutionary tree relating these sequences
• Complicating factors
  → errors in the sequences
  → complexity/quality of multiple sequence alignment
  → limited knowledge of evolutionary processes
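One of the simplest ways to infer a tree is distance-based clustering. The sketch below uses UPGMA, a swapped-in example technique that the slides do not name. It assumes the sequences are already aligned to equal length and, implicitly, a constant evolutionary rate, which ties into the "limited knowledge of evolutionary processes" caveat. It uses the fraction of differing sites as the distance and returns only the tree topology in Newick format, without branch lengths. The four toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Build a rooted tree topology (Newick, no branch lengths) by UPGMA."""
    size = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    while len(size) > 1:
        # Merge the two closest clusters (first pair wins on ties).
        i, j = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        ni, nj = size.pop(i), size.pop(j)
        merged = f"({i},{j})"
        # Distance to the new cluster: size-weighted average of the old distances.
        for k in size:
            dist[frozenset((merged, k))] = (
                ni * dist[frozenset((i, k))] + nj * dist[frozenset((j, k))]
            ) / (ni + nj)
        size[merged] = ni + nj
    return next(iter(size)) + ";"


# Hypothetical aligned sequences for four species.
seqs = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT",
        "C": "TTTTAAAAAA", "D": "TTTTAAAAAT"}
print(upgma(seqs))  # ((A,B),(C,D));
```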
Sequence Alignment
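Pairwise alignment is usually computed by dynamic programming. The sketch below is a standard Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) are hypothetical choices for illustration. Real tools use substitution matrices and affine gap penalties. Multiple sequence alignment, mentioned on the previous slide, is considerably harder than this pairwise case.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_alignment("GATTACA", "GATACA")
print(best)  # 5
print(top)
print(bottom)
```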
DNA-Genes-Proteins
• DNA, the basic molecule of life, directly controls the fundamental biology of life
• Proteins determine the biological makeup of humans and other living organisms
• Variations and errors in genomic DNA may lead to diseases or disorders
DNA → Genes → Proteins
DNA → Proteins
DNA (gene)
↓ transcription
mRNA
↓ translation
Protein
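The two arrows can be sketched directly. Transcription produces mRNA with the coding-strand sequence, using U in place of T. Translation then reads the mRNA three bases (one codon) at a time through the standard genetic code. This minimal sketch assumes the input is the coding (sense) strand, starts reading at position 0, and ignores splicing. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in U, C, A, G order, '*' = stop.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codons from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```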
Computational Gene Finding
• Given: raw sequence data
• Predict:
  → coding and non-coding regions
  → exons/introns
  → splicing patterns
  → transcription factor binding sites
Figure: splicing of pre-mRNA (Exon1, Intron1, Exon2, Intron2, Exon3) into mRNA (Exon1, Exon2, Exon3)
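A very first step toward predicting coding regions is scanning for open reading frames (ORFs): stretches that run from an ATG start codon to the next in-frame stop codon. The sketch below is only that naive scan, not a real gene finder. It looks at the forward strand only, uses a hypothetical minimum length, and cannot see exon/intron boundaries or splicing patterns. Production gene finders add statistical models on top of this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return forward-strand ORFs as 0-based, half-open (start, end) pairs.

    An ORF here runs from an ATG to the first in-frame stop codon (stop included).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "CCATGAAATTTTAGCC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 14 ATGAAATTTTAG
```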
Structure Prediction
• RNA & protein
• Minimum free energy
• RNA structure:
  → primary structure: single-stranded sequence of A, U, G, C
  → secondary structure: intramolecular base pairs among its bases
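Minimum free energy methods score structures with experimentally derived energy parameters. A common teaching simplification, swapped in here, is the Nussinov dynamic program, which instead maximizes the number of nested base pairs. This sketch assumes Watson-Crick plus G-U wobble pairs and a hypothetical minimum hairpin size of 3 unpaired bases. It reports the result in dot-bracket notation ('(' and ')' for paired bases, '.' for unpaired) and cannot represent pseudoknots.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize nested base pairs; returns (pair count, dot-bracket string)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs within seq[i..j]; spans too short to pair stay 0.
    dp = [[0] * n for _ in range(n)]

    def pair_score(i: int, k: int, j: int) -> int:
        # Pairs in [i, j] if base k pairs with base j.
        left = dp[i][k - 1] if k > i else 0
        return left + dp[k + 1][j - 1] + 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, pair_score(i, k, j))
            dp[i][j] = best

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and dp[i][j] == pair_score(i, k, j):
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```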
5’-GAGGGAACACAGUCUGCACACUCCUUC-3’
Secondary Structure
Arc Diagram Representation
Loops
Figure: secondary structure with bases numbered 1 to 27, labeling a hairpin loop, interior loop, bulge loop, multi-loop, external loop and stacked pair
Pseudoknotted Structure
Figure: pseudoknotted structure with bases numbered 1 to 13
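In an arc diagram, each base pair (i, j) is drawn as an arc over the numbered sequence. A structure is pseudoknotted exactly when two arcs cross, that is, when i < k < j < l for pairs (i, j) and (k, l). The sketch below checks that condition. For convenience it also converts the common dot-bracket notation into 1-based arcs. Dot-bracket is an assumption here, not something the slides use, and its plain form only covers pseudoknot-free structures.

```python
def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Convert a pseudoknot-free dot-bracket string to 1-based (i, j) arcs."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unmatched '('")
    return sorted(pairs)


def is_pseudoknotted(pairs) -> bool:
    """True if any two arcs cross: i < k < j < l for arcs (i, j) and (k, l)."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)


print(pairs_from_dot_bracket("((..))"))    # [(1, 6), (2, 5)]
print(is_pseudoknotted([(1, 6), (2, 5)]))  # False (nested arcs)
print(is_pseudoknotted([(1, 5), (3, 8)]))  # True (crossing arcs)
```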
Structure Prediction Algorithms
• Dynamic programming algorithms
  → handle only a restricted class of pseudoknotted structures
  → Rivas and Eddy (R&E): O(N^6) time, N = sequence length
• Heuristic algorithms
  → search over the solution space
Motif Discovery
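The most naive way to look for a candidate motif is to count every length-k substring (k-mer) and report the most frequent ones. This sketch does only that. It allows overlapping occurrences but no mismatches, and it performs no test of statistical significance, all of which real motif finders handle. The example sequence and k are hypothetical.

```python
from collections import Counter


def most_frequent_kmers(seq: str, k: int) -> list[str]:
    """All length-k substrings that occur most often in seq (overlaps allowed)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:
        return []
    top = max(counts.values())
    return sorted(kmer for kmer, c in counts.items() if c == top)


print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```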
Genes and Diseases
• Proteins perform most of life’s essential functions
• Changes in the genomic DNA sequence can have disastrous consequences
Real-World Applications
Related Aspects
• Computational models of organisms or biological systems
• Nature-inspired algorithms
  → genetic algorithms
  → neural networks
  → ant colony optimization
• Artificial life
  → life-like behavior of artificial systems
  → (re)design of biological organisms
Conclusion
• Bioinformatics: Using computers for gathering, storing and analyzing biological data
• Analyzing
Thank you!
Baharak Rastegari
Bio Informatics
Genetic Process
DNA
• Gene expression?
• Two genes
DNA → Genes → Proteins
Genomic Sequence Data Interpretation
• Gene finding
• Structure prediction
• Pattern discovery
• Classification
• Clustering
Understanding the Cell
• Concrete problem: gene regulatory relationship inference
  → given: expression profiles of two genes, A and B
  → want: decide whether there is a (direct) regulatory relationship between A and B, and whether it is activating or inhibiting
• Complicating factors
  → imprecision/limitations in measuring expression profiles
  → indirect/complex regulatory relationships
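A crude first pass at this problem is to correlate the two expression profiles. A strong positive correlation hints at activation, and a strong negative one hints at inhibition. This sketch uses Pearson correlation with a hypothetical 0.8 cutoff on hypothetical data. It illustrates exactly the complicating factors listed above: correlation cannot tell a direct relationship from an indirect one, and it says nothing about direction. A constant profile would cause a division by zero.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def guess_relationship(a: list[float], b: list[float], threshold: float = 0.8) -> str:
    """Label a gene pair by the sign of a strong correlation (hypothetical cutoff)."""
    r = pearson(a, b)
    if r >= threshold:
        return "possibly activating"
    if r <= -threshold:
        return "possibly inhibiting"
    return "no clear relationship"


# Hypothetical expression levels of genes A and B over five time points.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]
print(guess_relationship(gene_a, gene_b))  # possibly activating
```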