Date post: 19-Jan-2016

Application of Algorithm Research to Molecular Biology

R. C. T. Lee

Dept. of Computer Science

National Chi Nan University


• There is one peculiar characteristic of all living organisms: we can reproduce ourselves.

• Yet it is important that what we reproduce be the same as we are.

• That is, wild flowers produce the same kind of wild flowers and birds reproduce the same kind of birds.


• Information about ourselves must be passed to our descendants.

• Question: How is this done?

• Answer: Through DNA.


• DNA (deoxyribonucleic acid) can be viewed as two strands of nucleotides wound together as a double helix.


• There are only four types of bases (nucleotides) in every DNA:

• A: Adenine

• G: Guanine

• C: Cytosine

• T: Thymine


• Each strand of DNA is a sequence of A, G, C and T.

• Each A in one strand is paired with a T in the other strand.

• Similarly, G is paired with C.


Human Mitochondrial DNA Control Region

TTCTTTCATGGGGAAGCAAA

AAGAAAGTACCCCTTCGTTT
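The pairing rule (A with T, G with C) means one strand determines the other. A minimal Python sketch, assuming the two strands are written position by position as on this slide; the function name `complement` is chosen here for illustration only:

```python
# Map each base to its partner on the other strand: A<->T, G<->C.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the paired strand, position by position."""
    return strand.upper().translate(PAIRING)

top = "TTCTTTCATGGGGAAGCAAA"
print(complement(top))  # AAGAAAGTACCCCTTCGTTT
```

Applied to the first strand shown above, it reproduces the second strand.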


• DNA exists in cells.

• Each living organism has many different kinds of cells. For instance, human beings have muscle cells, blood cells, neural cells, etc.

• How can different cells perform different functions?


Genes

• In each DNA sequence, there are subsequences which are called genes.

• Each gene corresponds to a distinct protein and it is the protein which determines the function of the cell.

• For instance, red blood cells must contain the oxygen-carrying protein haemoglobin, and the production of this protein is controlled by a certain gene.


Proteins

• Each protein consists of amino acids.

• There are 20 different amino acids.


The Relationship between a Gene and its Corresponding Protein


• As shown above, each amino acid is coded by a triplet. For instance, TTC denotes Phe (phenylalanine).

• Each triplet is called a codon.

• There are three codons, namely TAA, TGA and TAG, which represent “end of gene”.


• Protein RNase A: KETAAAKFER

• Its corresponding DNA sequence is: AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT


How Is a Protein Produced?

• RNA (Ribonucleic Acid)

• Each cell is able to recognize all of the starting points of genes relevant to the proteins important to the functions of the cell.


• The RNA system scans a gene. For each codon being scanned, it produces a corresponding amino acid.

• After all codons have been scanned, the corresponding protein is produced.


• AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT

• KETAAAKFER

• Note that codon AAA corresponds to amino acid K and CGT corresponds to R.

• Remember TAA, TGA and TAG signify “end of gene”.
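The codon-by-codon scan can be sketched in Python. This is only an illustration of the mapping, not of the cellular machinery; it assumes the standard genetic code written in DNA letters, and the names `CODON_TABLE` and `translate` are introduced here for the example:

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
# '*' marks the three "end of gene" codons TAA, TAG and TGA.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Read the sequence three letters at a time until an end codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # TAA, TGA or TAG: end of gene
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT"))  # KETAAAKFER
```

The call reproduces the RNase A fragment shown on the slide, and a sequence such as TTC TAA GGG stops after the first codon.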


Problems

1. String Matching Problem

2. Sequence Alignment Problem

3. Evolution Tree Problem

4. RNA Secondary Structure Prediction Problem

5. Protein Structure Problem

6. Physical Mapping Problem


Exact String Matching Problems

• Exact String Matching Problems

– Instance: A text T of length n and a pattern P of length m, where n > m.

– Question: Find all occurrences of P in T.

– Example: If T = “ttaptaap” and P = “ap”, then P occurs in T starting at positions 3 and 7 (counting from 1).

• Linear-time (O(n+m)) algorithms

– Knuth-Morris-Pratt (KMP) algorithm

– Boyer-Moore algorithm
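As an illustration of the first algorithm named above, here is a compact Knuth-Morris-Pratt sketch in Python. The function names are chosen for this example, and positions are reported counting from 1 to match the slide:

```python
def build_failure(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 1-indexed start positions of all occurrences."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 2)  # convert 0-indexed end to 1-indexed start
            k = fail[k - 1]
    return hits

print(kmp_search("ttaptaap", "ap"))  # [3, 7]
```

Each text character is read once and the fallback steps are bounded by the number of matches made, which is where the O(n+m) bound comes from.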


Approximate String Matching Problems

• Approximate String Matching Problems

– Instance: A text T of length n, a pattern P of length m and a maximal number of errors allowed k.

– Question: Find all text positions where the pattern matches the text with up to k errors, where an error is the substitution, deletion, or insertion of a character.

– Example: Let T = “pttapa”, P = “patt” and k = 2. The substrings T[1..2], T[1..3], T[1..4] and T[5..6] are within 2 errors of P.

• Algorithms

– Dynamic Programming approach

– NFA approach
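The dynamic programming approach can be sketched as follows. This is one common formulation (a match may start anywhere in the text, so the top row of the table is all zeros), written in Python with a function name chosen for this example; it reports the end position of each match, counting from 1, together with the number of errors:

```python
def approximate_matches(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Return (end_position, errors) for every text position where some
    substring ending there is within k errors of the pattern."""
    m = len(pattern)
    # prev[i] = fewest errors matching pattern[:i] against a substring
    # ending at the previous text position; before any text, that is i.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character missing
        if cur[m] <= k:
            hits.append((j, cur[m]))
        prev = cur
    return hits

print(approximate_matches("pttapa", "patt", 2))
# [(2, 2), (3, 1), (4, 2), (6, 2)]
```

The end positions 2, 3, 4 and 6 correspond to the substrings T[1..2], T[1..3], T[1..4] and T[5..6] in the example. The table has (m+1) rows and one column per text character, so the running time is O(nm).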


Sequence Alignment Problem

• ATTCATTACAACCGCTATG
  ACCCATCAACAACCGCTATG

• It appears that these two sequences are quite different.

• An alignment will produce the following:
  ATTCATTA-CAACCGCTATG
  ACCCATCAACAACCGCTATG


• Given two sequences, any alignment will have a corresponding score.

• For each exact match, the score is equal to 2.

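• For each mismatch (including a position aligned with a gap), the score is equal to -1.

• Example: two alignments of AGC and AAAC.
  AGC-    score: 2 - 3 = -1
  AAAC

  AG-C    score: 2×2 + 2×(-1) = 2
  AAAC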
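The scoring rule can be checked mechanically. A small Python sketch, assuming (as in the example alignments of AGC and AAAC) that a column containing a gap is scored like a mismatch; the function name is hypothetical:

```python
def alignment_score(row_a: str, row_b: str, match: int = 2, mismatch: int = -1) -> int:
    """Score two aligned rows of equal length, column by column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(
        match if a == b and a != "-" else mismatch
        for a, b in zip(row_a, row_b)
    )

print(alignment_score("AGC-", "AAAC"))  # -1
print(alignment_score("AG-C", "AAAC"))  # 2
```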


• The sequence alignment problem: Given two sequences, find an alignment which produces the highest score.

• Approach: Dynamic Programming

• The multiple sequence alignment problem is NP-hard.
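For two sequences, the dynamic programming approach can be sketched as below. This is the standard global-alignment recurrence, written in Python under the assumption that a gap costs the same as a mismatch (-1), as in the scoring example; the function name is chosen for illustration, and only the best score is returned, not the alignment itself:

```python
def best_alignment_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Highest score over all global alignments of a and b."""
    # prev[j] = best score aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diagonal,           # align a[i-1] with b[j-1]
                         prev[j] + gap,      # a[i-1] against a gap
                         cur[j - 1] + gap)   # b[j-1] against a gap
        prev = cur
    return prev[-1]

print(best_alignment_score("AGC", "AAAC"))  # 2
```

The table has (|a|+1) × (|b|+1) entries, each filled in constant time. Extending the same idea to many sequences multiplies the table dimensions, which is consistent with the hardness of the multiple-sequence version.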


The Evolution Tree Problem


• The evolution tree problem: Given a distance matrix of n species, find an evolution tree under some criterion.

• Usually, the criteria are such that all of the tree distances reflect the original distances.

• That is, when two species are close to each other in the distance matrix, they should be close in the evolution tree.


• Each criterion corresponds to a distinct evolution tree problem.

• Most of them are NP-complete.

• Algorithms which produce optimal evolution trees in polynomial time are mostly based upon the minimum spanning tree approach.
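To make the minimum spanning tree idea concrete, here is a Python sketch of Prim's algorithm on a distance matrix. It only builds the spanning tree; how a particular evolution tree criterion uses that tree is not shown. The four species and their distances are made up for this example:

```python
def minimum_spanning_tree(dist: list[list[float]]) -> list[tuple[int, int, float]]:
    """Prim's algorithm on a symmetric distance matrix.
    Returns the tree edges as (parent, child, distance)."""
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the species outside the tree that is closest to it.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

# Hypothetical distance matrix for four species (illustrative values only).
distances = [
    [0, 2, 6, 7],
    [2, 0, 5, 8],
    [6, 5, 0, 3],
    [7, 8, 3, 0],
]
print(minimum_spanning_tree(distances))  # [(0, 1, 2), (1, 2, 5), (2, 3, 3)]
```

Species that are close in the matrix (0 and 1, 2 and 3) end up joined directly, which matches the intuition that tree distances should reflect the original distances.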


A Partial Evolution Tree of Homo Sapiens (Intelligent Human Beings, also Modern Men)

Our ancestors are from Africa.


Secondary Structure of RNA

• Due to hydrogen bonds, the primary structure of an RNA can fold back on itself to form its secondary structure.

• Base pairs (formed by hydrogen bonds):

1. AU (Watson-Crick base pair)

2. CG (Watson-Crick base pair)

3. GU (Wobble base pair)


AGGCCUUCCU


2D & 3D Structures of Yeast Phenylalanyl-Transfer RNA

[Figure panels: 2D Structure, 3D Structure]


Secondary Structure Prediction Problem

• Given an RNA sequence, determine the secondary structure with minimum free energy for this sequence.

• Approach: Dynamic Programming
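A full free-energy model is beyond a short example, so the Python sketch below swaps in a simpler objective with the same dynamic programming shape: maximize the number of base pairs (the Nussinov-style recurrence), allowing the AU, CG and GU pairs listed earlier. The `min_loop` parameter (minimum number of unpaired bases inside a hairpin) and the function name are assumptions of this example:

```python
# Allowed pairs: Watson-Crick (AU, CG) and wobble (GU), in both orders.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of non-crossing base pairs in the sequence."""
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best number of pairs within rna[i..j]; short spans stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is left unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(max_base_pairs("GGGAAACCC"))  # 3
```

In the example, the three G's pair with the three C's around an AAA loop. A minimum free energy algorithm uses the same table structure but replaces the "+1 per pair" term with energy contributions.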


Protein Structure Problem

• Each amino acid of a protein can be classified into one of the following two types:

– H (hydrophobic, non-polar; water-hating)

– P (hydrophilic, polar; water-loving)

• Then the amino acid sequence of a protein can be viewed as a binary sequence of H’s (1’s) and P’s (0’s).


Example

• Instance: 011001001110010

[Figure: two lattice embeddings of the instance, one with Score = 5 and one with Score = 3.]


H-P Model

• Instance: A sequence of 1’s (H’s) and 0’s (P’s).

• Question: Find a self-avoiding path embedded in either a 2D or 3D lattice which maximizes the score, where the score is the number of pairs of 1’s that are adjacent in the lattice without being adjacent in the sequence.

• NP-complete even for the 2D lattice.
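Scoring a given fold is easy even though finding the best fold is hard. A Python sketch for the 2D square lattice, where a fold is given as a list of (x, y) grid points, one per sequence position; the function name and this input format are assumptions of the example:

```python
def hp_score(sequence: str, path: list[tuple[int, int]]) -> int:
    """Count pairs of 1's that are lattice neighbours but not
    neighbours in the sequence. `path` must be a self-avoiding walk."""
    if len(sequence) != len(path):
        raise ValueError("need one lattice point per sequence position")
    if len(set(path)) != len(path):
        raise ValueError("path is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive positions must be lattice neighbours")

    index_at = {point: i for i, point in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "1":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "1" and abs(i - j) > 1:
                score += 1
    return score

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_score("1111", square))  # 1
```

In the square fold, the first and last positions touch without being consecutive in the sequence, giving a score of 1; a straight-line fold of the same sequence scores 0.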


Physical Mapping Problem

Select a subset of cosmid clones of minimum total length that covers the YAC DNA.

• C: Full DNA (10^8 bp).

• Physical mapping: Cut C and clone into overlapping YAC clones (10^6 bp).

• Physical mapping: Cut the DNA in each YAC clone and clone into overlapping cosmid clones (10^4 bp).

• Fragment assembling: Duplicate the cosmid and then cut the copies randomly. Select and sequence short fragments (10^2 bp) and then reassemble them into a deduced cosmid string.


Shortest Common Superstring

• Input: A collection F of strings.

• Output: A shortest possible string S such that for every f ∈ F, S is a superstring of f.

• For example: F = {ACT, CTA, AGT}, S = ACTAGT.

• NP-complete
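Because the problem is NP-complete, practical methods settle for approximations. The Python sketch below is the well-known greedy heuristic (repeatedly merge the two strings with the largest overlap). It always returns a common superstring but is not guaranteed to return a shortest one; the function names are chosen for this example:

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(fragments: list[str]) -> str:
    """Greedy heuristic: merge the best-overlapping pair until one string remains."""
    unique = list(dict.fromkeys(fragments))  # drop duplicates, keep order
    # Fragments contained in another fragment are covered automatically.
    strings = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(strings) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strings[i] + strings[j][k:]
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)]
        strings.append(merged)
    return strings[0] if strings else ""

print(greedy_superstring(["ACT", "CTA", "AGT"]))  # ACTAGT
```

On the fragments ACT, CTA and AGT the heuristic happens to find the optimal superstring ACTAGT of length 6.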


• Suppose the target is too long and its contents are unknown.

• What can we do?

• Enzyme A: {6, 8, 3, 10}

• Enzyme B: {7, 11, 4, 5}

• Enzymes A and B: {1, 5, 2, 6, 7, 3, 3}


A: 3, 8, 6, 10

B: 4, 5, 11, 7

AB: 3, 1, 5, 2, 6, 3, 7

This problem is called the double digest problem, which is NP-complete.
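For an instance this small, the fragment orders can be found by trying every ordering. The Python sketch below is a brute-force search, exponential in the number of fragments and meant only to illustrate what a solution looks like; the function names are hypothetical:

```python
from itertools import permutations

def cut_sites(fragments) -> list[int]:
    """Positions of the cuts when the fragments are laid out in this order."""
    sites, position = [], 0
    for length in fragments[:-1]:
        position += length
        sites.append(position)
    return sites

def double_digest_fragments(order_a, order_b) -> list[int]:
    """Fragment lengths, left to right, when both sets of cuts are applied."""
    total = sum(order_a)
    cuts = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)))
    bounds = [0] + cuts + [total]
    return [right - left for left, right in zip(bounds, bounds[1:])]

def solve_double_digest(a, b, ab):
    """Try all orderings of a and b; return one pair consistent with ab."""
    target = sorted(ab)
    for order_a in set(permutations(a)):
        for order_b in set(permutations(b)):
            if sorted(double_digest_fragments(order_a, order_b)) == target:
                return list(order_a), list(order_b)
    return None

print(double_digest_fragments([3, 8, 6, 10], [4, 5, 11, 7]))
# [3, 1, 5, 2, 6, 3, 7]
```

The orders A = 3, 8, 6, 10 and B = 4, 5, 11, 7 yield exactly the fragments observed when both enzymes are used. `solve_double_digest` may return a different but equally consistent pair of orders, since solutions are generally not unique.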


• TAA, TGA, or TAG.

• Do you know what they mean?

• End of Gene.

• Thank you for your patience. Have a good conference.

