Date post: 22-Dec-2015
Page 1: . Introduction to Algorithms in Computational Biology Lecture 1 This class has been edited from Nir Friedman’s lecture which is available at nir.


Introduction to Algorithms in Computational Biology

Lecture 1

This class has been edited from Nir Friedman’s lecture which is available at www.cs.huji.ac.il/~nir. Changes made by Dan Geiger.

Background Readings: The first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001.


Course Information

Meetings:

Lecture, by Dan Geiger: Mondays 16:30-18:30, Taub 4.

Tutorial, by Ydo Wexler: Tuesdays 10:30-11:30, Taub 2.

Grade: 20% from five question sets and 80% from the final test. The question sets are obligatory; each contains 4-6 theoretical problems and is submitted in pairs within two weeks. The test grade must be above 55 for the homework grade to count.

Information and handouts:

www.cs.technion.ac.il/~cs236522

A brochure with photocopied material at the Taub library


Course Prerequisites

Computer Science and Probability Background:
• Data structure 1 (cs234218)
• Algorithms 1 (cs234247)
• Probability (any course)

Some Biology Background:
Formally: none, to allow CS students to take this course.
Recommended: Biology 1 (especially for those in the Bioinformatics track) or a similar Biology course, and/or a serious desire to complement your knowledge of Biology by reading the appropriate material (see the course web site).

Studying the algorithms in this course while acquiring enough biology background is far more rewarding than ignoring the biological context.


Relations to Some Other Courses

Intro to Bioinformatics (cs236523). This course covers practical aspects of, and hands-on experience with, web-based bioinformatics software. Although it is not a formal requirement, you are encouraged to look at the web site http://webcourse.technion.ac.il/234523/ and examine the relevant software.

Algorithms in Computational Biology (cs236522). This is the current course, which focuses on modeling some bioinformatics problems and presenting algorithms for their solution.

Bioinformatics project (cs5236524). Developing bioinformatics tools under close guidance.


First Homework Assignment

Solve two of the questions for Chapter 2 and two of the questions for Chapter 3.

Due: at the third tutorial class, or earlier in the teaching assistant’s mail slot. Remember to submit in pairs.

Read carefully the first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001.


Computational Biology

Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of studying the life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary area of study that combines Biology, Computer Science, and Statistics.

Computational biology is also called Bioinformatics, although many practitioners define Bioinformatics somewhat more narrowly by restricting the field to molecular Biology only.


Examples of Areas of Interest

• Building evolutionary trees from molecular (and other) data
• Efficiently assembling genomes of various organisms
• Understanding the structure of genomes (SNP, SSR, Genes)
• Understanding the function of genes in the cell cycle and disease
• Deciphering the structure and function of proteins


Exponential growth of biological information: growth of sequences, structures, and literature.


Four Aspects

Biological: What is the task?

Algorithmic: How to perform the task at hand efficiently?

Learning: How to adapt/estimate/learn parameters and models describing the task from examples?

Statistics: How to differentiate true phenomena from artifacts?


Example: Sequence Comparison

Biological: Evolution preserves sequences, thus similar genes might have similar functions.

Algorithmic: Consider all ways to “align” one sequence against another.

Learning: How do we define “similar” sequences? Use examples to define similarity.

Statistics: When we compare against ~10^6 sequences, what is a random match and what is a true one?
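The “Algorithmic” step, considering all ways to align one sequence against another, does not require listing the alignments one by one: dynamic programming scores all of them at once. Below is a minimal sketch of global alignment scoring with the Needleman-Wunsch recurrence, a standard technique that the slide does not name. The scoring scheme (+1 per match, -1 per mismatch, -2 per gap) is an illustrative assumption; choosing such scores from examples is exactly the “Learning” aspect.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a against b (Needleman-Wunsch recurrence).

    The default scores are illustrative assumptions, not values from the lecture.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap            # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap          # a[i-1] against a gap
            left = score[i][j - 1] + gap        # b[j-1] against a gap
            score[i][j] = max(diag, up, left)
    return score[n][m]


print(global_alignment_score("AAG", "AG"))  # 0  (e.g. AAG over A-G)
```

The table has (n+1) x (m+1) cells, so the cost is O(nm) instead of the exponential number of possible alignments.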


Course Goals

Learning about computational tools for (primarily) molecular biology.

We will cover computational tasks that are posed by modern molecular biology

We will discuss the biological motivation and setup for these tasks

We will understand the kinds of solutions that exist and what principles justify them


Topics I

Dealing with DNA/Protein sequences:
• Finding similar sequences
• Models of sequences: Hidden Markov Models
• Gene finding
• Genome projects and how sequences are found


Topics II

Models of genetic change:
• Long term: evolutionary changes among species
  • Reconstructing evolutionary trees from sequences
• Short term: genetic variations in a population
  • Finding genes by linkage and association


Topics III (One class, if time allows)

Protein World:
• How proteins fold: secondary & tertiary structure
• How to predict protein folds from sequence data
• How to analyze protein changes from raw experimental measurements (MassSpec)


Human Genome

Most human cells contain 46 chromosomes:
• 2 sex chromosomes (X, Y): XY in males, XX in females.
• 22 pairs of chromosomes named autosomes.


DNA Organization

Source: Alberts et al.


The Double Helix

Source: Alberts et al.


DNA Components

Four nucleotide types: Adenine (A), Guanine (G), Cytosine (C), Thymine (T)

Hydrogen bonds (electrostatic connections) pair the bases: A-T and C-G
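The A-T / C-G pairing means one strand fully determines the other. A minimal sketch of computing the partner strand follows; it also assumes the standard background fact (not on the slide) that the two strands are antiparallel, so the partner strand read in its own 5'->3' direction is the reverse complement.

```python
# Base pairing from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5'->3' direction (strands are antiparallel)."""
    return complement(strand)[::-1]


print(complement("AAGC"))          # TTCG
print(reverse_complement("AAGC"))  # GCTT
```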


Genome Sizes

• E. coli (bacteria): 4.6 x 10^6 bases
• Yeast (simple fungi): 15 x 10^6 bases
• Smallest human chromosome: 50 x 10^6 bases
• Entire human genome: 3 x 10^9 bases
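To put these sizes in computational terms: with only four nucleotide types, each base fits in 2 bits (2^2 = 4). The sketch below, under that assumption, packs four bases per byte and estimates the storage for each genome listed above; the particular A/C/G/T bit assignment is an arbitrary illustrative choice, and “MB” means 10^6 bytes.

```python
# Four nucleotide types -> 2 bits per base. The code values are an arbitrary assumption.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def packed_size_bytes(num_bases: int) -> int:
    """Bytes needed for num_bases bases at 2 bits each, rounded up."""
    return (2 * num_bases + 7) // 8


def pack(seq: str) -> bytes:
    """Pack a DNA string four bases per byte, first base in the high-order bits."""
    out = bytearray(packed_size_bytes(len(seq)))
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)          # bit offset of this base inside its byte
        out[i // 4] |= CODE[base] << shift
    return bytes(out)


GENOMES = {
    "E. coli": 4_600_000,
    "Yeast": 15_000_000,
    "Smallest human chromosome": 50_000_000,
    "Entire human genome": 3_000_000_000,
}
for name, bases in GENOMES.items():
    print(f"{name}: {packed_size_bytes(bases) / 1e6:.2f} MB")
```

At this density the entire human genome takes about 750 MB, versus about 3 GB when stored as one text character per base.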


Genetic Information

Gene – the basic unit of genetic information. Genes determine inherited characteristics.

Genome – the complete collection of an organism's genetic information.

Chromosomes – the storage units of genes.


Genes

The DNA strings include:
• Coding regions (“genes”): E. coli has ~4,000 genes; yeast has ~6,000 genes; C. elegans has ~13,000 genes; humans have ~32,000 genes.
• Control regions: these are typically adjacent to the genes and determine when a gene should be expressed.
• “Junk” DNA (unknown function)


The Cell

All cells of an organism contain the same DNA content (and the same genes) yet there is a variety of cell types.


Example: Tissues in Stomach

How is this variety encoded and expressed?


Central Dogma

Gene → (transcription) → mRNA → (translation) → Protein

Cells express different subsets of their genes in different tissues and under different conditions.


Transcription

Coding sequences can be transcribed to RNA.

RNA nucleotides are similar to DNA nucleotides, but have a slightly different backbone and use Uracil (U) instead of Thymine (T).

Source: Mathews & van Holde
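At the sequence level, the main visible change in transcription is the T to U substitution. A minimal sketch, assuming the input is the DNA coding (sense) strand written 5'->3', whose sequence the mRNA matches base for base; starting from the template strand would require taking its reverse complement first.

```python
def transcribe(coding_strand: str) -> str:
    """mRNA for a DNA coding strand: same bases, with U in place of T.

    Assumption: the input is the coding (sense) strand, written 5'->3'.
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```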


Transcription: RNA Editing

Exons hold the coding information; they are more conserved during evolution than introns. This process takes place in the nucleus. The mature mRNA molecules then leave the nucleus through the nuclear pores and enter the cytoplasm.

1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons

* Alternative splicing exists
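Steps 2 and 3 amount to keeping the exon intervals of the transcript and concatenating them in order. The sketch below assumes the exon coordinates are already known (here they are hypothetical, 0-based half-open intervals on a toy sequence; in practice they come from annotation or gene finding). Passing a different subset of exons models alternative splicing, for example exon skipping.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate the exon intervals of a pre-mRNA, dropping the introns between them.

    exons: hypothetical 0-based half-open (start, end) coordinates.
    """
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exon intervals must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Toy transcript: exon AUG | intron GUAAG | exon CCC | intron GUAAG | exon UAA
pre = "AUGGUAAGCCCGUAAGUAA"
print(splice(pre, [(0, 3), (8, 11), (16, 19)]))  # AUGCCCUAA (all three exons)
print(splice(pre, [(0, 3), (16, 19)]))           # AUGUAA (middle exon skipped)
```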


RNA Roles

Messenger RNA (mRNA): Encodes protein sequences. Each three nucleotides (a codon) translate to one amino acid (the protein building block).

Transfer RNA (tRNA): Decodes the mRNA molecules into amino acids. It binds to the mRNA on one side and holds the appropriate amino acid on its other side.

Ribosomal RNA (rRNA): Part of the ribosome, a machine for translating mRNA into proteins. It catalyzes (like an enzyme) the reaction that attaches the amino acid carried by the tRNA to the growing amino acid chain.
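Why three nucleotides per amino acid: a code word of length k over the 4 nucleotide types can distinguish 4^k values, and it must cover the 20 amino acids that proteins are built from.

```latex
4^1 = 4 < 20, \qquad 4^2 = 16 < 20, \qquad 4^3 = 64 \ge 20
```

So three is the shortest codon length that works. The remaining capacity goes to redundancy (several codons for the same amino acid) and to stop codons.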

...


Translation (Outside the nucleus)

Translation is mediated by the ribosome, a complex of protein and rRNA molecules.

The ribosome attaches to the mRNA at a translation initiation site.

The ribosome then moves along the mRNA sequence and in the process constructs a sequence of amino acids (a polypeptide), which is released and folds into a protein.
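The ribosome's walk along the mRNA can be mimicked as a scan in steps of three. This sketch makes two simplifying assumptions not stated on the slide: the first AUG is taken as the initiation site, and reading stops at the first in-frame stop codon (UAA, UAG or UGA in the standard code). Real initiation-site selection depends on additional sequence context.

```python
from typing import List

START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_reading_frame(mrna: str) -> List[str]:
    """Codons read from the first AUG up to (not including) the first in-frame stop.

    Simplification: the first AUG is assumed to be the initiation site.
    Returns an empty list if the mRNA contains no AUG.
    """
    start = mrna.find(START_CODON)
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):   # step one codon at a time
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


print(first_reading_frame("GGAUGGCCUUUUAAGC"))  # ['AUG', 'GCC', 'UUU']
```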


Genetic Code

There are 20 amino acids from which proteins are built.
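The genetic code table itself (the slide's figure did not survive) maps each of the 64 codons to an amino acid or a stop signal. The sketch below builds the standard genetic code, which corresponds to NCBI translation table 1, from its compact 64-letter form, with codons enumerated in U, C, A, G order at each position, and translates a reading frame. This is standard background rather than material from the slide, and the helper names are illustrative.

```python
from itertools import product

# Standard genetic code: one-letter amino acid per codon, codons in UCAG order
# (third position varying fastest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
assert len(CODON_TABLE) == 64


def translate(mrna: str) -> str:
    """Translate a reading frame from its first base until a stop codon.

    Returns one-letter amino acid codes; a trailing partial codon is ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(set(CODON_TABLE.values()) - {"*"}))  # 20 distinct amino acids
print(translate("AUGGCCUUUUAA"))              # MAF
```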


Protein Structure

Proteins are polypeptides of 70-3000 amino acids.

A protein's structure is (mostly) determined by the sequence of amino acids that make up the protein.


Protein Structure


Evolution

Related organisms have similar DNA:
• Similarity in the sequences of proteins
• Similarity in the organization of genes along the chromosomes

Evolution plays a major role in biology:
• Many mechanisms are shared across a wide range of organisms
• During the course of evolution, existing components are adapted for new functions


Evolution

Evolution of new organisms is driven by:
• Diversity: different individuals carry different variants of the same basic blueprint
• Mutations: the DNA sequence can be changed by single base changes, deletion/insertion of DNA segments, etc.
• Selection bias
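The mutation types listed above are the same edit operations that sequence comparison must account for: a single base change shows up as a mismatch in an alignment, and an insertion or deletion shows up as a gap. A minimal sketch of the three operations on a DNA string; the function names are hypothetical and positions are 0-based.

```python
def substitute_base(seq: str, pos: int, base: str) -> str:
    """Single base change at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insert_segment(seq: str, pos: int, segment: str) -> str:
    """Insert a DNA segment immediately before 0-based position pos."""
    return seq[:pos] + segment + seq[pos:]


def delete_segment(seq: str, pos: int, length: int) -> str:
    """Delete `length` bases starting at 0-based position pos."""
    return seq[:pos] + seq[pos + length:]


s = "AAG"
print(substitute_base(s, 0, "G"))   # GAG
print(insert_segment(s, 1, "CT"))   # ACTAG
print(delete_segment(s, 1, 1))      # AG
```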


The Tree of Life

Source: Alberts et al.


Example for Phylogenetic Analysis

Input: four nucleotide sequences, AAG, AAA, GGA and AGA, taken from four species.

Question: Which evolutionary tree best explains these sequences?

[Figure: a candidate tree over the four sequences whose ancestral (internal) nodes are all labeled AAA; edge substitution counts 2, 1, 1. Total #substitutions = 4.]

One answer (the parsimony principle): pick the tree with the minimum total number of symbol substitutions between each species and its ancestor in the evolutionary tree (also called a phylogenetic tree).


Example Continued

There are many possible trees. For example:

[Left figure: a tree over the four sequences with ancestral nodes labeled AGA, AAA and AAA (root); edge substitution counts 1, 1, 1. Total #substitutions = 3.]

[Right figure: a tree over the four sequences with all ancestral nodes labeled AAA; edge substitution counts 1, 1, 2. Total #substitutions = 4.]

The left tree is “better” than the right tree.

Questions:
• Does this principle yield realistic phylogenetic trees? (Evolution)
• How can we compute the best tree efficiently? (Computer Science)
• What is the probability of substitutions given the data? (Learning)
• Is the best tree found significantly better than the others? (Statistics)
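For a fixed tree, the minimum number of substitutions can be computed site by site with Fitch's algorithm, a standard technique that the slides do not name. With four species there are only three unrooted tree shapes (which two species are paired together), so all of them can be checked. The sketch below roots each shape on its central edge, which does not change the parsimony score. Only the pairing of AGA with GGA reaches 3 substitutions, matching the left tree above; the other two pairings need 4.

```python
from typing import Set, Tuple, Union

# A rooted binary tree: a leaf is a sequence string, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, site: int) -> Tuple[Set[str], int]:
    """Fitch's algorithm at one site: (candidate ancestral states, min substitutions)."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    left_states, left_cost = fitch_site(tree[0], site)
    right_states, right_cost = fitch_site(tree[1], site)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No common state: one substitution is unavoidable on one of the two child edges.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, length: int) -> int:
    """Minimum total substitutions, summed over independent sites."""
    return sum(fitch_site(tree, site)[1] for site in range(length))


# The three unrooted four-leaf topologies, each rooted on its central edge.
topologies = {
    "(AGA,GGA),(AAA,AAG)": (("AGA", "GGA"), ("AAA", "AAG")),
    "(GGA,AAA),(AGA,AAG)": (("GGA", "AAA"), ("AGA", "AAG")),
    "(AGA,AAA),(GGA,AAG)": (("AGA", "AAA"), ("GGA", "AAG")),
}
for name, tree in topologies.items():
    print(name, parsimony_score(tree, 3))   # 3, then 4, then 4
```

Exhaustive checking only works for a handful of species: the number of possible tree shapes grows super-exponentially with the number of species, which is why efficient search is one of the questions above.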


Werner’s Syndrome

A successful application of genetic linkage analysis


The Disease

• First references in the 1960s
• Causes premature ageing
• Linkage studies from 1992
• WRN gene cloned in 1996
• Subsequent discovery of the mechanisms involved in the wild-type and mutant proteins


A sample Input

The study used 13 Markers; here we see only one.

The study used 14 families; here we see only one.

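[Figure: a sample pedigree with individuals numbered 1-5, each marked H (healthy) or D (diseased) and genotyped at one marker with alleles A1 and A2 (genotypes shown: A1/A1, A2/A2, A1/A2, A1/A2, A2/A2). Some individuals are annotated "Phase inferred", and one is marked "Recombinant".]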


Genehunter Output

position  LOD_score   information
0.00      -1.254417   0.224384
1.52       2.836135   0.226379
...[data skipped]...
18.58     13.688599   0.384088
19.92     14.238474   0.401992
21.26     14.718037   0.426818
22.60     15.159389   0.462284
22.92     15.056713   0.462510
23.24     14.928614   0.463208
23.56     14.754848   0.464387
...[data skipped]...
81.84      1.939215   0.059748
90.60    -11.930449   0.087869

Slide annotations:
• position: distance between markers in centimorgans
• LOD_score: log likelihood of placing the disease gene at that distance, relative to it being unlinked
• The maximum log likelihood score marks the most ‘likely’ position
• Marker names: D8S339, D8S131, D8S259
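The LOD score in this output compares two hypotheses, LOD(x) = log10[ L(data | disease gene at position x) / L(data | disease gene unlinked) ], so each unit is a factor of 10 in likelihood; by the classical convention, a LOD of 3 or more is taken as significant evidence of linkage. Reading off the most likely position is then an argmax over the rows. The sketch below uses only the rows shown in the Genehunter output (the rows elided as "[data skipped]" are not included), and the helper names are hypothetical.

```python
# (position in cM, LOD score, information) rows as shown in the Genehunter output;
# the rows elided there as "[data skipped]" are not available.
ROWS = """\
0.00 -1.254417 0.224384
1.52 2.836135 0.226379
18.58 13.688599 0.384088
19.92 14.238474 0.401992
21.26 14.718037 0.426818
22.60 15.159389 0.462284
22.92 15.056713 0.462510
23.24 14.928614 0.463208
23.56 14.754848 0.464387
81.84 1.939215 0.059748
90.60 -11.930449 0.087869
"""


def parse_rows(text: str):
    """Parse whitespace-separated (position, LOD, information) rows into float tuples."""
    return [tuple(float(x) for x in line.split()) for line in text.splitlines() if line.strip()]


def best_position(rows):
    """Row with the maximum LOD score, i.e. the most likely disease-gene position."""
    return max(rows, key=lambda row: row[1])


position, lod, info = best_position(parse_rows(ROWS))
print(f"max LOD {lod} at {position} cM")  # max LOD 15.159389 at 22.6 cM
```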


Final Location

[Figure: map showing markers D8S131 and D8S259, the location of marker D8S339, and the final location of the WRN gene.]

The location given by genetic linkage was off by about 1.25M base pairs.

