CS273 Algorithms for Structure and Motion in Biology

Instructors:

Serafim Batzoglou and Jean-Claude Latombe

Teaching Assistant: Sam Gross

| serafim | latombe | ssgross | @ cs.stanford.edu

Spring 2006 – http://www.stanford.edu/class/cs273/


Need a Scribe!!

Range of Bio-CS Interaction

Gene: sequence alignment

Molecules: molecular structures, similarities, and motions

Cells: simulation of cell interaction

Tissue/Organs: soft-tissue simulation and surgical training

Body system: robotic surgery

CS273 sits at the gene and molecule end of this range

Enormous range over space and time

Focus on Proteins

Proteins are the workhorses of all living organisms

They perform many vital functions, e.g.:
• Catalysis of reactions
• Transport of molecules
• Building blocks of muscles
• Storage of energy
• Transmission of signals
• Defense against intruders

Proteins are also of great interest from a computational viewpoint:
• They are large molecules (a few hundred to several thousand atoms)
• They are made of building blocks (amino acids) drawn from a small “library” of 20 amino acids
• They have an unusual kinematic structure: a long serial linkage (backbone) with short side-chains

Proteins are associated with many challenging problems:
• Predict folded structures and motion pathways
• Understand why some proteins misfold or partially fold, causing such diseases as cystic fibrosis, Parkinson's, and Creutzfeldt-Jakob (mad cow)
• Find structural similarities among proteins and classify proteins
• Find functional structural motifs in proteins
• Predict how proteins bind to other proteins and smaller molecules
• Design new drugs
• Engineer and design proteins and protein-like structures (polymers)

Central Dogma of Molecular Biology


DNA → (transcription) → RNA → (translation) → Protein

Protein Sequence

[Figure: chemical structure of a polypeptide chain, with oxygen (O) and nitrogen (N) atoms labeled]

Long sequence of amino-acids (dozens to thousands), also called residues

Dictionary of 20 amino-acids (several billion years old)
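A protein sequence can be treated as a string over a 20-letter alphabet. The sketch below is only an illustration: the one-letter codes are the standard convention (they are not listed on the slides), and the function name `residue_counts` and the example peptide are made up.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids (standard convention,
# not taken from the slides).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_counts(sequence):
    """Count residues, rejecting letters outside the 20-letter dictionary."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"not standard amino acids: {sorted(unknown)}")
    return Counter(sequence)


peptide = "MKTAYIAKQR"  # hypothetical 10-residue sequence
counts = residue_counts(peptide)
print(len(peptide), "residues;", len(counts), "distinct amino acids")
```

Real sequences run from dozens to thousands of residues, but the representation stays the same.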

[Figure: polypeptide chain with residue i-1 labeled]

Peptide bond (partial double-bond character)


Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

Levels of Protein Structures

hemoglobin (4 polypeptide chains)

Quaternary

Mostly α-helices

Mostly β-sheets

Mixed

Folding

Unfolded (denatured) state

Intermediate states

Folded (native) state

Many pathways

http://www-shakh.harvard.edu/ProFold2.html

How (we think) a protein folds ...

$\Delta G = \Delta H - T\,\Delta S$
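The free-energy relation above, read as ΔG = ΔH − TΔS, can be evaluated at physiological temperature (37°C). This is a minimal sketch: the function name and the values of ΔH and ΔS are hypothetical, chosen only to show how a favorable enthalpy term competes with an unfavorable entropy term.

```python
def gibbs_free_energy_change(delta_h, delta_s, temperature):
    """Return ΔG = ΔH − T·ΔS.

    delta_h in kJ/mol, delta_s in kJ/(mol·K), temperature in kelvin.
    """
    return delta_h - temperature * delta_s


BODY_TEMPERATURE_K = 37.0 + 273.15  # physiological temperature, 37 °C

# Hypothetical folding values, chosen only to illustrate the arithmetic.
delta_h = -200.0  # kJ/mol: favorable contacts formed on folding
delta_s = -0.6    # kJ/(mol·K): the chain loses conformational freedom

delta_g = gibbs_free_energy_change(delta_h, delta_s, BODY_TEMPERATURE_K)
print(f"ΔG = {delta_g:.2f} kJ/mol")
print("folded state favored" if delta_g < 0 else "unfolded state favored")

# Temperature at which ΔG = 0 for these made-up values (ΔH / ΔS).
print(f"ΔG changes sign at {delta_h / delta_s:.1f} K")
```

With these made-up numbers ΔG is small compared with ΔH and TΔS taken separately, because the two terms nearly cancel.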






Motion of Proteins in Folded State

HIV-1 protease

Structural variability of the overall ensemble of native ubiquitin structures

[Shehu, Kavraki, Clementi, 2005]

Amylosucrase

Flexible Loop

Loop 7


Binding

Inhibitor binding to HIV protease

Protein-protein binding

Ligand-protein binding

Binding of Pyruvate to LDH

(reduction of pyruvate to lactate)

[Figure: lactate dehydrogenase environment around pyruvate. Labeled residues: GLN-101, ARG-106, ASP-166, ARG-169, HIS-193, ASP-195, THR-245. Also labeled: the loop, pyruvate, and NADH (nicotinamide adenine dinucleotide, the coenzyme).]

What is CS273 about?

Algorithms and computational schemes for molecular biology problems

Molecular biology seen by computer scientists

y = f(x)

Biologists like experiments, specifics and classifications

They prefer knowing many (x_i, y_i) pairs, i.e., facts, and classifying them, to knowing f

Computer scientists like simulation, abstractions, and general algorithms

They want to know f, the explanation of the facts, and efficient ways to compute it, but rarely care about any particular (x_i, y_i)

One challenge of Computational Biology is to fuse these two cultures

The Shock of Two Cultures

Two Views of a BioComputation Class

Where IT resources for biology are available and how to use them

How to design efficient data structures and algorithms for biology

Main Ideas Behind CS273

1. The information is in the sequence
• Sequence → Structure (shape) → Function
• Sequence similarity → Structural/functional similarity
• Sequences are related by evolution



2. Biomolecules move and bind to achieve their functions
• Deformation → folded structures of proteins
• Motion + deformation → multi-molecule complexes
• One cannot just “jump” from sequence to function

[Figure: Sequence → Structure → Function, annotated with sequence similarity, structure similarity, protein folding, and ligand-protein binding]




CS273 is about algorithms for sequence, structure and motion:
- Finding sequence and shape similarities
- Relating structure to function
- Extracting structure from experimental data
- Computing and analyzing motion pathways

Vision Underlying CS273

Goal of computational biology: low-cost, high-bandwidth in-silico biology

Requirements:
• Reliable models
• Efficient algorithms

Algorithmic efficiency by exploiting properties of molecules and processes:
• Proteins are long kinematic chains
• Atoms cannot bunch up together
• Forces have relatively short ranges
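Two of these properties combine into a standard speed-up. If forces are cut off beyond a short range and atoms cannot pack arbitrarily densely, each atom has only a bounded number of interacting neighbors, so they can be found with a uniform grid instead of testing all pairs. The sketch below assumes 3D points and a hypothetical function name `close_pairs`; it illustrates the idea and is not a method taken from the course.

```python
import math
from collections import defaultdict
from itertools import product


def close_pairs(points, cutoff):
    """Return sorted index pairs (i, j), i < j, at distance <= cutoff.

    Points are 3D tuples. Each point is hashed into a cubic cell of side
    `cutoff`, so any partner within range lies in the same cell or in one
    of its 26 neighbors.
    """
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(c / cutoff) for c in p)
        grid[cell].append(idx)

    pairs = set()
    for idx, p in enumerate(points):
        cx, cy, cz = (math.floor(c / cutoff) for c in p)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if idx < j and math.dist(p, points[j]) <= cutoff:
                    pairs.add((idx, j))
    return sorted(pairs)


# Hypothetical coordinates: two tight pairs placed far apart.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 10.0, 10.0), (10.5, 10.0, 11.0)]
print(close_pairs(atoms, cutoff=2.0))
```

When the number of atoms per cell is bounded, the total work grows roughly linearly with the number of atoms rather than quadratically.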

Computational Biology is more than applying computers to biological problems or mimicking nature (e.g., performing MD simulation)

Tentative Schedule

1   April 5    Introduction
2   April 10   Protein geometric and kinematic models
3   April 12   Conformational space
4   April 17   Inverse kinematics and applications
5   April 19   Sequence similarity

8   May 1      Structure comparison
9   May 3      Structure comparison
10  May 8      Protein phylogeny, clustering, and classification
11  May 10     Protein phylogeny, clustering, and classification
12  May 15     Energy maintenance
13  May 17     Energy maintenance
14  May 22     Structure prediction
15  May 24     Roadmap methods
16  May 31     Structure prediction
17  June 5     Structure prediction
18  June 7     TBA
19  June 12    Project presentations (2 hours)

Instructors and TAs

Instructors:
– Serafim Batzoglou
– Jean-Claude Latombe

TA:
– Sam Gross

Emails: | serafim | latombe | ssgross | @ cs.stanford.edu

Class website: http://cs273.stanford.edu


Expected Work

Regular attendance at lectures and active participation

Class scribing (assignments will depend on the number of students)

Exciting programming project: http://www.stanford.edu/class/cs273/project/project.html
- Structure prediction
- Clustering and distance metrics
- Protein design
- Something else

Questions?
