COMS 4761 --2007
Prof. Yechiam Yemini (YY)
Computer Science Department, Columbia University
Chapter 1: Bio Primer
1.1 Cell Structure; DNA; RNA; transcription; translation; proteins
Overview
Cell structure and mechanisms
DNA; RNA; transcription; regulation
Translation; protein; sequence & structure
References:
B. Alberts et al., “Molecular Biology of The Cell”, 4th edition, Garland Science.
R. Horton et al., “Principles of Biochemistry”, 3rd edition, Prentice Hall.
J.D. Watson et al., “Molecular Biology of The Gene”, 5th edition, Pearson Benjamin Cummings.
NCBI introductory overview: http://www.ncbi.nih.gov/About/primer/index.html
Animation sites: http://www.johnkyrk.com/ and http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite
Organisms Are Made of Cells
Prokaryotes & Eukaryotes Have Different Cells
Prokaryotes: single-cell organisms without a nucleus. E.g., bacteria: E. coli, H. pylori
Eukaryotes: single- or multi-cell organisms with a nucleus. E.g., yeast, plants, Drosophila, humans
Figure: evolutionary timeline. ~4.5B years ago: Earth formed; ~3.5B years ago: prokaryotic bacteria; ~1.5B years ago: nucleated cells; ~0.5B years ago: multi-cellular eukaryotes.
© Pearson; Benjamin Cummings
| | Eukaryotes | Prokaryotes |
|---|---|---|
| Structure | Nucleus | No nucleus |
| | Single or multi cell; cell size 10-100 µm | Single cell; size 0.2-2 µm |
| | Cytoskeleton | No cytoskeleton |
| | Cell division through mitosis | Cell division through fission |
| | Multiple membranes/compartments | One membrane at cell boundary |
| | Organelles: mitochondria, Golgi, chloroplasts | No organelles |
| DNA | Two or more chromosomes | Single circular DNA |
| | Genes have large non-coding regions (introns) | Genes code proteins (typically no introns) |
| | 95-97% non-coding DNA | 90% of DNA encodes proteins |
| | ~10^7-10^9 base pairs | ~10^5-10^6 base pairs |
| | DNA is tightly packed (chromatin + histones) | DNA is loosely organized |
| Proteins | 5-20k protein species | 1-2k protein species |
| | ~10^9 proteins per cell | ~10^6 proteins per cell |
Cells Are Made of Macromolecules
Sugars → Polysaccharides
Fatty acids → Fats, lipids, membranes
Amino acids → Proteins
Nucleotides → Nucleic acids (DNA, RNA)
| % weight | Molecules |
|---|---|
| 70% | Water |
| 1% | Inorganic ions |
| 1% | Sugars |
| 0.4% | Amino acids |
| 0.4% | Nucleotides |
| 1% | Fatty acids |
| 0.2% | Other small molecules |
| 26% | Macromolecules (proteins, DNA, RNA, polysaccharides) |
Small molecules: 3%; macromolecules: 26%
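The composition breakdown above can be checked with a quick tally. This sketch is illustrative. It stores each percentage in tenths of a percent (integers), so the sums are exact. The category names come from the slide. Grouping sugars, fatty acids, amino acids, nucleotides and "other" as small molecules is an assumption, chosen to match the slide's 3% subtotal.

```python
# Approximate % of cell weight, stored in tenths of a percent (per mille)
# so that the sums are exact integers.
COMPOSITION = {
    "water": 700,
    "inorganic ions": 10,
    "sugars": 10,
    "fatty acids": 10,
    "amino acids": 4,
    "nucleotides": 4,
    "other small molecules": 2,
    "macromolecules": 260,
}

# Organic small molecules: everything except water, ions and macromolecules.
SMALL_MOLECULES = ["sugars", "fatty acids", "amino acids",
                   "nucleotides", "other small molecules"]


def percent(per_mille: int) -> float:
    """Convert tenths of a percent back to a percentage."""
    return per_mille / 10


small = sum(COMPOSITION[name] for name in SMALL_MOLECULES)
total = sum(COMPOSITION.values())
print(f"small molecules: {percent(small)}%")   # small molecules: 3.0%
print(f"macromolecules: {percent(COMPOSITION['macromolecules'])}%")
print(f"total: {percent(total)}%")             # total: 100.0%
```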
DNA Structure
The Central Dogma of Biology
DNA stores hereditary information
DNA is transcribed into RNA
RNA is translated into proteins
Proteins perform the key functions of cells
DNA → (transcription) → RNA → (translation) → Protein
DNA Consists of Sequences of Nucleotides
DNA strands are sequences of nucleotides
Bases: adenine (A), guanine (G), thymine (T), cytosine (C)
DNA is organized in complementary double strands
Hydrogen bonds join complementary base pairs: A-T and C-G
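Figure: two complementary strands (e.g., ACTAAC… paired with TGATTG…) joined by hydrogen bonds and running in opposite 5’-end to 3’-end directions. A nucleotide = sugar + phosphate + base; the sugar-phosphate chain forms the backbone.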
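The pairing rule (A-T, C-G) can be written as a lookup table. This minimal Python sketch computes the partner of a strand; the function names are illustrative. The two strands run in opposite directions, so the partner strand written 5'→3' is the reverse of the base-by-base complement.

```python
# Watson-Crick pairing for DNA bases: A-T and C-G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5'->3' (the strands are antiparallel)."""
    return complement(strand)[::-1]


top = "ACTAACGC"                  # written 5'->3'
print(complement(top))            # TGATTGCG  (partner, read 3'->5')
print(reverse_complement(top))    # GCGTTAGT  (partner, read 5'->3')
```

Applying `reverse_complement` twice returns the original strand, which is a handy sanity check.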
DNA Forms A Double Helix
Helix full turn: 10.5 bp
Hydrogen bonds between paired bases support the structure
Major and minor grooves provide access for proteins (e.g., transcription factors)
DNA Is Tightly Packed
DNA is ~2 m long; it needs to fold into a ~10^-6 m nucleus
In chromatin, DNA folds around bead-like cores made of 4 histones
Transcription needs to unpack the DNA to copy it
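The ~2 m figure can be checked with a short calculation. This Python sketch uses two numbers that are not on the slides and are assumptions here:

- an axial rise of about 0.34 nm per base pair for B-DNA;
- a human haploid genome of about 3.1×10^9 bp, with two copies per cell.

The 10.5 bp per turn and the 10^-6 m nucleus scale come from the slides.

```python
RISE_NM_PER_BP = 0.34   # assumed axial rise per base pair (B-DNA)
BP_PER_TURN = 10.5      # helical repeat, from the slide
HAPLOID_BP = 3.1e9      # assumed approximate human haploid genome size
NUCLEUS_M = 1e-6        # nucleus scale quoted on the slide


def dna_length_m(bp: float) -> float:
    """Contour length of stretched-out B-DNA, in meters."""
    return bp * RISE_NM_PER_BP * 1e-9


diploid_bp = 2 * HAPLOID_BP          # two genome copies per cell
length = dna_length_m(diploid_bp)
turns = diploid_bp / BP_PER_TURN
print(f"length  ~ {length:.1f} m")              # length  ~ 2.1 m
print(f"turns   ~ {turns:.1e}")                 # turns   ~ 5.9e+08
print(f"packing ~ {length / NUCLEUS_M:.0e}x")   # packing ~ 2e+06x
```

The last line gives the linear compaction needed, about a million-fold.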
Sample Bioinformatics Challenges
Sequencing the genome
Discovering sequence similarity
Discovering genes
Analyzing evolutionary relationships
Discovering other important structures:
Distinguishing exons from introns
Regulatory structures (promoters & transcription factors)
Regions expressing micro RNA
…
Transcription
Schematic: DNA → (transcription) → mRNA → (translation) → Protein
Overview
A. Assembling transcription complex
B. Transcribing DNA to mRNA
C. Removing introns
Animation
The Transcription Process
Transcription Details
http://cwx.prenhall.com/horton/medialib/
From PDB
Transcription Factors
TFs bind to promoter regions and to RNA polymerases
TFs regulate the rate of transcription (up or down)
Regulation is not yet well understood
Transcription Is Regulated
http://cwx.prenhall.com/horton/medialib/
Example: The Lac Operon
Lac consists of 3 genes that are transcribed together
Used by bacteria to transport and metabolize lactose
cAMP activates transcription to initiate transport & metabolism of lactose
Lac Activation
Low sugar (glucose) levels generate cAMP
cAMP binds with CRP, adjusting its alpha helix to fit the DNA grooves; CRP-cAMP then binds the DNA
CRP-cAMP accelerates polymerase binding
http://cwx.prenhall.com/horton/medialib/
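The activation path on this slide can be sketched as a tiny rule-based model. The path is: low sugar → cAMP → CRP-cAMP bound to DNA → faster polymerase binding. The sketch is a toy, not a kinetic model, and it rests on these assumptions:

- The rates are arbitrary illustrative numbers.
- The gene names lacZ, lacY and lacA are the standard names; the slide only says "3 genes".
- Other controls of the operon that the slide does not cover are ignored.

Because one promoter drives all three genes, they share a single rate.

```python
from typing import Dict

# The operon's three genes (standard names; the slide only says "3 genes").
LAC_GENES = ("lacZ", "lacY", "lacA")

# Arbitrary illustrative rates (arbitrary units), not measured values.
BASAL_RATE = 1.0
ACTIVATED_RATE = 10.0


def crp_camp_bound(glucose_low: bool) -> bool:
    """Low sugar raises cAMP; cAMP binds CRP, and CRP-cAMP binds the DNA."""
    camp_high = glucose_low
    return camp_high


def lac_transcription(glucose_low: bool) -> Dict[str, float]:
    """One promoter drives all three genes, so they share a single rate."""
    rate = ACTIVATED_RATE if crp_camp_bound(glucose_low) else BASAL_RATE
    return {gene: rate for gene in LAC_GENES}


print(lac_transcription(glucose_low=True))
print(lac_transcription(glucose_low=False))
```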
Splicing The Introns
http://cwx.prenhall.com/horton/medialib/
From Genes To Networks
Regulation is organized in networks
Top: gene network regulating the body development of the sea urchin
Middle: a promoter region
Bottom: interaction of two modules
Regulatory Networks Can Be Complex
Genetic regulatory network controlling the development of the body plan of the sea urchin embryo (Davidson et al., Science, 295(5560):1669-1678).
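A common first abstraction for a regulatory network is a Boolean network. Each gene is on or off, and a rule computes its next state from its regulators. The three-gene network below is hypothetical (it is not the sea urchin network), and synchronous updating is an assumption of this sketch. The negative feedback (C represses A) makes the network cycle rather than settle.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical three-gene network: each rule gives a gene's next value.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B together activate C
}


def step(state: State) -> State:
    """Synchronous update: every gene reads the same current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def trajectory(state: State, steps: int) -> List[State]:
    """The initial state followed by `steps` updated states."""
    states = [state]
    for _ in range(steps):
        state = step(state)
        states.append(state)
    return states


def show(state: State) -> str:
    """Render a state as on/off bits in gene order A, B, C."""
    return "".join("1" if state[g] else "0" for g in "ABC")


for s in trajectory({"A": True, "B": False, "C": False}, 5):
    print(show(s))   # 100, 110, 111, 011, 000, then 100 again
```

Starting from only A on, the state returns to its start after five steps: an oscillation caused by the feedback loop.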
Sample Bioinformatics Challenges
Discovering and analyzing transcription factors: evolutionary analysis; motif finding
Discovering the structure of regulatory networks
Analyzing the operations of regulatory networks
Designing synthetic regulatory networks
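The simplest form of motif finding scans a sequence for windows that match a known motif within a few mismatches. This Python sketch uses TATAAT, the consensus of the bacterial -10 promoter element, as an example motif on a made-up sequence. Real motif discovery finds unknown motifs statistically (e.g., with position weight matrices), which this sketch does not attempt.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq: str, motif: str, max_mismatches: int = 0) -> List[int]:
    """Start positions of windows within max_mismatches of the motif."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], motif) <= max_mismatches]


# Made-up sequence; TATAAT is used only as an example motif.
seq = "GCTATAATGCTACAATGG"
print(find_motif(seq, "TATAAT"))                    # [2]
print(find_motif(seq, "TATAAT", max_mismatches=1))  # [2, 10]
```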
Translation
RNA Encodes Protein Sequences
Proteins are sequences of amino acids (AA)
Translation uses the RNA sequence as a template to construct the AA sequence
The coding problem: encode sequences over 20 amino acids using 4 nucleotides
2 nucleotides can code only 4^2 = 16 amino acids; 3 nucleotides give 4^3 = 64 combinations, enough for 20
Codon: a sequence of 3 nucleotides that encodes an amino acid (or a stop signal)
Translation: translate mRNA codons to amino acids
Start/Stop codons define an open reading frame (ORF)
Translation requires reading/identifying codons and forming the corresponding protein sequence
DNA → (transcription) → RNA → (translation) → Protein
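Since Start and Stop codons delimit an open reading frame, ORFs can be found by scanning each reading frame for an AUG followed by an in-frame stop codon. This Python sketch assumes:

- only the three forward frames of an mRNA are scanned (a DNA scan would also check the reverse strand);
- a nested AUG inside an open ORF is ignored;
- `min_codons` is a hypothetical length filter.

```python
from typing import List, Optional, Tuple

START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) slices of ORFs in the three forward frames.

    An ORF runs from an AUG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. `min_codons` counts the codons
    before the stop (including the AUG).
    """
    orfs = []
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a frame at AUG
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it at the stop codon
    return sorted(orfs)


mrna = "GGAUGGCUUAAGAUGUUUUGAC"
for s, e in find_orfs(mrna):
    print(s, e, mrna[s:e])
# 2 11 AUGGCUUAA
# 12 21 AUGUUUUGA
```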
The Genetic Code
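| First two bases | 3rd base U | 3rd base C | 3rd base A | 3rd base G |
|---|---|---|---|---|
| UU | Phe | Phe | Leu | Leu |
| UC | Ser | Ser | Ser | Ser |
| UA | Tyr | Tyr | Stop | Stop |
| UG | Cys | Cys | Stop | Trp |
| CU | Leu | Leu | Leu | Leu |
| CC | Pro | Pro | Pro | Pro |
| CA | His | His | Gln | Gln |
| CG | Arg | Arg | Arg | Arg |
| AU | Ile | Ile | Ile | Met |
| AC | Thr | Thr | Thr | Thr |
| AA | Asn | Asn | Lys | Lys |
| AG | Ser | Ser | Arg | Arg |
| GU | Val | Val | Val | Val |
| GC | Ala | Ala | Ala | Ala |
| GA | Asp | Asp | Glu | Glu |
| GG | Gly | Gly | Gly | Gly |
Abbreviations: Gly = glycine, Asp = aspartate, Glu = glutamate, Ala = alanine, Val = valine, Ser = serine, Arg = arginine, Asn = asparagine, Lys = lysine, Thr = threonine, Ile = isoleucine, Met = methionine, His = histidine, Gln = glutamine, Pro = proline, Leu = leucine, Cys = cysteine, Trp = tryptophan, Tyr = tyrosine, Phe = phenylalanine.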
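The genetic code table can be encoded compactly. One string of 64 one-letter amino-acid codes is indexed by codon, with bases ordered U, C, A, G. Here "*" marks a stop codon. This is the standard code (NCBI translation table 1). It uses one-letter codes rather than the three-letter names in the table, e.g. M = Met, A = Ala, W = Trp. The `translate` helper is an illustrative sketch that reads from position 0 until the first stop.

```python
BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, ordered first base x second
# base x third base over U, C, A, G; "*" marks a stop codon (standard code).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCUUGGUAA"))  # MAW  (Met-Ala-Trp, then stop)
```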
tRNA Provides Translation Units
Anticodon 3’ CGA 5’ binds to codon 5’ GCU 3’ of mRNA
It translates GCU to Alanine
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
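The slide's example can be reproduced by pairing bases. Complementing the codon base by base gives the anticodon read 3'→5' (CGA). Reversing it gives the same anticodon in the usual 5'→3' direction (AGC). This Python sketch ignores wobble pairing at the codon's third position, which lets one tRNA read several codons.

```python
# RNA base pairing (U replaces T).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_3to5(codon: str) -> str:
    """Anticodon written 3'->5', aligned base by base under a 5'->3' codon."""
    return "".join(PAIR[base] for base in codon)


def anticodon_5to3(codon: str) -> str:
    """The same anticodon in the conventional 5'->3' reading direction."""
    return anticodon_3to5(codon)[::-1]


print(anticodon_3to5("GCU"))  # CGA  (3' CGA 5', as on the slide)
print(anticodon_5to3("GCU"))  # AGC
```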
Translation Basics
Initiation:
Ribosome binds to mRNA and moves in the 5’→3’ direction until it finds the Start codon AUG
Elongation:
Ribosome recruits the tRNA that matches the next codon
tRNA adds its AA to the growing protein through a peptide bond
Ribosome releases the tRNA and moves to the next codon
Termination:
Elongation continues until a Stop codon is reached
A release factor releases the polypeptide from the ribosome
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
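The three stages map onto a simple loop. The ribosome scans to the first AUG (initiation), reads one codon per step and adds its amino acid (elongation), and stops at a stop codon (termination). This Python sketch is a simplification with these assumptions:

- the codon dictionary is a small hypothetical excerpt, so a codon missing from it raises `KeyError`;
- tRNA recruitment is reduced to a table lookup.

```python
from typing import List

STOPS = {"UAA", "UAG", "UGA"}
# Small excerpt of the genetic code, enough for the example below;
# a codon missing from it raises KeyError.
CODE = {"AUG": "Met", "GCU": "Ala", "UUU": "Phe", "GGC": "Gly"}


def ribosome(mrna: str) -> List[str]:
    # Initiation: scan 5'->3' to the first AUG.
    pos = mrna.find("AUG")
    if pos < 0:
        return []
    chain: List[str] = []
    # Elongation: read one codon at a time and add its amino acid.
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        if codon in STOPS:
            return chain        # Termination: release the polypeptide
        chain.append(CODE[codon])
        pos += 3                # move to the next codon
    return chain                # reached the 3' end without a stop codon


print(ribosome("CCAUGGCUUUUGGCUAGCC"))  # ['Met', 'Ala', 'Phe', 'Gly']
```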
Animation
Translation of RNA into proteins
Proteins Are Sequences of Amino Acids
Proteins are constructed through peptide bonds
Proteins are folded into complex conformations
Proteins perform functions by binding:
Transcription factors and polymerase bind to DNA
Enzymes bind to molecules to accelerate their reactions
Globins bind to oxygen to transport it
Antibodies bind to pathogens
Example: Hemoglobin
Sickle-Cell Anemia: A Single Nucleotide Change
Sickle structure
Codon 6 in β-globin
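In sickle-cell anemia, codon 6 of β-globin changes from GAG to GUG (GTG in the DNA). This single-nucleotide change replaces glutamate with valine. The slide shows this only as a figure, so this Python sketch hard-codes that known change and looks both codons up in a small excerpt of the genetic code. The helper name `substitute` is illustrative.

```python
# Excerpt of the genetic code (three-letter names as in the slide's table).
CODE = {"GAA": "Glu", "GAG": "Glu", "GUA": "Val", "GUG": "Val"}


def substitute(codon: str, pos: int, base: str) -> str:
    """Return the codon with the base at index pos (0-2) replaced."""
    return codon[:pos] + base + codon[pos + 1:]


normal = "GAG"                        # codon 6 of normal beta-globin mRNA
sickle = substitute(normal, 1, "U")   # single A -> U change, middle base
print(normal, CODE[normal], "->", sickle, CODE[sickle])  # GAG Glu -> GUG Val
```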
Evolution of β-Globin
(The α-globin cluster is located on chromosome 16)
The Evolution of α-Globin Across Species
Protein Structures
Protein Structure Is Of Central Importance
Structure is found through complex experiments: X-ray crystallography (diffraction); NMR
The holy grail: compute structure from sequence
Ab initio: compute structure directly from sequence
Homology techniques: use similarity to known proteins
Structure is conserved across wide sequence variations
Small number of fold families, built from a few elements (α-helix, β-sheet…)
There are rules (e.g., hydrophobic AA are packed inside)
Nature folds proteins very fast
So why is it so difficult to predict structure?
SwissProt vs. PDB Statistics
PDB ~30k structures
Proteins Interact Via Active Sites
Protein interactions are defined by active sites
E.g., antibody with pathogen
E.g., drug design
Proteins use geometry: ligands latch into holes
Proteins use physics: electric fields
How can protein-protein interactions be computed?
Sample Bioinformatics Challenges
Analyzing protein sequence similarity: evolutionary conservation/changes
Computing structure from sequences
Analyzing structure homologies
Analyzing protein-protein interactions
Inferring function from structure
The Cell Cycle
Cells Operate In Cycles
G0 phase: the cell is at rest
G1 phase (4 hrs): the cell either progresses into synthesis or leaves the cell cycle to differentiate
S phase (10 hrs): DNA synthesis; a checkpoint determines the integrity of the DNA
G2 phase (4 hrs): the cell prepares for mitosis; a checkpoint determines the integrity of the DNA; the DNA is repaired or the cell dies (apoptosis)
Mitosis (2 hrs): chromosomes are separated; the cell divides
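The phase sequence and checkpoints can be sketched as a small state walk using the durations from the slide. This Python sketch rests on these assumptions:

- `dna_intact` is a hypothetical caller-supplied checkpoint test;
- a failed check goes straight to apoptosis, with DNA repair not modeled;
- the G0 rest state and the G1 exit to differentiation are omitted.

```python
from typing import Callable, List, Tuple

# Phase order and approximate durations (hours) from the slide.
PHASES = [("G1", 4), ("S", 10), ("G2", 4), ("M", 2)]
CHECKPOINTS = {"S", "G2"}  # phases whose end checks DNA integrity


def run_cycle(dna_intact: Callable[[str], bool]) -> Tuple[List[str], int]:
    """Walk one cycle; return the phases visited and elapsed hours.

    dna_intact(phase) is a hypothetical checkpoint test; a failed check
    sends the cell to apoptosis (DNA repair is not modeled).
    """
    visited: List[str] = []
    hours = 0
    for phase, duration in PHASES:
        visited.append(phase)
        hours += duration
        if phase in CHECKPOINTS and not dna_intact(phase):
            visited.append("apoptosis")
            break
    return visited, hours


print(run_cycle(lambda phase: True))           # (['G1', 'S', 'G2', 'M'], 20)
print(run_cycle(lambda phase: phase != "G2"))  # (['G1', 'S', 'G2', 'apoptosis'], 18)
```

With all checks passing, one full cycle takes 4 + 10 + 4 + 2 = 20 hours.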
The Cell Cycle Is Regulated
Transition among phases is controlled by a regulatory network
Checkpoints are used to assure quality
Evolution
Optimizing Functionality
DNA is substantially conserved through evolution
Evolution = mutation + selection
Mutation = single nucleotide polymorphism (SNP); duplication of entire DNA segments; mating; recombination
Selection = optimize fitness of species
Examples: metabolic nets learn to optimize their energy budget (Alon 05)
Functional similarity Sequence similarity
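As a first, crude measure of sequence similarity, two sequences can be compared position by position. This lists single-nucleotide differences (SNP-like changes) and the percent identity. The Python sketch assumes the sequences are already aligned to equal length with no insertions or deletions. Real comparisons use gapped alignment (e.g., Needleman-Wunsch or Smith-Waterman). The example sequences are made up.

```python
from typing import List, Tuple


def point_differences(a: str, b: str) -> List[Tuple[int, str, str]]:
    """(position, base in a, base in b) wherever two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Share of aligned positions with the same base, as a percentage."""
    return 100.0 * (len(a) - len(point_differences(a, b))) / len(a)


a = "ACGTTGCA"
b = "ACGATGCA"
print(point_differences(a, b))  # [(3, 'T', 'A')]
print(percent_identity(a, b))   # 87.5
```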