Post on 16-Jan-2016
Genes: Regulation and Structure
Many slides from various sources, including S. Batzoglou,
Cells respond to environment
Heat
Food supply
Responds to environmental conditions
Various external messages
Genome is fixed – Cells are dynamic
• A genome is static
Every cell in our body has a copy of the same genome
• A cell is dynamic
Responds to external conditions
Most cells follow a cell cycle of division
• Cells differentiate during development
Gene regulation
• Gene regulation is responsible for the dynamic behavior of the cell
• Gene expression varies according to:
Cell type
Cell cycle
External conditions
Location
Where gene regulation takes place
• Opening of chromatin
• Transcription
• Translation
• Protein stability
• Protein modifications
Transcriptional Regulation
• Strongest regulation happens during transcription
• Best place to regulate: No energy wasted making intermediate products
• However, slowest response time
After a receptor notices a change:
1. Cascade message to nucleus
2. Open chromatin & bind transcription factors
3. Recruit RNA polymerase and transcribe
4. Splice mRNA and send to cytoplasm
5. Translate into protein
Transcription Factors Binding to DNA
Transcription regulation:
Certain transcription factors bind DNA
Binding recognizes DNA substrings:
Regulatory motifs
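A minimal sketch of what "binding recognizes DNA substrings" means computationally, assuming a motif is an exact string (real regulatory motifs are usually degenerate and are better modelled with weight matrices). The function name `find_motif` and the sequence `upstream` are made up for illustration.

```python
def find_motif(sequence, motif):
    """Return 0-based start positions of every exact (possibly overlapping) match."""
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical upstream region, not taken from any real promoter.
upstream = "GGCCAATCGTATAAAGGCCAATTC"
hits = {motif: find_motif(upstream, motif) for motif in ("CCAAT", "TATA")}
print(hits)  # {'CCAAT': [2, 17], 'TATA': [9]}
```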
Promoter and Enhancers
• Promoter necessary to start transcription
• Enhancers can affect transcription from afar
Regulation of Genes
[Figure, three build steps. Labels: DNA, regulatory element, gene, transcription factor (protein), RNA polymerase (protein); the final step adds a new protein.]
Example: A Human heat shock protein
• TATA box: positioning transcription start
• TATA, CCAAT: constitutive transcription
• GRE: glucocorticoid response
• MRE: metal response
• HSE: heat shock element
[Figure: promoter of heat shock hsp70, positions -158 to 0, upstream of the gene. Element labels as extracted: TATA, SP1, CCAAT, AP2, HSE, AP2, CCAAT, SP1]
Gene expression
DNA: CCTGAGCCAACTATTGATGAA
transcription
RNA: CCUGAGCCAACUAUUGAUGAA
translation
Protein: PEPTIDE
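The DNA, RNA, and protein strings on this slide can be reproduced with a short sketch. It assumes the DNA shown is the coding (sense) strand, so transcription reduces to replacing T with U, and it uses the standard genetic code. The helper names `transcribe` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (assumption: input is the sense strand)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first base, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```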
The Genetic Code
Eukaryotes vs Prokaryotes
• Eukaryotic cells are characterized by membrane-bound compartments, which are absent in prokaryotes.
• “Typical” human & bacterial cells drawn to scale.
BIOS Scientific Publishers Ltd, 1999
Brown Fig 2.1
Prokaryotic genes – searching for ORFs.
- Small genomes have high gene density
Haemophilus influenzae – 85% genic
- No introns
- Operons
One transcript, many genes
- Open reading frame (ORF): a contiguous set of codons that starts with a Met codon (ATG) and ends with a stop codon.
Example of ORFs.
There are six possible reading frames in each sequence: three on each strand, covering both directions of transcription.
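A minimal sketch of the six-frame ORF search described above, assuming the simplest definition: an ORF runs from the first ATG in a frame to the next in-frame stop codon. Real prokaryotic gene finders also apply length thresholds and statistical models. All function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(dna, frame):
    """Yield (start, end) of ATG...stop ORFs in one forward frame; end is exclusive."""
    start = None
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            yield (start, i + 3)
            start = None


def six_frame_orfs(dna):
    """Return (strand, frame, orf_sequence) for all three frames on both strands."""
    dna = dna.upper()
    found = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame):
                found.append((strand, frame, seq[start:end]))
    return found


print(six_frame_orfs("CCATGAAATAGGG"))  # [('+', 2, 'ATGAAATAG')]
```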
Gene structure
[Figure: gene with exon1, intron1, exon2, intron2, exon3; steps labelled transcription, splicing, translation]
exon = protein-coding
intron = non-coding
Codon: a triplet of nucleotides that is converted to one amino acid
Finding genes
Start codon: ATG
Stop codon: TAG / TGA / TAA
[Figure: 5’ to 3’ gene layout: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, with splice sites marked. Example sequences shown: atg (start), tga (stop), ggtgag, caggtg, cagatg, cagttg, caggccggtgag]
0. We can sequence the mRNA
• Expressed Sequence Tag (EST) sequencing is expensive
• It has a nonzero false-positive rate (aberrant splicing)
• The method sequences all RNAs, not just those that code for proteins
• This is difficult for rare genes (those that are expressed rarely or in low quantities)
• Still this is an invaluable source of information (when available)
Biology of Splicing
(http://genes.mit.edu/chris/)
1. Consensus splice sites
(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)
Donor: 7.9 bits
Acceptor: 9.4 bits
(Stephens & Schneider, 1996)
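The "bits" figures measure how much information an aligned set of sites carries. A minimal sketch of the per-position calculation, assuming the uncorrected form (2 bits minus the Shannon entropy of the base frequencies, summed over positions) and omitting the small-sample correction used in sequence-logo work. The example columns are made up and are not the data behind the numbers on the slide.

```python
import math


def column_information(counts):
    """Information in bits at one aligned position: 2 - H(base frequencies)."""
    total = sum(counts.values())
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in counts.values() if n > 0
    )
    return 2.0 - entropy


def site_information(columns):
    """Total information of a site: sum over its positions."""
    return sum(column_information(column) for column in columns)


# Hypothetical alignment columns (base -> count), for illustration only.
invariant = {"A": 0, "C": 0, "G": 100, "T": 0}   # fully conserved: 2 bits
uniform = {"A": 25, "C": 25, "G": 25, "T": 25}   # no preference: 0 bits
print(site_information([invariant, uniform]))     # 2.0
```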
2. Recognize “coding bias”
• Each exon can be in one of three frames
ag - gattacagattacagattaca - gtaag Frame 0
ag - gattacagattacagattaca - gtaag Frame 1
ag - gattacagattacagattaca - gtaag Frame 2
Frame of the next exon depends on how many nucleotides are left over from the previous exon
• Codons “tag”, “tga”, and “taa” are STOP
No STOP codon appears in-frame until the end of the gene
A stretch with no in-frame STOP is called an open reading frame (ORF)
• Different codons appear with different frequencies: coding bias
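The frame bookkeeping can be sketched in a few lines. This assumes one particular convention, which the slide does not spell out: an exon's frame is the number of coding nucleotides preceding it, modulo 3. The function name `exon_frames` and the exon lengths are hypothetical.

```python
def exon_frames(exon_lengths, first_frame=0):
    """Frame of each exon, given the lengths of the coding exons in order.

    Assumed convention: frame = (coding nucleotides before this exon) mod 3.
    """
    frames = []
    frame = first_frame
    for length in exon_lengths:
        frames.append(frame)
        # Nucleotides left over from this exon shift the next exon's frame.
        frame = (frame + length) % 3
    return frames


print(exon_frames([100, 50, 31]))  # [0, 1, 0]
```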
2. Recognize “coding bias”
Amino Acid      SLC   DNA codons
Isoleucine      I     ATT, ATC, ATA
Leucine         L     CTT, CTC, CTA, CTG, TTA, TTG
Valine          V     GTT, GTC, GTA, GTG
Phenylalanine   F     TTT, TTC
Methionine      M     ATG
Cysteine        C     TGT, TGC
Alanine         A     GCT, GCC, GCA, GCG
Glycine         G     GGT, GGC, GGA, GGG
Proline         P     CCT, CCC, CCA, CCG
Threonine       T     ACT, ACC, ACA, ACG
Serine          S     TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine        Y     TAT, TAC
Tryptophan      W     TGG
Glutamine       Q     CAA, CAG
Asparagine      N     AAT, AAC
Histidine       H     CAT, CAC
Glutamic acid   E     GAA, GAG
Aspartic acid   D     GAT, GAC
Lysine          K     AAA, AAG
Arginine        R     CGT, CGC, CGA, CGG, AGA, AGG
Stop codons     Stop  TAA, TAG, TGA
Can map 61 non-stop codons to frequencies & take log-odds ratios
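A minimal sketch of the log-odds idea: estimate the frequencies of the 61 non-stop codons from known coding sequence, then score a candidate region by summing log2(coding frequency / background frequency) over its codons. The pseudocount, the uniform background, and the tiny training sequence are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(c) for c in product("ACGT", repeat=3) if "".join(c) not in STOPS
]


def codon_frequencies(sequences, pseudocount=1.0):
    """Estimate frequencies of the 61 sense codons from in-frame sequences."""
    counts = Counter({codon: pseudocount for codon in SENSE_CODONS})
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # skips stop codons and ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def log_odds_score(seq, coding_freq, background_freq):
    """Sum of per-codon log2 odds; positive values favour 'coding'."""
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in coding_freq:
            score += math.log2(coding_freq[codon] / background_freq[codon])
    return score


# Toy training data (hypothetical) and a uniform background model.
coding = codon_frequencies(["GCCGCCGCCCTG"])
background = {codon: 1 / 61 for codon in SENSE_CODONS}
print(log_odds_score("GCCGCC", coding, background) > 0)  # True: favoured codon
print(log_odds_score("GCAGCA", coding, background) < 0)  # True: disfavoured codon
```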
3. Genes are “conserved”
Approaches to gene finding
• Homology: Procrustes
• Ab initio: Genscan, Genie, GeneID
• Comparative: TBLASTX, Rosetta
• Hybrids: GenomeScan, GenieEST, Twinscan, SLAM…
HMMs for single species gene finding: Generalized HMMs
HMMs for gene finding
GTCAGAGTAGCAAAGTAGACACTCCAGTAACGC
[Figure labels along the sequence: intergene, exon, intron, exon, intron, exon, intergene]
GHMM for gene finding
[Figure: a nucleotide sequence divided into Exon1, Exon2, Exon3, with the duration (length) of each state marked]
Observed duration times
Better way to do it: negative binomial
• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3
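A sketch of why the negative binomial helps. A single HMM state with a self-loop has a geometric duration, whose most likely length is always 1. Chaining n identical states, each left with probability p per step, gives a negative binomial duration with a peak away from 1. The mean of 300 and the values of p below are arbitrary illustrative choices, not EasyGene's parameters.

```python
from math import comb


def duration_pmf(length, n, p):
    """P(total duration == length) for n chained states, each exited with prob. p.

    n = 1 is the geometric distribution of a single self-looping state.
    """
    if length < n:
        return 0.0
    return comb(length - 1, n - 1) * p**n * (1 - p) ** (length - n)


lengths = range(1, 20000)
# Both distributions are given the same mean duration n / p = 300.
geometric = [duration_pmf(length, 1, 1 / 300) for length in lengths]
neg_binomial = [duration_pmf(length, 3, 3 / 300) for length in lengths]

geometric_mode = geometric.index(max(geometric)) + 1
neg_binomial_mode = neg_binomial.index(max(neg_binomial)) + 1
print(geometric_mode)      # 1: the shortest duration is the most likely
print(neg_binomial_mode)   # about 200: the peak is at a realistic length
```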
Splice Site Models
• WMM: weight matrix model = PSSM (Staden 1984)
• WAM: weight array model = 1st order Markov (Zhang & Marr 1993)
• MDD: maximal dependence decomposition (Burge & Karlin 1997), a decision-tree-like algorithm that takes significant pairwise dependencies into account
Splice site detection
Donor site (5’ to 3’)
Position  -8  …  -2  -1   0   1   2  …  17
A         26  …  60   9   0   1  54  …  21
C         26  …  15   5   0   1   2  …  27
G         25  …  12  78  99   0  41  …  27
T         23  …  13   8   1  98   3  …  25
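A minimal weight-matrix (WMM) scoring sketch built only from the five columns visible in the table (positions -2 to 2), treating the entries as percentages. The elided columns are left out rather than guessed. The pseudocount and the uniform 25% background are assumptions, and the function names are illustrative.

```python
import math

# Columns for positions -2, -1, 0, 1, 2, copied from the table above.
POSITIONS = [-2, -1, 0, 1, 2]
DONOR_PERCENT = {
    "A": [60, 9, 0, 1, 54],
    "C": [15, 5, 0, 1, 2],
    "G": [12, 78, 99, 0, 41],
    "T": [13, 8, 1, 98, 3],
}


def wmm_score(window, pseudo=0.5, background=0.25):
    """Log2-odds score of a 5-base window under the donor weight matrix."""
    if len(window) != len(POSITIONS):
        raise ValueError("window must cover positions -2..2 (5 bases)")
    score = 0.0
    for i, base in enumerate(window.upper()):
        # The pseudocount (an assumption) keeps zero entries from giving log(0).
        p = (DONOR_PERCENT[base][i] + pseudo) / (100 + 4 * pseudo)
        score += math.log2(p / background)
    return score


def best_donor(seq):
    """Return (score, start) of the highest-scoring 5-base window."""
    return max((wmm_score(seq[i:i + 5]), i) for i in range(len(seq) - 4))


score, start = best_donor("CCAGGTAAGT")
print(start)  # 2, the window AGGTA
```

A weight array model (WAM) would replace each independent column with probabilities conditioned on the previous base, which cannot be reconstructed from the table alone.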