Protein DNA Interactions
Jörg Bungert
Department of Biochemistry and Molecular Biology
Phone: 352-273-8098
Email: [email protected]
Objectives
• Know the main factors that contribute to the specificity of protein-DNA interactions: Base Readout and Shape Readout.
• Know the major DNA binding motifs in proteins and how they interact with DNA: Helix-Turn-Helix, Zinc Finger, Leucine Zipper.
• Know the difference between DNA bend and DNA kink.
• Understand the consequence of minor groove narrowing.
• Know different methods for analyzing protein-DNA interactions: EMSA, ChIP, SELEX, Yeast One-Hybrid.
Reading: Lodish 7th edition, chapter 7 (pp. 305-315).
Figure labels: Sliding; Intersegment Transfer; Interdomain Association; Interdomain Dissociation; Operator; Repressor
How do sequence-specific DNA binding proteins find their target sites in the genome?
Early work by Art Riggs and others has shown that bacterial transcription factors associate
with their respective binding sites in solution much faster than predicted from
macromolecular interaction kinetics. Binding to the specific target involves hydrogen bonding
between the amino acid residues in the active site of the protein and base pairs in the minor
or major grooves of the DNA. Non-specific interactions are electrostatic. Von Hippel and
colleagues proposed that “the DNA cylinder can be viewed as an isopotential surface along
which the protein can diffuse in a one-dimensional random walk”. This sliding mechanism
differs from chemical energy-dependent translocation by, e.g., helicases and polymerases.
Spector, Annu. Rev. Biochem., 2003; von Hippel and Berg, JBC, 1989
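The one-dimensional random walk in the von Hippel quote can be illustrated with a short sketch. It is not part of the original lecture, and it assumes a simplified model in which the protein moves exactly one base pair left or right per step with equal probability (the function name and the step model are illustrative assumptions). Enumerating every possible path shows that the mean squared displacement equals the number of steps, so the stretch of DNA scanned grows only with the square root of the number of steps.

```python
from itertools import product


def mean_squared_displacement(n_steps: int) -> float:
    """Exact mean squared displacement of an unbiased 1D random walk.

    Assumption (illustrative): each step moves the protein one base pair
    left (-1) or right (+1) along the DNA with equal probability.
    All 2**n_steps paths are enumerated, so keep n_steps small.
    """
    total = 0
    n_paths = 0
    for path in product((-1, 1), repeat=n_steps):
        displacement = sum(path)  # net position after the walk, in bp
        total += displacement ** 2
        n_paths += 1
    return total / n_paths


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, mean_squared_displacement(n))
```

Because the enumeration is exhaustive, the result is exact rather than a stochastic estimate; for long walks a random simulation would be needed instead.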
Base Readout and Shape Readout
DNA binding proteins combine multiple readout mechanisms to achieve specificity. The topography of the human
genome (assayed by hydroxyl radical cleavage pattern) is evolutionarily constrained and a better predictor of
functional DNA elements than the linear DNA sequence. This suggests that DNA topography is an important contributor to the
specificity of protein-DNA interactions. Most of the more than 1500 solved protein/DNA structures reveal a form
of bound DNA that deviates from the ideal B-form configuration.
Sequence-Specific DNA interactions
Proteins recognize chemical and conformational signatures of base pairs. Amino acid (AA) residues can interact
with the edges of the base pairs in the major groove. These interactions involve hydrogen bonds, water-mediated
hydrogen bonds, or hydrophobic contacts. A:T versus T:A and C:G versus G:C are indistinguishable
in the minor groove. Shown below are rotational views of the dodecamer d(GACT)3 with hydrogen bond donors
and acceptors, thymidine methyl groups, and base carbon hydrogens as indicated. Note that bidentate hydrogen
bonds (two donors and two acceptors) provide more specificity than bifurcate hydrogen bonds (one donor and
two acceptors).
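The difference between major and minor groove readout can be made concrete with a small lookup table. The sketch below is not from the lecture: it uses the standard textbook assignment of functional groups along each base pair edge (A = acceptor, D = donor, M = methyl, H = nonpolar hydrogen), and the dictionary and function names are illustrative assumptions. It counts how many distinct patterns the four base pair orientations present in each groove.

```python
# Functional groups exposed along the edge of each base pair, read across
# the groove from the first-named base to its partner.
# A = hydrogen bond acceptor, D = hydrogen bond donor,
# M = thymine methyl group, H = nonpolar base hydrogen.
MAJOR_GROOVE = {
    "A:T": "ADAM",
    "T:A": "MADA",
    "G:C": "AADH",
    "C:G": "HDAA",
}
MINOR_GROOVE = {
    "A:T": "AHA",
    "T:A": "AHA",
    "G:C": "ADA",
    "C:G": "ADA",
}


def distinguishable_pairs(groove: dict) -> int:
    """Count how many distinct patterns the four base pairs present."""
    return len(set(groove.values()))


if __name__ == "__main__":
    print("major groove:", distinguishable_pairs(MAJOR_GROOVE))
    print("minor groove:", distinguishable_pairs(MINOR_GROOVE))
```

In this representation the major groove gives four distinct patterns, while the minor groove gives only two, because A:T and T:A (and likewise G:C and C:G) look the same there.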
Sequence-Specific Binding Proteins and DNA
1. Each BASE PAIR presents a unique constellation of sites for chemical interaction in the MAJOR GROOVE.
2. A RECOGNITION HELIX on the protein is positioned within the major groove.
3. The RECOGNITION HELIX participates in hydrogen bonds, van der Waals interactions, and hydrophobic interactions with the base pairs. This is the basis of SEQUENCE RECOGNITION.
4. The most common (>80%) motifs for positioning the recognition helix are the HELIX-TURN-HELIX, the ZINC FINGER, and the LEUCINE ZIPPER. There are others.
5. Formation of a stable DNA-protein complex is most often dependent upon additional non-covalent chemical interactions outside of the recognition helix-base pair contacts.
Helix-Turn-Helix
434 Repressor Protein
High-resolution structure by co-crystallization with synthetic oligonucleotides
Homodimer
INDUCED BEND in axis of DNA
Lodish Fig 7-28
Recognition Helix
Protein-Protein
Protein-DNA
The helix-turn-helix (HTH) motif is the most commonly used secondary structure for specific DNA recognition.
It was first characterized in prokaryotes. A recognition helix is positioned in the major groove and makes
base-specific contacts. A second helix stabilizes the recognition helix and is required for its proper positioning.
Eukaryotic homeodomain proteins are characterized by a three-helix bundle in which the 2nd and 3rd helices represent
the HTH. The winged helix-turn-helix (wHTH) motif has an additional β-sheet that interacts with the minor groove
to make additional DNA contacts.
Figure: palindromic recognition sequence, each strand read from its 5’ end (base pairs A-T, C-G, A-T … T-A, G-C, T-A)
Homodimeric DNA Binding Proteins
Homodimeric proteins specifically recognize PALINDROMIC DNA sequences. Improved specificity.
Binding energy is additive, so the interaction energy is twice that of a monomer.
Some helix-turn-helix proteins are monomeric and do not have palindromic recognition sequences.
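A palindromic DNA site is one whose sequence equals its own reverse complement, so both half-sites present the same sequence to the two subunits of a homodimer. The following minimal sketch checks this property; the function names and the example sequences are illustrative assumptions, not taken from the lecture.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same 5' to 3' on both strands."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    for site in ("ACATGT", "GAATTC", "ACAAGT"):
        print(site, reverse_complement(site), is_palindromic(site))
```

Note that a DNA palindrome is defined through the complementary strand, which is different from a text palindrome that reads the same backwards on a single strand.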
Chemical Bonds Between 434R Protein and DNA
Sequence-Specific
Recognition Helix
Gln28 H-bonds to A (N6 and N7)
Gln29 H-bonds to G (O6 and N7)
Gln29 and Thr27 van der Waals contacts with T (CH3)
Sequence-Independent
Outside Recognition Helix
Arg36 in Minor Groove
Electrostatic Interaction
H-bonds peptide backbone and DNA backbone
Branden and Tooze Fig 7.17
DNA -- TrpR Repressor Protein
Position of the Recognition Helix as a mechanism of gene regulation
-- TrpR controls many genes of trp synthesis
-- In the absence of trp, the TrpR protein is inactive
-- Trp binds to TrpR, altering the conformation of the Helix-Turn-Helix motif and allowing binding to DNA
Classical Zn-Finger Motif
Consensus Sequence:
…C-X5-C-X3-(F/Y)-X5-L-X2-3-H-X3-4-H …
Zinc Finger Proteins may
have more than one Zn
finger per protein.
Zn coordinated by cysteines
and histidines.
Conserved aromatic amino
acid and leucine form a
“strut” positioning the
recognition helix.
Compact 30 AA domain: α-helix plus two
antiparallel β-strands and a Zn2+ ion that is
coordinated by cysteines and histidines.
Recognition Helix
Lodish Fig. 3-9
strut
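A consensus such as the one given for the classical Zn finger can be turned into a regular expression to scan a protein sequence. The sketch below transcribes the spacing exactly as written on the slide, reading "X2-3" and "X3-4" as 2 to 3 and 3 to 4 arbitrary residues (an assumption about the notation). The function name and the test sequence are hypothetical and chosen only to illustrate the matching.

```python
import re

# Consensus as written on the slide:
#   C-X5-C-X3-(F/Y)-X5-L-X2-3-H-X3-4-H
# "X" is any amino acid; "X2-3" is read as 2 to 3 residues (assumption).
ZN_FINGER = re.compile(r"C.{5}C.{3}[FY].{5}L.{2,3}H.{3,4}H")


def find_zinc_fingers(protein: str) -> list:
    """Return (start index, matched segment) for each consensus match."""
    return [(m.start(), m.group()) for m in ZN_FINGER.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence built to fit the consensus, with alanine filler.
    finger = "C" + "A" * 5 + "C" + "A" * 3 + "F" + "A" * 5 + "L" + "AA" + "H" + "AAA" + "H"
    print(find_zinc_fingers("MK" + finger + "GS"))
```

A pattern match only flags a candidate; it does not show that the segment folds into a functional finger or binds DNA.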
Zif 268 Recognition Helix Interactions with DNA
        Helix Position
Zn #    -1    2     3     6     DNA
3       Arg   Asp   Glu   Arg   GCG
2       Arg   Asp   His   Thr   TGG
1       Arg   Asp   Glu   Arg   GCG
Zinc Finger Proteins
GL protein
Binds as a Monomer
Five C2H2 Zn fingers
but only four make
sequence-specific
contacts with DNA.
Binding of one helix facilitates the binding
of the next helix.
Lodish Fig 7-29
Transcription factor TFIIIA and Zn-Finger Encoding Genes
Generation of Zinc Finger Proteins with Altered
Binding Specificity
The way Zn-fingers bind to DNA is well understood. The DNA sequence
encoding the α-helix reading head can be changed to generate proteins
with altered binding specificity. These altered reading heads can be
assembled into zinc finger proteins that recognize a desired sequence.
An artificial zinc-finger protein with 6 zinc fingers theoretically binds to
a DNA sequence of 18 bp. Thus, such a protein is expected to bind to only one or a few
sequences in the human genome. Artificial Zn-finger DNA binding domains
can be linked to effector domains (transcription activation domain,
transcription repression domain) and expressed in cells to alter expression
of specific genes.
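The claim that an 18 bp site is close to unique in the human genome can be checked with a back-of-the-envelope calculation. The sketch below assumes 3 bp per finger (18 bp for 6 fingers), a genome of roughly 3.1 billion bp, and a random sequence with equal base frequencies; the genome size, the random-sequence model, and all names are illustrative assumptions rather than values from the lecture.

```python
BP_PER_FINGER = 3        # 6 fingers recognize 18 bp, i.e. 3 bp per finger
GENOME_SIZE_BP = 3.1e9   # assumption: approximate haploid human genome size


def expected_random_matches(n_fingers: int,
                            genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Expected number of chance occurrences of one specific site.

    Assumes a random genome with equal base frequencies; real genomes
    are not random, so this is only an order-of-magnitude estimate.
    """
    site_length = n_fingers * BP_PER_FINGER
    possible_sites = 4 ** site_length
    # Factor 2: the site can occur on either strand.
    return 2 * genome_size_bp / possible_sites


if __name__ == "__main__":
    for fingers in (3, 4, 6):
        print(fingers, "fingers:", expected_random_matches(fingers))
```

Under these assumptions a 6-finger protein is expected to find fewer than one chance match, whereas a 3-finger protein (9 bp) would find thousands.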
Leucine Zipper Proteins
Homodimeric or Heterodimeric
Recognition Helices:
-- extend from zipper helices
-- interact with major grooves
-- pass on either side of the DNA helical axis
Leucine Zipper Proteins
Homo- or Heterodimeric DNA Binding Proteins
“Leucine Zipper”
-- series of about 7 Leu
-- along helical faces of each protein
-- hydrophobic (van der Waals) interactions result in dimerization
RECOGNITION HELIX
Coiled-Coil
Lodish Fig 7-29
Lodish Fig. 4-5
Most DNA binding proteins use an α-helix that
recognizes bps in the major groove. Some proteins,
like TBP, use β-sheets that face into the minor groove.
This mode of interaction often deforms the DNA,
leading to sharp bends.
Global Shape Readout: A-B-Z-DNA
Local Shape Readout: Bend, Kink, Narrow Minor Groove
DNA kinks are characterized by a complete or partial loss of stacking at a single base pair step.
For example, TpA has the weakest stacking interaction and is the most flexible of the 10 unique
dinucleotide steps (“hinge”). Proteins can stabilize the kink by intercalation of hydrophobic
side chains. The width of the minor groove depends on the roll (relative rotation between adjacent
base pairs with respect to the base pairing axis), helix twist (relative rotation between adjacent
base pairs with respect to the helix axis), or propeller twist. ApT base pair steps have negative
roll angles and compress the minor groove. Compression of the minor groove increases the negative
electrostatic potential, which often recruits arginine residues.
Hox (homeodomain) binding specificity mediated by local shape recognition
Hox DNA-binding specificity mediated by local shape recognition. All panels show either the fkh250-binding site or the fkh250con-binding site. fkh250, but not fkh250con, has two minor groove minima, which creates a more negative electrostatic potential (minus signs). The capital letter W refers to the Hox YPWM motif, which makes a direct contact with the cofactor Exd. (a) In the absence of Exd, Scr does not bind with high affinity to fkh250 because basic side chains (small bars), in particular, arginines, on the N-terminal arm and linker of Scr are not positioned correctly. (b) Other Hox proteins do not bind well to fkh250 even in the presence of Exd because their N-terminal arms and linker regions have different sequences. (c) The Scr-Exd heterodimer binds well to fkh250 because the Scr N-terminal arm and linker region have the correct residues, and Exd positions them correctly by binding the YPWM motif (W). (d) Other Hox-Exd heterodimers bind well to fkh250con. This binding site is not as selective because it has a less negative electrostatic potential. Thus, the sequences of the Hox N-terminal arms and linker regions are not as important for binding.
Electrophoretic Mobility Shift Assay (EMSA)
Is a protein present that binds a DNA sequence?
Does a specific known protein bind a sequence?
An in vitro method.
Autoradiogram of EMSA Acrylamide Gel
Lane 1: End-labeled oligonucleotide probe
Lane 2: Probe + Nuclear Extract (“Band Shift”)
Lane 3: Probe + Nuclear Extract + Antibody (“Supershift”)
Mobility: free probe (HIGHEST MOBILITY), band shift, supershift (LEAST MOBILITY)
EMSA Gel Autoradiogram
Purified DNA Binding Domain of the Glucocorticoid Receptor (GR)
Probe oligonucleotide contains a GR binding site from the Pal gene
Band shift
Probe
Meijsing et al (2009) Science 324, 407 Suppl.
Chromatin Immunoprecipitation Assay (ChIP)
Does a known protein bind to a specific DNA element in the cell?
An in vivo approach.
Formaldehyde-treated cells
Mechanically shear DNA (Sonifier)
Add antibody specific for a DNA Binding Protein
Precipitate antibody-bound complexes (Protein A sepharose)
PCR using gene-specific oligonucleotide primers
PCR products detected by agarose gel electrophoresis
Adapted from Lodish Fig 7-37
Methods for studying DNA-protein interactions
B. Protein-binding microarray (PBM). All possible 10-base-long sequences are included on the
array. Primer-directed DNA synthesis creates dsDNA. After binding of the protein, fluorescently
labeled antibodies are used for detection.
C. High-Throughput (HT)-SELEX. DNA molecules from a random library are exposed to immobilized
transcription factor (TF), unbound DNA is washed off, and bound DNA is subjected to sequencing.
D. Bacterial one-hybrid (B1H) system. The TF is fused to a subunit of RNA polymerase. A sequence
from a randomized library is inserted upstream of the promoter of the HIS3 gene. When the TF binds
to the randomized sequence, the HIS3 promoter becomes more active, which increases the growth rate.
Stormo and Zhao, Nat. Rev. Genetics, 2010,
Affinity: Binding of protein to DNA is a bimolecular process governed by two rates:
the on-rate (formation of the complex) and the off-rate (dissociation).
TF + S ⇌ TF·S
The association constant (KA) is the concentration of the complex (TF·S) divided by the
product of the concentrations of free protein (TF) and free sequence (S). The dissociation
constant (Kd) is its inverse (equivalently, the off-rate constant divided by the on-rate constant):
Kd = [TF][S] / [TF·S]
The sequence binding probability (P(S bound)) is a reflection of Kd and TF concentration:
P(S bound) = [TF] / ([TF] + Kd)
The graph on the bottom depicts the binding probabilities of many different sequences
dependent on TF concentration.
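The relationship between Kd, TF concentration, and binding probability can be explored numerically. The sketch below implements the binding probability as the TF concentration divided by the sum of TF concentration and Kd; the function name and the example Kd values (in nM) are hypothetical and chosen only for illustration.

```python
def binding_probability(tf_conc: float, kd: float) -> float:
    """Probability that a site is bound: [TF] / ([TF] + Kd).

    tf_conc and kd must be in the same concentration units.
    """
    if tf_conc < 0 or kd <= 0:
        raise ValueError("tf_conc must be >= 0 and kd must be > 0")
    return tf_conc / (tf_conc + kd)


if __name__ == "__main__":
    tf_conc_nm = 10.0
    # Hypothetical Kd values (nM) for a strong, medium, and weak site.
    for kd_nm in (1.0, 10.0, 100.0):
        print(kd_nm, binding_probability(tf_conc_nm, kd_nm))
```

A site is half occupied when the TF concentration equals its Kd, so raising the TF concentration progressively fills weaker sites.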
Affinity and Consensus Sequence
Position Weight Matrix (PWM): a score is assigned
to each possible base at each position. The sum
of the elements that correspond to a specific
sequence gives a total score for this sequence. The
“logo” provides a convenient graphical representation
of the PWM.
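Scoring a sequence with a PWM amounts to looking up one value per position and summing. The sketch below uses a made-up 4-position matrix (all scores are hypothetical, not measured values) to show the lookup, the sum, and how the highest-scoring sequence is read off the matrix; the function names are illustrative assumptions.

```python
# Hypothetical PWM for a 4 bp motif: one dictionary of base scores per position.
PWM = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -0.5, "T": -1.0},
    {"A": -0.5, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -0.5, "G": -1.0, "T": 1.5},
]


def score_sequence(seq: str, pwm: list) -> float:
    """Sum the matrix elements that correspond to the given sequence."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal the PWM length")
    return sum(column[base] for base, column in zip(seq.upper(), pwm))


def best_sequence(pwm: list) -> str:
    """Highest-scoring base at each position."""
    return "".join(max(column, key=column.get) for column in pwm)


if __name__ == "__main__":
    for site in ("ACGT", "TCGT", "AAAA"):
        print(site, score_sequence(site, PWM))
    print("best:", best_sequence(PWM))
```

Because the score is a simple sum, this model treats positions as independent, which is the standard simplifying assumption of a PWM.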
Distinct dimensions of conservation to consider. There are many ways to assess the conservation of an
observed binding event. In this cartoon example, we compare a binding event in humans (shown in green) with
the orthologous region in mice. (a) The sequence under the binding event can be conserved in mice. (b) If in
vivo binding data are available in mice, we can ascertain (i) when a binding event is species-specific or (ii)
when it is observed in both species. (c) By assigning binding events to nearby genes, we also observe cases of
turnover, where the target of a binding event is maintained. In this case, the loss of one binding event is
compensated by a nearby gain, conserving the gene as a target of the transcription factor.
Transcription factor–DNA interactions.
(a) A transcription factor recognizes a
DNA sequence motif, shown here as a
logo representation of a PSSM where the
height of each letter is proportional to the
score of the nucleotide at that position of
the motif. (b) The location of protein–DNA
interactions can be assayed by ChIP. The
magnitude of the ChIP enrichment signal
(shown in green) correlates broadly with
different levels of transcription factor
occupancy [30], [31], [57] and [58].
Given the ChIP data, one can assess the
relationship between occupancy and
sequence: (i) not all high-affinity
recognition motifs (shown as purple
boxes) are bound, (ii) some binding
events do not have a high-affinity motif
associated (but nearly all sequences
contain low-affinity sites) and (iii) some
binding events are at high-affinity motifs.
Transcription factor binding variation in the evolution of gene regulation
Dowell, Trends in Genetics, 2010