TF-DNA binding dependency A progress report March 17, 2010 Hugo Willy.

TF-DNA binding dependency: A progress report

March 17, 2010

Hugo Willy

Outline

• Re-Introduction of my problem

• Current state of affairs

• Known dependency factor 1 – Rotamer

• Known dependency factor 2 – Water

• Known dependency factor 3 – DNA flexibility

• Some thoughts on what to do next

Re-Introduction

• I am working on finding a dependency model of TF-DNA binding.

• What is TF-DNA binding?

– If you ask this, you may be in the wrong room.

• It is known that different TFs prefer to bind different DNA sequences.

• Classic example: the TATA-box binding protein binds the sequence “TATA”.

Re-Introduction (2)

• It is commonly assumed that each position in T-A-T-A contributes independently to the binding energy.

• That is to say, some guys from the TF bind the first “T”, some others bind the “A” in the second position, and so on.

• If the sequence becomes CATA, the outcome depends on how much the guys who bind the 1st position like the new “C”. If they are OK with it, the binding energy may change a little but the TF still binds.

• Otherwise, too bad.

Re-Introduction (3)

• One such model, a very popular one, is the PSSM (position-specific scoring matrix) model.

• It has been shown to be very good at estimating the real binding sites of many TFs.

• However, some were curious whether the model holds for all TFs.
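To make the independence assumption concrete, here is a minimal sketch of PSSM scoring. The aligned sites, the pseudocount and the uniform background are assumptions made up for illustration, not data from any of the papers discussed here. The point is only that the score of a site is a plain sum of per-position terms.

```python
import math

BASES = "ACGT"

# Hypothetical aligned binding sites, invented for illustration only.
SITES = ["TATA", "TATA", "TATA", "CATA", "TAAA", "TATT"]


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Return one dict per position mapping base -> log2-odds score."""
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        # Start from the pseudocount so unseen bases do not give log(0).
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pssm


def score(pssm, seq):
    """Independence assumption: the site score is a sum over positions."""
    return sum(column[base] for column, base in zip(pssm, seq))


pssm = build_pssm(SITES)
for candidate in ("TATA", "CATA", "GGGG"):
    print(candidate, round(score(pssm, candidate), 3))
```

Under this model, changing TATA to CATA changes the score by exactly the difference between the "T" and "C" entries of the first column, no matter what the other positions are. A dependency model is one where that statement no longer holds.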

Current state of affairs

• There are quite a few publications which try to show that there are measurable dependencies among the positions.

– RECOMB 2003-Modeling dependencies in Protein-DNA binding sites

• Multi PSSM, Tree, Multi Tree. Bayesian network based training.

– Bioinformatics 2004-Modeling within-motif dependence for transcription factor binding site predictions

• PSSM with pairwise correlated positions using Bayes Factor. Gibbs sampling based.

– BIBE 2006-Discovering DNA Motifs with Nucleotide Dependency

• PSSM with multi-positions, heuristic.

– Bioinformatics 2007-Position dependencies in transcription factor binding sites

• Checks dependencies within a set of aligned binding sites with different statistical measures.

Current state of affairs (2)

– Bioinformatics 2008-Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors

• Neural network based.

– PLoSCompBio 2008-A Feature-Based Approach to Modeling Protein-DNA Interactions

• Feature based. Currently only considers pairwise position dependency features.

– NAR 2010-On the detection and refinement of transcription factor binding sites using ChIP-Seq data

• Similar to Bioinformatics 2004.

Current state of affairs (3)

• However, they have a similar framework:

– Start with a set of “known” binding sequences

– Try to guess a model with and without dependencies

– Train the model using the dataset (possibly making gradual changes to the model during the training)

– Compare which model is better

– They will list the positions with dependencies: most are consecutive positions, but some are quite distant.
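As an illustration of how dependent positions can be flagged in a set of aligned sites, the sketch below computes the mutual information between every pair of columns. This is only one of several possible statistical measures and is not claimed to be the method of any of the papers above. The toy sites are invented so that positions 0 and 1 are perfectly coupled while position 2 is independent of both.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites."""
    n = len(sites)
    count_i = Counter(site[i] for site in sites)
    count_j = Counter(site[j] for site in sites)
    count_ij = Counter((site[i], site[j]) for site in sites)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b).
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


# Hypothetical aligned sites, invented for illustration only.
TOY_SITES = ["ATG", "ATC", "CGG", "CGC"]

for i, j in combinations(range(len(TOY_SITES[0])), 2):
    print((i, j), mutual_information(TOY_SITES, i, j))
```

With real data the sample is small, so raw mutual information is inflated by chance. Some significance test or correction would be needed before calling a pair dependent.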


• Well, these are just fits of a model to a set of sequences known to bind. The binding energy is not really taken into account.

• So others, with more $$$ in their labs, did huge biological experiments to see whether the experimental binding energies of some TFs exhibit a dependency pattern.


• Hence some more papers:

– NAR 2002-Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors

– NAR 2002-Additivity in protein-DNA interactions-how good an approximation is it?

– Nature Biotechnology 2006-Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities

– Science 2009-Diversity and Complexity in DNA Recognition by Transcription Factors

– PLoSCompBio 2009-Inferring Binding Energies from Selected Binding Sites
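With measured binding energies, one simple way to test for dependency is to compare a double mutant against the sum of the two single mutants. The sketch below assumes we have the change in binding free energy (ΔΔG, relative to the wild-type site) for each variant. The function name and the numbers are hypothetical and are not taken from the papers listed above.

```python
def non_additivity(ddg_single_a, ddg_single_b, ddg_double):
    """Deviation of the double mutant from the additive prediction.

    All arguments are binding free energy changes relative to the wild
    type, in the same unit. A value near zero means the two positions
    behave independently; a large value suggests a dependency.
    """
    return ddg_double - (ddg_single_a + ddg_single_b)


# Hypothetical values, invented for illustration only.
additive_case = non_additivity(1.0, 0.5, 1.5)
dependent_case = non_additivity(1.0, 0.5, 2.4)
print(additive_case, dependent_case)
```

In practice the deviation has to be compared against the experimental error of the measurements before it can be read as a real dependency.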


From Science 2009, Protein binding microarray experiment.


• Yet, none of the publications I have read so far gives concrete evidence on HOW such dependencies could arise.

• We are now trying to find out what happens at the physical level when two positions in the DNA are dependent.

Known dependency factor 1 – Rotamer

• Recently there was an experiment involving the zinc finger TF Zif268, which has been one of the most popular zinc finger modeling targets.


• They tried changing the wild-type DNA sequence GCG to ACG, CCG, AAG, and CAG.

• We want to see whether a program that changes the side chains of the TF to conform to the new DNA sequence can approximate the change in binding energy.

• We tried FoldX. It does rotamer checks, but I am not sure whether its search is optimal.

Total energy  Backbone Hbond  Sidechain Hbond  Van der Waals  Electrostatics  Solvation Polar  Solvation Hydrophobic
0             0               0                0              0               0                0
4.23          -0.36           5.01             2.08           2.25            -5.13            0.95
4.28          0               4.37             0.06           1               -1.23            -0.17
-0.02         -0.01           1.96             0.87           0.29            -3.1             -0.1
4             -0.35           6.81             3.14           2.38            -8.67            1.17
4.39          0               5.58             1.28           1.55            -4.15            -0.13

FoldX results


• However, the rotamers that FoldX predicts do not coincide with the diagrams.

• Either FoldX is not optimal, or the homology modeling done in the paper is not accurate.

• But given the close agreement between the predicted and experimental differences in binding affinity, most probably the paper's models are the (more) correct ones.

• I am still checking on that.

Known dependency factor 2 – Water

• What the NAR paper computes explicitly are the solvation penalties (the circles, rectangles and triangles in the diagram).

• They claim that water-mediated H-bonds are not that crucial.

• We can see that FoldX does compute hydration to a certain extent. Yet its rotamer search may not be good enough.

Different solvation states of polar atoms

Known dependency factor 3 – DNA flexibility

• DNA is not a rigid rod.





• G-C will have a higher roll angle, making it less stable (weaker stacking energy) and easier to “open”.

• Several works show that different dinucleotide steps have different bending and twisting energies.
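Because flexibility is a property of dinucleotide steps rather than single bases, any energy term built on it depends on pairs of adjacent positions by construction. The sketch below sums a per-step cost over a sequence. The cost table holds arbitrary placeholder numbers, NOT measured values, and the function names are hypothetical. A step and its reverse complement are treated as the same step, which leaves ten unique steps.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Arbitrary placeholder costs per unique dinucleotide step.
# These are NOT measured values; real numbers would come from
# structural or simulation studies.
STEP_COST = {
    "AA": 1.0, "AC": 1.1, "AG": 1.2, "AT": 1.3, "CA": 1.4,
    "CC": 1.5, "CG": 1.6, "GA": 1.7, "GC": 1.8, "TA": 1.9,
}


def canonical_step(step):
    """Map a step and its reverse complement to one shared key."""
    reverse_complement = "".join(COMPLEMENT[base] for base in reversed(step))
    return min(step, reverse_complement)


def deformation_energy(seq):
    """Sum the placeholder cost over all adjacent dinucleotide steps."""
    return sum(
        STEP_COST[canonical_step(seq[k:k + 2])] for k in range(len(seq) - 1)
    )


print(deformation_energy("TATA"), deformation_energy("CATA"))
```

Changing the first base of TATA to C changes only the first step here, but a change at an internal position would alter two steps at once, which a position-independent model cannot express.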


• The TATA-binding protein actually binds TATA not because that sequence gives the best binding energy.

• Its bindings are mostly non-specific.


Conclusion

• Up to now, these 3 factors are the known/most probable sources of position dependency in TF-DNA binding.

• The challenge would be to combine all of them into one scoring function that is simple enough to run on large datasets.

Thank you for bearing with me.

Q & A
