Clustering the Temporal Sequences of 3D Protein Structure
Mayumi Kamada +*, Sachi Kimura, Mikito Toda ‡, Masami Takata +, Kazuki Joe +
+: Graduate School of Humanities and Science, Information and Computer Sciences, Nara Women’s University
‡: Department of Physics, Nara Women’s University
Transcript
Slide 1
Clustering the Temporal Sequences of 3D Protein Structure
Mayumi Kamada +*, Sachi Kimura, Mikito Toda ‡, Masami Takata +, Kazuki Joe +
+: Graduate School of Humanities and Science, Information and Computer Sciences, Nara Women’s University
‡: Department of Physics, Nara Women’s University
Slide 2
Outline
Motivation
Flexibility Docking
Feature Extraction using Motion Analysis
Conclusions and Future Work
Slide 3
Motivation
Proteins are biological molecules.
Docking: a protein transforms itself and combines with other materials.
Prediction of docking leads to prediction of the resultant functions.
Slide 4
Existing Docking Simulation
Docking simulation takes structure A and structure B from the PDB (Protein Data Bank) and outputs predicted docked structures.
PDB structures are rigid, whereas proteins fluctuate in living cells, so prediction accuracy is low.
Needed: docking simulation that considers fluctuations.
Slide 5
Flexibility Docking
Structure A and structure B from the PDB go through flexibility handling and then docking simulation, which outputs predicted docked structures.
Flexibility handling: considers the structural fluctuation of proteins in living cells by extracting fluctuated structures.
Slide 6
Flexibility Handling
Flexibility handling consists of MD followed by a filter: MD produces many output files, and the filter reduces them to representative structures.
Molecular dynamics simulation (MD): simulation of the motion of molecules in a polyatomic system.
Filtering: selection of representative structures from similar structures.
The filters are created by using RMSD.
Slide 7
Filters using RMSD
RMSD (Root Mean Square Deviation): compares the similarity of two structures.
Two filtering algorithms are proposed: the maximum RMSD selection filter and the below RMSD 1 deletion filter.
Result: useful for the heat fluctuation condition.
Limitation: RMSD unifies topology information, which causes a loss of information.
Next step: feature extraction focusing on protein motion, not structure.
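As a rough illustration of the RMSD comparison named on this slide, here is a minimal sketch. It assumes the two structures are already superimposed and given as N x 3 coordinate arrays. The function names `rmsd` and `drop_similar`, and the greedy reading of the "below RMSD 1 deletion filter" as "drop any frame within the threshold of an already kept frame", are assumptions made for illustration; the slides do not spell out the filter.

```python
import numpy as np

def rmsd(a, b):
    """Root mean square deviation between two (N, 3) coordinate arrays.

    Assumes both structures are already superimposed (no alignment is done here).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    squared_dist = ((a - b) ** 2).sum(axis=1)  # squared distance per atom
    return float(np.sqrt(squared_dist.mean()))

def drop_similar(frames, threshold=1.0):
    """Hypothetical greedy filter: keep a frame only if its RMSD to every
    previously kept frame exceeds `threshold`. Returns the kept indices."""
    kept = []
    for i, frame in enumerate(frames):
        if all(rmsd(frame, frames[j]) > threshold for j in kept):
            kept.append(i)
    return kept

# Three toy "structures" of four atoms each.
base = np.zeros((4, 3))
frames = [base, base + [0.1, 0.0, 0.0], base + [3.0, 4.0, 0.0]]
kept = drop_similar(frames, threshold=1.0)
print(kept)
```

The second frame differs from the first by an RMSD of 0.1 and is dropped, while the third is far away and is kept as a separate representative.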
Slide 8
Capture Protein Motion
Pipeline: MD, then wavelet transform, then clustering.
Feature extraction: continuous wavelet transform with the Morlet wavelet, because the frequency may change momentarily.
Selection of representative motions: clustering with the Affinity Propagation algorithm.
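To make the wavelet step concrete, the following is a minimal NumPy sketch of a continuous wavelet transform with a Morlet wavelet. The function name `morlet_cwt`, the choice `omega0 = 6`, the truncation of the wavelet at four scale widths, and the approximate scale-to-frequency relation f ≈ omega0 / (2πs) are assumptions for this sketch, not details taken from the slides.

```python
import numpy as np

def morlet_cwt(signal, scales, dt=1.0, omega0=6.0):
    """Continuous wavelet transform with a Morlet wavelet.

    Returns a complex array of shape (len(scales), len(signal)).
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    out = np.empty((len(scales), n), dtype=complex)
    for row, s in enumerate(scales):
        half = int(np.ceil(4 * s / dt))          # truncate wavelet at +-4 scale widths
        t = np.arange(-half, half + 1) * dt
        psi = np.pi ** -0.25 * np.exp(1j * omega0 * t / s) * np.exp(-0.5 * (t / s) ** 2)
        kernel = np.conj(psi[::-1]) * dt / np.sqrt(s)
        full = np.convolve(signal, kernel, mode="full")
        out[row] = full[half:half + n]           # keep the part aligned with the signal
    return out

# Toy signal: a single sinusoid at 0.05 cycles per sample.
n = 512
time = np.arange(n)
signal = np.sin(2 * np.pi * 0.05 * time)

freqs = np.array([0.02, 0.05, 0.1])
scales = 6.0 / (2 * np.pi * freqs)               # approximate scale for each frequency
power = np.abs(morlet_cwt(signal, scales)) ** 2
peak = int(power[:, n // 2].argmax())
print(freqs[peak])
```

Because the transform is evaluated at every time index, the same call also shows how the dominant frequency drifts over time, which is the reason the slide gives for using wavelets.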
Slide 9
Target
Protein: 1TIB (residue length: 269)
MD simulation software: AMBER
Simulation run time: 2 nsec
Result data files: 200
Data used: space coordinates of Cα atoms
Slide 10
Singular Value Decomposition
SVD (singular value decomposition) definition: $A = U \Sigma V^{*}$
Unitary matrix U: left-singular vectors (spatial motion)
Unitary matrix V: right-singular vectors (frequency fluctuation)
Matrix A at time step i ($t_i$): one index runs over the coordinate components of the Cα atoms, the other over frequency
Matrix size of A: 807 × 199
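A minimal sketch of this decomposition with NumPy, using a random complex 807 x 199 matrix as a stand-in for the real data, which is not available here. `numpy.linalg.svd` with `full_matrices=False` returns the reduced form, so `U` has orthonormal columns rather than being a square unitary matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one time step: 807 coordinate components x 199 frequencies.
A = rng.standard_normal((807, 199)) + 1j * rng.standard_normal((807, 199))

# Reduced SVD: A = U @ diag(s) @ Vh, singular values sorted in descending order.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

first_left = U[:, 0]            # leading left singular vector (spatial side)
first_right = Vh[0].conj()      # leading right singular vector (frequency side)

reconstructed = (U * s) @ Vh    # same as U @ np.diag(s) @ Vh
print(U.shape, s.shape, Vh.shape)
```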
Slide 12
Verification of Reproducibility
Singular values and principal components
Panels: N = 1, 4, 6, 8 and M = 1, 4, 6, 8
Left singular vectors (spatial motion); right singular vectors (frequency fluctuation)
Slide 13
Reproducibility
Using the eight principal components, the motion expressed by 199 components can be reproduced.
The two almost match.
Slide 14
Examination
(1) Each of the singular values: the original motion can be expressed by six singular values.
(2) The first singular value: it accounts for more than about 30%.
The first singular value is useful.
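The two checks on this slide, how much each singular value contributes and how well a few components reproduce the original matrix, can be sketched as follows. The helper names are hypothetical, and defining the contribution as each singular value divided by the sum of all singular values is an assumption; the slides do not say whether plain or squared singular values were used.

```python
import numpy as np

def contribution_ratios(s):
    """Share of each singular value in the total (assumed definition)."""
    s = np.asarray(s, dtype=float)
    return s / s.sum()

def rank_k_approximation(U, s, Vh, k):
    """Rebuild the matrix from its k leading singular components."""
    return (U[:, :k] * s[:k]) @ Vh[:k]

def relative_error(A, approx):
    """Frobenius-norm error of the approximation relative to A."""
    return float(np.linalg.norm(A - approx) / np.linalg.norm(A))

# Toy matrix of exact rank 3, so three components reproduce it fully.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

ratios = contribution_ratios(s)
errors = [relative_error(A, rank_k_approximation(U, s, Vh, k)) for k in (1, 2, 3)]
print(ratios[:3], errors)
```

On real data the error curve shows how many components are needed before the reconstruction stops improving noticeably.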
Slide 15
Clustering Analysis
Focus on the first principal component.
Define similarities and preference.
Cluster by using the above values.
Slide 16
Similarities (1)
For left singular vectors: difference of spatial directions, measured by inner products.
Similarity: same direction / different direction
K ij : Value 10 C
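The inner-product comparison of directions can be sketched as below. The transcript does not preserve how the slide maps the inner product to the final similarity value, so this example only computes the normalized inner product itself. It uses real vectors for simplicity, and the function name `direction_similarity` is hypothetical.

```python
import numpy as np

def direction_similarity(vectors):
    """Pairwise inner products of the row vectors after unit normalization.

    +1 means same direction, -1 opposite direction, 0 orthogonal.
    """
    v = np.asarray(vectors, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    return unit @ unit.T

# Toy "first left singular vectors" from four time steps.
vectors = np.array([
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as the first
    [0.0, 3.0],   # orthogonal to the first
    [-1.0, 0.0],  # opposite to the first
])
K = direction_similarity(vectors)
print(K[0])
```

Singular vectors are only defined up to sign (or phase, for complex data), so a real analysis has to decide whether opposite directions count as similar.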
Slide 17
Similarities (2)
For right singular vectors: difference between distributions of spectrum, measured by the Hellinger distance.
Similarity:
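For reference, the Hellinger distance between two discrete distributions p and q is H(p, q) = sqrt(0.5 * Σ (sqrt(p_k) - sqrt(q_k))²), which ranges from 0 (identical) to 1 (no overlap). The sketch below assumes each spectrum is turned into a distribution by normalizing the squared magnitudes of a right singular vector, and that the similarity is the negative distance. Both choices, and the function names, are assumptions, since the slide's formula is not preserved in the transcript.

```python
import numpy as np

def spectrum_distribution(vector):
    """Normalize squared magnitudes so they sum to 1 (assumed construction)."""
    power = np.abs(np.asarray(vector)) ** 2
    return power / power.sum()

def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def hellinger_similarity(p, q):
    """Assumed similarity: larger (closer to 0) means more alike."""
    return -hellinger_distance(p, q)

p = spectrum_distribution([1.0, 0.0])
q = spectrum_distribution([0.0, 2.0])
r = spectrum_distribution([1.0, 1.0])
print(hellinger_distance(p, p), hellinger_distance(p, q), hellinger_distance(p, r))
```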
Slide 18
Clustering Method
Affinity propagation (AP): Brendan J. Frey and Delbert Dueck, "Clustering by Passing Messages Between Data Points", Science 315, 972-976, 2007.
AP obtains exemplars (cluster centers).
Preference
Left singular vectors: average of similarities
Right singular vectors: derived from the minimum and maximum of similarities
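The message passing of affinity propagation can be sketched in plain NumPy as below. This is a compact illustration of the responsibility and availability updates from Frey and Dueck's paper, not the code used in the talk. The damping factor, the fixed iteration count, and the toy data are assumptions; the preference is set to the average of the similarities, as the slide describes for left singular vectors.

```python
import numpy as np

def affinity_propagation(S, preference, damping=0.5, n_iter=200):
    """Return (exemplar_indices, labels) for a square similarity matrix S."""
    S = np.array(S, dtype=float)
    n = S.shape[0]
    np.fill_diagonal(S, preference)       # self-similarity = preference
    R = np.zeros((n, n))                  # responsibilities
    A = np.zeros((n, n))                  # availabilities
    rows = np.arange(n)
    for _ in range(n_iter):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of a(i,k') + s(i,k')
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new.diagonal().copy()   # a(k,k) is not clipped at 0
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    labels = S[:, exemplars].argmax(axis=1)    # assign each point to its best exemplar
    labels[exemplars] = np.arange(len(exemplars))
    return exemplars, labels

# Toy data: two well-separated groups on a line, similarity = negative squared distance.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = -(points[:, None] - points[None, :]) ** 2
off_diagonal = S[~np.eye(len(points), dtype=bool)]
exemplars, labels = affinity_propagation(S, preference=off_diagonal.mean())
print(exemplars, labels)
```

Lower preferences generally produce fewer clusters and higher ones produce more, which is why the preference needs to be chosen separately for the left and right singular vectors.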
Slide 19
Similarities between Left Singular Vectors
Slide 20
Clustering of Left Singular Vectors
Slide 21
Similarities between Right Singular Vectors
Slide 22
Clustering of Right Singular Vectors
Slide 23
Discussions
Each of the motions
Spatial motion: repetition of several similar spatial motions in time variation.
Frequency fluctuation: repetition of similar frequency patterns in time variation.
Relationship: characteristic frequency fluctuation and group transition in spatial motion.
Slide 24
Conclusions and Future Work
Flexibility docking
Flexibility handling: MD and filter
Feature extraction based on motion: wavelet analysis, analysis of motions, clustering
Future work: collective motion, relationship, performing the docking simulation