Tutorial
Homology Modeling of Human Cyclin-Dependent Kinase 3
with Multiple Templates
Prof. Dr. Walter Filgueira de Azevedo Jr.
azevedolab.net
Introduction
In this tutorial we show how to model the three-dimensional structure of cyclin-dependent
kinase 3 (CDK3) using available experimental structures. CDKs have been proposed as protein
targets for the development of anticancer drugs. CDK3 in particular has been shown to be
overexpressed in breast cancer (Cui et al., 2015). There are hundreds of CDK structures
available in the Protein Data Bank, but none for human CDK3. We’ll use the program MODELLER
(Sali & Blundell, 1993) to carry out homology modelling of the CDK3 structure. A homology
model of human CDK3 was reported in 2009 (Perez et al., 2009).
To follow this tutorial you need internet access and the latest version of MODELLER
installed on your computer.
Homology model of CDK3 (Perez et al., 2009).
Flowchart
The flowchart shown here gives the main steps for homology modelling of a protein structure,
using experimental structures available in the Protein Data Bank (PDB). The next pages
describe each step.
Sequence to be modelled
-> Search PDB for templates
-> Are there templates? (No: End)
-> (Yes) Download sequence(s) and templates from PDB
-> Multiple alignment of templates and sequence to be modelled using MUSCLE
-> Homology modeling using MODELLER
-> Analysis of homology models
-> End
Download Sequence To Be Modelled
First, access GenBank at http://www.ncbi.nlm.nih.gov/genbank/.
Then choose Protein…
Type in the protein name and click on Search.
The search returns the entries for the keywords used. We click on the first entry, which has
the sequence for human CDK3.
The entry shows additional information about CDK3. We then click on FASTA.
The amino acid sequence for CDK3 is displayed.
We click on Send to.
We choose File, in FASTA format.
Download it. Copy this FASTA file to the directory where you will carry out your
homology modelling.
Open your FASTA file with a text editor, such as vi. Copy the sequence that will be
used to search the PDB.
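As an alternative to copying the sequence by hand in an editor, the downloaded FASTA file can be read with a few lines of Python. This is a minimal sketch and not part of the original tutorial files; the function name read_fasta is an assumption made for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

To use it, read the file saved from GenBank (for example with open(filename).read(), where filename is whatever name you gave the download) and pass the text to read_fasta. Each returned sequence is a single string that can be pasted into the PDB search form.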
Search PDB for Templates
Go to http://www.rcsb.org/pdb/home/home.do .
Click on Advanced Search.
Choose Sequence (BLAST/FASTA/PSI-BLAST).
We change the Search Tool to PSI-BLAST.
Now we copy and paste the sequence into the Sequence field.
With the sequence in place, we click on Submit Query.
PDB returns all structures that show similarity with the probe sequence.
To see the alignment of the probe sequence with a specific sequence for which there is
a structure, we click on Display Full Alignment.
The alignment is then displayed.
We uncheck all structures and pick only 10 structures that were solved at a resolution
better than 2.0 Å. You may choose a single structure, or as many templates as you think
are necessary.
To download the PDB and FASTA files, we click on Filter>Download Checked.
Then we click on Launch Download Application. We follow all the steps to download the
PDB files as separate structures and the FASTA sequences as one file.
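The template files can also be fetched from a script once the PDB access codes are known. The sketch below is not part of the original tutorial. It assumes the RCSB file server address pattern https://files.rcsb.org/download/<CODE>.pdb, and the helper names pdb_url and download_templates are hypothetical. The ten access codes are the ones listed in model_mult.py.

```python
import urllib.request

# Template PDB access codes used in this tutorial (as listed in model_mult.py).
TEMPLATES = ["3ezr", "3pxf", "3pxq", "3pxr", "3pxy",
             "3pxz", "3py0", "3ql8", "3qqf", "3qqg"]


def pdb_url(code):
    """Return the assumed RCSB download address for one PDB entry."""
    return "https://files.rcsb.org/download/{}.pdb".format(code.upper())


def download_templates(codes, fetch=urllib.request.urlretrieve):
    """Save each template as <code>.pdb in the current directory.

    The fetch argument defaults to urllib's downloader and can be replaced
    by another callable taking (url, filename), for example when testing.
    """
    saved = []
    for code in codes:
        filename = code.lower() + ".pdb"
        fetch(pdb_url(code), filename)
        saved.append(filename)
    return saved
```

Calling download_templates(TEMPLATES) from the modelling directory would save the ten files there, provided the address pattern assumed above is still valid.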
Multiple Alignment with MUSCLE
We access MUSCLE (Edgar, 2004) at http://www.ebi.ac.uk/Tools/msa/muscle/ to carry out the
multiple alignment of the sequence to be modelled against the sequences of the templates.
We copy and paste the sequence to be modelled and the sequences for all templates obtained
from the PDB into the input field.
Then we select FASTA as the output format.
We click on Submit.
We have to wait…
Then we get the aligned sequences. We have to save these aligned sequences, which will be
edited and used as input to run MODELLER for homology modelling.
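Editing the aligned FASTA output into the format MODELLER reads can be partly automated. The sketch below is not part of the original tutorial and its function names are hypothetical. It writes one aligned sequence as an entry with a >P1; line, a header line supplied by the user, and the sequence in 60-column lines ending with *, matching the layout of the mult.ali file used in this tutorial. A second helper checks that all aligned sequences, gaps included, have the same length.

```python
def pir_entry(code, header, aligned_sequence, width=60):
    """Format one aligned sequence as an alignment-file entry for MODELLER.

    code             identifier written after '>P1;'
    header           the colon-separated second line, supplied by the user
    aligned_sequence sequence with '-' for gaps, without the final '*'
    """
    seq = aligned_sequence + "*"  # '*' marks the end of the sequence
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">P1;" + code, header] + lines)


def same_length(aligned_sequences):
    """Return True when every aligned sequence has the same length."""
    return len({len(s) for s in aligned_sequences}) <= 1
```

The header line for each template still has to be written by hand, because the residue range and chain ID depend on the template file.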
Homology Modeling with MODELLER
To run MODELLER we need the PDB files for all templates, the Python input file, and
the multiple alignment file (mult.ali). Part of the file mult.ali is shown below.
>P1;3ezr
structureX:3ezr:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIR-------VPSTAIREISLLKELNH
PNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHS
HRVLHRDLKPQNLLINTEGAIKLADF------------------TLWYRAPEILLGCKYY
STAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF
PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL--
------*
>P1;3pxf
structureX:3pxf:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLKELNH
PNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHS
HRVLHRDLKPQNLLINTEGAIKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYY
STAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF
PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL--
------*
...... (More sequences)
>P1;HsCDK3
sequence:HsCDK3:::::::0.00: 0.0
MDMFQKVEKIGEGTYGVVYKAKNRETGQLVALKKIRLDLEMEGVPSTAIREISLLKELKH
PNIVRLLDVVHNERKLYLVFEFLSQDLKKYMDSTPGSELPLHLIKSYLFQLLQGVSFCHS
HRVIHRDLKPQNLLINELGAIKLADFGLARAFGVPLRTYTHEVVTLWYRAPEILLGSKFY
TTAVDIWSIGCIFAEMVTRKALFPGDSEIDQLFRIFRMLGTPSEDTWPGVTQLPDYKGSF
PKWTRKGLEEIVPNLEPEGRDLLMQLLQYDPSQRITAKTALAHPYFSS-PEPSPAARQYV
LQRFRH*
Below we have a description of each field for the header of a template sequence, as
shown in the mult.ali file.
>P1;3ezr
structureX:3ezr:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20
>P1;3ezr        All template sequences should start with >P1; followed by the PDB access code for the template.
structureX      Keyword indicating that the following sequence is for a template.
3ezr            PDB access code for the template.
1               First residue in the template to be used in the modelling.
A               Chain ID for the first residue in the template to be used in the modelling.
298             Last residue in the template to be used in the modelling.
A               Chain ID for the last residue in the template to be used in the modelling.
CDK2            Protein name.
Homo sapiens    Protein source.
1.80            Crystallographic resolution in Å.
0.20            Crystallographic R-factor.
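The ten colon-separated fields of a header line can be checked with a short script before running MODELLER. This is a minimal sketch, not part of the original tutorial. The field labels in FIELDS and the function name parse_pir_header are my own names for the fields described above.

```python
# Labels for the ten fields, in the order they appear in the header line.
FIELDS = ("kind", "code", "first_residue", "first_chain", "last_residue",
          "last_chain", "protein", "source", "resolution", "r_factor")


def parse_pir_header(line):
    """Split the colon-separated header line into a dict of named fields."""
    values = [v.strip() for v in line.split(":")]
    if len(values) != len(FIELDS):
        raise ValueError("expected 10 fields, found {}".format(len(values)))
    return dict(zip(FIELDS, values))


header = parse_pir_header("structureX:3ezr:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20")
print(header["code"], header["first_residue"], header["last_residue"])
```

A header with a missing or extra colon raises ValueError, which catches a common editing mistake in alignment files.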
Below we have a description of each field for the header of the sequence to be
modelled, as shown in the mult.ali file.
>P1;HsCDK3
sequence:HsCDK3:1:A:305:A: CDK3:Homo sapiens:2.00: -1.0
>P1;HsCDK3      All sequences should start with >P1; followed by an identification for the sequence.
sequence        Keyword indicating that the following sequence is the one to be modelled.
HsCDK3          Identification for the sequence.
1               First residue in the model.
A               Chain ID for the first residue in the model.
305             Last residue in the model.
A               Chain ID for the last residue in the model.
CDK3            Protein name.
Homo sapiens    Protein source.
2.00            Leave as 2.00.
-1.0            Leave as -1.00.
The file model_mult.py is used as input to run homology modelling with multiple
templates. Each line is explained after #.
# Comparative modeling with multiple templates for Human CDK3
from modeller import * # Load standard Modeller classes
from modeller.automodel import * # Load the automodel class
log.verbose() # request verbose output
env = environ() # create a new MODELLER environment to build this model
a = automodel(env,
              alnfile = 'mult.ali',         # alignment filename
              knowns = ('3ezr','3pxf',
                        '3pxq','3pxr',
                        '3pxy','3pxz',
                        '3py0','3ql8',
                        '3qqf','3qqg'),     # PDB access codes of the templates
              sequence = 'HsCDK3')          # code of the target to be modelled
a.starting_model = 1 # index of the first model
a.ending_model = 100 # index of the last model
a.make() # do the actual comparative modeling
There are versions of the program MODELLER for Windows, Mac OS X and Linux.
Here we describe the commands to run on Windows. First, open the Command Prompt,
a terminal window for typing DOS commands. At the Command Prompt, you can execute
programs by typing their names. Below we have the Command Prompt.
All files needed to run MODELLER should be in the same directory. In this tutorial they
are in the directory C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3 .
Type cd C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3 to go to this
directory, as shown below. Don’t forget to press <Enter> after typing the command. The
command cd means “change directory”. It changes from the present directory,
C:\Users\Walter, to the new directory,
C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3 .
You need to have MODELLER installed on your computer to run this homology
modelling. Type the command dir to check all files in the directory, as shown below. We
have 10 PDB files (templates), the Python file (model_mult.py), and the alignment file
(mult.ali). We are ready to go…
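The same check can be done from Python instead of reading the dir listing by eye. The sketch below is not part of the original tutorial. It assumes each template file is named after its access code in lowercase with the extension .pdb, and the function name missing_inputs is hypothetical.

```python
import os


def missing_inputs(directory, templates,
                   script="model_mult.py", alignment="mult.ali"):
    """Return the expected input files that are not present in directory.

    Assumes each template is stored as <code>.pdb (an assumption about the
    file names, not something MODELLER requires in this exact form).
    """
    expected = [code + ".pdb" for code in templates] + [script, alignment]
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]
```

For example, missing_inputs(".", ["3ezr", "3pxf"]) lists whichever of 3ezr.pdb, 3pxf.pdb, model_mult.py and mult.ali are absent from the current directory. An empty list means all inputs are in place.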
Type python model_mult.py > model_mult.log, as shown below. This will run
MODELLER using model_mult.py as input file. This command will create a log file,
named model_mult.log, which will be in the same directory and can be used to check
the results.
Press <Enter> and the command to run MODELLER will be executed, as shown below.
Since we asked to generate 100 models, this may take several minutes. It depends on
your computer.
We can follow the progress of the modelling by opening the directory
C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3 in Windows Explorer, as shown
below. The file HsCDK3.B99990001.pdb is the PDB file for the first model.
We will generate 100 models; we have 8 so far.
Finally, all 100 models have been generated.
There are several ways to evaluate the quality of the models. MODELLER generates a log
file (model_mult.log) containing a table with the molecular PDF (molpdf, the value of the
MODELLER objective function) for each generated model, which can be used to select the
best model: the lower the value, the better. Part of this table is shown below.
Model HsCDK3.B99990064.pdb shows the lowest molecular PDF.
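With 100 models, finding the lowest molecular PDF by eye is error-prone, so the log file can be scanned with a script. The sketch below is not part of the original tutorial. It assumes that the summary table in model_mult.log follows a line containing "Summary of successfully produced models" and that each table row starts with a file name ending in .pdb followed by the molpdf value. Check this against your own log file, since the layout may differ between MODELLER versions. The function name best_model is hypothetical.

```python
import re

# Assumed row layout: "<filename>.pdb   <molpdf>" at the start of a line.
MODEL_LINE = re.compile(r"^\s*(\S+\.pdb)\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)
SUMMARY_MARKER = "Summary of successfully produced models"


def best_model(log_text):
    """Return (filename, molpdf) for the model with the lowest molpdf."""
    start = log_text.find(SUMMARY_MARKER)
    if start != -1:
        log_text = log_text[start:]  # only look at the summary table
    scores = [(name, float(value))
              for name, value in MODEL_LINE.findall(log_text)]
    if not scores:
        raise ValueError("no model summary lines found")
    return min(scores, key=lambda item: item[1])
```

To use it, pass the contents of model_mult.log, for example best_model(open("model_mult.log").read()).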
Homology model HsCDK3.B99990064.pdb.
Colophon
This tutorial was produced on a Dell Inspiron notebook with 6 GB of memory, a 700
GB hard disk, and an Intel® Core® i5-3337U CPU @ 1.80 GHz running Windows 8.1.
Text and layout were generated using PowerPoint 2013. Figures were generated with
Molegro Virtual Docker and from Print Screen images captured from
GenBank (http://www.ncbi.nlm.nih.gov/genbank/), the Protein Data Bank
(http://www.rcsb.org/pdb/home/home.do), and MUSCLE
(http://www.ebi.ac.uk/Tools/msa/muscle/). This tutorial uses Arial font. Input files use
Courier New font.
Author
I graduated in Physics (BSc in Physics) at the University of Sao Paulo (USP) in 1990. I
completed a Master's degree in Applied Physics, also at USP (1992), working under the
supervision of Prof. Yvonne P. Mascarenhas, the founder of crystallography in Brazil.
My dissertation was about X-ray crystallography applied to organometallic compounds
(De Azevedo Jr. et al., 1995).
During my PhD I worked under supervision of Prof. Sung-Hou Kim (University of
California, Berkeley. Department of Chemistry), on a split PhD program with a
fellowship from Brazilian Research Council (CNPq)(1993-1996). My PhD was about the
crystallographic structure of CDK2 (Cyclin-Dependent Kinase 2) (De Azevedo Jr. et al.,
1996). In 1996, I returned to Brazil. In April 1997, I finished my PhD and moved to Sao Jose do Rio Preto (SP,
Brazil) (UNESP) and worked there from 1997 to 2005. In 1997, I started the Laboratory of Biomolecular Systems-
Department of Physics-UNESP - São Paulo State University. In 2005, I moved to Porto Alegre/RS (Brazil), where I
am now. My current position is coordinator of the Laboratory of Computational Systems Biology at Pontifical
Catholic University of Rio Grande do Sul (PUCRS). My research interests are focused on application of computer
simulations to analyze protein-ligand interactions. I'm also interested in the development of biologically inspired
computing and machine learning algorithms. We apply these algorithms to molecular docking simulations, protein-
ligand interactions and other scientific and technological problems. I published over 160 scientific papers about
protein structures and computer simulation methods applied to the study of biological systems (H-index: 33).
These publications have over 3700 citations. I am editor for the following journals:
References
Cui J, Yang Y, Li H, Leng Y, Qian K, Huang Q, Zhang C, Lu Z, Chen J, Sun T, Wu R,
Sun Y, Song H, Wei X, Jing P, Yang X, Zhang C. MiR-873 regulates ERα transcriptional
activity and tamoxifen resistance via targeting CDK3 in breast cancer cells. Oncogene.
2015; 34(30):3895-907.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Res. 2004; 32(5):1792-7.
Perez PC, Caceres RA, Canduri F, de Azevedo WF Jr. Molecular modeling and
dynamics simulation of human cyclin-dependent kinase 3 complexed with inhibitors.
Comput Biol Med. 2009; 39(2):130-40.
Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints.
J Mol Biol. 1993; 234:779-815.
Last update on July 3rd 2016