

Tutorial

Homology Modeling of Human Cyclin-Dependent Kinase 3 with Multiple Templates

Prof. Dr. Walter Filgueira de Azevedo Jr.


azevedolab.net

Introduction

In this tutorial we show how to model the three-dimensional structure of cyclin-dependent kinase 3 (CDK3) using available experimental structures. CDKs have been proposed as protein targets for the development of anticancer drugs. CDK3 in particular has been shown to be overexpressed in breast cancer (Cui et al., 2015).

There are hundreds of CDK structures available in the Protein Data Bank, but none for human CDK3. We’ll use the program MODELLER (Sali & Blundell, 1993) to carry out homology modelling of the CDK3 structure. Homology modelling of human CDK3 was reported in 2009 (Perez et al., 2009).

To follow this tutorial you need internet access and the latest version of MODELLER installed on your computer.

Homology model of CDK3 (Perez et al., 2009).

Flowchart

The flowchart summarizes the main steps for building a homology model of a protein structure from experimental structures available in the Protein Data Bank (PDB):

1. Start from the sequence to be modelled.
2. Search the PDB for templates.
3. Are there templates? If not, end. If yes, continue.
4. Download the sequence(s) and templates from the PDB.
5. Carry out a multiple alignment of the templates and the sequence to be modelled using MUSCLE.
6. Carry out homology modeling using MODELLER.
7. Analyse the homology models.
8. End.

Each step is described on the following pages.

Download Sequence To Be Modelled

First, access GenBank at http://www.ncbi.nlm.nih.gov/genbank/.

Then choose Protein…



Type in the protein name and click on Search.


The search returns the entries that match your keywords. We click on the first entry, which has the sequence for human CDK3.


The entry page shows additional information about CDK3. We then click on FASTA.


The amino acid sequence for CDK3 is shown below.



We click on Send to.



We choose File as the destination and FASTA as the format.


Download it. Copy this FASTA file to the directory where you will carry out your homology modelling.


Open your FASTA file with a text editor, such as vi. Copy the sequence, which will be used to search the PDB.
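As an alternative to copying the sequence by hand, the FASTA file can be read with a few lines of Python. The sketch below is only an illustration and is not part of the original tutorial; the function name read_fasta is hypothetical, and it assumes a plain FASTA file in which each record starts with a ">" header line.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples read from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: store the previous one, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)  # sequence lines are joined into one string
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling read_fasta on the downloaded file and taking the first record gives the header and the sequence as a single string, ready to be pasted into the PDB search form.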



Search PDB for Templates

Go to http://www.rcsb.org/pdb/home/home.do.

Click on Advanced Search.



Choose Sequence (BLAST/FASTA/PSI-BLAST).



We change the Search Tool to PSI-BLAST.



Now we can paste (<Ctrl> V) the sequence into the Sequence field.


With the sequence in place, we click on Submit Query.


PDB returns all structures that show similarity with the probe sequence.



To see the alignment of the probe sequence with a specific sequence for which there is a structure, we click on Display Full Alignment.


The alignment is shown below.



We uncheck all structures and then select only 10 structures that were solved at a resolution better than 2.0 Å. You may choose a single structure or as many templates as you think necessary.


To download the PDB and FASTA files, we click on Filter > Download Checked, as shown below.


Then we click on Launch Download Application. We follow all the steps to download the PDB files as separate files and the FASTA sequences as a single file.
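After the download, the resolution of each template can be double-checked against the 2.0 Å cutoff. The following sketch assumes that each PDB file carries the usual "REMARK   2 RESOLUTION." record; the function names and the cutoff argument are illustrative choices and are not part of the download tool.

```python
import re

# Matches lines such as "REMARK   2 RESOLUTION.    1.80 ANGSTROMS."
_RESOLUTION = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")

def pdb_resolution(lines):
    """Return the resolution in angstroms, or None if no such record exists."""
    for line in lines:
        match = _RESOLUTION.match(line)
        if match:
            return float(match.group(1))
    return None

def pick_templates(pdb_paths, cutoff=2.0):
    """Return (path, resolution) pairs with resolution better than cutoff."""
    selected = []
    for path in pdb_paths:
        with open(path) as handle:
            resolution = pdb_resolution(handle)
        if resolution is not None and resolution < cutoff:
            selected.append((path, resolution))
    # Best (lowest) resolution first.
    return sorted(selected, key=lambda item: item[1])
```

Files without a numeric resolution record (NMR structures, for instance) are simply skipped by pick_templates.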



Multiple Alignment with MUSCLE

We access MUSCLE at http://www.ebi.ac.uk/Tools/msa/muscle/ to carry out a multiple alignment of the sequence to be modelled against the sequences of the templates.

We paste (<Ctrl> V) the sequence to be modelled and the sequences of all templates obtained from the PDB, as shown below.


Then we select FASTA as the output format.



We click on Submit.



We have to wait…



Then we get the aligned sequences, as shown below. We have to save these aligned sequences, which will be edited and used as input to run MODELLER for homology modelling.
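The editing step mostly consists of turning each aligned FASTA record into an entry of the alignment file read by MODELLER, whose header fields are described later in this tutorial. A minimal sketch of that conversion is given below, assuming the aligned sequences are already available as Python strings. The function name pir_entry and all of its parameters are hypothetical; the residue range, chain and annotation values still have to be supplied by hand for each template.

```python
def pir_entry(code, aligned_seq, kind="structureX", first="", chain="",
              last="", name="", source="", resolution="0.00",
              r_factor="0.00", width=60):
    """Format one aligned sequence as a PIR entry (header plus sequence)."""
    # Only the two entry kinds used in this tutorial are accepted here.
    if kind not in ("structureX", "sequence"):
        raise ValueError("kind must be 'structureX' or 'sequence'")
    # Ten colon-separated fields; the same chain ID is used for both ends.
    header = ":".join([kind, code, str(first), chain, str(last), chain,
                       name, source, str(resolution), str(r_factor)])
    # The sequence ends with '*' and is split into fixed-width rows.
    sequence = aligned_seq + "*"
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">P1;%s\n%s\n%s\n" % (code, header, "\n".join(rows))
```

Writing the concatenated entries for all templates and for the target to a single file gives a first draft of the alignment file, which should still be inspected by eye before running MODELLER.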



Homology Modeling with MODELLER

To run MODELLER we need the PDB files for all templates, the Python input file, and the multiple alignment file (mult.ali). Part of the file mult.ali is shown below.

>P1;3ezr
structureX:3ezr:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIR-------VPSTAIREISLLKELNH
PNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHS
HRVLHRDLKPQNLLINTEGAIKLADF------------------TLWYRAPEILLGCKYY
STAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF
PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL--
------*

>P1;3pxf
structureX:3pxf:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLKELNH
PNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHS
HRVLHRDLKPQNLLINTEGAIKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYY
STAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF
PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL--
------*

...... (More sequences)

>P1;HsCDK3
sequence:HsCDK3:::::::0.00: 0.0
MDMFQKVEKIGEGTYGVVYKAKNRETGQLVALKKIRLDLEMEGVPSTAIREISLLKELKH
PNIVRLLDVVHNERKLYLVFEFLSQDLKKYMDSTPGSELPLHLIKSYLFQLLQGVSFCHS
HRVIHRDLKPQNLLINELGAIKLADFGLARAFGVPLRTYTHEVVTLWYRAPEILLGSKFY
TTAVDIWSIGCIFAEMVTRKALFPGDSEIDQLFRIFRMLGTPSEDTWPGVTQLPDYKGSF
PKWTRKGLEEIVPNLEPEGRDLLMQLLQYDPSQRITAKTALAHPYFSS-PEPSPAARQYV
LQRFRH*

Below we describe each field in the header of a template sequence, as shown in the mult.ali file.

>P1;3ezr
structureX:3ezr:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20

First line (>P1;3ezr): all template entries start with >P1; followed by the PDB access code of the template.
Field 1 (structureX): keyword indicating that the following sequence belongs to a template.
Field 2 (3ezr): PDB access code of the template.
Field 3 (1): first residue in the template to be used in the modelling.
Field 4 (A): chain ID of the first residue in the template to be used in the modelling.
Field 5 (298): last residue in the template to be used in the modelling.
Field 6 (A): chain ID of the last residue in the template to be used in the modelling.
Field 7 (CDK2): protein name.
Field 8 (Homo sapiens): protein source.
Field 9 (1.80): crystallographic resolution in Å.
Field 10 (0.20): crystallographic R-factor.


Below we describe each field in the header of the sequence to be modelled, as shown in the mult.ali file.

>P1;HsCDK3
sequence:HsCDK3:1:A:305:A: CDK3:Homo sapiens:2.00: -1.0

First line (>P1;HsCDK3): all entries start with >P1; followed by an identification for the sequence.
Field 1 (sequence): keyword indicating that the following sequence is the one to be modelled.
Field 2 (HsCDK3): identification for the sequence.
Field 3 (1): first residue in the model.
Field 4 (A): chain ID of the first residue in the model.
Field 5 (305): last residue in the model.
Field 6 (A): chain ID of the last residue in the model.
Field 7 (CDK3): protein name.
Field 8 (Homo sapiens): protein source.
Field 9 (2.00): leave as 2.00.
Field 10 (-1.0): leave as -1.00.
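Because a missing or extra colon in a header line is an easy mistake to make when editing mult.ali by hand, it can help to check that every header splits into exactly ten fields. The sketch below is one possible check; the function name parse_pir_header and the dictionary keys are hypothetical labels chosen to mirror the descriptions above.

```python
PIR_FIELDS = ("kind", "code", "first_residue", "first_chain",
              "last_residue", "last_chain", "name", "source",
              "resolution", "r_factor")

def parse_pir_header(line):
    """Split a header line into its ten colon-separated fields."""
    parts = [part.strip() for part in line.split(":")]
    if len(parts) != len(PIR_FIELDS):
        raise ValueError("expected %d fields, found %d"
                         % (len(PIR_FIELDS), len(parts)))
    return dict(zip(PIR_FIELDS, parts))
```

Applied to the template header shown earlier (structureX:3ezr:1:A:298 :A:CDK2:Homo sapiens: 1.80:0.20), the function returns a dictionary in which, for example, the last_residue entry is the string 298 with the surrounding spaces stripped.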


The file model_mult.py is used as input to run homology modelling with multiple templates. Each line is explained after #.


# Comparative modeling with multiple templates for Human CDK3
from modeller import *              # Load standard Modeller classes
from modeller.automodel import *    # Load the automodel class

log.verbose()                       # request verbose output
env = environ()                     # create a new MODELLER environment to build this model

a = automodel(env,
              alnfile='mult.ali',   # alignment filename
              knowns=('3ezr', '3pxf',
                      '3pxq', '3pxr',
                      '3pxy', '3pxz',
                      '3py0', '3ql8',
                      '3qqf', '3qqg'),  # PDB access codes of the templates
              sequence='HsCDK3')    # code of the target to be modelled
a.starting_model = 1                # index of the first model
a.ending_model = 100                # index of the last model
a.make()                            # do the actual comparative modeling


There are versions of the program MODELLER for Windows, Mac OS X and Linux. Here we describe the commands to run on Windows. First, open the Command Prompt, a terminal window for typing DOS commands. At the Command Prompt, you can execute programs by typing their names. The Command Prompt is shown below.


All files needed to run MODELLER should be in the same directory. In this tutorial they are in the directory C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3. Type cd C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3 to go to this directory, as shown below. Don’t forget to press <Enter> after typing the command. The command cd means “change directory”; it changes from the present directory, C:\Users\Walter, to the new directory, C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3.


You need to have MODELLER installed on your computer to run this homology modelling. Type the command dir to list all files in the directory, as shown below. We have 10 PDB files (templates), the Python file (model_mult.py), and the alignment file (mult.ali). We are ready to go…
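The same check can be made from Python before starting a long run. The sketch below lists any expected input file that is absent from the working directory. It assumes that each template is stored as a lowercase file named after its access code with a .pdb extension (for example 3ezr.pdb); the function name missing_inputs is hypothetical.

```python
import os

# Template codes taken from the knowns tuple in model_mult.py.
TEMPLATE_CODES = ("3ezr", "3pxf", "3pxq", "3pxr", "3pxy",
                  "3pxz", "3py0", "3ql8", "3qqf", "3qqg")

def missing_inputs(directory, codes=TEMPLATE_CODES,
                   extra=("mult.ali", "model_mult.py")):
    """Return the expected input files that are not found in directory."""
    expected = ["%s.pdb" % code for code in codes] + list(extra)
    # Compare case-insensitively, since Windows file names ignore case.
    present = {name.lower() for name in os.listdir(directory)}
    return [name for name in expected if name.lower() not in present]
```

An empty list means that all 10 templates, the alignment file and the Python input file are in place.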



Type python model_mult.py > model_mult.log, as shown below. This runs MODELLER with model_mult.py as the input file and redirects the output to a log file named model_mult.log, which is created in the same directory and can be used to check the results.


Press <Enter> and the command to run MODELLER will be executed, as shown below. Since we asked for 100 models, this may take several minutes, depending on your computer.


We can follow the progress of the modelling by opening the directory C:\Users\Walter\Teaching1\Tutorials\HomologyModeling\HsCDK3 in Windows Explorer, as shown below. The file HsCDK3.B99990001.pdb is the PDB file for the first model.


We will generate 100 models; we have 8 so far.

Finally, 100 models.
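Progress can also be followed without the file browser by counting the model files written so far. The sketch below relies only on the file naming seen in this tutorial (HsCDK3.B99990001.pdb and so on); the function name built_models is hypothetical.

```python
import glob
import os

def built_models(directory, target="HsCDK3"):
    """Return the sorted paths of the model PDB files found in directory."""
    pattern = os.path.join(directory, target + ".B9999*.pdb")
    return sorted(glob.glob(pattern))
```

The length of the returned list is the number of models completed so far; the run is finished when it reaches the value set in a.ending_model.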



There are several ways to evaluate the quality of the models. MODELLER writes to the log file (model_mult.log) a table with the molecular PDF (molpdf, the value of MODELLER's objective function) for each generated model, which can be used to select the best model. Part of this table is shown below.


Model HsCDK3.B99990064.pdb shows the lowest molecular PDF.
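With 100 models, finding the lowest value by eye is error-prone, so the table can be read from the log with a short script. The sketch below assumes that the table follows a line containing "Summary of successfully produced models" and that each row starts with the model file name followed by its molpdf value; this layout is an assumption about the log format, and the function names are hypothetical.

```python
import re

# A table row: a file name ending in .pdb, then the molpdf value.
_ROW = re.compile(r"^(\S+\.pdb)\s+(-?\d+(?:\.\d+)?)")

def molpdf_table(log_lines, marker="Summary of successfully produced models"):
    """Return a {model file name: molpdf} dictionary read from log lines."""
    scores = {}
    in_summary = False
    for line in log_lines:
        if marker in line:
            in_summary = True  # rows are only read after the marker line
            continue
        if in_summary:
            match = _ROW.match(line.strip())
            if match:
                scores[match.group(1)] = float(match.group(2))
    return scores

def best_model(scores):
    """Return the file name with the lowest molpdf, or None if empty."""
    if not scores:
        return None
    return min(scores, key=scores.get)
```

Opening model_mult.log, passing the file object to molpdf_table and then calling best_model on the result gives the name of the model with the lowest molecular PDF.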

Homology model HsCDK3.B99990064.pdb.

Colophon

This tutorial was produced on a Dell Inspiron notebook with 6 GB of memory, a 700 GB hard disk, and an Intel® Core® i5-3337U CPU @ 1.80 GHz running Windows 8.1. Text and layout were generated using PowerPoint 2013. Graphical figures were generated with Molegro Virtual Docker, and screenshots were captured from GenBank (http://www.ncbi.nlm.nih.gov/genbank/), the Protein Data Bank (http://www.rcsb.org/pdb/home/home.do), and MUSCLE (http://www.ebi.ac.uk/Tools/msa/muscle/). This tutorial uses the Arial font. Input files use the Courier New font.


Author

I graduated in Physics (BSc in Physics) at the University of Sao Paulo (USP) in 1990. I completed a Master's degree in Applied Physics, also at USP (1992), working under the supervision of Prof. Yvonne P. Mascarenhas, the founder of crystallography in Brazil. My dissertation was about X-ray crystallography applied to organometallic compounds (De Azevedo Jr. et al., 1995).

During my PhD I worked under the supervision of Prof. Sung-Hou Kim (University of California, Berkeley, Department of Chemistry), on a split PhD program with a fellowship from the Brazilian Research Council (CNPq) (1993-1996). My PhD was about the crystallographic structure of CDK2 (Cyclin-Dependent Kinase 2) (De Azevedo Jr. et al., 1996). In 1996, I returned to Brazil. In April 1997, I finished my PhD and moved to Sao Jose do Rio Preto (SP, Brazil) (UNESP), where I worked from 1997 to 2005. In 1997, I started the Laboratory of Biomolecular Systems, Department of Physics, UNESP - São Paulo State University. In 2005, I moved to Porto Alegre/RS (Brazil), where I am now. My current position is coordinator of the Laboratory of Computational Systems Biology at the Pontifical Catholic University of Rio Grande do Sul (PUCRS). My research interests are focused on the application of computer simulations to analyze protein-ligand interactions. I'm also interested in the development of biologically inspired computing and machine learning algorithms. We apply these algorithms to molecular docking simulations, protein-ligand interactions and other scientific and technological problems. I have published over 160 scientific papers about protein structures and computer simulation methods applied to the study of biological systems (H-index: 33). These publications have over 3700 citations. I am editor for the following journals:

http://scripts.iucr.org/cgi-bin/paper?S0108270194009868

http://www.ncbi.nlm.nih.gov/pubmed/9552391

http://benthamscience.com/journals/current-drug-targets/editorial-board/#top

http://benthamscience.com/journals/current-bioinformatics/editorial-board/#top

http://www.eurekaselect.com/73567/article

https://peerj.com/Walter/

http://computerscience.jacobspublishers.com/

References

Cui J, Yang Y, Li H, Leng Y, Qian K, Huang Q, Zhang C, Lu Z, Chen J, Sun T, Wu R, Sun Y, Song H, Wei X, Jing P, Yang X, Zhang C. MiR-873 regulates ERα transcriptional activity and tamoxifen resistance via targeting CDK3 in breast cancer cells. Oncogene. 2015; 34(30):3895-907.

Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004; 32(5):1792-7.

Perez PC, Caceres RA, Canduri F, de Azevedo WF Jr. Molecular modeling and dynamics simulation of human cyclin-dependent kinase 3 complexed with inhibitors. Comput Biol Med. 2009; 39(2):130-40.

Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993; 234:779-815.

Last update on July 3rd 2016