7/31/2019 Dissertation Adam
1/29
UNIVERSITEIT VAN AMSTERDAM
A Coarse-Grained Model for Self-Assembling collagen-silk-like
block-copolymer
Bruno Barbosa Rodrigues
Advisor: Peter G. Bolhuis
Co-Advisor: Marieke Schor
July 2009
Abstract
We present the sequence design of an adapted off-lattice minimalist model, based on the original Head-Gordon (HG) model, for a monodisperse, biodegradable, biocompatible and hydrophilic collagen-like sequence. This sequence is part of a block copolymer made of two hydrophilic collagen-like blocks flanking a hydrophobic silk-like block. We start with atomistic simulations of short peptides to extract the distributions of bond distances, bend angles and dihedral angles, which we use to fit the coarse-grained (CG) force field (FF) and to adapt the HG model to this collagen-like sequence. Owing to the high concentration of proline and of charged groups at low pH in the sequence, different dihedral-angle distributions were found depending on the relative position of proline in the chain. We therefore extended the original three-letter minimalist model and combined it with a previously adapted, shifted and rescaled non-bonded potential developed for the silk-block sequence. We compared the radius of gyration ($R_g$) obtained for different numbers of residues with the experimental value measured for a 400-residue sequence, and derived a scaling law between $R_g$ and the number of residues in the collagen-like sequence. We conclude that the exponent is close to the Flory exponent.
Keywords: Collagen-silk-like block copolymers, protein fibers, Molecular Dynamics, Coarse-Grained model.
Contents
1 Introduction
2 Methods
3 Results
4 Conclusions and Future Perspectives
Acknowledgements
Appendix A -- Dihedral Angles
Bibliography
1 Introduction
Protein-based block copolymers that form fibers upon a certain stimulus (e.g. a change in pH) [16] are potentially very useful as biocompatible nanomaterials. One example is a block copolymer made of two hydrophilic collagen-like blocks flanking a hydrophobic silk-like block. The collagen block limits the growth direction of the silk block and drives it to form fibers with improved transversal strength. The sequence of the collagen block is chosen such that, unlike common natural gelatin, which forms a gel in the presence of water, it remains soluble and unstructured under various conditions of pH and temperature. Previously, a coarse-grained model of the very regular, highly structured silk-like block has been developed [14]. However, a different strategy is needed to develop such a model for the unstructured collagen-like block.
In the first section of this chapter we give an overview of biocompatible materials and their applications; the second section is dedicated to protein polymers; the third introduces computer simulation techniques as tools to understand and predict protein dynamics, together with their limitations and possible solutions; the last section gives an outline of this work.
1.1 Biocompatible Materials
Nature produces a wide range of materials, each for its own purposes, such as actin, the main constituent of the cytoskeleton, or collagen, the major component of the extracellular matrix. Genes can encode identical amino acid sequences (also called primary structures) with absolute control over molecular weight, composition, sequence, and stereochemistry. Controlling this protein production can allow us to build materials tailored to human needs [4], with applications such as chemo-mechanical fibers [5], tissue engineering [6, 7] and drug delivery [8, 9].
Biocompatible materials are synthetic or natural materials intended to interface with biological systems in intimate contact with living tissue. One way to build such high-value materials is via genetic and protein engineering, both components of a new polymer chemistry that provides the tools for producing macromolecular polyamide copolymers of a diversity and precision far beyond the current capabilities of synthetic polymer chemistry.
1.2 Protein Polymers
Controlling and understanding the behavior of biocompatible materials requires a close look at their atomic composition. The function of a protein is intrinsically connected to its spatial conformation (or tertiary structure), and the process that drives the protein towards this state is called protein folding.
Proteins can be designed as blocks of repetitive amino acid sequences, also called block copolymers. This term was introduced in the 1990s by Cappello [2], who built diblock copolymers containing silk-like and elastin-like amino acid sequences. Since then, many other block copolymers have been produced [3], ranging from diblock to multiblock sequences. The self-assembling character of proteins can be used to tune organization at the molecular level [10], which can build organized nanoscopic objects that end up as macroscopic materials with interesting properties.
Figure 1.1: Triblock copolymers consisting of a hydrophobic, pH-responsive, silk-like block flanked by hydrophilic, non-responsive collagen blocks, self-assembling into μm-long fibrils. The collagen block limits the assembly direction of the silk block [11].
The system under study in this dissertation is a collagen-like polypeptide [12] that flanks a silk-like sequence [13], which self-assembles into a roll that forms stacks at low pH [14] (fig. 1.1). The triblock was produced experimentally by gene expression in the yeast Pichia pastoris [15, 16]. The collagen-like part [12] has a hydrophilic sequence that, unlike normal collagens, which form gels, remains soluble under many conditions of pH and temperature. Owing to this random character, the collagen-like part can drive the silk block along the growth direction, forming biocompatible fibers whose transversal strength can be improved upon a stimulus (e.g. a change in pH).
1.3 Computer Simulations
During the last two decades, computer simulations have gradually been recognized as a source of complementary information about the properties of such new materials and about conformational transitions in proteins [17, 18]. While most of the relevant dynamics in proteins, such as folding processes, occur on time scales of milliseconds and involve large molecular aggregates, atomistic simulations can only address hundreds of nanoseconds for small proteins in explicit solvent, which is at least four orders of magnitude lower in time and one order of magnitude lower in degrees of freedom.
In this framework, many techniques have been developed to overcome the limitations in system size and simulation time. Simplifying the system by integrating out details that are not important to the overall result allows us to address longer length and time scales. In this work we focus on adapting the Head-Gordon (HG) model [25, 26] to the collagen-like part of the collagen-silk-collagen block copolymer. With this adaptation, it is possible to simulate large systems long enough to predict the properties of the collagen-like block and to understand its role in driving the growth direction of the silk fiber.
1.4 Outline of this Dissertation
The HG model, developed to study protein folding and aggregation, was adapted for the collagen-like sequence in such a way that the dihedral angles are treated with either a cosine expansion or a harmonic potential. The resulting parameters can be combined with previous results for the silk-like block in order to enable a complete description of the collagen-silk-collagen block copolymer.
In Chapter 2 we describe the molecular dynamics methods applied to the system under study, as well as details of the simulation procedure. We also introduce the adapted Head-Gordon model used to coarse-grain the polypeptide by lumping an entire amino acid into one bead. In Chapter 3 we present the results of the atomistic and the coarse-grained simulations, show the distributions that were used to fit the adapted HG model, and derive the scaling law between the radius of gyration and the number of residues, which appears to be in very good agreement with the experimental value. Finally, in Chapter 4 we conclude the work and give future perspectives.
2 Methods
In this chapter we discuss the atomistic and CG Molecular Dynamics (MD) simulation techniques which were used to study the collagen-silk block copolymers. We performed the atomistic simulations using GROMACS version 4.0 [30], which provided the distributions of the bond distances, and of the bend and dihedral angles between three and four consecutive Cα atoms, respectively. With these parameters, we fitted the adapted HG model, which allowed us to reach longer time and length scales. The CG simulations were performed with the CM3D code [31].
2.1 Molecular Dynamics
The key of MD simulations is to integrate Newton's equations of motion for the N interacting particles

$$ m \frac{d^2 \mathbf{r}_i}{dt^2} = \sum_{j \neq i} \mathbf{F}(\mathbf{r}_{ij}) \tag{2.1} $$

with accuracy and in such a way that the evaluation of the pairwise additive interactions does not scale as $N^2$. Several techniques are available to speed up the evaluation of both short-range and long-range forces. Following the standard procedure of calculating the force as the derivative of a potential, we can integrate Newton's equations of motion with one of many available algorithms. A very simple one is the leap-frog integrator [32, 33], a Verlet-like second-order algorithm that evaluates the velocities at half-integer time steps and uses these velocities to compute the new positions:

$$ \mathbf{r}(t+\Delta t) = \mathbf{r}(t) + \Delta t\, \mathbf{v}(t+\Delta t/2) \tag{2.2} $$

$$ \mathbf{v}(t+\Delta t/2) = \mathbf{v}(t-\Delta t/2) + \Delta t\, \frac{\mathbf{f}(t)}{m} \tag{2.3} $$
The more sophisticated Gear predictor-corrector algorithm follows the general finite-difference pattern, in which estimates of the positions, velocities, etc. at time $t+\Delta t$ are obtained by a Taylor expansion about time $t$. These predicted values do not represent the true trajectory. After calculating the forces at the predicted position $\mathbf{r}^p(t+\Delta t)$, the trajectory is corrected using this new information, so that the corrected position $\mathbf{r}^c(t+\Delta t)$ is a better approximation to the true position.
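The leap-frog update of eqs. 2.2 and 2.3 can be illustrated with a minimal sketch. The one-dimensional harmonic oscillator, the reduced units and all names below (`leapfrog`, `k`, `dt`) are assumptions chosen for illustration; they are not taken from the GROMACS or CM3D set-ups used in this work.

```python
import math


def leapfrog(r, v_half, force, mass, dt, n_steps):
    """Leap-frog integration of a single particle in one dimension.

    r       position at time t
    v_half  velocity at time t - dt/2
    force   callable returning the force at a given position
    """
    trajectory = [r]
    for _ in range(n_steps):
        v_half = v_half + dt * force(r) / mass  # eq. 2.3: velocity at t + dt/2
        r = r + dt * v_half                     # eq. 2.2: position at t + dt
        trajectory.append(r)
    return trajectory, v_half


# Hypothetical test system: harmonic oscillator with k = m = 1 (reduced units)
k, m, dt, n_steps = 1.0, 1.0, 0.01, 628
r0, v0 = 1.0, 0.0

# The scheme needs v(-dt/2); estimate it by stepping the velocity back half a step
v_minus_half = v0 - 0.5 * dt * (-k * r0) / m

traj, _ = leapfrog(r0, v_minus_half, lambda x: -k * x, m, dt, n_steps)
exact = math.cos(dt * n_steps)  # analytic solution r(t) = cos(t) for these initial conditions
print(f"leap-frog: {traj[-1]:.5f}   exact: {exact:.5f}")
```

After roughly one oscillation period the numerical position stays close to the analytic one, which reflects the second-order accuracy mentioned above.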
2.2 Atomistic Models
As described in the previous section, the key point of MD simulations is to solve Newton's equations of motion. Usually, however, the systems are defined by the potential energy rather than the forces, which can then be easily calculated as the negative gradient of the potential: $\mathbf{F}(\mathbf{r}_{ij}) = -\nabla V(\mathbf{r}_{ij})$. For that purpose, many potential energy functions, the so-called force fields (FF) [35], were developed to simulate protein systems. The basic idea of a FF is to map all the possible physical interactions in the system onto a potential, as presented in eqs. 2.4 to 2.9:

$$ V = V_{noncov} + V_{cov} = (V_{LJ} + V_C) + (V_{bond} + V_{bend} + V_{dih}) \tag{2.4} $$

$$ V_{LJ}(r_{ij}) = 4\epsilon_{ij}\left[ C^{(12)}_{ij}\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - C^{(6)}_{ij}\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] \tag{2.5} $$

$$ V_C(r_{ij}) = \frac{1}{4\pi\epsilon_0}\,\frac{q_i q_j}{\epsilon_r r_{ij}} \tag{2.6} $$

$$ V_{bond}(r_{ij}) = \frac{1}{2} k^{(bond)}_{ij} \left(r_{ij} - b_{ij}\right)^2 \tag{2.7} $$

$$ V_{bend}(\theta_{ijk}) = \frac{1}{2} k^{bend}_{ijk} \left(\theta_{ijk} - \theta^0_{ijk}\right)^2 \tag{2.8} $$

$$ V_{dih}(\phi_{ijkl}) = \frac{1}{2}\left[ C_1(1 + \cos\phi) + C_2(1 - \cos 2\phi) + C_3(1 + \cos 3\phi) + C_4(1 - \cos 4\phi) \right] \tag{2.9} $$
where the potential is divided into bonded (or covalent) and non-bonded (non-covalent) terms. The non-bonded interactions contain a repulsion term, a dispersion term, and a Coulomb term. The repulsion and dispersion terms are combined in the Lennard-Jones (or 6-12) interaction. In addition, (partially) charged atoms interact through the Coulomb term. Bonded interactions are based on a fixed list of atoms. They are not exclusively pair interactions, but include 3- and 4-body interactions as well. There are bond stretching (2-body), bond angle (3-body), and dihedral angle (4-body) interactions given by eqs. 2.7, 2.8 and 2.9, respectively.
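As a rough illustration of how such terms translate into code, the sketch below implements the Lennard-Jones, bond, bend and dihedral energies in the functional forms of eqs. 2.5 and 2.7 to 2.9, together with a numerical force obtained as the negative derivative of a pair potential. The function names, the reduced units and the default values $C^{(12)} = C^{(6)} = 1$ are assumptions made for this example and do not correspond to any specific force-field parametrization.

```python
import math


def v_lj(r, eps, sigma, c12=1.0, c6=1.0):
    """Lennard-Jones 12-6 pair energy in the form of eq. 2.5."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (c12 * sr6 * sr6 - c6 * sr6)


def v_bond(r, k, b0):
    """Harmonic bond stretching, eq. 2.7."""
    return 0.5 * k * (r - b0) ** 2


def v_bend(theta, k, theta0):
    """Harmonic bond-angle bending, eq. 2.8 (angles in radians)."""
    return 0.5 * k * (theta - theta0) ** 2


def v_dih(phi, c1, c2, c3, c4):
    """Four-term cosine dihedral potential, eq. 2.9 (phi in radians)."""
    return 0.5 * (c1 * (1.0 + math.cos(phi)) + c2 * (1.0 - math.cos(2.0 * phi))
                  + c3 * (1.0 + math.cos(3.0 * phi)) + c4 * (1.0 - math.cos(4.0 * phi)))


def pair_force(potential, r, h=1e-6):
    """Radial force F = -dV/dr from a central finite difference."""
    return -(potential(r + h) - potential(r - h)) / (2.0 * h)


# Reduced units (eps = sigma = 1): the LJ minimum lies at r = 2^(1/6) with depth -eps
r_min = 2.0 ** (1.0 / 6.0)
print(v_lj(r_min, 1.0, 1.0), pair_force(lambda r: v_lj(r, 1.0, 1.0), r_min))
```

At the minimum of the pair potential the force vanishes, which gives a quick sanity check on both the energy function and the finite-difference derivative.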
Many FFs are available nowadays; the most common are AMBER [36], CHARMM [37], GROMOS [38] and OPLS-AA [39]. Their potential energy functions are parametrized against experiments and ab initio quantum mechanical calculations.
2.2.1 Setup atomistic simulations
Atomistic simulations can reveal many details of the system, as they treat both the protein under consideration and the water. However, it is very difficult to reach long time and length scales within this framework. In this case, the atomistic simulations were carried out only to extract the pa-
that time on many approaches emerged. The CG models evolved in different directions, which differ basically in the balance between the complexity of the representation and the complexity of the parametrization. Harmonic models represent the system by beads (usually one per amino acid) connected by elastic springs [21]; they are used mainly for the analysis of the principal modes [22] and require prior knowledge of an equilibrium reference configuration. Go-like models [23] also require a priori knowledge of the native state and fail to capture the most intriguing aspect of protein folding: its dependence on the primary sequence. A lower level of reference dependence is found in the Head-Gordon model [25], which represents each amino acid as one bead. Two-bead models [27] add a second bead at the centroid of the side chain, which reduces the dependence on a reference configuration but increases the complexity of the energy terms and introduces correlations between dihedral angles. Four-to-six-bead models [28, 29] represent the side chain as one bead but explicitly consider the coordinates of the three heavy atoms of the backbone.
We chose the Head-Gordon (HG) model [25] to coarse-grain the collagen-like block, as it was successfully applied to the silk-like sequence before [14]. In this model, in contrast with the Go model [23], we do not need to know anything about the tertiary structure of the native state. However, we need to face the more difficult aspect of the protein folding problem, namely its dependence on the amino acid sequence. The Cα trace is taken to represent the protein backbone, and the structural details of the amino acids and the aqueous solvent are integrated out and replaced by effective bead-bead interactions.
2.3.1 Head-Gordon Model
The original Head-Gordon (HG) model is an improvement on previous efforts of Thirumalai and coworkers [53] that is more general to α-helical, β-sheet and α/β protein topologies. The 20-letter amino acid sequence is converted to a three-letter code defined by the flavors hydrophilic (L), hydrophobic (B) and neutral (N). The idea of describing an amino acid as one bead is illustrated in the figure below:
Figure 2.2: Schematic description of an amino acid as a bead in the CG model.
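The force field in the original HG model is defined as:

$$ H = \sum_{\theta} \frac{1}{2} k_\theta (\theta - \theta_0)^2 + \sum_{\phi} \left[ A(1 + \cos\phi) + B(1 - \cos\phi) + C(1 + \cos 3\phi) + D\left(1 + \cos\left(\phi + \frac{\pi}{4}\right)\right) \right] + \sum_{i,\, j \geq i+3} 4\epsilon_H S_1 \left[ \left(\frac{\sigma}{r_{ij}}\right)^{12} - S_2 \left(\frac{\sigma}{r_{ij}}\right)^{6} \right] \tag{2.10} $$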
where $\theta$ is the bond angle between three consecutive beads, with $\theta_0 = 105^\circ$ the equilibrium angle and $k_\theta = 20\,\epsilon_H/\mathrm{rad}^2$. The dihedral angle $\phi$ between four consecutive beads can assume different conformations depending on the region to be described, with the constants A, B, C and D defining the shape of the distribution. The LJ potential determines the attraction and repulsion between beads of size $\sigma$ with flavors i and j separated by $r_{ij}$: B-B interactions are attractive and represented by $S_1 = S_2 = 1$; $S_1 = 1/3$ and $S_2 = 0$ apply for L-L and L-B interactions; and N-L, N-B and N-N interactions have the constants $S_1 = 1$ and $S_2 = 0$. In the original HG model the bond lengths are constrained by the RATTLE algorithm [54]. The non-bonded potentials are plotted in the figure below:
Figure 2.3: Non-bonded potential between neutral, hydrophilic, hydrophobic and proline (treated as
neutral) amino acids.
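The flavour-dependent non-bonded term of eq. 2.10 can be sketched as a small lookup followed by the 12-6 expression. The $(S_1, S_2)$ values are copied from the text above and should be treated as quoted rather than independently verified; the function names and the reduced units ($\epsilon_H = \sigma = 1$) are assumptions for this example.

```python
def hg_coefficients(flavour_i, flavour_j):
    """Return (S1, S2) for a pair of bead flavours, as quoted in the text above."""
    pair = {flavour_i, flavour_j}
    if pair == {"B"}:
        return 1.0, 1.0          # B-B: attractive
    if "N" in pair:
        return 1.0, 0.0          # N-L, N-B, N-N: purely repulsive
    return 1.0 / 3.0, 0.0        # L-L and L-B, values as stated in the text


def v_nonbonded(r, flavour_i, flavour_j, eps_h=1.0, sigma=1.0):
    """Non-bonded term of eq. 2.10 for one bead pair (reduced units by default)."""
    s1, s2 = hg_coefficients(flavour_i, flavour_j)
    sr6 = (sigma / r) ** 6
    return 4.0 * eps_h * s1 * (sr6 * sr6 - s2 * sr6)


r_min = 2.0 ** (1.0 / 6.0)
for pair in (("B", "B"), ("L", "B"), ("N", "L")):
    print(pair, round(v_nonbonded(r_min, *pair), 4))
```

Only the B-B pair has a negative energy at $r = 2^{1/6}\sigma$ in this sketch; the other pairs are repulsive at every distance because their $S_2$ is zero.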
2.4 Adapted HG model
Based on the original Head-Gordon model and the adapted version for the silk part [14] of the block copolymer, we developed a four-letter minimalist model for the collagen-like block: hydrophilic (L), hydrophobic (B), neutral (N) and proline (P), where we defined proline as a separate flavour, due to its key role in the stiffness of the dihedral angles. Table 2.2 below shows the sequence mapping between the 20-letter amino acid code and the adapted four-letter CG code, which generates the minimalist full sequence shown in fig. 2.4.
Figure 2.4: Full collagen sequence transcription from the 20-letter amino acid code to the adapted four-letter minimalist HG code based on table 2.2.
Name            20-letter code   4-letter code
Glycine         GLY / G          N
Alanine         ALA / A          B
Glutamic Acid   GLU / E          L
Lysine          LYS / K          L
Asparagine      ASN / N          L
Proline         PRO / P          P
Glutamine       GLN / Q          L
Serine          SER / S          N
Table 2.2: Sequence mapping between the 20-letter amino acid code and the adapted four-letter CG code.
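The mapping of Table 2.2 is simple enough to express as a dictionary. The sketch below applies it to the 30-residue peptide quoted in Section 3.1 and lists the four-bead windows that define consecutive dihedral angles; the helper names `to_minimalist` and `dihedral_quadruplets` are hypothetical and were introduced only for this illustration.

```python
# One-letter amino acid code -> four-letter minimalist flavour (Table 2.2)
TO_FOUR_LETTER = {
    "G": "N", "A": "B", "E": "L", "K": "L",
    "N": "L", "P": "P", "Q": "L", "S": "N",
}


def to_minimalist(sequence):
    """Translate a one-letter amino acid sequence into the L/B/N/P code."""
    return "".join(TO_FOUR_LETTER[residue] for residue in sequence)


def dihedral_quadruplets(cg_sequence):
    """All windows of four consecutive beads, one per dihedral angle."""
    return [cg_sequence[i:i + 4] for i in range(len(cg_sequence) - 3)]


peptide = "GNEGQPGQPGQNGQPGEPGSNGPQGSQGNP"  # 30-residue sequence from Section 3.1
cg = to_minimalist(peptide)
quads = dihedral_quadruplets(cg)
print(cg)
print(sorted(set(quads)))
```

Residues outside Table 2.2 raise a `KeyError` here, which is a deliberate choice for the sketch: the table only covers the amino acids present in the collagen-like block.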
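The FF also needs to be modified to cover the new changes in the dihedral angles, bond distances (which are no longer constrained) and non-local interactions between beads. The new FF is given by equations 2.11, 2.12 and 2.13 below:

$$ H_{adap} = \sum_{b} \frac{1}{2} k_b (b - b_0)^2 + \sum_{\theta} \frac{1}{2} k_\theta (\theta - \theta_0)^2 + \sum_{\phi} \left[ V_{weak}(\phi) + V_{stiff}(\phi) \right] + \sum_{i,\, j \geq i+3} 4\epsilon_H S_1 \left[ \left(\frac{\sigma}{r_{ij} - \sigma_0}\right)^{12} - S_2 \left(\frac{\sigma}{r_{ij} - \sigma_0}\right)^{6} \right] \tag{2.11} $$

$$ V_{weak}(\phi) = \epsilon_h \sum_{k=0}^{6} A_k \cos^k(\phi) \tag{2.12} $$

$$ V_{stiff}(\phi) = B_0 - \frac{1}{2} \epsilon_h B_1 (\phi - \phi_0)^2 \tag{2.13} $$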
where the bond distances are now explicitly described by a spring potential, since the CM3D package used for the CG simulations employs a reversible multiple time-step integrator. The stiffness is given by $k_b = 33\,\epsilon_h$ and $b_0 = 3.84$ Å, consistent with the measured all-atom Cα-Cα distance distributions. The bond angles are treated as in the original HG model, with the parameters $k_\theta = 20\,\epsilon_h$ and $\theta_0 = 105^\circ$ extracted from the atomistic simulations. The parameters $A_k$, $B_0$, $B_1$ and $\phi_0$ set the dihedral angles between four subsequent Cα atoms, which show either periodic behaviour with two minima (weak) or harmonic behaviour with one minimum (stiff) depending on the four-bead sequence, as the new flavor (proline) plays an important role in the stiffness of the dihedral angle. The L-J potential is shifted and scaled in order to coincide with the previous model optimized for the silk part [14], where the scaled parameter $\sigma$ sets the range of the interaction and $\sigma_0$ shifts the potential to match the size of the bead, as can be seen in fig. 2.5. The strength of the non-bonded interaction, $k_{LJ} = 4\epsilon_h$, was previously optimized for the silk part by M. Schor [14] by comparing the potentials of mean force (PMF) from steered MD (SMD) simulations of the atomistic and the coarse-grained model. The PMFs from SMD were calculated following the method of Park and Schulten [63, 64].
Figure 2.5: Shifting the Lennard-Jones potential has the effect of shortening the range of the potential. The position of the nearest neighbors and the second nearest neighbors between strands in the silk part are given by the dashed vertical lines. In the traditional L-J potential the second nearest neighbours still feel the interaction.
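To make the effect of shifting and rescaling concrete, the sketch below evaluates a 12-6 potential in which the distance is replaced by $r - \sigma_0$, the form assumed here for the non-bonded term of the adapted model. All numerical values are hypothetical reduced-unit choices (they are not the optimized silk-block parameters), picked so that the minimum stays at the same distance while the range of the attraction shrinks.

```python
def shifted_lj(r, eps=1.0, sigma=1.0, sigma0=0.0, s1=1.0, s2=1.0):
    """Shifted and rescaled 12-6 pair energy; only defined for r > sigma0."""
    if r <= sigma0:
        raise ValueError("r must be larger than the shift sigma0")
    x6 = (sigma / (r - sigma0)) ** 6
    return 4.0 * eps * s1 * (x6 * x6 - s2 * x6)


def minimum_position(sigma, sigma0):
    """Location of the energy minimum for the attractive case s2 = 1."""
    return sigma0 + 2.0 ** (1.0 / 6.0) * sigma


# Hypothetical example: halve sigma and choose sigma0 so the minimum does not move
sigma_std = 1.0
sigma_new = 0.5
sigma0_new = minimum_position(sigma_std, 0.0) - 2.0 ** (1.0 / 6.0) * sigma_new

r_far = 2.0  # a distance well beyond the minimum
print("standard:", shifted_lj(r_far, sigma=sigma_std))
print("shifted :", shifted_lj(r_far, sigma=sigma_new, sigma0=sigma0_new))
```

Both potentials have the same well depth and minimum position in this example, but the shifted one has decayed much further at `r_far`, which is the shortening of range described in the caption of fig. 2.5.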
Simulations with CM3D were carried out for different sequence lengths. We started with the 30-residue structure built in MolMol and kept the positions of the Cα atoms. From this basic structure we built sequences of 45, 60, 75, 90, 120, 150 and 200 beads by joining pieces of short peptides. We used VMD to manipulate the PDB files of the different lengths of the collagen-like sequence. The sequences were then placed in a cubic box (as the CG simulation employs implicit solvent, the shape of the box is not relevant) with periodic boundary conditions in the x, y and z directions. The time step used is 2 fs.
We started by relaxing the protein at low temperature. The first simulation was carried out with velocities drawn from a Boltzmann distribution, at 30 K for 1 ns in an NVT ensemble, using a Nose-Hoover chain with 4 units and time step of 1.6 ps. We then performed further 1 ns simulations, each starting from the previous velocities and positions, gradually raising the temperature to 60 K, 100 K, 200 K and, finally, 300 K. We then equilibrated the system for 1 ns at 300 K and started a 50 ns simulation to sample the averages. The 1 ns simulations were carried out on a local machine and took at most 30 minutes for the 200-residue sequence. The 50 ns simulations were run on the LISA cluster with 1 processor 2 Intel Xeon 3.4 GHz and took at most 20 hours for the longest sequence (200 beads).
2.5 Order Parameters
The configuration space is very high-dimensional, and direct visualization of its quantities is meaningless. To overcome this problem, we project the phase space onto one-dimensional representations, the order parameters. They can monitor the (un)folding transitions, characterize native and unfolded states, and some of them can be compared directly to experimental measurements.
Since a protein chain is not a regular object and is subject to a dynamic structural equilibrium that involves motion, it is necessary to consider a statistical measure of the chain size. The end-to-end distance is then a key descriptor of the statistical behavior of the chain.
A commonly used order parameter is the Root Mean Square Deviation (RMSD) from a reference structure, usually obtained experimentally by X-ray or NMR. The RMSD is calculated after minimizing over rotations and translations and is defined as:

$$ \mathrm{RMSD} = \left[ \frac{1}{M} \sum_{i=1}^{N} m_i \left| \mathbf{r}_i - \mathbf{r}^{ref}_i \right|^2 \right]^{1/2} \tag{2.14} $$
The radius of gyration is a second-order moment about the mean chain position. It describes the overall spread of the molecule and is defined as the root mean square distance of the collection of atoms from their common centre of gravity:

$$ R_g = \left[ \frac{1}{M} \sum_{i=1}^{N} m_i \left| \mathbf{r}_i - \mathbf{r}_{cm} \right|^2 \right]^{1/2} \tag{2.15} $$

where $M = \sum_i m_i$ is the total mass and $\mathbf{r}_{cm}$ denotes the position of the center of mass of the protein. This measure provides a valuable way to compare our CG method with the experimental data available for the collagen-like system.
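A minimal sketch of eq. 2.15, together with the end-to-end distance mentioned earlier in this section, is given below in plain Python. The three-bead rod is a made-up test configuration, and the function names are illustrative assumptions rather than routines from GROMACS, CM3D or VMD.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of beads."""
    total = sum(masses)
    return tuple(sum(m * c[k] for m, c in zip(masses, coords)) / total
                 for k in range(3))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration, eq. 2.15."""
    total = sum(masses)
    cm = center_of_mass(coords, masses)
    weighted_sq = sum(m * sum((c[k] - cm[k]) ** 2 for k in range(3))
                      for m, c in zip(masses, coords))
    return math.sqrt(weighted_sq / total)


def end_to_end_distance(coords):
    """Distance between the first and the last bead of the chain."""
    return math.dist(coords[0], coords[-1])


# Hypothetical test configuration: three equal-mass beads on a straight line
rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
unit_masses = [1.0, 1.0, 1.0]
print(radius_of_gyration(rod, unit_masses), end_to_end_distance(rod))
```

For a one-bead-per-residue model with equal bead masses the mass weighting drops out, so the same function covers both the atomistic and the CG case.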
The Root Mean Square Fluctuation (RMSF) is a measure of the deviation between the position of particle i and some reference position:

$$ \mathrm{RMSF} = \sqrt{ \frac{1}{T} \sum_{t_j=1}^{T} \left( \mathbf{r}_i(t_j) - \tilde{\mathbf{r}}_i \right)^2 } \tag{2.16} $$

where T is the time over which one wants to average, and $\tilde{\mathbf{r}}_i$ is the reference position of particle i. Typically this reference position will be the time-averaged position of the same particle i, i.e. $\bar{\mathbf{r}}_i$. Note that, instead of averaging over the particles (as in RMSD), RMSF averages over the simulation time, giving a value for each particle i, usually the Cα atoms.
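The per-particle nature of the RMSF is easy to see in code. The sketch below takes a trajectory stored as a list of frames, each frame being a list of (x, y, z) tuples, and uses the time-averaged position as the reference; the data layout, the function name and the two-frame toy trajectory are assumptions for this example, and no structural alignment of the frames is performed.

```python
import math


def rmsf(trajectory):
    """Per-particle RMSF about the time-averaged position (eq. 2.16)."""
    n_frames = len(trajectory)
    n_particles = len(trajectory[0])
    values = []
    for i in range(n_particles):
        # Time-averaged position of particle i, used as the reference
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared displacement from that reference over all frames
        msd = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy trajectory: particle 0 is fixed, particle 1 hops between x = -1 and x = +1
toy_trajectory = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))
```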
3 Results
The results of the MD simulations described in chapter 2 are presented here. We analyse the bond distances, bend and dihedral angles, and order parameters obtained from the atomistic simulations and use them to fit the adapted CG model for the collagen-like block. We then present the improved CG model and compare its results with the atomistic simulations. Lastly, we summarize the results obtained from the adapted model by analysing the order parameters.
3.1 Atomistic Simulations
Atomistic simulations of the short peptides provided enough information about the bond, bend and dihedral distributions, while the 30-residue collagen-like simulation revealed several details about the dynamics of the protein. The bond distance distributions (see fig. 3.1 (left)), strongly peaked around 3.84 ± 0.12 Å, represent the distance between Cα atoms of subsequent amino acids and justify the use of a stiff harmonic potential for the bonds. Also based on the distributions calculated from the atomistic simulations, the rather narrow distribution of the bend angles (see fig. 3.1 (right)) justifies the same treatment as in the original HG model. The dihedral angles between four subsequent Cα atoms show either periodic behaviour with two minima (flexible) or harmonic behaviour with one minimum (stiff), which led to an extension of the model so that these details can be taken into account.
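The link between a narrow bond distribution and a stiff spring can be made explicit with a standard textbook relation, given here only as a hedged aside and not as the fitting procedure used in this work. For an ideal harmonic bond in thermal equilibrium at temperature $T$, and neglecting the Jacobian factor of the radial coordinate, the Boltzmann distribution of the bond length is Gaussian, so its variance $\sigma_b^2$ fixes the spring constant:

```latex
P(b) \;\propto\; \exp\!\left[ -\frac{k_b\,(b - b_0)^2}{2 k_B T} \right]
\quad\Longrightarrow\quad
\sigma_b^2 = \frac{k_B T}{k_b}
\quad\Longleftrightarrow\quad
k_b = \frac{k_B T}{\sigma_b^2}
```

Here $k_B$ is the Boltzmann constant and $\sigma_b$ denotes the standard deviation of the bond-length distribution (a symbol introduced for this aside). The smaller the observed width, the larger the effective $k_b$.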
Figure 3.1: Bond distance (left) and bend angle (right) distributions obtained from an all-atom simulation of the short peptide A1.
The dihedral distributions shown in fig. 3.2 provide three examples of rather different potentials, which were fitted with either a cosine expansion or a harmonic potential depending on the position of proline in the dihedral angle.
Figure 3.2: Negative logarithm of the dihedral distributions of the sequences LNPL (left), PNLP (center) and LLNL (right) obtained from all-atom simulations of the short peptides, in the minimalist four-letter description. The distribution clearly changes according to the relative position of the proline amino acid in the sequence.
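The negative logarithm of a sampled dihedral distribution can be obtained from a simple histogram. The sketch below is one possible way to do this, assuming angles in degrees in the range [-180, 180) and a normalized probability density; the bin count, the function name and the toy data are illustrative choices, not the settings used to produce fig. 3.2.

```python
import math


def negative_log_distribution(angles_deg, n_bins=36):
    """Histogram dihedral angles and return (bin centers, -ln P) per bin.

    Empty bins are assigned +infinity, since -ln(0) diverges.
    """
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for angle in angles_deg:
        # Wrap the angle into [0, 360) relative to -180 and locate its bin
        index = int((angle + 180.0) % 360.0 // width)
        counts[min(index, n_bins - 1)] += 1  # guard against rounding at the edge
    total = len(angles_deg)
    centers = [-180.0 + (i + 0.5) * width for i in range(n_bins)]
    values = [-math.log(c / (total * width)) if c > 0 else math.inf
              for c in counts]
    return centers, values


# Toy data: three samples near -175 degrees and one near +5 degrees
toy_angles = [-175.0, -175.0, -175.0, 5.0]
centers, values = negative_log_distribution(toy_angles)
print(centers[0], values[0], centers[18], values[18])
```

The more populated bin has the lower value, so minima of the resulting curve correspond to the preferred dihedral angles.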
After obtaining all the required parameters for the CG model, we present the results for the 30-residue peptide, which was simulated for 60 ns to provide a reference system for comparison with the new minimalist model. This sequence was simulated following the same procedure adopted for the short pieces of the collagen-like sequence. We chose an intermediate sequence, GNEGQPGQPGQNGQPGEPGSNGPQGSQGNP, to sample as many different amino acids as possible and calculated the order parameters RMSD, $R_g$ and RMSF (see fig. 3.3).
Figure 3.3: RMSD (left), $R_g$ (center) and RMSF (right) calculated for a 60 ns simulation of the 30-residue sequence. The RMSD reaches a plateau after 45 ns, while $R_g$ remains flat during the whole simulation. The x-axis of the RMSF graph represents the Cα atoms; none of the atoms is more likely than the others to find a stable position.
Figure 3.4: Comparison between the atomistic short-peptide and 30-residue CG simulations for the negative logarithm of the dihedral distributions of the sequences LNPL (left), PNLP (center) and LLNL (right). The fits for the PNLP and LNPL sequences are good, but the agreement for the LLNL potential suggests some steric hindrance, or possibly insufficient sampling of the phase space.
3.3 Analysis of the collagen-like block
The analysis of all the dihedral angles in the atomistic simulations showed a strong sequence dependence of the distributions, leading us to adapt the original Head-Gordon model to achieve a more accurate description of our collagen-like proteins. From our simulations, we concluded that the high concentration of proline randomly spread in the sequence plays an important role in the stiffness of the dihedral angles between four consecutive Cα atoms, and therefore proline must be taken into account as a separate flavour. We thus characterized the dihedral distributions using four flavours: hydrophilic (L), hydrophobic (B), neutral (N) and proline (P). The relation between the amino acids present in the collagen-like sequence and their four-letter minimalist codes is given in Table 2.2.
After taking the negative logarithm of the dihedral distributions, we fitted them with either a cosine expansion ($\sum_{k=0}^{6} A_k \cos^k(\phi)$) or a parabolic function ($B_0 - \frac{1}{2} B_1 (\phi - \phi_0)^2$), according to the position of the proline. The observations on the dihedral stiffness are summarized in the table below, where the distributions are divided into groups according to their main characteristics. The first group, where proline is not present, shows a flexible and smooth logarithmic distribution of the dihedral angles. The second group has a proline at the third position, which makes the dihedral angles stiff; the logarithmic distribution is therefore very stiff, with one minimum. The third group has a proline at the second position and, despite its stiffness, can still be fitted by a cosine expansion. The fourth and last group has proline in one or both flanks, which makes the dihedral angles very flexible, and the distribution is therefore smooth.
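Evaluating a fitted cosine expansion is a polynomial evaluation in $\cos\phi$, which the sketch below does with Horner's scheme. The coefficients are those listed for the NNLN sequence in the caption of Figure A.1; treating the angle as given in degrees and setting the energy scale to one are assumptions made for this example.

```python
import math


def cosine_expansion(phi_deg, coefficients, energy_scale=1.0):
    """Evaluate energy_scale * sum_k A_k cos^k(phi) with Horner's scheme."""
    c = math.cos(math.radians(phi_deg))
    acc = 0.0
    for a_k in reversed(coefficients):
        acc = acc * c + a_k
    return energy_scale * acc


# A_0 ... A_5 for the NNLN dihedral, as listed in the caption of Figure A.1
NNLN = [-3.34, 1.51, 2.19, -1.22, -3.34, -0.69]

for phi in (-180, -90, 0, 90, 180):
    print(phi, round(cosine_expansion(phi, NNLN), 3))
```

Because the expansion depends on $\phi$ only through $\cos\phi$, the resulting curve is symmetric about $\phi = 0$, a property worth keeping in mind when comparing it with sampled distributions.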
Analyzing the table 3.1 and the figs. 3.4 above, it is possible to infer some conclusions about the
role that Proline plays in the dihedrals distributions:
1. Proline makes the dihedral angles stiffer when it is on the second or third position from the first
3.4 Dependence of the Rg with the number of residues
We then calculated the radius of gyration from the output of CG simulations run in CM3D for many collagen sizes and plotted $R_g$ as a function of the number of residues on a logarithmic scale in fig. 3.6. There is very good agreement with the experimental value for 400 residues within the statistical error, which serves to validate the adapted HG model for the collagen part of the block copolymer. From the slope of the curve, we find the dependence of the radius of gyration on the number of residues to be $R_g = 1.391\,N^{0.528}$, in good agreement with the Flory exponent (0.583) [62] in a good solvent (which means that the particles effectively repel each other), where N denotes the number of residues.
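Extracting a scaling exponent from such data amounts to a straight-line fit in log-log space. The sketch below shows this with an ordinary least-squares fit written in plain Python. Since the simulated $R_g$ values are not tabulated here, the input is synthetic data generated from the reported law at the chain lengths used in this work, so the fit merely recovers the numbers that were put in; the function name is an assumption.

```python
import math


def fit_power_law(n_values, rg_values):
    """Least-squares fit of ln(Rg) = ln(a) + nu * ln(N); returns (a, nu)."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(rg) for rg in rg_values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    nu = sxy / sxx
    prefactor = math.exp(mean_y - nu * mean_x)
    return prefactor, nu


# Synthetic input only: values generated from the reported fit, not simulation output
chain_lengths = [30, 45, 60, 75, 90, 120, 150, 200]
synthetic_rg = [1.391 * n ** 0.528 for n in chain_lengths]

prefactor, nu = fit_power_law(chain_lengths, synthetic_rg)
print(f"Rg = {prefactor:.3f} * N^{nu:.3f}")
```

With real data the same function would be applied to the time-averaged $R_g$ of each chain length, ideally with error bars propagated into the uncertainty of the exponent.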
Figure 3.6: Log-log dependence of the radius of gyration (in nm) on the number of residues in the collagen-like sequence. The simulation values are plotted together with the experimental result for the 400-residue sequence and fitted with a linear function.
4 Conclusions and Future Perspectives
Finally, we can conclude that the adapted HG model developed for the collagen-like protein can predict the experimentally observed value of the order parameter. Therefore, as stated at the beginning, it is confirmed that the high concentrations of proline and of charged/hydrophilic residues in the sequence play an important role in preventing the structure from folding into any specific state, so that it retains its randomness. The radius of gyration is in very good agreement with experiments, showing a power-law dependence on the number of residues (linear on a logarithmic scale) and providing a good value for the Flory exponent.
In the future, this adapted collagen-like CG model will be combined with the adapted CG model previously developed for the silk part and applied to the whole collagen-silk-collagen block copolymer. This will enable us to study the effect of the collagen-like block on the folding and self-assembly of the silk-like block.
Acknowledgements
Many people contributed to the accomplishment of this work. I pay here special attention to some
of them, not neccerily the most important, but the ones who were essential, in precise moments, to
consolidate this achievement.
First of all I thank God, for having given me the ability to learn and understand, always supplying
me with vitality to face the challenges and keeping up achieving my goals, never allowing me to
surrender, but instead keeping me humble.
I thank my supervisor Peter Bolhuis, for giving me the opportunity to start this Msc. project at the
University of Amsterdam. I also thank Marieke Schor, who co-supervised me during the project, and
made my life much easier with your expertise on protein folding and coarse-grained simulation. I also
thank the Molsim group, Bernd, Francesco, Anna, Grisell, Murat, Wolfgang, Rosanne and Zerihum,
my friend, with whom I had a great time in the course of this project. I also acknowledge Sara cluster
for the computer power provided for the simulations.
I also thank my friends in Amsterdam Pedro, Dimas, Max, Igor, Raquel, Vinicius, Girry, Anthony,
Adrien, and many other students, thanks you all for the great time we had here in Amsterdam. To
my friends in Lyon Roberto, Diego, Rodrigo, Franck, Dorian, Jakub, Alex, Aion, Jana, that made that
short stay in France one of the best periods in my life.
I would like to thank the coordinators of the AtoSim Programme for accepting my application to
this course and Erasmus Mundus for the scholarship.
Finally, I thank all those who contributed directly or indirectly to the accomplishment of this project.
Thank you very much!
APPENDIX A -- Dihedral Angles
Here we present all the dihedral angle distributions analyzed in the collagen-like sequence, as
well as the fitting parameters. Several of them show almost the same behavior
and can be grouped into four categories defined by the position of the proline. These groups are shown
in Table 3.1. All the sequences are listed in the following figures, with the fitting parameters in the captions.
Figure A.1: The fitting coefficients are NNLN: A0=-3.34, A1=1.51, A2=2.19, A3=-1.22, A4=-3.34,
A5=-0.69; LLNL: A0=-3.31, A1=-1.99, A2=-0.64, A3=2.17, A4=-0.14, A5=-0.69; NLLN: A0=-4.25,
A1=0.69, A2=2.27, A3=-0.29, A4=-1.32, A5=0.36.
Figure A.2: The fitting coefficients are BNNL: A0=-3.98, A1=0.63, A2=-0.42, A3=-1.98, A4=0.70,
A5=0.89; NLNB: A0=-1.61, A1=-0.33, A2=-4.59, A3=2.08, A4=1.73, A5=-0.81; PBNL: A0=-2.68,
A1=0.08, A2=1.72, A3=0.55, A4=0.33, A5=-0.11.
Figure A.3: The fitting coefficients are LNPL - NNPL: B0=-5.81, B1=0.051, 0=-50.24; NNPN:
B0=-6.73, B1=0.23, 0=-109.86; NBPN: B0=-6.54, B1=0.28, 0=-115.27.
Figure A.4: The fitting coefficients are BPNL: A0=-2.34, A1=-2.73, A2=-1.08, A3=10.75, A4=-12.81,
A5=5.07; LPNL - LPNN: A0=-2.33, A1=-0.23, A2=-2.84, A3=-0.02; LNLP: A0=-3.73, A1=-2.24,
A2=1.02, A3=1.02, A4=-1.41, A5=2.03.
Figure A.5: The fitting coefficients are NLNP: A0=-4.59, A1=-1.02, A2=2.63, A3=-0.18, A4=-0.43,
A5=0.91; LLNP: A0=-1.61, A1=-0.33, A2=-4.59, A3=2.08, A4=1.73, A5=-0.81; PNLP: A0=-3.72,
A1=-2.25, A2=1.02, A3=1.01, A4=-1.41, A5=2.03.
Bibliography
[1] I. W. Lyo, P. Avouris, Field-Induced Nanometer-Scale to Atomic-Scale Manipulation of Silicon
Surfaces with the STM. Science 253, 173 (1991).
[2] J. Cappello, J. Crissman, M. Dorman, M. Mikolajczak, G. Textor, M. Marquet and F. Ferrari,
Genetic Engineering of Structural Protein Polymers. Biotechnol. Prog. 6, 198 (1990).
[3] M. Haider, Z. Megeed and H. Ghandehari, Genetically engineered polymers: status and
prospects for controlled release. J. Control. Rel. 95, 1 (2004).
[4] R. Langer and D. A. Tirrell, Designing materials for biology and medicine. Nature, 428, 487 (2004).
[5] G. A. Silva, C. Czeisler, K. L. Niece, E. Beniash, D. A. Harrington, J. A. Kessler and S. I.
Stupp, Selective Differentiation of Neural Progenitor Cells by High-Epitope Density Nanofibers.
Science, 303, 1352 (2004).
[6] D.W. Urry, Elastic molecular machines in metabolism and soft-tissue restoration. Trends
Biotechnol. 17, 249 (1999).
[7] D. A. Harrington, E. Y. Cheng, M. O. Guler, L. K. Lee, J. L. Donovan, R. C. Claussen, S. I. Stupp,
Branched peptide-amphiphiles as self-assembling coatings for tissue engineering scaffolds.
J. Biom. Mat. Res. A, 78A, 157 (2006).
[8] J. Cappello, H. Ghandehari, Engineered Protein Polymers for Drug Delivery and Biomedical
Applications. Adv. Drug Deliv. Rev. 54, 1053 (2002).
[9] D. Chitkara, A. Shikanov, N. Kumar, A. J. Domb, Biodegradable Injectable In Situ Depot-
Forming Drug Delivery Systems. Macromol. Biosc. 6, 977 (2006).
[10] C. Park, J. Yoon and E. L. Thomas, Enabling nanotechnology with self assembled block
copolymer patterns. Polymer 44, 6725 (2003).
[11] A. A. Martens, Silk-Collagen-like Block Copolymers with Charged Blocks, self-assembly into
nanosized ribbons and macroscopic gels. PhD Thesis, Wageningen Universiteit, The Netherlands
(2008).
[12] M. W. T. Werten, W. H. Wisselink, T. J. J. van den Bosch, E. C. de Bruin and F. A. de Wolf,
Secreted production of a custom-designed, highly hydrophilic gelatin in Pichia pastoris. Protein
Engineering 14, 447 (2001).
[13] M. T. Krejchi, E. D. T. Atkins, A. J. Waddon, M. J. Fournier, T. L. Mason and D. A. Tirrell,
Chemical Sequence Control Of Sheet Assembly In Macromolecular Crystals Of Periodic
Polypeptides. Science 265, 1427 (1994).
[14] M. Schor, B. Ensing and P. G. Bolhuis, A simple coarse-grained model for self-assembling
silk-like protein fibers. Soft Matter 5, 2658 (2009). DOI: 10.1039/b902952d
[15] M. W. T. Werten, T. J. van den Bosch, R. D. Wind, H. Mooibroek and F. A. de Wolf, High-yield
secretion of recombinant gelatins by Pichia pastoris. Yeast 15, 1087 (1999).
[16] A. A. Martens, G. Portale, M. W. T. Werten, R. J. de Vries, G. Eggink, M. A. C. Stuart and F. A.
de Wolf, Triblock Protein Copolymers Forming Supramolecular Nanotapes and pH-Responsive
Gels. Macromol. 42, 1002 (2009).
[17] M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Oxford University Press.
[18] D. Frenkel and B. Smit, Understanding Molecular Simulation - From Algorithms to
Applications, Academic Press.
[19] D. Bhella, A. Ralph and R. P. Yeo, Conformational Flexibility in Recombinant Measles Virus
Nucleocapsids Visualised by Cryo-negative Stain Electron Microscopy and Real-space Helical
Reconstruction. J. Mol. Biol. 340, 319 (2004).
[20] M. Levitt, A simplified representation of protein conformations for rapid simulation of protein
folding. J. Mol. Biol. 104, 59 (1976).
[21] M. M. Tirion, Large amplitude elastic motions in proteins from a single-parameter, atomic
analysis. Phys. Rev. Lett. 77, 1905 (1996).
[22] S. Kundu, R. L. Jernigan, Molecular mechanism of domain swapping in proteins: an analysis of
slower motions. Biophys. J. 86, 3846 (2004).
[23] Y. Ueda, H. Taketomi and N. Go, Studies on protein folding, unfolding, and fluctuations by
computer simulation. II. A three-dimensional lattice model of lysozyme. Biopolymers, 17, 1531
(1978).
[24] J. A. McCammon, S. H. Northrup, M. Karplus, R. M. Levy, Helix-coil transitions in a simple
polypeptide model. Biopol. 19, 2033 (1980).
[25] S. Brown, N. J. Fawzi, and T. Head-Gordon, Coarse-grained sequences for protein folding and
design. Proc. Natl. Acad. Sci. USA, 2003, 100, 10712-10717.
[26] N. L. Fawzi, E. H. Yap, Y. Okabe, K. L. Kohlstedt, S. P. Brown and T. Head-Gordon, Contrasting
Disease and Nondisease Protein Aggregation by Molecular Simulation. Acc. Chem. Res., 2008,
41 (8), 1037-1047.
[27] I. Bahar, R. L. Jernigan, Inter-residue potentials in globular proteins and the dominance of
highly specific hydrophilic interactions at close separation. J. Mol. Biol. 266, 195 (1997).
[28] A. V. Smith, C. K. Hall, α-Helix formation: discontinuous molecular dynamics on an
intermediate-resolution protein model. Proteins 44, 344 (2001).
[29] A. V. Smith, C. K. Hall, Assembly of a tetrameric α-helical bundle: computer simulations on an
intermediate-resolution protein model. Proteins 44, 376 (2001).
[30] Hess, B., Kutzner, C., van der Spoel, D. and Lindahl, E. (2008) GROMACS 4: Algorithms for
Highly Efficient, Load-Balanced, and Scalable Molecular Simulation, J. Chem. Theory Com-
put., 4, 435-447.
[31] http://www.cmm.upenn.edu/resources/indexsoft.html
[32] R. W. Hockney, S. P. Goel, J. Eastwood, Quiet high-resolution computer models of a plasma. J.
Comp. Phys. 14, 148 (1974).
[33] R. W. Hockney and J. W. Eastwood, Computer Simulations Using Particles. McGraw Hill, New
York (1981).
[34] S. Auerbach and A. Friedman. Long-term behaviour of numerically computed orbits: Small and
intermediate timestep analysis of one-dimensional systems. J. Comput. Phys. 93(1), 189 (1991).
[35] W. Wang, O. Donini, C. M. Reyes, P. A. Kollman, Biomolecular Simulations: Recent
Developments in Force Fields, Simulations of Enzyme Catalysis, Protein-Ligand, Protein-
Protein, and Protein-Nucleic Acid Noncovalent Interactions. Annu. Rev. Biophys. Biom. 30,
211 (2001).
[36] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C.
Spellmeyer, T. Fox, J. W. Caldwell, P. A. Kollman, A Second Generation Force Field for the
Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 117, 5179
(1995).
[37] A. D. MacKerell Jr., D. Bashford, M. Bellott, R. L. Dunbrack Jr., J. D. Evanseck, M. J. Field, S.
Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C.
Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, B. Roux, M. Schlenkrich,
J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin and M. Karplus,
All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J.
Phys. Chem. B 102, 3586 (1998).
[38] M. Christen, P. H. Hünenberger, D. Bakowies, R. Baron, R. Bürgi, D. P. Geerke, T. N. Heinz,
M. A. Kastenholz, V. Kräutler, C. Oostenbrink, C. Peter, D. Trzesniak, W. F. van Gunsteren,
The GROMOS software for biomolecular simulation: GROMOS05. J. Comput. Chem. 26, 1719
(2005).
[39] G. A. Kaminski, R. A. Friesner, J. Tirado-Rives and W. L. Jorgensen, Evaluation and
Reparametrization of the OPLS-AA Force Field for Proteins via Comparison with Accurate
Quantum Chemical Calculations on Peptides. J. Phys. Chem. B 105, 6474 (2001).
[40] R. Koradi, M. Billeter and K. Wüthrich, MOLMOL: A program for display and analysis of
macromolecular structures. J. Mol. Graphics 14, 51 (1996).
[41] V. Humblot, C. Methivier and C. M. Pradier, Adsorption of L-Lysine on Cu(110): A RAIRS Study
from UHV to the Liquid Phase. Langmuir 22, 3089 (2006).
[42] D. van der Spoel, P. J. van Maaren and H. J. C. Berendsen, A systematic study of water models
for molecular simulation: Derivation of water models optimized for use with a reaction field. J.
Chem. Phys. 108, 10220 (1998).
[43] B. Hess, H. Bekker, H. J. C. Berendsen, J. G. E. M. Fraaije, LINCS: A Linear Constraint Solver
for Molecular Simulations. J. Comp. Chem. 18, 1463 (1997).
[44] T. Darden, D. York, L. Pedersen, Particle mesh Ewald: An N-log(N) method for Ewald sums in
large systems. J. Chem. Phys. 98, 10089 (1993).
[45] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, L. G. Pedersen, A smooth particle
mesh Ewald potential. J. Chem. Phys. 103, 8577 (1995).
[46] K. Zimmerman, All purpose molecular mechanics simulator and energy minimizer. J. Comp.
Chem. 12, 310 (1991).
[47] M. Parrinello, A. Rahman, Polymorphic transitions in single crystals: A new molecular dynamics
method. J. Appl. Phys. 52, 7182 (1981).
[48] Nose, S., Klein, M. L. Constant pressure molecular dynamics for molecular systems. Mol. Phys.
50, 1055-1076 (1983).
[49] S. Nose, A unified formulation of the constant temperature molecular dynamics methods. J.
Chem. Phys. 81, 511 (1984).
[50] W. G. Hoover, Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A 31,
1695 (1985).
[51] W. G. Hoover, Constant-pressure equations of motion. Phys. Rev. A, 34, 2499 (1986).
[52] J. Juraszek and P. G. Bolhuis, Sampling the multiple folding mechanisms of Trp-cage in explicit
solvent. Proc. Natl. Acad. Sci. 103, 15859 (2006).
[53] Z. Guo and D. Thirumalai, Kinetics and Thermodynamics of Folding of a de novo Designed four
Helix Bundle J. Mol. Biol. 263, 323 (1996).
[54] H. C. Andersen, Rattle: A velocity version of the shake algorithm for molecular dynamics
calculations. J. Comput. Phys. 52, 24 (1983).
[55] A. V. Smith and C. K. Hall, Protein refolding versus aggregation: computer simulations on an
intermediate-resolution protein model. J. Mol. Biol., 2001, 312, 187-202.
[56] V. Tozzini, Coarse-grained models for proteins. Curr. Opin. Struct. Biol., 2005, 15, 144-50.
[57] H. M. König and A. F. M. Kilbinger, Learning from Nature: β-Sheet-Mimicking Copolymers Get
Organized. Angew. Chem. Int. Ed., 2007, 46, 8334-8340.
[58] Nomenclature and Symbolism for Amino Acids and Peptides.
IUPAC-IUB Joint Commission on Biochemical Nomenclature. 1983.
http://www.chem.qmul.ac.uk/iupac/AminoAcid/AA1n2.html. Retrieved on 2008-11-17.
[59] I. W. Lyo and P. Avouris. Field-Induced Nanometer- to Atomic-Scale Manipulation of Silicon
Surfaces with the STM. Science 253, 173 (1991).
[60] Galo J. de A. A. Soler-Illia, Clément Sanchez, Bénédicte Lebeau, and Joël Patarin, Chemical
Strategies To Design Textured Materials: from Microporous and Mesoporous Oxides to Nanonetworks
and Hierarchical Structures. Chem. Rev. 102, 4093 (2002).
[61] M. W. T. Werten, W. H. Wisselink, T. J. Jansen-van den Bosch, E. C. de Bruin and F. A. de Wolf,
Secreted production of a custom-designed, highly hydrophilic gelatin in Pichia pastoris. Protein
Engineering 14(6), 447 (2001).
[62] P. J. Flory, Principles of Polymer Chemistry. Cornell University Press, Ithaca, New York (1953).
[63] S. Park, F. Khalili-Araghi, E. Tajkhorshid and K. Schulten, Free energy calculation from steered
molecular dynamics simulations using Jarzynski's equality. J. Chem. Phys. 119, 3559 (2003).
[64] S. Park and K. Schulten, Calculating potentials of mean force from steered molecular dynamics
simulations. J. Chem. Phys. 120, 5946 (2004).