Structure-based computer-aided drug design

Prof. Thanh N. Truong

University of Utah

Astonis LLC

Institute of Computational Science and Technology

The Drug Discovery Process

Drug Target Identification

Target Validation

Lead Identification

Lead Optimization

Pre-clinical & Clinical Development

FDA Review

Developing a new drug takes about 15 years, costs around 880 million USD, and involves ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004).

[Figure labels: 25,000; metabolite]

Genomics Facts

• Around 99% of our genes have counterparts in mice

• Our genetic overlap with chimpanzees is about 97.5%

• The genetic difference between one person and another is less than 0.1%

• But because only a few regions of DNA actively encode life functions, the real difference between one person and another is only 0.0003%

It is becoming increasingly evident that the complexity of biological systems lies at the level of the proteins, and that genomics alone will not suffice to understand these systems.

Structure-based Computer-Aided Drug Design

Drug Target Identification

Target Validation

Lead Identification

Lead Optimization

Pre-clinical & Clinical Development

FDA Review

• Shorten development time to Lead Identification

• Reduce cost

Past Successes

1. HIV protease inhibitor amprenavir (Agenerase) from Vertex & GSK (Kim et al. 1995)

2. HIV protease inhibitor nelfinavir (Viracept) by Pfizer (& Agouron) (Greer et al. 1994)

3. Influenza neuraminidase inhibitor zanamivir (Relenza) by GSK (Schindler 2000)

Unknown 3D structure for the target protein

Known 3D structure

PDB databank

Target Model

Homology modeling or Protein Structure Prediction

Docking Simulation

MD simulations

Trajectories

Cluster Analysis

Scoring

Analysis

Homology Modeling

Prediction of protein tertiary structure from the known sequence

Docking

Determine the optimal binding structure of a ligand (a drug candidate, a small molecule) to a receptor (a drug target, a protein or DNA) and quantify the strength of the ligand-receptor interaction.

The Problem:

1. Where will the ligand bind?

2. How will it bind?

3. How strongly?

4. Why?

5. What makes one ligand bind to the receptor better than the others?

6. ????

The Challenge

• Ligand and receptor are conformationally flexible.

• Receptor may have more than one possible binding site.

• Weak short-range interactions: hydrogen bonds, salt bridges, hydrophobic contacts, electrostatics, van der Waals repulsions → surface complementarity.

• Binding affinity is a free-energy difference relative to the uncomplexed state; solvation and desolvation play an important role.

• Binding affinity describes an ensemble of complexes, not a single one.

Large protein conformation change

Flexibility of residues in the binding site

Bound waters

Orientation of Ligand

Binding Affinity

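Free energy of binding:

$$\Delta G^{b}_{s} = \Delta H - T\Delta S = -RT \ln K_a$$

where \Delta H is the enthalpy term and T\Delta S is the entropy term.

Association equilibrium constant:

$$K_a = \frac{[RL]_s}{[R]_s\,[L]_s}$$

[Figure: thermodynamic cycle. R + L → RL in the gas phase (\Delta G^{b}_{g}) and in solution (\Delta G^{b}_{s}), connected by the solvation free energies \Delta G_{solv}(R), \Delta G_{solv}(L), and \Delta G_{solv}(RL).]

From the thermodynamic cycle:

$$\Delta G^{b}_{s} = \Delta G^{b}_{g} + \Delta G_{solv}(RL) - \left\{ \Delta G_{solv}(R) + \Delta G_{solv}(L) \right\}$$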
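As a numerical illustration of the two relations on this slide, the minimal sketch below converts an association constant into a binding free energy and evaluates the thermodynamic cycle. The function names, the 298.15 K temperature, the 1 M standard state, and the sample numbers are assumptions made for illustration; they are not values from the slides.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(k_a, temperature=298.15):
    """Return dG = -RT ln(K_a) in kcal/mol, with K_a in 1/M (1 M standard state)."""
    if k_a <= 0:
        raise ValueError("K_a must be positive")
    return -R_KCAL * temperature * math.log(k_a)


def solution_binding_energy(dg_gas, dg_solv_complex, dg_solv_receptor, dg_solv_ligand):
    """Thermodynamic cycle: dG_s = dG_g + dGsolv(RL) - [dGsolv(R) + dGsolv(L)]."""
    return dg_gas + dg_solv_complex - (dg_solv_receptor + dg_solv_ligand)


# Hypothetical ligand with K_d = 1 nM, so K_a = 1 / K_d = 1e9 1/M.
dg = binding_free_energy(1.0e9)
print(f"dG = {dg:.2f} kcal/mol")
```

For a nanomolar binder at room temperature this gives roughly -12.3 kcal/mol, which shows how a small free-energy error translates into a large error in the predicted binding constant.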

Docking Process

Descriptions of the receptor 3D structure, binding site and ligand

Sampling of the configuration space of the binding complex

Evaluating free energy of binding for scoring

Local/global minimum

Ensemble of protein structures and/or multiple ligands

Multiple binding configurations for a single protein structure and a ligand

Description of Receptor 3D Structure

• Known 3D protein structures from the Protein Data Bank (PDB) (http://www.rcsb.org/pdb)

• Locations of hydrogen atoms, bound water molecules, and metal ions are either not known or highly uncertain.

• Identities and locations of some heavy atoms are uncertain (e.g., ~1/6 of N/O of Asn & Gln and N/C of His are incorrectly assigned in the PDB; up to 0.5 Å uncertainty in position).

• Conformational flexibility of proteins is not known.

• Homology models from highly similar sequences with known structures

Critical analysis of the receptor structure is needed before docking: resolution, missing residues, bound waters and ions, protonation states, etc.

Descriptions of Binding Site

• Known binding site (the PDB has about 6000 protein-ligand complexes)
   • Atomistic based
      o Receptor atomic coordinates and location of a binding box
   • Descriptor based
      o Surface
      o Volume
      o Points & distances, bond vectors
      o Grid and various properties such as electrostatic potential, hydrophobic moment, polar, nonpolar, atom types, etc.

• Unknown binding site
   • Blind docking with the binding box covering the entire receptor (computationally expensive)
   • Better method for finding potential binding sites is needed
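The binding box mentioned above can be sketched as an axis-aligned box around a set of atoms: around a co-crystallized ligand for a known site, or around all receptor atoms for blind docking. The function name, the 4 Å padding, and the coordinates are hypothetical choices for illustration and do not come from any particular docking program.

```python
def binding_box(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box enclosing coords plus padding.

    coords is a list of (x, y, z) tuples in angstroms; padding is added on every side.
    """
    if not coords:
        raise ValueError("need at least one atom")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2.0 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Hypothetical ligand coordinates (angstroms), not taken from a real structure.
ligand_atoms = [(10.0, 12.0, 8.0), (12.0, 14.0, 9.0), (11.0, 13.0, 11.0)]
center, size = binding_box(ligand_atoms, padding=4.0)
print(center, size)
```

Passing every receptor atom instead of the ligand atoms yields the blind-docking box; its much larger volume is what makes that mode computationally expensive.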

Ligand Chemical Space

• National Cancer Institute (NCI) public database (http://129.43.27.140/ncidb2/)
   • About 250 K 3D structures

• ZINC public database (http://zinc.docking.org/)
   • About 8 million 3D structures

• PubChem public database (http://pubchem.ncbi.nlm.nih.gov/)
   • About 19 million entries (but no 3D structures)

• Cambridge Structural Database (CSD)
   • About 3 million crystal structures

• Chemical Abstracts Service (CAS) and SciFinder

• Several other smaller databases …

Atomic partial charges from MM force fields or MO calculations must be added to each molecule for evaluation of the score function.

Different Approaches in Docking

The complete conformation and configuration space is too large, so different approaches were developed for effective sampling of the receptor-ligand configuration space.

Manual

• User-interactive force feedback through haptic devices

Automated: Descriptor Matching

• Use pattern-recognizing geometric methods to match ligand and receptor site descriptors
• Ligand flexibility is limited
• Receptor is rigid
• Accuracy is not very good (not discriminative)
• Fast

Automated: Simulation-based (Focus)

• Use simulation methods to sample the local configuration space: MC-Simulated Annealing, Genetic Algorithm. An ensemble of starting orientations must be run for accurate statistics
• Ligand and protein flexibility can be considered
• Free energy of binding is evaluated
• Accuracy is good
• Time consuming
• Grid map is often used to speed up energy evaluations
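The grid-map idea can be sketched as follows: an interaction energy is precomputed once at every point of a regular grid, and during sampling the energy at an arbitrary ligand atom position is obtained by trilinear interpolation instead of a sum over all receptor atoms. The function names and the toy energy term are assumptions for illustration; a real docking program uses force-field terms per atom type.

```python
import math


def toy_energy(point, receptor_atoms):
    """Hypothetical smooth attraction; a stand-in for a real force-field term."""
    total = 0.0
    for atom in receptor_atoms:
        dist2 = sum((point[d] - atom[d]) ** 2 for d in range(3))
        total += -1.0 / (1.0 + dist2)
    return total


def build_grid(receptor_atoms, origin, spacing, shape, energy_fn):
    """Precompute energy_fn at every grid point; returns nested lists grid[i][j][k]."""
    nx, ny, nz = shape
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                p = (origin[0] + i * spacing,
                     origin[1] + j * spacing,
                     origin[2] + k * spacing)
                grid[i][j][k] = energy_fn(p, receptor_atoms)
    return grid


def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation of the grid at a point inside the grid volume."""
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    frac = [(point[d] - origin[d]) / spacing for d in range(3)]
    # Lower corner of the enclosing cell, clamped so that idx + 1 stays in range.
    idx = [min(max(int(math.floor(frac[d])), 0), dims[d] - 2) for d in range(3)]
    t = [frac[d] - idx[d] for d in range(3)]
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((t[0] if di else 1.0 - t[0])
                          * (t[1] if dj else 1.0 - t[1])
                          * (t[2] if dk else 1.0 - t[2]))
                value += weight * grid[idx[0] + di][idx[1] + dj][idx[2] + dk]
    return value


# Hypothetical two-atom "receptor" and a 4 x 4 x 4 grid with 0.5 angstrom spacing.
atoms = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
origin, spacing = (0.0, 0.0, 0.0), 0.5
grid = build_grid(atoms, origin, spacing, (4, 4, 4), toy_energy)
print(interpolate(grid, origin, spacing, (0.3, 0.7, 1.2)))
```

The cost of building the grid is paid once per receptor, after which each energy lookup touches only eight grid values regardless of receptor size.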

MC-Simulated Annealing Method

1. Randomly change the receptor flexible residues, ligand position, orientation, and/or conformation.

2. Evaluate the new energy, Enew.

3. Is Enew < Eold?
   YES: accept the new move.
   NO: accept the new move with probability P = exp{-∆E/kBT}.

4. If the move is accepted, set Eold ← Enew.

5. Is N(accept or reject) > Nlimit?
   NO: return to step 1.
   YES: reduce the temperature and repeat from step 1 until done.
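The flowchart above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used by any docking program: the temperature is in units where kB = 1, the geometric cooling factor, the fixed number of trial moves per temperature, and the two-dimensional toy energy are all hypothetical choices, and the state is a plain tuple rather than a ligand pose.

```python
import math
import random


def simulated_annealing(energy, state, propose, t_start=10.0, t_end=0.01,
                        cooling=0.9, n_limit=100, rng=None):
    """Monte Carlo simulated annealing with the Metropolis acceptance rule."""
    rng = rng or random.Random(0)
    e_old = energy(state)
    best_state, best_e = state, e_old
    temperature = t_start
    while temperature > t_end:
        for _ in range(n_limit):  # accepted plus rejected moves at this temperature
            candidate = propose(state, rng)
            e_new = energy(candidate)
            delta = e_new - e_old
            # Downhill moves are always accepted; uphill moves with P = exp(-dE/T).
            if delta < 0 or rng.random() < math.exp(-delta / temperature):
                state, e_old = candidate, e_new
                if e_new < best_e:
                    best_state, best_e = candidate, e_new
        temperature *= cooling  # reduce the temperature
    return best_state, best_e


def toy_energy(s):
    """Hypothetical energy surface with its minimum at (2, -1)."""
    return (s[0] - 2.0) ** 2 + (s[1] + 1.0) ** 2


def random_step(s, rng):
    """Propose a small random displacement of the current state."""
    return (s[0] + rng.uniform(-0.5, 0.5), s[1] + rng.uniform(-0.5, 0.5))


best_state, best_energy = simulated_annealing(toy_energy, (0.0, 0.0), random_step)
print(best_state, best_energy)
```

In a docking setting the state would hold the ligand position, orientation, and torsions, and `propose` would perturb one or more of them; several independent runs from different starting orientations are then compared.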

Genetic Algorithm

Living organisms

• Made up of cells

• Each cell has the same set of chromosomes (DNA)

• Genome: the set of all chromosomes

• A chromosome consists of genes

• Genotype: a particular set of genes

• Each gene encodes a protein (a trait)

• Each gene has a location in the chromosome (locus)

• Reproduction by cross-over and mutation

Darwin's Theory of Evolution

Genetic Algorithm for Docking

Gene 1 (Position) | Gene 2 (Orientation) | Gene 3 (Torsional angles)

x1 y1 z1 | φ1 ψ1 ω1 | τ1 τ2 τ3 τ4 (Chromosome 1)

x2 y2 z2 | φ2 ψ2 ω2 | τ1’ τ2’ τ3’ τ4’ (Chromosome 2)

A chromosome is a possible solution: binding position, orientation, and values of all rotatable torsional angles.

A cell is a set of possible solutions, i.e. chromosomes. Typical population = 100-200.

Fitness Test

Translates genotypes to phenotypes (receptor-ligand complex structures) for binding free energy evaluation.

Select best parents

Those with large negative ∆G binding

Generate new generation

• Migration: Move the best genes to the next generation.

• Cross-over: Exchange a set of genes from one parent chromosome to another. Typical cross-over rate = 80-90%.

• Mutation: Randomly change a value of a gene, i.e. position, orientation, or torsional values. Typical mutation rate = 0.5-1%.
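One generation of the scheme above can be sketched as follows. This is an illustration under stated assumptions: the chromosome is a dictionary of three genes, the energy is a toy quadratic standing in for the binding free energy, and the elite count, parent pool, and step size are hypothetical. The cross-over and mutation rates are taken from the typical ranges quoted on the slide.

```python
import math
import random

GENES = ("position", "orientation", "torsions")


def random_chromosome(rng, n_torsions=4, box=5.0):
    """One candidate solution: position, orientation, and torsion values."""
    return {
        "position": [rng.uniform(-box, box) for _ in range(3)],
        "orientation": [rng.uniform(-math.pi, math.pi) for _ in range(3)],
        "torsions": [rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)],
    }


def copy_chromosome(c):
    return {g: list(c[g]) for g in GENES}


def exchange_genes(p1, p2, rng):
    """Cross-over: swap one randomly chosen gene between two parents."""
    c1, c2 = copy_chromosome(p1), copy_chromosome(p2)
    gene = rng.choice(GENES)
    c1[gene], c2[gene] = c2[gene], c1[gene]
    return c1, c2


def mutate(c, rng, rate=0.01, step=0.3):
    """Mutation: perturb each value with probability `rate`."""
    for g in GENES:
        c[g] = [v + rng.gauss(0.0, step) if rng.random() < rate else v for v in c[g]]
    return c


def toy_energy(c):
    """Stand-in for the binding free energy; lower is better, minimum at zeros."""
    return sum(v * v for g in GENES for v in c[g])


def next_generation(pop, energy, rng, n_elite=2, crossover_rate=0.8, mutation_rate=0.01):
    ranked = sorted(pop, key=energy)                          # fitness test
    new_pop = [copy_chromosome(c) for c in ranked[:n_elite]]  # migration of the best
    parents = ranked[: max(2, len(ranked) // 2)]              # select best parents
    while len(new_pop) < len(pop):
        p1, p2 = rng.sample(parents, 2)
        if rng.random() < crossover_rate:
            c1, c2 = exchange_genes(p1, p2, rng)
        else:
            c1, c2 = copy_chromosome(p1), copy_chromosome(p2)
        new_pop.extend([mutate(c1, rng, mutation_rate), mutate(c2, rng, mutation_rate)])
    return new_pop[: len(pop)]


rng = random.Random(1)
population = [random_chromosome(rng) for _ in range(100)]
best_history = []
for _ in range(30):
    population = next_generation(population, toy_energy, rng)
    best_history.append(min(toy_energy(c) for c in population))
print(best_history[0], best_history[-1])
```

Because the elite chromosomes are copied unchanged, the best energy in the population can never get worse from one generation to the next.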

Two-point Cross-over Operator

x1 y1 z1 φ1 ψ1 ω1 τ1 τ2 τ3 τ4 Parent 1

x2 y2 z2 φ2 ψ2 ω2 τ1’ τ2’ τ3’ τ4’ Parent 2

x2 y2 z2 φ1 ψ1 ω1 τ1 τ2 τ3 τ4 Child 1

x1 y1 z1 φ2 ψ2 ω2 τ1’ τ2’ τ3’ τ4’ Child 2

Swap positions
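The operator illustrated above can be written as a short function on flat chromosomes. The sketch below reproduces the slide's example by choosing the two cut points so that the position gene (the first three values) is swapped; the function name and the ASCII spellings of the Greek symbols are illustrative choices.

```python
def two_point_crossover(parent1, parent2, start, end):
    """Swap the segment [start, end) between two equal-length chromosomes."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if not 0 <= start <= end <= len(parent1):
        raise ValueError("cut points out of range")
    child1 = parent1[:start] + parent2[start:end] + parent1[end:]
    child2 = parent2[:start] + parent1[start:end] + parent2[end:]
    return child1, child2


parent1 = ["x1", "y1", "z1", "phi1", "psi1", "omega1", "tau1", "tau2", "tau3", "tau4"]
parent2 = ["x2", "y2", "z2", "phi2", "psi2", "omega2", "tau1'", "tau2'", "tau3'", "tau4'"]

# Cut points 0 and 3 swap the position gene, as in the slide.
child1, child2 = two_point_crossover(parent1, parent2, 0, 3)
print(child1)
print(child2)
```

Choosing cut points on gene boundaries keeps position, orientation, and torsion values together, so a child never inherits half of a position from each parent.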

Lamarckian Genetic Algorithm -- AutoDock

1. Mutation and cross-breeding to generate new genotype → generate new possible ligand binding configuration.

2. Transfer to phenotype to evaluate fitness → forming receptor-ligand configuration.

3. Adapt to the local environment to improve fitness → local minimization.

4. Transfer back to genotype for future generations → save the optimized ligand binding configuration for future generations.

Environmental adaptation of an individual’s phenotypic characteristics acquired during its lifetime can become heritable traits → Survival of the fittest.

Environmental adaptation

Transfer to genotype for future generations, i.e. heritable traits.

Morris et al., J. Comp. Chem. 1998, 19, 1639
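The Lamarckian step (local optimization whose result is written back into the genotype) can be sketched as below. The local search here is a simple random-perturbation hill climb chosen for brevity, not the local search method used in AutoDock, and the function names, step size, and the fraction of individuals refined per generation are assumptions for illustration.

```python
import random


def local_search(genes, energy, rng, n_steps=50, step=0.1):
    """Hill climb: keep a random perturbation only if it lowers the energy."""
    best, best_e = list(genes), energy(genes)
    for _ in range(n_steps):
        trial = [g + rng.uniform(-step, step) for g in best]
        trial_e = energy(trial)
        if trial_e < best_e:
            best, best_e = trial, trial_e
    return best, best_e


def lamarckian_update(population, energy, rng, ls_fraction=0.06):
    """Refine a random subset and store the refined values as the new genotype."""
    updated = []
    for genes in population:
        if rng.random() < ls_fraction:
            genes, _ = local_search(genes, energy, rng)  # acquired trait is inherited
        updated.append(list(genes))
    return updated


def toy_energy(genes):
    """Hypothetical energy; lower is better."""
    return sum(v * v for v in genes)


rng = random.Random(3)
population = [[rng.uniform(-2.0, 2.0) for _ in range(10)] for _ in range(20)]
refined = lamarckian_update(population, toy_energy, rng, ls_fraction=1.0)
print(min(map(toy_energy, population)), min(map(toy_energy, refined)))
```

A purely Darwinian variant would use the locally minimized energy only for ranking and discard the refined coordinates; writing them back is what makes the scheme Lamarckian.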

Genetic Algorithm – Local Search

GA

LS

Scoring Functions

[Figure: -Score plotted against complex configurations, with the experimentally observed complex marked.]

GOAL: Fast & Accurate

Force-field-based function (Focus)

• Score = -∆Gbinding
• Has physical basis
• Fast with pre-computed grid

Empirical function

Multivariate regression fit of physically motivated structural functions to experimentally known complexes with measured binding affinity

Knowledge-based function

Statistical pair potential derived from known complex structures

Descriptor-based function

Based on chemical properties, pharmacophore, contact, shape complementarity

Force Field Based Scoring Function

$$\Delta G^{b}_{s} = C_{vdw}\,\Delta G_{vdw} + C_{ele}\,\Delta G_{ele} + C_{hb}\,\Delta G_{hb} + C_{tor}\,\Delta G_{tor} + C_{solv}\,\Delta G_{solv}$$

Coefficients are empirically determined using linear regression analysis from a set of protein-ligand complexes from LPDB with known experimental binding constants.

$$\mathrm{Score} = -\Delta G^{b}_{s}$$
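The weighted sum above is simple to evaluate once the individual energy terms are available. In the sketch below the coefficient values and the example energy terms are hypothetical placeholders chosen only to make the arithmetic concrete; real coefficients come from the regression fit described on the slide.

```python
# Hypothetical coefficients for illustration only; not fitted values.
COEFFS = {"vdw": 0.15, "ele": 0.10, "hb": 0.12, "tor": 0.30, "solv": 0.13}


def weighted_binding_energy(terms, coeffs=COEFFS):
    """dG_bind = sum over k of C_k * dG_k for k in vdw, ele, hb, tor, solv."""
    missing = set(coeffs) - set(terms)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(coeffs[k] * terms[k] for k in coeffs)


def score(terms, coeffs=COEFFS):
    """Score = -dG_bind, so a stronger binder gets a higher score."""
    return -weighted_binding_energy(terms, coeffs)


# Hypothetical unweighted energy terms (kcal/mol) for one pose.
pose_terms = {"vdw": -40.0, "ele": -10.0, "hb": -5.0, "tor": 4.0, "solv": 6.0}
print(weighted_binding_energy(pose_terms), score(pose_terms))
```

With these placeholder numbers the favorable van der Waals, electrostatic, and hydrogen-bond terms are partly offset by the torsional and desolvation penalties.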

Analyses

Energy histogram

Distribution of binding energies → average binding energy

Clustering analysis

Distribution of binding modes → different binding sites and ligand binding orientations
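Both analyses can be sketched on a list of docked poses. The greedy scheme below visits poses from lowest to highest energy and assigns each to the first cluster whose seed lies within an RMSD cutoff. The 2.0 Å cutoff, the function names, and the coordinates and energies are assumptions for illustration, and the RMSD is computed without structural alignment or symmetry handling because docked poses share the receptor frame.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two poses with matching atom order."""
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))


def cluster_poses(poses, energies, cutoff=2.0):
    """Greedy clustering; each cluster is a list of pose indices, seed first."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for i in order:
        for members in clusters:
            if rmsd(poses[i], poses[members[0]]) <= cutoff:
                members.append(i)
                break
        else:  # no existing cluster is close enough, so start a new one
            clusters.append([i])
    return clusters


def mean_energy(energies):
    return sum(energies) / len(energies)


# Hypothetical two-atom poses (angstroms) and binding energies (kcal/mol).
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(10.0, 0.5, 0.0), (11.0, 0.5, 0.0)],
]
energies = [-9.0, -8.5, -7.0, -7.2]

clusters = cluster_poses(poses, energies)
print(clusters, mean_energy(energies))
```

Here the four poses fall into two clusters, corresponding to two distinct binding sites, and each cluster is seeded by its lowest-energy member.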

Docking with Science Community Laboratory

Identify a target

Millions of molecules from ZINC database

Docking simulation with AutoDock Vina

Rank according to binding energy

Date post:	16-May-2015