
Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols

Date post: 20-Jan-2018
Upload: chester-hubbard
Page 1: Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols ä Structure-Based…

Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols

- Structure-Based Virtual Screening (SBVS) is a proven technique for lead discovery
- Still many areas for improvement
  - Many efforts focussed on scoring function
  - Often with little consideration of the assumptions underpinning SBVS
- Here we consider a number of these processes in detail from the perspective of our primary SBVS tool (DOCK)
  - Ligand conformational search protocols
  - Varying site point definitions
  - Alteration of sampling variables
- Determine their impact on hit enrichment and search speed
- Analyze implications for future research

Page 2: Ligand Flexibility Studies - Strategy

- SBVS CPU intensive
- Conformational searching of ligand clearly important
  - Sampling limited to allow search completion in reasonable time frame
- Test required to compare different conformational sampling methods
  - Ability to reproduce bioactive conformation tested
  - 145 ligands from a 1995 analysis of pdb complexes (Gschwend, UCSF, unpublished)
  - 30 compound subset chosen for analysis; selection based on visual and numerical inspection of diversity in ligand flexibility and functionality
- Relatively small sample of molecules used, many peptidic in nature
  - Peptidic moieties are among the better parameterized systems, so this is in some ways a best case scenario

Page 3: Ligand Flexibility Studies - Procedure

- Multiple sampling techniques chosen: Catalyst-best / Catalyst-fast / Confort / Omega / DOCK
- Variety of sampling levels
- Starting from Concord structure, conformers generated and superimposed onto pdb ligand conformation
- Conformation with lowest heavy atom RMS to the pdb ligand conformation used as quality measure
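
The quality measure on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the tooling used in the study: it assumes NumPy is available, that every conformer lists the same heavy atoms in the same order as the pdb ligand, and it uses the Kabsch algorithm for the superposition (the slides do not name the fitting method). The function names `superimposed_rmsd` and `closest_conformer` are hypothetical.

```python
import numpy as np


def superimposed_rmsd(conformer, reference):
    """Heavy-atom RMS deviation after optimal rigid-body superposition (Kabsch)."""
    p = np.asarray(conformer, dtype=float)
    q = np.asarray(reference, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (n_atoms, 3) arrays with matching atom order")
    # Remove the translational component by centering both coordinate sets.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def closest_conformer(conformers, reference):
    """Return (index, rmsd) of the conformer closest to the reference ligand."""
    rmsds = [superimposed_rmsd(c, reference) for c in conformers]
    best = int(np.argmin(rmsds))
    return best, rmsds[best]
```

Calling `closest_conformer(conformers, xray_coords)` for each ligand and each sampling method gives one RMS value per ligand and method, which can then be averaged per method.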

Page 4: Ligand Flexibility Studies - Search Settings Employed

- Dock: conformation_cutoff_factor=3/5/10, clash_overlap=0.7 (0.7 times vdW radius for clash overlap), with customized rules for bond increment settings
- Confort: rough (0.10 kcal) convergence, diverse conformer selection, boat ring search on; sampling at 5/10 confs per single bond + 500 max
- Catalyst Best/Fast: default settings; sampling at 5/10 confs per single bond + 100 max
- Omega: defaults + RMS_CUTOFF=1.0, GP_ENERGY_WINDOW=5.0; sampling at 100 max
- In addition, Concord generated and Sybyl minimized ligand xray structures also analyzed as “controls”

Page 5: Ligand Flexibility Results - Overall Performance (RMS / Rank)

[Figure: average internal rank (left axis, 0.00 to 14.00) and average RMS deviation (right axis, 0.00 to 1.80) for each method. Methods in plotted order: Min xray, CONFORT 500, FAST 100, CONFORT 10, BEST 100, FAST 5, DOCK 10, BEST 5, OMEGA 100, CONFORT 5, DOCK 5, DOCK 3, Concord. Data labels as extracted: 0.76, 0.81, 0.88, 0.92, 0.87, 0.97, 0.96, 0.99, 0.99, 1.00, 1.03, 1.13, 1.76.]

Page 6: Ligand Flexibility Results - Performance vs Flexibility

[Figure: average RMS deviation (axis 0 to 2.5) by ligand flexibility class: 3 to 5 single bonds (15 ligands), 6 to 8 single bonds (7 ligands), 9 to 14 single bonds (8 ligands).]

Page 7: Ligand Flexibility Results - The Pain Gain Ratio

- Does extra noise introduced to scoring functions outweigh this improvement?
- Is it worth the extra CPU?

[Figure: average conformations per molecule (left axis, 0 to 100; a data label of 425 appears above the axis range) and average RMS deviation (right axis, 0.00 to 1.80) by search type. RMS data labels as extracted: 0.81, 0.87, 0.88, 0.92, 0.96, 0.97, 1.03, 1.125.]

Page 8: Ligand Flexibility Results - Visual Analysis

- Even at lower RMS, deviation in hydrogen positions an issue
- As RMS rises (0.9) we begin to see more significant deviations in heavy atom positions, large enough to possibly prove troublesome to standard force fields

[Figure: two example structures, RMS=0.65 and RMS=0.90]

Page 9: Ligand Flexibility Results - Visual Analysis

- As RMS rises further, hydrogen bond mapping begins to partially break down
- Significant deviation begins to be seen although general shape complementarity is still reasonable
- DOCKing tricky; pharmacophore searches possible with loose tolerances, although site point vector definitions (DISCO / Catalyst) a no-no

[Figure: two example structures, RMS=2.19 and RMS=1.55]

Page 10: Ligand Flexibility - Conclusions

- At current sampling levels used in virtual screening
  - Rough search techniques perform comparably to more exhaustive methods
  - Dock performs quite well, and Fast does slightly better than comparable Best run
- Results highlight the need for “forgiving” scoring functions and pharmacophore constraint tolerances (especially for flexible molecules)
  - Generating function directly from crystal structure data may not be optimum
  - Use the conformation closest to the biologically relevant structure with chosen sampling technique
  - May be better to ignore more flexible molecules when possible (~>8 bonds)
- Analysis of more extensive data set might provide basis for determining if optimum sampling settings exist (Best/Omega/Confort)
  - Coarseness of poling values, for example

Page 11: Structure-Based Search Protocols - An Analysis of DOCK

- Working within current DOCK paradigm, what search protocols provide optimum search criterion?
  - Site point definitions
  - Alteration of sampling variables
  - Different scoring grids
- Comparisons illustrated for 5 test systems with diverse active data sets
- Analysis based on ranking within list that includes ~10000 “noise” compounds
  - “Random” selection within bounds of size and flexibility distribution seen in in-house database

Page 12: Structure-Based Search Protocols - DOCK variables

- Contains many variables that affect performance
- Ligand sampling within the site being the primary variant

    nodes                       3/4
    distance_tolerance          0.5/1.0
    distance_minimum            3.0
    bump_filter                 4
    conformation_cutoff_factor  5
    clash_overlap               0.7
    maximum_orientations        500/5000
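
The settings above can be enumerated as a set of protocols to compare. The sketch below is illustrative only: it assumes that values separated by "/" are alternatives tried in separate runs, and it builds plain Python dictionaries instead of DOCK input files. The helper name `protocol_grid` is hypothetical.

```python
from itertools import product

# Settings held constant on the slide.
FIXED = {
    "distance_minimum": 3.0,
    "bump_filter": 4,
    "conformation_cutoff_factor": 5,
    "clash_overlap": 0.7,
}

# Settings listed with alternatives (assumed to be varied between runs).
VARIED = {
    "nodes": [3, 4],
    "distance_tolerance": [0.5, 1.0],
    "maximum_orientations": [500, 5000],
}


def protocol_grid(fixed, varied):
    """Return one settings dict per combination of the varied parameters."""
    names = list(varied)
    grid = []
    for values in product(*(varied[name] for name in names)):
        settings = dict(fixed)
        settings.update(zip(names, values))
        grid.append(settings)
    return grid


protocols = protocol_grid(FIXED, VARIED)
```

With two alternatives for each of three parameters this yields eight candidate protocols. The slides do not state whether every combination was actually run.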

Page 13: Structure-Based Search Protocols - DOCK and pharmacophoric constraints

- It is possible to assign fairly sophisticated pharmacophoric (henceforth also known as chemical) definitions

    name acid
    # deprotonated carboxyl
    definition O.co2 ( C )
    # tetrazole
    definition N.pl3 ( H ) ( N.2 ( N.2 ( N.2 ( C.2 ) ) ) )
    definition N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ( N.2 ) ) ) )
    definition N.2 ( N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ) ) ) )
    definition N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
    definition N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
    definition N.2 ( N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ) ) ) )
    definition N.2 ( N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ) ) ) )
    # acyl sulphonamide
    definition N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )
    definition O.2 ( C.2 ( N.am ( H ) ( S ( 2 O.2 ) ) ) )
    definition O.2 ( S ( O.2 ) ( N.am ( H ) ( C.2 ( O.2 ) ) ) )

Current types:
- heavy atom
- donor
- acceptor
- hydrophobe
- aromatic
- aromatic_hydrophobic
- acid
- base
- donor_and_acceptor
- special (e.g. metal chelator)

Page 14: Structure-Based Search Protocols - Site Points Used in Kinase Search

[Figure: kinase binding site with labeled site point regions]

- Region 1 (+ 4): acceptor / donor
- Region 2: hydrophobic + 2 donors
- Region 3: hydrophobic / any heavy atom

Page 15: Structure-Based Search Protocols - Test Sets and Site Points Used

- Sphgen used to generate site points for “generic” DOCK searches
- Pharmacophore points derived from a mixture of non-data set bound ligands and in-house programs that process GRID maps and Connolly surfaces (plus plenty of human intervention)
- Active data sets broken down into chemotypes to prevent the problem of common analogue bias, an underappreciated issue in all validations

| Target | Active Chemotype Definitions | Pharmacophore Points / Critical Regions |
| 2 Serine proteases | P1 substituent / P1-P4 linker substituent | P1 (base / hydrophobe) + P4 (hydrophobe) pockets |
| 2 Fatty acid binding proteins | Core linking acid moiety to remaining substituents | Acid binding pocket |
| Kinase | Moiety mimicking adenine / main core of molecules | Adenine binding pocket (donor/acceptor) [+ rear hydrophobic pocket] |

Page 16: Results - Kinase

No. of hits after 50% of chemotypes located by at least one search (400 compounds processed from 96 actives / 18 chemotypes)

Search type key: a_b_c(_d), e.g. cc_f_c_3
- a: s = sphgen / c = critical / cc = chemical-critical
- b: s = single conf / f = flexi dock
- c: m = mm score / c = contact score
- d: nXcr(a.b) = n node search with X critical regions and a.b distance tolerance

NOTE: poor 1 crit performance - premature termination

[Figure: compounds (left axis, 0 to 25) and chemotypes (right axis, 0 to 10) found by each search type]
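
The chemotype-coverage measure used on this slide can be sketched as follows. This is one possible reading of the metric, not the authors' script: it assumes each search produces a ranked list of compound identifiers (best first) and that a dictionary maps every known active to its chemotype. The names `depth_for_chemotype_coverage` and `hits_at_depth` are hypothetical.

```python
import math


def depth_for_chemotype_coverage(ranked_ids, chemotype_of, fraction=0.5):
    """Number of ranked compounds processed before `fraction` of all chemotypes is found.

    Returns None if the ranked list never reaches the requested coverage.
    """
    total = len(set(chemotype_of.values()))
    if total == 0:
        raise ValueError("no chemotypes defined")
    # Small epsilon avoids rounding up because of floating-point noise.
    needed = math.ceil(fraction * total - 1e-9)
    found = set()
    for depth, compound_id in enumerate(ranked_ids, start=1):
        chemotype = chemotype_of.get(compound_id)  # None for "noise" compounds
        if chemotype is not None:
            found.add(chemotype)
        if len(found) >= needed:
            return depth
    return None


def hits_at_depth(ranked_ids, chemotype_of, depth):
    """Count active compounds and distinct chemotypes in the top `depth` of a ranked list."""
    actives = [c for c in ranked_ids[:depth] if c in chemotype_of]
    return len(actives), len({chemotype_of[c] for c in actives})
```

To compare searches in the way the slide title suggests, one could take the smallest depth returned by any search and then call `hits_at_depth` on every search at that common depth. Counting chemotypes as well as compounds keeps a run of close analogues from inflating the apparent hit rate.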

Page 17: Results - Fatty Acid Binding Protein 2

No. of hits after 7 chemotypes located by at least one search (500 compounds processed from 28 actives / 8 chemotypes)

Search type key: a_b_c(_d), e.g. cc_f_c_3
- a: s = sphgen / c = critical / cc = chemical-critical
- b: s = single conf / f = flexi dock
- c: m = mm score / c = contact score
- d: 3 = 3 node search / 1.0 = 1.0 distance tolerance / 1.02crit / 32crit = 1.0 distance tolerance or 3 node search with 2nd critical region (hydrophobic binding pocket) / esp = electrostatic potential included in mm score / acid = all non acids removed from search lists

[Figure: compounds (left axis, 0 to 20) and chemotypes (right axis, 0 to 8) found by each search type]

Missing chemotype is a citrazinate: not covered in chemical definitions, easy to fix, another advantage over electrostatics

Page 18: Results - Overall

Compounds processed for 50% Chemotype Coverage for All Systems

Search type key: a_b_c(_d), e.g. cc_f_c_3
- a: s = sphgen / c = critical / cc = chemical-critical
- b: s = single conf / f = flexi dock
- c: m = mm score / c = contact score
- d: 3 = 3 node search / 1.0 = 1.0 distance tolerance

[Figure: compounds processed (axis 0 to 1400) by search type, with best, mean and worst hit rate series. Search types: s_s_c, s_s_m, c_s_c, c_s_m, cc_s_c, cc_s_m, s_f_c, s_f_m, c_f_c, c_f_m, cc_f_c, cc_f_m, cc_f_c_3, cc_f_c_1.0.]
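
The search type key can be decoded mechanically. The sketch below is a hypothetical helper, not part of DOCK: it covers only the codes listed in the key on this slide and passes any other fourth field through unchanged.

```python
SITE_POINTS = {"s": "sphgen", "c": "critical", "cc": "chemical-critical"}
LIGAND_HANDLING = {"s": "single conf", "f": "flexi dock"}
SCORING = {"m": "mm score", "c": "contact score"}
# Optional fourth field, as listed in the key on this slide.
MODIFIERS = {"3": "3 node search", "1.0": "1.0 distance tolerance"}


def parse_search_type(key):
    """Decode a search type label of the form a_b_c or a_b_c_d."""
    fields = key.split("_")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected a_b_c or a_b_c_d, got {key!r}")
    a, b, c = fields[:3]
    try:
        parsed = {
            "site_points": SITE_POINTS[a],
            "ligand": LIGAND_HANDLING[b],
            "scoring": SCORING[c],
        }
    except KeyError as exc:
        raise ValueError(f"unknown code {exc.args[0]!r} in {key!r}") from exc
    modifier = fields[3] if len(fields) == 4 else None
    # Unrecognised modifiers are kept verbatim instead of being guessed at.
    parsed["modifier"] = MODIFIERS.get(modifier, modifier)
    return parsed
```

For example, `parse_search_type("cc_f_c_3")` describes a chemical-critical, flexi dock, contact score search run as a 3 node search.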

Page 19: Results Analysis: DOCK Scoring Functions - Shape

- Contact generally a little more robust than vdW non-bonded function
  - More controllable bump penalty (no r^n repulsion)
  - Better able to deal with docking inaccuracies
  - More important in tight binding sites with pharmacophore constraints and flexible molecules
- Controllable max. vdW repulsion value mitigates this somewhat
- Still useful with less flexible molecules for a more rigorous shape complementarity score
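
The difference between the two shape terms can be illustrated with a toy calculation. The sketch below is not DOCK's implementation: the contact cutoff, the fixed bump distance, the penalty, and the Lennard-Jones parameters are illustrative assumptions, and the function names are hypothetical. It shows why a bounded bump penalty tolerates a small placement error that an r^-12 repulsion punishes severely, and how capping the repulsion narrows the gap.

```python
def contact_score(distances, contact_cutoff=4.5, bump_distance=2.5, bump_penalty=10.0):
    """Toy contact score: -1 per atom pair in contact, fixed penalty per bump."""
    score = 0.0
    for r in distances:
        if r < bump_distance:
            score += bump_penalty  # bounded, user-controlled penalty
        elif r <= contact_cutoff:
            score -= 1.0
    return score


def lennard_jones_score(distances, r_min=3.8, well_depth=0.1, max_repulsion=None):
    """Toy 12-6 score; `max_repulsion` optionally caps each pair's repulsive energy."""
    score = 0.0
    for r in distances:
        ratio = r_min / r
        energy = well_depth * (ratio ** 12 - 2.0 * ratio ** 6)
        if max_repulsion is not None:
            energy = min(energy, max_repulsion)
        score += energy
    return score


well_placed = [3.8, 4.0, 4.2]   # receptor-ligand atom distances in angstroms
slightly_off = [2.0, 4.0, 4.2]  # one atom pushed too close by a coarse conformation
```

With these illustrative numbers, `contact_score(slightly_off)` rises by a bounded amount, while `lennard_jones_score(slightly_off)` is dominated by the single close pair unless `max_repulsion` is set.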

Page 20: Results Analysis: DOCK Scoring Functions - H Bonding

- Electrostatics
  - Many intuitive reasons for caution in explicit treatment
  - Poor charge models / coarse conformations / inability to control ionization states
- Pharmacophore centers provide better vehicle for H bonding description
  - Spread points to allow for search approximations / set critical regions based on biological and structural information / faster searches (30-100 times)

Page 21: Conclusions

- For maximum impact with current methodology, scoring functions should either
  - Be designed/utilized with these limitations in mind
    - Forgiving / targeted at less flexible molecules
  - Improve results by such a high degree that additional sampling (and CPU) is warranted
- In the meantime, utility of pharmacophoric hypotheses {critical region(s) with pharmacophoric constraints} is clear
  - Better results faster / less sensitivity to model coarseness / allows constraints based on known biology

Page 22: Acknowledgements

Thank you to my BMS CADD colleagues

