Date post: | 18-Dec-2015 |
Center for Biological Sequence Analysis, Technical University of Denmark (DTU)
B cell epitopes and predictions
Pernille Haste Andersen,
PhD student
Immunological Bioinformatics
CBS, DTU
B cells and antibodies
Antibodies are produced by B lymphocytes (B cells)
Antibodies circulate in the blood
They are referred to as “the first line of defense” against infection
Antibodies play a central role in immunity by attaching to pathogens and recruiting effector systems that kill the invader
What is a B cell epitope?
Antibody Fab fragment
B cell epitope
B cell epitopes
Accessible and recognizable structural feature of a pathogen molecule (antigen)
Antibodies are developed to bind the epitope with high affinity by using the complementarity determining regions (CDRs)
Motivations for prediction of B cell epitopes
Prediction of B cell epitopes can potentially guide experimental epitope mapping
Predictions of antigenicity in proteins can be used for selecting subunits in rational vaccine design
Predictions of B cell epitopes may also be valuable for interpreting results from experiments based on antibody binding affinity, such as ELISA and RIA
Computational Rational Vaccine Design
>PATHOGEN PROTEIN
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNF
Rational Vaccine Design
B cell epitopes, linear or discontinuous?
Classified into linear (~10%) and discontinuous epitopes (~90%)
Databases: AntiJen, IEDB, BciPep, Los Alamos HIV database, Protein Data Bank
Large amount of data available for linear epitopes
Few data available for discontinuous epitopes
In general, B cell epitope prediction methods have relatively low performance
Discontinuous B cell epitopes
• SLDEKNSVSVDLPGEMKVLVSKEKNKDGKYDLIATVDKLELKGTSDKNNGSGVLEGVKADKCKVKLTISDDLGQTTLEVFKEDGKTLVSKKVTSKDKSSTEEKFNEKGEVSEKIITRADGTRLEYTGIKSDGSGKAKEVLKG
• ..\Discotope\1OSP_epitope\1OSP_epitope.psw
An example: the epitope of outer surface protein A from Borrelia burgdorferi (1OSP)
The binding interactions
• Salt bridges
• Hydrogen bonds
• Hydrophobic interactions
• Van der Waals forces
Binding strength
B-cell epitopes are dynamic
Many of the charged groups and hydrogen bonding partners are present on highly flexible amino acid side chains.
Most crystal structures of epitopes and antibodies in free and complexed forms have shown conformational rearrangements upon binding.
“Induced fit” model of interactions.
B-cell epitope classification
Linear epitopes
One segment of the amino acid chain
Discontinuous epitope (with linear determinant)
Discontinuous epitope
Several small segments brought into proximity by the protein fold
B-cell epitope – structural feature of a molecule or pathogen, accessible and recognizable by B-cells
B-cell epitope annotation
• Linear epitopes:
– Chop sequence into small pieces and measure binding to antibody
• Discontinuous epitopes:
– Measure binding of whole protein to antibody
• The best annotation method: X-ray crystal structure of the antibody-epitope complex
B-cell epitope databases
• Databases: AntiJen, IEDB, BciPep, Los Alamos HIV database, Protein Data Bank
• Large amount of data available for linear epitopes
• Few data available for discontinuous epitopes
B cell epitope prediction
Sequence-based methods for prediction of linear epitopes
Protein hydrophobicity/hydrophilicity algorithms: Parker, Fauchere, Janin, Kyte and Doolittle, Manavalan, Sweet and Eisenberg, Goldman, Engelman and Steitz (GES), von Heijne
Protein flexibility prediction algorithm: Karplus and Schulz
Protein secondary structure prediction algorithms: GOR II method (Garnier and Robson), Chou and Fasman, Pellequer
Protein “antigenicity” prediction: Hopp and Woods, Welling
TSQDLSVFPLASCCKDNIASTSVTLGCLVTGYLPMSTTVTWDTGSLNKNVTTFPTTFHETYGLHSIVSQVTASGKWAKQRFTCSVAHAESTAINKTFSACALNFIPPTVKLFHSSCNPVGDTHTTIQLLCLISGYVPGDMEVIWLVDGQKATNIFPYTAPGTKEGNVTSTHSELNITQGEWVSQKTYTCQVTYQGFTFKDEARKCSESDPRGVTSYLSPPSPL
Propensity scales: The principle
• The Parker hydrophilicity scale
• Derived from experimental data
Amino acid  Hydrophilicity
D   2.46
E   1.86
N   1.64
S   1.50
Q   1.37
G   1.28
K   1.26
T   1.15
R   0.87
P   0.30
H   0.30
C   0.11
A   0.03
Y  -0.78
V  -1.27
M  -1.41
I  -2.45
F  -2.78
L  -2.87
W  -3.00
Propensity scales: The principle
….LISTFVDEKRPGSDIVEDLILKDENKTTVI….
Window FVDEKRP: (-2.78 - 1.27 + 2.46 + 1.86 + 1.26 + 0.87 + 0.30)/7 ≈ 0.39
Prediction scores:
0.38 0.1 0.6 0.9 1.0 1.2 2.6 1.0 0.9 0.5 -0.5
Epitope
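The window average shown for FVDEKRP can be reproduced with a short Python sketch. The scale values are those of the Parker hydrophilicity scale listed earlier; the function name `window_scores`, the 7-residue window, and the choice to report one score per full window are illustrative assumptions, not necessarily the exact procedure behind the slide.

```python
# Parker hydrophilicity values as listed on the scale slide.
PARKER = {
    "D": 2.46, "E": 1.86, "N": 1.64, "S": 1.50, "Q": 1.37,
    "G": 1.28, "K": 1.26, "T": 1.15, "R": 0.87, "P": 0.30,
    "H": 0.30, "C": 0.11, "A": 0.03, "Y": -0.78, "V": -1.27,
    "M": -1.41, "I": -2.45, "F": -2.78, "L": -2.87, "W": -3.00,
}

def window_scores(sequence, window=7):
    """Return the mean scale value of every full window along the sequence."""
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        scores.append(sum(PARKER[aa] for aa in segment) / window)
    return scores

if __name__ == "__main__":
    fragment = "LISTFVDEKRPGSDIVEDLILKDENKTTVI"
    scores = window_scores(fragment)
    # The window FVDEKRP starts at index 4 of the fragment.
    print(fragment[4:11], round(scores[4], 2))
```

Residues whose window score exceeds a chosen threshold would then be marked as predicted epitope positions; the threshold itself has to be chosen separately.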
Evaluation of performance
• A receiver operating characteristic (ROC) curve is useful for finding a good threshold and for ranking methods
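As a minimal sketch of how such a curve and its area (AROC/AUC) can be computed from per-residue prediction scores, assuming binary labels (1 = epitope residue, 0 = non-epitope residue) and at least one residue of each class. The function names `roc_points` and `auc` are illustrative, not taken from any tool mentioned in this talk.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each distinct score is one threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auc(roc_points(labels, scores)))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the area is convenient for comparing methods independently of any single threshold.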
Turn prediction and B-cell epitopes
• Pellequer found that 50% of the epitopes in a data set of 11 proteins were located in turns
• Turn propensity scales for each position in the turn were used for epitope prediction.
Pellequer et al., Immunology Letters, 1993
Blythe and Flower 2005
• Extensive evaluation of propensity scales for epitope prediction
• Conclusion:
– Basically all the classical scales perform close to random!
– Other methods must be used for epitope prediction
BepiPred: CBS in-house tool
• Parker hydrophilicity scale
• Hidden Markov model
• Markov model based on linear epitopes extracted from the AntiJen database
• Combining the Parker prediction scores with the Markov model scores gives the final prediction score
• Tested on the Pellequer data set and epitopes in the HIV Los Alamos database
ROC evaluation
Evaluation on HIV Los Alamos data set
BepiPred performance
• Pellequer data set:
– Levitt AROC = 0.66
– Parker AROC = 0.65
– BepiPred AROC = 0.68
• HIV Los Alamos data set:
– Levitt AROC = 0.57
– Parker AROC = 0.59
– BepiPred AROC = 0.60
BepiPred
• BepiPred conclusion:
– On both of the evaluation data sets, BepiPred was shown to perform better than the Levitt and Parker scales
– Still, the AROC value is low compared to T-cell epitope prediction tools!
– BepiPred is available as a web server: www.cbs.dtu.dk/services/BepiPred
How can we get information about the three-dimensional structure?
Structure determination
• X-ray crystallography
• NMR spectroscopy
Both methods are time consuming and not easily done on a large scale
Structure prediction
• Homology modeling
• Fold recognition
Less time consuming, but there is a possibility of incorrect predictions, especially in loop regions
Protein structure prediction methods
• Homology/comparative modeling
– >25% sequence identity (sequence-to-sequence alignment)
• Fold recognition/threading
– <25% sequence identity (PSI-BLAST search / sequence-to-structure alignment)
• Ab initio structure prediction
– 0% sequence identity
A data set of 3D discontinuous epitopes
A data set of 75 discontinuous epitopes was compiled from structures of antibody-protein antigen complexes in the PDB
The data set has been used for developing a method for prediction of discontinuous B cell epitopes
Since about 30 of the PDB entries represented lysozyme, I have used homology grouping (25 groups of non-homologous antigens) and 5-fold cross-validation for training of the method
Performance was measured using ROC curves on a per antigen basis, and by weighted averaging of AUC values
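The grouping and averaging described here can be sketched as follows. The slide does not state the weighting scheme, so weighting each antigen by 1/(size of its homology group) is an assumption made for illustration, as are the function names and the round-robin assignment of groups to folds.

```python
from collections import Counter

def assign_folds(group_ids, n_folds=5):
    """Map each homology group to one fold so that homologous antigens stay together."""
    fold_of_group = {}
    for index, group in enumerate(sorted(set(group_ids))):
        fold_of_group[group] = index % n_folds  # assumed round-robin assignment
    return [fold_of_group[g] for g in group_ids]

def group_weighted_mean_auc(aucs, group_ids):
    """Average per-antigen AUCs, giving every homology group the same total weight."""
    group_sizes = Counter(group_ids)
    weights = [1.0 / group_sizes[g] for g in group_ids]  # assumed weighting scheme
    return sum(w * a for w, a in zip(weights, aucs)) / sum(weights)

if __name__ == "__main__":
    # Toy example: three homologous lysozyme entries and one unrelated antigen.
    groups = ["lysozyme", "lysozyme", "lysozyme", "other"]
    aucs = [0.6, 0.6, 0.6, 0.9]
    print(assign_folds(groups))
    print(group_weighted_mean_auc(aucs, groups))
```

Keeping homologous antigens in the same fold prevents near-identical structures from appearing in both training and test sets, and the weighting keeps an over-represented antigen such as lysozyme from dominating the average.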
Epitope log-odds ratios
Frequencies of amino acids in epitopes compared to frequencies of non-epitopes
Several discrepancies compared to the Parker hydrophilicity scale which is often used for epitope prediction
Both methods are used for predictions using a sequential average of scores
Predictive performance on B cell epitopes:
Parker 0.614 AUC
Epitope log-odds 0.634 AUC
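A log-odds scale of this kind can be derived from residue counts roughly as sketched below. The pseudocount, the natural logarithm, and the function name `log_odds_scale` are assumptions for illustration; the exact normalization used for the scale in this talk is not given on the slide.

```python
import math

def log_odds_scale(epitope_residues, non_epitope_residues, pseudocount=1.0):
    """Log ratio of each amino acid's frequency in epitopes vs. non-epitopes."""
    alphabet = sorted(set(epitope_residues) | set(non_epitope_residues))
    # Pseudocounts avoid division by zero for residues absent from one class.
    epi_total = len(epitope_residues) + pseudocount * len(alphabet)
    non_total = len(non_epitope_residues) + pseudocount * len(alphabet)
    scale = {}
    for aa in alphabet:
        f_epi = (epitope_residues.count(aa) + pseudocount) / epi_total
        f_non = (non_epitope_residues.count(aa) + pseudocount) / non_total
        scale[aa] = math.log(f_epi / f_non)
    return scale

if __name__ == "__main__":
    # Toy residue strings, not real data.
    print(log_odds_scale("KKKL", "KLLL"))
```

Positive values mark amino acids that are over-represented in epitopes; the resulting scale can be averaged along the sequence in the same way as the Parker scale.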
3D information: Contact numbers
• Surface exposure and structural protrusion can be measured by residue contact numbers
The predictive performance:
Parker 0.614 AUC
Epitope log-odds 0.634 AUC
Contact numbers 0.647 AUC
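A contact number can be sketched as the count of neighbouring residues within a fixed distance. Representing each residue by its Cα coordinates and using a 10 Å cut-off are assumptions made here for illustration; the function name `contact_numbers` is hypothetical.

```python
import math

def contact_numbers(ca_coords, radius=10.0):
    """For each residue, count the other C-alpha atoms within `radius` angstroms."""
    counts = []
    for i, a in enumerate(ca_coords):
        n = 0
        for j, b in enumerate(ca_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts

if __name__ == "__main__":
    # Toy coordinates: three residues close together and one far away.
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(contact_numbers(coords))
```

A low contact number indicates a residue with few neighbours, that is, one that is exposed or protruding from the surface.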
DiscoTope: Prediction of Discontinuous epitopes using 3D structures
• A combination of:
– Sequentially averaged epitope log-odds values of residues in spatial proximity
– Contact numbers
[Figure: example sequence .LIST..FVDEKRPGSDIVED……ALILKDENKTTVI. annotated with log-odds values (-0.145; +0.346 +1.136; +0.691 +0.346 +1.136 +1.180 +1.164), labelled "Sum of log-odds values", "Contact number: K 10" and "DiscoTope prediction value"]
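The combination of the two terms can be illustrated with a rough sketch. This is not the published DiscoTope formula: the 10 Å neighbourhood, the linear form, the `weight` on the contact number, and the function name are all assumptions chosen only to show how a proximity sum of log-odds values and a contact number might be merged into one score.

```python
import math

def discotope_like_scores(ca_coords, smoothed_log_odds, radius=10.0, weight=0.5):
    """Sum smoothed log-odds over spatial neighbours and penalise buried residues."""
    scores = []
    for centre in ca_coords:
        # Neighbours within `radius`, including the residue itself.
        neighbours = [j for j, other in enumerate(ca_coords)
                      if math.dist(centre, other) <= radius]
        proximity_sum = sum(smoothed_log_odds[j] for j in neighbours)
        contact_number = len(neighbours) - 1  # exclude the residue itself
        # Hypothetical combination: high log-odds sum and low contact number score well.
        scores.append(proximity_sum - weight * contact_number)
    return scores

if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    log_odds = [1.0, 0.5, -0.2]  # toy values, already averaged along the sequence
    print(discotope_like_scores(coords, log_odds))
```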
DiscoTope: Prediction of Discontinuous epiTopes
Improved prediction of residues in discontinuous B cell epitopes in the data set
The predictive performance on B cell epitopes:
Parker 0.614 AUC
Epitope log–odds 0.634 AUC
Contact numbers 0.647 AUC
DiscoTope 0.711 AUC
Evaluation example AMA1
• Apical membrane antigen 1 from Plasmodium falciparum (not used for training/testing)
• Two epitopes were identified using phage-display, point-mutation (black side chains) and sequence variance analysis (side chains of polyvalent residues in yellow)
• Most residues identified as epitopes were successfully predicted by DiscoTope (green backbone)
DiscoTope is available as web server: http://www.cbs.dtu.dk/services/DiscoTope/
..\Discotope\1Z40_epitope\1Z40_movie.mov
Future improvements
Add epitope predictions for protein-protein complexes
Visualization of epitopes integrated in web server
Testing a score for sequence variability, e.g. based on the entropy of positions in the antigens
Combination with glycosylation site predictions
Combination with predictions of trans-membrane regions
Assembling predicted residues into whole epitopes
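An entropy-based variability score of the kind mentioned in the list above could be sketched as the Shannon entropy of each column of a multiple sequence alignment. The function name `column_entropies`, the use of bits, and treating gap characters as ordinary symbols are assumptions for illustration.

```python
import math
from collections import Counter

def column_entropies(aligned_sequences):
    """Shannon entropy (bits) of each column of equally long aligned sequences."""
    entropies = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(entropy)
    return entropies

if __name__ == "__main__":
    # Toy alignment: position 2 varies, positions 1 and 3 are conserved.
    alignment = ["KVF", "KAF", "KVF", "KAF"]
    print(column_entropies(alignment))
```

Conserved positions give an entropy of zero, while variable positions give higher values.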
Presentation of the web server
Presentation of the web server output
Use the CEP server
• Conformational epitope server
http://202.41.70.74:8080/cgi-bin/cep.pl
• Uses protein structure as input
• Finds stretches in sequences which are surface exposed
Use the DiscoTope server
• CBS server for prediction of discontinuous epitopes
• Uses protein structure as input
• Combines propensity scale values of amino acids in discontinuous epitopes with surface exposure
• www.cbs.dtu.dk/services/DiscoTope
Rational vaccine design
>PATHOGEN PROTEIN
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNF
Rational Vaccine Design
Rational B-cell epitope design
• Protein target choice
• Structural analysis of antigen
– Known structure or homology model
– Precise domain structure
– Physical annotation (flexibility, electrostatics, hydrophobicity)
– Functional annotation (sequence variations, active sites, binding sites, glycosylation sites, etc.)
Known 3D structure
Model
Rational B-cell epitope design
• Protein target choice
• Structural annotation
• Epitope prediction and ranking
– Surface accessibility
– Protrusion index
– Conserved sequence
– Glycosylation status
Rational B-cell epitope design
• Protein target choice
• Structural annotation
• Epitope prediction and ranking
• Optimal epitope presentation
– Fold minimization, or
– Design of structural mimics
– Choice of carrier (conjugates, DNA plasmids, virus-like particles)
– Multiple chain protein engineering
Multi-epitope protein design
Rational optimization of epitope-VLP chimeric proteins:
Design a library of possible linkers (<10 aa)
Perform global energy optimization in VLP (virus-like particle) context
Rank according to estimated energy strain
B-cell epitope
T-cell epitope
Conclusions
• Selection of protective B-cell epitopes involves structural, functional and immunogenic analysis of the pathogenic proteins
• When you can: use protein structure for prediction
• Structural modeling tools are helpful in prediction of epitopes, design of epitope mimics and optimal epitope presentation