Prediction of T cell epitopes using artificial neural networks
Morten Nielsen,
CBS, BioCentrum,
DTU
Objectives
• How to train a neural network to predict peptide:MHC class I binding
• Understand why NNs perform the best
– Higher order sequence information
• The wisdom of the crowd!
– Why enlightened despotism does not work even for neural networks
Outline
• MHC class I epitopes
– Why MHC binding?
• How to predict MHC binding?
– Information content
– Weight matrices
– Neural networks
• Neural network theory
– Sequence encoding
• Examples
Prediction of HLA binding specificity
Simple motifs
– Allowed/non-allowed amino acids
Extended motifs
– Amino acid preferences (SYFPEITHI)
– Anchor/preferred/other amino acids
Hidden Markov models
– Peptide statistics from sequence alignment (previous talk)
Neural networks
– Can take sequence correlations into account
SYFPEITHI predictions
Extended motifs based on peptides from the literature and peptides eluted from cells expressing specific HLAs (i.e., binding peptides)
The scoring scheme is not readily accessible. Positions defined as anchor or auxiliary anchor positions are weighted differently (higher). The final score is the sum of the scores at each position.
Predictions can be made for several HLA-A, -B and -DRB1 alleles, as well as some mouse K, D and L alleles.
BIMAS
Matrix made from peptides with a measured T1/2 for the MHC-peptide complex
The matrices are available on the website. The final score is the product of the scores at each position in the matrix, multiplied by a constant (different for each MHC) to give a prediction of the T1/2.
Predictions can be obtained for several HLA-A, -B and -C alleles, mouse K, D and L alleles, and a single cattle MHC.
How to predict
The effect on the binding affinity of having a given amino acid at one position can be influenced by the amino acids at other positions in the peptide (sequence correlations).
– Two adjacent amino acids may, for example, compete for the space in a pocket in the MHC molecule.
Artificial neural networks (ANNs) are ideally suited to take such correlations into account
Higher order sequence correlations
Neural networks can learn higher order correlations!
– What does this mean?
S S => 0
L S => 1
S L => 1
L L => 0
No linear function can learn this (XOR) pattern
Say that the peptide needs one and only one large amino acid at positions P3 and P4 to fill the binding cleft
How would you formulate this to test if a peptide can bind?
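The XOR pattern above can be checked numerically. A minimal sketch, assuming S = 0 / L = 1 encoding and a four-neuron hidden layer (illustrative choices, not the lecture's actual networks), shows that a network with one hidden layer can fit the pattern that defeats any linear function:

```python
import numpy as np

# The slide's XOR pattern: a binder needs one and only one large residue
# at P3/P4.  Encoding assumption: S (small) = 0, L (large) = 1.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # single output neuron

losses = []
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                    # forward pass
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    d_out = (out - y) * out * (1 - out)         # backpropagate squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 2.0 * h.T @ d_out; b2 -= 2.0 * d_out.sum(0)
    W1 -= 2.0 * X.T @ d_h;   b1 -= 2.0 * d_h.sum(0)

print(out.ravel())  # trained outputs for the four input patterns
```

Dropping the hidden layer reduces the model to a linear (weight-matrix-like) scorer, which cannot drive the loss to zero on this pattern.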
Neural network learning higher order correlations
• How is mutual information calculated?
• Information content was calculated as
– Gives information in a single position
• Similar relation for mutual information
– Gives mutual information between two positions
Mutual information
I = Σ_a p_a log(p_a / q_a)

I = Σ_{a,b} p_ab log(p_ab / (p_a · p_b))
Mutual information. Example
ALWGFFPVA
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
YMNGTMSQV
GILGFVFTL
WLSLLVPFV
FLPSDFFPS
P1 P6
P(G1) = 2/9 = 0.22, ..., P(V6) = 4/9 = 0.44, ...
P(G1,V6) = 2/9 = 0.22, P(G1)·P(V6) = 8/81 = 0.10
log(0.22/0.10) > 0!
I = Σ_{a,b} p_ab log(p_ab / (p_a · p_b))
Knowing that you have G at P1 allows you to make an educated guess about what you will find at P6: P(V6) = 4/9, but P(V6|G1) = 1.0!
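The worked example can be reproduced directly from the formula. A minimal sketch using the nine peptides from the slide (the helper name is ours, not a standard library function):

```python
from collections import Counter
from math import log

# The nine peptides from the slide; compare positions P1 and P6.
peptides = ["ALWGFFPVA", "ILKEPVHGV", "ILGFVFTLT", "LLFGYPVYV",
            "GLSPTVWLS", "YMNGTMSQV", "GILGFVFTL", "WLSLLVPFV",
            "FLPSDFFPS"]

def mutual_information(seqs, i, j):
    """I = sum_ab p_ab * log(p_ab / (p_a * p_b)) between columns i and j."""
    n = len(seqs)
    pa = Counter(s[i] for s in seqs)           # marginal at column i
    pb = Counter(s[j] for s in seqs)           # marginal at column j
    pab = Counter((s[i], s[j]) for s in seqs)  # joint distribution
    return sum((c / n) * log((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())

print(mutual_information(peptides, 0, 5))  # P1 vs P6 (0-based indices)
```

A positive value confirms that P1 and P6 are correlated in this set, exactly the kind of higher order signal a weight matrix discards but a neural network can exploit.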
313 binding peptides 313 random peptides
Mutual information
SLLPAIVEL YLLPAIVHI TLWVDPYEV GLVPFLVSV KLLEPVLLL LLDVPTAAV LLDVPTAAV LLDVPTAAV
LLDVPTAAV VLFRGGPRG MVDGTLLLL YMNGTMSQV MLLSVPLLL SLLGLLVEV ALLPPINIL TLIKIQHTL
HLIDYLVTS ILAPPVVKL ALFPQLVIL GILGFVFTL STNRQSGRQ GLDVLTAKV RILGAVAKV QVCERIPTI
ILFGHENRV ILMEHIHKL ILDQKINEV SLAGGIIGV LLIENVASL FLLWATAEA SLPDFGISY KKREEAPSL
LERPGGNEI ALSNLEVKL ALNELLQHV DLERKVESL FLGENISNF ALSDHHIYL GLSEFTEYL STAPPAHGV
PLDGEYFTL GVLVGVALI RTLDKVLEV HLSTAFARV RLDSYVRSL YMNGTMSQV GILGFVFTL ILKEPVHGV
ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CLGGLLTMV FIAGNSAYE KLGEFYNQM
KLVALGINA DLMGYIPLV RLVTLKDIV MLLAVLYCL AAGIGILTV YLEPGPVTA LLDGTATLR ITDQVPFSV
KTWGQYWQV TITDQVPFS AFHHVAREL YLNKIQNSL MMRKLAILS AIMDKNIIL IMDKNIILK SMVGNWAKV
SLLAPGAKQ KIFGSLAFL ELVSEFSRM KLTPLCVTL VLYRYGSFS YIGEVLVSV CINGVCWTV VMNILLQYV
ILTVILGVL KVLEYVIKV FLWGPRALV GLSRYVARL FLLTRILTI HLGNVKYLV GIAGGLALL GLQDCTMLV
TGAPVTYST VIYQYMDDL VLPDVFIRC VLPDVFIRC AVGIGIAVV LVVLGLLAV ALGLGLLPV GIGIGVLAA
GAGIGVAVL IAGIGILAI LIVIGILIL LAGIGLIAA VDGIGILTI GAGIGVLTA AAGIGIIQI QAGIGILLA
KARDPHSGH KACDPHSGH ACDPHSGHF SLYNTVATL RGPGRAFVT NLVPMVATV GLHCYEQLV PLKQHFQIV
AVFDRKSDA LLDFVRFMG VLVKSPNHV GLAPPQHLI LLGRNSFEV PLTFGWCYK VLEWRFDSR TLNAWVKVV
GLCTLVAML FIDSYICQV IISAVVGIL VMAGVGSPY LLWTLVVLL SVRDRLARL LLMDCSGSI CLTSTVQLV
VLHDDLLEA LMWITQCFL SLLMWITQC QLSLLMWIT LLGATCMFV RLTRFLSRV YMDGTMSQV FLTPKKLQC
ISNDVCAQV VKTDGNPPE SVYDFFVWL FLYGALLLA VLFSSDFRI LMWAKIGPV SLLLELEEV SLSRFSWGA
YTAFTIPSI RLMKQDFSV RLPRIFCSC FLWGPRAYA RLLQETELV SLFEGIDFY SLDQSVVEL RLNMFTPYI
NMFTPYIGV LMIIPLINV TLFIGSHVV SLVIVTTFV VLQWASLAV ILAKFLHWL STAPPHVNV LLLLTVLTV
VVLGVVFGI ILHNGAYSL MIMVKCWMI MLGTHTMEV MLGTHTMEV SLADTNSLA LLWAARPRL GVALQTMKQ
GLYDGMEHL KMVELVHFL YLQLVFGIE MLMAQEALA LMAQEALAF VYDGREHTV YLSGANLNL RMFPNAPYL
EAAGIGILT TLDSQVMSL STPPPGTRV KVAELVHFL IMIGVLVGV ALCRWGLLL LLFAGVQCQ VLLCESTAV
YLSTAFARV YLLEMLWRL SLDDYNHLV RTLDKVLEV GLPVEYLQV KLIANNTRV FIYAGSLSA KLVANNTRL
FLDEFMEGV ALQPGTALL VLDGLDVLL SLYSFPEPE ALYVDSLFF SLLQHLIGL ELTLGEFLK MINAYLDKL
AAGIGILTV FLPSDFFPS SVRDRLARL SLREWLLRI LLSAWILTA AAGIGILTV AVPDEIPPL FAYDGKDYI
AAGIGILTV FLPSDFFPS AAGIGILTV FLPSDFFPS AAGIGILTV FLWGPRALV ETVSEQSNV ITLWQRPLV
Neural network training
• Sequence encoding
– Sparse
– Blosum
– Hidden Markov model
• Network ensembles
– Cross-validated training
– Benefit from ensembles
Sequence encoding
• How to represent a peptide amino acid sequence to the neural network?
• Sparse encoding (all amino acids are equally dissimilar)
• Blosum encoding (encodes similarities between the different amino acids)
• Weight matrix (encodes the position-specific amino acid preference of the HLA binding motif)
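A sketch of the sparse encoding described above (the helper name and alphabet ordering are illustrative; real tools may order the residues differently):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues

def sparse_encode(peptide):
    """One-hot ("sparse") encoding: each residue becomes a 20-long
    binary vector, so a 9mer maps to a 9 x 20 input matrix."""
    x = np.zeros((len(peptide), 20))
    for pos, aa in enumerate(peptide):
        x[pos, AMINO_ACIDS.index(aa)] = 1.0
    return x

x = sparse_encode("GILGFVFTL")
print(x.shape)   # a 9mer gives 9 x 20 = 180 network inputs
```

Because any two distinct one-hot vectors are at the same distance from each other, sparse encoding treats all substitutions as equally dissimilar; Blosum encoding instead replaces each one-hot row with the residue's row from a substitution matrix, so similar amino acids get similar inputs.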
Evaluation of prediction accuracy
PSSM
Neural network training. Cross validation
Cross validation
Train on 4/5 of data
Test on 1/5
=> Produce 5 different neural networks, each with a different prediction focus
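The 5-fold scheme can be sketched as follows (a generic partitioning helper, not the actual training code):

```python
import random

def five_fold_partitions(data, seed=0):
    """Split data into 5 folds; yield (train, test) pairs so that each
    peptide appears in exactly one test set, as on the slide."""
    items = list(data)
    random.Random(seed).shuffle(items)          # fixed seed: reproducible folds
    folds = [items[i::5] for i in range(5)]
    for i in range(5):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

data = [f"pep{i}" for i in range(25)]
for train, test in five_fold_partitions(data):
    assert len(train) == 20 and len(test) == 5  # 4/5 train, 1/5 test
```

Each of the five networks is trained on a different 4/5 of the data, and its held-out 1/5 is used to stop training at the point of best generalization.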
Neural network training curve
Maximum test set performance
Most capable of generalizing
Network ensembles
The Wisdom of the Crowds
The Wisdom of Crowds. Why the Many are Smarter than the Few. James Surowiecki
One day in the fall of 1906, the British scientist Francis Galton left his home and headed for a country fair… He believed that only a very few people had the characteristics necessary to keep societies healthy. He had devoted much of his career to measuring those characteristics, in fact, in order to prove that the vast majority of people did not have them. … Galton came across a weight-judging competition… Eight hundred people tried their luck. They were a diverse lot: butchers, farmers, clerks and many other non-experts… The crowd had guessed … 1,197 pounds; the ox weighed 1,198.
Network ensembles
• No single network, with a particular architecture and sequence encoding scheme, will consistently perform the best
• Enlightened despotism fails for neural network predictions too
– For some peptides, BLOSUM encoding with a four-neuron hidden layer best predicts the peptide/MHC binding; for other peptides a sparse-encoded network with zero hidden neurons performs best
– Wisdom of the crowd
• Never use just one neural network
• Use network ensembles
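In practice the ensemble prediction is simply the average of the member networks' outputs. A minimal sketch with stand-in "networks" (any callables returning a score; the real members would be trained ANNs with different encodings and architectures):

```python
def ensemble_predict(networks, peptide):
    """Wisdom-of-the-crowd prediction: average the outputs of all
    member networks instead of trusting a single "despot" network."""
    scores = [net(peptide) for net in networks]
    return sum(scores) / len(scores)

# Hypothetical toy "networks": here just constant-scoring callables.
nets = [lambda p: 0.60, lambda p: 0.70, lambda p: 0.50]
print(ensemble_predict(nets, "GILGFVFTL"))  # ~0.6
```

Averaging cancels the individual networks' uncorrelated errors, which is why the ensemble outperforms any single member on average.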
Evaluation of prediction accuracy
ENS: Ensemble of neural networks trained using sparse, Blosum, and weight matrix sequence encoding
T cell epitope identification
Lauemøller et al., Reviews in Immunogenetics 2001
NetMHC-3.0 update
• IEDB + more proprietary data
• Higher accuracy for existing ANNs
• More human alleles
• Non-human alleles (mice + primates)
• Prediction of 8mer binding peptides for some alleles
• Prediction of 10- and 11mer peptides for all alleles
• Output to spreadsheet
NetMHC Output
Prediction of 10- and 11mers using 9mer prediction tools
Approach:
For each peptide of length L, create 6 pseudo peptides by deleting a sliding window of length L-9, always keeping positions 1, 2, 3 and the C-terminal residue (position 9 of the resulting 9mer)
Example:
MLPQWESNTL = MLPWESNTL
MLPQESNTL
MLPQWSNTL
MLPQWENTL
MLPQWESTL
MLPQWESNL
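The sliding-window deletion can be sketched as follows (hypothetical helper name; indices are 0-based in the code):

```python
def pseudo_9mers(peptide):
    """For an L-mer (L > 9), delete a sliding window of length L-9,
    always keeping positions 1, 2, 3 and the C-terminal residue."""
    L = len(peptide)
    w = L - 9
    # The deletion window may start after P3 and must leave the
    # C-terminal residue intact: 6 placements for 10mers and 11mers.
    return [peptide[:s] + peptide[s + w:] for s in range(3, L - w)]

print(pseudo_9mers("MLPQWESNTL"))
# ['MLPWESNTL', 'MLPQESNTL', 'MLPQWSNTL', 'MLPQWENTL', 'MLPQWESTL', 'MLPQWESNL']
```

Each pseudo 9mer is scored with the 9mer network, and the six scores are then combined as shown on the following slide.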
Prediction of 10- and 11mers using 9mer prediction tools
Final prediction = average of the 6 log-scores:
(0.477 + 0.405 + 0.564 + 0.505 + 0.559 + 0.521)/6 = 0.505
Affinity: exp(log(50000)·(1 - 0.505)) = 211.5 nM
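The two steps above, averaging the log-scores and inverting the 1 - log50k transform, can be sketched as:

```python
from math import exp, log

def score_to_affinity_nm(score):
    """Map a network log-score in [0, 1] back to affinity in nM via
    the 1 - log50k transform used on the slide: aff = 50000^(1-score)."""
    return exp(log(50000) * (1 - score))

# The six pseudo-9mer scores from the slide's example.
scores = [0.477, 0.405, 0.564, 0.505, 0.559, 0.521]
avg = sum(scores) / len(scores)         # final prediction: ~0.505
print(avg, score_to_affinity_nm(avg))   # affinity: ~211.5 nM
```

Note the transform is monotonically decreasing: a score of 1 corresponds to 1 nM (very strong binding) and a score of 0 to 50,000 nM.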
Prediction using ANN trained on 10mer peptides
Prediction of 10- and 11mers using 9mer prediction tools
Examples. Hepatitis C virus. Epitope predictions
Hotspots
SARS T cell epitope identification
Peptide binding affinity
A01 predicted peptides offered to rA*0101
[Bar chart: peptide binding affinity KD (µM), y-axis 0.0 to 2.5, for peptides A1-6929 through A1-6943]
Peptides tested: 15/15 (100 %)
Binders (KD < 500 nM): 14/15 (93%)
More SARS CTL epitopes
Peptide binding affinity
A03 predicted peptides offered to rA*1101
[Bar chart: peptide binding affinity KD (µM), y-axis 0.0 to 2.5, for peptides A3-6959 through A3-6973]
Peptide binding affinity
B7 predicted peptides offered to rB*0702
[Bar chart: peptide binding affinity KD (µM), y-axis 0.0 to 2.5, for peptides B7-6989 through B7-7003]
Peptide binding affinity
B58 predicted peptides offered to rB*5801
[Bar chart: peptide binding affinity KD (µM), y-axis 0.0 to 2.5, for peptides B58-7035 through B58-7049]
Peptide binding affinity
B62 predicted peptides offered to rB*1501
[Bar chart: peptide binding affinity KD (µM), y-axis 0.0 to 2.5, for peptides HLA-B62 7050 through 7064]
A0301: 11/15   A1101: 14/15   B0702: 10/15
A0201 / B5801 / B1501: 13/15, 12/14
A2 supertype:
Molecule used: rA0201 / human β2m
Peptide binding affinity
A02 predicted peptides offered to rA*0201
[Bar chart: peptide binding affinity KD (µM), y-axis 0.0 to 2.5, for peptides A2-6944 through A2-6958]
Binders: 12/15
Vaccine design. Polytope optimization
• Successful immunization can be obtained only if the epitopes encoded by the polytope are correctly processed and presented.
• Cleavage by the proteasome in the cytosol, translocation into the ER by the TAP complex, as well as binding to MHC class I should be taken into account in an integrative manner.
• The design of a polytope can be done in an effective way by modifying the sequential order of the different epitopes, and by inserting, as linkers between the epitopes, specific amino acids that favor optimal cleavage and transport by the TAP complex.
Vaccine design. Polytope construction
[Diagram: polytope starting configuration, epitopes joined by linkers from NH2 to COOH, with C-terminal cleavage sites, cleavage within epitopes, and new-epitope cleavage at the junctions]
Immunological Bioinformatics, The MIT press.
Polytope optimization Algorithm
• Optimization of four measures:
1. The number of poor C-terminal cleavage sites of epitopes (predicted cleavage < 0.9)
2. The number of internal cleavage sites (within-epitope cleavages with a prediction larger than the predicted C-terminal cleavage)
3. The number of new epitopes (number of processed and presented epitopes in the fusion regions spanning the epitopes)
4. The length of the linker regions inserted between epitopes
• The optimization seeks to minimize the above four terms by use of Monte Carlo Metropolis simulations [Metropolis et al., 1953]
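A generic Metropolis loop over epitope orderings might look like the sketch below; the toy cost function is purely illustrative and stands in for the four predicted terms above, which require cleavage, TAP and MHC predictors:

```python
import math, random

def metropolis_optimize(epitopes, cost, steps=10000, T=1.0, seed=0):
    """Monte Carlo Metropolis search over epitope orderings: propose a
    swap of two epitopes, accept it if the cost drops, otherwise accept
    with probability exp(-dE/T)."""
    rng = random.Random(seed)
    order = list(epitopes)
    best, best_cost = list(order), cost(order)
    e = best_cost
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        e_new = cost(order)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / T):
            e = e_new                                # accept
            if e < best_cost:
                best, best_cost = list(order), e
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo swap
    return best, best_cost

# Toy cost (assumption): penalize junctions where adjacent epitopes'
# boundary residues match, loosely mimicking new-epitope creation.
eps = ["GILGFVFTL", "LLFGYPVYV", "FLPSDFFPS", "YMNGTMSQV"]
toy_cost = lambda order: sum(a[-1] == b[0] for a, b in zip(order, order[1:]))
best, c = metropolis_optimize(eps, toy_cost)
print(best, c)
```

Occasionally accepting uphill moves lets the search escape local minima, which a pure greedy swap procedure over orderings cannot do.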
Polytope optimal configuration
Immunological Bioinformatics, The MIT press.
Summary
• MHC class I binding can be very accurately predicted using ANNs
• Higher order sequence correlations are important for peptide:MHC-I binding
• ANNs can be trained without overfitting
– Using multiple sequence encoding schemes
– Wisdom of the crowd
• Optimization can generate polytopes with a high likelihood of antigen presentation