Predicting virus mutations through relational learning
AIMM 2012

E Cilia1, S Teso2, S Ammendola3, T Lenaerts1, and A Passerini2

September 9th, 2012

1 - Département d'Informatique, FS, Université Libre de Bruxelles
2 - Department of Computer Science and Information Engineering, University of Trento
3 - Ambiotec sas

Outline: Intro, A Relational Learning Approach, HIV RT Drug Resistance, Learning from mutations, Learning from mutants, Conclusion


Motivations

Mining relevant features from protein mutation data

understanding the properties of functional sites

developing novel proteins with useful/relevant function

Rational Design

engineering technique modifying existing proteins by site-directed mutagenesis

assumes knowledge (or intuition) about the effects of specific mutations

involves extensive trial-and-error experiments

also serves to improve understanding of protein function


Introduction: An artificial system mimicking rational design

Goal

To build an artificial system mimicking the rational design process

A relational learning approach to:

1. mine rules from mutation data describing mutations relevant to a certain behavior

2. use the rules to infer novel mutations that may induce a similar behavior


A Relational Learning Approach

[Figure: pipeline diagram. Labels: background knowledge; dataset of mutations / mutants; hypothesis; rank of novel relevant mutations]


Step 1: Relational Learning Phase (Learning in First Order Logic)

data D, background knowledge B and features induced during learning are represented in first order logic

res_against(M,nnrti) ← mut(M,P) AND close_to_site(P)
(head: res_against(M,nnrti); body: mut(M,P) AND close_to_site(P))

searching for a set of clauses (hypothesis) covering all or most positive examples, and none or few negative ones

Advantages

expressivity and interpretability of the learned model

possibility to make use of specific background knowledge

ability to learn rules from descriptions of complex, structured entities

the learnt rules constrain the rational design space
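
To make the notion of a hypothesis "covering" examples concrete, here is a minimal Python sketch. It is not how an ILP system evaluates clauses; the mutation identifiers, their positions, and the set of positions close to the site are hypothetical values chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical facts: mutation id -> position, and positions close to the site.
MUT_POSITION: Dict[str, int] = {"m1": 190, "m2": 181, "m3": 300, "m4": 12}
CLOSE_TO_SITE: Set[int] = {181, 190}

Clause = Callable[[str], bool]


def clause_close_to_site(mutation_id: str) -> bool:
    """Body of: res_against(M, nnrti) <- mut(M, P) AND close_to_site(P)."""
    position = MUT_POSITION.get(mutation_id)
    return position is not None and position in CLOSE_TO_SITE


def covers(hypothesis: Iterable[Clause], example: str) -> bool:
    """An example is covered if at least one clause body holds for it."""
    return any(clause(example) for clause in hypothesis)


def coverage(hypothesis: List[Clause],
             positives: List[str],
             negatives: List[str]) -> Tuple[int, int]:
    """Count covered positive and covered negative examples."""
    pos_covered = sum(covers(hypothesis, e) for e in positives)
    neg_covered = sum(covers(hypothesis, e) for e in negatives)
    return pos_covered, neg_covered


if __name__ == "__main__":
    print(coverage([clause_close_to_site], ["m1", "m2", "m3"], ["m4"]))
```

A good hypothesis in the sense of the slide maximizes the first count while keeping the second near zero.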


Step 2: Generative Phase (Mutation Generation Algorithm)

Algorithm: Mutation generation
input: background knowledge B, learned model H, k
output: rank of the most relevant mutations R
procedure GenerateMutations(B, H, k)
    Initialize DM ← ∅
    M ← find all mutations m that satisfy at least one clause c_i ∈ H
    for m ∈ M do
        score ← S_M(m)          ▷ number of clauses c_i satisfied by m
        DM ← DM ∪ {(m, score)}
    end for
    R ← RankMuts(DM, B, H, k)   ▷ rank relevant mutations
    return R
end procedure
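
The generation procedure can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: the background knowledge B is reduced to a wild-type sequence, clauses are plain Python predicates, and RankMuts is assumed to sort by score in decreasing order and keep the top k (the slides do not specify its internals). The toy sequence and clauses are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# (wild-type residue, 1-based position, mutant residue)
Mutation = Tuple[str, int, str]
Clause = Callable[[Mutation], bool]

AMINO_ACIDS = "acdefghiklmnpqrstvwy"


def generate_mutations(wild_type: str,
                       hypothesis: Sequence[Clause],
                       k: int) -> List[Tuple[Mutation, int]]:
    """Enumerate single-residue mutations, score them, return the top k."""
    scored: List[Tuple[Mutation, int]] = []
    for pos, wt_res in enumerate(wild_type, start=1):
        for aa in AMINO_ACIDS:
            if aa == wt_res:
                continue  # not a mutation
            mutation = (wt_res, pos, aa)
            # Score = number of clauses satisfied by the mutation.
            score = sum(1 for clause in hypothesis if clause(mutation))
            if score > 0:  # keep mutations satisfying at least one clause
                scored.append((mutation, score))
    # Assumed ranking: highest score first, ties broken by position, residue.
    scored.sort(key=lambda item: (-item[1], item[0][1], item[0][2]))
    return scored[:k]


if __name__ == "__main__":
    toy_wild_type = "gyvk"  # hypothetical 4-residue sequence
    toy_hypothesis = [
        lambda m: m[1] == 3,    # any mutation at position 3
        lambda m: m[2] == "a",  # any mutation to alanine
    ]
    for mutation, score in generate_mutations(toy_wild_type, toy_hypothesis, 3):
        print(mutation, score)
```

Exhaustive enumeration is affordable for single mutations because the candidate space grows only linearly with sequence length.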


HIV-1 RT Drug Resistance

mining rules from HIV mutation data

understand the virus adaptation mechanism

design drugs that effectively counter potentially resistant mutants

Datasets

1. Reverse Transcriptase (RT) mutations from the Los Alamos National Laboratory HIV resistance database

NRTI → 95 mutations
NNRTI → 56 mutations

2. RT mutants from the Stanford HIV drug resistance database

NRTI → 639 mutants
NNRTI → 747 mutants


Learning settings: Learning from mutations

Mutation-based learning

Input examples: single amino-acid mutations conferring resistance to a class of drugs

aa(Pos,AA)

mut(MutationID,AA,Pos,AA1)

Target concept: a model (i.e. set of rules) describing a mutation conferring resistance to a certain class of drugs

res_against(MutationID,Drug)

Learning setting: learn from positive examples only (annotation on mutations NOT conferring resistance is scarce)

Output: generated resistance mutations


Background Knowledge

Background Knowledge Predicates (excerpt)

typeaa(T,AA)

same_type_aa(R1,R2,T)

same_type_mut_t(MutID,Pos,T)

close_to_site(Pos)

location(L,Pos)

catalytic_propensity(AA,CP)

(Betts and Russell, 2003)

Background Knowledge Rules (example)

same_type_aa(R1,R2,T) ← typeaa(T,R1) AND typeaa(T,R2)

different_type_mut_t(MutID,Pos) ← mut(MutID,R1,Pos,R2) AND NOT same_type_aa(R1,R2,T)
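
The two example rules can be mirrored in a few lines of Python. This is only a sketch: the type table below is a small, hypothetical excerpt, not the full classification used in the talk, and the negation is read as negation as failure (no type T exists that contains both residues).

```python
from typing import Dict, Set

# Hypothetical, incomplete excerpt of typeaa(T, AA) facts.
TYPEAA: Dict[str, Set[str]] = {
    "aromatic": {"f", "w", "y", "h"},
    "negative": {"d", "e"},
    "positive": {"k", "r", "h"},
    "tiny": {"a", "g", "s"},
}


def same_type_aa(r1: str, r2: str, t: str) -> bool:
    """same_type_aa(R1,R2,T) <- typeaa(T,R1) AND typeaa(T,R2)."""
    members = TYPEAA.get(t, set())
    return r1 in members and r2 in members


def different_type_mut(r1: str, r2: str) -> bool:
    """True when no type in the table contains both residues.

    Mirrors 'NOT same_type_aa(R1,R2,T)' with T unbound.
    """
    return not any(same_type_aa(r1, r2, t) for t in TYPEAA)


if __name__ == "__main__":
    print(same_type_aa("d", "e", "negative"))  # both negatively charged
    print(different_type_mut("d", "e"))
    print(different_type_mut("k", "n"))        # depends on the toy table
```

Because the table is partial, `different_type_mut` can report residues as different only for lack of an entry; a complete `typeaa` table would be needed for meaningful answers.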


Learned Hypothesis: Model for the resistance to NNRTI

[Figure: wild-type RT sequence excerpt ">wt ...AGLKKKKSVTVLDVG...YQYMDDLYVG...WETWWTEY...WIPEWEFVN..." with markers at positions 98, 112, 181, 190, 398, 405, 410, 418 and residue labels D, DD, W, W]

mut(A,B,C,D) AND position(C,190)

mut(A,B,C,D) AND position(C,190) AND typeaa(polar,D)

mut(A,y,C,D) AND typeaa(aliphatic,D)

mut(A,B,C,a) AND position(C,106)
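
As a reading aid, the four rule bodies above can be written as Python predicates over a mutation (B, C, D) = (wild-type residue, position, mutant residue). The type table is a hypothetical, incomplete excerpt, and the wild-type residues in the examples are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

# (B: wild-type residue, C: position, D: mutant residue)
Mutation = Tuple[str, int, str]

# Hypothetical, incomplete typeaa table for this illustration only.
TYPEAA: Dict[str, Set[str]] = {
    "polar": {"s", "t", "n", "q", "e"},
    "aliphatic": {"i", "l", "v"},
}

RULES: List[Callable[[Mutation], bool]] = [
    # mut(A,B,C,D) AND position(C,190)
    lambda m: m[1] == 190,
    # mut(A,B,C,D) AND position(C,190) AND typeaa(polar,D)
    lambda m: m[1] == 190 and m[2] in TYPEAA["polar"],
    # mut(A,y,C,D) AND typeaa(aliphatic,D)
    lambda m: m[0] == "y" and m[2] in TYPEAA["aliphatic"],
    # mut(A,B,C,a) AND position(C,106)
    lambda m: m[2] == "a" and m[1] == 106,
]


def satisfied_rules(mutation: Mutation) -> List[int]:
    """Return the 1-based indices of the rule bodies the mutation satisfies."""
    return [i for i, rule in enumerate(RULES, start=1) if rule(mutation)]


if __name__ == "__main__":
    for mutation in [("g", 190, "s"), ("y", 181, "i"), ("v", 106, "a")]:
        print(mutation, satisfied_rules(mutation))
```

The length of the returned list is the per-mutation score used in the generative phase.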


Experimental Setting

Aleph ILP system (one-class classification setting)

30 random training/test set splits (70/30) for each of the 2 learning tasks

enrichment in the test mutations (recall)

comparison against the random generator
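
The recall measure and the random baseline might be computed along these lines. This is a sketch under assumptions: the slides do not describe the random generator, so it is modeled here as uniform sampling of the same number of mutations from all candidates, and the mutations in the example are hypothetical.

```python
import random
from typing import Iterable, Set, Tuple

Mutation = Tuple[int, str]  # (position, mutant residue)


def recall(generated: Iterable[Mutation], test_set: Set[Mutation]) -> float:
    """Percentage of held-out test mutations found among the generated ones."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    hits = len(set(generated) & test_set)
    return 100.0 * hits / len(test_set)


def random_generator(candidates: Iterable[Mutation],
                     n: int,
                     rng: random.Random) -> Set[Mutation]:
    """Assumed baseline: draw n candidate mutations uniformly at random."""
    pool = sorted(set(candidates))
    return set(rng.sample(pool, min(n, len(pool))))


if __name__ == "__main__":
    test_mutations = {(190, "a"), (181, "c"), (103, "n"), (106, "a")}
    generated = {(190, "a"), (181, "c"), (5, "k")}
    all_candidates = [(p, aa) for p in range(1, 201) for aa in "acdn"]
    baseline = random_generator(all_candidates, len(generated), random.Random(0))
    print("algorithm recall:", recall(generated, test_mutations))
    print("random recall:", recall(baseline, test_mutations))
```

Repeating this over the 30 random splits and averaging gives the mean recall figures that the two generators are compared on.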


Experimental Results

Mean recall % on 30 splits

         Algorithm   Random Generator
NNRTI    86 •        58
NRTI     55 •        46

         Mean n. generated mutations   n. test mutations
NNRTI    5201                          17
NRTI     5548                          28

(•) significant improvement evaluated with a paired Wilcoxon test (α=0.01)

[Figure: mean recall (%), y-axis 0 to 100, against number of satisfied clauses per generated mutation, x-axis 1 to 10; series: NNRTI, NNRTI (rand), NRTI, NRTI (rand)]


Learning settings: Learning from mutants

Mutant-based learning

Input examples: mutants resistant or not to a class of drugs

aa(Pos,AA)

mut(MutantID,AA,Pos,AA1)

Target concept: a model (i.e. set of rules) describing a mutant resistant to a certain class of drugs

res_against(MutantID,Drug)

Learning setting: binary classification setting

Output: generated resistant mutants with a single amino acid mutation


Experimental Setting

Aleph ILP system (binary classification setting)

30 random training/test set splits (for each of the 2 learning tasks)

enrichment in test set mutations as performance measure(recall)

comparison against the random generator


Experimental Results

Mean recall % on 30 splits

         Algorithm   Random Generator
NNRTI    17 •        1
NRTI     7 •         3

         Mean n. generated mutations   mean n. test mutations
NNRTI    236                           26
NRTI     420                           40

(•) significant improvement evaluated with a paired Wilcoxon test (α=0.01)

[Figure: mean recall (%), y-axis 0 to 18, against number of satisfied clauses per generated mutation, x-axis 1 to 2; series: NNRTI, NNRTI (rand), NRTI, NRTI (rand)]


Results

NNRTI rules (excerpt)

res_against(A,nnrti) ← mut(A,B,C,D) AND position(C,177) AND catalytic_propensity(D,medium) AND same_type_mut_t(A,C,polar)

res_against(A,nnrti) ← mut(A,B,C,D) AND catalytic_propensity(D,high) AND typeaa(aromatic,B) AND same_type_aa(D,B,neutral)

NRTI rules (excerpt)

res_against(A,nrti) ← mut(A,B,C,D) AND position(C,33)

res_against(A,nrti) ← mut(A,B,C,r) AND typeaa(tiny,B) AND typeaa(polar,B)

NNRTI prediction highlights

Identified resistance surveillance mutations (53%): 103N, 106A, 181C, 181I, 181V, 188C, 188H, 190A, 190E, 190S

Other identified resistance mutations (29% of Dataset 1): 98G, 227C, 190C, 190Q, 190T, 190V

Other identified mutations (from the literature): 238N

Other key positions from the rules are: 177

Highly scored, not reported as resistance mutations: 181N, 181D, 318C, 232C

NRTI prediction highlights

Identified resistance surveillance mutations (18%): 67E, 67G, 67N, 116Y, 184V, 184I

Other identified resistance mutations (18% of Dataset 1): 44D, 62V, 67A, 67S, 69R, 184T

Other identified mutations (from the literature): 219H

Other key positions from the rules are: 33, 194, 218


Summary

Relational learning approach mimicking the rational design process:

HIV RT mutations/mutants

we built a relational knowledge base

we mined relevant relational features for modeling resistance mutations/mutants

we generated candidate mutations satisfying the learned rules

promising results, both in the mutation-based and in the mutant-based learning settings, suggest a potential in guiding mutant engineering or predicting virus evolution


Future Work: Work in progress

extend the background knowledge

single_nucleotide_change(a,d).

aamutations_single(R1,R2) ← mut(M,R1,P,R2) AND (single_nucleotide_change(R1,R2) OR single_nucleotide_change(R2,R1))

post-processing involving mutant evaluation by statistical learning approaches and stability predictors or search against HIV genome databases

generalize the approach to jointly generate sets of related mutations (mutants with multiple mutations)
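
One plausible way to derive facts such as single_nucleotide_change(a,d) is from the standard genetic code: two amino acids are related if some codon of one differs from some codon of the other in exactly one nucleotide. The sketch below assumes that reading; the slides do not say how the facts were actually produced.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(residue: str) -> List[str]:
    """All codons encoding the given one-letter amino acid code."""
    target = residue.upper()
    return [codon for codon, aa in CODON_TABLE.items() if aa == target]


def single_nucleotide_change(r1: str, r2: str) -> bool:
    """True if some codon of r1 is one substitution away from a codon of r2."""
    if r1.upper() == r2.upper():
        return False  # not an amino acid change
    return any(
        sum(x != y for x, y in zip(c1, c2)) == 1
        for c1 in codons_for(r1)
        for c2 in codons_for(r2)
    )


if __name__ == "__main__":
    print(single_nucleotide_change("a", "d"))  # e.g. GCT -> GAT
    print(single_nucleotide_change("w", "d"))  # TGG vs GAT/GAC
```

Since the relation computed this way is symmetric, the OR over both argument orders in the rule above would only matter if the facts were stored in a single direction.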


Future Work: From single to multiple amino acid mutations

Observations

multiple mutations are often required in order to affect protein function

neutral network theory claims that neutral mutations are required as intermediate steps to effective ones (debated)

...EYIQAKVQM...LDNLLNIEVAY...

...EYIQAKVQM...LDNLLDIEVAY...

...EYIQAKVQM...LENLLDIEVAY...

...EYIQAKVQM...LENLLNIEVAY...


Future Work: From single to multiple amino acid mutations

Predicting multiple mutations

predicting single mutations does not consider the joint effect of multiple mutations

trying all possible combinations is computationally infeasible (and not enough data)

...EYIQAKVQM...LDNLLNIEVAY...

...EYIQAKVQM...LDNLLDIEVAY...

...EYIQAKVQM...LENLLDIEVAY...

...EYIQAKVQM...LENLLNIEVAY...
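
A quick count shows why enumerating combinations does not scale. For a sequence of length L and 20 amino acids, the number of mutants with exactly k substitutions is C(L, k) * 19^k. The sketch below evaluates this; the length of 560 residues is an assumption used only to give a sense of scale.

```python
from math import comb


def n_mutants(seq_len: int, k: int, alphabet: int = 20) -> int:
    """Number of distinct mutants with exactly k substituted positions."""
    # Choose which k positions change, then one of (alphabet - 1)
    # alternative residues at each chosen position.
    return comb(seq_len, k) * (alphabet - 1) ** k


if __name__ == "__main__":
    assumed_length = 560  # assumed sequence length, for illustration
    for k in range(1, 5):
        print(k, n_mutants(assumed_length, k))
```

The count grows by roughly three to four orders of magnitude with each additional mutation, so only the single-mutation case can be enumerated exhaustively.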


Predicting multiple mutations

>m542   PISPIET   FAIKKKSSS   PLDKDFRKY   ELREHLLKWGFY   EIQKQGPGQWT   IVGAETF
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m2012  PISPIET   FAIKKKDST   PLDESFRKY   KLREHLLRWGFT   EVQKQGPDQWT   IPGAETY
        ******* ******.*: ***:.**** :**:***:*** *:**** .*** * ****:
(marked positions: 67, 123, 207, 334)

mut(A,B,C,p),pos(C,334),correlated_mut(A,D,E),pos(D,207),typeaa(A,E,negative).

>m2006  PMSPIET   FAIKKKDST   PLHEDFRKY   ELREHLLKWGLT   EVQKQGPDQWT   IAGAETY
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m1288  PISPIDT   FAIKKKNSD   PLDESFRKY   ELREHLLKWGFF   EIQKQGPGQWT   IPGAETY
        *:***:* ******:* ** *.**** ***:***:**: *:**** .*** * ****:
(marked positions: 67, 121, 123, 207, 334)


(marked positions: 67, 122, 328)

PISPIET...FAIKKKDST...PLDNDFRKY...ELREHLLRWGFT...EIQKQGPGQWT...IVGAETF
PISPIET...FAIKKKDST...PLDEDFRKY...ELRDHLLRWGFT...QIQKQGPGQWT...IVGAETF
PISPIET...FAIKKKSST...PLDEDFRKY...ELRDHLLRWGFT...EIQKQGPGQWT...IVGAETF


Thank you

Questions?

Elisa Cilia
