Date post: | 20-Dec-2015 |
Mapping Influenza A Virus Transmission Networks with Whole Genome Comparisons (Methods)
Adrienne Breland
TTGTGGATTCTTGATCGTCTTTTCTTCAAATGTAT TTATCGTCGCCTTAAATACGGA
AAGAACCTTTATGACAAGGTTCGACTACA GCTTAGGGATAATGCAAAGGAGCTGGT
Goal
- to characterize global Influenza A Virus
transmission as a complex network
Russell (2008) The global circulation of seasonal influenza A (H3N2) viruses
Proposed global H3N2 circulation
Outline
• Motivation
• Major Questions
• Data
• Genome Comparison Method
Motivation
• Delineating real disease networks is difficult
- Infection tracing: detecting exact transmission links
- Contact tracing: all potential transmission contacts
- Diary based: subject records all contacts
[Figure: example networks from infection tracing, contact tracing, and diary-based studies]
Keeling M & K Eames (2005) Networks and epidemic models. J. R. Soc. Interface 2:295-307
Motivation
• Delineating real disease networks is very useful
-targeting an attack
Error and attack tolerance of complex networks. Réka Albert, Hawoong Jeong and Albert-László Barabási
Image source: http://prblog.typepad.com/strategic_public_relation/images/2007/06/22/simple_social_network.png
-correlation coefficients
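$[D] = \sum_i D_i$: number of singles of type D
$[DB] = \sum_{i,j} D_i A_{ij} B_j$: number of pairs of type D-B
$C_{DB} = \dfrac{N\,[DB]}{n\,[D]\,[B]}$, with $C_{DB} = 1$ if D and B are not correlated
(A: adjacency matrix; N: number of nodes; n: mean number of contacts per node)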
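A minimal Python sketch of the correlation coefficient, assuming the pairwise form C_DB = N[DB] / (n[D][B]) used by Keeling & Eames (2005). The function name `pair_correlation` and the toy ring network are illustrative assumptions, not part of the original slides.

```python
def pair_correlation(adj, d, b):
    """Return C_DB = N*[DB] / (n*[D]*[B]) for an undirected network.

    adj  : square 0/1 adjacency matrix (list of lists)
    d, b : 0/1 indicator lists marking nodes of type D and type B
    """
    size = len(adj)                                  # N, number of nodes
    mean_degree = sum(map(sum, adj)) / size          # n, mean contacts per node
    singles_d = sum(d)                               # [D]
    singles_b = sum(b)                               # [B]
    pairs_db = sum(d[i] * adj[i][j] * b[j]           # [DB]
                   for i in range(size) for j in range(size))
    return size * pairs_db / (mean_degree * singles_d * singles_b)


# Hypothetical 4-node ring 0-1-2-3-0 with alternating node types.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
d_nodes = [1, 0, 1, 0]
b_nodes = [0, 1, 0, 1]
print(pair_correlation(ring, d_nodes, b_nodes))  # 2.0: D and B are neighbours more often than chance
print(pair_correlation(ring, d_nodes, d_nodes))  # 0.0: D nodes never touch each other
```

Values above 1 indicate that the two types are connected more often than random mixing would give; values below 1 indicate the opposite.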
-detecting more probable global routes
[Figure: global routes]
Breland A, S Nasser, K Schlauch, M Nicolescu, F Harris (2008) Efficient Influenza A Virus Origin Detection. Journal of Electronics and Computer Science 10:1-12
-examine with other spatial data
[Figures: spatial data layers for vegetation, population, and climate change]
Major questions
• Location and degree of host jumping
• Underlying structure (small world, power law, ...)
• Subtype independence
• Re-assortment
• Geographic routes
Data
• http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html
• ≈ 4000 sequences
• 1999-2009
• Global regions (e.g. China, U.S., Africa, India, ...)
• All subtypes (e.g. H5N1, H1N1, ...)
• All host species (domestic avian, wild avian, etc.)
• ≈ 374 sequences per year
• Multiple host types
• Multiple subtypes
Genome Comparisons
• Similarity matrix, N sequences:
N(N-1)/2 comparisons
Example similarity matrix (upper triangle):
      N     .     .     .     2     1
N     -    0.4   0.1   0.97  0.1   0.82
.     -     -    0.3   0.6   0.7   0.9
.     -     -     -    0.3   0.5   0.02
.     -     -     -     -    0.02  0.01
2     -     -     -     -     -    0.93
1     -     -     -     -     -     -
[Second matrix: same layout, with only the highest-scoring pairs (0.97, 0.9, 0.93) marked 1]
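The N(N-1)/2 count comes from comparing every unordered pair of sequences once. A minimal Python sketch follows. The similarity function `shared_fraction`, the toy sequences, and the cutoff of 0.75 are placeholders chosen for illustration, and the slides do not state how matrix scores are turned into network edges.

```python
from itertools import combinations


def pairwise_similarity(items, similarity):
    """Score every unordered pair once: N(N-1)/2 comparisons for N items."""
    return {(i, j): similarity(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}


def threshold_edges(scores, cutoff):
    """Keep the pairs whose score reaches the cutoff; these become network edges."""
    return sorted(pair for pair, score in scores.items() if score >= cutoff)


def shared_fraction(a, b):
    """Toy similarity (assumes equal lengths): fraction of matching positions."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


seqs = ["AAAA", "AAAT", "TTTT", "AATT"]        # hypothetical sequences
scores = pairwise_similarity(seqs, shared_fraction)
edges = threshold_edges(scores, 0.75)           # hypothetical cutoff
print(len(scores))  # 6, i.e. 4 * 3 / 2
print(edges)        # [(0, 1), (1, 3)]
```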
Romanova,J (2006) The fight against new types of influenza virus. Biotechnology J,1:1381-1392
Genome Comparisons
• 8 segments
[Figure: one N × N similarity matrix per genome segment]
HA ≈ 1750 bp
NS ≈ 900 bp
M ≈ 1000 bp
NA ≈ 1300 bp
NP ≈ 1500 bp
PA ≈ 2100 bp
PB1 ≈ 2200 bp
PB2 ≈ 2300 bp
Genome Comparisons
• Alignment, O(n^2), n = max sequence length
.....AAAACTTGAACC.....
.....GGACTTGACCT.....
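The quadratic cost arises because a pairwise alignment fills a table with one cell for every pair of positions in the two sequences. A minimal Python sketch of global alignment scoring (Needleman-Wunsch) follows. The slides do not name the alignment algorithm or its scoring, so the algorithm choice and the +1/-1/-1 scores are assumptions.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score; fills len(a) * len(b) cells, hence O(n^2) time."""
    prev = [j * gap for j in range(len(b) + 1)]      # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]                             # a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag,                    # align a[i-1] with b[j-1]
                            prev[j] + gap,           # gap in b
                            curr[j - 1] + gap))      # gap in a
        prev = curr
    return prev[-1]


# The two fragments shown on the slide.
print(nw_score("AAAACTTGAACC", "GGACTTGACCT"))
```

Only two rows of the table are kept in memory at a time, so space is linear even though time stays quadratic.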
Genome Comparisons
• Alignment-free k-mers, O(n)
Σ = {A, C, G, T/U}
4^k possible k-mers, k ≥ 0
[Table: frequency of each k-word for k = 2 (AA, AC, AG, ..., TG, TT)]
Genome Comparisons
• Feature Frequency Profiles (FFP)
$C_k = \langle c_1, \ldots, c_{4^k} \rangle$
$F_k = \langle c_1/\sum_i c_i, \ldots, c_{4^k}/\sum_i c_i \rangle = \langle f_1, \ldots, f_{4^k} \rangle$
Sims GE, Jun SR, Wu GA, Kim SH (2009) Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc Natl Acad Sci U S A 106(8):2677-82.
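A minimal Python sketch of a feature frequency profile: slide a window of length k along the sequence, count each k-mer, and divide by the total. It runs in one pass over the sequence. The function names are illustrative, and mapping U to T and skipping windows with ambiguous bases are assumptions rather than details from the slides.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count vector C_k over all 4**k k-mers, in lexicographic order."""
    seq = seq.upper().replace("U", "T")              # assumption: treat RNA U as T
    seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are not in the vector and are dropped.
    return [seen.get("".join(word), 0) for word in product(ALPHABET, repeat=k)]


def ffp(seq, k):
    """Frequency vector F_k: each count divided by the sum of all counts."""
    counts = kmer_counts(seq, k)
    total = sum(counts)
    return [c / total for c in counts] if total else counts


print(ffp("AACA", 1))   # [0.75, 0.25, 0.0, 0.0] for A, C, G, T
profile = ffp("TTGTGGATTCTTGATCGTCTTTTCTTCAAATGTAT", 2)
print(len(profile))     # 16, i.e. 4**2
```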
Genome Comparisons
• Jensen-Shannon Divergence (JS)
compare(s1,s2)
$P_k = \mathrm{FFP}(s_1)$, $Q_k = \mathrm{FFP}(s_2)$, $M_k = (P_k + Q_k)/2$
$JS(P_k, Q_k) = \tfrac{1}{2} KL(P_k, M_k) + \tfrac{1}{2} KL(Q_k, M_k)$
$KL(P_k, M_k) = \sum_{i=1}^{4^k} p_{i,k} \log_2 \dfrac{p_{i,k}}{m_{i,k}}$
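A minimal Python sketch of the Jensen-Shannon divergence between two frequency profiles, using the standard definition in which M is the average of the two profiles and KL uses a base-2 logarithm. The function names are illustrative, and the inputs are assumed to be equal-length lists that each sum to 1.

```python
import math


def kl(p, m):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q against their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


print(js([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical profiles
print(js([1.0, 0.0], [0.0, 1.0]))  # 1.0: no shared features
```

With base-2 logarithms the value is bounded between 0 and 1, and it is symmetric in its two arguments, which makes it convenient as a pairwise distance between genomes.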
Genome Comparisons
• k=?
k s.t. N(k) ≥ N(k+1)
k ≈ 4
Genome Comparisons
• Actual & Predicted times
• Questions/Comments?
• Thanks