
Post on 10-May-2015


Computational prediction and characterization of genomic islands: insights into bacterial pathogenicity

Morgan G.I. Langille

Department of Molecular Biology & Biochemistry, Simon Fraser University

http://tinyurl.com/genomic-islands

Genomic Island History

Early 1990s: clusters of virulence genes were found in E. coli (Hacker, et al., 1990)

Pathogenicity Islands (PAIs): clusters of genes that are associated with bacterial virulence

Genomic Islands (GIs) (Hacker, et al., 2000): segments of a genome that are thought to have originated from a horizontal transfer event

Genomic Island Interest

Pathogenicity Islands
  Adhesins: fimbriae, intimin, etc.
  Secretion Systems: Type III and Type IV
  Toxins: hemolysins, pertussis toxin
  Invasins, Modulins, and Effectors

Antibiotic Resistance Islands

Metabolic Islands

Methods for Predicting GIs

1. Sequence based
  Abnormal sequence composition: GC% bias, dinucleotide bias, codon bias, etc.
  Genomic features associated with mobile genetic elements: direct repeats, IS elements, presence of tRNA and mobility genes (integrases, transposases, etc.)

2. Comparative genomics based
  Identify genomic regions with anomalous phylogenetic patterns
  Requires multiple genomes
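To make the sequence-composition idea in item 1 concrete, here is a minimal sketch that flags windows whose GC% deviates from the genome-wide mean. The function names, window size, step, and deviation threshold are all illustrative assumptions; none of the tools discussed in this talk is claimed to work this way.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_gc_windows(genome: str, window: int = 5000, step: int = 2500,
                        max_deviation: float = 0.08):
    """Return (start, end, gc) for windows whose GC differs from the genome mean.

    window, step and max_deviation are illustrative values, not published cutoffs.
    """
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) > max_deviation:
            flagged.append((start, start + window, gc))
    return flagged


# Toy genome: a GC-rich backbone with an 8 kb AT-rich insert at 80,000-88,000.
toy = "GCGCGATC" * 10000 + "ATATTAAT" * 1000 + "GCGCGATC" * 10000
regions = atypical_gc_windows(toy)
```

On the toy genome only windows overlapping the AT-rich insert are flagged. Real genomes are noisier, which is one reason composition-based methods can both over-predict and miss islands.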

Previous state of GI identification

1. Sequence based methods
  Numerous methods, with constant improvement in algorithm design
  Not very user friendly, and the accuracy of the various methods is not well described

2. Comparative based methods
  Used by many researchers, but with no established method (only in-house scripts)
  Limited access to user friendly tools for this type of analysis

Outline

IslandPick: A comparative genomics approach for genomic island identification

Evaluating sequence composition based genomic island prediction methods

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain

CRISPRs and their association with genomic islands

IslandPick: A comparative genomics approach for genomic island identification

Mauve: whole genome aligner

Allows genome rearrangements and inversions

Fast: aligns two genomes in < 15 minutes

Command line accessible

http://gel.ahabs.wisc.edu/mauve/

(Darling, et al., 2004)

IslandPick: Outline

Inputs: Query Genome A; comparison Genomes B, C, and D

Run Mauve: Mauve (A & B), Mauve (A & C), Mauve (A & D)

Extract unique regions

Identify overlapping unique regions

BLAST

Putative Genomic Islands
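The "identify overlapping unique regions" step can be sketched as an interval intersection across the pairwise comparisons. This is an illustration only: it assumes that a region must be unique to the query in every comparison, that coordinates are half-open intervals on the query genome, and that short regions are dropped using a hypothetical `min_length`. The function names are not taken from IslandPick.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the query genome


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def shared_unique_regions(per_comparison: List[List[Interval]],
                          min_length: int = 8000) -> List[Interval]:
    """Regions unique to the query in every pairwise comparison.

    min_length is an illustrative cutoff, not a value stated on the slide.
    """
    shared = sorted(per_comparison[0])
    for regions in per_comparison[1:]:
        shared = intersect(shared, sorted(regions))
    return [(s, e) for s, e in shared if e - s >= min_length]


# Hypothetical unique regions of Query Genome A versus Genomes B, C and D.
unique_vs_b = [(10_000, 30_000), (100_000, 112_000)]
unique_vs_c = [(12_000, 28_000), (105_000, 140_000)]
unique_vs_d = [(5_000, 26_000), (101_000, 110_000)]
candidates = shared_unique_regions([unique_vs_b, unique_vs_c, unique_vs_d])
```

In this toy input the second shared region is only 5 kb long, so it falls below the hypothetical length cutoff and a single candidate remains.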

Selecting Comparative Genomes

Same workflow as the IslandPick outline, with one step added at the start:

Query Genome A

Comparative Genome Selection (using CVTree distances): Genome B, Genome C, Genome D

What genomes to use?

We want to compare the query genome to other comparative genomes within certain evolutionary distances

Need a phylogenetic tree or a distance matrix for all sequenced bacterial species

CVTree

Uses matching K-strings between the proteomes of two organisms

Constructs phylogenetic trees without alignment

Avoids choosing genes for phylogenetic reconstruction

Web Server http://cvtree.cbi.pku.edu.cn

Downloadable command line executable

(Qi, et al., 2004)
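As a rough illustration of an alignment-free distance built from K-strings, the sketch below counts overlapping K-strings in two proteomes and converts the cosine correlation C of the count vectors into a distance (1 - C) / 2. This is a simplification under stated assumptions: the published method also corrects the counts for a random background, which is omitted here, and the function names, the default `k`, and the scaling are assumptions rather than the CVTree implementation.

```python
from collections import Counter
from math import sqrt


def kstring_counts(proteins, k=5):
    """Count every overlapping K-string across a list of protein sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def composition_distance(proteome_a, proteome_b, k=5):
    """Distance (1 - C) / 2, where C is the cosine of the two count vectors."""
    a, b = kstring_counts(proteome_a, k), kstring_counts(proteome_b, k)
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    if norm == 0:
        return 0.5  # nothing to compare; treat the proteomes as uncorrelated
    return (1 - dot / norm) / 2


# Toy proteomes (hypothetical sequences, far too short for real use).
same = composition_distance(["MKTAYIAKQR", "GSHMLEDPVA"], ["MKTAYIAKQR", "GSHMLEDPVA"], k=3)
disjoint = composition_distance(["AAAAAA"], ["CCCCCC"], k=3)
```

Identical proteomes give a distance of about 0, and proteomes sharing no K-strings give 0.5 under this scaling.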

Example: Pseudomonas Tree

[Figure: phylogenetic tree of P. fluorescens Pf-5, P. putida KT2440, P. fluorescens PfO-1, P. syringae tomato DC3000, P. syringae phaseolicola 1448A, P. syringae syringae B728a, P. aeruginosa PAO1, P. aeruginosa PA14, and Acinetobacter ADP1. Distance labels, in extraction order: 0.227, 0.256, 0.397, 0.393, 0.411, 0.428, 0.430, 0, 0.481]

Tree built using conserved genes, Omp85 & CarB, and maximum parsimony

CVTree distances from P. syringae B728a are shown

Determining Distance Cutoffs

Given the distances between any two species, how do we choose comparison genomes?

Maximum Distance Cutoff: eliminates the use of genomes that have diverged too much (noise)

Minimum Distance Cutoff: eliminates the use of genomes that have not diverged enough (very closely related strains)

Minimum Number of Genomes: eliminates the use of too few comparative genomes


Example: Pseudomonas Tree

Minimum Distance Cutoff = 0.10

Maximum Distance Cutoff = 0.42

Minimum Number of Genomes = 3
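A minimal sketch of how the three cutoffs could be applied to a set of distances from the query genome. The distance values are the ones shown on the tree slide, but the genome labels are placeholders because the pairing of values to strains is not recoverable from this transcript. Treating the cutoffs as inclusive, and returning nothing when too few genomes qualify, are assumptions.

```python
def select_comparison_genomes(distances, min_dist=0.10, max_dist=0.42, min_genomes=3):
    """Return genomes within [min_dist, max_dist], or [] if fewer than min_genomes pass."""
    chosen = sorted(name for name, d in distances.items() if min_dist <= d <= max_dist)
    return chosen if len(chosen) >= min_genomes else []


# Placeholder labels; the query genome itself has distance 0.
distances_from_query = {
    "genome_1": 0.227, "genome_2": 0.256, "genome_3": 0.397,
    "genome_4": 0.393, "genome_5": 0.411, "genome_6": 0.428,
    "genome_7": 0.430, "query": 0.0, "genome_8": 0.481,
}
selected = select_comparison_genomes(distances_from_query)
```

With these values the query (too close) and the three genomes beyond 0.42 (too distant) are excluded, leaving five comparison genomes, which satisfies the minimum of three.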

Predicting Similar Aged GIs

[Figure: GI insertion relative to the Query Genome, shown in two panels; label: "1 genome < distance X"]

Evaluating sequence composition based genomic island prediction methods

Accuracy of GI methods

Sequence based GI prediction methods
  Only require a single genome
  Can easily make false predictions: highly expressed genes
  May miss predictions: amelioration of DNA to the host genome; source genome has the same composition as the host genome

Accuracy is usually evaluated using simulated horizontal gene transfer events or small datasets of verified GIs

IslandPick is independent of sequence composition methods, so it was used to generate a “positive” dataset of islands

Developing a Negative Dataset

To identify false positives we need a “negative” dataset that does not contain GIs

Identify regions that are conserved across several genomes using Mauve whole genome alignment

Use the same genomes as selected by IslandPick with one additional cutoff

Negative Dataset

[Figure: GI insertion relative to the Query Genome, shown in two panels; label: "1 genome > distance X"]

IslandPick Cutoffs

[Figure labels: 736 chromosomes; 173 chromosomes; 118 chromosomes, 771 GIs, ~100 genes/strain]

(Langille, et al., 2008)

GI Prediction Accuracy

[Diagram: Positive Dataset, Negative Dataset, and Predicted Dataset, with regions labelled TP, FP, FN, and TN]

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
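The two formulas above translate directly into code. The sketch below also computes an overall accuracy as (TP + TN) / (TP + FP + FN + TN); that definition is an assumption, since the slides do not state how the "Overall Accuracy" column was calculated. The counts in the example are hypothetical.

```python
def gi_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (precision, recall, overall accuracy) from overlap counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    # Assumed definition of overall accuracy; not stated on the slide.
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts (for example, kb of sequence falling in each category).
precision, recall, accuracy = gi_metrics(tp=90, fp=10, fn=110, tn=790)
```

Here a predictor that rarely makes false calls (high precision) still misses more than half of the positive dataset (low recall), the same pattern several tools show in the accuracy table.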

GI Prediction Accuracy

Tool | Average number of nucleotides in GIs per genome (kb) | Precision (%) | Recall (%) | Overall Accuracy (%)
SIGI-HMM | 233 | 92 | 33.0 | 86
IslandPath/Dimob | 171 | 86 | 36 | 86
PAI IDA | 163 | 68 | 32 | 84
Centroid | 171 | 61 | 28 | 82
IslandPath/Dinuc | 444 | 55 | 53 | 82
Alien Hunter | 1265 | 38 | 77 | 71
Literature* | 639 | 100 | 87 | 96

(Langille, et al., 2008)

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

IslandViewer (Langille, et al., 2009)

Website that integrates the most accurate GI prediction programs: SIGI-HMM, IslandPath-DIMOB, and IslandPick

Genomic island predictions pre-calculated for all genomes

Automatically updated monthly

User genome submission available

IslandPick can be run using manually selected comparison genomes

Download data for a genomic island, a chromosome, or entire dataset

http://www.pathogenomics.sfu.ca/islandviewer/


IslandPick – Manual genome selection

User Genome Submission

The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain

Pseudomonas aeruginosa Liverpool Epidemic Strain (LES)

Highly successful at colonizing cystic fibrosis (CF) patients

Has replaced previously established strains

Caused infections of non-CF patients

Can cause greater morbidity in CF than other strains of P. aeruginosa

(Salunkhe, et al., 2005)

LES Analysis


Genome sequenced by Sanger Centre

I led annotation of the genome and analysis of GIs

6 Prophages

5 Genomic Islands

(Winstanley, Langille, et al., 2008)

Signature-tagged mutagenesis (STM)

STM is a method to identify genes associated with pathogenesis

LES used in a chronic rat lung infection model

47 genes identified by STM

5 of these genes are within GIs and prophage regions

http://www.traill.uiuc.edu/uploads/porknet/papers/LitchtensteigerPaper.pdf

LES Prophage

[Figure: gene maps of the six LES prophages (scale bar: 5 kb), with STM mutations marked. Other labels in the figure: Duplication 1, Duplication 2, Pseudomonas Phage F10, Pseudomonas Phage D3112, Pseudomonas Phage D3, Pseudomonas Phage Pf1, Pyocin R2]

Prophage 1: PLES 6091 to PLES 6271

Prophage 2: PLES 7891 to PLES 8321

Prophage 3: PLES 13201 to PLES 13711

Prophage 4: PLES 15491 to PLES 15961

Prophage 5: PLES 25021 to PLES 25661

Prophage 6: PLES 41181 to PLES 41281

(Winstanley, Langille, et al., 2008)

LES Genomic Islands


(Winstanley, Langille, et al., 2008)

LES in-vivo competitive index

Mutants grown for 7 days in rat lung with the wild type LES

A CI of less than 1 indicates attenuation of virulence

4 genes within prophage and GIs had strong impact on competitiveness

(Winstanley, Langille, et al., 2008)
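The slide does not give the formula for the competitive index (CI). A commonly used definition, assumed here, is the mutant-to-wild-type ratio recovered after infection divided by the same ratio in the inoculum. The function name and the counts below are hypothetical.

```python
def competitive_index(mutant_in: float, wild_in: float,
                      mutant_out: float, wild_out: float) -> float:
    """Ratio of output (mutant / wild type) to input (mutant / wild type)."""
    return (mutant_out / wild_out) / (mutant_in / wild_in)


# Hypothetical colony counts: equal inoculum, mutant recovered 10-fold less often.
ci = competitive_index(mutant_in=1e6, wild_in=1e6, mutant_out=2e4, wild_out=2e5)
attenuated = ci < 1  # a CI below 1 indicates attenuation of virulence
```

Under this definition a mutant that competes as well as the wild type has a CI of 1.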

CRISPRs and their association with genomic islands

Overview of CRISPRs

CRISPRs: Clustered regularly interspaced short palindromic repeats

Able to provide phage resistance and block conjugation

Thought to act similarly to RNAi, except that DNA (instead of RNA) appears to be the target

CRISPRs and HGT

Previous studies have shown some evidence of HGT of CRISPRs:
  Phylogenetic profiles of CAS genes (Haft, et al., 2005)
  CRISPRs within 10 megaplasmids (Godde, et al., 2006)
  CRISPRs within two prophage in Clostridium difficile (Sebaihia, et al., 2006)

Analysis of CRISPRs and GIs had not been conducted previously

CRISPRs within GIs

Domain of Life | Number of Genomes | Number of GIs | Proportion of Genome in GIs | Total Number of CRISPRs | Expected CRISPRs in GIs | Observed CRISPRs in GIs | Significance (Chi-square Test)*
Archaea | 49 | 298 | 3.7% | 206 | 7.7 | 14 | 0.020
Bacteria | 306 | 4874 | 6.4% | 837 | 53.3 | 114 | 8.1 x 10^-18
Archaea & Bacteria | 355 | 5172 | 6.1% | 1043 | 64.0 | 128 | 1.6 x 10^-16

CRISPR predictions were obtained from CRISPRdb, http://crispr.u-psud.fr/crispr/CRISPRHomePage.php

GI predictions were taken from the union of IslandPick, IslandPath-DIMOB, and SIGI-HMM

The numbers of CRISPRs inside and outside GIs were compared

CRISPRs are over-represented in GIs
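The significance column of the CRISPR table can be approximated with a chi-square goodness-of-fit test on the inside-GI versus outside-GI counts. The transcript does not spell out the exact test, so the sketch assumes one degree of freedom and takes the expected inside-GI count directly from the table. The function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_in_vs_out(total: float, expected_in: float, observed_in: float):
    """Chi-square statistic and p-value (1 degree of freedom) for in/out counts."""
    expected_out = total - expected_in
    observed_out = total - observed_in
    stat = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    p_value = erfc(sqrt(stat / 2))  # chi-square survival function for 1 df
    return stat, p_value


# Values from the table: total CRISPRs, expected in GIs, observed in GIs.
archaea_stat, archaea_p = chi_square_in_vs_out(206, 7.7, 14)
bacteria_stat, bacteria_p = chi_square_in_vs_out(837, 53.3, 114)
```

Under these assumptions the archaeal p-value comes out near the 0.020 reported in the table, and the bacterial p-value is many orders of magnitude smaller, consistent with the over-representation claim.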

Phage genes within GIs

Many GIs are known to contain phage genes

What proportion of GI genes have links to phage?

Identified genes with “phage” in their annotation within GIs

Genomic Regions | Observed number of ‘phage genes’ | Expected³ number of ‘phage genes’ | Total number of genes in region | Chi-Square Test
Inside GIs¹ | 6990 | 1264.22 | 165784 | ~0
Outside GIs¹ | 12868 | 18593.78 | 2438303 |

35% of all ‘phage genes’ are within GIs (6% expected)

Phage genes are over-represented in GIs
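The expected counts in the phage gene table are consistent with splitting the total number of ‘phage genes’ in proportion to the number of genes in each region. The sketch below reproduces that arithmetic. This derivation is an inference from the numbers on the slide, not a documented procedure, and the function name is illustrative.

```python
def expected_phage_genes(phage_in: int, phage_out: int, genes_in: int, genes_out: int):
    """Expected phage gene counts if phage genes were spread evenly over all genes."""
    total_phage = phage_in + phage_out
    fraction_in = genes_in / (genes_in + genes_out)
    return total_phage * fraction_in, total_phage * (1 - fraction_in)


# Observed counts and gene totals from the table above.
expected_in, expected_out = expected_phage_genes(6990, 12868, 165784, 2438303)
observed_share = 6990 / (6990 + 12868)         # share of phage genes inside GIs
expected_share = 165784 / (165784 + 2438303)   # share of all genes inside GIs
```

The two shares correspond to the "35% observed versus 6% expected" comparison on the slide.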

Archaea and CRISPRs

Measure | Archaea | Bacteria
Genomes containing a CRISPR | 90% | 40%
Proportion of phage genes | 0.10% | 0.79%
Proportion of GIs with a phage gene | 5.1% | 17.6%

The prevalence of CRISPRs in archaeal genomes could result in fewer phage genes

GIs with CRISPRs and phage genes

Is there evidence supporting that some CRISPRs are being transferred by phage?

Genomic Regions | Observed number of ‘phage genes’ | Expected³ number of ‘phage genes’ | Total number of genes in region | Chi-Square Test
GIs containing CRISPR(s)² | 13 | 4.5 | 1500 | 5.7 x 10^-5
Outside GIs² | 812 | 820.5 | 274073 |

GIs containing CRISPR(s) also contain an over-representation of phage genes, suggesting that some CRISPRs are transferred by phage

CRISPR conclusions

CRISPR over-representation in GIs suggests that they are being horizontally transferred

Some GIs that contain CRISPRs may have phage origins

CRISPRs in Archaea could be limiting HGT by increasing resistance to phage

Conclusions

Several advances in GI computational prediction
  IslandPick, a novel automated comparative genomics based GI prediction program
  Analysis of the accuracy of several sequence based GI prediction methods
  IslandViewer: An integrated interface for computational identification and visualization of genomic islands

Insights into GI evolution and their pathogenicity
  P. aeruginosa LES: evidence that genomic islands and prophage regions contain genes that provide a competitive advantage for infection in a chronic rat infection model
  CRISPRs and their association with genomic islands

Acknowledgements

Supervisor: Dr. Fiona Brinkman

Supervisory Committee: Dr. Baillie, Dr. Pio

P. aeruginosa LES: Craig Winstanley, Roger Levesque, Bob Hancock, Nick Thomson