Preferential deletion events in the direct repeat locus of Mycobacterium tuberculosis

Running title: Harlingen DR locus

Anita C. Schürch1,2, Kristin Kremer1, Albert Kiers3, Martin J. Boeree4, Roland J. Siezen2, and
Dick van Soolingen1,4*

Tuberculosis Reference Laboratory1, National Institute for Public Health and the Environment
(RIVM), Centre for Infectious Disease Control (CIb), Laboratory for Infectious Disease and
Perinatal Screening, P.O. Box 1, 3720 BA Bilthoven, and
Radboud University Nijmegen Medical Centre/NCMLS2, Centre for Molecular and
Biomolecular Informatics, P.O. Box 9101, 6500 HB Nijmegen, and
Department of Tuberculosis Control GGD Fryslân3, P.O. Box 601, 8901 BK Leeuwarden, and
University Centre for Chronic Diseases, Department of Pulmonary Disease, Department of
Medical Microbiology4, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500
HB Nijmegen, The Netherlands

*Corresponding author:
Prof. Dr. Dick van Soolingen
National Mycobacteria Reference Laboratory, National Institute for Public Health and the Environment (RIVM)
Centre for Infectious Disease Control, (CIbpSH-D2), Laboratory for Infectious Disease and Perinatal
Screening, P.O. Box 1, 3720 BA Bilthoven, The Netherlands
and Departments of Pulmonary Diseases and Medical Microbiology, Radboud University Nijmegen Medical
Centre, Nijmegen, The Netherlands
Tel: +31-30-2742363, Fax: +31-30-2744418, E-mail: [email protected]
Copyright © 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.
J. Clin. Microbiol. doi:10.1128/JCM.01848-10
JCM Accepts, published online ahead of print on 16 February 2011
Downloaded from http://jcm.asm.org/ on March 20, 2020 by guest
Abstract

The “Harlingen” IS6110 restriction fragment length polymorphism (RFLP) cluster has linked
over a hundred tuberculosis cases in The Netherlands since 1993. Four Mycobacterium
tuberculosis isolates that were epidemiologically linked to this cluster had different
spoligotype patterns as well as slightly divergent IS6110 profiles compared to the majority of
the isolates. Sequencing of the direct repeat (DR) locus revealed sequence polymorphisms at
the putative deletion sites. These deletion footprints provided evidence for independent
deletions of the central region of the DR locus in three isolates, while the different genotype
of the fourth isolate was explained by transmission. Our findings suggest that convergent
deletions in the DR locus occur frequently. However, deletion footprints are not a suitable
means of detecting convergent deletions in the DR locus because they seem to be exceptional:
they have not been described earlier, and we did not observe them in any
public M. tuberculosis complex sequences. We conclude that preferential deletions in the DR
locus of closely related strains usually go unnoticed and interfere with the clustering of
such strains.
Introduction

DNA fingerprinting techniques have gained recognition as important epidemiological tools
(31) for the analysis of Mycobacterium tuberculosis, the etiological agent of tuberculosis.
These techniques detect DNA polymorphisms associated with repetitive or mobile genetic
elements in the genome. While it is unclear whether fingerprinting of repeats can be applied to infer
evolutionary relationships between deep branches (4), these techniques are generally considered
useful for clustering genetically closely related isolates (2, 7, 8, 27, 28). IS6110 restriction
fragment length polymorphism (RFLP) typing depends on transposition of the insertion sequence
IS6110 in the genome of M. tuberculosis and on mutations of PvuII restriction sites (29).
Variable-number tandem repeat (VNTR) typing (27) monitors expansion and contraction
of stretches of tandem repeats. Spoligotyping is a relatively easy and fast PCR-based method
with good portability between mycobacteriological laboratories; it detects the
presence of 43 unique DNA spacer sequences (15) that are interspersed between 36-bp repeats
in the direct repeat (DR) locus. This locus belongs to the family of bacterial clustered regularly
interspaced short palindromic repeats (CRISPR), which provide acquired immunity against
viruses and plasmids (13). The DR locus is a hot spot for integration of IS6110
insertion sequences (11, 20, 34), and several preferential insertion sites in this genomic region
have been identified (22, 34).

IS6110 RFLP typing identified the so-called Harlingen cluster in The Netherlands, which,
with over a hundred tuberculosis patients identified since 1993, is one of the largest tuberculosis
clusters disclosed in this country (16, 17, 25, 26). The initial outbreak took place in the
harbour town of Harlingen, but the strain soon spread through The Netherlands. The bacterial
isolates of this cluster exhibited a nearly identical IS6110 RFLP pattern, and re-typing by
24-locus VNTR typing did not enhance the resolution of the cluster. Four isolates from patients who
were epidemiologically linked to the cluster had spoligotype patterns that differed from
the main spoligotype observed in all other strains, as well as slightly different IS6110
RFLP profiles. We therefore subjected the DR locus of these four isolates to DNA sequencing
and compared the obtained sequences to the sequence of the DR locus of the precursor strains.
These data were combined with contact tracing information, IS6110 RFLP typing and single nucleotide
polymorphism (SNP) typing (25), providing evidence for three independent convergent
deletions in the DR locus.
Methods

Contact tracing and M. tuberculosis strains
Within the framework of the national surveillance of tuberculosis, all M. tuberculosis complex
strains isolated in The Netherlands were subjected to IS6110 RFLP typing from 1993 to 2009.
A “cluster” was defined as a group of bacterial isolates with 100% identical IS6110 RFLP
patterns, but occasionally isolates with slightly different IS6110 RFLP patterns were added to
a cluster based on feedback from the municipal health services on confirmed epidemiological
links between patients (19). Contact tracing between patients of the Harlingen outbreak was
performed according to the stone-in-the-pond principle as reported elsewhere (17, 19, 33).
The M. tuberculosis isolates used in this study were part of the Harlingen IS6110 RFLP
cluster and were received at the RIVM between 1993 and 2009. DNA was isolated according
to a previously published protocol (32). Strain SH1, which was isolated from the index case of the
Harlingen cluster, and strains SH5 and SH9 were subjected to comparative genome sequencing
and SNP identification as described in previous studies (25, 26). The mycobacterial isolate
Harlingen SH-A was previously described as SH71 (25). M. tuberculosis strain H37Rv
was used as a control throughout the study.

Molecular typing
24-locus VNTR typing, IS6110 RFLP typing and spoligotyping were carried out according to
standardized methods as described earlier (15, 27, 29). Direct-repeat RFLP typing was carried
out by hybridizing a membrane containing the PvuII restriction fragments used for IS6110
RFLP typing with a 36-bp probe complementary to the DR locus repeats (11, 18). DNA
fingerprint patterns were analysed using Bionumerics 6.0 (Applied Maths, Sint-Martens-Latem,
Belgium). Eight SNPs that were identified by comparative genome sequencing of M.
tuberculosis strains SH1, SH5 and SH9 were used for SNP typing of the Harlingen cluster as
described previously (25).

Analysis of the DR locus sequence
The sequence of the DR locus of the precursor strains was determined by 454 sequencing
(Roche, 454 Life Sciences, Branford, CT, USA) and assembly of the sequence reads using
the Genome Sequencer software, version 1.1.03. Contigs of the Harlingen isolate SH5 were
compared with BLASTN (1) to the assembled sequences of Harlingen isolates SH1 and SH9
as described earlier (26) and to the sequence of the DR locus of reference strain H37Rv as
present in the NCBI bacterial genomes database. The orientation of the IS6110 insertions in the
DR region was determined by extracting reads adjacent to the IS6110 insertion sites
and comparing them to the published 5’ and 3’ ends of the IS6110 sequence using IS
Finder (http://www-is.biotoul.fr/). The DR locus in the Harlingen strain was annotated in
Artemis version 12 (23).
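
Determining the orientation of an insertion in this way amounts to asking on which strand the known end of the element is found in a flanking read. The following is a minimal Python sketch of that idea, assuming exact string matching in place of the IS Finder comparison; the function names are hypothetical and the sequences are made-up placeholders, not IS6110 sequence.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def insertion_orientation(read, element_end):
    """Report the strand on which the element end occurs in the read.

    Returns 'forward', 'reverse', or None if the end is not found.
    Exact matching is a simplification of a real similarity search.
    """
    if element_end in read:
        return "forward"
    if reverse_complement(element_end) in read:
        return "reverse"
    return None


# Placeholder sequences (hypothetical, for illustration only).
element_end = "GATTACAGGC"
read_1 = "TTTT" + element_end + "AAAA"
read_2 = "CCCC" + reverse_complement(element_end) + "GGGG"

orientation_1 = insertion_orientation(read_1, element_end)
orientation_2 = insertion_orientation(read_2, element_end)
orientation_3 = insertion_orientation("ACACACACAC", element_end)
```

Two elements whose ends are found on opposite strands of the same locus would be reported as inversely oriented.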

To determine the sequence of the central region of the DR locus of the four Harlingen strains
with differing spoligotype patterns, a PCR product was amplified by standard PCR using a primer specific for
spacer 18 (sequence: 5’-CAGATGGTCCGGGAGGTC-3’) and a primer specific for spacer
32 (sequence: 5’-GGTCTGACGACTTGAACACG-3’). Subsequent
sequencing was done using the same primers with standard chemistry according to the ABI
protocols on an ABI3730xl sequencer (Applied Biosystems, Foster City, CA, USA). The
resulting sequences were aligned with CLUSTALW 2.0.7 (3).

The sequences of spacers 1 to 43 (spacer numbering and DNA sequences according to van
Embden et al. (30)) were compared with BLASTN to the sequence of the DR locus of all M.
tuberculosis complex strains present in the NCBI nucleotide database on July 7, 2010,
including all fully sequenced genomes (M. tuberculosis strains KZN1435, H37Rv, H37Ra,
CDC1551 and F11, Mycobacterium bovis AF2122/97, BCG strain Tokyo 172 and BCG strain
Pasteur 1173P2) and the sequences previously published by Warren et al. (34), van Embden et
al. (30), Groenen et al. (10), Hermans et al. (11) and Fang et al. (7).
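
The purpose of this comparison is to find spacers that are present in a DR locus but altered at one end. A minimal Python sketch of such a screen follows, assuming exact substring matching as a stand-in for BLASTN and searching one strand only; the function name, the shortened repeat and the spacer sequences are all hypothetical placeholders, not sequences from reference 30.

```python
def classify_spacer(spacer, locus, max_trim=3):
    """Classify how a spacer occurs in a DR locus sequence.

    Returns 'intact' for a full-length exact match, '5p_end_differs' or
    '3p_end_differs' if the spacer only matches after trimming up to
    max_trim bases from that end, and 'absent' otherwise.
    """
    if spacer in locus:
        return "intact"
    for k in range(1, max_trim + 1):
        if spacer[k:] in locus:
            return "5p_end_differs"
        if spacer[:-k] in locus:
            return "3p_end_differs"
    return "absent"


# Placeholder sequences (hypothetical, shortened for readability).
repeat = "GTCGTCAGAC"
spacers = {
    "s1": "AAACCCGGGTTT",
    "s2": "TTGACCATGCAA",
    "s3": "CCGGTTAACCGG",
}
# Toy locus: s1 intact, s2 lacking its last base, s3 deleted.
locus = repeat + spacers["s1"] + repeat + spacers["s2"][:-1] + repeat

report = {name: classify_spacer(seq, locus) for name, seq in spacers.items()}
```

In this toy locus, `s2` is flagged because its final base is missing, which mirrors the kind of end polymorphism the screen is meant to catch.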
Results

Polymorphisms in the DR locus among Harlingen isolates
IS6110 RFLP typing of the M. tuberculosis isolates of the Harlingen cluster yielded, for the
majority of cases, identical patterns consisting of 12 bands (represented in Figure 1 by the isolate of the
index case of the cluster, strain SH1, and that of patient H9, strain SH9).
Previously, isolates with slightly different IS6110 RFLP patterns had been added to the cluster because the
respective cases had confirmed epidemiological links with patients in the cluster, as discerned
by contact tracing (33) (Figure 1). Some isolates in the cluster had one additional IS6110 band
in the RFLP pattern compared to the dominant pattern (exemplified by SH5, Figure 1). The
identification of the insertion site revealed a transposition of an IS6110 element into a putative
gene (26). Three isolates exhibited an IS6110 RFLP pattern with two missing bands (isolates
SH-B, SH-C and SH-D), and another isolate had an IS6110 RFLP pattern with the same two bands
missing and one additional band (SH-A, Figure 1).
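
Under the strict cluster definition given in Methods (100% identical IS6110 RFLP patterns), these band patterns fall into separate groups. A minimal Python sketch of that exact-match grouping is given below. It assumes each pattern has already been reduced to a set of band labels; the labels b1 to b13 are hypothetical placeholders rather than measured fragment sizes, and the assumption that SH-A and SH5 share the same additional band is made only for illustration.

```python
from collections import defaultdict


def cluster_by_identical_pattern(patterns):
    """Group isolates whose band sets are 100% identical."""
    clusters = defaultdict(list)
    for isolate, bands in patterns.items():
        clusters[frozenset(bands)].append(isolate)
    return [sorted(members) for members in clusters.values()]


# Hypothetical band labels; only the band counts follow the text.
dominant = {f"b{i}" for i in range(1, 13)}         # 12-band dominant pattern
two_missing = dominant - {"b6", "b7"}              # two bands missing

patterns = {
    "SH1": dominant,
    "SH9": dominant,
    "SH5": dominant | {"b13"},                     # one additional band
    "SH-B": two_missing,
    "SH-C": two_missing,
    "SH-D": two_missing,
    "SH-A": two_missing | {"b13"},                 # two missing, one additional
}

clusters = sorted(cluster_by_identical_pattern(patterns))
```

With these placeholder patterns the strict rule yields four groups, which illustrates why epidemiological feedback was needed to keep the divergent isolates within the Harlingen cluster.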

All except four isolates of the Harlingen cluster exhibited one characteristic spoligotype
pattern, i.e. the precursor spoligotype, with spacers 1-19, 21-32 and 37-43 present (Figure 1,
exemplified by strains SH1, SH5 and SH9). The four isolates with different spoligotype
patterns were the same isolates that lacked two bands in their IS6110 RFLP patterns. Their
spoligotype patterns differed from the dominant precursor spoligotype pattern by the lack of
spacers 21-25 (isolates SH-A and SH-B) and 19-25 (isolates SH-C and SH-D), respectively.
Three of these isolates (SH-A, SH-C and SH-D) were isolated from patients with
pulmonary disease who were treated with the standard regimen. Strain SH-B was isolated
from a patient who had been non-compliant with prophylactic isoniazid treatment
but who, on onset of the disease, followed standard treatment. It was unclear whether
there is an association between the genotype of isolate SH-B and the non-compliance of
the patient from whom the isolate was derived. RFLP typing with the DR probe of
strains with the dominant Harlingen IS6110 RFLP pattern revealed that the two IS6110 copies
that were missing in the four exceptional strains were located in the DR region (data not
shown). This finding explains the association between the differences in IS6110 RFLP and
spoligotype patterns.
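
The spoligotype patterns described above can be written as presence/absence sets over the 43 spacers, which makes the differences from the precursor pattern explicit. A minimal Python sketch follows; the helper names are hypothetical, and the spacer ranges are taken directly from the text.

```python
def spacers(*ranges):
    """Expand inclusive (first, last) ranges into a set of spacer numbers."""
    return {n for first, last in ranges for n in range(first, last + 1)}


def to_pattern(present, total=43):
    """Render a spoligotype as a string: 'x' = spacer present, '.' = absent."""
    return "".join("x" if n in present else "." for n in range(1, total + 1))


# Precursor spoligotype: spacers 1-19, 21-32 and 37-43 present.
precursor = spacers((1, 19), (21, 32), (37, 43))
sh_a_b = precursor - spacers((21, 25))   # SH-A and SH-B lack spacers 21-25
sh_c_d = precursor - spacers((19, 25))   # SH-C and SH-D lack spacers 19-25

# Spacers detected in the precursor but not in the divergent isolates.
lost_a_b = sorted(precursor - sh_a_b)
lost_c_d = sorted(precursor - sh_c_d)

for name, present in [("precursor", precursor),
                      ("SH-A/SH-B", sh_a_b),
                      ("SH-C/SH-D", sh_c_d)]:
    print(f"{name:10s} {to_pattern(present)}")
```

Spacer 20 is already undetected in the precursor pattern, so the SH-C and SH-D pattern differs from it by six hybridization signals (19 and 21 to 25) and the SH-A and SH-B pattern by five.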

Among the four strains with a divergent spoligotype, two different SNP types were observed.
Strains SH-B, SH-C and SH-D exhibited the same SNP type, which was identical to the SNP
type of the isolate of the index case (SH1). Strain SH-A showed a SNP type with four
polymorphic positions, identical to the SNP type of strain SH5.
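
These SNP types can be compared as simple allele profiles over the eight typing positions. The Python sketch below is illustrative only: it assumes the four positions that distinguish SH5 and SH-A are listed first (Figure 3 labels them SNPs 1-4) and encodes alleles as 0 (as in SH1) or 1 (variant), whereas the actual positions and alleles are given in reference 25. The function names are hypothetical.

```python
from collections import defaultdict

# Allele profiles over the eight SNP positions (0 = as in SH1, 1 = variant).
# The placement of the four variant positions is an assumption.
snp_types = {
    "SH1":  (0, 0, 0, 0, 0, 0, 0, 0),
    "SH5":  (1, 1, 1, 1, 0, 0, 0, 0),
    "SH-A": (1, 1, 1, 1, 0, 0, 0, 0),
    "SH-B": (0, 0, 0, 0, 0, 0, 0, 0),
    "SH-C": (0, 0, 0, 0, 0, 0, 0, 0),
    "SH-D": (0, 0, 0, 0, 0, 0, 0, 0),
}


def group_by_snp_type(types):
    """Group isolate names by identical SNP profile."""
    groups = defaultdict(list)
    for isolate, profile in types.items():
        groups[profile].append(isolate)
    return {profile: sorted(names) for profile, names in groups.items()}


def snp_distance(a, b):
    """Count the positions at which two SNP profiles differ."""
    return sum(x != y for x, y in zip(a, b))


groups = group_by_snp_type(snp_types)
distance_a_b = snp_distance(snp_types["SH-A"], snp_types["SH-B"])
```

In this encoding SH-A and SH-B differ at four positions although they share a spoligotype, which is the observation that argues against a direct link between them.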

Contact tracing had identified an epidemiological link between the two patients from whom
strains SH-C and SH-D were isolated. Thus, the strain was presumably transmitted from
patient H-C to patient H-D after the deletion of spacers 19-25 in SH-C. In contrast, contact
between the patients from whom strains SH-A and SH-B were isolated was unlikely.
Moreover, besides a one-band difference in the IS6110 RFLP pattern, these isolates exhibited
different SNP typing patterns as determined in a previous study (26). A single IS6110
transposition could explain the differing RFLP types; however, the different
SNP types confirmed the absence of an epidemiological link between SH-A and SH-B (25).
Given the identical IS6110 RFLP patterns of isolates SH-C, SH-D and SH-B, a link between
these three patients could be suggested. However, because isolate SH-B was much more recent than SH-C
and SH-D, such a link would require restoration or re-acquisition of spacer 19 in a predecessor of
isolate SH-B to account for the more complete spoligotype pattern of SH-B in
comparison to SH-C and SH-D. Thus, the combination of molecular and epidemiological data
suggested three independent deletions in the DR region.

The DR region of strains with the dominant Harlingen spoligotype (precursor spoligotype)
was reconstructed using previously obtained genome sequencing data of Harlingen isolates
SH1, SH5 and SH9 (26). No sequence differences were identified in the partially assembled
DR loci of these strains. The DR locus of these Harlingen isolates with the precursor
spoligotype contained two inversely oriented IS6110 elements, one between spacers 19 and 21
and the second between spacers 24 and 25 (Figure 2). Spacer 20 was interrupted by one of the
IS6110 elements and is therefore not visible in the spoligotype pattern. Non-amplification
of a spacer because of disruption by an IS6110 element was previously observed in other
M. tuberculosis isolates (9, 20, 22).

Deletion footprints in the DR region
The sequences of the central region of the DR locus of the four strains with exceptional
spoligotype patterns were compared to the sequence of the DR locus of strains with the
precursor spoligotype. The deletion of spacers 21 to 25 in isolates SH-A and SH-B, and of
spacers 19 to 25 in isolates SH-C and SH-D, included the deletion of the two IS6110 elements
from the DR locus in all four strains (Figure 2). The PCR products amplified
with the primers specific for spacer 32 and spacer 18 were of the same size for isolates SH-A and
SH-B (both 610 bases) and for isolates SH-C and SH-D (both 538 bases). However,
sequencing and multiple alignment revealed that the sequences of SH-A and SH-B differed by
a three-basepair (3-bp) polymorphism at the 3’ end of spacer 19 (Figure 2B). The 3 bp found
in SH-A were identified as remnants of spacer 25, while the 3 bp in SH-B were those usually
found at the 3’ end of spacer 19. These sequence polymorphisms, together with the
other fingerprinting results, confirmed that the deletion of the central region
of the DR locus occurred independently in isolates SH-A, SH-B and SH-C and represents
convergent evolution. A schematic evolutionary scenario for the Harlingen strains studied
here is summarized in Figure 3.
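
A deletion footprint of this kind shows up as a short run of mismatches between two amplicons of equal length. The Python sketch below locates such runs in a pair of already aligned sequences. The function name is hypothetical and the sequences are placeholders, not the real amplicons: the flanks are invented, and the triplets TCA and GTG are borrowed from the labels in Figure 2 without implying which isolate carries which.

```python
def mismatch_runs(seq_a, seq_b):
    """Return half-open (start, end) index ranges where two aligned,
    equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b and start is None:
            start = i                      # a mismatch run begins
        elif a == b and start is not None:
            runs.append((start, i))        # the run ended at i - 1
            start = None
    if start is not None:                  # run extends to the last base
        runs.append((start, len(seq_a)))
    return runs


# Placeholder amplicons: identical flanks around a differing triplet.
amplicon_1 = "ACGTTGCA" + "TCA" + "GGCTAAGC"
amplicon_2 = "ACGTTGCA" + "GTG" + "GGCTAAGC"

runs = mismatch_runs(amplicon_1, amplicon_2)
footprints = [(amplicon_1[s:e], amplicon_2[s:e]) for s, e in runs]
```

Because the two products have the same length, the difference is invisible on a gel or in a spoligotype and only emerges from the sequence comparison.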

To estimate the frequency of such deletion footprints in the DR locus, we compared
the complete sequences of the 43 spacers of the DR locus (as published in (30)) to the sequences
of the DR loci of all M. tuberculosis complex species available in the public database. All
spacers were well covered by sequences in the public database, with up to 41 hits. One entry
(accession number AF504309) contained a 1-bp deletion at the 3’ end of spacer 13, but neither
the spoligotype nor the complete DR locus sequence of the respective isolate (34) revealed a
deletion of the flanking regions of spacer 13. No polymorphisms at the 3’ or 5’ ends of the
other spacers were observed in any of the other M. tuberculosis complex entries.
Discussion

In this study we describe convergent evolution of the DR locus in very closely related M.
tuberculosis isolates of the Harlingen cluster. The central region of the DR locus, including two
IS6110 elements, was deleted independently on three occasions. The spoligotype and IS6110
RFLP profiles of the affected strains differed from those of the precursor strains. On one
occasion a unique spoligotype was obtained, characterised by the deletion of spacers 19 to 25.
On two occasions an identical spoligotype was obtained with spacers 20 to 25 deleted. These
two deletion events could be discriminated by a 3-bp polymorphism at the deletion site.

The occurrence of such polymorphisms (deletion footprints) in the DR region seems to be
highly infrequent. The DR locus is one of the most frequently sequenced genomic regions of the M.
tuberculosis complex and the subject of numerous publications. In general, virtually no interstrain
variation is observed in the sequences of the spacers (24, 30). Our own investigation of the 43
spacers in the DR loci of the completed genomes of M. tuberculosis complex species and of
other previously published DR sequences (7, 10, 11, 30, 34), which contained a wide range of
deletions, did not reveal any putative deletion footprints, although it is unlikely that the
spacers deleted from these strains were all lost in identical deletion events. We therefore
suggest that potential preferential deletions in the DR locus of closely related strains will
generally not be identified by sequencing of this locus. Nevertheless, combining spoligotype
results with other fingerprinting methods, such as VNTR typing (27), and with methods that are
independent of repeat elements, such as multilocus sequence typing (12) or typing with strain-specific
SNPs identified by comparative whole-genome sequence analysis (25),
might identify such events.

The main mechanisms that have been suggested to cause the loss of spacers in the CRISPR
loci of bacteria are deletion by homologous recombination between DRs (6, 7, 13, 14) and
slippage during DNA replication (14). In addition, the DR locus of M. tuberculosis is known
as a hotspot for IS6110 insertions (11, 20, 34). The transposition of IS6110 elements
can cause unequal interruption of the spacers, which can lead to absence of spacer
hybridization (9, 20, 22, 34). Moreover, the presence of two or more IS6110 elements in the
DR locus could lead to homologous recombination between IS6110 elements (24). Nevertheless, the
preferential deletions of the central DR locus in the Harlingen strains in this study are
unlikely to have been caused by homologous recombination between the IS6110 elements. Firstly, the
deletions include the flanking regions of both IS6110 elements, and secondly, the IS6110
elements present in the Harlingen DR locus were inversely oriented, which rules out RecA-mediated
recombination, at least in the classical understanding (21, 24). It is more likely that the
deletions resulted from recombination of the DR flanked by spacers 25 and 26 with
the DR between spacers 19 and 18 in the precursor of strain SH-C (and with the DR between spacers 20 and 19 in
the precursors of SH-A and SH-B). In addition, the observed 3-bp polymorphism
also points to recombination between DRs rather than between the IS6110 elements and can be
explained by unequal recombination of the flanking 3 bp. Alternatively, the 3-bp
polymorphism might be a remnant of the insertion of a third IS6110 element that generated a
3-bp duplication (5) in a putative predecessor strain. Although the deletions were probably caused
by recombination between two DRs, the presence and orientation of the two IS6110 elements in the
DR locus might have promoted the deletion events, possibly by enabling a secondary structure
that placed the recombining DRs in close vicinity of each other. The precise mechanisms
remain unresolved and need further investigation.

IS6110 RFLP typing and spoligotyping are widely accepted as useful means of grouping
closely related M. tuberculosis complex strains (2, 7, 8, 31). However, as shown for
the Harlingen cluster, the occurrence of preferential deletions can result in false clustering of
isolates. The specific architecture of the Harlingen DR locus, containing two IS6110 elements,
probably promoted the preferential deletions observed in the three strains. If preferential
deletions of spacers in the DR locus are characteristic of certain compositions of the DR
locus, they would not be identified as independent events by spoligotyping, nor, if they
include IS6110 elements, by IS6110 RFLP fingerprinting. For unambiguous clustering
of closely related isolates, application of a second, independent typing method might be
necessary.

Acknowledgment
The staff of the Tuberculosis Reference Laboratory at the National Institute for Public Health
and the Environment (RIVM), Bilthoven, The Netherlands, and Rieneke Buitenhuis are
gratefully acknowledged for technical assistance. This work was funded by the Strategic
Research Fund of the RIVM and the TBadapt project (LSHp-CT-2007-037919).

References

1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
2. Brudey, K., J. R. Driscoll, L. Rigouts, W. M. Prodinger, A. Gori, S. A. Al-Hajoj, C. Allix, L. Aristimuno, J. Arora, V. Baumanis, L. Binder, P. Cafrune, A. Cataldi, S. Cheong, R. Diel, C. Ellermeier, J. T. Evans, M. Fauville-Dufaux, S. Ferdinand, D. Garcia de Viedma, C. Garzelli, L. Gazzola, H. M. Gomes, M. C. Guttierez, P. M. Hawkey, P. D. van Helden, G. V. Kadival, B. N. Kreiswirth, K. Kremer, M. Kubin, S. P. Kulkarni, B. Liens, T. Lillebaek, M. L. Ho, C. Martin, C. Martin, I. Mokrousov, O. Narvskaia, Y. F. Ngeow, L. Naumann, S. Niemann, I. Parwati, Z. Rahim, V. Rasolofo-Razanamparany, T. Rasolonavalona, M. L. Rossetti, S. Rusch-Gerdes, A. Sajduda, S. Samper, I. G. Shemyakin, U. B. Singh, A. Somoskovi, R. A. Skuce, D. van Soolingen, E. M. Streicher, P. N. Suffys, E. Tortoli, T. Tracevska, V. Vincent, T. C. Victor, R. M. Warren, S. F. Yap, K. Zaman, F. Portaels, N. Rastogi, and C. Sola. 2006. Mycobacterium tuberculosis complex genetic diversity: mining the fourth international spoligotyping database (SpolDB4) for classification, population genetics and epidemiology. BMC Microbiol 6:23.
3. Chenna, R., H. Sugawara, T. Koike, R. Lopez, T. J. Gibson, D. G. Higgins, and J. D. Thompson. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31:3497-500.
4. Comas, I., S. Homolka, S. Niemann, and S. Gagneux. 2009. Genotyping of genetically monomorphic bacteria: DNA sequencing in Mycobacterium tuberculosis highlights the limitations of current methodologies. PLoS One 4:e7815.
5. Dale, J. W. 1995. Mobile genetic elements in mycobacteria. Eur Respir J Suppl 20:633s-648s.
6. Deveau, H., R. Barrangou, J. E. Garneau, J. Labonte, C. Fremaux, P. Boyaval, D. A. Romero, P. Horvath, and S. Moineau. 2008. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J Bacteriol 190:1390-400.
7. Fang, Z., N. Morrison, B. Watt, C. Doig, and K. J. Forbes. 1998. IS6110 transposition and evolutionary scenario of the direct repeat locus in a group of closely related Mycobacterium tuberculosis strains. J Bacteriol 180:2102-2109.
8. Filliol, I., A. S. Motiwala, M. Cavatore, W. Qi, M. H. Hazbon, M. Bobadilla del Valle, J. Fyfe, L. Garcia-Garcia, N. Rastogi, C. Sola, T. Zozio, M. I. Guerrero, C. I. Leon, J. Crabtree, S. Angiuoli, K. D. Eisenach, R. Durmaz, M. L. Joloba, A. Rendon, J. Sifuentes-Osornio, A. Ponce de Leon, M. D. Cave, R. Fleischmann, T. S. Whittam, and D. Alland. 2006. Global phylogeny of Mycobacterium tuberculosis based on single nucleotide polymorphism (SNP) analysis: insights into tuberculosis evolution, phylogenetic accuracy of other DNA fingerprinting systems, and recommendations for a minimal standard SNP set. J Bacteriol 188:759-72.
9. Filliol, I., C. Sola, and N. Rastogi. 2000. Detection of a previously unamplified spacer within the DR locus of Mycobacterium tuberculosis: epidemiological implications. J Clin Microbiol 38:1231-4.
10. Groenen, P. M., A. E. Bunschoten, D. van Soolingen, and J. D. van Embden. 1993. Nature of DNA polymorphism in the direct repeat cluster of Mycobacterium tuberculosis; application for strain differentiation by a novel typing method. Mol Microbiol 10:1057-65.
11. Hermans, P. W., D. van Soolingen, E. M. Bik, P. E. de Haas, J. W. Dale, and J. D. van Embden. 1991. Insertion element IS987 from Mycobacterium bovis BCG is located in a hot-spot integration region for insertion elements in Mycobacterium tuberculosis complex strains. Infect Immun 59:2695-2705.
12. Hershberg, R., M. Lipatov, P. M. Small, H. Sheffer, S. Niemann, S. Homolka, J. C. Roach, K. Kremer, D. A. Petrov, M. W. Feldman, and S. Gagneux. 2008. High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol 6:e311.
13. Horvath, P., and R. Barrangou. 2010. CRISPR/Cas, the immune system of bacteria and archaea. Science 327:167-70.
14. Jansen, R., J. D. Embden, W. Gaastra, and L. M. Schouls. 2002. Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol 43:1565-75.
15. Kamerbeek, J., L. Schouls, A. Kolk, M. van Agterveld, D. van Soolingen, S. Kuijper, A. Bunschoten, H. Molhuizen, R. Shaw, M. Goyal, and J. van Embden. 1997. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol 35:907-14.
16. Kiers, A., A. P. Drost, D. van Soolingen, and J. Veen. 1996. [Border-crossing source tracing in tuberculosis via DNA fingerprint technique]. Ned Tijdschr Geneeskd 140:2290-3.
17. Kiers, A., A. P. Drost, D. van Soolingen, and J. Veen. 1997. Use of DNA fingerprinting in international source case finding during a large outbreak of tuberculosis in The Netherlands. Int J Tuberc Lung Dis 1:239-245.
18. Kremer, K., D. van Soolingen, R. Frothingham, W. H. Haas, P. W. M. Hermans, C. Martin, P. Palittapongarnpim, B. B. Plikaytis, L. W. Riley, M. A. Yakrus, J. M. Musser, and J. D. A. van Embden. 1999. Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis complex strains: interlaboratory study of discriminatory power and reproducibility. J Clin Microbiol 37:2607-2618.
19. Lambregts-van Weezenbeek, C. S., M. M. Sebek, P. J. van Gerven, G. de Vries, S. Verver, N. A. Kalisvaart, and D. van Soolingen. 2003. Tuberculosis contact investigation and DNA fingerprint surveillance in The Netherlands: 6 years' experience with nation-wide cluster feedback and cluster monitoring. Int J Tuberc Lung Dis 7:S463-70.
20. Legrand, E., I. Filliol, C. Sola, and N. Rastogi. 2001. Use of spoligotyping to study the evolution of the direct repeat locus by IS6110 transposition in Mycobacterium tuberculosis. J Clin Microbiol 39:1595-9.
21. Lloyd, R. G., and G. J. Sharples. 1992. Genetic analysis of recombination in prokaryotes. Curr Opin Genet Dev 2:683-90.
22. Mokrousov, I., O. Narvskaya, E. Limeschenko, T. Otten, and B. Vyshnevskiy. 2002. Novel IS6110 insertion sites in the direct repeat locus of Mycobacterium tuberculosis clinical strains from the St. Petersburg area of Russia and evolutionary and epidemiological considerations. J Clin Microbiol 40:1504-7.
23. Rutherford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice, M. A. Rajandream, and B. Barrell. 2000. Artemis: sequence visualization and annotation. Bioinformatics 16:944-5.
24. Sampson, S. L., R. M. Warren, M. Richardson, T. C. Victor, A. M. Jordaan, G. D. van der Spuy, and P. D. van Helden. 2003. IS6110-mediated deletion polymorphism in the direct repeat region of clinical isolates of Mycobacterium tuberculosis. J Bacteriol 185:2856-2866.
25. Schürch, A. C., K. Kremer, O. Daviena, A. Kiers, M. J. Boeree, R. J. Siezen, and D. van Soolingen. 2010. High-resolution typing by integration of genome sequencing data in a large tuberculosis cluster. J Clin Microbiol 48:3403-3406.
26. Schürch, A. C., K. Kremer, A. Kiers, O. Daviena, M. J. Boeree, R. J. Siezen, N. H. Smith, and D. van Soolingen. 2010. The tempo and mode of molecular evolution of Mycobacterium tuberculosis at patient-to-patient scale. Infect Genet Evol 10:108-14.
27. Supply, P., C. Allix, S. Lesjean, M. Cardoso-Oelemann, S. Rusch-Gerdes, E. Willery, E. Savine, P. de Haas, H. van Deutekom, S. Roring, P. Bifani, N. Kurepina, B. Kreiswirth, C. Sola, N. Rastogi, V. Vatin, M. C. Gutierrez, M. Fauville, S. Niemann, R. Skuce, K. Kremer, C. Locht, and D. van Soolingen. 2006. Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. J Clin Microbiol 44:4498-4510.
28. van Deutekom, H., P. Supply, P. E. de Haas, E. Willery, S. P. Hoijng, C. Locht, R. A. Coutinho, and D. van Soolingen. 2005. Molecular typing of Mycobacterium tuberculosis by mycobacterial interspersed repetitive unit-variable-number tandem repeat analysis, a more accurate method for identifying epidemiological links between patients with tuberculosis. J Clin Microbiol 43:4473-9.
29. van Embden, J. D., M. D. Cave, J. T. Crawford, J. W. Dale, K. D. Eisenach, B. Gicquel, P. Hermans, C. Martin, R. McAdam, T. M. Shinnick, et al. 1993. Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. J Clin Microbiol 31:406-9.
30. van Embden, J. D., T. van Gorkom, K. Kremer, R. Jansen, B. A. van Der Zeijst, and L. M. Schouls. 2000. Genetic variation and evolutionary origin of the direct repeat locus of Mycobacterium tuberculosis complex bacteria. J Bacteriol 182:2393-401.
31. Van Soolingen, D. 2001. Molecular epidemiology of tuberculosis and other mycobacterial infections: main methodologies and achievements. J Intern Med 249:1-26.
32. Van Soolingen, D., P. De Haas, and K. Kremer. 2001. Restriction fragment length polymorphism typing of mycobacteria. In: Mycobacterium tuberculosis protocols, eds. T. Parish and N. G. Stoker. Humana Press Inc., Totowa NJ:165-203.
33. Veen, J. 1992. Microepidemics of tuberculosis: the stone-in-the-pond principle. Tuber Lung Dis 73:73-6.
34. Warren, R. M., E. M. Streicher, S. L. Sampson, G. D. Spuy, M. Richardson, D. Nguyen, M. A. Behr, T. C. Victor, and P. D. van Helden. 2002. Microevolution of the direct repeat region of Mycobacterium tuberculosis: implications for interpretation of spoligotyping data. J Clin Microbiol 40:4457-4465.

Figure legends

Figure 1. Spoligotyping, IS6110 restriction fragment length polymorphism (RFLP) typing and
single-nucleotide polymorphism (SNP) typing results of six Mycobacterium tuberculosis
isolates of the Harlingen cluster. An epidemiological link was identified between the patients
from whom strains SH-C and SH-D were isolated but not between the patients from whom isolates SH-A
and SH-B were obtained. SH1, SH9 and SH5 are Harlingen isolates that were the subject of two earlier
studies (25, 26) and are shown here for comparison. Strain SH1 was isolated from the
index case of the Harlingen cluster.

Figure 2. Schematic representation of the organization of the central region of the direct
repeat (DR) locus of Mycobacterium tuberculosis strains of the Harlingen cluster. Blocks
labelled DR represent the 36-bp direct repeats, and numbers represent the unique spacer
sequences. The annealing positions of the primers used in the study are represented by
arrows. PvuII restriction sites are indicated on the boxes that represent the IS6110 elements.
The three nucleotides at the 5’ end of spacer 19 that differed between strains SH-A and SH-B
are shown. The patients from whom strains SH-C and SH-D were isolated were linked by
epidemiological contact tracing.

Figure 3. Schematic representation of the most likely evolutionary scenario of the
Mycobacterium tuberculosis isolates of the Harlingen cluster described in this study.

[Figure 2 image. Labels recovered from the image: DR locus schematics for the precursor strain, SH-A, SH-B and SH-C/SH-D; spacers 18 to 32, with spacer 20 split into 20a and 20b; two IS6110 elements with PvuII sites; nucleotide triplets GTG and TCA.]

[Figure 3 image. Labels recovered from the image: precursor strain (SH1); SH9 (SNPs 5-8); SH5 (transposition of IS6110, additional band, and SNPs 1-4); SH-A and SH-B (each: deletion of central region of DR, spacers 20-25); SH-C and SH-D (deletion of central region of DR, spacers 19-25).]