Page 1: The clusters, based on 16S rRNA identities and compared with …cyanophylogeny.scienceontheweb.net/pdf/clusters.pdf · 2020. 7. 8. · B: 16S rRNA sequence identities and genomic

The clusters, based on 16S rRNA identities and compared with genome metrics:

in silico DNA-DNA hybridization (isDDH), Mash-similarity, average nucleotide identity (ANI), conserved marker genes (CGS)

Contents

A: Methods
B: 16S rRNA sequence identities and genomic identities
    B1: Clusters for which metrics are not congruent
        B1a: the Picocyanobacteria
            B1a1: the OMF clade
            B1a2: Prochlorococcus
        B1b: Synechocystis spp.
        B1c: the halophiles
        B1d: Sphaerospermopsis
        B1e: Nostoc spp. (part)
        B1f: Calothrix spp. (part)
        B1g: individual strains
            B1g1: Leptolyngbya sp. PCC 7376
            B1g2: Gloeobacter kilaueensis JS1
    B2: Borderline cases
        B2a: Desikacharya
        B2b: Chlorogloeopsis
        B2c: Microcystis
        B2d: Crocosphaera (part)
        B2e: Pseudanabaena
        B2f: "Leptolyngbya" (part)
        B2g: Thermosynechococcus
    B3: Most clusters are supported by all metrics
C: Conclusions
D: References

Note for users: you may return here by clicking the "R" link below each subject heading.

A: Methods.

Although genome sequences are still rather few for cyanobacteria, and many draft genomes do not contain 16S rRNA genes, we can correlate data derived from four methods of genome analysis with 16S rRNA sequence identities within 16S rRNA-based clusters which contain more than one genome sequence. We briefly describe some of the results below.

The following 16S rRNA identity cutoff values were employed to define genera and species: members of the same species show greater than 98.70-99.0% 16S rRNA sequence identity; members of a single genus share greater than 96.50-96.90% 16S rRNA sequence identity. The genus cutoff is higher than that (94.5%) proposed by Yarza et al. (2014) from a detailed study of many bacterial phyla; however, that study seemingly included few cyanobacteria, since it was based only on the type strains of bacteria and archaea validly published under the rules of the ICNP. Inter-specific values lie between the lower limit of the intra-specific range and the upper limit of the genus cutoff.
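The 16S rRNA cutoffs given above can be expressed as a small decision rule. The following is only a sketch: the function name and the treatment of the two quoted ranges (98.70-99.0% and 96.50-96.90%) as "borderline" zones are our assumptions for illustration, not a description of the script actually used.

```python
# Cutoff ranges quoted in the text. Treating each range as a "borderline"
# zone is an assumption made for this illustration.
SPECIES_RANGE = (98.70, 99.0)   # % 16S rRNA identity
GENUS_RANGE = (96.50, 96.90)    # % 16S rRNA identity


def classify_16s(identity_percent: float) -> str:
    """Return the relationship implied by a pairwise 16S rRNA identity (%)."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent > SPECIES_RANGE[1]:
        return "same species"
    if identity_percent > SPECIES_RANGE[0]:
        return "same species (borderline)"
    if identity_percent > GENUS_RANGE[1]:
        return "same genus, different species"
    if identity_percent > GENUS_RANGE[0]:
        return "same genus, different species (borderline)"
    return "different genera"
```

With this rule a pair sharing 98.78% identity is reported as a borderline member of a single species, and a pair sharing 96.26% identity as members of different genera.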

For genome sequences, isDDH values were calculated with the DSMZ GGDC 2.1 server. GGDC uses 3 formulae for calculation of DDH, of which we employ formula 2 (sum of identities in HSPs/total HSP length) for incomplete (draft) genomes and formula 3 (sum of identities in HSPs/total genome length) for completed genomes; the latter is equivalent to a wet-lab hybridization, except that plasmid sequences that are not integrated into the chromosome are absent. Unless otherwise stated, the values calculated by formula 2 are given, since there are relatively few complete genomic sequences (about 10% of the total), which are required by formula 3. However, formula 2 may, especially around the cutoff values, give results that are higher than the true value, since only the positive identities are counted, without correction for genome size. The accepted cutoff value for species distinction by wet-lab DDH is an arbitrary 70% (Wayne et al., 1987) and is employed by the isDDH method. However, the 70% value was derived from 16S rRNA-based phylogenies of the type species of a broad range of bacterial phyla, with the Cyanobacteria largely neglected; there is no proof that this is a good discriminatory value in individual phyla.
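To illustrate why formula 2 can give higher values than formula 3, the sketch below computes only the two raw ratios as they are described above. The `HSP` type and function names are hypothetical, and the server's further conversion of such quantities into a DDH estimate is not reproduced here.

```python
from typing import NamedTuple, Sequence


class HSP(NamedTuple):
    """One high-scoring segment pair (hypothetical minimal representation)."""
    identities: int  # identical positions within the HSP
    length: int      # alignment length of the HSP


def formula2_ratio(hsps: Sequence[HSP]) -> float:
    """Sum of identities in HSPs divided by total HSP length."""
    total_length = sum(h.length for h in hsps)
    if total_length == 0:
        raise ValueError("no aligned positions")
    return sum(h.identities for h in hsps) / total_length


def formula3_ratio(hsps: Sequence[HSP], genome_length: int) -> float:
    """Sum of identities in HSPs divided by total genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return sum(h.identities for h in hsps) / genome_length
```

For two HSPs with 900/1000 and 450/500 identical positions in a genome of 3000 positions, the formula 2 ratio is 0.9 while the formula 3 ratio is 0.45: regions that do not align at all lower only the formula 3 value.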

Data from wet-lab DDH studies are included where available; if thermal elution of the hybridization products from hydroxyapatite was employed, two values (% relative binding and ΔTm(e)) are obligatory, where a shift of 1-2.2 °C in the mean elution temperature (ΔTm(e)) represents a 1% base mismatch in the hybrid (see Stackebrandt & Goebel, 1994).

Where appropriate, genomic sequences were further studied in CheckM for degree of completeness and contamination (Parks et al., 2015). The JSpecies web server was employed for ANI (Average Nucleotide Identity) values; the inconvenience of uploading large genomes to the server can be overcome by copying the genomes of interest to a local folder, then zipping them into a single file with the command "zip -r cyano.zip ." (note that the space and dot in the expression are mandatory; the name "cyano" is only an example) from within the folder, then uploading the zipfile to the server. The input genomes are deflated by 70% and therefore upload more rapidly. Other available ANI calculation methods include fastANI, which uses Mash in place of BLAST (Jain et al., 2018) and OAU, which employs USEARCH in place of BLAST. Since both identity values and length of homologous fragments differ slightly with different methods (see also Palmer et al., 2020), it is wise to use the same method throughout. The accepted cutoff value for species demarcation with ANI is around 96% (Richter and Rosselló-Móra, 2009), based largely on the 70% DDH value. However, the ANI method evaluates only the similarity of elements shared between two genomes, does not yield information for regions that are not shared, and is not corrected for total genome size. In consequence the results, like those of formula 2 of the GGDC for isDDH, may be artifactually high.
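For readers who prefer not to use the command line, the bundling step described above can be approximated with the Python standard library. This is a sketch only; the function name and the default archive name "cyano.zip" are illustrative, and the compression ratio obtained will depend on the input files.

```python
import zipfile
from pathlib import Path


def zip_genomes(folder, archive_name="cyano.zip"):
    """Bundle every file under `folder` into one deflated zip archive.

    Mirrors running "zip -r cyano.zip ." from within the folder.
    """
    folder = Path(folder)
    archive = folder / archive_name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Skip directories and the archive that is being written.
            if path.is_file() and path.resolve() != archive.resolve():
                zf.write(path, arcname=str(path.relative_to(folder)))
    return archive
```

The returned path points to the archive, which can then be uploaded to the server in place of the individual genome files.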

Neither DDH nor ANI is normally used to distinguish taxa at the inter-specific level, being usually employed to decide only if two taxa can be assigned to the same species. We show below that both appear to discriminate efficiently at the inter-specific level in most comparisons, even if they rarely give conflicting results for the reasons described above. Further studies involved the use of (a) the alignment-free Mash (Ondov et al., 2016) and (b) our set of 79 conserved marker genes (CGS), which clearly does not measure the similarities of the entire genomes. Both of these are described on the Main page of this site. Mash reports distance matrices by default; they are presented here as uncorrected similarity values. For the CGS estimates, we also employ the uncorrected values throughout. These were obtained by building a tree without applying a correction, then converting the finished genomic tree to a distance matrix by using cophenetic.phylo of the R software package ape. For both types of matrix, a single custom script was then employed to extract the distance value for pairwise combinations of taxa, converting this to a similarity value.

We have attempted to correlate the results obtained by these different methods in the following figure, where pairwise comparisons of taxa assigned to the same species on the basis of 16S rRNA sequence identity are shown as blue symbols, those assigned to different species of a single genus are represented by red symbols and separate genera by green. Draft genomes and metagenomes have been excluded from these calculations. The same dataset was employed for all methods. We emphasize that the 16S rRNA gene sequences are those extracted (by our custom shell script) from the genomes of interest, and not those obtained by normal PCR amplifications, to avoid any errors. Overall, the results confirm the value of our 16S rRNA-based cutoff values, since the three categories of taxa are clearly separated:

However, the isDDH values calculated with formula 3 (top left) fail to discriminate between taxa of different genera. Using our set of 79 conserved marker proteins (CGS, bottom right) greatly increases the sensitivity for taxa of different genera, but compresses the intra-specific taxa. The results obtained from analysis of genome similarity with Mash (top right) and ANI (bottom left) appear to overcome all biases of the other methods and correlate reasonably well with the 16S rRNA identity estimates. We stress the use of only complete genomes in performing these analyses; draft genomes by definition may contain duplicated regions, or lack segments, thus obscuring the results. This can be easily demonstrated by artificially duplicating or deleting a segment of a draft genome; if the isDDH value obtained with the genome falls just above or below the inter-species or generic cutoff value, the value obtained with the artificial constructs gives an incorrect position. The minimum and maximum percentage similarity values obtained from our dataset with each method are summarized in the following table.

In addition to isDDH with formula 3 (f3), Mash-distance, ANI and CGS, the table contains the isDDH ranges obtained with the same dataset using formula 2 (f2). Notice that the isDDH results obtained with the two formulae are not identical, and that there is a marked overlap of the formula 2 values around the level of generic separation, indicated in red in the table. This is clearly shown in the following figure (left panel).

We would have expected similar results for isDDH values calculated with the two formulae; that this is not the case is shown in their plot in the figure (right panel), where extensive scatter is evident. This is the result of the different corrections applied in each formula. For this reason, we give preference to the formula 3 results (calculated as total identities divided by genome size) for complete genomes throughout this document, considering that the formula 2 estimates are only approximate. We are forced to use the latter for draft genomes, but results obtained by all other methods should be considered as having priority.

We report Mash similarity data only for complete genomes since this method searches for kmers and is unlikely to be valid for draft genomes, where kmers are possibly broken at the end of contigs. In contrast, we show CGS results for all genomes. These data are provided only when the DDH and ANI estimates conflict, or where the isDDH formula 2 value falls into the red zone of the table.
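The extraction of pairwise similarity values from a distance matrix, as described earlier for the Mash and CGS data, might look like the sketch below. The function name is hypothetical, and the conversion similarity (%) = (1 - distance) × 100 is our assumption about what "uncorrected similarity" means here; the authors' custom script is not available in this document.

```python
from itertools import combinations


def pairwise_similarities(taxa, distances):
    """Return (taxon_a, taxon_b, similarity %) for every unordered pair.

    `distances` is a square matrix (list of rows) ordered like `taxa`.
    Assumption: similarity (%) = (1 - distance) * 100, without correction.
    """
    n = len(taxa)
    if len(distances) != n or any(len(row) != n for row in distances):
        raise ValueError("distance matrix must be square and match the taxa")
    result = []
    for i, j in combinations(range(n), 2):
        similarity = round((1.0 - distances[i][j]) * 100.0, 2)
        result.append((taxa[i], taxa[j], similarity))
    return result
```

The same function can be applied to a Mash distance matrix or to a matrix exported from cophenetic.phylo, provided both are read into the same row and column order as the list of taxa.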

For many taxa we calculated the number of repeat sequences in the genome with RepSeek with a seed length of 25.

Members of some genera (as defined by 16S rRNA sequence identity) do not fit into the above scheme. These, described below in Section B1, include the following:

• Picocyanobacteria: Prochlorococcus and the Synechococcus OMF sister group.
• the halophiles: Euhalothece natronophila strain Z-M001 and Halothece sp. PCC 7418 of genus 5.2.4.3, and possibly Dactylococcopsis salina PCC 8305 (genus 5.2.4.1).
• Synechocystis spp.
• individual strains: Leptolyngbya sp. PCC 7376 and Gloeobacter kilaueensis JS1.

B: 16S rRNA sequence identities and genomic identities

Note that we are unable to make this comparison for many incomplete genomes or those which lack, or contain only short fragments of, the 16S rRNA gene.

B1: Clusters for which metrics are not congruent

B1a: the Picocyanobacteria

B1a1: the OMF clade

Return to Contents R

Within the “picocyanobacteria” (genus 7.1.1), for which we use the term “OMF” (Oceanic, Marine, Freshwater, as defined by Laloui et al., 2002), we show 56 genomic sequences; only 17 of the genomes are complete, and 8 are metagenomes. The complete genomes show a range of DNA base composition of 52.45 to 67.98 mol% G+C and vary in size from 2.11 to 2.98 Mbp. Genome sequences are available for only 18 of the 32 species defined by 16S rRNA sequence identity. Comparison of genome metrics obtained with complete genomes within individual specific clusters is possible only for two species (7.1.1T, 7.1.1Z).

Species 7.1.1T contains two strains represented by complete genomes: LTW-R and CB0101; 2.42 and 2.79 Mbp, 62.56 and 64.06 mol% GC, respectively. Values of 26.2% isDDH (GGDC formula 3), 76.44% ANI and 94.62% CGS place these strains on the borderline between different species and genera, despite their 16S rRNA sequence identity of 98.7%.
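Comparisons of this kind amount to checking whether the species-level call from 16S rRNA identity agrees with the calls from the genome metrics. The sketch below uses the cutoffs quoted in the Methods (70% isDDH, about 96% ANI); the function names are hypothetical, and applying the cutoffs as hard thresholds is a simplification for illustration.

```python
# Species cutoffs quoted in the Methods; used here as hard thresholds,
# which is a simplification made for this illustration.
ISDDH_SPECIES_CUTOFF = 70.0  # % isDDH
ANI_SPECIES_CUTOFF = 96.0    # % ANI


def genomic_species_calls(isddh: float, ani: float) -> dict:
    """Return, per metric, whether the pair is called con-specific."""
    return {
        "isDDH": isddh >= ISDDH_SPECIES_CUTOFF,
        "ANI": ani >= ANI_SPECIES_CUTOFF,
    }


def is_congruent(same_species_by_16s: bool, isddh: float, ani: float) -> bool:
    """True if every genome metric agrees with the 16S rRNA-based call."""
    calls = genomic_species_calls(isddh, ani)
    return all(call == same_species_by_16s for call in calls.values())
```

For strains LTW-R and CB0101, assigned to one species by 16S rRNA identity, `is_congruent(True, 26.2, 76.44)` returns False, which is the incongruence described above.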

The five complete genomes in species 7.1.1Z, with 54.2-61.4 mol% G+C and size 2.11-2.59 Mbp, show Mash similarity 76.26-92.29%, CGS 93.11-98.43%, isDDH 16.6-66.9% (GGDC formula 3) and ANI 71.79-87.58%. They are therefore not con-specific. Strains CC9605 and KORDI-52 are distinguished by medium values of Mash similarity (89.86%), CGS (98.43%), isDDH (53.9% with GGDC formula 3) and ANI (87.58%); strain WH 8109 shows Mash similarity 89.95%, CGS 98.30%, isDDH 54.5% (GGDC formula 3) and ANI 87.96% with KORDI-52. The genome metrics show that the latter 3 strains are members of different species, the others being assigned to a different genus.

The inclusion of draft genomes permits the comparison of two pairs of Cyanobium spp. strains. Isolates PCC 7001 and NIES-981 share 98.78% 16S rRNA sequence identity and are therefore borderline members of a single species (7.1.1O). However, values of 32.4% isDDH, 86.80% ANI and 96.73% CGS assign them to distinct species. Strains PCC 6307 and Copco Reservoir LC18 (7.1.1J) are identical in 16S rRNA sequence, their con-specificity being confirmed by 50.5% isDDH and 92.61% ANI values.

In contrast, inter-specific comparisons may be performed with complete genomes for 11 of the defined clusters of Synechococcus. Mash similarities of 72.61-78.50%, CGS 86.81-91.25%, isDDH 13.6-17.2% (GGDC formula 3) and ANI 69.06-74.20% show that all are different genera. Cyanobium spp. strains PCC 6307 and NIES-981 (7.1.1J and 7.1.1O) show slightly higher values: Mash similarity 82.96%, CGS 91.37%, isDDH 23% (GGDC formula 3) and ANI 76.79%; these metrics place the strains on the borderline between species and genus.

The species not described above (7.1.1C, G, P, V, X, Y, AA and AC), defined on the basis of 16S rRNA sequence identities, are all also poorly supported by genome metrics.

Coutinho et al. (2016a) compared the genomes of 24 Synechococcus strains with six methods: ANI, AAI (Average Amino acid Identity), dinucleotide signature, isDDH, 16S rRNA identity and MLSA (Multi-locus sequence analysis) with genes rrsA, gyrB, pyrH, recA and rpoB. The authors assigned the generic name Parasynechococcus to 15 strains (the sister group of Prochlorococcus), with WH8102 as type strain. Although the name was published under the rules of the ICNP, the strains are not axenic, and the name is therefore invalid under that code. The genus is also invalid under the rules of the ICN, since a type species was not named. Consequently, we have retained the original names in all other phylogenetic trees on this site. Subsequently, Coutinho et al. (2016b) assigned specific names to each of the 15 strains. These specific names are shown in RED in the figure below, but note that some of the generic names were later changed by Walter et al. (2017); these have been retained in the same colour in the figure.

Walter et al. (2017) produced a tree of 100 genomes, using the 31 marker genes of Wu & Eisen, and validated the circumscription of species and genera with measurements of isDDH and AAI values. Strains were considered to belong to the same species when they shared at least 98.8% 16S rRNA sequence identity, ≥95% AAI and ≥70% isDDH. The AAI values are similar to the ANI values that we use throughout this page: e.g. 70-90% AAI within Parasynechococcus, vs 71.8-91.2% ANI; 63-85% AAI vs 70.8-90.8% ANI for Pseudosynechococcus; between "genera" AAI 59-62%, ANI ~69.1-71.0%. AAI values ≥70% were employed for genus delimitation. The new generic names were Pseudosynechococcus, Magnicoccus, Regnicoccus and Inmanicoccus.

Salazar et al. (2020) markedly expanded the coverage of the OMF group by adding a large number of genomic sequences, shown in VIOLET in the figure. These authors also added the generic name Synechospongium to replace candidatus Synechococcus spongiarum (16S rRNA cluster 7.1.5), as shown in GREEN in the figure. Unfortunately, these authors included a large number of incomplete genomes of this cluster, which we have excluded from our genome trees. Synechococcus lacustris strain Tous was renamed to Lacustricoccus sp., marked in the same colour.

This figure was extracted from our genomic tree, with the 16S rRNA cluster numbers added as they appear in the 16S rRNA tree. Strain PCC 6301 was employed as outgroup.

Since there is little correlation between the similarity values provided by 16S rRNA analysis and the genome metrics, there is clearly an unusual mode of evolution within this group. This is also true for the Prochlorococcus sister group, described below. The members of both clusters may be considered as being among the most abundant photoautotrophs in the oceans, having achieved buoyancy by marked reduction in cell size. However, the genomes of most members of the Prochlorococcus group appear to have undergone extensive streamlining (reduction in size), which is less evident in the OMF cluster. Although we do not know the true ancestor of these groups, the complete genomes of related strains (Synechococcus spp. strains PCC 6312, PCC 6715, JA-2-3B'a, JA3-3Ab) range from 2.66 to 3.77 Mbp and from 48.52 to 60.24 mol% G+C, whereas the OMF cluster strains contain genomes of size 2.11-2.98 Mbp and 52.45-67.98 mol% G+C. Most of the completed genomes of strains of the OMF group contain two copies of the rrn operon, like many other unicellular cyanobacteria, and therefore extensive genome reduction has not occurred. Unlike strains of the Microcystis and Moorea clusters described below, we find few repeat sequences (39-467) in members of the Synechococcus OMF clade, insufficient to affect the genome metrics. Again unlike the situation in Microcystis, the core genomes prepared from members of the OMF clade show relatively unchanged ANI values, implying that the evolutionary changes have occurred in all parts of the genome.

Outside of the OMF cluster, Walter et al. (2017) renamed many taxa, in many cases unwisely. For example, seemingly based only on their clustering in the tree and ignoring the 16S rRNA identity values and genome similarity data, Cyanobacterium stanieri PCC 7202 was transferred to the genus Geminocystis (represented by strain PCC 6308), even though these isolates share only 93.52-94.54% 16S rRNA identity, 72.87% ANI and 13.8% isDDH (GGDC formula 3), and Halothece sp. PCC 7418 was transferred to the genus Dactylococcopsis (PCC 8305), when the strains involved share only 96.26% 16S rRNA identity, 77.93% ANI and 16.2% isDDH (GGDC formula 3). Since these values are representative of distinct genera, such transfers are invalid.

B1a2: Prochlorococcus

Return to Contents R

Prochlorococcus spp. strains (genus 7.1.3) are divided into 5 species in the 16S rRNA tree on the basis of 16S rRNA sequence identity (within species, around 99.7%; between species 97.3-98.5%). The genus is represented by 63 genomes (14 completed, 2 metagenomic assemblages). Most members possess small genomes (approx. 1.6 Mbp) of low (30-38) mol% G+C and a single copy of the rrn operon; members of species 7.1.3E have larger genomes (around 2.6 Mbp) of higher (around 50) mol% G+C with two copies of the operon, as calculated from the only two complete genomes representing this group. With such large differences, particularly in mol% G+C content, it is pointless to perform isDDH between members of species 7.1.3E and the others.

The tree of complete plus draft genomes, based on our set of 79 core marker genes, contains genomes of 54 Prochlorococcus strains (14 complete), as shown in the figure below (where the Prochlorococcus clade has been extracted from the genome tree, and the species cluster numbers are derived from those of the 16S rRNA tree). The members of species 7.1.3A represented by complete genome sequences show a minimum of 99.12% 16S rRNA sequence identity. However, again using only the complete genomes, 2 strains (CCMP1986 and MIT9515, sharing 63.4% isDDH [GGDC formula 3] and therefore probably representing two different species) form a sub-clade of species 7.1.3A, rather than being included within the clade; this is supported by their isDDH values of 41.7 to 48.8% with the other strains of species 7.1.3A, which themselves show 68.3-82.5% isDDH and are bracketed in the figure as "DDH 1". isDDH analysis with the 3 complete genomes of species 7.1.3B confirms the similarity of strains NATL1A, NATL2A and MIT0801 (sharing a minimum isDDH value of 80.7%) and is thus in agreement with the results of 16S rRNA study ("DDH 2" in the figure). Unfortunately, there are insufficient complete genome sequences available for members of species 7.1.3C and 7.1.3D to permit DDH analysis within each species, but the single complete genomes of strains CCMP 1375 and MIT 9211 show only 15.9% isDDH, Mash similarity 76.74%, ANI 72.3% and CGS 98.56%, placing them into two distinct genera.

Since clades 7.1.3A to 7.1.3D (as defined by 16S rRNA similarity values) show longer branches in the genomic tree than in the 16S rRNA sequence-based tree, increasing the number of genes employed for the analysis results in increased resolution. Clade 7.1.3E is therefore separated more clearly from the others here than in the 16S rRNA-based tree; strains MIT9303 and MIT9313, represented by complete genomes of greater size and higher mol% G+C content than all other "species", exhibit 78.3% isDDH, 95.84% Mash similarity, 94.83% ANI and 98.16% CGS, thus representing a single species ("DDH 3").

"#": not present in 16S rRNA tree (containing only fragments of the 16S rRNA gene). The genome of strain MIT9202 is draft, erroneously included by Thompson et al.

In summary, the genome metrics further divide "species" 7.1.3A into two species, maintain cluster 7.1.3B as a single species and define "species" 7.1.3C and 7.1.3D as distinct genera. The only generic and specific epithets appended, P. marinus, are employed for all strains; it is evident that extensive renaming at both taxonomic levels is required. An important step in this direction was made by Thompson et al. (2013), who proposed division of the genus into 10 species based on isDDH and ANI analysis of complete genomes; these species have not been validated under the ICN. The proposed specific names are shown in parentheses and highlighted in red in the figure; note that the reference strain (CCMP1375) was designated by Chisholm et al. (1992) and that 2 complete genomes (marked in black) have more recently become available. Our more extended DDH/ANI studies, including draft genomes, show that many more species can be distinguished by ANI values, indicated in the figure as ANIb 1 to ANIb 25. There is no effect of repeated sequences on species circumscription by ANI in this genus, since the strains of species 7.1.3A to 7.1.3D contain only low numbers (37-190) of these elements and even the strains with larger genomes (species 7.1.3E) contain only 425-533; ANI values of the core and accessory genomes are unchanged.

The necessity of whole-genome sequencing to achieve a taxonomic scheme, rather than relying on a 16S rRNA-based phylogeny, is an open question: the strains of clusters 7.1.3A to 7.1.3D have undergone extensive genome streamlining (see Giovannoni et al., 2014 and references therein), and different lineages may have lost and/or gained different genomic regions (see Humbert et al., 2013, and references cited), making comparisons difficult and perhaps explaining the apparent discrepancies between genome metrics in defining species boundaries. Li et al. (2015) suggested that ANI and DDH values vary in different genera and should be replaced by the sequence identity of homologous genes as the criterion for delineation of species; this is exactly our approach for inference of genomic phylogenetic trees.

Thompson et al. employed GGDC formula 2 (identities/HSP length), which is equivalent to ANI, whereas we have used formula 3 (identities/total genome length), which is more appropriate for complete genomes. ANI, if the analysis is performed with default parameters, splits species established by DDH (formula 3) into further specific clusters (as shown in our tree, above) and would appear to be inappropriate. It is possible to show this by decreasing the length setting in JSpecies, or the fraglen or Kmer size settings in fastANI, to a point where the ANI values match those of DDH. DDH does not suffer from this artifact. If we consider only the isDDH values obtained with formula 3, the species P. chisholmii, P. ponticus, P. neptunius and P. nereus proposed by Thompson et al. become a single species; the six remaining species appear to be validly delineated.

Walter et al. (2017), including only 13 strains in their tree, further divided the genus Prochlorococcus into four genera, their names being shown after the "/" in the figure. None of these genera are valid under the ICN. Eurycolium is equivalent to clusters DDH 1 plus ANIb 13 in the figure, which together form 16S rRNA group 7.1.3A; Prolificoccus represents DDH group 2 and 16S rRNA cluster 7.1.3B; Prochlorococcus was retained as two species (equivalent to ANIb groups 18 and 21, 16S rRNA clusters 7.1.3C and 7.1.3D); Thaumococcus (including only the two P. swingsii strains described above) forms ANIb clusters 22 and 25, 16S rRNA group 7.1.3E. Tschoeke et al. (2020), with a tree of 208 genomes created with an unspecified 249 marker gene set, expanded further the studies of Walter et al. (2017). These authors proposed the creation of a new genus, Riococcus, and greatly increased the coverage of this group by the addition of many draft genomes; their suggestions are shown in violet in the figure. None of the nomenclatural changes were validly published under the rules of the ICN.

B1b: Synechocystis spp.

Although genus 5.1.7.11 (“Synechocystis”) is divided into 6 species on the basis of 16S rRNA sequence identity, genome sequences have been obtained for only 8 isolates, all members of species 5.1.7.11A; three (IPPAS B-1465, PCC 6714 and PCC 6803) are complete. Complete genome sequences are additionally available for sub-strains of PCC 6803. The 16S rRNA of strain PCC 6803 is identical to that of strains FACHB-898, FACHB-908 and FACHB-929; the four strains are confirmed as con-specific by high isDDH and ANI values (99.9 and 99.86-99.88%). However, the apparent placement of strain FACHB-383 in this species (based on 99.39% 16S rRNA sequence identity with strain PCC 6803) is not confirmed by the low values of isDDH (32.1%) and ANI (86.22%); although isDDH should be considered as non-discriminatory in this range, the value of 98.39% CGS supports this specific separation. Strain FACHB-383 is also distinct from strain PCC 6714, as shown by values of 82.90% ANI, 26.2% isDDH (again non-discriminatory) and 97.51% CGS, which place the two organisms into two species. The con-specificity of Synechocystis sp. strains PCC 6714 and PCC 6803 (99.59% 16S rRNA sequence identity) is not confirmed by isDDH of their complete genome sequences (47.5%; GGDC formula 3). This isDDH value is unexpectedly low and was confirmed by a low ANI value of 84.16%, Mash similarity 87.44% and CGS 98.23%. All genome metrics, therefore, place strains PCC 6714 and PCC 6803 into two distinct species, giving three species in total. The genomes are similar in mol% GC (47.7-47.8) and size (3.49-3.57 Mbp); strains PCC 6714 and PCC 6803 are axenic and have similar morphological and physiological characteristics (Rippka et al., 1979). Kopf et al. (2014a) described extensive differences between the complete genome of Synechocystis sp. PCC 6803 and the draft genome of Synechocystis sp. PCC 6714, later confirmed with the complete genome of the latter strain (Kopf et al. 2014b). The strains were shown to share 2838 protein-coding genes, with 845 unique genes in Synechocystis sp. PCC 6803 and 895 unique genes in Synechocystis sp. PCC 6714; the strains further differed in the composition of the pool of transposable elements and in the presence of a prophage in the genome of strain PCC 6714. We can show rearrangements by changes in position of many of our 79 core marker genes (see figure below), although these alone should have no effect on isDDH values. Strain PCC 6803 does show absolute identity with the various sub-strains of the same organism and with strain IPPAS B-1465 (100% 16S rRNA sequence identity, 100% isDDH, ANI, Mash similarity and CGS), and slightly lower values (99.65% 16S rRNA sequence identity, 88.6% isDDH, 96.57% ANI and 99.59% CGS) with strain CACIAM 05. Although the latter two strains are con-specific (99.66% 16S rRNA sequence identity, 88.5% isDDH, 96.58% ANI and 99.59% CGS), strain PCC 6714 shows only 47.5-47.7% isDDH, 84.09-84.15% ANI and 98.16-98.23% CGS with them, while the 16S rRNA identity value (99.60-99.80%) remains high.

The figure below compares the graphical output of fastANI for pairwise comparisons (left: strain PCC 6803 with IPPAS B-1465, right: PCC 6803 with PCC 6714). The red lines indicate regions of identity between a pair of genomes. Rearrangements and low similarity between strains PCC 6803 and PCC 6714 are evidenced by the slanted lines and their low density compared to the comparison between two identical strains on the left. Unlike the situation in the genus Microcystis described below, neither the core genomes nor the accessory genomes of strains PCC 6714 and PCC 6803 show significant change in ANI values; the repeat sequences (2764, 1008 and 1026 per genome for strains PCC 6714, PCC 6803 and IPPAS B-1465, respectively) are too few to markedly change the low ANI and isDDH values of the first two strains. Interestingly, the number of paired repeats in the sub-strains of PCC 6803, maintained in different laboratories, shows little change, ranging from 1008 to 1034.

B1c: the halophiles

In genus 5.2.4.3, the 16S rRNA genes extracted from the complete genomes of Euhalothece natronophila Z-M001 (species 5.2.4.3A) and Halothece sp. PCC 7418 (species 5.2.4.3B), together with the draft metagenome of "Cyanobacteria bacterium" GSL.Bin1 (species 5.2.4.3E), from alkaline or hypersaline habitats, share 96.81-97.52% sequence identity and show low isDDH and ANI values (~15% and ~75%, respectively). Although the 16S rRNA identity value of 97.31% between the strains represented by complete genomes suggests them to belong to two species of the same genus, the low values with isDDH (15.1%, GGDC formula 3), ANI (75.76%) and Mash (78.42%) suggest them to belong to different genera. However, the higher CGS value (94.19%) agrees with their separation into two species. The separation of the halophilic strains Dactylococcopsis salina PCC 8305 and Halothece sp. PCC 7418 into 2 genera (5.2.4.1 and 5.2.4.3), justified by low (96.36%) 16S rRNA sequence identity, is accompanied by slightly higher isDDH (16.2%), ANI (77.93%), Mash similarity (81.62%) and CGS (91.82%) values of their complete genomes, which place them on the borderline between species and genus. Genus 5.2.4.8 contains 16S rRNA genes extracted from the draft genomes of strain Phormidium (previously Oscillatoria) sp. HE10JO (only one of four being shown), Microcoleus sp. IPPAS B-353 (only one of two being shown), Geitlerinema sp. P-1104 (a single gene) and the metagenomes labelled as Cyanobacteria bacterium T3Sed10_304 and Phormidium sp. CSSed162cmB_426, all from hypersaline or alkaline environments. The 2 metagenomes are incomplete, containing respectively only 72 and 63 of our 79-gene core marker set, and have been excluded from the genome tree and calculation of genome metrics. The 16S rRNA sequences from the first three genomes share 99.33-99.60% identity, but the genomes show only 81.74-87.72% ANI and 97.65-97.68% CGS (the values of 28.2-35.1% isDDH should be considered as ambiguous in this region), placing them in separate species of the genus. These rapid changes of the genomes (in whole or part) of all of the extremophiles described above, with little change in 16S rRNA identity, suggest a mechanism of evolution in which some or many gene products have become adapted to the stress imposed in hypersaline environments.

[Figure: fastANI pairwise genome comparisons for Synechocystis. Left panel: Scy6803 v/s Scy1465; right panel: Scy6803 v/s Scy6714. Both axes run from 0 Mb to 3.5 Mb.]

Organisms which tolerate lower salt concentrations are widespread in the 16S rRNA tree. Of over 250 such organisms, few are represented by genomic sequences. In all cases, the 16S rRNA sequence identities and genome metrics are congruent; these organisms are described in the appropriate paragraphs below.

B1d: Sphaerospermopsis

The draft genomic sequences of two axenic strains of Sphaerospermopsis, S. reniformis NIES-1949 and S. kisseleviana NIES-73, together with that of the non-axenic Sphaerospermopsis sp. LEGE 00249 (species 1.2.3A, which also contains many other members of this genus, not represented by genomic sequences), despite their different specific epithets, are shown to be con-specific by values of 97.05-98.21% ANI, 74.8-83.2% isDDH and 98.70-99.87% identity of their extracted 16S rRNA sequences. A fourth strain, S. aphanizomenoides BCCUSP55, also represented by a draft genomic sequence but not known to be axenic, is seen as a different genus in species 1.2.5C, sharing only 95.86-96.17% 16S rRNA sequence identity with the strains of species 1.2.3A. However, the values of 83.22-83.88% ANI and 95.84-95.98% CGS suggest this strain to be a member of a distinct species, rather than a separate genus. The isDDH values (28.6-29.1%) obtained with GGDC formula 2 are ambiguous in this region. The genome of strain BCCUSP55 is incomplete, containing only 77 of our 79 core marker genes (which caused us to deviate from our strict CGS calculation method of permitting only 79 markers); this is also shown by the apparent genome size, measured for this draft genome as the sum of contig lengths, of 4.31 Mbp versus 5.35-6.03 Mbp for the two other strains, and the CheckM estimate of only 90.7% completeness.
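The marker-based completeness check described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the strict rule is taken to mean that all 79 markers must be present, and the marker counts are those quoted in the text for strain BCCUSP55 and the two hypersaline metagenomes.

```python
CORE_MARKERS = 79  # size of the core marker set named in the text


def marker_completeness(found: int, total: int = CORE_MARKERS) -> float:
    """Percentage of the core marker set recovered from a genome."""
    if not 0 <= found <= total:
        raise ValueError("found must lie between 0 and total")
    return 100.0 * found / total


def passes_strict_cgs(found: int, total: int = CORE_MARKERS) -> bool:
    """Assumed strict rule: every marker of the set must be present."""
    return found == total


# Marker counts quoted in the text.
genomes = {"BCCUSP55": 77, "T3Sed10_304": 72, "CSSed162cmB_426": 63}

for name, found in genomes.items():
    print(f"{name}: {marker_completeness(found):.1f}% of markers, "
          f"strict CGS permitted: {passes_strict_cgs(found)}")
```

Marker completeness computed this way is not the same quantity as the CheckM estimate cited above, which relies on its own marker sets.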

B1e: Nostoc spp. (part)

Although most clusters of Nostoc spp. are congruent in terms of 16S rRNA identities and genome metrics (see below), several clusters are problematic.

Two strains of the type species, N. commune (species 1.8.1L), NIES-4072 (with 3 identical rrn operons) and HK-02 (NIES-2114, with 4 identical rrn operons), are identical in 16S rRNA sequence and their draft genomes show 99.8% isDDH, 99.99% ANI (reported for 99% of the genome) and 99.99% CGS, confirming their con-specificity. They share only 97.92% rRNA sequence identity, 49.7-51.0% isDDH and 91.16-91.82% ANI with strain Nostoc flagelliforme CCNUN1 which, on the basis of 16S rRNA identity, lies in a separate species (species 1.8.1AD). The genome metrics, however, are conflicting: isDDH (with GGDC formula 2) suggests con-specificity of the 3 strains (although we emphasize our concerns regarding the use of formula 2); ANI places the strains on the borderline between species. The ANI value seems to be artificially high, since only the identities of around 60% of the genome are reported in comparisons involving strain CCNUN1. The CGS results are also confusing, since the three strains are combined into a single species with values of 99.31-99.32%. Strain CCNUN1 is represented by a complete genome sequence (CP024785) with six 16S rRNA genes but only 5 rrn operons (rrnA-rrnE) whose almost completely identical (99.8-100%) 16S rRNA sequences fall into species 1.8.1AD (whose members show 99.8-100% 16S rRNA sequence identity); only one (rrnA) is shown in the tree based on 16S rRNA sequences. The additional 16S rRNA sequence (labelled as "(no operon)" in place of an operon designation) falls into species 1.8.1AC.

The symbiotic 'Nostoc azollae' strain 0708 was described by Ran et al. (2010). The quotation marks are as shown in the NCBI entry, and presumably indicate that the authors were unsure of the identity of this organism. The organism is listed in NCBI as 'Nostoc azollae' (TITLE field of flatfile), but as Trichormus (ORGANISM field). In a tree of 10 taxa extracted from a larger genomic tree built with 476 marker genes and 53 organisms, the authors show strain 0708 clustering with Cylindrospermopsis. This clustering is similar to our own, if we omit all intervening sequences. The complete genome contains 4 identical copies of the 16S rRNA gene and lies in species 1.2.5A of the 16S rRNA tree. Strain 0708 is separated only at the specific level from strain Sphaerospermopsis aphanizomenoides BCCUSP55 in species 1.2.5C (97.18% 16S rRNA sequence identity). The values of 85.42% ANI and 96.32% CGS confirm this specific separation. The strain is separated at the generic level from Sphaerospermopsis strains NIES-1949, NIES-73 and LEGE 00249 on the basis of a mean value of 96.24% 16S rRNA sequence identity; however, the genome metrics (81.24-81.45% ANI and 95.33-95.47% CGS) suggest separation at the specific, not generic, level. Values of 25.5-31.5% isDDH give no useful information. A logical conclusion would be that, based on the genome metrics, the organism Nostoc azollae strain 0708 is a member of the genus Sphaerospermopsis. The organism was collected as an extracellular cyanobiont of Azolla filiculoides and in terms of taxonomy has a mixed history, being otherwise known as Anabaena azollae and Trichormus azollae, but is clearly different from all other symbiotic Nostoc strains, most of which lie in genus 1.8.1, as described below. The genome is also the smallest (5.35 Mbp) of all Nostoc strains in our database, which range from 6.33-8.42 Mbp, measured for complete genomes only, and is identical in size to the "nearly complete" genome of S. kisseleviana NIES-73 (5.35 Mbp). However, N. azollae strain 0708, although suggested to separate at the generic level from Anabaena sp. PCC 7108 and Trichormus sp. strain NMC-1 in genus 1.1.4, sharing 95.92% 16S rRNA sequence identity with both, appears to be separated only at the specific level from these strains by genome metrics (80.95-81.45% ANI, 95.52-95.69% CGS; the isDDH values of 25.7-26.3% do not permit discrimination within this range).

Since strains NMC-1 and PCC 7108 were isolated from saline habitats, Sphaerospermopsis spp. originate from freshwater and strain 0708 is an obligate cyanobiont, it is unlikely that this can be explained by some degree of convergent evolution in the genomes of these genera. Strain 0708 produces motile, small-celled hormogonia (never described for Sphaerospermopsis spp. or the two members of genus 1.1.4), which permit the formation of symbiosis. Strain 0708 cannot be a member of two distinct genera, as implied by the genome metrics. The genome appears to be in a state of erosion (Ran et al., 2010) and contains a large number of pseudogenes. The latter genomic evolutionary process may greatly influence measures of similarity.

Nostoc sp. strains UCD120, UCD121 and UCD122, all of which are cyanobionts, lie in species 1.8.1A of the 16S rRNA tree. This con-specificity is confirmed by values of 99.88-99.90% ANI and 99.3-99.4% isDDH of their draft genomes. They show 80.95-81.49% ANI with Nostoc sp 2RC, Nostoc sp Moss 3 and N. linkia strain z1 in genus 1.8.2 (all cyanobionts). The isDDH values are 27.3-27.6%, the CGS values are 95.97-95.99% and the 16S rRNA identities are 95.16-95.23% (the genome of strain UCD120 does not contain a 16S rRNA gene sequence). isDDH is not discriminatory in this range of values, but the ANI and CGS values both contradict the generic separation of these strains. However, strains UCD120, UCD121 and UCD122 show 75.91-75.95% ANI, 22.2-22.4% isDDH, 94.12-94.1% CGS and 93.81% 16S rRNA sequence identity with the symbiotic strain Nostoc sp TLC 26-01 in species 1.11.3A; separation at the generic level is therefore reflected by ANI but not by CGS. They are also separated at the generic level from the cyanobiont 'Nostoc sp' strain 0708 (species 1.2.5A, 75.54-75.65% ANI) and the freshwater strain Nostoc sp. PCC 7937 (ATCC 29413, 76.16-76.24% ANI) in genus 1.10.1. The CGS values (93.38-94.26%) again contradict both generic separations. Note that Nostoc sp. PCC 7937 is incorrectly listed as A. variabilis in NCBI (deposited by an author who received it from ATCC under that name), and has been changed to Trichormus in the ORGANISM field of the NCBI flatfile.

The three Peltigera cyanobionts (Nostoc sp. strains 213, 232 and N6) are con-specific with other members of species 1.8.1A described below, if comparison is based on their ranges of 98.63-99.33% 16S rRNA sequence identity and 98.77-99.10% CGS. However, they show unexpectedly low isDDH (40.6-40.9%) and ANI (87.77-88.43%) values, which conflict with their con-specificity.

B1f: Calothrix spp. (part)

The two genomes representing species 1.15.15D, Calothrix parasitica NIES-267 (containing 3 identical rrn operons) and Rivularia sp. PCC 7116 (also 3 identical rrn operons), are similar in size (8.95 and 8.70 Mbp, respectively); they show only 82.23% ANI (measured over 54% of the genome) and 97.32% CGS, which places them as members of two distinct species. The isDDH formula 2 value of 26.9% is not discriminatory in this range. However, their extracted 16S rRNA gene sequences exhibit 99.66% identity. Genome metrics and 16S rRNA identity values therefore disagree in this cluster. Other clusters of Calothrix spp., for which all similarity values agree, are described below.

B1g: individual strains

B1g1: Leptolyngbya sp. PCC 7376

Species 5.1.7.9B contains the filamentous Leptolyngbya sp. PCC 7376, which shares 97.51% rRNA sequence identity with, for example, Synechococcus sp. strain PCC 7002 in species 5.1.7.9A, suggesting their assignment to distinct species of the same genus. That this is unlikely is shown by their different genome properties: mean DNA base composition 43.87 mol% and 49.17 mol%, respectively; genome size 5.13 Mbp and 3.43 Mbp. The isDDH value for their completed genomes is only 13.9% (GGDC formula 3), with only 71.91% ANI, 93.23% CGS and 73.77% Mash similarity, placing them as members of different genera. Another filamentous organism, Limnothrix rosea strain IAM M-220 (NIES-208), also in species 5.1.7.9B, is represented by a draft genome sequence; this strain is closely related to Leptolyngbya sp. PCC 7376 on the basis of both 16S rRNA sequence identity (99.33%) and 23S rRNA sequence identity (99.03%); however, their full genomes share only 21.4% isDDH, 77.39% ANI and 96.58% CGS, assigning them to two distinct species. The filamentous members of this genus have therefore retained high 16S rRNA sequence identity but have undergone more rapid divergence in other parts of the genome, each in different regions. These differences between unicellular and filamentous members may have arisen by either of two scenarios: (1) an HGT event in which a filamentous ancestor of the Leptolyngbya and Limnothrix strains acquired the rrn operon of a unicellular organism, followed by extensive genomic evolution; (2) the ancestor was unicellular, evolving in morphology and gene content while retaining the rrn operon.

B1g2: Gloeobacter kilaueensis JS1

At the base of the 16S rRNA and genome trees, Gloeobacter violaceus strain PCC 7421 and G. kilaueensis strain JS1 are separated into two species, 11.1.1A and 11.1.1B, sharing 98.65% 16S rRNA sequence identity. Their complete genomes share only 17.0% isDDH. The low isDDH value and apparent generic separation are strongly confirmed by a low ANI value of 73.6%, Mash similarity of 78.69% and CGS of 90.75%. The similarities derived from 16S rRNA analysis and genome metrics are therefore conflicting. None of the values derived from genome metrics fall near those used as cutoffs between species and genera. However, the strain JS1, fully described by Saw et al. (2013), was not axenic and the authors reported genomic regions that were not recognized by BLAST, were from other bacteria or of viral origin. In contrast, strain PCC 7421, whose genome was sequenced by Nakamura et al. (2003), is held in the PCC in an axenic state. The regions described by Saw et al. obviously make genome metrics inaccurate and difficult to calculate. A third genomic sequence attributed to Gloeobacter is the metagenome SpSt-379; this is smaller in size (4.03 Mbp versus 4.66-4.72 Mbp for the two isolates), has a much higher mol% G+C content (65.3 versus 60.5-62.0), contains only 62 of our 79 core gene marker set and lacks the rrn operon. It has been excluded from the trees and from DDH/ANI calculations.
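Genome size and mol% G+C, as compared above, can be derived directly from an assembly. The sketch below is a minimal, assumed implementation using only the Python standard library: it takes FASTA text, sums the contig lengths and computes G+C over the unambiguous bases. The function names and the toy sequences are hypothetical and are not taken from any of the genomes discussed.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {record id: sequence} for FASTA-formatted text."""
    records: dict[str, list[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record id.
            name = (line[1:].split() or ["unnamed"])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def genome_size_and_gc(text: str) -> tuple[int, float]:
    """Genome size as the sum of contig lengths, and mol% G+C over A/C/G/T only."""
    seqs = list(parse_fasta(text).values())
    size = sum(len(s) for s in seqs)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = sum(s.count(base) for s in seqs for base in "ACGT")
    return size, (100.0 * gc / acgt if acgt else 0.0)


toy = ">contig1\nGGCCAT\n>contig2\nATGC\n"  # toy data, for illustration only
size, gc = genome_size_and_gc(toy)
print(size, f"{gc:.1f}")
```

For a draft genome the summed contig length is only an apparent size, as noted elsewhere in this document for incomplete assemblies.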

B2: Borderline cases

B2a: Desikacharya

Genus 1.12.3 ("Desikacharya") contains 3 species, of which two are represented by genomic sequences. D. piscinale strain CENA21 (complete genome), species 1.12.3A, and "Nostoc" sp. MBR 210 (draft genome), species 1.12.3C, share an isDDH value of 52.9%, 93.08% ANI (with 73% query cover) and 98.57% 16S rRNA sequence identity. In contrast to the latter value, which suggests that the strains should be assigned to two different species, the genome metrics place both into a single species.

B2b: Chlorogloeopsis

The two draft genomes of Chlorogloeopsis fritschii strain PCC 6912 and the draft genome of strain C. fritschii PCC 9212 (genus 1.15.3) give high (99.9%) isDDH values and ANI values of 99.98%, as expected from their 16S rRNA sequence identity of 100%, and are only distantly related to the unidentified HTF (High Temperature Form) PCC 7702 in genus 1.15.2 (23.1% isDDH, 77.99-78.97% ANI, 92.54% 16S rRNA identity). The isDDH value may be ambiguous in this range, but the ANI value and 95.22% CGS clearly indicate that these strains are members of distinct species, whereas the 16S rRNA identity value places them into different genera. Organisms of the HTF type are often incorrectly named as Chlorogloeopsis sp. in the literature. Similar results were obtained by Lachance (1981) in wet-lab thermal hydroxyapatite elution DDH studies for C. fritschii strains PCC 6718, PCC 6912 and PCC 9212 in genus 1.15.3 (93% DDH, no change in ΔTm(e)); our unpublished results (Raum, Herdman & Rippka) using optical thermal renaturation DDH again show the con-specificity of strains PCC 6912 and PCC 9212 (100% DDH) and that of unidentified HTF strains PCC 7517, PCC 7518, PCC 7519 and PCC 7702 (86-100% DDH) in genus 1.15.2.

B2c: Microcystis

The 76 strains of Microcystis spp. present in the 16S rRNA tree (genus 5.1.1.1) and represented by genomic sequences exhibit a minimum 16S rRNA sequence identity of 98.85% and therefore appear to be members of a single species, which by consensus would be M. aeruginosa. The specific names given for many members of genus 5.1.1.1 (bengalensis, flos-aquae, ichthyoblabe, panniformis, protocystis, pseudofilamentosa, ramosa, robusta, smithii, viridis, wesenbergii) may therefore be considered invalid. The gene extracted from the draft metagenome TA09 gives lower 16S rRNA identity values; this may show poor assembly of the genome that consequently contains foreign DNA. Of the 38 genomes shown in the genome tree, only 9 are complete (strains FACHB-1757, FD4, MC19, NIES-102, NIES-298, NIES-843, NIES-2481, NIES-2549 and PCC 7806SL). Their high (99.60-100%) 16S rRNA sequence identity is mirrored by ANI values of 94.8-99.49%, 95.31-99.69% Mash similarity and 98.53-99.60% CGS, confirming the con-specificity of the strains. However, isDDH (GGDC formula 3) values greater than the intra-species limit of 62.10% are observed only for strains NIES-102, NIES-298, NIES-1757 and PCC 7806SL, the remainder being placed into a second species.

We have analyzed core and accessory genomes for each strain, made with the spine software (Ozer et al., 2014), using only the 9 available complete sequences. The core genomes represent 56.39-68.63% of the total genome and show slightly higher mol% G+C (43.5-44.0 vs 42.09-42.92%). The addition of the draft genomic sequences would further slightly reduce the size of the core genome (see Humbert et al., 2013). With the core genomes, ANI values increased only slightly to a minimum of 95.81%, while isDDH values increased dramatically to 96.6-100% and Mash similarities to 100%. All core genomes contain our 79 marker genes (therefore the CGS values were unchanged) plus two complete rrn operons. In contrast, all pairwise crosses of the accessory genomes gave ANI values of at most 94.32% and a maximum isDDH value of 37.8%. Unlike the core genomes, the accessory genomes did not contain rrn operons or our marker genes. We have shown the presence of an unusually high number of repeat sequences in the complete genomes of members of this genus, using the RepSeek software (Achaz et al., 2006) with a minimum seed value of 25. The complete genomes vary not only in size (4.29-5.87 Mbp) and number of genes (4606-6671) but also in their content of paired repeats (3.46 x 10^4 to 1.54 x 10^5), the highest being found in strain M. panniformis FACHB-1757 (CP011339); see the figure included in the description of Moorea, below. The majority (66-70%) of the repeat sequences fall into the core genome, and within a single strain the core and accessory genomes share 28-36% of these elements, showing that they occur at least twice within the genome. The core genomes themselves share 49-82% of their total repeats between strains, whereas the accessory genomes share only 18-25%. The combined results show the presence of a conserved core genome and a variable accessory genome in M. aeruginosa, and collectively unite all the strains into a single species. These results confirm and extend those of Humbert et al. (2013), who commented on a genome evolutionary strategy that combines a large genome plasticity, characterized by a high number of repeated sequences, numerous rearrangements and an ability to include new adaptive genes by horizontal gene transfer. Our unpublished studies using wet-lab optical thermal renaturation DDH show the con-specificity of 5 strains of the PCC (PCC 7005, PCC 7806, PCC 7813, PCC 7820 and PCC 7941), having DDH values of 64-100%. Results with 3 of these strains can be compared with those of their incomplete genomes obtained with isDDH (the latter in parentheses): PCC 7806 x PCC 7941 64% (72.4%); PCC 7806 x PCC 7005 74% (72.1%); PCC 7941 x PCC 7005 76% (80.1%). The members of this toxin-producing genus require further detailed study. Several strains, apparently misidentified as Sphaerocavum brasiliense, also fall into genus 5.1.1.1 but are not represented by genomic sequences; more details of these are given on the main page of this site.
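The agreement between the wet-lab and in silico DDH values quoted above can be summarized with a short sketch. The three value pairs are copied from the text; the function names and the choice of a mean absolute difference as the summary are assumptions made for illustration, not part of the original analysis.

```python
# (wet-lab DDH %, isDDH %) for the three strain pairs quoted in the text.
pairs = {
    ("PCC 7806", "PCC 7941"): (64.0, 72.4),
    ("PCC 7806", "PCC 7005"): (74.0, 72.1),
    ("PCC 7941", "PCC 7005"): (76.0, 80.1),
}


def differences(data):
    """isDDH minus wet-lab DDH for each pair, in percentage points."""
    return {pair: round(isddh - wet, 1) for pair, (wet, isddh) in data.items()}


def mean_abs_difference(data):
    """Mean absolute difference between the two methods, in percentage points."""
    diffs = list(differences(data).values())
    return round(sum(abs(d) for d in diffs) / len(diffs), 1)


for pair, diff in differences(pairs).items():
    print(f"{pair[0]} x {pair[1]}: {diff:+.1f}")
print("mean absolute difference:", mean_abs_difference(pairs))
```

With only three pairs, such a summary is descriptive at best and should not be read as a calibration of one method against the other.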

B2d: Crocosphaera (part)

Within species 5.1.4.1A, the complete genome of Crocosphaera subtropica ATCC 51142 (previously Cyanothece sp.) shows an isDDH value of 98.0% and 99.99% ANI (100% 16S rRNA sequence identity) with that of Crocosphaera subtropica ATCC 51472 (draft); these two strains are therefore clearly con-specific. However, strain ATCC 51142 shows only 34.8% isDDH, 87.15% ANI (99.12% 16S rRNA sequence identity) with Cyanothece sp. strain BG0011 (draft), 33.5% isDDH, 87.14% ANI (99.12% 16S rRNA sequence identity) with strain Crocosphaera chwakensis CCY0110 (draft) and 33.6% isDDH, 86.71% ANI (99.39% 16S rRNA sequence identity) with the unidentified unicellular cyanobacterium SU3 (draft). The corresponding CGS values are 98.13, 97.95 and 98.07%, respectively. On the basis of genome metrics, the single species delineated by 16S rRNA analysis should be divided into two. In contrast, members of Crocosphaera species 5.1.4.1B (below) appear to be supported as con-specific by all similarity measures.

B2e: Pseudanabaena

Genus 5.4.1.1 contains 11 strains for which genomes have been sequenced (Pseudanabaena sp. ABRG5-3 (completed genome), 0153, BC1403, GIHE-NHR1, PCC 7429, UWO310, UWO311 and Roaring Creek (all draft) and the metagenomic sequences Pseudanabaena sp. ULC068 (incomplete) and ULC187, together with Limnothrix redekeii PCC 9416, draft). Surprisingly, Pseudanabaena sp. PCC 7429 (species 5.4.1.1A) shares 99.32% 16S rRNA sequence identity but shows only 26.3% isDDH, 79.50% ANI and 97.07% CGS with Limnothrix redekeii PCC 9416, and 98.85% 16S rRNA sequence identity but only 24.9% isDDH, 78.76% ANI and 96.99% CGS with Pseudanabaena sp. UWO311; these values are inconsistent for a single specific cluster, even allowing for the failure of isDDH formula 2 to discriminate in this region. Strains 0153, ABRG5-3, GIHE-NHR1, Roaring Creek, UWO310 and the metagenomes ULC068 and ULC187 clearly fall into six additional species of this genus, as shown in the 16S rRNA tree, with low 16S rRNA sequence identities, isDDH and ANI values (the incomplete sequence of strain ULC068 being excluded from DDH and ANI analysis and the genome tree). Pseudanabaena sp. strains Roaring Creek and UWO310 (species 5.4.1.1F) are identical in their 16S rRNA sequences and show values of 63.5% isDDH and 94.88% ANI, confirming their con-specificity. Note that many other strains assigned to the genus "Pseudanabaena" fall into different generic clusters, although these are all near the root of the tree.

B2f: "Leptolyngbya" (part)

Species 6.1.1C contains two genomic sequences, Phormidium tenue NIES-30 (draft) and the metagenomic Leptolyngbya sp. ULC186, that share 98.92% 16S rRNA sequence identity, 87.35% ANI and 97.7% CGS. The isDDH value of 34.0% obtained with GGDC formula 2 is not discriminatory in this region. They should therefore be considered as members of different species, unless the metagenome is badly assembled. Within genus 6.1.10, 2 of the 6 species contain uniquely the genome sequences of strains of Leptolyngbya; Leptolyngbya sp. BC1307 (species 6.1.10D) and L. foveolarum ULC129 (species 6.1.10E) are defined as members of different species in sharing 97.24% 16S rRNA sequence identity. However, the results obtained with the genome metric methods applicable to their draft genomes are conflicting: 21.7% isDDH (again ambiguous in this region), 73.49% ANI (suggesting them to be members of two genera) and 93.21% CGS (placing them as two species of a single genus). The extracted 16S rRNA gene of Synechococcus sp. PCC 7335 (species 6.1.10B) shares 97.51% sequence identity with that of Leptolyngbya sp. BC1307 and 97.04% sequence identity with that of L. foveolarum ULC129. Their draft genomes show only 19.1-19.7% isDDH (again ambiguous) and 71.4-73.49% ANI, suggesting their assignment to different genera. However, the CGS values (92.49-92.94%) define them as members of a single genus. Other genomic sequences of Leptolyngbya spp. and related genera, showing congruent results with all methods, are described below.

B2g: Thermosynechococcus

Thermosynechococcus elongatus strains BP1 and PKUAC-SCTE542, T. vulcanus strain NIES-2134 and Thermosynechococcus sp. strains CL-1 and NK55a (genus 8.1.1), all isolatedfrom thermal springs, share 99.66-100% 16S rRNA sequence identity (strains BP1 and NIES-2134 being identical), and are therefore members of the same species based only on this comparison. The genomes are all similar in size (2.52-2.65 Mbp) and mol% G+C (53.3-53.9). Genome analysis by isDDH confirms the con-specificity of all strains (67.1-83.7% isDDH with GGDC formula 3 for complete genomes). Strains BP1 and NIES-2134 are confirmed as con-specific by values of 99.09% ANI, 99.34% Mash similarity and 99.86% CGS). However, strains PKUAC-SCTE542, CL-1 and NK55a show only 87.13-92.75% ANI, 89.05% Mash similarity, 98.01-98.59% CGS with strains BP1 and NIES-2134. They show among themselves 87.36-90.44% ANI, 89.27-92.38% Mash similarity, 97.92-98.81% CGS, thus each behaving as an individual species in contrast to the con-specificity suggested by the 16S rRNA sequence analysis. Strains BP1 and NIES-2134 both show low BLAST query coverage with the marker recN, whereas the other strains contain a full-length copy of this gene, as described on the main page of this site. Strain Synechococcus lividus PCC 6715, also isolatedfrom a thermal spring, is also a member of this genus. The complete genome of this strain is similar in size (2.66 Mbp) and genomic G+C content (53.5 mol%) to the Thermosynechococcus strains listed above. It shows a range of 98.72-98.86% 16S rRNA sequence identity with all Thermosynechococcus strains, but only 16.3-17.0% isDDH (GGDC formula 3), 74.76-75.19% ANI, 78.02-78.77% Mash similarity and 93.92-94.26% CGS. The genome metrics therefore place strain PCC 6715 into a different species or even genus. The conflicts between 16S rRNA sequence identities and genome metrics in genus 8.1.1 perhaps indicate an adaptation to thermal stress of many gene products. 
Other strains of thermal origin, such as Leptolyngbya sp. O-77 and Thermoleptolyngbya sp. PKUAC-SCTA183 (species 6.2.8A) and Synechococcus spp. JA-3-3Ab (species 10.1.1A) and JA-2-3B’a (species 10.1.1B), described in Section B3 below, show speciation patterns that are coherent when measured with all available methods.
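
The comparison of 16S rRNA identity with genome metrics can be sketched as a simple congruence check. The cutoffs below (98.65% for 16S rRNA, 70% for isDDH, 95% for ANI) are illustrative assumptions chosen for this sketch and are not the values applied in this document; the names `CUTOFFS`, `Pair`, `calls` and `congruent` are hypothetical. The two example pairs use values quoted in this document.

```python
from dataclasses import dataclass

# Illustrative same-species cutoffs (percent). These are assumptions made
# for this sketch, not the thresholds applied in the text.
CUTOFFS = {"16S": 98.65, "isDDH": 70.0, "ANI": 95.0}


@dataclass
class Pair:
    label: str
    metrics: dict  # metric name -> percent value


def calls(pair, cutoffs=CUTOFFS):
    """Per metric, True if the value meets the assumed same-species cutoff."""
    return {m: v >= cutoffs[m] for m, v in pair.metrics.items() if m in cutoffs}


def congruent(pair, cutoffs=CUTOFFS):
    """True when every available metric gives the same call."""
    return len(set(calls(pair, cutoffs).values())) <= 1


pairs = [
    # Lowest values quoted for PCC 6715 against the Thermosynechococcus strains
    Pair("PCC 6715 vs Thermosynechococcus",
         {"16S": 98.72, "isDDH": 16.3, "ANI": 74.76}),
    # Values quoted for the two Anabaena sp. strains of genus 1.9.2
    Pair("Anabaena sp. CA vs 4-3",
         {"16S": 99.93, "isDDH": 94.4, "ANI": 99.31}),
]

for p in pairs:
    print(p.label, calls(p), "congruent" if congruent(p) else "conflict")
```

Under these assumed cutoffs the first pair is flagged as a conflict (16S rRNA alone suggests one species) and the second as congruent. Changing the cutoffs changes the calls, so borderline pairs need case-by-case judgment.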

B3: Most clusters are supported by all metrics

Isolates known as Anabaena, Aphanizomenon and Dolichospermum in genus 1.1.1 internally share 96.9-100% 16S rRNA sequence identity. Despite their major morphological differences and generic names, they are resolved only as species (1.1.1A to 1.1.1J) of a single genus; the 16S rRNA data are supported in all cases by genome metrics. These values cannot be compared with the 16S rRNA identities for some strains (UHCC 0167, UH 0299, UHCC 0406) that lack or contain only partial copies of the gene; additionally, strains Dolichospermum spp. UHCC 0299 and UHCC 0352 have a large gap in the 16S rRNA sequence and have been excluded from the 16S rRNA tree.

A phylogeny derived by analysis with our 79 conserved marker genes is shown in the figure below. Österholm et al. (2020) showed a similar genomic phylogeny, using the 31 marker genes of Wu & Eisen (2008), and termed this cluster the ADA clade, derived from a larger tree of 94 genomes.

The species designations are taken from our 16S rRNA tree. The sequences of strains AWQC131C, UHCC 0167, UHCC 0299, UHCC 0352 and UHCC 0406 are all absent from the 16S rRNA-based tree, since they do not contain, or contain only fragments of, the gene.

The two trees are reasonably comparable, except that the clusters are more clearly resolved with our 79 marker genes. The tree of Österholm et al. contains in addition 7 Anabaena sp. metagenomes, 5 Aph. flos-aquae metagenomes and Aph. flos-aquae strain 2012/KM1/D3, excluded from our tree because it is incomplete, having only 76 of the 79 genes of our core marker set. The tree of Österholm et al. lacks D. circinale strains ACBU02 (available from the JGI, not from the NCBI) and ACFR02 (not in NCBI or JGI), and D. flos-aquae CCAP 1403/13F, which was not available at the time of publication. Of the four complete genomic sequences of this genus, Anabaena sp. strain WA102 (1.1.1F) is clearly separated at the specific level from Dolichospermum sp. strain UHCC 0090 (1.1.1J), with 16S rRNA sequence identity of 98.2%, an isDDH value of 45.7% (GGDC formula 3), ANI 90.43%, Mash similarity 92.43% and CGS 97.85%. For these DDH studies, it was necessary to artificially concatenate the two circular chromosomes of strain UHCC 0090 (chromosome 1, CP003284, 4.33 Mbp and chromosome 2, CP003285, 0.82 Mbp). Chromosome 1 contains 5 identical 16S rRNA genes, of which only one is shown in the trees. The five 16S rRNA genes extracted from the complete genome of Anabaena sp. strain WA102 share 99.9-100% identity, only one being shown in the tree. The three complete genomes of strains in species 1.1.1J are confirmed as members of a single species, showing Mash similarity of 97.31-97.78%, CGS 99.19-99.25%, ANI 96.42-96.93% and isDDH 65.8-74.2% (GGDC formula 3).
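
The concatenation of two chromosomes into a single sequence for DDH input, mentioned above, can be sketched as follows. This is a minimal illustration that assumes a plain end-to-end join in input order; the procedure actually used is not described here, and the function names and toy sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate(records, new_header):
    """Join all sequences end to end, in input order, under one header."""
    return new_header, "".join(seq for _, seq in records)


# Toy stand-ins for the two chromosomes (not real sequence data).
toy = ">chr1 toy\nACGTAC\nGT\n>chr2 toy\nTTGA\n"
header, seq = concatenate(read_fasta(toy), "toy_concatenated")
print(f">{header}\n{seq}")
```

The join creates one artificial junction between the replicons, and the order and orientation of the parts are arbitrary.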

The five 16S rRNA genes of the complete genome of strain UHCC 0315 share only 98.79-99.19% identity, thus exhibiting microheterogeneity. These sequences, of which we show 3 in the rRNA tree, share 98.72-99.13% identity with the 16S rRNA of Dolichospermum sp. strain UHCC 0090 (species 1.1.1J); an isDDH value of 65.8% (GGDC formula 3), ANI 96.42%, Mash similarity 97.31% and CGS 99.25% confirm the con-specificity of the strains. Similar microheterogeneity may be seen in the complete genome of D. flos-aquae CCAP 1403/13F (also in species 1.1.1J), which also contains 5 rrn operons; the 16S rRNA sequences share 98.99-99.73% identity; only two are included in the 16S rRNA tree. These share 98.65-98.92% identity with strain UHCC 0090. These values are at the limit of our cutoff value; the Mash (97.38%), ANI (96.49%), isDDH (68.5%, GGDC formula 3) and CGS (99.25%) values establish them as members of a single species.
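
The microheterogeneity among 16S rRNA copies within one genome can be quantified as the range of pairwise identities. The sketch below assumes that the copies have already been aligned to equal length and counts a gap against a base as a mismatch; other conventions exist, and the identity calculation actually used is not specified in this section. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; a gap against a
    base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_range(aligned):
    """Lowest and highest pairwise identity among named aligned copies."""
    values = [percent_identity(aligned[p], aligned[q])
              for p, q in combinations(sorted(aligned), 2)]
    return min(values), max(values)


# Toy aligned fragments standing in for rrn operon copies (not real data).
copies = {
    "rrnA": "ACGTACGTAC",
    "rrnB": "ACGTACGTAC",
    "rrnC": "ACGTTCGTAC",
}
low, high = identity_range(copies)
print(f"{low:.1f}-{high:.1f}% identity")
```

A range whose lower bound falls near the species cutoff, as for the copies described above, means that the choice of copy shown in a tree can affect the apparent placement of a strain.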

With the complete and metagenomic sequences excluded, 19 further genomes, all draft, remain in genus 1.1.1. Aph. flos-aquae strain 2012/KM1/D3 (species 1.1.1F) has been further excluded from DDH comparisons and the genome tree since the genome is incomplete. Nine species (defined by 16S rRNA identity values) contain genome sequences, as described below.

D. planctonicum strain NIES-80 lies in species 1.1.1A; although it cannot be compared with strain UHCC 0167 on the basis of 16S rRNA identity, the two are members of a single species, showing 95.99% ANI and an isDDH value of 67.9%.

The four draft genomic sequences in species 1.1.1B share 99.2-100% 16S rRNA sequence identity and show 69.9-97.3% isDDH and 96.18-99.7% ANI values. All four strains are therefore members of a single species and are named as D. circinale. The metagenome D. circinale Clear-D4 fits into this species on the basis of genome metrics (96.05% ANI, 70.9% isDDH with strain ACFR02), but is incomplete, containing only 75 of our 79 core marker genes and no 16S rRNA gene, and has been omitted from the rRNA and genome trees.

The six strains of species 1.1.1F for which 16S rRNA data are available are confirmed as members of a single species (99.74-99.93% 16S rRNA sequence identity, 68.2-84.1% isDDH and 95.39-98.04% ANI). The incomplete draft genome of Aph. flos-aquae strain 2012/KM1/D3 falls into this cluster in the 16S rRNA tree, but has been excluded from the genome tree and DDH calculation; the 16S rRNA gene of the metagenome Aph. flos-aquae WA102 incorrectly places this genome into cluster 1.1.1F due to extreme heterogeneity (see below); this has been excluded from DDH calculation. The metagenome Anabaena sp. AL93 WA93 has been excluded from our genome tree.

Within species 1.1.1J, 7 strains share 99.66-99.77% 16S rRNA sequence identity, more than 96.85% ANI and a minimum of 74.9% isDDH, confirming their placement into a single species. Strain Anabaena sp. UHCC 0187 shares 98.63-99.38% 16S rRNA sequence identity with the above, seemingly being a member of the same species, but is confirmed as a different species by ANI values of 89.17-89.36% and isDDH values of 40.3-40.5%. The 3 metagenomes of this species are not shown in our genome tree.

Species 1.1.1L contains two draft genomic sequences (Anabaena sp. strains UHCC 0204 and UHCC 0253) that show 99.46% 16S rRNA sequence identity, 84.0% isDDH and 98.05% ANI, confirming their assignment to the same species.

The heterogeneity of 16S rRNA sequences within the genomes of strains UHCC 0315 and CCAP 1403/13F (above) is also observed in the draft metagenome labeled as WA102, but named as Aph. flos-aquae. This contains four 16S rRNA sequences, only two being shown in the 16S rRNA tree; one (incomplete) is found in species 1.1.1F (sharing 99.5% identity with the sequence from the complete genome of Anabaena sp. strain WA102) and the other in 1.1.1E. The 2 sequences thus assign this organism to different species. On the basis of genome metrics, this draft metagenome fits into species 1.1.1E, showing 82.3-82.6% isDDH and 97.44-97.81% ANI with Aphanizomenon MDT13, MDT14a and UKL13-PB (all being again metagenomic assemblies); it shows only 35.3% isDDH and 87.51% ANI with the complete genome of Anabaena sp. strain WA102 in species 1.1.1F. Strains MDT13 and MDT14a show a high isDDH value of 92.3% and 98.99% ANI. The (almost) full length 16S rRNA genes extracted from the metagenomic sequences Anabaena spp. strains MDT14b (2 sequences) and WA113 (3 sequences) are not of cyanobacterial origin, since they fall into the bacterial outgroup of the tree. The metagenome Aph. flos-aquae Clear-A1, showing 98.00% ANI and 83.8% isDDH to Aphanizomenon MDT14a, places the organism into species 1.1.1E but, lacking an rrn operon, is not shown in the rRNA tree and has been omitted from the genome tree, as have all 6 metagenomes of this species and of species 1.1.1M.

The problem of inter-mixing of the generic names appended to members of genus 1.1.1 has been reported in several publications (see Driscoll et al., 2018, and references therein) and may arise, in part, by the over-zealous transfer of strains with a variety of specific epithets into Dolichospermum by various authors, and in part by renaming of strains originally identified as those species of Anabaena to Dolichospermum, which is done automatically by the NCBI. This problem was partly solved by Rajaniemi et al. (2005), who transferred a large number of strains known as Aphanizomenon issatschenkoi into the newly-erected genus Cuspidothrix, since they showed only 95.7% 16S rRNA sequence identity to strains of the Aphanizomenon flos-aquae type; members of Cuspidothrix are found in our tree in genus 1.1.4, well removed from the planktonic organisms discussed above. The draft genome available for a single member of this genus, C. issatschenkoi strain CHARLIE-1 (PGEM00000000), forms a clade separated from other planktonic forms in the tree inferred from all (complete plus draft) genome sequences.

Trichormus strains are grouped in at least 2 generic clusters in the 16S rRNA tree (1.1.4 and 1.10.1). Single strains are also found in 1.2.6B (Sherwood et al., unpub), 1.3.7B (Miscoe et al., 2016), 1.4.1A (Johansen et al., unpub), 1.12.1B (Miscoe et al., 2016) and 1.13.4 (Rajaniemi et al., 2005a). This makes a total of 7 generic clusters in 7 families. Rajaniemi et al. (2005a) remarked on the polyphyly of this "genus". Komárek and Anagnostidis (1989) transferred Anabaena variabilis (and many other species) into the genus with T. variabilis as type species. Two genomes (Trichormus sp. NMC-1, Anabaena sp. PCC 7108, both draft and from saline habitats) and the 16S rRNA of 13 strains (named as Anabaena sp., Trichormus sp. and many T. variabilis), mostly isolated from freshwater habitats, fall into genus 1.1.4. The draft genome of T. variabilis SAG 1403-b also fits into this genus in the genome tree, but cannot be shown in the 16S rRNA tree. The genomes share 89.42% ANI, 41.2% isDDH and 97.11% 16S rRNA sequence identity, as expected from their placement into two species. The con-specificity of Anabaena sp. strains PCC 6309 and PCC 7122 (species 1.2.9A in the tree) and their separation from strain PCC 7108 (species 1.1.4B) with only 32% DDH and high (11 °C) ΔTm(e) of the hybrids was shown by thermal hydroxyapatite elution DDH studies (Lachance, 1981).

Within genus 1.2.1 (Cylindrospermopsis), 16S rRNA and draft genome sequences are jointly available for 15 strains of C. raciborskii. These (Cr2010, CS-505 [2 sequences], CS-508, CS-509, CENA302, CENA303, CYLP, CYRF, GIHE 2018, ITEP-A1, MVCC14, MVCC19 and UNSW 506 (CS-506)), together with Raphidiopsis brookii D9 and R. curvata NIES-932 (both represented by draft genomes), and C. curvispora GIHE-G1 (complete genome), share 99.33-100% 16S rRNA sequence identity and, on the basis of this character, are therefore members of a single species. This is confirmed by isDDH (GGDC formula 2), Mash similarity and CGS. Unfortunately, the lack of complete genomes, with the exception of that of C. curvispora GIHE-G1, prevents detailed study of this group. Our BLAST results show that C. raciborskii CENA303 and R. brookii D9 both lack the nifH (N2 fixation) and hglD (heterocyst glycolipid synthase) genes, an absence considered (Stuken et al., 2010) to be a diagnostic feature of Raphidiopsis, although these genes are present in the other members of the species. R. curvata NIES-932 also lacks these genes. C. raciborskii CENA303 may therefore be misnamed. However, all of these genomes are in draft stage; the "diagnostic feature" may be an artifact, resulting from missing segments. The two Raphidiopsis spp. strains do contain (like all Cylindrospermopsis strains) the gvpA (gas-vesicle) gene, although that of strain D9 shows only 60% query cover in BLAST analyses and may be truncated at the end of a contig. We currently include all species of Raphidiopsis within the genus Cylindrospermopsis, since they appear to be simply Het- mutants of the latter; the genomes of further isolates of Raphidiopsis, particularly of the type species (R. curvata), should be sequenced in order to confirm or refute the validity of this genus.

Three strains of Sphaerospermopsis spp. (LEGE 00249, NIES-73 and NIES-1949), carrying two different specific epithets, lie in species 1.2.3A. Their con-specificity is shown by the ranges of 98.70-99.85% 16S rRNA sequence identity, 97.05-97.45% ANI and 77.2-83.2% isDDH.

The complete genome of Nodularia spumigena strain UHCC 0039 and the draft genome of strain CCY9414 in species 1.3.1A share 99.73% 16S rRNA sequence identity and show an isDDH value of 98.4% (GGDC formula 3), ANI 99.45% over 94.6% of the genome and CGS 99.90%, confirming their con-specificity. Strain UHCC 0039 shares slightly lower (99.46%) 16S rRNA sequence identity, 74.2% isDDH, 96.85% ANI (calculated on 78.5% of the genome) and 99.59% CGS with the draft genome of Nodularia sp. CENA596, supporting their con-specificity, but lower values of 98.79% 16S rRNA sequence identity, 38.0% isDDH, 87.98% ANI (over only 58.7% of the genome) and 98.75% CGS with the draft genome of strain NIES-3585. The isDDH and ANI values for the latter strain fall below the values employed here for species demarcation, whereas the CGS value suggests that the two strains are con-specific.

Cylindrospermum spp. strains PCC 7417 (represented by a complete genome sequence) and NIES-4074 ("nearly complete") share only 97.5% 16S rRNA sequence identity; the isDDH value of 29.0% (GGDC formula 3), ANI of 83.41% and CGS of 97.16% confirm their assignment to different species (1.7.1A and 1.7.1F).

Two strains (CA=ATCC 33047 and 4-3) named as Anabaena sp. and found in the same environment (estuary, Port Aransas, Texas) lie in the monospecific genus 1.9.2; their draft genomes share 99.31% ANI, 94.4% isDDH and 99.93% 16S rRNA sequence identity and are therefore members of a single species.

The “Nostoc” strains for which sequenced genomes are available mostly fall into dispersed generic/specific clusters of the tree as single members, and show only low (around 22%) isDDH values. The 16S rRNA identity values are consistent with genome metrics in all cases, except for three clusters (described above).

Genus 1.8.1 contains 23 “Nostoc” strains represented by genomic sequences, 10 being complete, and two genomes from herbaria; their grouping into the same or different species is, in almost all cases, supported by both 16S rRNA sequence identity and isDDH and ANI values.

Nelson et al. (2019), in a tree of 100 genomes inferred with 834 unspecified single copy markers, described the complete genomes of four Nostoc cyanobionts, isolated from hornworts and a liverwort. Nostoc sp. strains TCL240-02 and C057 lie in species 1.8.1A of the 16S rRNA tree, with Nostoc sp. strain C052 in species 1.8.1I. Nostoc sp. strain TCL26-01 is unrelated to the first three, falling into a different family (species 1.11.3A). The con-specificity of strains TCL240-02 and C057 is confirmed by genome metrics (isDDH 65.1%, ANI 93.19%), and the separation of strain C052 at the specific level by lower values (isDDH 37.2%, ANI 86.43%). Similarity measures involving strain TCL26-01 are more difficult to calculate, but the estimates of 14.6-14.7% isDDH and 75.84-75.92% ANI support generic separation. The isDDH results described above were obtained with GGDC formula 3. Unfortunately, the addition of strains C052, C057 and TCL26-01 severely perturbs the topology of the genome trees; these sequences have therefore been excluded.

Species 1.8.1A contains, in addition to Nostoc sp. strains TCL240-02 and C057, Nostoc sp. strains ATCC 53789 (represented by 2 sequences, 1 complete), N6, PCC 73102 (both represented by complete genomes), and strains 213, 232, UCD121, UCD122 and UIC 10630 (all with draft genomes). Nostoc sp. strain UCD120 also falls into this species on the basis of genome metrics, but lacks rrn operons and is not shown in the 16S rRNA phylogeny. The strains are all cyanobionts, except strain UIC 10630 which was isolated from soil. Also present are several strains named as Nostoc sp. (16S rRNA only). All are cyanobionts. For Nostoc sp. strains 213, 232, N6, UCD120, UCD121 and UCD122, genome metrics conflict either with their con-specificity within species 1.8.1A or their assignment to different genera when compared to the cyanobiont members of genus 1.8.2, as described above. The five remaining members of species 1.8.1A are truly con-specific, as shown by their ranges of 98.99-100% 16S rRNA sequence identity, 52.4-99.6% isDDH and 91.93-99.90% ANI. Their inter-specific relationship to other members of genus 1.8.1 is shown unambiguously by values of 97.38-98.79% 16S rRNA sequence identity, 35.8-37.8% isDDH, 86.36-87.13% ANI and 98.42-98.56% CGS. Strain N6 and Nostoc sp. 'Lobaria pulmonaria cyanobiont' strain 5183 (species 1.8.1S), both represented by complete genomes (Gagunashvili and Andrésson, 2018), are also placed into different species by all available measures (96.97% 16S rRNA sequence identity, 36.5% isDDH (GGDC formula 3), 87.12% ANI and 89.02% Mash similarity).

N. edaphicum strain CCNP1411 (complete genome) and Nostoc sp. strain KVJ20 (draft) appear to be con-specific in species 1.8.1F, showing 99.46% 16S rRNA sequence identity, 59.4% isDDH and 93.59% ANI, but we have shown that a small segment of the genome of strain CCNP1411 is chimeric. None of the other Nostoc cyanobionts (e.g. strains 0708 from Azolla (described in detail above) and KVJ20 from Blasia) are closely related to the members of genus 1.8.1.

The cyanobiont Nostoc sp. strain 2RC lies in species 1.8.2E with the N. linckia strain Z series (13 genomes, all cyanobionts) plus the metagenome Nostoc sp. Moss3 (moss epiphyte). Additionally present are some Nostoc spp., Desmonostoc spp. and Dolichospermum spp. (represented by their 16S rRNA genes) from varied habitats, only some of which are cyanobionts. Strain 2RC shares 96.48% ANI (74.7% isDDH) with strain z1 and 96.3% ANI (74.4% isDDH) with strain Moss 3, and 99.66% 16S rRNA sequence identity with both. The genome metrics therefore agree with their con-specificity. Nostoc sp. Moss 3 itself shows low similarity to another metagenome from a similar habitat, Nostoc sp. Moss 2 (found in species 1.8.1AD), showing only 95.50% rRNA sequence identity. However, values of 82.31% ANI and 95.86% CGS poorly support this generic separation. A value of 28.9% isDDH with GGDC formula 2 should not be considered to be accurate in this range. The N. linckia strain Z series genomes are virtually identical in size (8.9-9.2 Mbp, deduced by addition of the lengths of their contigs) and exhibit 99.91-99.99% ANI and 99.7-99.9% isDDH. We have excluded the 9 most nearly identical strains from the genome tree. The genomic sequence of strain NIES-25 (whose extracted 16S rRNA sequence lies in species 1.8.2G, only one of three being shown in the 16S rRNA tree) is incomplete, containing only 47 of our 79 marker genes, and has been excluded from the similarity calculations.

The genome of Nostoc sp. ATCC 43529 contains two 16S rRNA genes, one falling into species 1.8.2H, the other into the bacterial outgroup. The genome shares 98.5% 16S rRNA sequence identity, 42.8% isDDH and 90.04% ANI with that of Nostoc sp. Hyl-09-2-6 in species 1.8.2I. The separation of these strains into two species is therefore confirmed by all methods. The two species (including sequences obtained via normal PCR reactions) show internally 98.8-99.9% 16S rRNA sequence identity, separated at 98.5%. The genome of strain PA-18-2419, whose 16S rRNA sequence falls into species 1.8.2H, is chimeric; this has been excluded from the tree based on genomic sequences and from calculation of genome metrics.

The genome of Nostoc minutum NIES-26, like that of strain ATCC 43529, contains two 16S rRNA genes, the first appearing in genus 1.8.6, the second as a single "genus" in family 1.13. Genomic sequences of close relatives of this strain are not available; isDDH therefore cannot be used to decide which sequence is correct.

Seven genomes of T. variabilis are in the mono-specific genus 1.10.1; 6 (strains 9RC, ARAD, FSR, N2B, PNB and V5, from Azolla caroliniana, A. filiculoides, A. pinnata and Azolla sp., Thiel et al. submitted) are cyanobionts, and 1 (strain 0441) is from a thermal spring. Other genomes are: Anabaena sp. YBSO1 (terrestrial); Anabaena variabilis NIES-23 (no information on habitat), in NCBI as A. variabilis (TITLE field of flatfile) but Trichormus (ORGANISM field); Nostoc sp. PCC 7937 (ATCC 29413, T. variabilis; UTEX 1444, A. variabilis) (freshwater), in NCBI as A. variabilis (TITLE field of flatfile) but Trichormus (ORGANISM field), although RefSeq changed the TITLE to T. variabilis; this strain was isolated as A. flos-aquae and received by the PCC from C.P. Wolk as A. variabilis, and is sensitive to cyanophage N1, like PCC 7120 (thus closely related), but forms hormogonia, unlike PCC 7120; Nostoc sp. PCC 7120 (freshwater); and the metagenomes Nostoc sp. Hyl-06-6-4 and Pleu-09-321, both of organisms epiphytic on moss. The genomes of Trichormus sp. strains 9RC, ARAD, FSR, N2B, PNB and V5 show 100% 16S rRNA sequence identity, 99.9% isDDH and 99.81-99.98% ANI with each other and are also identical to Nostoc sp. 7937, Anabaena sp. YBSO1 and T. variabilis 0041; they share a slightly lower value of 98.92% 16S rRNA sequence identity with both Nostoc sp. 7120 and Anabaena variabilis NIES-23, and are therefore con-specific. This is confirmed by isDDH (63.2-100%) and ANI (92.15-99.99%) values.

Nostoc sp. NIES-2111 and Nostoc sp. NIES-3756 (both terrestrial isolates) are con-specific (species 1.11.1A, 99.9% 16S rRNA sequence identity and 85.4% isDDH, 98.20% ANI). Within species 1.12.1A, Nostoc sp. HK-01, N. cycadae WK-1 and a strain named as Anabaenopsis circularis NIES-21 (= IAM M-4, Nostoc PCC 6720) are con-specific, having identical 16S rRNA sequences; they show 82.6-84.9% isDDH and 97.44-97.96% ANI. However, Nostoc sp. strain PCC 7107, represented by a complete genome, shows a high (99.60%) rRNA sequence identity with the other 3 members, 62.1% isDDH with the complete genome of Nostoc sp. HK-01 (GGDC formula 3) and 47.0-47.1% isDDH with the remaining strains. The ANI values between strain PCC 7107 and the others are 91.13-91.43%. The genome metrics (Mash similarity 93.25%, ANI 91.3% and CGS 99.17%) between strains PCC 7107 and HK-01 suggest these strains to be borderline members of a single species. Lachance (1981) demonstrated a wet-lab DDH value of 67% (ΔTm(e) 4°C) between strains Nostoc sp. PCC 7107 and PCC 6720 (= A. circularis NIES-21), confirming their con-specificity.

In conclusion, the 16S rRNA sequence similarities, isDDH and ANI values are in good agreement for defining species of this large clade of Nostocacean cyanobacteria. However, genome sequences of further Nostoc isolates are urgently required. Lachance (1981) first demonstrated by wet-lab DDH the wide dispersal of Nostoc spp. strains of the PCC, but several clusters were apparent. The first grouped strains PCC 6302, PCC 7121 and PCC 7422 with 52-56% DDH and high ΔTm(e) values of 9-16°C, which suggests these strains to be members of different species of the same genus; their PCR-derived 16S rRNA sequences are found here in three species of genus 1.8.2. In the second, strains PCC 6720, PCC 7107 and PCC 7416 showed 69-93% DDH and low (0-4°C) ΔTm(e) and should be considered to be members of a single species; this is confirmed by the placement of their PCR-derived 16S rRNA sequences into species 1.12.1A of the tree. High (93-100%) DDH values with low (0-1°C) ΔTm(e) show the near-identity of strains of the third cluster (PCC 6411, PCC 6705, PCC 6719, PCC 7118, PCC 7119 and PCC 7120); their PCR-derived 16S rRNA sequences are found here within genus 1.10.1.

Of the few genome sequences of strains assigned to the genus Tolypothrix, strain Tolypothrix sp. PCC 9009 (in species 1.14.1E, draft genome) clusters with Hassallia byssoides strain VB512170, also represented by a draft genome sequence, in the same species (99.19% 16S rRNA sequence identity but low isDDH value of 40.9% and 87.14% ANI, based on only 31% coverage); note that the latter genome is contaminated by bacterial DNA and is far larger (13 Mbp) than expected (e.g. 8 Mbp for PCC 9009). Strain PCC 9009 also shows low relationship with Tolypothrix sp. NIES-4075 in species 1.14.1A (isDDH 38.2%, ANI 88.07%, 98.64% 16S rRNA sequence identity), confirming their assignment to different species. Strain Tolypothrix sp. PCC 7601 (in species 1.8.4A, draft genome) shows only 23.1% isDDH (not discriminatory in this range), 74.98% ANI (with 38% coverage), 94.42% CGS and 95.80% 16S rRNA sequence identity with strain PCC 9009, confirming them to be members of different genera. The members of species 1.8.4A, all represented by draft genome sequences, show 99.26-100% 16S rRNA sequence identity, 54.2-99.7% isDDH, 93.5-99.9% ANI and are therefore con-specific. Strain PCC 7601 is separated at the specific level from Nostoc sp. 106C (species 1.8.4B, draft genome), Calothrix spp. NIES-2098 (species 1.8.4C, complete genome) and NIES-2100 (species 1.8.4D, draft genome) by low (96.8-98.8%) 16S rRNA sequence identity, low isDDH values of 27.5-28.1% (not discriminatory in this range), ANI values of only 80.88-81.16% and CGS 96.65-97.99%; the incomplete genome sequence of Nostoc carneum NIES-2107, containing only 75 of the genes of our standard 79-gene marker set, has been excluded from DDH comparison and from the genome tree.
Other strains, previously named as Calothrix, were studied in wet-lab DDH experiments by Lachance (1981) and assigned to the genus Tolypothrix; the strains then available formed several clusters, as we observe in the tree: strains PCC 7101, PCC 7504, PCC 7601 and PCC 7710 (species 1.8.4A of the tree) with strain PCC 7708 (in species 1.8.4B); strains PCC 6305 and PCC 6601 (species 1.8.3A).

The two strains of genus 1.9.2 represented by 16S rRNA genes extracted from their draft genomes, Anabaena sp. CA and 4-3, are almost identical, as seen from the available metrics (99.93% 16S rRNA sequence identity, an isDDH value of 94.4% and 99.31% ANI).

Of the genomes of strains named as Fischerella spp., Mastigocladus laminosus, Hapalosiphon spp., Westiella intricata and Westiellopsis prolifica (genus 1.15.1), one is available for species 1.15.1A, 5 for species 1.15.1G, 5 for species 1.15.1J and 1 for species 1.15.1K. Strains within a single species show isDDH values of 76.3-99.9%, with members of separate species having isDDH values of 41.0-46.9%. Within species 1.15.1G, the sequences of 5 strains assigned to three genera (Fischerella, Mastigocladus and Westiellopsis) share 99.71-100% 16S rRNA sequence identity, confirmed by isDDH values of 81.0-85.3% and 96.74-98.22% ANI, illustrating the current state of nomenclatural confusion. Note that, of the 2 rrn operons recovered from the genomic sequence of Westiella intricata strain UH HT-29-1, one falls into species 1.15.1J together with a sequence obtained by 16S rDNA-specific PCR; the other (from rrnA) lies in family 1.13. The other members of species 1.15.1J, named as Fischerella and Hapalosiphon spp., show, together with strain UH HT-29-1, 16S rRNA sequence identities of 99.79-100%; the isDDH (76.3-77.2%) and ANI (96.59-100%) results are in perfect agreement with the 16S rRNA sequence identities. A similar situation occurs for genus 1.15.4 (where genome sequences are available for all of the 4 species of Fischerella): isDDH between all available genome sequences confirms the con-specificity or division of the strains at the specific level on the basis of 16S rRNA sequence identity. Note that of the sequences of species 1.15.4A, all those from White Creek, Yellowstone NP, USA have been omitted from the genome tree; in the absence of a publication, it is not clear whether they represent isolates or are metagenomes derived from identical environmental samples. Unfortunately, no Mastigocladus spp. genomic sequences are available for genus 1.15.4. Lachance (1981) in wet-lab thermal hydroxyapatite elution DDH studies previously demonstrated the dispersal of Fischerella spp. strains of the PCC into two specific clusters: strains PCC 7521, PCC 7522 and PCC 7523 with 87-91% DDH and 0-2 °C ΔTm(e) (here species 1.15.4A); strains PCC 7115, PCC 7414, PCC 7520 and PCC 7603 with 98-100% DDH and no change of ΔTm(e) (species 1.15.4D).
However, Fischerella muscicola strain PCC 73103 showed 99.0% DDH, with 0 °C ΔTm(e), with members of our species 1.15.4D whereas this strain here falls into species 1.15.1A. The isDDH results (32.7%, not discriminatory in this range), 85.53% ANI and 98.24% CGS values confirm the assignment of strains PCC 73103 and PCC 7414 to different species. We conclude that an error involving strain provision or mislabeling occurred. Comparison of wet-lab and isDDH values for other strains is not possible, since there are no other results in common.

Strains named as Calothrix spp. are widely dispersed throughout order 1, with the majority falling into family 1.15. We show only 63 members in the 16S rRNA tree. Genera 1.15.7, 1.15.8, 1.15.14 and 1.15.15 are each divided into several species on the basis of their 16S rRNA sequence identities. Unfortunately, genomic sequences are available only for 7 species of Calothrix and 1 of Rivularia, representing 4 genera plus 1 singleton genus containing only one strain.

The complete genome of Calothrix sp. NIES-4101, containing 4 rrn operons of which the 16S rRNA of only one is shown, forms the singleton genus in family 1.15. Species 1.15.7B contains the single draft genomic sequence of Calothrix sp. PCC 7103. Species 1.15.7C, containing the complete genomic sequences of two strains (Calothrix sp. NIES-4071 and NIES-4105), is one of two clusters for which intra-specific comparison is possible. Both genomes contain 5 copies of the rrn operon; only 2 16S rRNA sequences are shown for each, sharing 99.87% sequence identity within each strain and 99.87-100% between strains. Species 1.15.7D contains 2 draft genome sequences of C. desertica PCC 7102. Monospecific genus 1.15.13 is represented by the single complete genome of Calothrix sp. PCC 6303, which contains 4 rrn operons; we show only one 16S rRNA sequence. Species 1.15.14C contains the complete genome sequence of Calothrix sp. NIES-3974 which possesses 3 rrn operons, of which only one 16S rRNA sequence is shown. Species 1.15.15D is represented by the draft genome of C. parasitica strain NIES-267, containing 3 rrn operons, of which only one 16S rRNA sequence is shown, and also contains the complete genome of Rivularia sp. PCC 7116, of which only one 16S rRNA sequence of the 3 rrn operons is shown.
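As an illustration of how an identity figure such as the 99.87% quoted above relates to individual nucleotide differences, the following Python sketch computes percent identity over two pre-aligned sequences and converts a percentage back into an approximate mismatch count. It is a minimal sketch, not the procedure used for this document: the function names, the choice to skip gapped columns, and the 1500-nt gene length used in the note below are all assumptions made for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: both sequences come from the same alignment (equal length) and
    gapped columns are ignored; other conventions for gaps exist.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def mismatches_for_identity(identity_pct, length):
    """Approximate number of differing positions implied by an identity value."""
    return round(length * (1.0 - identity_pct / 100.0))


if __name__ == "__main__":
    # Toy alignment: 8 comparable columns, 1 mismatch.
    print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
    # Assumed gene length of 1500 nt (hypothetical round figure).
    print(mismatches_for_identity(99.87, 1500))
```

With an assumed full-length 16S rRNA gene of about 1500 nt, 99.87% identity corresponds to roughly two differing positions.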

The complete genomes representative of 5 genera, Calothrix spp. NIES-4071, NIES-4101, PCC 6303 and Rivularia sp. PCC 7116 vary in size from 6.77-11.06 Mbp. Their extracted 16S rRNA genes show 90.77-92.79% sequence identity. The isDDH values of 13.1-14.5% and ANI values of 68.67-73.7% confirm the generic separation of these strains.

Within genus 1.15.7, strains of the 3 species have genomes of size 11.06-11.58 Mbp (measured as the sum of contig lengths for the draft genomes). They show 97.85-98.72% 16S rRNA sequence identity, 82.31-89.92% ANI and 97.54-97.57% CGS, confirming their specific separation. The DDH values of 29.2-29.5% (formula 2) are not discriminatory in this range. Within species 1.15.7C, strains NIES-4071 and NIES-4105 are identical in genome size (11.06 Mbp); their complete genomes show 100% DDH and 100% ANI. The two genomes representing species 1.15.15D, whose metrics do not agree with the 16S rRNA identity value, are described above.

Specific separation of strains PCC 7102 and PCC 7103 was shown in the wet-lab DDH results of Lachance (1981), whose data demonstrate (for the isolates available at that time) the division of this “genus” into multiple clusters: strains PCC 7103 and PCC 7713 (species 1.15.7B), showing 80% DDH with a ΔTm(e) of 1°C; and strain PCC 7102 (species 1.15.7D) with 65% DDH and 5°C ΔTm(e) to the above; strains PCC 7709, PCC 7715 and PCC 7716 (85-86% DDH, ΔTm(e) 5°C, species 1.15.7A); strains PCC 7111, PCC 7116, PCC 7204, PCC 7426, PCC 7711, PCC 7810 and PCC 7815, with lower DDH values of 40-60% and higher (13-15°C) ΔTm(e) (Rivularia species 1.15.15A, 1.15.15D and 1.15.15F). Low DDH values of only 10-27% were observed for strains PCC 6303 (genus 1.15.13), PCC 7507 (species 1.5.1B) and PCC 7714 (species 1.15.9A) in crosses with representatives of the above clusters. Strains PCC 7102 and PCC 7103 showed 52% DDH (ΔTm(e) 5°C), close to the isDDH value.

Members of the genus Richelia (1.15.18), N2-fixing endosymbionts of diatoms, divide into three species: the first contains strains HH01 and HM01 (67.9% isDDH, 99.92% ANI, 99.26% 16S rRNA sequence identity; the latter was excluded from the genome tree because it contains only 63 of 79 marker genes) together with the metagenomic sequence Richelia UBA3481, which shows 99.7% isDDH with strain HH01 (unfortunately, 16S rRNA sequences are not present in this genome); the second contains the single strain RC01, related to the first by only 96.98% 16S rRNA sequence identity (DDH and ANI values were not compared for the incomplete genome sequence of this strain, and it was excluded from the genome tree); the third (a borderline species of the genus) is comprised of a strain named as Calothrix rhizosoleniae SC01, related to strain HH01 by 20.6% isDDH, 75.78% ANI and 96.84% 16S rRNA sequence identity. A further cluster, containing metagenomic sequences UBA3308, UBA3957 and UBA3958, is difficult to place due to the absence of 16S rRNA sequences in the genomes, but probably represents a distinct genus.

The remaining heterocystous cyanobacteria (genera 1.15.19 to 1.19.13) are represented by sequenced genomes of only 11 strains and one metagenome. Within monospecific genus 1.19.1, the draft genomes of Scytonema spp. strains HK-05 (2 sequences) and NIES-4073 show 99.06% 16S rRNA sequence identity, 91.94% ANI (with 65% coverage), 57% isDDH and 98.89% CGS, and are therefore confirmed as members of a single species. The draft genomes of Brasilonema sennae CENA114 and B. octagenarum UFV-E1 are identical, with 100% isDDH, 99.96% ANI and 100% CGS, and their extracted 16S rRNA genes are identical. Their genome sizes are 7.78-7.82 Mbp. These strains, despite their different specific names, are therefore members of a single species (1.19.4A). They show 93.1-93.2% isDDH and 99.11-99.14% ANI (with about 93% coverage) with the draft genome of B. octagenarum strain UFV-OR1, which is therefore a member of the same species. The 16S rRNA gene extracted from the latter genome, however, shares only 97.78% identity with those of the complete genomes; this gene sequence shows many mispairings in helices 36-39, indicative of the presence of foreign DNA, and does not correspond to the gene recovered from a 16S rRNA-specific PCR reaction, which nevertheless falls into the same species. Strain CENA114 shows 40.7% isDDH and 98.52% 16S rRNA sequence identity with Scytonema tolypothrichoides VB-61278, showing them to be members of different species of a single genus (species 1.19.4A and 1.19.4C), despite their different appended generic names; the ANI value, calculated for the latter strain after the removal of an excessive number of ambiguous sites, is 88.73%, confirming their assignment to different species. The metagenome labeled as Scytonema sp. RU_4_4 (species 1.19.4B) shares 96.77% 16S rRNA sequence identity and 33.4% isDDH with S. tolypothrichoides VB-61278, suggesting them to be members of different species, although the latter value may not be discriminatory in this region. However, the genome of strain RU_4_4 (7.33 Mbp from the combined lengths of the contigs) is potentially chimeric and has been excluded from the genome tree; the genome of strain VB-61278 has a size (calculated as the combined lengths of the contigs) of 10.01 Mbp, of which 7.9% of the bases are ambiguous. JSpeciesWS is unable to calculate ANI values for this strain unless the ambiguous sites are removed; the ANI value between the genome of strain VB-61278 and the metagenome RU_4_4 is then 84.85%, confirming their specific separation. The remaining genomes represent strains identified, on the basis of 16S rRNA sequence identity, as members of different genera: Nostocales cyanobacterium HT-58-2 (complete genome, genus 1.19.5), Mastigocladopsis repens PCC 10914 (genus 1.19.3), S. hofmanni PCC 7110 (genus 1.19.6), S. millei VB511283 (genus 1.19.11) and Scytonema sp. strain UIC 10036 (a "loner" genus in family 1.19). A plasmid-borne 16S rRNA gene of strain HK-05 lies in genus 1.19.11. Note that three genomic sequences are available for "Scytonema millei" strain VB511283; one (JTJC00000000.1) is not of cyanobacterial origin or is chimeric; the single 16S rRNA sequence extracted from the second (JTJC00000000.2) lies with other Scytonema strains in genus 1.19.11 and the third (QVFW00000000) falls into Chroococcidiopsis species 3.1.14B, described below. The first sequence version of Tolypothrix campylonemoides VB511288 (JXCB00000000.1) contains no rrn operon and cannot be shown in the tree based on 16S rRNA sequences, but does group with the above strains in the genome tree; one of three 16S rRNA genes of the second version (JXCB00000000.2) unfortunately shows high similarity to Rhodopseudomonas spp.; the others contain uncharacteristic segments and are impossible to align.
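The proportion of ambiguous bases mentioned above, and their removal before ANI calculation, can be sketched in Python as follows. This is an illustration under stated assumptions only: the text does not say how the ambiguous sites were removed, and the helper names `ambiguity_stats` and `strip_ambiguous` are hypothetical. Here any character other than A, C, G or T is treated as ambiguous and simply deleted.

```python
UNAMBIGUOUS = frozenset("ACGT")


def ambiguity_stats(seq):
    """Return (number of ambiguous bases, fraction of the sequence they make up).

    Assumption: anything other than A, C, G or T (e.g. N, R, Y) is ambiguous.
    """
    seq = seq.upper()
    n_ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    fraction = n_ambiguous / len(seq) if seq else 0.0
    return n_ambiguous, fraction


def strip_ambiguous(seq):
    """Delete ambiguous characters; note that this shifts downstream coordinates."""
    return "".join(base for base in seq.upper() if base in UNAMBIGUOUS)


if __name__ == "__main__":
    toy_contig = "ACGTNNRYAC"  # invented example sequence, not real data
    print(ambiguity_stats(toy_contig))
    print(strip_ambiguous(toy_contig))
```

For scale, 7.9% of 10.01 Mbp is about 0.79 Mbp of ambiguous sequence.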

Genomic sequences are available for Chamaesiphon minutus strain PCC 6605 (complete genome) and C. polymorphus CCALA 037 (draft) in genus 2.1.1. Their extracted 16S rRNA sequences share 98.58% identity and the entire genomes show 87.45% ANI (but with only ~ 54% coverage), 98.07% CGS and 36.8% isDDH, confirming their assignment to two species. Note that the given CGS value may be slightly inaccurate, since the genome of strain CCALA 037 contains only 77 of our 79 core marker genes, obliging us to use a distance matrix that does not exclude such strains.

Sequenced genomes exist for Moorea producens strains 3L (represented by two sequences, both draft), JHB (draft) and PAL-8-15-08-1 (complete), together with the draft sequence of M. bouillonii strain PNG5-198 (genus 3.1.3). Twelve metagenomic sequences have been excluded from this analysis. The two draft genome sequences of strain 3L show 99.90% isDDH and 99.99% ANI, their extracted 16S rRNA sequences being identical. They show 98.99% rRNA sequence identity and 67.7% isDDH, 94.64-94.65% ANI with M. producens strain JHB, which is therefore a member of the same species. The lower values of 98.65-98.79% rRNA sequence identity and 48.0-49.0% isDDH, 92.02-92.50% ANI with M. bouillonii PNG5-198 confirm their con-specificity.

Despite their very different geographic origins (Leao et al., 2017), strains JHB and PAL-8-15-08-1 share 99.06% rRNA sequence identity; however, they give only 47.6% isDDH, 98.64% CGS and 91.62% ANI (with 62% coverage). The genome metrics, unlike the high 16S rRNA sequence identity, suggest these strains to be borderline members of the same species. As found by the RepSeek programme, they contain an exceptionally high number of repeat sequences, as shown in the figure above.

The two M. producens genomes contain 3.85 × 10^5 and 4.44 × 10^5 repeat sequences (black lozenge symbols), many more than any other cyanobacterial strain. The complete genome of their nearest relative in the genome tree, Microcoleus sp. strain PCC 7113, contains only 3.35 × 10^3 repeat sequences. Only three of seven strains of Microcystis (red lozenge symbols) contain significantly more than 1 × 10^5, these being described in more detail in the appropriate paragraph. In the case of M. producens, one would expect this high number of repeats to prevent accurate determination of genetic relationships. The M. producens genomes contain a complete hgl gene cluster and hetR, but not nif genes (Leao et al., 2017); their genomes, of size 9.37-9.67 Mbp, are not unlike those of many heterocystous cyanobacteria. Leao et al. suggested that these organisms may have evolved from a heterocystous organism, a situation analogous to that observed in Raphidiopsis brookii D9. However, the latter organism lacks nifH and hglD, hetR being found in many cyanobacteria. If the hypothesis is true, one would expect members of the genus Moorea to cluster with the heterocystous cyanobacteria; this is not the case for any of the trees shown on this site.
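The scale of the difference in repeat content can be made explicit with a short calculation. The Python sketch below uses only the three repeat counts quoted in the preceding paragraph; the variable names are illustrative and the rounding to one decimal place is an arbitrary choice.

```python
# Repeat counts quoted in the text (found by the RepSeek programme).
moorea_repeats = [3.85e5, 4.44e5]  # the two M. producens genomes
pcc7113_repeats = 3.35e3           # Microcoleus sp. strain PCC 7113

# Fold excess of each M. producens genome over PCC 7113.
fold_excess = [round(n / pcc7113_repeats, 1) for n in moorea_repeats]

# The same counts expressed in thousands of repeats.
in_thousands = [n / 1000 for n in moorea_repeats]

print(fold_excess)   # [114.9, 132.5]
print(in_thousands)  # [385.0, 444.0]
```

The M. producens genomes thus carry more than a hundred times as many repeat sequences as their nearest relative in the genome tree.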

[Figure: number of repeats/1000 plotted against genome size (Mbp)]

Within genus 3.1.14, Chroococcidiopsis spp. PCC 7203, PCC 8201 (both having completed genome sequences) and SAG 39.79 (PCC 7433) fall into species 3.1.14A. They share 99.40-99.93% 16S rRNA sequence identity, 96.46-98.24% ANI and 69.6-75.1% isDDH values. The 16S rRNA gene extracted from a fourth Chroococcidiopsis sp. genome, PCC 7434, lies in a distinct species (3.1.14B), sharing only 98.12-98.39% 16S rRNA sequence identity with the members of species 3.1.14A; the genome metrics are 90.47-90.73% ANI and 43.7% isDDH. Species 3.1.14A also contains the draft genomic sequence of "Scytonema millei" VB511283 (QVFW00000000); the extracted 16S rRNA gene of this strain shows 97.52% sequence identity to the member of species 3.1.14B; the genome metrics are 39.6% isDDH and 89.7% ANI. This is the third version of the genome of this organism to be submitted (by the same authors) to NCBI; the first two (JTJC00000000.1 and JTJC00000000.2), showing 56.5% isDDH, seem to be chimeric (containing cyanobacterial and bacterial segments), although a 16S rRNA gene (truncated to 1086 nt at the end of a contig) extracted from the second groups with other Scytonema sp. strains in genus 1.19.11. The latest version appears to be the genome of a Chroococcidiopsis sp., as evident from the similarity values given above, not of a heterocystous organism; this is supported by our NCBI BLAST studies using the extracted marker genes, which always find Chroococcidiopsis sp. PCC 7203 at identities greater than 90%, high ANI values with other members of genus 3.1.14, and the placement of this genome by CheckM and pplacer as a close relative of strain PCC 7203. The genome shares 91.3% isDDH with the first version but only 21.7% with the second. A third species of genus 3.1.14 contains the isolate named as unidentified cyanobacterium strain TDX16; the high degree of contamination of the genome sequence makes DDH studies pointless. The Chroococcidiopsis spp. of genus 3.1.14 are distinct from strain Chroococcidiopsis sp. PCC 6712 (genus 5.1.5.8), with which they share < 90% 16S rRNA sequence identity.

The complete genome of Gloeocapsa sp. PCC 7428 and the draft genome of Chroogloeocystis siderophila strain 5.2 s.c.1 (species 3.1.16B) show 99.40% 16S rRNA sequence identity, 92.55% ANI, 50.9% isDDH and 99.27% CGS. The 16S rRNA identity and genome metrics therefore assign them to a single species. The genomes of Gloeocapsopsis sp. AAB1 (represented by two identical sequences) and Chroococcales cyanobacterium IPPAS B-1203 share 98.59% 16S rRNA sequence identity, 86.54% ANI, 98.29% CGS and 34.3% isDDH, thereby representing two distinct species (3.1.16A, D) of the same genus. Chroococcidiopsis sp. TS-821 shows 81.58-87.24% ANI, 97.48-97.49% CGS and 25.8-34.1% isDDH with the other species, thus forming a fourth species (3.1.16C). This phylogenetically-defined genus therefore contains strains named as 4 different genera, plus one unidentified. All methods of similarity measurement agree for all clusters.

Three draft metagenomes of genus 4.2.1 (Roseofilum reptotaenium AO1-A and the unidentified cyanobacteria UBA1583 and UBA2566) appear to be con-specific, with 16S rRNA sequence identities of 99.66-100%, isDDH values of 57.0-57.1% and 94.01-94.02% ANI. The UBA1583 and UBA2566 metagenomes show an isDDH value of 97.7% and 99.60% ANI. Five other metagenomes assigned to this genus, BLZD bin1, BLZ4 bin2, Guam bin12, UBA6050 and UBA6047 (not shown in the 16S rRNA-based tree, because they do not contain the gene), group with AO1-A, UBA1583 and UBA2566 with isDDH values of 97.4-99.5%. Four metagenomes, not identified to the generic level, have been excluded from the genome tree.

Four strains of Desertifilum sp. (FACHB-866, FACHB-868, FACHB-1129 and IPPAS B-1220) lie in cluster 4.3.1 and are identical in 16S rRNA sequence. They show 98.8-99.9% isDDH, 99.95-100% CGS and 99.94-99.99% ANI, confirming their con-specificity. For comparison, they share only 92.44% 16S rRNA sequence identity, 68.96% ANI, 87.98% CGS and 19.7% isDDH with their nearest relative, the metagenome Hormoscilla sp. GUM202, in genus 4.4.1. Generic separation is therefore confirmed by all metrics.

Family 4.4 contains a single genus and named species, Hormoscilla spongeliae, whose members are all cyanobionts with various marine sponges as hosts. The metagenome GM102CHS1 does not contain a 16S rRNA gene. The 16S rRNA genes extracted from the metagenomes GM7CHS1pb, GUM202, SP5CHS1 and SP12CHS1 share 98.99-100% sequence identity, assigning them to a single species (4.4.1C). Con-specificity (including GM102CHS1) is confirmed by ANI values of 93.74-99.94% and isDDH values of 55.2-98.6%. The five metagenomes divide into two clusters: (a) GM7CHS1pb with GUM202, with 99.94% ANI, isDDH 98.6%, and (b) GM102CHS1, SP5CHS1 and SP12CHS1 showing 92.88-93.22% ANI, 55.2-55.3% isDDH with the members of cluster (a). The three members of cluster (b) internally show 98.12-99.54% ANI, 83.2-83.8% isDDH. The additional GUM007 metagenome is chimeric and has been excluded from similarity estimation. The 16S rRNA sequence extracted from this genome shows 97.90-97.98% identity to the others, suggesting placement into a second species. Two other species of this genus are defined by 16S rRNA sequence identity in our tree, but are not represented by genomic sequences.

The two available genomic sequences of Gloeothece, Gloeothece citriformis PCC 7424 and Gloeothece verrucosa PCC 7822, both previously named as Cyanothece spp., form two genera (species 5.1.3.3C and 5.1.3.4A). Their completed genomes share only 17.4% isDDH (GGDC formula 3), 78.22% ANI (with 50% coverage), 80.39% Mash similarity, 91.52% CGS and 95.2% identity of their extracted 16S rRNA genes. The genome metrics values suggest that they are either borderline members of different genera or assignable to two species of a single genus.

Genus 5.1.3.5 contains two draft metagenomic sequences of pleurocapsalean taxa named as Hydrococcus sp. These are identical in 16S rRNA sequence, show 99.61% ANI, 99.2% isDDH and share (with their nearest relatives) only 93.54% 16S rRNA sequence identity (70.0-70.18% ANI, 19.4% isDDH) with the complete genome of Gloeothece verrucosa PCC 7822 (genus 5.1.3.4) and 94.88% 16S rRNA sequence identity (76.39-76.64% ANI, 22.6% isDDH) with the singleton Pleurocapsa sp. PCC 7327. However, the two Hydrococcus sp. metagenomes are chimeric, and have been omitted from the genome tree.

Species 5.1.4.1B contains six strains (Crocosphaera watsonii WH 0003, WH 0005, WH 0401, WH 0402, WH 8501 and WH 8502) that share 99.64-100% 16S rRNA sequence identity. These strains show 63.3-93.7% isDDH, 98.29-99.58% ANI and 99.37-100% CGS; all genome metrics therefore confirm their con-specificity. This is unlike the situation in the Crocosphaera members of species 5.1.4.1B, described above.

Candidatus Atelocyanobacterium thalassa ALOHA (submitted as a complete genome, CP001842), contains only 77 of 79 core marker genes (lacking purM and rbcS) but two complete rrn operons and falls into species 5.1.4.2A, with SIO64986 (=UCYN-A, also lacking purM and rbcS and containing 2 complete rrn operons) lying in species 5.1.1.2B. The extracted 16S rRNA genes share 98.72% sequence identity; the genomes show 26.1% isDDH, 84.1% ANI, 94.05% CGS and cluster loosely in the genome tree. The isDDH value (GGDC formula 2) is ambiguous in this region, but the ANI and CGS values confirm specific separation of the two organisms represented by these metagenomes. CheckM finds them to be only 73.9% and 74.2% complete, respectively.

The complete genomes of the two endosymbionts in genus 5.1.4.3, unidentified endosymbiont of Epithemia turgida strain EtSB and cyanobacterium endosymbiont of Rhopalodia gibberula strain RgSB, show 58.8% isDDH (GGDC formula 3) and 87.48% ANI; their extracted 16S rRNA genes share 98.58% sequence identity. These values are confirmed by the Mash similarity of 89.73% and CGS of 93.64%, indicating that the strains are members of two separate species. They are shown in the 16S rRNA trees in species 5.1.4.3A and 5.1.4.3C, respectively.

Within genus 5.1.4.5, the complete genomes of Rippkaea orientalis (formerly Cyanothece sp.) strains PCC 8801 and PCC 8802 are con-specific, showing 96.2% isDDH (GGDC formula 3) and 98.89% ANI, in good agreement with the 99.93% identity of their extracted 16S rRNA genes, Mash similarity of 99.17% and CGS of 99.86%.

The genus 5.1.4.6 contains the draft genome sequences of 2 organisms, Aphanothece sacrum strains FPU1 and FPU3. These show 100% 16S rRNA sequence identity, 99.99% ANI and 100% isDDH, and are therefore identical.

Two Stanieria spp. strains (NIES-3757 and PCC 7437, genus 5.1.5.10) with completed genomes share 98.72% 16S rRNA sequence identity, and fall into species 5.1.5.10B and 5.1.5.10A, respectively. Their genomes show 56.1% isDDH (GGDC formula 3), 89.64% ANI, 91.31% Mash similarity and 97.80% CGS, confirming their specific separation.

Three Prochloron didemni metagenomes (genus 5.1.6.1), from wide geographic origins, exhibit 99.70-100% 16S rRNA sequence identity, 96.90-99.19% ANI and isDDH values of 78.4-93.4%, confirming their membership of a single species. The genome P1 was omitted from the genome tree, since it is incomplete.

Geminocystis spp. strains NIES-3708, NIES-3709 and PCC 6308 (genus 5.1.7.1) are assigned to three species on the basis of their 16S rRNA sequence identities of 97.64-98.04%. This is confirmed by low isDDH (25.8-27.4%), ANI (79.75-82.40%) and CGS (94.70-97.80%) values.

Separation of Cyanobacterium aponinum (represented by strain PCC 10605) and C. stanierii (strain PCC 7202) at the generic rather than specific level is confirmed by 16S rRNA sequence identity (93.40%), ANI (72.85%), isDDH (13.9%, GGDC formula 3), Mash similarity (75.70%) and CGS (85.80%) with their completed genome sequences; these strains fall into genera 5.1.7.4 (species 5.1.7.4A) and 5.1.7.6 (species 5.1.7.6A) in the tree. The three strains of species 5.1.7.4A, PCC 10605 (complete genome), 0216 (draft) and IPPAS B-1201 (draft), are almost identical (100% 16S rRNA sequence identity, 85.0-86.3% isDDH, 98.15-98.42% ANI). The potentially chimeric nature of the genome of strain 0216 cannot be excluded, since both 16S rRNA and 23S rRNA genes of one of two operons fall into the bacterial outgroup. Within genus 5.1.7.6, C. stanierii strain PCC 7202 is separated at the specific level from Cyanobacterium sp. strains IPPAS B-1200 and HL69, sharing 97.91% 16S rRNA sequence identity, 82.96-83.11% ANI and isDDH values of 27.1-27.2%. The latter two strains show 99.46% 16S rRNA sequence identity, 95.81% ANI and an isDDH value of 66.2%. The complete genomes of the strains of the two genera show a marked difference in size (4.11 Mbp for C. aponinum PCC 10605, 3.16 Mbp for the two members of genus 5.1.7.6) and mol%GC (35.0 and 37.8-38.7, respectively).

Within genus 5.1.7.9 (“Synechococcus” I), in which all strains are of marine origin, 2 species (5.1.7.9A and C) defined by 16S rRNA sequence identity (sharing only 98.05% identity) are supported by low isDDH (20.8-30.7%) and ANI (77.76-77.95%) values of their genomes. In species 5.1.7.9A, the six 16S rRNA sequences extracted from the respective completed genomes of unicellular strains (Synechococcus sp. strains PCC 7002, PCC 7003, PCC 7117, PCC 8807, PCC 11901 and PCC 73109) show a range of 99.53-100% identity; the isDDH values for these genomes are 77.6-93.7% (GGDC formula 3), with ANI estimates ranging from 96.10 to 99.91%, CGS from 99.49 to 99.71% and Mash similarities from 97.28 to 97.60%. These strains are therefore confirmed as con-specific by all estimates. The two strains of species 5.1.7.9C, Synechococcus sp. NIES-970 and NKBG15041c, show 100% rRNA sequence identity, 98.51% ANI and 87.6% isDDH, and are therefore again con-specific.

Two strains of Spirulina, PCC 6313 and PCC 9445, carrying different specific epithets (major and subsalsa) are clearly separate genera, sharing only 92.4% 16S rRNA sequence identity, 69.26% ANI, 87.99% CGS and an isDDH value of 20.3%. Although isDDH values are not discriminatory in this range, the ANI and CGS values confirm the generic separation of these strains. They are shown in the 16S rRNA tree in genera 5.1.7.19 and 5.1.7.16, respectively.

As described on the main page, members of the genus Arthrospira (5.2.1.3) have been renamed to Limnospira (with L. fusiformis as the type species) since some uncultured organisms named as the type species of Arthrospira, A. jenneri, fall elsewhere in the tree (genus 5.3.3.1). We do not accept this change, based only on environmental samples. Of the many cyanobacterial strains assigned to the genus, only 24 16S rRNA sequences, plus two extracted from the complete genome of L. fusiformis strain SAG 85.79 and the draft genome of L. fusiformis strain KN, are shown in the 16S rRNA trees. A second 16S rRNA sequence extracted from the metagenome Limnospira sp. RM-2019 contains an unlikely number of mismatches in paired regions and is not shown. The metagenome contains 2.98% ambiguous positions and is not further discussed below. The strains of genus 5.2.1.3 are all closely related, sharing 99.13-100% 16S rRNA sequence identity, but appear to divide into two groups. The 18 strains of the first group exhibit 99.85-100% 16S rRNA sequence identity; the second contains 7 strains that internally show 99.71-100% 16S rRNA sequence identity. The two groups are related by 99.42-100% 16S rRNA sequence identity, suggesting that all members of Arthrospira can be assigned to a single species. This conclusion is supported by our genome trees based on 79 conserved marker genes, and by isDDH analysis of the available genomes. The 4 genomes of group 2 (strains NIES-39, Paraca, YZ and NIES-46), all draft, show 91.6-97.4% isDDH and their extracted 16S rRNA genes are identical, despite the diverse geographical origins of the strains (Africa, China, Mexico, Switzerland). The 9 available genomic sequences of group 1 (strains PCC 8005 [2 sequences], PCC 9438, TJSD092, O9.13F, CS-328, TJSD091 and "L. fusiformis" strains KN and SAG 85.79) (only 3 complete), again from diverse origins, show 89.8-99.2% isDDH; their extracted 16S rRNA sequences exhibit 99.80-100% identity. 
The genome sequence of strain YZ (CP013008) in group 2 shares 51.9% isDDH with those of strains PCC 8005 (FO818640), TJSD092 (CP028914) and L. fusiformis SAG 85.79 (CP051185) in group 1. ANI values of 93.37-99.10%, Mash similarities of 95.58-99.52% and CGS values of 98.56-98.59% confirm that all Arthrospira strains (carrying 6 specific epithets) can be assigned to a single species, for which we suggest the name A. platensis. Similar grouping is shown by Xu et al. (2016) using a more restricted dataset, and these authors showed "unprecedented extensive chromosomal rearrangements" among the genomes. This is evidenced by our own analysis, showing the positions of the 79 conserved marker genes on 2 genomes representing the 2 groups (see figure, below). Note that strain O9.13F has been excluded from DDH calculation because the genome is only 84.1% complete as estimated by CheckM and contains only 72 of our 79 conserved marker genes. A. maxima strain CS-328, 96.6% complete and containing only 73 of the 79 marker genes, has also been excluded.

Within genus 5.2.1.1, genomic sequences of Planktothrix spp. are available only for 4 of the 10 species in the tree, some described in detail by Gaget et al. (2015), and only two (of strains NIVA-CYA 126/8 and PCC 7805) are complete. The genomes of the 9 strains assigned to P. agardhii, P. rubescens and P. prolifica in species 5.2.1.1A can be distinguished neither by 16S rRNA sequence identity (99.25-100%) nor by isDDH, which gives values of 73.8-89.0% (GGDC formula 2); this cluster therefore corresponds to a single species, which we propose as P. agardhii. The cohesion of species 5.2.1.1A is further confirmed by isDDH of the 2 complete genomes (91.8%, GGDC formula 3), an ANI value of 98.67%, Mash similarity 99.11% and CGS 99.85%. The 2 members of species 5.2.1.1B (P. paucivesiculata) whose genome sequences are available share 99.52% 16S rRNA sequence identity. However, they show a relatively low (43.1%) isDDH value and only 90.41% ANI; there is therefore a conflict in the distance estimates obtained with these draft genomes by the different methods. The two strains differ from the first species by 16S rRNA sequence identities of 97.91% and 98.12%, isDDH values of 44.5% and 89.95-90.94% ANI. Within this species, the inclusion of further strains gives internal values of 98.74-99.51% 16S rRNA sequence identity. The genomic sequence of strain P. tepida PCC 9214 is sufficiently distinct from all others (isDDH values 27.4-32.6%) to support the assignment of this strain to a separate species (5.2.1.1H). Species 5.2.1.1I (P. serta) contains 2 strains, and a single genome sequence is available (strain PCC 8927). This shows only 29.1-32.6% isDDH with members of species 5.2.1.1A, 5.2.1.1B and 5.2.1.1H, with which the strain shares 96.93-97.42% 16S rRNA sequence identity. Although genome sequences are not available for the remaining 6 species, separated on the basis of 16S rRNA sequence identity, we briefly describe them here for completeness. Species 5.2.1.1C contains only strains assigned to P. spiroides; isolates named as P. pseudagardhii fall into two clusters separated at the specific level (5.2.1.1D and 5.2.1.1E), as do those assigned to P. mougeotii (5.2.1.1F and 5.2.1.1G); species 5.2.1.1J contains two sequences of a single strain of P. iranica.

The 2 available genome sequences in species 5.2.1.5B, Lyngbya aestuarii BL J and Lyngbya sp. PCC 8106, appear to be con-specific, sharing 99.79% 16S rRNA sequence identity, 91.69% ANI and 46.8% isDDH.

Within genus 5.2.4.9, the 23 metagenomes are identical, with 100% 16S rRNA sequence identity, 93.1-100% isDDH, and separated at the generic level from the sister taxon QS_8_64_29 (90.26% 16S rRNA sequence identity, 19.3% isDDH, 67.57% ANI). These genomes are not shown in our genomic tree.

Geitlerinema strains FC II and PCC 7105 (species 5.2.4.7B) are identical on the basis of their 16S rRNA sequence identity (100%); their con-specificity is confirmed by values of 74.3% isDDH and 96.57% ANI of their draft genomes; the sequence of strain PCC 7105 contains only 75 of the 79 marker genes and has been excluded from the genome tree.

Oscillatoria nigro-viridis PCC 7112 and Microcoleus vaginatus strains FGP-2 and PCC 9802 in species 5.3.1.1A share a minimum of 99.87% 16S rRNA sequence identity, their genomes showing 49.4-96.8% isDDH and 91.53-99.97% ANI. The two M. vaginatus strains within this species show 96.8% isDDH and 99.97% ANI, their 16S rRNA sequences being identical. The genome metrics therefore confirm the con-specificity of these strains. M. vaginatus strain PCC 9802 shows only 91.36% 16S rRNA sequence identity with Microcoleus sp. PCC 7113 in genus 3.1.10; generic separation is confirmed by values of 24.4% isDDH (not discriminatory at this low level), 68.63% ANI and 87.83% CGS. Strain PCC 7113 (species 3.1.10A) shares 98.72-98.79% 16S rRNA sequence identity with strains FACHB-1 and FACHB-53 in species 3.1.10B; these are borderline values for separation at the specific level, which is confirmed by 41.8% isDDH and 89.19-89.25% ANI. That strains FACHB-1 and FACHB-53 are con-specific is shown by values of 99.93% 16S rRNA sequence identity, 85.0% isDDH and 97.72% ANI. Microcoleus sp. PCC 7113 shows only 90.51% 16S rRNA sequence identity with Microcoleus sp. strain IPPAS B-353 in genus 5.2.4.8 (described in more detail with other halophiles, above); although isDDH (27.8%) may not be of great value within this range, generic separation is supported by values of 66.96% ANI and 85.81% CGS.

Eight metagenomic sequences each named as Oscillatoriales cyanobacterium in species 5.3.1.1B share 99.33-100% 16S rRNA sequence identity and 85.7-99.3% isDDH. Two further sequences, in species 5.3.1.1C, have identical 16S rRNA sequences and show 98.1% isDDH, 99.85% ANI; they separate from the members of the preceding species with 98.84-99.30% 16S rRNA sequence identity, ~27.5% isDDH and ~85% ANI. These sequences have been omitted from our genomic tree.

Species 5.3.1.1E contains two metagenomes named as Microcoleaceae bacterium UBA9251 and UBA11344; these show 64.0% isDDH and 94.41% ANI, the extracted 16S rRNA genes sharing 99.4% sequence identity. The data therefore confirm the con-specificity of the uncultured organisms. However, the 16S rRNA sequence extracted from the UBA11344 genome is short (1115 nt), revealing the fragmented nature of this genome.

Within species 5.3.2.1A, the draft genomes of Kamptonema spp. strains PCC 6407 and PCC 6506 show 96.1% isDDH and 99.95% ANI, their 16S rRNA sequences being identical. Strain PCC 6407 shows only 27.2% isDDH (80.34% ANI) and 97.91% 16S rRNA sequence identity with the unidentified Oscillatoriales strain USR001, a member of the second species of this genus. The latter has been excluded from the genome tree, containing only 31 of our 76 conserved markers; the isDDH and ANI values are evidently approximate, since the genome is incomplete.

Two genomes from Synechococcus sp. strains PCC 7502 (complete genome) and PCC 9635 (draft genome) in the mono-specific genus 5.4.1.4, falling among the "Pseudanabaena" genera, share 99.73% 16S rRNA sequence identity but only 24.1% isDDH and 79.14% ANI. However, the genome metrics may be artificially low, since the genome of strain PCC 9635 contains extensive duplications, as detected by CheckM, additionally evidenced by the larger genome size (4.81 Mbp versus 3.51 Mbp for strain PCC 7502).

Two Limnothrix sp. sequences of species 5.4.2.1A (isolate PR1529 and the metagenome P13C2) are identical in 16S rRNA sequence, and show 99.9% isDDH and 99.96% ANI. They share 99.79% 16S rRNA sequence identity, 54.2% isDDH, 93.83% ANI and 99.45% CGS with version 2 of the Limnothrix metagenome CACIAM 69d (MKGP00000000.2); the genome metrics therefore confirm their con-specificity.

Although many genomic sequences of Leptolyngbya isolates and generic clusters thereof which have recently been renamed (e.g. Nodosilinea) are available, they fall as single members of separate genera and species; comparisons are therefore pointless, with the few following exceptions. The two strains lying in species 6.1.1B, the unidentified filamentous strains CCP1 and CCT1, share 99.19% 16S rRNA sequence identity and only 22.0% isDDH, 68.07% ANI. Strain CCP1 was excluded from the genome tree since the genome sequence is incomplete, carrying only 58 of the 79 marker genes; this may explain the low isDDH and ANI values. The two sequences in species 6.1.1M, Leptolyngbya antarctica ULC041 and Leptolyngbya sp. ULC073, are both metagenomic assemblages. They are identical in 16S rRNA sequence and show an isDDH value of 86.1% and 98.08% ANI, confirming their con-specificity. Halomicronema excentricum strain Lakshadweep and Lyngbya confervoides BDU141951 (species 6.1.5A) are similarly identical in 16S rRNA sequence and share 98.7% isDDH, 99.98% ANI. Unfortunately, the sequence of the former genome is chimeric and version 1 of the latter genome is 23.1% contaminated; both have been excluded from the genome tree. Version 2 of the Lyngbya confervoides BDU141951 genome is identical to version 1 in 16S rRNA sequence, has little contamination, and is included in the genome tree. In species 6.1.6A, two strains, Leptolyngbyaceae cyanobacterium CCMR0081 and CCMR0082, are identical in 16S rRNA sequence and show 87.8% isDDH, 98.27% ANI; they share 99.33% 16S rRNA sequence identity, 63.2% isDDH and 94.30-94.37% ANI with Leptolyngbya sp. PCC 7375 in the same species. We have shown that the genomes of the two CCMR strains are chimeric; they are therefore excluded from the genome trees. The metagenomic Phormidesmis priestleyi Ana ITZX contains no rrn operons and cannot be included in the 16S rRNA tree. The 16S rRNA gene extracted from the genomic sequence of
Leptolyngbya sp. strain 2LT21S03, occupying a single branch within family 6.2, does not match the normal 16S rRNA sequence derived by PCR amplification (FM177494, family 6.3), sharing only 87.5% 16S rRNA sequence identity; both are included in the 16S rRNA tree. The genome sequence of strain 2LT21S03 is too incomplete to be included in the genome tree. Strains Leptolyngbya sp. O-77 and Thermoleptolyngbya sp. PKUAC-SCTA183 (species 6.2.8A), both from hot or thermal springs, sharing 99.53% 16S rRNA sequence identity, are similar in genome size (5.48 and 5.52 Mbp) and GC content (55.9, 56.4 mol%); their complete genomes show 66.7% isDDH, 89.65% ANI, 92.75% Mash similarity and 98.99% CGS, confirming their con-specificity. Three strains of "Candidatus curcubocaldaceae cyanobacterium", Thermoleptolyngbya albertanoea ETS-08 and two strains identified only as "thermophilic cyanobacterium", not represented by genome sequences, also fall within species 6.2.8A; all members of this species were isolated from thermal environments. Within the mono-specific genus 6.3.1 (Leptolyngbya sensu stricto), the 5 strains of L. boryana (3 having completed genome sequences) are identical (100% 16S rRNA sequence identity) and show high values of 97.4-100.0% isDDH, 99.14-100% ANI. Although the sizes of the complete genomes are virtually identical (6.18-6.26 Mbp), the draft genomes are larger (7.19 and 7.26 Mbp). The lowest isDDH and ANI values are found with pairwise comparisons involving the two draft genomes, illustrating the inaccurate genome metrics found with such sequences. Isolates within this cluster are unfortunately named as 5 different genera and 7 species. Strain NIES-2135 carries a 16S rRNA gene on a plasmid, in addition to the chromosomal copy; both are identical. See Section B2 (above) for strains whose placement is ambiguous. Five metagenomes, not identified to the generic level, have been excluded from the genome tree.

The genus Phormidesmis is problematic, since strains bearing this name are found in genera of two families: 6.1.9, 6.1.10, 6.1.11, 6.3.18, 6.3.20 and 6.3.27, with 3 additional loner genera in family 6.3. In genus 6.3.18, the only cluster that contains strains represented by genomic sequences, two strains named as Phormidesmis (BC1401 and ULC007, the latter represented by 3 identical sequences) are assigned to different species (6.3.18A and 6.3.18B, respectively, showing only 97.31% 16S rRNA sequence identity) in the 16S rRNA tree. Values of 26.3% isDDH (not discriminatory in this range), 81.90% ANI and 97.50% CGS confirm specific separation. They show even lower values (93.5-94.1% 16S rRNA sequence identity, 20.0-21.9% isDDH, 68.34-68.48% ANI and 88.99-89.34% CGS) with the metagenomic sequence Alkalinema sp. CACIAM 70d, which therefore lies in a distinct genus (6.3.22). Interestingly, the latter was found in a freshwater lake, whereas the other sequences of Alkalinema in genus 6.3.22 were isolated from an alkaline environment. The genome sequence of Phormidesmis priestleyi strain Ana ITZX contains neither 16S rRNA nor 23S rRNA genes and cannot be shown in the 16S rRNA trees; this sequence is clearly also a member of a different genus to strains BC1401 and ULC007, showing only 19.6-20.1% isDDH, 66.79-66.85% ANI and 83.80-84.14% CGS. The metagenome CACIAM 70d and strain Ana ITZX, showing 66.43% ANI and 62.65% CGS, are themselves assigned to different genera. Two genome sequences of Phormidesmis priestleyi strain ULC027 are incomplete and lack both 16S rRNA and 23S rRNA genes; they are shown neither in the trees based on 16S rRNA nor in the genome tree.

The mono-specific genus 7.1.4 contains the metagenomic sequences Aphanocapsa feldmannii 277cI and 277cV. These are identical on the basis of 16S rRNA sequence identity and show 90.9% isDDH, 98.27% ANI. However, the genome sequence of 277cI is incomplete, containing only 70 of our 79 core marker genes; this is reflected by a large difference in apparent genome size (impossible to determine precisely with draft genomes), that of 277cV
being 2.55 Mbp versus only 1.92 Mbp for 277cI. The latter sequence has been excluded from the genome tree. Two uncultured organisms, for which genomic sequences are not available, also fall into this genus on the basis of 16S rRNA sequence identity. The 16S rRNA genes extracted from five metagenomes of Candidatus Synechococcus spongiarum and one named as Synechococcus sp. (mono-specific genus 7.1.5) share a minimum of 99.12% identity. The genomes vary in size (estimated as the combined lengths of their contigs) from 1.45 Mbp to 2.27 Mbp, resembling many of the Prochlorococcus and the Synechococcus OMF sister group described above. Of the genomes common to both 16S rRNA and genomic trees, we can compare only 3, all Candidatus Synechococcus spongiarum metagenomes: 15L, bin9 and 142. The genome 15L is identical in 16S rRNA sequence to bin9; the two genomes show 91.0% isDDH and 99.24% ANI, confirming their con-specificity. Although 15L is suggested by 16S rRNA identity of 99.33% to be con-specific with 142, the ANI value of 86.35% indicates that the organisms represented by these metagenomes should be assigned to different species. The value of 32.2% obtained with isDDH formula 2 is ambiguous in this region. Although both metagenomes lack 2 of our 79 core marker genes, the missing genes are different: 15L lacks gmk and rimM, whereas 142 lacks purM and smpB. This indicates that the two genomes lack different segments, which will decrease the ANI value, in contrast to the Candidatus Atelocyanobacterium thalassa metagenomes of genus 5.1.4.2, which both lack the same pair of marker genes (purM and rbcS). In showing 94.60% 16S rRNA sequence identity, 19.3% isDDH and 69.70% ANI with 15L, strain A. feldmannii 277cV is confirmed as a member of a distinct genus. As described in the discussion of the Synechococcus OMF clade above, and shown in the figure, Synechococcus spongiarum was suggested to be re-named to Synechospongium by Salazar et al. (2020).
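The comparison of missing marker genes made above can be expressed as simple set operations. The following is a minimal sketch, not the procedure actually used for this study; the function names and the labels "thalassa_1" and "thalassa_2" are hypothetical placeholders (the two Candidatus Atelocyanobacterium thalassa metagenomes are not named in this paragraph), while the gene names and the marker count of 79 are taken from the text.

```python
# Size of the core marker set, as stated in the text.
CORE_MARKER_COUNT = 79

# Marker genes reported as missing from each metagenome.
# "thalassa_1" and "thalassa_2" are placeholder labels (assumption).
missing_markers = {
    "15L": {"gmk", "rimM"},
    "142": {"purM", "smpB"},
    "thalassa_1": {"purM", "rbcS"},
    "thalassa_2": {"purM", "rbcS"},
}


def markers_present(name):
    """Number of core markers found in the named genome."""
    return CORE_MARKER_COUNT - len(missing_markers[name])


def compare_missing(a, b):
    """Split the missing markers of two genomes into shared and unique sets."""
    set_a, set_b = missing_markers[a], missing_markers[b]
    return {
        "shared": set_a & set_b,
        "only_" + a: set_a - set_b,
        "only_" + b: set_b - set_a,
    }


if __name__ == "__main__":
    print(markers_present("15L"), markers_present("142"))
    print(compare_missing("15L", "142"))
    print(compare_missing("thalassa_1", "thalassa_2"))
```

An empty "shared" set, as for 15L and 142, is the situation in which the two assemblies lack different segments; a fully shared set, as for the two placeholder entries, is the contrasting case.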

In the mono-specific genus 7.2.1, Synechococcus elongatus strains PCC 6301 and UTEX 3055, together with Synechococcus sp. strains PCC 7942 and UTEX 2973 (all complete genome sequences) and FACHB-1061 (draft sequence) are identical, showing 100% 16S rRNA sequence identity, 96-100% isDDH, 98.31-99.99% ANI, 98.74-99.93% Mash similarity (for the complete genomes only) and 99.72-99.85% CGS. Strain UTEX 2973 is a mutant of strain UTEX 625, of which the axenic isolate is strain PCC 6301; the identity values for these strains are therefore not surprising. Strain UTEX 3055 was isolated from the same habitat as strain PCC 6301 (Waller's Creek, Texas, USA), but over 50 years later. We have been unable to confirm the origin of strain PCC 7942. Strain PCC 11802 (draft genome sequence) is not shown in the 16S rRNA tree since the genome lacks 16S rRNA genes; values of 25.5% isDDH, 82.20% ANI and 98.36% CGS place this strain and PCC 6301 into different species. The draft genome of strain PCC 11801 shows only 52.4% isDDH, 98.40% CGS and 82.19% ANI with strain PCC 6301; the genome of strain PCC 11801 contains only a short (618 nt) 16S rRNA fragment that shares 99.35% identity with that of strain PCC 6301 over the comparable region. Since the strain is similar in genome size to the other members of this cluster (2.69 Mbp to 2.77 Mbp), the low genome metric values must indicate that it has both duplicated and missing regions. Strain FACHB-242 is UTEX 625, from which the axenic strain PCC 6301 was derived; the draft genome has been excluded from the genome tree and genome metric calculations. Prochlorothrix hollandica strains CALU 1027 and PCC 9006, which are identical isolates held in different culture collections, in the single species of genus 7.3.1, share 100% 16S rRNA sequence identity. Since the genome sequence of strain CALU 1027 is incomplete, a DDH value was not computed and the sequence was omitted from the genome tree.
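Identity "over the comparable region", as quoted above for the 618 nt fragment of strain PCC 11801, can be illustrated as follows. This is a minimal sketch under stated assumptions: the two sequences are taken to be already aligned (equal length, with "-" for gaps), columns in which either sequence has a gap are ignored, and the function name and toy sequences are hypothetical. It is not necessarily the calculation used for the values reported here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both aligned sequences have a base.

    Returns (identity_percent, compared_columns). Assumes a pre-computed
    pairwise alignment, so both strings must have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # position absent from the fragment: not comparable
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared, compared


if __name__ == "__main__":
    # Toy alignment: a short fragment against a longer sequence.
    identity, columns = percent_identity("ACGTACGT--", "ACGTTCGTAA")
    print(identity, columns)
    # Arithmetic only: 4 mismatches in 618 compared positions.
    print(round(100.0 * 614 / 618, 2))
```

As a purely arithmetical illustration, four mismatches across 618 compared positions would correspond to 99.35% identity; whether the comparable region spans the whole fragment is not stated in the text.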

Unlike the situation in Thermosynechococcus described in Section B2 above, a second cluster of thermophilic Synechococcus spp. shows a speciation pattern that is coherent with all methods of homology measurement. The separation of Synechococcus spp. JA-3-3Ab (species 10.1.1A) and JA-2-3B’a (species 10.1.1B) into two species is justified by their low 16S rRNA sequence identity (96.65%) and by the isDDH (44.9%), ANI (84.61%), Mash similarity (89.44%) and CGS (94.16%) values of their complete genome sequences. Strain JA-3-3Ab shares 99.86-100% 16S rRNA sequence identity and isDDH values of 82.9-93.4% with the draft genomes of 6 other thermophilic members of the same species. The distance estimates from the different methods therefore agree. This is also true for the Thermoleptolyngbya spp. described above.

Within species 8.2.3A, Acaryochloris spp. CCMEE 5410 (draft genome) and MBIC11017 (complete genome) share 99.33% 16S rRNA sequence identity, 53.2% isDDH, 92.83% ANI and 99.38% CGS. Acaryochloris thomasi RCC1774 (draft), despite the generic name, has a unique pigment composition (lacking Chl d, a pigment characteristic of the genus Acaryochloris) and is clearly a member of a distinct genus, showing only 94.53-94.66% 16S rRNA sequence identity and around 20.8% isDDH, 68% ANI with the 2 members of genus 8.2.3. Note that three metagenomes (CRU 2 0, RU 4 1 and SU 5 25) are incomplete (containing respectively 74, 73 and 37 of our 79 core markers), and have been excluded from the genome tree and genome metric calculations.

Two metagenomes, Candidatus Aurora vandensis MP9P1 and Candidatus A. vandensis LV9, were described by Grettenberger et al. (2020) from benthic mats in Lake Vanda, the McMurdo Dry Valleys, Antarctica, and found to be closely related to Gloeobacter spp. The genomes are smaller in size (3.07, 2.96 Mbp) and have a lower mol% G+C content (55.3, 55.4) than the Gloeobacter strains. The genome of organism MP9P1 contains only 76 of our 79 core marker genes, and has been excluded from the genome tree. The single 16S rRNA gene places the organism into genus 11.2.1, grouping loosely with the Gloeobacter strains (genus 11.1.1), with which it shares only 91.46-91.89% 16S rRNA sequence identity. The genomic sequence of organism LV9 does not contain a 16S rRNA gene but does contain 78 of the marker genes and is included in our genome tree. The two Candidatus Aurora vandensis genomes together show 99.90% ANI and 99.9% isDDH, but only 65.75-65.90% ANI and 21.8-24.4% isDDH with the Gloeobacter strains, confirming their assignment to a distinct genus.

C: Conclusions

There is good agreement between the 16S rRNA sequence identity and genome metrics obtained for many of the strain clusters. Discrepancies mostly involve draft genomes; the possible incomplete nature or contamination of these cannot be excluded.

● Using only the 16S rRNA gene as marker, we can show the incomplete status of many draft genomes, lacking this gene, including those of Cyanobium sp. CACIAM 14 (JMRP00000000), Prochlorococcus marinus SCGC AAA795-J16 (CVSX00000000), Prochloron didemni P2-Fiji (from JGI), Hydrococcus rivularis NIES-593
(MRCB00000000), Hydrocoleum sp. CS-953 (LGSU00000000), Nostoc calcicola FACHB-389 (MRBZ00000000), Calothrix elsteri CCALA 953 (NTFS00000000), Calothrix sp. NIES-2101 (MRCD00000000), Phormidium ambiguum NIES-2119 (MRCE00000000), Leptolyngbya valderiana BDU 20041 (LSYZ00000000), Leptolyngbya sp. JSC-1 (JMKF00000000), Trichodesmium thiebautii H9-4 (LAMW00000000), Anabaena sp. 39858 (LJOR00000000), Anabaena sp. CRK533 (LJOT00000000), Anabaena sp. AL93 (LJOU00000000), Cylindrospermopsis sp. CR12 (LMVE00000000), Pseudanabaena sp. SR411 (NDHW00000000), Microcystis spp. strains LE013-01 (MTBU00000000), LSC13-02 (MTBT00000000), LE3 (MTBS00000000), Cyanobacteria bacterium UBA791 (DBIX00000000), all of which completely lack the gene. Additionally, many strains contain only a short fragment of the gene, including: Crocosphaera watsonii WH 0003 (AESD00000000), Gloeocapsa sp. PCC 73106 (ALVY00000000), Synechococcus sp. CB0101 (ADXL00000000), Synechocystis sp. PCC 7509 (ALVU00000000), Leptolyngbya sp. Heron Island (JAWNH00000000) and the Candidatus Synechococcus spongiarum isolates mentioned above.

● Contamination is evident when at least one copy of the 16S or 23S rRNA gene is that of a bacterium or of an unrelated cyanobacterium, such as: Aphanocapsa montana BDHKU210001 (JTJD00000000), Hydrococcus rivularis NIES-593 (MRCB00000000), Lyngbya confervoides BDU141951 (JTHE00000000), Phormidium tenue NIES-30 (MRCG00000000), Prochlorothrix hollandica CALU 1027 (AJTX02000000), Anabaena sp. MDT14b (LJOV00000000), Chrysosporum ovalisporum UAM-MAO (CDHJ00000000), Scytonema tolypothrichoides VB-61278 (JXCA00000000), Tolypothrix campylonemoides VB511288 (JXCB00000000), Mastigocladus laminosus UU774 (JXIJ00000000).

● The above examples use only the rrn operon to demonstrate incomplete status. The CheckM programme finds additional incomplete genomes, not listed above, for example: Synechococcus sp. GFB01 (LFEK00000000) 84.81% complete, Aphanizomenon flos-aquae 2012/KM1/D3 (JSDP00000000) 87.52%, Calothrix sp. XPORK 5E (from JGI) 81.69%, unidentified endosymbiont of Epithemia turgida (AP012549) 84.86%, Prochlorococcus sp. HOT208 60m 813E23 (MWOT00000000) 25.86%, unidentified cyanobacterium 13 1 40CM 2 61 4 (MNHI00000000) 13.79%. We note that the genome of the Epithemia turgida symbiont was deposited in NCBI as complete.

● CheckM also finds contamination in many genomes; where this exceeds 10%, the sequences have been excluded from the genome tree. These include: Fischerella thermalis CCMEE 5319 (NMQD00000000), Leptolyngbya valderiana BDU 20041 (LSYZ00000000), Limnoraphis robusta CS-951 (LATL00000000), Microcystis aeruginosa strain SPC777 (ASZQ00000000), Mastigocoleus testarum BC008 (AXAQ00000000). Other genomes are heavily contaminated with bacterial DNA and fall into the bacterial outgroup of genome trees, for example: Aphanocapsa montana BDHKU210001 (JTJD00000000, version 1), Chrysosporum ovalisporum UAM-MAO (CDHJ00000000), Cyanobacterium TDX16 (NDGV00000000), Fischerella ambigua UTEX 1903 (obtained from author), Oscillatoriales cyanobacterium MTP1 (LNAA00000000), Scytonema millei VB511283 (1 of 2 genomic sequences: JTJC00000000.1). A full listing is given in the excluded-genomes document.
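The screening described in the points above can be sketched as follows. This is an illustrative outline only, not the pipeline used for this study: the function name is hypothetical, the 10% contamination cut-off and the completeness percentages are those quoted in the text, and the values are entered by hand rather than parsed from CheckM output.

```python
# Contamination cut-off (percent) above which sequences were excluded
# from the genome tree, as stated in the text.
CONTAMINATION_CUTOFF = 10.0


def is_excluded_for_contamination(contamination_pct,
                                  cutoff=CONTAMINATION_CUTOFF):
    """True when contamination exceeds the cut-off (strictly greater)."""
    return contamination_pct > cutoff


# CheckM completeness estimates (percent) quoted in the text.
completeness = {
    "Synechococcus sp. GFB01": 84.81,
    "Aphanizomenon flos-aquae 2012/KM1/D3": 87.52,
    "Calothrix sp. XPORK 5E": 81.69,
    "endosymbiont of Epithemia turgida": 84.86,
    "Prochlorococcus sp. HOT208 60m 813E23": 25.86,
    "unidentified cyanobacterium 13 1 40CM 2 61 4": 13.79,
}

# Rank genomes from least to most complete.
ranked = sorted(completeness.items(), key=lambda item: item[1])

if __name__ == "__main__":
    for name, pct in ranked:
        print(f"{pct:6.2f}%  {name}")
    # Version 1 of the Lyngbya confervoides BDU141951 genome is reported
    # above as 23.1% contaminated.
    print(is_excluded_for_contamination(23.1))
```

The strict comparison reflects the wording "where this exceeds 10%"; a genome at exactly 10% would not be excluded under this reading.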

There are almost certainly other genomes in our dataset that we are unable to detect as incomplete, and some may contain undetected contaminant DNA sequences. Other possible causes of conflict include marked differences in genome size or mol% G+C content, and major or multiple deletions. All of these parameters will reduce the genome metrics values. Where relatedness estimates based on 16S rRNA sequence identity and genome metrics diverge dramatically, one or both of the genomes compared is often in the draft stage.
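Genome size (estimated in this document as the combined lengths of the contigs) and mol% G+C content can be computed directly from a FASTA file. The sketch below assumes plain nucleotide FASTA text; the function names and the toy sequences are hypothetical, and ambiguous bases such as N are counted towards the size but ignored when calculating G+C, which is one possible convention among several.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {contig_name: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {n: "".join(parts) for n, parts in chunks.items()}


def genome_size_and_gc(contigs):
    """Return (total length in nt, mol% G+C over unambiguous bases)."""
    total = sum(len(seq) for seq in contigs.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    acgt = sum(seq.count(base) for seq in contigs.values() for base in "ACGT")
    gc_pct = 100.0 * gc / acgt if acgt else 0.0
    return total, gc_pct


if __name__ == "__main__":
    toy = ">contig1 toy example\nGGCC\nAATT\n>contig2\nGCGCATAT\n"
    size, gc = genome_size_and_gc(parse_fasta(toy))
    print(size, gc)
```

For a draft genome the size obtained in this way remains an estimate, since duplicated or missing regions in the assembly inflate or reduce the total.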

Leptolyngbya sp. PCC 7376 is a special case worthy of further investigation; this may be the first clear example of HGT involving the acquisition, by a filamentous organism, of the 16S rRNA gene from a unicellular organism (see discussion of species 5.1.7.9, above). The alternative possibility, that an ancestor was unicellular, is equally plausible.

In addition, the con-specificity of Synechocystis sp. strains PCC 6714 and PCC 6803 (99.60% 16S rRNA sequence identity) is not confirmed by genome metrics of their complete genome sequences; although an HGT event involving the ancestors of both groups may have occurred, further studies are required to confirm this possibility. Also, the planktonic cyanobacteria of the genus Prochlorococcus, the Synechococcus OMF sister group of Prochlorococcus and the halophiles show major conflicts between genetic distances obtained with measurement of 16S rRNA sequence identity and genome metrics. These are explained in the relevant paragraphs on this page. The two isolates of Gloeobacter are also unusually divergent; we have no explanation for this except for the possible contamination of the genome of strain JS1, mentioned above.

We should bear in mind the statement of Hayashi Sant’Anna et al. (2019) [where the term "dDDH" means isDDH]: "The species circumscription threshold of many genomic metrics such as ANI and dDDH was defined using the wet-lab DNA-DNA hybridization criterion as reference, which in turn was defined by observing phenotypic coherence of strains. Paradoxically, along the years studies exploring bacterial diversity showed that genomic cohesion is not always accompanied by phenotypic homogeneity. Therefore, it is clear that genomic metrics are not a panacea, and they could not be appropriate in some cases, particularly when circumscribing a genomospecies (species defined based on genome data), whose members have undergone ecological diversification and substantial genome modifications (e.g. reduction), forming subgroups that could be interpreted as distinct species".

D: References

Achaz, G., Boyer, F., Rocha, E.P.C., Viari, A. and Coissac, E. (2006). Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 23: 119-121.

Chisholm, S.W., Frankel, S.L., Goericke, R., Olson, R.J., Palenik, B., Waterbury, J.B., West-Johnsrud, L. and Zettler, E.R. (1992). Prochlorococcus marinus nov. gen. nov. sp.: An oxyphototrophic marine prokaryote containing divinyl chlorophyll a and b. Arch Microbiol 157: 297–300.

Coutinho, F., Tschoeke, D.A., Thompson, F. and Thompson, C. (2016a). Comparative genomics of Synechococcus and proposal of the new genus Parasynechococcus. PeerJ 4: e1522.

Coutinho, F., Dutilh, B., Thompson, C. and Thompson, F. (2016b). Proposal of fifteen new species of Parasynechococcus based on genomic, physiological and ecological features. Arch. Microbiol. 198: 973–986.

Driscoll, C.B., Meyer, K.A., Šulčius, S., Brown, N.M., Dick, G.J. and 8 other authors (2018). A closely-related clade of globally distributed bloom-forming cyanobacteria within the Nostocales. Harmful Algae 77: 93–107.

Gaget, V., Welker, M., Rippka, R. and Tandeau de Marsac, N. (2015). A polyphasic approach leading to the revision of the genus Planktothrix (Cyanobacteria) and its type species, P. agardhii, and proposal for integrating the emended valid Botanical taxa, as well as three new species, Planktothrix paucivesiculata sp. nov.ICNP, Planktothrix tepida sp. nov.ICNP, and Planktothrix serta sp. nov.ICNP, as genus and species names with nomenclatural standing under the ICNP. Systematic and Applied Microbiology 38: 141–58.

Gagunashvili, A.N. and Andrésson, Ó.S. (2018). Distinctive characters of Nostoc genomes in cyanolichens. BMC Genomics 19. Epub ahead of print. DOI: 10.1186/s12864-018-4743-5.

Giovannoni, S.J., Cameron Thrash, J. and Temperton, B. (2014). Implications of streamlining theory for microbial ecology. The ISME Journal 8: 1553–1565.

Grettenberger, C.L., Sumner, D.Y., Wall, K., Brown, C.T., Eisen, J.A. and 4 other authors (2020). A phylogenetically novel cyanobacterium most closely related to Gloeobacter. The ISME Journal. Epub ahead of print. DOI: 10.1038/s41396-020-0668-5.

Hayashi Sant’Anna, F., Bach, E., Porto, R.Z., Guella, F., Hayashi Sant’Anna, E. and Passaglia, L.M.P. (2019). Genomic metrics made easy: what to do and where to go in the new era of bacterial taxonomy. Critical Reviews in Microbiology 45: 182–200.

Humbert, J.-F., Barbe, V., Latifi, A., Gugger, M., Calteau, A., and 9 other authors (2013). A Tribute to Disorder in the Genome of the Bloom-Forming Freshwater Cyanobacterium Microcystis aeruginosa. PLoS ONE 8: e70747.

Jain, C., Rodriguez, R.L.M., Phillippy, A.M., Konstantinidis, K.T. and Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9: 5114.

Komárek, J. and Anagnostidis, K. (1989). Modern approach to the classification system of cyanophytes. 4 - Nostocales. Arch Hydrobiol Suppl. 82: 247–345.

Kopf, M., Klahn, S., Pade, N., Weingartner, C., Hagemann, M., Voss, B. and Hess, W.R. (2014a). Comparative genome analysis of the closely related Synechocystis strains PCC 6714 and PCC 6803. DNA Research 21: 255–266.

Kopf, M., Klahn, S., Voss, B., Stuber, K., Huettel, B., Reinhardt, R. and Hess, W.R. (2014b). Finished Genome Sequence of the Unicellular Cyanobacterium Synechocystis sp. Strain PCC 6714. Genome Announcements 2: e00757-14.

Lachance, M.A. (1981). Genetic relatedness of heterocystous Cyanobacteria by Deoxyribonucleic Acid-Deoxyribonucleic Acid reassociation. International Journal of Systematic Bacteriology 31: 139–147.

Laloui, W., Palinska, K.A., Rippka, R., Partensky, F., Tandeau de Marsac, N., Herdman, M. and Iteman, I. (2002). Genotyping of axenic and non-axenic isolates of the genus Prochlorococcus and the OMF-’Synechococcus’ clade by size, sequence analyses or RFLP of the Internal Transcribed Spacer of the ribosomal operon. Microbiology 148: 453–465.

Leao, T., Castelão, G., Korobeynikov, A., Monroe, E.A., Podell, S., Glukhov, E., Allen, E.E., Gerwick, W.H. and Gerwick, L. (2017). Comparative genomics uncovers the prolific and distinctive metabolic potential of the cyanobacterial genus Moorea. Proceedings of the National Academy of Sciences 114: 3198–3203.

Li, X., Huang, Y. and Whitman, W.B. (2015). The relationship of the whole genome sequence identity to DNA hybridization varies between genera of prokaryotes. Antonie van Leeuwenhoek 107: 241–249.

Mikhodyuk, O.S., Zavarzin, G.A. and Ivanovsky, R.N. (2008). Transport systems for carbonate in the extremely natronophilic cyanobacterium Euhalothece sp. Microbiology 77: 412–418.

Miscoe, L.H., Johansen, J.R., Kociolek, J.P., Pietrasiak, N., Sherwood, A.R. and Vaccarino, M.A. (2016). Diatom flora and cyanobacteria from caves on Kauai, Hawaii. Bibliotheca Phycologica 120: 3-152.

Nakamura, Y., Kaneko, T., Sato, S., Mimuro, M., Miyashita, H. and 14 other authors (2003). Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids. DNA Research 10: 137–145.

Nelson, J.M., Hauser, D.A., Gudiño, J.A., Guadalupe, Y.A., Meeks, J.C., Allen, N.S., Villarreal, J.C. and Li, F.-W. (2019). Complete Genomes of Symbiotic Cyanobacteria Clarify the Evolution of Vanadium-Nitrogenase. Genome Biol. Evol. 11: 1959–1964.

Ondov, B.D., Treangen, T.J., Melsted, P., Mallonee, A.B., Bergman, N.H., Koren, S. and Phillippy, A.M. (2016). Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology 17: 132.

Österholm, J., Popin, R.V., Fewer, D.P. and Sivonen, K. (2020). Phylogenomic Analysis of Secondary Metabolism in the Toxic Cyanobacterial Genera Anabaena, Dolichospermum and Aphanizomenon. Toxins 12: 248.

Ozer, E.A., Allen, J.P. and Hauser, A.R. (2014). Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt. BMC Genomics 15: 737.

Palmer, M., Steenkamp, E.T., Blom, J., Hedlund, B.P. and Venter, S.N. (2020). All ANIs are not created equal: implications for prokaryotic species boundaries and integration of ANIs into polyphasic taxonomy. International Journal of Systematic and Evolutionary Microbiology. Epub ahead of print. DOI: 10.1099/ijsem.0.004124.

Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. and Tyson, G.W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25: 1043–1055.

Rajaniemi, P., Hrouzek, P., Kastovska, K., Willame, R., Rantala, A. and 3 other authors (2005a). Phylogenetic and morphological evaluation of the genera Anabaena, Aphanizomenon, Trichormus and Nostoc (Nostocales, Cyanobacteria). International Journal of Systematic and Evolutionary Microbiology 55: 11–26.

Rajaniemi, P., Komárek, J., Willame, R., Hrouzek, P., Kaštovská, K., Hoffmann, L. and Sivonen, K. (2005b). Taxonomic consequences from the combined molecular and phenotype evaluation of selected Anabaena and Aphanizomenon strains. Algological Studies 117: 371–391.

Ran, L., Larsson, J., Vigil-Stenman, T., Nylander, J.A.A., Ininbergs, K. and 5 other authors (2010). Genome erosion in a nitrogen-fixing vertically transmitted endosymbiotic multicellular Cyanobacterium. PLoS ONE 5: e11486.

Richter, M. and Rosselló-Móra, R. (2009). Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences 106: 19126–19131.

Rippka, R., Deruelles, J., Waterbury, J.B., Herdman, M. and Stanier, R.Y. (1979). Generic assignments, strain histories and properties of pure cultures of Cyanobacteria. Journal of general Microbiology 111: 1–61.

Salazar, V.W., Thompson, C.C., Tschoeke, D.A., Swings, J., Mattoso, M. and Thompson, F.L. (2020). Insights on the taxonomy and ecogenomics of the Synechococcus collective. bioRxiv; submitted. Epub ahead of print. DOI: 10.1101/2020.03.20.999532.

Saw, J.H.W., Schatz, M., Brown, M.V., Kunkel, D.D., Foster, J.S. and 6 other authors (2013). Cultivation and Complete Genome Sequencing of Gloeobacter kilaueensis sp. nov., from a Lava Cave in Kilauea Caldera, Hawai’i. PLoS ONE 8: e76376.

Stackebrandt, E. and Goebel, B. M. (1994). Taxonomic note: A place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Bacteriol. 44: 846–849.

Stucken, K., John, U., Cembella, A., Murillo, A.A., Soto-Liebe, K. and 5 other authors (2010). The smallest known genomes of multicellular and toxic Cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications. PLoS ONE 5: e9235.

Thompson, C.C., Silva, G.G.Z., Vieira, N.M., Edwards, R., Vicente, A.C.P. and Thompson, F.L. (2013). Genomic Taxonomy of the Genus Prochlorococcus. Microbial Ecology 66: 752–762.

Tschoeke, D., Vidal, L., Campeão, M., Salazar, V.W., Swings, J., Thompson, F. and Thompson, C. (2020). Unlocking the genomic taxonomy of the Prochlorococcus collective. bioRxiv. Epub ahead of print. DOI: 10.1101/2020.03.09.980698.

Walter, J.M., Coutinho, F.H., Dutilh, B.E., Swings, J., Thompson, F.L. and Thompson, C.C. (2017). Ecogenomics and Taxonomy of Cyanobacteria Phylum. Frontiers in Microbiology 8: 2132.

Walter, J.M., Coutinho, F.H., Leomil, L., Hargreaves, P.I., Campeão, M.E. and 13 other authors (2020). Ecogenomics of the Marine Benthic Filamentous Cyanobacterium Adonisia. Microbial Ecology. Epub ahead of print. DOI: 10.1007/s00248-019-01480-x.

Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H. and 6 other authors (1987). Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. Int. J. Syst. Bacteriol. 37: 463–464.

Xu, T., Qin, S., Hu, Y., Song, Z., Ying, J., Li, P., Dong, W., Zhao, F., Yang, H., and Bao, Q. (2016). Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity. DNA Research 23: 325-338.

Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., Euzéby, J., Amann, R. and Rosselló-Móra, R. (2014). Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12: 635–645.

