antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline
Blin, Kai; Shaw, Simon; Steinke, Katharina; Villebro, Rasmus; Ziemert, Nadine; Lee, Sang Yup; Medema, Marnix H.; Weber, Tilmann
Published in: Nucleic Acids Research
Link to article, DOI: 10.1093/nar/gkz310
Publication date: 2019
Document Version: Publisher's PDF, also known as Version of record
Link back to DTU Orbit
Citation (APA): Blin, K., Shaw, S., Steinke, K., Villebro, R., Ziemert, N., Lee, S. Y., ... Weber, T. (2019). antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids Research, 47(W1), W81-W87. https://doi.org/10.1093/nar/gkz310
Published online 29 April 2019. Nucleic Acids Research, 2019, Vol. 47, Web Server issue, W81–W87. doi: 10.1093/nar/gkz310
antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline
Kai Blin1, Simon Shaw1, Katharina Steinke2, Rasmus Villebro1, Nadine Ziemert2, Sang Yup Lee1,3, Marnix H. Medema4 and Tilmann Weber1
1The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, bygning 220, 2800 Kgs. Lyngby, Denmark, 2German Centre for Infection Research (DZIF), Interfaculty Institute of Microbiology and Infection Medicine, Auf der Morgenstelle 28, University of Tübingen, 72076 Tübingen, DE, Germany, 3Department of Chemical and Biomolecular Engineering (BK21 Plus Program) and BioInformatics Research Center, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, South Korea and 4Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708PB Wageningen, the Netherlands
Received February 07, 2019; Revised April 02, 2019; Editorial Decision April 16, 2019; Accepted April 17, 2019
ABSTRACT
Secondary metabolites produced by bacteria and fungi are an important source of antimicrobials and other bioactive compounds. In recent years, genome mining has seen broad applications in identifying and characterizing new compounds as well as in metabolic engineering. Since 2011, the ‘antibiotics and secondary metabolite analysis shell’ (antiSMASH; https://antismash.secondarymetabolites.org) has assisted researchers in this, both as a web server and a standalone tool. It has established itself as the most widely used tool for identifying and analysing biosynthetic gene clusters (BGCs) in bacterial and fungal genome sequences. Here, we present an entirely redesigned and extended version 5 of antiSMASH. antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines. For type II polyketide synthase-encoding gene clusters, antiSMASH 5 now offers more detailed predictions. The HTML output visualization has been redesigned to improve the navigation and visual representation of annotations. We have again improved the runtime of analysis steps, making it possible to deliver comprehensive annotations for bacterial genomes within a few minutes. A new output file in the standard JavaScript object notation (JSON) format is aimed at downstream tools that process antiSMASH results programmatically.
INTRODUCTION
Bacterial and fungal natural products constitute a key source of scaffolds for the development of antimicrobials and other drugs (1) and mediate ecological interactions between organisms in various ways (2).
Mining genomic data for the presence of biosynthetic pathways that enable organisms to produce such molecules, which are also referred to as secondary or specialized metabolites, has become an essential approach that complements activity- and chemistry-guided isolation and identification approaches (3). Several computational tools, such as CLUSEAN (4) or PRISM (5), have been developed to support scientists with this task. The ‘antibiotics and secondary metabolites analysis shell’ antiSMASH is a pioneer amongst these tools. Initially released in 2011 (6), it has since been further extended and improved (7–12) and is currently used by thousands of academic and industrial scientists worldwide to identify so-called secondary metabolite ‘biosynthetic gene clusters’ (BGCs) in their genomes of interest. In 2017, a database component was added to the antiSMASH framework, which provides instant access to thousands of pre-computed antiSMASH genome mining results of publicly available genomes (13,14). Furthermore, several independent tools were developed that directly interact with and interpret results generated by antiSMASH and provide information that is outside the scope of a core antiSMASH analysis: the mass-spectrometry guided peptide mining tool Pep2Path (15), the ‘Antibiotic Resistance Target Seeker’ ARTS (16), the sgRNA design tool CRISPy-web (17), a reverse-tailoring tool to match finished NRPS/PKS structures to antiSMASH-predicted core structures (18) and the BGC clustering and classification platform BiG-SCAPE (19).
Here, we present version 5 of antiSMASH, which contains many improvements. In addition to many features visible to the end users, such as extended and improved BGC detection and analysis capabilities and a modernized and improved user interface (see below), antiSMASH version 5 was completely rewritten in Python version 3, and the code was restructured to increase performance, reliability and ease of maintenance. This has led to a significant speed increase of the pipeline. A complete list of antiSMASH 5 features is included in the antiSMASH documentation: https://docs.antismash.secondarymetabolites.org/antiSMASH5features.
To whom correspondence should be addressed. Tel: +45 24896132; Email: tiwe@biosustain.dtu.dk
Correspondence may also be addressed to Marnix H. Medema. Tel: +31 317484706; Email: marnix.medema@wur.nl
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
NEW FEATURES AND UPDATES
New gene cluster classes and refinement of cluster detection rules
The most widely used and recommended mode to detect BGCs in genomic data is via manually curated and validated gene cluster rules. These are based on identifying co-occurring conserved core enzymes in the genome using HMM profiles that were derived from Pfam (20), SMART (21), BAGEL (22) or Yadav et al. (23), or that were created specifically for antiSMASH. While antiSMASH version 4 supported the rule-based detection of 44 different biosynthetic types, antiSMASH 5 now includes rules for 52 different BGC types. In version 5, new rules were added to detect BGCs encoding the biosynthesis of N-acyl amino acids (24), β-lactones (25), polybrominated diphenyl ethers (26), C-nucleosides (27), pseudopyronines (28), fungal RiPPs (29–31) and RaS-RiPPs (32,33). Furthermore, a new ‘nrps-like’ rule was defined for NRPS fragments, i.e. atypical NRPSs that do not have the typical C-A-T module architecture. The previous ‘otherks’ rule was split into two rules to individually assign heterocyst glycolipid synthase-like clusters and other atypical PKSs. In addition, some rules were improved based on user case reports. The rules describing lanthipeptides and trans-AT type I PKS were refined to reduce the number of false positive hybrid calls on other cluster types. For trans-AT type I PKS and type II PKS, we increased the size of the cluster cutoffs to capture previously missed tailoring enzymes in published clusters. The rule for linear azole/azoline-containing peptides was made more generic to better cover the range of described clusters.
The rule describing microcin clusters was removed, as microcins are a class of RiPPs defined via their production in Enterobacteriaceae and are already captured by one of our other specific RiPP cluster rules, depending on their respective biosynthesis pathway (e.g. microcin J25-like RiPPs were previously covered by the old microcin cluster rules but chemically are lasso peptides, while microcin B17 is a linear azol(in)e-containing peptide).
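The idea of rule-based detection described above, i.e. requiring co-occurring core enzyme profiles within a cutoff distance, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Hit` and `Rule` classes, the profile names and the cutoff value are hypothetical and are not taken from the antiSMASH rule definitions, which use their own, more expressive rule syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """A profile HMM hit on a gene (hypothetical structure)."""
    profile: str
    start: int
    end: int


@dataclass(frozen=True)
class Rule:
    """Simplified detection rule: all required profiles must co-occur."""
    name: str
    required: FrozenSet[str]
    cutoff: int  # maximum gap (bp) allowed between neighbouring hits


def _satisfied(rule: Rule, group: List[Hit]) -> bool:
    # True if every required profile is present in the group
    return {h.profile for h in group} >= rule.required


def find_core(rule: Rule, hits: List[Hit]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first group of hits satisfying the rule."""
    relevant = sorted((h for h in hits if h.profile in rule.required),
                      key=lambda h: h.start)
    group: List[Hit] = []
    for hit in relevant:
        # Close the current group when the gap to it exceeds the cutoff
        if group and hit.start - max(h.end for h in group) > rule.cutoff:
            if _satisfied(rule, group):
                return group[0].start, max(h.end for h in group)
            group = []
        group.append(hit)
    if group and _satisfied(rule, group):
        return group[0].start, max(h.end for h in group)
    return None


# Hypothetical rule and hits, for illustration only
example_rule = Rule("example", frozenset({"KS_alpha", "KS_beta"}), cutoff=5000)
example_hits = [
    Hit("KS_alpha", 1000, 2200),
    Hit("KS_beta", 2300, 3500),
    Hit("KS_alpha", 50000, 51000),
]
print(find_core(example_rule, example_hits))
```

In this sketch the returned interval corresponds loosely to what the paper later calls a core; the genes need not be contiguous, only within the rule's cutoff distance of each other.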
Improved type II PKS prediction
Bacterial type II PKS BGCs code for the biosynthesis of aromatic polyketides, such as the antibiotic tetracycline or the anti-tumour drug doxorubicin. From the beginning, antiSMASH has had rules that were able to detect type II PKS BGCs by checking for the presence of the KSα and KSβ/CLF components of the minimal PKS. However, no detailed prediction methods had been added since antiSMASH's first version. In antiSMASH 5, we introduce a new PKS II analysis module (12), which uses a collection of manually curated HMMs to predict potential starter units, the number of elongation cycles (and thus a rough estimate of the putative molecular weight of the core compound), cyclization patterns and some conserved type II PKS-specific tailoring reactions. This module is automatically triggered whenever a type II PKS BGC is detected.
Annotation of resistance genes via Resfams
The Resfams database (34) is a curated database of protein families with confirmed antibiotic resistance function. antiSMASH 5 uses the profile Hidden Markov Models (pHMMs) from Resfams to annotate potential resistance genes found in predicted gene regions. Potential resistance gene hits are displayed in the ‘gene details’ panel along with other functional annotations.
GO-term annotations
The Gene Ontology (GO) is a controlled vocabulary for describing biological processes, molecular functions and cellular components in a consistent way, to enable comparison of these between different species. Amongst its wide range of uses, the GO has been used to predict gene clusters in eukaryotes and bacteria (35) and, in conjunction with antiSMASH, to refine cluster boundaries in antiSMASH output for Aspergillus species (36).
To facilitate these and other GO-based analyses, antiSMASH 5 includes an option to automatically annotate GO terms on Pfam domains. This functionality makes use of the fact that GO terms may be linked not only to specific gene products, but also to other means of classification in so-called ‘mappings’ (http://geneontology.org/page/download-mappings). As antiSMASH can automatically annotate Pfam domains, the GO annotation functionality makes use of the Pfam to GO mapping supplied by the Gene Ontology Consortium's website (37). If the ID of a predicted Pfam domain in an antiSMASH record is present in the Pfam to GO mapping, the respective GO terms are assigned and presented in the ‘gene details’ panel.
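The lookup described above can be illustrated with a small Python sketch. It assumes the usual plain-text layout of the Gene Ontology mapping files (comment lines starting with `!`, entries of the form `Pfam:<ID> <name> > GO:<term name> ; GO:<term ID>`); the function names are hypothetical, the sample lines are illustrative, and this is not the antiSMASH implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_pfam2go(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build a Pfam ID -> list of GO IDs mapping from mapping-file lines."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and '!' comment/header lines
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(" > ")
        if not sep or ";" not in right or ":" not in left:
            continue  # not in the assumed layout, ignore
        pfam_id = left.split()[0].split(":", 1)[1]
        go_id = right.rsplit(";", 1)[1].strip()
        mapping[pfam_id].append(go_id)
    return dict(mapping)


def go_terms_for(pfam_id: str, mapping: Dict[str, List[str]]) -> List[str]:
    """Return GO IDs for a Pfam ID, ignoring a version suffix like '.21'."""
    return mapping.get(pfam_id.split(".")[0], [])


# Illustrative sample lines in the assumed layout
SAMPLE = [
    "!version date: illustrative sample",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    "Pfam:PF00001 7tm_1 > GO:integral component of membrane ; GO:0016021",
]
pfam2go = parse_pfam2go(SAMPLE)
print(go_terms_for("PF00001.21", pfam2go))
```

Domains without an entry in the mapping simply receive no GO terms, which matches the conditional assignment described in the text.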
Link to the antiSMASH database
antiSMASH provides options to search for similar gene clusters in public datasets. As already implemented in previous versions of the software, the KnownClusterBlast functionality searches each identified region against the manually curated MIBiG (38) repository. The KnownClusterBlast and ClusterBlast search functions use an algorithm first described in antiSMASH 1 (6), which is also used in a generalized version in MultiGeneBlast (39). In the previous versions of antiSMASH, the ClusterBlast database was generated by scripts that used the antiSMASH BGC detection logic on sequences downloaded from the NCBI GenBank/RefSeq databases. As version 2 of the antiSMASH database now also contains BGCs of draft genomes (14), starting with antiSMASH 5 the ClusterBlast databases will be directly generated from the new antiSMASH database and complemented with individual BGC records that were submitted to NCBI outside of whole-genome submissions. This provides several advantages. The abundance of entries for selected genera/species in the public databases (and thus also in the previous ClusterBlast database) is strongly skewed towards clinically or industrially relevant organisms. There are, for example, more than 15 000 assemblies for Escherichia coli deposited at NCBI. For the antiSMASH database, a sequence-based dereplication workflow was established (14) that reduced the number of redundant entries with very high sequence similarity. Thus, the updated ClusterBlast database contains fewer entries than the previous release, despite the increase in publicly available sequence data. This decrease has resulted in reduced computation times while simultaneously providing more relevant hits. Furthermore, as the entries of the ClusterBlast database are directly related to the BGCs in the antiSMASH database, a link to the respective BGC is now included for all ClusterBlast hits, directing the user to the detailed report of the similar gene clusters.
New ‘region’ concept
In previous versions, antiSMASH referred to all co-located hybrid and independent BGCs with the single label ‘cluster’. In many cases, this led to confusing structure predictions when distinct BGCs were encoded side-by-side. For example, many Streptomyces plasmids exist for which all BGCs lie so close to each other that all were joined into a single large ‘cluster’. In order to better distinguish the different biological scenarios that lead to such co-located BGCs, antiSMASH 5 introduces some new terminology.
The definitions now used in antiSMASH 5 are:
Core: The minimum area containing one or more genes that code for enzymes for a single BGC type that are detected by the manually curated detection rules. These genes do not have to be contiguous, but can be within a certain cutoff distance, as defined by the detection rule for the BGC type in question.
Neighbourhood: Distance up- and downstream of the cluster core that is used to find tailoring genes/enzymes; the neighbourhood distances for the individual biosynthetic types were empirically determined and defined in the detection rules.
Protocluster: Contains core + neighbourhoods at both sides of the core; each protocluster will always have one single product type (for example, NRPS). Protoclusters may overlap partially or completely with other protoclusters. In the result webpage, protoclusters are displayed as boxes above the gene arrows. The cores are shown as solid colour boxes; the neighbourhoods are the half-transparent areas around the cores.
Candidate cluster: Contains one or more protoclusters; the candidate clusters are defined as described below. These definitions better allow modelling of hybrid clusters, such as PKS/NRPS hybrids, which combine two or more different biosynthetic classes (as identified in the detection rules), or cases where one class is used to biosynthesize a precursor for a second class. An example of the latter is found in glycopeptide biosynthesis, where one of the amino acids is synthesized by a type III PKS and then incorporated into the product by an NRPS. Candidate clusters may overlap partially or completely with other candidate clusters. In the result webpage, candidate clusters are shown as boxes above the protoclusters.
Region: Contains one or more candidate clusters. The regions in antiSMASH 5 correspond to the entities called ‘clusters’ in antiSMASH 1–4 and now constitute what is displayed on a page of the results webpage. Sometimes, a region will contain multiple mutually exclusive candidate clusters; in such cases, comparative genomic analysis and/or experimental work is required to assess which of these candidate clusters constitute actual BGCs. Regions will not overlap with each other. At least one of the contained candidate clusters will cover the full length of the region.
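The relationship between cores, neighbourhoods, protoclusters and regions can be sketched as a small data structure. The following Python example is a simplified illustration only: the `Protocluster` class, the coordinates and the neighbourhood sizes are hypothetical, regions are formed here directly by merging overlapping protocluster extents (skipping the intermediate candidate cluster layer), and clamping to the sequence end is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Protocluster:
    """Core plus a neighbourhood on each side (hypothetical structure)."""
    product: str
    core_start: int
    core_end: int
    neighbourhood: int  # bp added up- and downstream of the core

    @property
    def start(self) -> int:
        # Do not extend past the start of the sequence
        return max(0, self.core_start - self.neighbourhood)

    @property
    def end(self) -> int:
        return self.core_end + self.neighbourhood


Region = Tuple[int, int, List[str]]  # (start, end, product types)


def build_regions(protoclusters: List[Protocluster]) -> List[Region]:
    """Merge overlapping protocluster extents into non-overlapping regions."""
    regions: List[Region] = []
    for pc in sorted(protoclusters, key=lambda p: p.start):
        if regions and pc.start <= regions[-1][1]:
            # Overlaps the previous region: extend it
            start, end, products = regions[-1]
            regions[-1] = (start, max(end, pc.end), products + [pc.product])
        else:
            regions.append((pc.start, pc.end, [pc.product]))
    return regions


# Hypothetical coordinates, for illustration only
example_protoclusters = [
    Protocluster("NRPS", 10000, 30000, 20000),
    Protocluster("T3PKS", 45000, 47000, 20000),
    Protocluster("terpene", 200000, 202000, 10000),
]
print(build_regions(example_protoclusters))
```

Because overlapping extents are always merged, the resulting regions cannot overlap each other, which mirrors the property stated in the definition of a region.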
There are four kinds of candidate clusters: chemical hybrids, interleaved, neighbouring and single.
Chemical hybrid candidate clusters contain at least two protoclusters that share at least one gene that codes for enzymes of two or more separate BGC types (e.g. a single gene coding for type I PKS and NRPS modules) (Figure 1A). Examples of this type are hybrid PKS/NRPSs. Please note that this type of candidate cluster can also include protoclusters within that shared range that do not share a coding sequence, provided that they are completely contained within the candidate cluster.
Interleaved candidate clusters contain protoclusters that do not share cluster-type-defining coding sequences, but their core locations overlap (Figure 1B).
Neighbouring candidate clusters contain protoclusters which transitively overlap in their neighbourhoods (Figure 1C).
Single candidate clusters (Figure 1D) exist for consistency of access; they contain only a single protocluster. Note that individual protoclusters can be contained by more than one candidate cluster (typically a neighbouring candidate cluster and one of single, interleaved or chemical hybrid).
Each candidate cluster assignment is transitive: for example, if a protocluster would form a chemical hybrid with each of two neighbouring protoclusters, but these neighbours would not form a chemical hybrid on their own, all three together will still form a chemical hybrid candidate cluster.
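The pairwise relations and the transitivity described above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the `Proto` class, the order in which the relations are tested (shared defining CDS first, then core overlap, then overall overlap) and the grouping helper are hypothetical and are not the antiSMASH implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Interval = Tuple[int, int]


@dataclass(frozen=True)
class Proto:
    """Minimal protocluster stand-in (hypothetical structure)."""
    name: str
    core: Interval                # core start/end
    extent: Interval              # core plus neighbourhoods
    defining_cds: FrozenSet[str]  # cluster-type-defining genes


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def relation(a: Proto, b: Proto) -> str:
    """Strongest relation between two protoclusters (assumed priority)."""
    if a.defining_cds & b.defining_cds:
        return "chemical hybrid"
    if overlaps(a.core, b.core):
        return "interleaved"
    if overlaps(a.extent, b.extent):
        return "neighbouring"
    return "none"


def group_transitively(protos: List[Proto], kind: str) -> List[List[str]]:
    """Connected components of protoclusters linked by the given relation."""
    groups: List[List[Proto]] = []
    for proto in protos:
        merged = [proto]
        remaining = []
        for group in groups:
            if any(relation(proto, member) == kind for member in group):
                merged.extend(group)  # proto links this group to the others
            else:
                remaining.append(group)
        groups = remaining + [merged]
    return [sorted(p.name for p in g) for g in groups]


# A shares a defining gene with B and with C; B and C share none
a = Proto("A", (100, 200), (50, 250), frozenset({"x", "y"}))
b = Proto("B", (100, 150), (50, 200), frozenset({"x"}))
c = Proto("C", (160, 200), (110, 250), frozenset({"y"}))
print(relation(b, c))
print(group_transitively([b, c, a], "chemical hybrid"))
```

In this example B and C on their own are only neighbouring, but because each forms a chemical hybrid with A, all three end up in one chemical hybrid group, as in the example given in the text.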
Improved user interface
A central aim of antiSMASH is to provide very detailed and specific information via an easy to use and understand user interface (UI). The UI remained principally unchanged from the initial release of antiSMASH in 2011, despite the increased functionality added with each new version. In this version, we have modernized the UI using updated web technologies that allow a better structuring of the content of the antiSMASH results pages. In redesigning the UI, it was important that the reliable and well-established look-and-feel was conserved, while also retaining the ability to download the whole web-based results folder and to display it locally in a variety of web browsers.
We and others (such as (40)) have realized that antiSMASH results using the heuristic ClusterFinder algorithm (41) were more often than not interpreted incorrectly. At the same time, ClusterFinder contributed significantly to the computational workload. For these reasons, we decided to remove this feature from the public antiSMASH web server. It is, of course, still included in the download version of antiSMASH and can be enabled via the command line.
Figure 1. Candidate cluster types. 1, 2, 3, 4. Grey/yellow: gene involved in protocluster A/B. (A) Chemical hybrids: since cluster type A and cluster type B share a CDS that defines those protoclusters, they are classified as ‘chemical hybrid’. (B) Interleaved: since none of the protoclusters share any defining CDS with any other protocluster, it is not annotated as a chemical hybrid, even though the biosynthetic product may or may not be. The two protoclusters form an interleaved candidate cluster, since the core of A overlaps with the core of B. (C) Neighbouring: neighbouring candidate clusters are defined if the neighbourhoods of two protoclusters, but not their cores, overlap. (D) Singles: if protoclusters do not have any overlap relation with other protoclusters, the term single candidate cluster is assigned.
In the ‘Regions overview’ section (Figure 2), a graphical overview showing the location of the identified regions on the chromosome/plasmid/scaffold/contig is displayed. In the detailed view, regions that are located on contig borders are now clearly labelled. This often indicates that parts of the BGC are missing or that several sections of a BGC are located on different contigs and are therefore reported individually (for a more detailed discussion of this phenomenon, please see (42)). For the first time, antiSMASH 5 offers interactive browsing of the BGCs, including selection of ‘functional’ units (i.e. core enzymes, transporters, etc.) and zooming to individual genes or regions/candidate clusters/protoclusters. Details of the selection are now provided in side panels instead of pop-up windows, using a hierarchical view of the analysis summaries (which can be expanded by clicking ‘+’) to provide additional details. For the display of the PKS/NRPS domain organization, the user can now choose whether to display the domains of all genes in the region or only the results for the selected gene(s)/enzyme(s). Furthermore, the information is now organized in ‘tabs’ that do not require scrolling down an often very long results page.
Figure 2. Screenshot of the antiSMASH 5 user interface (example: NCBI-acc Y16952, balhimycin BGC). The new region overview now allows panning/zooming. The candidate cluster and protocluster boxes are explained in the ‘New region concept’ section above. Information about the currently selected gene is displayed in the ‘Gene details’ panel on the right. For PKS or NRPS regions, the detailed domain annotation is displayed; by pressing the tabs, users can select the domain overview (shown) or the ClusterBlast, KnownClusterBlast or SubClusterBlast results. At the right, the structure prediction and details of specificity predictions are displayed upon selecting the plus sign.
CODE REFACTORING AND SPEED-UP
Large parts of the pre-antiSMASH 5 code base were still derived from antiSMASH version 1, which was released in 2011. In order to maintain future compatibility, the antiSMASH code base had to be migrated from Python 2.7, which will reach end-of-life in 2020, to the current versions 3.5–3.7. As this transition required significant modification of the antiSMASH code, we decided to take this as a chance to completely rewrite the software with special consideration of runtime, code stability and code maintainability. A unit test and integration test framework was implemented that covers most parts of the antiSMASH 5 code, allowing much easier debugging and, most importantly, extension of the code, while at the same time ensuring that new features do not negatively impact the results of existing modules. For some of the externally contributed modules (Sandpuma, trans-AT PKS comparisons, terpene, PrediCAT), our contributors are currently preparing updated and compliant versions, which will be added to antiSMASH 5 in minor releases once they are finished and tested. Like the earlier antiSMASH versions, antiSMASH 5 provides the analysis results in an interactive webpage and richly annotated GenBank-format files for the whole genome and individual clusters. As a new feature in version 5, all data are also available as a computer-readable JSON container, which allows third party tools to easily process antiSMASH annotations. This JSON output has superseded some other output types, such as BioSynML and XLS.
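As a hedged illustration of how a third party tool might consume such a JSON container, the following Python sketch counts annotated feature types per record using only the standard library. The layout assumed here (a top-level `records` list whose entries carry an `id` and a `features` list with a `type` field) is an assumption made for this example and should be checked against an actual antiSMASH result file; the embedded document is invented sample data.

```python
import json
from collections import Counter
from typing import Dict

# Invented sample data in the ASSUMED layout (not real antiSMASH output)
EXAMPLE_JSON = """
{
  "version": "5.0.0",
  "records": [
    {"id": "record_1",
     "features": [{"type": "CDS"}, {"type": "CDS"}, {"type": "region"}]},
    {"id": "record_2", "features": []}
  ]
}
"""


def count_feature_types(json_text: str) -> Dict[str, Counter]:
    """Count feature types per record in a results document."""
    data = json.loads(json_text)
    counts: Dict[str, Counter] = {}
    for record in data.get("records", []):
        record_id = record.get("id", "unknown")
        # Missing keys are tolerated so that partial documents still parse
        counts[record_id] = Counter(
            feature.get("type", "unknown")
            for feature in record.get("features", [])
        )
    return counts


feature_counts = count_feature_types(EXAMPLE_JSON)
for record_id, per_type in feature_counts.items():
    print(record_id, dict(per_type))
```

For a real results file, the same function could be applied to the text read from disk; using `dict.get` with defaults keeps the sketch tolerant of fields that differ from the assumed layout.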
In addition to the advantages mentioned above, the code refactoring and cleanup has also led to a significant speed increase of the new version by a factor of 4–11× (depending on genome and selected options); instead of waiting times of several hours, antiSMASH results are now usually delivered within 30–40 min after the start of the job for a typical submission at the public web server.
CONCLUSIONS AND FUTURE PERSPECTIVES
With the help of software like antiSMASH, genome mining for specialized metabolites has established itself as a complementary approach for the identification of novel metabolites, which is routinely used within the natural products research community and increasingly applied in related fields, such as metagenomics, environmental biology or metabolic engineering. With the improvements to the antiSMASH user interface and performance, we keep pace with these developments. Furthermore, the complete refactoring of the antiSMASH 5 code base will allow us to increasingly use antiSMASH as a tool that provides analysis data on which other software can perform additional analyses.
DATA AVAILABILITY
antiSMASH is available from https://antismash.secondarymetabolites.org (bacterial version) or https://fungismash.secondarymetabolites.org (fungal version). The antiSMASH documentation, including a PDF user guide, is available from https://docs.antismash.secondarymetabolites.org. These websites are free and open to all users and there is no login requirement. The antiSMASH source code is available from https://github.com/antismash/antismash. antiSMASH is also available via Docker.
ACKNOWLEDGEMENTS
We thank Justin J.J. van der Hooft for critical comments on the manuscript and providing documentation, and Emilia Palazzotto and Tetiana Gren for helpful discussions and user testing of the new features.
FUNDING
Novo Nordisk Foundation [NNF10CC1016517 to S.Y.L., T.W.; NNF16OC0021746 to T.W.]; Center for Microbial Secondary Metabolites (CeMiSt), Danish National Research Foundation [DNRF137 to T.W.]; Reinhold and Maria Teufel Foundation (to K.S.). Funding for open access charge: The Novo Nordisk Foundation.
Conflict of interest statement. None declared.
REFERENCES
1. Newman,D.J. and Cragg,G.M. (2016) Natural products as sources of new drugs from 1981 to 2014. J. Nat. Prod., 79, 629–661.
2. van der Meij,A., Worsley,S.F., Hutchings,M.I. and van Wezel,G.P. (2017) Chemical ecology of antibiotic production by actinomycetes. FEMS Microbiol. Rev., 41, 392–416.
3. Ziemert,N., Alanjary,M. and Weber,T. (2016) The evolution of genome mining in microbes - a review. Nat. Prod. Rep., 33, 988–1005.
4. Weber,T., Rausch,C., Lopez,P., Hoof,I., Gaykova,V., Huson,D.H. and Wohlleben,W. (2009) CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters. J. Biotechnol., 140, 13–17.
5. Skinnider,M.A., Merwin,N.J., Johnston,C.W. and Magarvey,N.A. (2017) PRISM 3: expanded prediction of natural product chemical structures from microbial genomes. Nucleic Acids Res., 45, W49–W54.
6. Medema,M.H., Blin,K., Cimermancic,P., de Jager,V., Zakrzewski,P., Fischbach,M.A., Weber,T., Takano,E. and Breitling,R. (2011) antiSMASH: rapid identification, annotation, and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res., 39, W339–W346.
7. Blin,K., Medema,M.H., Kazempour,D., Fischbach,M.A., Breitling,R., Takano,E. and Weber,T. (2013) antiSMASH 2.0 - a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res., 41, W204–W212.
8. Weber,T., Blin,K., Duddela,S., Krug,D., Kim,H.U., Bruccoleri,R., Lee,S.Y., Fischbach,M.A., Müller,R., Wohlleben,W. et al. (2015) antiSMASH 3.0 - a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res., 43, W237–W243.
9. Blin,K., Wolf,T., Chevrette,M.G., Lu,X., Schwalen,C.J., Kautsar,S.A., Suarez Duran,H.G., de Los Santos,E.L.C., Kim,H.U., Nave,M. et al. (2017) antiSMASH 4.0 - improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res., 45, W36–W41.
10. Kautsar,S.A., Suarez Duran,H.G., Blin,K., Osbourn,A. and Medema,M.H. (2017) plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res., 45, W55–W63.
11. Blin,K., Kazempour,D., Wohlleben,W. and Weber,T. (2014) Improved lanthipeptide detection and prediction for antiSMASH. PLoS One, 9, e89420.
12. Villebro,R., Shaw,S., Blin,K. and Weber,T. (2019) Sequence-based classification of type II polyketide synthase biosynthetic gene clusters for antiSMASH. J. Ind. Microbiol. Biotechnol., 46, 469–475.
13. Blin,K., Medema,M.H., Kottmann,R., Lee,S.Y. and Weber,T. (2017) The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 45, D555–D559.
14. Blin,K., Pascal Andreu,V., de Los Santos,E.L.C., Del Carratore,F., Lee,S.Y., Medema,M.H. and Weber,T. (2019) The antiSMASH database version 2: a comprehensive resource on secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 47, D625–D630.
15. Medema,M.H., Paalvast,Y., Nguyen,D.D., Melnik,A., Dorrestein,P.C., Takano,E. and Breitling,R. (2014) Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products. PLoS Comput. Biol., 10, e1003822.
16. Alanjary,M., Kronmiller,B., Adamek,M., Blin,K., Weber,T., Huson,D., Philmus,B. and Ziemert,N. (2017) The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res., 45, W42–W48.
17. Blin,K., Pedersen,L.E., Weber,T. and Lee,S.Y. (2016) CRISPy-web: An online resource to design sgRNAs for CRISPR applications. Synth. Syst. Biotechnol., 1, 118–121.
18. Shirley,W.A., Kelley,B.P., Potier,Y., Koschwanez,J.H., Bruccoleri,R. and Tarselli,M. (2018) Unzipping natural products: improved natural product structure predictions by ensemble modeling and fingerprint matching. ChemRxiv doi: https://doi.org/10.26434/chemrxiv.6863864, 26 July 2018, preprint: not peer reviewed.
19. Navarro-Muñoz,J., Selem-Mojica,N., Mullowney,M., Kautsar,S., Tryon,J., Parkinson,E., De Los Santos,E., Yeong,M., Cruz-Morales,P., Abubucker,S. et al. (2018) A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data. bioRxiv doi: https://doi.org/10.1101/445270, 17 October 2018, preprint: not peer reviewed.
20. Finn,R.D., Coggill,P., Eberhardt,R.Y., Eddy,S.R., Mistry,J., Mitchell,A.L., Potter,S.C., Punta,M., Qureshi,M., Sangrador-Vegas,A. et al. (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res., 44, D279–D285.
21. Letunic,I. and Bork,P. (2018) 20 years of the SMART protein domain annotation resource. Nucleic Acids Res., 46, D493–D496.
22. de Jong,A., van Heel,A.J., Kok,J. and Kuipers,O.P. (2010) BAGEL2: mining for bacteriocins in genomic data. Nucleic Acids Res., 38, W647–W651.
23. Yadav,G., Gokhale,R.S. and Mohanty,D. (2009) Towards prediction of metabolic products of polyketide synthases: an in silico analysis. PLoS Comput. Biol., 5, e1000351.
24. Craig,J.W., Cherry,M.A. and Brady,S.F. (2011) Long-chain N-acyl amino acid synthases are linked to the putative PEP-CTERM/exosortase protein-sorting system in Gram-negative bacteria. J. Bacteriol., 193, 5707–5715.
25. Robinson,S.L., Christenson,J.K. and Wackett,L.P. (2018) Biosynthesis and chemical diversity of β-lactone natural products. Nat. Prod. Rep., 36, 458–475.
26. Agarwal,V., Blanton,J.M., Podell,S., Taton,A., Schorn,M.A., Busch,J., Lin,Z., Schmidt,E.W., Jensen,P.R., Paul,V.J. et al. (2017) Metagenomic discovery of polybrominated diphenyl ether biosynthesis by marine sponges. Nat. Chem. Biol., 13, 537–543.
27. Sosio,M., Gaspari,E., Iorio,M., Pessina,S., Medema,M.H., Bernasconi,A., Simone,M., Maffioli,S.I., Ebright,R.H. and Donadio,S. (2018) Analysis of the Pseudouridimycin biosynthetic pathway provides Insights into the formation of C-nucleoside antibiotics. Cell Chem. Biol., 25, 540–549.
28. Bauer,J.S., Ghequire,M.G.K., Nett,M., Josten,M., Sahl,H.-G., De Mot,R. and Gross,H. (2015) Biosynthetic origin of the antibiotic pseudopyronines A and B in Pseudomonas putida BW11M1. Chembiochem, 16, 2491–2497.
29. Luo,H., Hallen-Adams,H.E., Scott-Craig,J.S. and Walton,J.D. (2012) Ribosomal biosynthesis of α-amanitin in Galerina marginata. Fungal Genet. Biol., 49, 123–129.
30. Nagano,N., Umemura,M., Izumikawa,M., Kawano,J., Ishii,T., Kikuchi,M., Tomii,K., Kumagai,T., Yoshimi,A., Machida,M. et al. (2016) Class of cyclic ribosomal peptide synthetic genes in filamentous fungi. Fungal Genet. Biol., 86, 58–70.
31. Ding,W., Liu,W.-Q., Jia,Y., Li,Y., van der Donk,W.A. and Zhang,Q. (2016) Biosynthetic investigation of phomopsins reveals a widespread pathway for ribosomal natural products in Ascomycetes. Proc. Natl. Acad. Sci. U.S.A., 113, 3521–3526.
32. Bushin,L.B., Clark,K.A., Pelczer,I. and Seyedsayamdost,M.R. (2018) Charting an unexplored streptococcal biosynthetic landscape reveals a unique peptide cyclization motif. J. Am. Chem. Soc., 140, 17674–17684.
33. Caruso,A., Bushin,L.B., Clark,K.A., Martinie,R.J. and Seyedsayamdost,M.R. (2019) A radical approach to enzymatic β-thioether bond formation. J. Am. Chem. Soc., 141, 990–997.
34. Gibson,M.K., Forsberg,K.J. and Dantas,G. (2015) Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J., 9, 207–216.
35. Yi,G., Sze,S.H. and Thon,M.R. (2007) Identifying clusters of functionally related genes in genomes. Bioinformatics, 23, 1053–1060.
36. Inglis,D.O., Binkley,J., Skrzypek,M.S., Arnaud,M.B., Cerqueira,G.C., Shah,P., Wymore,F., Wortman,J.R. and Sherlock,G. (2013) Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae. BMC Microbiol., 13, 91.
37. The Gene Ontology Consortium (2016) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res., 45, D331–D338.
38. Medema,M.H., Kottmann,R., Yilmaz,P., Cummings,M., Biggins,J.B., Blin,K., de Bruijn,I., Chooi,Y.H., Claesen,J., Coates,R.C. et al. (2015) Minimum information about a biosynthetic gene cluster. Nat. Chem. Biol., 11, 625–631.
39. Medema,M.H., Takano,E. and Breitling,R. (2013) Detecting sequence homology at the gene cluster level with MultiGeneBlast. Mol. Biol. Evol., 30, 1218–1223.
40. Baltz,R.H. (2018) Natural product drug discovery in the genomic era: realities, conjectures, misconceptions, and opportunities. J. Ind. Microbiol. Biotechnol., 46, 281–299.
41. Cimermancic,P., Medema,M.H., Claesen,J., Kurita,K., Wieland Brown,L.C., Mavrommatis,K., Pati,A., Godfrey,P.A., Koehrsen,M., Clardy,J. et al. (2014) Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell, 158, 412–421.
42. Blin,K., Kim,H.U., Medema,M.H. and Weber,T. (2017) Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters. Brief. Bioinform., doi:10.1093/bib/bbx146.
Published online 29 April 2019. Nucleic Acids Research, 2019, Vol. 47, Web Server issue, W81–W87. doi: 10.1093/nar/gkz310
antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline
Kai Blin1, Simon Shaw1, Katharina Steinke2, Rasmus Villebro1, Nadine Ziemert2, Sang Yup Lee1,3, Marnix H. Medema4 and Tilmann Weber1
1The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, bygning 220, 2800 Kgs. Lyngby, Denmark; 2German Centre for Infection Research (DZIF), Interfaculty Institute of Microbiology and Infection Medicine, Auf der Morgenstelle 28, University of Tübingen, 72076 Tübingen, DE, Germany; 3Department of Chemical and Biomolecular Engineering (BK21 Plus Program) and BioInformatics Research Center, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, South Korea and 4Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708PB Wageningen, the Netherlands
Received February 07, 2019; Revised April 02, 2019; Editorial Decision April 16, 2019; Accepted April 17, 2019
ABSTRACT
Secondary metabolites produced by bacteria and fungi are an important source of antimicrobials and other bioactive compounds. In recent years, genome mining has seen broad applications in identifying and characterizing new compounds as well as in metabolic engineering. Since 2011, the ‘antibiotics and secondary metabolite analysis shell’, antiSMASH (https://antismash.secondarymetabolites.org), has assisted researchers in this, both as a web server and a standalone tool. It has established itself as the most widely used tool for identifying and analysing biosynthetic gene clusters (BGCs) in bacterial and fungal genome sequences. Here, we present an entirely redesigned and extended version 5 of antiSMASH. antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines. For type II polyketide synthase-encoding gene clusters, antiSMASH 5 now offers more detailed predictions. The HTML output visualization has been redesigned to improve the navigation and visual representation of annotations. We have again improved the runtime of analysis steps, making it possible to deliver comprehensive annotations for bacterial genomes within a few minutes. A new output file in the standard JavaScript object notation (JSON) format is aimed at downstream tools that process antiSMASH results programmatically.
INTRODUCTION
Bacterial and fungal natural products constitute a key source of scaffolds for the development of antimicrobials and other drugs (1), and mediate ecological interactions between organisms in various ways (2).
Mining genomic data for the presence of biosynthetic pathways that enable organisms to produce such molecules, which are also referred to as secondary or specialized metabolites, has become an essential approach that complements activity- and chemistry-guided isolation and identification approaches (3). Several computational tools, such as CLUSEAN (4) or PRISM (5), have been developed to support scientists with this task. The ‘antibiotics and secondary metabolites analysis shell’, antiSMASH, is a pioneer amongst these tools. Initially released in 2011 (6), it has since been further extended and improved (7–12) and is currently used by thousands of academic and industrial scientists worldwide to identify so-called secondary metabolite ‘biosynthetic gene clusters’ (BGCs) in their genomes of interest. In 2017, a database component was added to the antiSMASH framework, which provides instant access to thousands of pre-computed antiSMASH genome mining results of publicly available genomes (13,14). Furthermore, several independent tools were developed that directly interact with and interpret results generated by antiSMASH and provide information that is outside the scope of a core antiSMASH analysis: the mass-spectrometry guided peptide mining tool Pep2Path (15), the ‘Antibiotic Resistance Target Seeker’ ARTS (16), the sgRNA design tool CRISPy-web (17), a reverse-tailoring tool to match finished NRPS/PKS structures to antiSMASH-predicted core structures (18), and the BGC clustering and classification platform BiG-SCAPE (19).
Here, we present version 5 of antiSMASH, which contains many improvements. In addition to many features visible to the end users, such as extended and improved BGC detection and analysis capabilities and a modernized and improved user interface (see below), antiSMASH version 5 was completely rewritten in Python version 3, and the code was restructured to increase performance, reliability and ease of maintenance. This has led to a significant speed increase of the pipeline. A complete list of antiSMASH 5 features is included in the antiSMASH documentation (https://docs.antismash.secondarymetabolites.org/antiSMASH5features).
To whom correspondence should be addressed. Tel: +45 24896132; Email: tiwe@biosustain.dtu.dk. Correspondence may also be addressed to Marnix H. Medema. Tel: +31 317484706; Email: marnix.medema@wur.nl
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
NEW FEATURES AND UPDATES
New gene cluster classes and refinement of cluster detection rules
The most widely used and recommended mode to detect BGCs in genomic data is via manually curated and validated gene cluster rules. These are based on identifying co-occurring conserved core enzymes in the genome using HMM profiles that were derived from Pfam (20), SMART (21), BAGEL (22) or Yadav et al. (23), or that were created specifically for antiSMASH. While antiSMASH version 4 supported the rule-based detection of 44 different biosynthetic types, antiSMASH 5 now includes rules for 52 different BGC types. In version 5, new rules were added to detect BGCs encoding the biosynthesis of N-acyl amino acids (24), β-lactones (25), polybrominated diphenyl ethers (26), C-nucleosides (27), pseudopyronines (28), fungal RiPPs (29–31) and RaS-RiPPs (32,33). Furthermore, a new ‘nrps-like’ rule was defined for NRPS fragments, i.e. atypical NRPSs that do not have the typical C-A-T module architecture. The previous ‘otherks’ rule was split into two rules to individually assign heterocyst glycolipid synthase-like clusters and other atypical PKSs. In addition, some rules were improved based on user case reports. The rules describing lanthipeptides and trans-AT type I PKS were refined to reduce the number of false positive hybrid calls on other cluster types. For trans-AT type I PKS and type II PKS, we increased the size of the cluster cutoffs to capture previously missed tailoring enzymes in published clusters. The rule for linear azole/azoline-containing peptides was made more generic to better cover the range of described clusters.
The rule describing microcin clusters was removed, as microcins are a class of RiPPs defined via their production in Enterobacteriaceae and are already captured by one of our other specific RiPP cluster rules, depending on their respective biosynthesis pathway (e.g. microcin J25-like RiPPs were previously covered by the old microcin cluster rules but chemically are lasso peptides, while microcin B17 is a linear azol(in)e-containing peptide).
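The idea of a rule that requires co-occurring core enzymes within a cutoff distance can be illustrated with a minimal Python sketch. This is not the antiSMASH rule engine: the `Hit` class, the function `find_cores`, the profile names and the cutoff value are all hypothetical, and a real rule can express richer conditions than "all of these profiles must be present".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str     # gene identifier (hypothetical)
    start: int    # gene start coordinate in bp
    end: int      # gene end coordinate in bp
    profile: str  # name of the HMM profile that matched (hypothetical)


def find_cores(hits, required, cutoff):
    """Return (start, end) spans in which all `required` profiles co-occur.

    Hits to required profiles are chained for as long as the gap to the
    chain built so far is at most `cutoff` bp. A chain is reported only
    if every required profile occurs in it.
    """
    relevant = sorted((h for h in hits if h.profile in required),
                      key=lambda h: h.start)
    cores, chain = [], []

    def flush():
        # Report the current chain if it satisfies the rule.
        if chain and required <= {h.profile for h in chain}:
            cores.append((chain[0].start, max(h.end for h in chain)))

    for hit in relevant:
        if chain and hit.start - max(h.end for h in chain) > cutoff:
            flush()
            chain = []
        chain.append(hit)
    flush()
    return cores


hits = [
    Hit("geneA", 1_000, 2_500, "KS_alpha"),
    Hit("geneB", 2_600, 3_900, "KS_beta"),
    Hit("geneC", 90_000, 91_200, "KS_alpha"),  # lone hit, far away
]
print(find_cores(hits, {"KS_alpha", "KS_beta"}, cutoff=5_000))
```

In this toy input only the first two genes satisfy the rule; the isolated third hit is ignored because its partner profile is missing within the cutoff.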
Improved type II PKS prediction
Bacterial type II PKS BGCs code for the biosynthesis of aromatic polyketides, such as the antibiotic tetracycline or the anti-tumour drug doxorubicin. From the beginning, antiSMASH has had rules that were able to detect type II PKS BGCs by checking for the presence of the KSα and KSβ/CLF components of the minimal PKS. However, no detailed prediction methods had been added since antiSMASH’s first version. In antiSMASH 5, we introduce a new PKS II analysis module (12), which uses a collection of manually curated HMMs to predict potential starter units, the number of elongation cycles (and thus a rough estimation of the putative molecular weight of the core compound), cyclization patterns and some conserved type II PKS-specific tailoring reactions. This module is automatically triggered whenever a type II PKS BGC is detected.
Annotation of resistance genes via Resfams
The Resfams database (34) is a curated database of protein families with confirmed antibiotic resistance function. antiSMASH 5 uses the profile Hidden Markov Models (pHMMs) from Resfams to annotate potential resistance genes found in predicted gene regions. Potential resistance gene hits are displayed in the ‘gene details’ panel along with other functional annotations.
GO-term annotations
The Gene Ontology (GO) is a controlled vocabulary for describing biological processes, molecular functions and cellular components in a consistent way, to enable comparison of these between different species. Amongst its wide range of uses, the GO has been used to predict gene clusters in eukaryotes and bacteria (35), and in conjunction with antiSMASH to refine cluster boundaries in antiSMASH output for Aspergillus species (36).
To facilitate these and other GO-based analyses, antiSMASH 5 includes an option to automatically annotate GO terms on Pfam domains. This functionality makes use of the fact that GO terms may be linked not only to specific gene products, but also to other means of classification in so-called ‘mappings’ (http://geneontology.org/page/download-mappings). As antiSMASH can automatically annotate Pfam domains, the GO annotation functionality makes use of the Pfam to GO mapping supplied by the Gene Ontology Consortium’s website (37). If the ID of a predicted Pfam domain in an antiSMASH record is present in the Pfam to GO mapping, the respective GO terms are assigned and presented in the ‘gene details’ panel.
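The lookup described above amounts to parsing a mapping file and checking each predicted Pfam ID against it. The following Python sketch shows one way to do this, assuming the plain-text mapping layout `Pfam:<ID> <name> > GO:<term name> ; GO:<ID>` with comment lines starting with `!`. The sample entries, the function names `parse_pfam2go` and `annotate`, and the handling of version suffixes are illustrative assumptions, not the antiSMASH implementation.

```python
import re
from collections import defaultdict

# Illustrative sample in the assumed mapping layout; the entries are
# example data for this sketch, not an authoritative excerpt.
SAMPLE_MAPPING = """\
!comment lines start with an exclamation mark
Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930
Pfam:PF00001 7tm_1 > GO:membrane ; GO:0016020
Pfam:PF00005 ABC_tran > GO:ATP binding ; GO:0005524
"""

LINE = re.compile(r"^Pfam:(PF\d+)\s+\S+\s+>\s+GO:(.+?)\s+;\s+(GO:\d+)$")


def parse_pfam2go(text):
    """Build a dict: Pfam ID -> list of (GO ID, GO term name)."""
    mapping = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match:
            pfam_id, term_name, go_id = match.groups()
            mapping[pfam_id].append((go_id, term_name))
    return dict(mapping)


def annotate(pfam_ids, mapping):
    """Look up GO terms for each Pfam ID, ignoring a '.NN' version suffix."""
    return {pid: mapping.get(pid.split(".")[0], []) for pid in pfam_ids}


mapping = parse_pfam2go(SAMPLE_MAPPING)
print(annotate(["PF00001.21", "PF99999"], mapping))
```

A Pfam ID that is absent from the mapping simply receives an empty list, mirroring the statement that GO terms are assigned only if the ID is present in the mapping.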
Link to the antiSMASH database
antiSMASH provides options to search for similar gene clusters in public datasets. As already implemented in previous versions of the software, the KnownClusterBlast functionality searches each identified region against the manually curated MIBiG (38) repository. The KnownClusterBlast and ClusterBlast search functions use an algorithm first described in antiSMASH 1 (6), which is also in use in a generalized version in MultiGeneBlast (39). In the previous versions of antiSMASH, the ClusterBlast database was generated by scripts that used the antiSMASH BGC detection logic on sequences downloaded from the NCBI GenBank/RefSeq databases. As version 2 of the antiSMASH database now also contains BGCs of draft genomes (14), starting with antiSMASH 5 the ClusterBlast databases will be directly generated from the new antiSMASH database and complemented with individual BGC records that were submitted to NCBI outside of whole-genome submissions. This provides several advantages. The abundance of entries for selected genera/species in the public databases (and thus also in the previous ClusterBlast database) is strongly skewed towards clinically or industrially relevant organisms. There are, for example, more than 15 000 assemblies for Escherichia coli deposited at NCBI. For the antiSMASH database, a sequence-based dereplication workflow was established (14) that reduced the number of redundant entries with very high sequence similarity. Thus, the updated ClusterBlast database contains fewer entries than the previous release, despite the increase in publicly available sequence data. This decrease has resulted in reduced computation times while simultaneously providing more relevant hits. Furthermore, as the entries of the ClusterBlast database are directly related to the BGCs in the antiSMASH database, a link to the respective BGC is now included for all ClusterBlast hits, promptly directing the user to the detailed report of the similar gene clusters.
New ‘region’ concept
In previous versions, antiSMASH referred to all co-located, hybrid and independent BGCs with the single label ‘cluster’. In many cases this led to confusing structure predictions when distinct BGCs are encoded side-by-side. For example, many Streptomyces plasmids exist for which all BGCs lie so close to each other that all were joined into a single large ‘cluster’. In order to better distinguish the different biological options that lead to BGCs, antiSMASH 5 introduces some new terminology.
The definitions now used in antiSMASH 5 are:
Core: The minimum area containing one or more genes that code for enzymes for a single BGC type that are detected by the manually curated detection rules. These genes do not have to be contiguous, but can be within a certain cutoff distance, as defined by the detection rule for the BGC type in question.
Neighbourhood: Distance up- and downstream of the cluster core that is used to find tailoring genes/enzymes; the neighbourhood distances for the individual biosynthetic types were empirically determined and defined in the detection rules.
Protocluster: Contains core + neighbourhoods at both sides of the core; each protocluster will always have one single product type (for example, NRPS). Protoclusters may overlap partially or completely with other protoclusters. In the result webpage, protoclusters are displayed as boxes above the gene arrows. The cores are shown as solid colour boxes; the neighbourhoods are the half-transparent areas around the cores.
Candidate cluster: Contains one or more protoclusters; the candidate clusters are defined as described below. These definitions better allow modelling of hybrid clusters, such as PKS/NRPS hybrids, which combine two or more different biosynthetic classes (as identified in the detection rules), or cases where one class is used to biosynthesize a precursor for a second class. An example of the latter is found in glycopeptide biosynthesis, where one of the amino acids is synthesized by a type III PKS and is then incorporated into the product by an NRPS. Candidate clusters may overlap partially or completely with other candidate clusters. In the result webpage, candidate clusters are shown as boxes above the protoclusters.
Region: Contains one or more candidate clusters. The regions in antiSMASH 5 correspond to the entities called ‘clusters’ in antiSMASH 1–4 and now constitute what is displayed on a page of the results webpage. Sometimes a region will contain multiple mutually exclusive candidate clusters; in such cases, comparative genomic analysis and/or experimental work is required to assess which of these candidate clusters constitute actual BGCs. Regions will not overlap with each other. At least one of the contained candidate clusters will cover the full length of the region.
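The relationship between cores, neighbourhoods, protoclusters and non-overlapping regions can be sketched as a small data model. The following Python code is an illustration under stated assumptions, not the antiSMASH classes: the `Protocluster` dataclass, the `build_regions` helper, the coordinates and the neighbourhood sizes are hypothetical, clamping to the end of the sequence record is omitted, and the candidate-cluster layer between protoclusters and regions is skipped for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocluster:
    product: str        # a single product type, e.g. "NRPS"
    core_start: int     # first base of the core
    core_end: int       # last base of the core
    neighbourhood: int  # distance added on both sides of the core

    @property
    def start(self):
        # The neighbourhood cannot extend before the start of the record.
        return max(0, self.core_start - self.neighbourhood)

    @property
    def end(self):
        return self.core_end + self.neighbourhood


def build_regions(protoclusters):
    """Merge overlapping protoclusters into regions that never overlap."""
    regions = []
    for pc in sorted(protoclusters, key=lambda p: p.start):
        if regions and pc.start <= regions[-1]["end"]:
            # Overlaps the current region: extend it.
            regions[-1]["end"] = max(regions[-1]["end"], pc.end)
            regions[-1]["products"].append(pc.product)
        else:
            regions.append({"start": pc.start, "end": pc.end,
                            "products": [pc.product]})
    return regions


protoclusters = [
    Protocluster("T1PKS", 10_000, 30_000, 20_000),
    Protocluster("NRPS", 45_000, 60_000, 20_000),
    Protocluster("terpene", 200_000, 205_000, 10_000),
]
regions = build_regions(protoclusters)
for region in regions:
    print(region)
```

Here the first two protoclusters overlap only in their neighbourhoods and therefore end up in the same region, while the third forms a region of its own.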
There are four kinds of candidate clusters: chemical hybrids, interleaved, neighbouring and single.
Chemical hybrid candidate clusters contain at least two protoclusters that share at least one gene that codes for enzymes of two or more separate BGC types (e.g. a single gene coding for type I PKS and NRPS modules) (Figure 1A). An example of this type are hybrid PKS/NRPSs. Please note that this type of candidate cluster can also include protoclusters within that shared range that do not share a coding sequence, provided that they are completely contained within the candidate cluster.
Interleaved candidate clusters contain protoclusters that do not share cluster-type-defining coding sequences, but their core locations overlap (Figure 1B).
Neighbouring candidate clusters contain protoclusters which transitively overlap in their neighbourhoods (Figure 1C).
Single candidate clusters (Figure 1D) exist for consistency of access; they contain only a single protocluster. Note that individual protoclusters can be contained by more than one candidate cluster (typically a neighbouring candidate cluster and one of single, interleaved or chemical hybrid).
Each candidate cluster assignment is transitive: for example, if a protocluster would form a chemical hybrid with each of two neighbouring protoclusters, but these neighbours would not form a chemical hybrid on their own, all three together will still form a chemical hybrid candidate cluster.
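The pairwise criteria and the transitivity of the assignment can be made concrete with a short Python sketch. It is a simplified reading of the definitions above, not the antiSMASH implementation: the `Proto` dataclass, the functions `relation` and `group_transitively`, and all coordinates and CDS names are hypothetical. Transitive grouping is done with a small union-find structure.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Proto:
    name: str
    core: tuple              # (start, end) of the core
    span: tuple              # (start, end) including the neighbourhoods
    defining_cds: frozenset  # CDS names that triggered the detection rule


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def relation(p, q):
    """Classify a pair of protoclusters by the strongest relation found."""
    if p.defining_cds & q.defining_cds:
        return "chemical_hybrid"   # they share a defining CDS
    if overlaps(p.core, q.core):
        return "interleaved"       # cores overlap, no shared defining CDS
    if overlaps(p.span, q.span):
        return "neighbouring"      # only the neighbourhoods overlap
    return None


def group_transitively(protos, kind):
    """Group protoclusters linked, directly or via others, by `kind`."""
    parent = {p.name: p.name for p in protos}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in combinations(protos, 2):
        if relation(p, q) == kind:
            parent[find(p.name)] = find(q.name)

    groups = {}
    for p in protos:
        groups.setdefault(find(p.name), []).append(p.name)
    return sorted(sorted(g) for g in groups.values())


a = Proto("A", (0, 25_000), (0, 35_000), frozenset({"cds1"}))
b = Proto("B", (20_000, 45_000), (10_000, 55_000), frozenset({"cds1", "cds2"}))
c = Proto("C", (40_000, 60_000), (30_000, 70_000), frozenset({"cds2"}))
d = Proto("D", (200_000, 210_000), (190_000, 220_000), frozenset({"cds9"}))

print(relation(a, c))
print(group_transitively([a, b, c, d], "chemical_hybrid"))
```

In this toy example A and C are only neighbours of each other, yet both share a defining CDS with B, so the three are grouped into one chemical hybrid; D has no relation to any other protocluster and remains single.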
Improved user interface
A central aim of antiSMASH is to provide very detailed and specific information via an easy to use and understand user interface (UI). The UI remained principally unchanged from the initial release of antiSMASH in 2011, despite the increased functionality added with each new version. In this version, we have modernized the UI using updated web technologies that allow a better structuring of the content of the antiSMASH results pages. For redesigning the UI, it was important that the reliable and well-established look-and-feel was conserved, while also retaining the ability to download the whole web-based results folder and to display it locally in a variety of web browsers.
We and others (such as (40)) have realized that antiSMASH results using the heuristic ClusterFinder algorithm (41) were more often than not wrongly interpreted. At the same time, ClusterFinder contributed significantly to the computational workload. For these reasons, we decided to remove this feature from the public antiSMASH web server. It is, of course, still included in the download version of antiSMASH and can be enabled via the command line.
Figure 1. Candidate cluster types. Grey/yellow: gene involved in protocluster A/B. (A) Chemical hybrids: Since cluster type A and cluster type B share a CDS that defines those protoclusters, they are classified as ‘chemical hybrid’. (B) Interleaved: Since none of the protoclusters share any defining CDS with any other protocluster, it is not annotated as a chemical hybrid, even though the biosynthetic product may or may not be. The two protoclusters form an interleaved candidate cluster, since the core of A overlaps with the core of B. (C) Neighbouring: Neighbouring candidate clusters are defined if the neighbourhoods of two protoclusters, but not their cores, overlap. (D) Singles: If protoclusters do not have any overlap relation with other protoclusters, the term single candidate cluster is assigned.
In the Regions overview section (Figure 2), a graphical overview showing the location of the identified regions on the chromosome/plasmid/scaffold/contig is displayed. In the detailed view, regions that are located on contig borders are now clearly labelled. This often indicates that parts of the BGC are missing or that several sections of a BGC are located on different contigs and are therefore reported individually (for a more detailed discussion on this phenomenon, please see (42)). For the first time, antiSMASH 5 now offers interactive browsing of the BGCs, including selection of ‘functional’ units, i.e. core enzymes, transporters, etc., and zooming to individual genes or regions/candidate clusters/protoclusters. Details of the selection are now provided in side panels instead of pop-up windows, using a hierarchical view of the analysis summaries (which can be expanded by clicking ‘+’) to provide additional details. For the display of the PKS/NRPS domain organization, the user now can choose whether to limit the shown domains to the currently selected genes or just display the results of the selected gene(s)/enzyme(s). Furthermore, the information is now organized in ‘tabs’ that do not require scrolling down along an often very long results page.
Figure 2. Screenshot of the antiSMASH 5 user interface (example: NCBI-acc Y16952, balhimycin BGC). The new region overview now allows panning/zooming. The candidate cluster and protocluster boxes are explained in the ‘new region concept’ section above. Information about the currently selected gene is displayed at the right in the ‘Gene details’ panel. For PKS or NRPS regions, the detailed domain annotation is displayed; by pressing the tabs, users can select the domain overview (shown) or the ClusterBlast, KnownClusterBlast or SubClusterBlast results. At the right, the structure prediction and details of specificity predictions are displayed upon selecting the plus sign.
CODE REFACTORING AND SPEED-UP
Large parts of the pre-antiSMASH 5 code base were still derived from antiSMASH version 1, which was released in 2011. In order to maintain future compatibility, the antiSMASH code base had to be migrated from Python 2.7, which will reach end-of-life in 2020, to the current versions 3.5–3.7. As this transition required significant modification to the antiSMASH code, we decided to take this as a chance to completely rewrite the software with a special consideration on runtime, code stability and code maintainability. A unit test and integration test framework was implemented that covers most parts of the antiSMASH 5 code, allowing a much easier debugging and, most importantly, extension of the code, while at the same time ensuring that new features do not negatively impact the results of existing modules. For some of the externally contributed modules (Sandpuma, trans-AT PKS comparisons, terpene PrediCAT), our contributors are currently preparing updated and compliant versions, which will be added to antiSMASH 5 in minor releases once they are finished and tested. Like the earlier antiSMASH versions, antiSMASH 5 provides the analysis results in an interactive webpage and richly annotated GenBank-format files for the whole genome and individual clusters. As a new feature in version 5, all data are also available as a computer-readable JSON container, which allows third-party tools to easily process antiSMASH annotations. This JSON output has superseded some other output types, such as BioSynML and XLS.
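To illustrate how a third-party tool might consume such a JSON container, the following Python sketch lists the regions found in a results document. The layout used here (a top-level `records` list whose entries carry `features` with `type`, `location` and `qualifiers` fields) is an assumption made for this example, and the embedded sample is invented; the key names should be checked against an actual antiSMASH output file before relying on them.

```python
import json

# Hypothetical, heavily trimmed example of a results document.
SAMPLE = """
{
  "version": "5.0.0",
  "records": [
    {
      "id": "record_1",
      "features": [
        {"type": "region", "location": "[0:45000]",
         "qualifiers": {"product": ["T1PKS", "NRPS"], "region_number": ["1"]}},
        {"type": "CDS", "location": "[1200:3400](+)",
         "qualifiers": {"locus_tag": ["gene_0001"]}}
      ]
    }
  ]
}
"""


def list_regions(results):
    """Return (record id, location, products) for every region feature."""
    rows = []
    for record in results.get("records", []):
        for feature in record.get("features", []):
            if feature.get("type") == "region":
                qualifiers = feature.get("qualifiers", {})
                rows.append((record.get("id"),
                             feature.get("location"),
                             qualifiers.get("product", [])))
    return rows


results = json.loads(SAMPLE)
for row in list_regions(results):
    print(row)
```

Using `dict.get` with defaults keeps the sketch tolerant of missing keys, which is convenient when the exact schema of a given release is not known in advance.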
In addition to the advantages mentioned above, the code refactoring and cleanup has also led to a significant speed increase of the new version by a factor of 4–11 (depending on genome and selected options): instead of waiting times of several hours, antiSMASH results are now usually delivered within 30–40 min after the start of the job for a typical submission at the public web server.
CONCLUSIONS AND FUTURE PERSPECTIVES
With the help of software like antiSMASH, genome mining for specialized metabolites has established itself as a complementary approach for the identification of novel metabolites, which is routinely used within the natural products research community and increasingly applied in related fields, such as metagenomics, environmental biology or metabolic engineering. With the improvements to the antiSMASH user interface and performance, we keep pace with these developments. Furthermore, the complete refactoring of the antiSMASH 5 code base will allow us to increasingly use antiSMASH as a tool that provides analysis data on which other software can perform additional analyses.
DATA AVAILABILITY
antiSMASH is available from https://antismash.secondarymetabolites.org (bacterial version) or https://fungismash.secondarymetabolites.org (fungal version). The antiSMASH documentation, including a PDF user guide, is available from https://docs.antismash.secondarymetabolites.org. These websites are free and open to all users and there is no login requirement. The antiSMASH source code is available from https://github.com/antismash/antismash. antiSMASH is also available via Docker.
ACKNOWLEDGEMENTS
We thank Justin J.J. van der Hooft for critical comments on the manuscript and providing documentation, and Emilia Palazzotto and Tetiana Gren for helpful discussions and user testing of the new features.
FUNDING
Novo Nordisk Foundation [NNF10CC1016517 to S.Y.L., T.W.; NNF16OC0021746 to T.W.]; Center for Microbial Secondary Metabolites (CeMiSt), Danish National Research Foundation [DNRF137 to T.W.]; Reinhold and Maria Teufel Foundation (to K.S.). Funding for open access charge: The Novo Nordisk Foundation.
Conflict of interest statement. None declared.
REFERENCES
1. Newman,D.J. and Cragg,G.M. (2016) Natural products as sources of new drugs from 1981 to 2014. J. Nat. Prod., 79, 629–661.
2. van der Meij,A., Worsley,S.F., Hutchings,M.I. and van Wezel,G.P. (2017) Chemical ecology of antibiotic production by actinomycetes. FEMS Microbiol. Rev., 41, 392–416.
3. Ziemert,N., Alanjary,M. and Weber,T. (2016) The evolution of genome mining in microbes - a review. Nat. Prod. Rep., 33, 988–1005.
4. Weber,T., Rausch,C., Lopez,P., Hoof,I., Gaykova,V., Huson,D.H. and Wohlleben,W. (2009) CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters. J. Biotechnol., 140, 13–17.
5. Skinnider,M.A., Merwin,N.J., Johnston,C.W. and Magarvey,N.A. (2017) PRISM 3: expanded prediction of natural product chemical structures from microbial genomes. Nucleic Acids Res., 45, W49–W54.
6. Medema,M.H., Blin,K., Cimermancic,P., de Jager,V., Zakrzewski,P., Fischbach,M.A., Weber,T., Takano,E. and Breitling,R. (2011) antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res., 39, W339–W346.
7. Blin,K., Medema,M.H., Kazempour,D., Fischbach,M.A., Breitling,R., Takano,E. and Weber,T. (2013) antiSMASH 2.0 - a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res., 41, W204–W212.
8. Weber,T., Blin,K., Duddela,S., Krug,D., Kim,H.U., Bruccoleri,R., Lee,S.Y., Fischbach,M.A., Müller,R., Wohlleben,W. et al. (2015) antiSMASH 3.0 - a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res., 43, W237–W243.
9. Blin,K., Wolf,T., Chevrette,M.G., Lu,X., Schwalen,C.J., Kautsar,S.A., Suarez Duran,H.G., de Los Santos,E.L.C., Kim,H.U., Nave,M. et al. (2017) antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res., 45, W36–W41.
10. Kautsar,S.A., Suarez Duran,H.G., Blin,K., Osbourn,A. and Medema,M.H. (2017) plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res., 45, W55–W63.
11. Blin,K., Kazempour,D., Wohlleben,W. and Weber,T. (2014) Improved lanthipeptide detection and prediction for antiSMASH. PLoS One, 9, e89420.
12. Villebro,R., Shaw,S., Blin,K. and Weber,T. (2019) Sequence-based classification of type II polyketide synthase biosynthetic gene clusters for antiSMASH. J. Ind. Microbiol. Biotechnol., 46, 469–475.
13. Blin,K., Medema,M.H., Kottmann,R., Lee,S.Y. and Weber,T. (2017) The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 45, D555–D559.
14. Blin,K., Pascal Andreu,V., de Los Santos,E.L.C., Del Carratore,F., Lee,S.Y., Medema,M.H. and Weber,T. (2019) The antiSMASH database version 2: a comprehensive resource on secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 47, D625–D630.
15. Medema,M.H., Paalvast,Y., Nguyen,D.D., Melnik,A., Dorrestein,P.C., Takano,E. and Breitling,R. (2014) Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products. PLoS Comput. Biol., 10, e1003822.
16. Alanjary,M., Kronmiller,B., Adamek,M., Blin,K., Weber,T., Huson,D., Philmus,B. and Ziemert,N. (2017) The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res., 45, W42–W48.
17. Blin,K., Pedersen,L.E., Weber,T. and Lee,S.Y. (2016) CRISPy-web: An online resource to design sgRNAs for CRISPR applications. Synth. Syst. Biotechnol., 1, 118–121.
18. Shirley,W.A., Kelley,B.P., Potier,Y., Koschwanez,J.H., Bruccoleri,R. and Tarselli,M. (2018) Unzipping natural products: improved natural product structure predictions by ensemble modeling and fingerprint matching. ChemRxiv doi:10.26434/chemrxiv.6863864, 26 July 2018, preprint: not peer reviewed.
19. Navarro-Muñoz,J., Selem-Mojica,N., Mullowney,M., Kautsar,S., Tryon,J., Parkinson,E., De Los Santos,E., Yeong,M., Cruz-Morales,P., Abubucker,S. et al. (2018) A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data. bioRxiv doi:10.1101/445270, 17 October 2018, preprint: not peer reviewed.
CoatesRC et al (2015) Minimum information about a biosyntheticgene cluster Nat Chem Biol 11 625ndash631
39 MedemaMH TakanoE and BreitlingR (2013) Detectingsequence homology at the gene cluster level with MultiGeneBlastMol Biol Evol 30 1218ndash1223
40 BaltzRH (2018) Natural product drug discovery in the genomic erarealities conjectures misconceptions and opportunities J IndMicrobiol Biotechnol 46 281ndash299
41 CimermancicP MedemaMH ClaesenJ KuritaK WielandBrownLC MavrommatisK PatiA GodfreyPA KoehrsenMClardyJ et al (2014) Insights into secondary metabolism from aglobal analysis of prokaryotic biosynthetic gene clusters Cell 158412ndash421
42 BlinK KimHU MedemaMH and WeberT (2017) Recentdevelopment of antiSMASH and other computational approaches tomine secondary metabolite biosynthetic gene clusters BriefBioinform doi101093bibbbx146
Dow
nloaded from httpsacadem
icoupcomnararticle-abstract47W
1W815481154 by D
TU Library - Technical Inform
ation Center of D
enmark user on 18 July 2019
W82 Nucleic Acids Research 2019 Vol 47 Web Server issue
visible to the end users, such as extended and improved BGC detection and analysis capabilities and a modernized and improved user interface (see below), antiSMASH version 5 was completely rewritten in Python version 3 and the code was restructured to increase performance, reliability and ease of maintenance. This has led to a significant speed increase of the pipeline. A complete list of antiSMASH 5 features is included in the antiSMASH documentation: https://docs.antismash.secondarymetabolites.org/antiSMASH5features
NEW FEATURES AND UPDATES
New gene cluster classes and refinement of cluster detection rules
The most widely used and recommended mode to detect BGCs in genomic data is via manually curated and validated gene cluster rules. These are based on identifying co-occurring conserved core enzymes in the genome using HMM profiles that were derived from Pfam (20), SMART (21), BAGEL (22) or Yadav et al. (23), or that were created specifically for antiSMASH. While antiSMASH version 4 supported the rule-based detection of 44 different biosynthetic types, antiSMASH 5 now includes rules for 52 different BGC types. In version 5, new rules were added to detect BGCs encoding the biosynthesis of N-acyl amino acids (24), β-lactones (25), polybrominated diphenyl ethers (26), C-nucleosides (27), pseudopyronines (28), fungal RiPPs (29–31) and RaS-RiPPs (32,33). Furthermore, a new ‘nrps-like’ rule was defined for NRPS fragments, i.e. atypical NRPSs that do not have the typical C-A-T module architecture. The previous ‘otherks’ rule was split into two rules to individually assign heterocyst glycolipid synthase-like clusters and other atypical PKSs. In addition, some rules were improved based on user case reports. The rules describing lanthipeptides and trans-AT type I PKS were refined to reduce the number of false-positive hybrid calls on other cluster types. For trans-AT type I PKS and type II PKS, we increased the size of the cluster cutoffs to capture previously missed tailoring enzymes in published clusters. The rule for linear azole/azoline-containing peptides was made more generic to better cover the range of described clusters.
The rule describing microcin clusters was removed, as microcins are a class of RiPPs defined via their production in Enterobacteriaceae and are already captured by one of our other specific RiPP cluster rules, depending on their respective biosynthesis pathway (e.g. microcin J25-like RiPPs were previously covered by the old microcin cluster rules but chemically are lasso peptides, while microcin B17 is a linear azol(in)e-containing peptide).
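The idea of a rule that requires co-occurring profile hits within a cutoff distance can be illustrated with a minimal Python sketch. This is not the antiSMASH rule engine: the types GeneHit and Rule, the function find_cores, the profile names and the cutoff value are all hypothetical, and the sketch only supports rules of the form "all listed profiles must be present", whereas the real detection rules may be more expressive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneHit:
    """A gene plus the names of the profile HMMs that matched it (hypothetical type)."""
    name: str
    start: int
    end: int
    profiles: frozenset


@dataclass(frozen=True)
class Rule:
    """Illustrative rule: every required profile must co-occur within `cutoff` bp."""
    bgc_type: str
    required: frozenset
    cutoff: int


def _emit(group, rule):
    """Return the span of a group of genes if it covers every required profile."""
    if not group:
        return []
    found = frozenset().union(*(g.profiles for g in group))
    if rule.required <= found:
        return [(group[0].start, max(g.end for g in group))]
    return []


def find_cores(genes, rule):
    """Return (start, end) spans in which the rule's profiles co-occur."""
    # Only genes hit by at least one required profile can define a core.
    relevant = sorted((g for g in genes if g.profiles & rule.required),
                      key=lambda g: g.start)
    cores, group = [], []
    for gene in relevant:
        # A gap larger than the cutoff closes the current group of defining genes.
        if group and gene.start - group[-1].end > rule.cutoff:
            cores.extend(_emit(group, rule))
            group = []
        group.append(gene)
    cores.extend(_emit(group, rule))
    return cores


# Hypothetical input: two defining genes close together, one far away.
example_genes = [
    GeneHit("g1", 100, 1000, frozenset({"profile_A"})),
    GeneHit("g2", 1500, 2500, frozenset({"profile_B"})),
    GeneHit("g3", 50000, 51000, frozenset({"profile_A"})),
]
example_rule = Rule("example_type", frozenset({"profile_A", "profile_B"}), 5000)
example_cores = find_cores(example_genes, example_rule)
```

In this sketch the defining genes need not be contiguous; they only have to lie within the cutoff of each other, which mirrors the description of cluster cutoffs above.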
Improved type II PKS prediction
Bacterial type II PKS BGCs code for the biosynthesis of aromatic polyketides, such as the antibiotic tetracycline or the anti-tumour drug doxorubicin. From the beginning, antiSMASH has had rules that were able to detect type II PKS BGCs by checking for the presence of the KSα and KSβ/CLF components of the minimal PKS. However, no detailed prediction methods had been added since antiSMASH’s first version. In antiSMASH 5, we introduce a new PKS II analysis module (12), which uses a collection of manually curated HMMs to predict potential starter units, the number of elongation cycles (and thus a rough estimate of the putative molecular weight of the core compound), cyclization patterns and some conserved type II PKS-specific tailoring reactions. This module is automatically triggered whenever a type II PKS BGC is detected.
Annotation of resistance genes via Resfams
The Resfams database (34) is a curated database of protein families with confirmed antibiotic resistance function. antiSMASH 5 uses the profile hidden Markov models (pHMMs) from Resfams to annotate potential resistance genes found in predicted gene regions. Potential resistance gene hits are displayed in the ‘gene details’ panel along with other functional annotations.
GO-term annotations
The Gene Ontology (GO) is a controlled vocabulary for describing biological processes, molecular functions and cellular components in a consistent way, to enable comparison of these between different species. Amongst its wide range of uses, the GO has been used to predict gene clusters in eukaryotes and bacteria (35) and, in conjunction with antiSMASH, to refine cluster boundaries in antiSMASH output for Aspergillus species (36).
To facilitate these and other GO-based analyses, antiSMASH 5 includes an option to automatically annotate GO terms on Pfam domains. This functionality makes use of the fact that GO terms may be linked not only to specific gene products, but also to other means of classification in so-called ‘mappings’ (http://geneontology.org/page/download-mappings). As antiSMASH can automatically annotate Pfam domains, the GO annotation functionality makes use of the Pfam-to-GO mapping supplied by the Gene Ontology Consortium’s website (37). If the ID of a predicted Pfam domain in an antiSMASH record is present in the Pfam-to-GO mapping, the respective GO terms are assigned and presented in the ‘gene details’ panel.
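The lookup described above can be sketched in a few lines of Python. The sketch assumes that the mapping file uses the line layout "Pfam:ID name > GO:term name ; GO:ID" with comment lines starting with "!"; the sample entries below are placeholders rather than real Pfam or GO identifiers, and the function names parse_pfam2go and annotate are hypothetical and not part of antiSMASH.

```python
from collections import defaultdict

# Placeholder excerpt in the assumed mapping layout; IDs and names are invented.
SAMPLE_MAPPING = """\
!illustrative excerpt, not real data
Pfam:PF99999 example_domain > GO:example molecular function ; GO:9999998
Pfam:PF99999 example_domain > GO:example biological process ; GO:9999999
"""


def parse_pfam2go(text):
    """Map each Pfam accession to a list of (GO ID, GO term name) tuples."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        left, _, right = line.partition(" > ")
        pfam_id = left.split()[0].split(":", 1)[1]
        description, _, go_id = right.rpartition(" ; ")
        if description.startswith("GO:"):
            description = description[len("GO:"):]
        mapping[pfam_id].append((go_id.strip(), description))
    return dict(mapping)


def annotate(pfam_ids, mapping):
    """Return GO terms for those predicted Pfam IDs that occur in the mapping."""
    annotations = {}
    for pfam_id in pfam_ids:
        accession = pfam_id.split(".")[0]  # drop a version suffix such as '.3'
        if accession in mapping:
            annotations[pfam_id] = mapping[accession]
    return annotations


example_mapping = parse_pfam2go(SAMPLE_MAPPING)
example_annotations = annotate(["PF99999.3", "PF00000"], example_mapping)
```

Domains whose ID is absent from the mapping simply receive no GO terms, which matches the conditional assignment described in the text.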
Link to the antiSMASH database
antiSMASH provides options to search for similar gene clusters in public datasets. As already implemented in previous versions of the software, the KnownClusterBlast functionality searches each identified region against the manually curated MIBiG (38) repository. The KnownClusterBlast and ClusterBlast search functions use an algorithm first described in antiSMASH 1 (6), which is also in use in a generalized version in MultiGeneBlast (39). In the previous versions of antiSMASH, the ClusterBlast database was generated by scripts that used the antiSMASH BGC detection logic on sequences downloaded from the NCBI GenBank/RefSeq databases. As version 2 of the antiSMASH database now also contains BGCs of draft genomes (14), starting with antiSMASH 5 the ClusterBlast databases will be directly generated from the new antiSMASH database and complemented with individual BGC
records that were submitted to NCBI outside of whole-genome submissions. This provides several advantages. The abundance of entries for selected genera/species in the public databases (and thus also in the previous ClusterBlast database) is strongly skewed towards clinically or industrially relevant organisms. There are, for example, more than 15 000 assemblies for Escherichia coli deposited at NCBI. For the antiSMASH database, a sequence-based dereplication workflow was established (14) that reduced the number of redundant entries with very high sequence similarity. Thus, the updated ClusterBlast database contains fewer entries than the previous release, despite the increase in publicly available sequence data. This decrease has resulted in reduced computation times while simultaneously providing more relevant hits. Furthermore, as the entries of the ClusterBlast database are directly related to the BGCs in the antiSMASH database, a link to the respective BGC is now included for all ClusterBlast hits, promptly directing the user to the detailed report of the similar gene clusters.
New ‘region’ concept
In previous versions, antiSMASH referred to all co-located hybrid and independent BGCs with the single label ‘cluster’. In many cases, this led to confusing structure predictions when distinct BGCs were encoded side-by-side. For example, many Streptomyces plasmids exist for which all BGCs lie so close to each other that all were joined into a single large ‘cluster’. In order to better distinguish the different biological options that lead to BGCs, antiSMASH 5 introduces some new terminology.
The definitions now used in antiSMASH 5 are:
Core: The minimum area containing one or more genes that code for enzymes for a single BGC type that are detected by the manually curated detection rules. These genes do not have to be contiguous, but can be within a certain cutoff distance as defined by the detection rule for the BGC type in question.
Neighbourhood: Distance up- and downstream of the cluster core that is used to find tailoring genes/enzymes; the neighbourhood distances for the individual biosynthetic types were empirically determined and defined in the detection rules.
Protocluster: Contains core + neighbourhoods at both sides of the core; each protocluster will always have one single product type (for example NRPS). Protoclusters may overlap partially or completely with other protoclusters. In the result webpage, protoclusters are displayed as boxes above the gene arrows. The cores are shown as solid colour boxes; the neighbourhoods are the half-transparent areas around the cores.
Candidate cluster: Contains one or more protoclusters; the candidate clusters are defined as described below. These definitions better allow modelling of hybrid clusters, such as PKS/NRPS hybrids, which combine two or more different biosynthetic classes (as identified in the detection rules), or cases where one class is used to biosynthesize a precursor for a second class. An example of the latter is found in glycopeptide biosynthesis, where one of the amino acids is synthesized by a type III PKS and is then incorporated into the product by an NRPS. Candidate clusters may overlap partially or completely with other candidate clusters. In the result webpage, candidate clusters are shown as boxes above the protoclusters.
Region: Contains one or more candidate clusters. The regions in antiSMASH 5 correspond to the entities called ‘clusters’ in antiSMASH 1–4 and now constitute what is displayed on a page of the results webpage. Sometimes a region will contain multiple mutually exclusive candidate clusters; in such cases, comparative genomic analysis and/or experimental work is required to assess which of these candidate clusters constitute actual BGCs. Regions will not overlap with each other. At least one of the contained candidate clusters will cover the full length of the region.
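The relationship between core, neighbourhood, protocluster and region can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not the antiSMASH implementation: the class Protocluster and the function build_regions are hypothetical, the neighbourhood distances are invented values, protoclusters whose extents merely touch are treated as overlapping, and clipping at the end of the record is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocluster:
    """Core of one product type plus a neighbourhood on each side (hypothetical model)."""
    product: str
    core_start: int
    core_end: int
    neighbourhood: int  # distance searched up- and downstream of the core

    @property
    def start(self):
        # Clip at the start of the record; clipping at the record end is omitted here.
        return max(0, self.core_start - self.neighbourhood)

    @property
    def end(self):
        return self.core_end + self.neighbourhood


def build_regions(protoclusters):
    """Merge overlapping protoclusters into non-overlapping (start, end, members) regions."""
    regions = []
    for proto in sorted(protoclusters, key=lambda p: p.start):
        if regions and proto.start <= regions[-1][1]:
            start, end, members = regions[-1]
            regions[-1] = (start, max(end, proto.end), members + [proto])
        else:
            regions.append((proto.start, proto.end, [proto]))
    return regions


# Invented coordinates and neighbourhood sizes, for illustration only.
example_protoclusters = [
    Protocluster("terpene", 200000, 205000, 10000),
    Protocluster("NRPS", 45000, 50000, 10000),
    Protocluster("T1PKS", 10000, 20000, 20000),
]
example_regions = build_regions(example_protoclusters)
```

Because every protocluster that overlaps an existing region is absorbed into it, the resulting regions cannot overlap each other, which is the property stated in the region definition.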
There are four kinds of candidate clusters: chemical hybrids, interleaved, neighbouring and single.
Chemical hybrid candidate clusters contain at least two protoclusters that share at least one gene that codes for enzymes of two or more separate BGC types (e.g. a single gene coding for type I PKS and NRPS modules) (Figure 1A). An example of this type are hybrid PKS/NRPSs. Please note that this type of candidate cluster can also include protoclusters within that shared range that do not share a coding sequence, provided that they are completely contained within the candidate cluster.
Interleaved candidate clusters contain protoclusters that do not share cluster-type-defining coding sequences, but their core locations overlap (Figure 1B).
Neighbouring candidate clusters contain protoclusters which transitively overlap in their neighbourhoods (Figure 1C).
Single candidate clusters (Figure 1D) exist for consistency of access; they contain only a single protocluster. Note that individual protoclusters can be contained by more than one candidate cluster (typically a neighbouring candidate cluster and one of single, interleaved or chemical hybrid).
Each candidate cluster assignment is transitive; for example, if a protocluster would form a chemical hybrid with each of two neighbouring protoclusters, but these neighbours would not form a chemical hybrid on their own, all three together will still form a chemical hybrid candidate cluster.
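The transitive grouping described above can be modelled as connected components over pairwise relations between protoclusters. The following Python sketch is one possible reading of the four kinds, not the antiSMASH implementation: the class Proto, the helper names and the coordinates are hypothetical, and the sketch assumes that a ‘single’ candidate cluster is emitted only for protoclusters that are not part of a chemical hybrid or interleaved group.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Proto:
    """Minimal protocluster model (hypothetical): core, full extent, defining genes."""
    name: str
    core: tuple   # (start, end) of the core
    span: tuple   # (start, end) of core plus neighbourhoods
    defining_genes: frozenset


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def shares_cds(a, b):
    return bool(a.defining_genes & b.defining_genes)


def components(protos, related):
    """Group protoclusters transitively: connected components of `related`."""
    parent = {p.name: p.name for p in protos}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(protos, 2):
        if related(a, b):
            parent[find(a.name)] = find(b.name)
    groups = {}
    for p in protos:
        groups.setdefault(find(p.name), []).append(p.name)
    return sorted(sorted(g) for g in groups.values())


def candidate_clusters(protos):
    """Return (kind, member names) pairs, strictest relation first."""
    relations = (
        ("chemical_hybrid", shares_cds),
        ("interleaved", lambda a, b: shares_cds(a, b) or overlaps(a.core, b.core)),
        ("neighbouring", lambda a, b: overlaps(a.span, b.span)),
    )
    found, seen = [], set()
    for kind, related in relations:
        for group in components(protos, related):
            # Skip groups already reported under a stricter kind.
            if len(group) > 1 and tuple(group) not in seen:
                seen.add(tuple(group))
                found.append((kind, group))
    # Assumption of this sketch: singles only for protoclusters outside hybrid/interleaved groups.
    grouped = {n for kind, g in found if kind != "neighbouring" for n in g}
    found += [("single", [p.name]) for p in protos if p.name not in grouped]
    return found


# Invented example: B shares a gene with A and with C, but A and C share none.
example_protos = [
    Proto("A", (1000, 2000), (0, 3000), frozenset({"g1", "g2"})),
    Proto("B", (1500, 4000), (500, 5000), frozenset({"g2", "g3"})),
    Proto("C", (3500, 5000), (2500, 6000), frozenset({"g3", "g4"})),
    Proto("D", (8000, 9000), (5500, 11000), frozenset({"g9"})),
    Proto("E", (50000, 51000), (40000, 61000), frozenset({"g20"})),
]
example_candidates = candidate_clusters(example_protos)
```

In the example, A, B and C end up in one chemical hybrid although A and C share no defining gene, and D joins them only in a neighbouring candidate cluster, so a protocluster can belong to more than one candidate cluster.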
Improved user interface
A central aim of antiSMASH is to provide very detailed and specific information via a user interface (UI) that is easy to use and understand. The UI remained principally unchanged from the initial release of antiSMASH in 2011, despite the increased functionality added with each new version. In this version, we have modernized the UI using updated web technologies that allow a better structuring of the content of the antiSMASH results pages. In redesigning the UI, it was important that the reliable and well-established look-and-feel was conserved, while also retaining the ability to download the whole web-based results folder and to display it locally in a variety of web browsers.
We and others (such as (40)) have realized that antiSMASH results using the heuristic ClusterFinder algorithm (41) were more often than not wrongly interpreted. At the same time, ClusterFinder contributed significantly to the computational workload. For these reasons, we decided to remove this feature from the public antiSMASH web server. It is, of course, still included in the download version of antiSMASH and can be enabled via the command line.
Figure 1. Candidate cluster types. Grey/yellow: gene involved in protocluster A/B. (A) Chemical hybrids: since cluster type A and cluster type B share a CDS that defines those protoclusters, they are classified as ‘chemical hybrid’. (B) Interleaved: since none of the protoclusters share any defining CDS with any other protocluster, it is not annotated as a chemical hybrid, even though the biosynthetic product may or may not be one. The two protoclusters form an interleaved candidate cluster, since the core of A overlaps with the core of B. (C) Neighbouring: neighbouring candidate clusters are defined if the neighbourhoods of two protoclusters, but not their cores, overlap. (D) Singles: if protoclusters do not have any overlap relation with other protoclusters, the term single candidate cluster is assigned.
In the Regions overview section (Figure 2), a graphical overview showing the location of the identified regions on the chromosome/plasmid/scaffold/contig is displayed. In the detailed view, regions that are located on contig borders are now clearly labelled. This often indicates that parts of the BGC are missing or that several sections of a BGC are located on different contigs and are therefore reported individually (for a more detailed discussion of this phenomenon, please see (42)). For the first time, antiSMASH 5 now offers interactive browsing of the BGCs, including selection of ‘functional’ units, i.e. core enzymes, transporters, etc., and zooming to individual genes or regions/candidate clusters/protoclusters. Details of the selection are now provided in side panels instead of pop-up windows, using a hierarchical view of the analysis summaries (which can be expanded by clicking ‘+’) to provide additional details. For the display of the PKS/NRPS domain organization, the user can now choose whether to show the domains of all genes in the region or just display the results of the selected gene(s)/enzyme(s). Furthermore, the information is now organized in ‘tabs’ that do not require scrolling down along an often very long results page.
Figure 2. Screenshot of the antiSMASH 5 user interface (example: NCBI acc. Y16952, balhimycin BGC). The new region overview now allows panning/zooming. The candidate cluster and protocluster boxes are explained in the ‘New region concept’ section above. Information about the currently selected gene is displayed in the ‘Gene details’ panel on the right. For PKS or NRPS regions, the detailed domain annotation is displayed. By pressing the tabs, users can select the domain overview (shown) or the ClusterBlast, KnownClusterBlast or SubClusterBlast results. At the right, the structure prediction and details of specificity predictions are displayed upon selecting the plus sign.
CODE REFACTORING AND SPEED-UP
Large parts of the pre-antiSMASH 5 code base were still derived from antiSMASH version 1, which was released in 2011. In order to maintain future compatibility, the antiSMASH code base had to be migrated from Python 2.7, which will reach end-of-life in 2020, to the current versions 3.5–3.7. As this transition required significant modification of the antiSMASH code, we decided to take this as a chance to completely rewrite the software, with special consideration given to runtime, code stability and code maintainability. A unit test and integration test framework was implemented that covers most parts of the antiSMASH 5 code, allowing much easier debugging and, most importantly, extension of the code, while at the same time ensuring that new features do not negatively impact the results of existing modules. For some of the externally contributed modules (Sandpuma, trans-AT PKS comparisons, terpene PrediCAT), our contributors are currently preparing updated and compliant versions, which will be added to antiSMASH 5 in minor releases once they are finished and tested. Like the earlier antiSMASH versions, antiSMASH 5 provides the analysis results in an interactive webpage and richly annotated GenBank-format files for the whole genome and individual clusters. As a new feature in version 5, all data are also available as a computer-readable JSON container, which allows third-party tools to easily process antiSMASH annotations. This JSON output has superseded some other output types, such as BioSynML and XLS.
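As an illustration of how a third-party tool might consume such a JSON container, the following Python sketch counts the product types of all regions in a results document. The layout used here (a top-level "records" list whose entries carry "features" with "type" and "qualifiers" fields) is an assumption made for this example and should be checked against the actual output; the inline sample document and the function count_region_products are hypothetical.

```python
import json
from collections import Counter

# Hypothetical, heavily reduced results document; the field names are assumptions.
SAMPLE_RESULTS = """
{
  "records": [
    {
      "id": "record_1",
      "features": [
        {"type": "region", "location": "[0:41000]",
         "qualifiers": {"product": ["T1PKS", "NRPS"]}},
        {"type": "CDS", "location": "[100:1300](+)",
         "qualifiers": {"locus_tag": ["gene_1"]}},
        {"type": "region", "location": "[90000:115000]",
         "qualifiers": {"product": ["terpene"]}}
      ]
    }
  ]
}
"""


def count_region_products(results):
    """Count how often each product type occurs across all region features."""
    counts = Counter()
    for record in results.get("records", []):
        for feature in record.get("features", []):
            if feature.get("type") == "region":
                counts.update(feature.get("qualifiers", {}).get("product", []))
    return counts


example_counts = count_region_products(json.loads(SAMPLE_RESULTS))
```

Using dict.get with defaults keeps the sketch tolerant of records or features that lack the assumed keys.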
In addition to the advantages mentioned above, the code refactoring and cleanup has also led to a significant speed increase of the new version by a factor of 4–11 (depending on genome and selected options); instead of waiting times of several hours, antiSMASH results are now usually delivered within 30–40 min after the start of the job for a typical submission at the public web server.
CONCLUSIONS AND FUTURE PERSPECTIVES
With the help of software like antiSMASH, genome mining for specialized metabolites has established itself as a complementary approach for the identification of novel metabolites, which is routinely used within the natural products research community and increasingly applied in related fields such as metagenomics, environmental biology or metabolic engineering. With the improvements to the antiSMASH user interface and performance, we keep pace with these developments. Furthermore, the complete refactoring of the antiSMASH 5 code base will allow us to increasingly use antiSMASH as a tool that provides analysis data on which other software can perform additional analyses.
DATA AVAILABILITY
antiSMASH is available from https://antismash.secondarymetabolites.org (bacterial version) or https://fungismash.secondarymetabolites.org (fungal version). The antiSMASH documentation, including a PDF user guide, is available from https://docs.antismash.secondarymetabolites.org. These websites are free and open to all users and there is no login requirement. The antiSMASH source code is available from
https://github.com/antismash/antismash. antiSMASH is also available via Docker.
ACKNOWLEDGEMENTS
We thank Justin J.J. van der Hooft for critical comments on the manuscript and providing documentation, and Emilia Palazzotto and Tetiana Gren for helpful discussions and user testing of the new features.
FUNDING
Novo Nordisk Foundation [NNF10CC1016517 to S.Y.L., T.W.; NNF16OC0021746 to T.W.]; Center for Microbial Secondary Metabolites (CeMiSt), Danish National Research Foundation [DNRF137 to T.W.]; Reinhold and Maria Teufel Foundation (to K.S.). Funding for open access charge: The Novo Nordisk Foundation.
Conflict of interest statement. None declared.
REFERENCES
1. Newman DJ and Cragg GM (2016) Natural products as sources of new drugs from 1981 to 2014. J Nat Prod 79, 629–661.
2. van der Meij A, Worsley SF, Hutchings MI and van Wezel GP (2017) Chemical ecology of antibiotic production by actinomycetes. FEMS Microbiol Rev 41, 392–416.
3. Ziemert N, Alanjary M and Weber T (2016) The evolution of genome mining in microbes - a review. Nat Prod Rep 33, 988–1005.
4. Weber T, Rausch C, Lopez P, Hoof I, Gaykova V, Huson DH and Wohlleben W (2009) CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters. J Biotechnol 140, 13–17.
5. Skinnider MA, Merwin NJ, Johnston CW and Magarvey NA (2017) PRISM 3: expanded prediction of natural product chemical structures from microbial genomes. Nucleic Acids Res 45, W49–W54.
6. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E and Breitling R (2011) antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res 39, W339–W346.
7. Blin K, Medema MH, Kazempour D, Fischbach MA, Breitling R, Takano E and Weber T (2013) antiSMASH 2.0 - a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res 41, W204–W212.
8. Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach MA, Muller R, Wohlleben W et al. (2015) antiSMASH 3.0 - a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res 43, W237–W243.
9. Blin K, Wolf T, Chevrette MG, Lu X, Schwalen CJ, Kautsar SA, Suarez Duran HG, de Los Santos ELC, Kim HU, Nave M et al. (2017) antiSMASH 4.0 - improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res 45, W36–W41.
10. Kautsar SA, Suarez Duran HG, Blin K, Osbourn A and Medema MH (2017) plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res 45, W55–W63.
11. Blin K, Kazempour D, Wohlleben W and Weber T (2014) Improved lanthipeptide detection and prediction for antiSMASH. PLoS One 9, e89420.
12. Villebro R, Shaw S, Blin K and Weber T (2019) Sequence-based classification of type II polyketide synthase biosynthetic gene clusters for antiSMASH. J Ind Microbiol Biotechnol 46, 469–475.
13. Blin K, Medema MH, Kottmann R, Lee SY and Weber T (2017) The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res 45, D555–D559.
14. Blin K, Pascal Andreu V, de Los Santos ELC, Del Carratore F, Lee SY, Medema MH and Weber T (2019) The antiSMASH database version 2: a comprehensive resource on secondary metabolite biosynthetic gene clusters. Nucleic Acids Res 47, D625–D630.
15. Medema MH, Paalvast Y, Nguyen DD, Melnik A, Dorrestein PC, Takano E and Breitling R (2014) Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products. PLoS Comput Biol 10, e1003822.
16. Alanjary M, Kronmiller B, Adamek M, Blin K, Weber T, Huson D, Philmus B and Ziemert N (2017) The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res 45, W42–W48.
17. Blin K, Pedersen LE, Weber T and Lee SY (2016) CRISPy-web: An online resource to design sgRNAs for CRISPR applications. Synth Syst Biotechnol 1, 118–121.
18. Shirley WA, Kelley BP, Potier Y, Koschwanez JH, Bruccoleri R and Tarselli M (2018) Unzipping natural products: improved natural product structure predictions by ensemble modeling and fingerprint matching. ChemRxiv doi:10.26434/chemrxiv.6863864, 26 July 2018, preprint: not peer reviewed.
19. Navarro-Munoz J, Selem-Mojica N, Mullowney M, Kautsar S, Tryon J, Parkinson E, De Los Santos E, Yeong M, Cruz-Morales P, Abubucker S et al. (2018) A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data. bioRxiv doi:10.1101/445270, 17 October 2018, preprint: not peer reviewed.
20. Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A et al. (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44, D279–D285.
21. Letunic I and Bork P (2018) 20 years of the SMART protein domain annotation resource. Nucleic Acids Res 46, D493–D496.
22. de Jong A, van Heel AJ, Kok J and Kuipers OP (2010) BAGEL2: mining for bacteriocins in genomic data. Nucleic Acids Res 38, W647–W651.
23. Yadav G, Gokhale RS and Mohanty D (2009) Towards prediction of metabolic products of polyketide synthases: an in silico analysis. PLoS Comput Biol 5, e1000351.
24. Craig JW, Cherry MA and Brady SF (2011) Long-chain N-acyl amino acid synthases are linked to the putative PEP-CTERM/exosortase protein-sorting system in Gram-negative bacteria. J Bacteriol 193, 5707–5715.
25. Robinson SL, Christenson JK and Wackett LP (2018) Biosynthesis and chemical diversity of β-lactone natural products. Nat Prod Rep 36, 458–475.
26. Agarwal V, Blanton JM, Podell S, Taton A, Schorn MA, Busch J, Lin Z, Schmidt EW, Jensen PR, Paul VJ et al. (2017) Metagenomic discovery of polybrominated diphenyl ether biosynthesis by marine sponges. Nat Chem Biol 13, 537–543.
27. Sosio M, Gaspari E, Iorio M, Pessina S, Medema MH, Bernasconi A, Simone M, Maffioli SI, Ebright RH and Donadio S (2018) Analysis of the pseudouridimycin biosynthetic pathway provides insights into the formation of C-nucleoside antibiotics. Cell Chem Biol 25, 540–549.
28. Bauer JS, Ghequire MGK, Nett M, Josten M, Sahl H-G, De Mot R and Gross H (2015) Biosynthetic origin of the antibiotic pseudopyronines A and B in Pseudomonas putida BW11M1. Chembiochem 16, 2491–2497.
29. Luo H, Hallen-Adams HE, Scott-Craig JS and Walton JD (2012) Ribosomal biosynthesis of α-amanitin in Galerina marginata. Fungal Genet Biol 49, 123–129.
30. Nagano N, Umemura M, Izumikawa M, Kawano J, Ishii T, Kikuchi M, Tomii K, Kumagai T, Yoshimi A, Machida M et al. (2016) Class of cyclic ribosomal peptide synthetic genes in filamentous fungi. Fungal Genet Biol 86, 58–70.
31. Ding W, Liu W-Q, Jia Y, Li Y, van der Donk WA and Zhang Q (2016) Biosynthetic investigation of phomopsins reveals a widespread pathway for ribosomal natural products in Ascomycetes. Proc Natl Acad Sci USA 113, 3521–3526.
32. Bushin LB, Clark KA, Pelczer I and Seyedsayamdost MR (2018) Charting an unexplored streptococcal biosynthetic landscape reveals a unique peptide cyclization motif. J Am Chem Soc 140, 17674–17684.
33. Caruso A, Bushin LB, Clark KA, Martinie RJ and Seyedsayamdost MR (2019) A radical approach to enzymatic β-thioether bond formation. J Am Chem Soc 141, 990–997.
34. Gibson MK, Forsberg KJ and Dantas G (2015) Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J 9, 207–216.
35. Yi G, Sze SH and Thon MR (2007) Identifying clusters of functionally related genes in genomes. Bioinformatics 23, 1053–1060.
36. Inglis DO, Binkley J, Skrzypek MS, Arnaud MB, Cerqueira GC, Shah P, Wymore F, Wortman JR and Sherlock G (2013) Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae. BMC Microbiol 13, 91.
37. The Gene Ontology Consortium (2016) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45, D331–D338.
38. Medema MH, Kottmann R, Yilmaz P, Cummings M, Biggins JB, Blin K, de Bruijn I, Chooi YH, Claesen J, Coates RC et al. (2015) Minimum information about a biosynthetic gene cluster. Nat Chem Biol 11, 625–631.
39. Medema MH, Takano E and Breitling R (2013) Detecting sequence homology at the gene cluster level with MultiGeneBlast. Mol Biol Evol 30, 1218–1223.
40. Baltz RH (2018) Natural product drug discovery in the genomic era: realities, conjectures, misconceptions and opportunities. J Ind Microbiol Biotechnol 46, 281–299.
41. Cimermancic P, Medema MH, Claesen J, Kurita K, Wieland Brown LC, Mavrommatis K, Pati A, Godfrey PA, Koehrsen M, Clardy J et al. (2014) Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412–421.
42. Blin K, Kim HU, Medema MH and Weber T (2017) Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters. Brief Bioinform doi:10.1093/bib/bbx146.
Dow
nloaded from httpsacadem
icoupcomnararticle-abstract47W
1W815481154 by D
TU Library - Technical Inform
ation Center of D
enmark user on 18 July 2019
Nucleic Acids Research 2019 Vol 47 Web Server issue W83
records that were submitted to NCBI outside of whole-genome submissions This provides several advantages Theabundance of entries for selected generaspecies in the pub-lic databases (and thus also in the previous ClusterBlastdatabase) is strongly skewed towards clinically or industri-ally relevant organisms There are for example more than15 000 assemblies for Escherichia coli deposited at NCBIFor the antiSMASH database a sequence-based dereplica-tion workflow was established (14) that reduced the num-ber of redundant entries with very high sequence similarityThus the updated ClusterBlast database contains fewer en-tries than the previous release despite the increase in pub-licly available sequence data This decrease has resulted inreduced computation times while simultaneously providingmore relevant hits Furthermore as the entries of the Clus-terBlast database are directly related to the BGCs in the an-tiSMASH database a link to the respective BGC is now in-cluded for all ClusterBlast hits promptly directing the userto the detailed report of the similar gene clusters
New ‘region’ concept
In previous versions, antiSMASH referred to all co-located hybrid and independent BGCs with the single label ‘cluster’. In many cases, this led to confusing structure predictions when distinct BGCs are encoded side-by-side. For example, many Streptomyces plasmids exist for which all BGCs lie so close to each other that all were joined into a single large ‘cluster’. In order to better distinguish the different biological options that lead to BGCs, antiSMASH 5 introduces some new terminology.

The definitions now used in antiSMASH 5 are:

Core: The minimum area containing one or more genes that code for enzymes for a single BGC type that are detected by the manually curated detection rules. These genes do not have to be contiguous, but can be within a certain cutoff distance, as defined by the detection rule for the BGC type in question.

Neighbourhood: Distance up- and downstream of the cluster core that is used to find tailoring genes/enzymes; the neighbourhood distances for the individual biosynthetic types were empirically determined and defined in the detection rules.

Protocluster: Contains core + neighbourhoods at both sides of the core; each protocluster will always have one single product type (for example, NRPS). Protoclusters may overlap partially or completely with other protoclusters. In the result webpage, protoclusters are displayed as boxes above the gene arrows. The cores are shown as solid colour boxes; the neighbourhoods are the half-transparent areas around the cores.

Candidate cluster: Contains one or more protoclusters; the candidate clusters are defined as described below. These definitions better allow modelling of hybrid clusters, such as PKS/NRPS hybrids, which combine two or more different biosynthetic classes (as identified in the detection rules), or cases where one class is used to biosynthesize a precursor for a second class. An example of the latter is found in glycopeptide biosynthesis, where one of the amino acids is synthesized by a type III PKS, which is then incorporated into the product by a NRPS. Candidate clusters may overlap partially or completely with other candidate clusters. In the result webpage, candidate clusters are shown as boxes above the protoclusters.

Region: Contains one or more candidate clusters. The regions in antiSMASH 5 correspond to the entities called ‘clusters’ in antiSMASH 1–4 and now constitute what is displayed on a page of the results webpage. Sometimes a region will contain multiple mutually exclusive candidate clusters; in such cases, comparative genomic analysis and/or experimental work is required to assess which of these candidate clusters constitute actual BGCs. Regions will not overlap with each other. At least one of the contained candidate clusters will cover the full length of the region.
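The nesting of these terms can be made concrete with a minimal data-model sketch. The class and attribute names are hypothetical and do not mirror the antiSMASH source code, the 20 000 bp neighbourhood is an invented value, and clipping at the end of a contig is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Protocluster:
    """One product type: a core plus a neighbourhood on each side."""
    product: str
    core_start: int
    core_end: int
    neighbourhood: int  # hypothetical per-type distance from a detection rule

    @property
    def start(self) -> int:
        return max(0, self.core_start - self.neighbourhood)

    @property
    def end(self) -> int:
        return self.core_end + self.neighbourhood


@dataclass(frozen=True)
class CandidateCluster:
    """One or more protoclusters; spans from the first start to the last end."""
    kind: str  # "chemical_hybrid", "interleaved", "neighbouring" or "single"
    protoclusters: List[Protocluster]

    @property
    def start(self) -> int:
        return min(p.start for p in self.protoclusters)

    @property
    def end(self) -> int:
        return max(p.end for p in self.protoclusters)


@dataclass(frozen=True)
class Region:
    """One or more candidate clusters; regions never overlap each other."""
    candidates: List[CandidateCluster]

    @property
    def start(self) -> int:
        return min(c.start for c in self.candidates)

    @property
    def end(self) -> int:
        return max(c.end for c in self.candidates)

    def fully_covered(self) -> bool:
        """True if at least one candidate cluster spans the whole region."""
        return any(c.start == self.start and c.end == self.end for c in self.candidates)


def regions_overlap(a: Region, b: Region) -> bool:
    """Check the 'regions will not overlap' property for two regions."""
    return a.start <= b.end and b.start <= a.end


# Example: an NRPS and a type III PKS whose neighbourhoods (not cores) overlap.
nrps = Protocluster("NRPS", core_start=25_000, core_end=60_000, neighbourhood=20_000)
t3pks = Protocluster("T3PKS", core_start=70_000, core_end=72_000, neighbourhood=20_000)
region = Region([
    CandidateCluster("neighbouring", [nrps, t3pks]),
    CandidateCluster("single", [nrps]),
    CandidateCluster("single", [t3pks]),
])
```

Here each protocluster belongs to two candidate clusters (its own single one and the neighbouring one), and the neighbouring candidate cluster is the one that spans the whole region.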
There are four kinds of candidate clusters: chemical hybrids, interleaved, neighbouring and single.

Chemical hybrid candidate clusters contain at least two protoclusters that share at least one gene that codes for enzymes of two or more separate BGC types (e.g. a single gene coding for type I PKS and NRPS modules) (Figure 1A). An example of this type are hybrid PKS/NRPSs. Please note that this type of candidate cluster can also include protoclusters within that shared range that do not share a coding sequence, provided that they are completely contained within the candidate cluster.

Interleaved candidate clusters contain protoclusters that do not share cluster-type-defining coding sequences, but their core locations overlap (Figure 1B).

Neighbouring candidate clusters contain protoclusters which transitively overlap in their neighbourhoods (Figure 1C).

Single candidate clusters (Figure 1D) exist for consistency of access; they contain only a single protocluster. Note that individual protoclusters can be contained by more than one candidate cluster (typically a neighbouring candidate cluster and one of single, interleaved or chemical hybrid).

Each candidate cluster assignment is transitive: for example, if a protocluster would form a chemical hybrid with each of two neighbouring protoclusters, but these neighbours would not form a chemical hybrid on their own, all three together will still form a chemical hybrid candidate cluster.
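The transitivity rule can be sketched as a connected-components search over a pairwise relation. This is a simplified illustration, not the antiSMASH implementation: the `Proto` class, its fields and the three relation functions are hypothetical, and the special case of fully contained protoclusters in chemical hybrids is left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Proto:
    name: str
    core_start: int
    core_end: int
    start: int  # core start minus neighbourhood
    end: int    # core end plus neighbourhood
    defining_cds: FrozenSet[str]  # genes that triggered the detection rule


def shares_cds(a: Proto, b: Proto) -> bool:
    """Chemical hybrid relation: at least one defining CDS in common."""
    return bool(a.defining_cds & b.defining_cds)


def cores_overlap(a: Proto, b: Proto) -> bool:
    """Interleaved relation: the core locations overlap."""
    return a.core_start <= b.core_end and b.core_start <= a.core_end


def extents_overlap(a: Proto, b: Proto) -> bool:
    """Neighbouring relation: the neighbourhoods overlap."""
    return a.start <= b.end and b.start <= a.end


def transitive_groups(protos: List[Proto],
                      related: Callable[[Proto, Proto], bool]) -> List[List[str]]:
    """Group protoclusters that are linked directly or through intermediates."""
    groups: List[List[str]] = []
    unseen = list(protos)
    while unseen:
        group = [unseen.pop(0)]
        frontier = list(group)
        while frontier:
            current = frontier.pop()
            linked = [p for p in unseen if related(current, p)]
            for p in linked:
                unseen.remove(p)
            group.extend(linked)
            frontier.extend(linked)
        groups.append(sorted(p.name for p in group))
    return groups


# B shares a defining CDS with A and with C; A and C share none.
a = Proto("A", 100, 200, 0, 300, frozenset({"geneX"}))
b = Proto("B", 150, 400, 50, 500, frozenset({"geneX", "geneY"}))
c = Proto("C", 350, 450, 250, 550, frozenset({"geneY"}))
d = Proto("D", 2000, 2100, 1900, 2200, frozenset({"geneZ"}))
hybrids = transitive_groups([a, b, c, d], shares_cds)
```

In this example A and C do not form a chemical hybrid on their own, yet both end up in the same group through B, while D stays on its own and would become a single candidate cluster.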
Improved user interface
A central aim of antiSMASH is to provide very detailed and specific information via an easy to use and understand user interface (UI). The UI remained principally unchanged from the initial release of antiSMASH in 2011, despite the increased functionality added with each new version. In this version, we have modernized the UI using updated web technologies that allow a better structuring of the result content of the antiSMASH results pages. For redesigning the UI, it was important that the reliable and well-established look-and-feel was conserved, while also retaining the ability to download the whole web-based results folder and to display it locally in a variety of web browsers.
We and others (such as (40)) have realized that antiSMASH results using the heuristic ClusterFinder algorithm (41) were more often than not wrongly interpreted. At the same time, ClusterFinder contributed significantly to the computational workload. For these reasons, we decided to remove this feature from the public antiSMASH web server. It is, of course, still included in the download version of antiSMASH and can be enabled via the command line.

Figure 1. Candidate cluster types. 1234 Grey/yellow: gene involved in protocluster A/B. (A) Chemical hybrids: Since cluster type A and cluster type B share a CDS that defines those protoclusters, they are classified as ‘chemical hybrid’. (B) Interleaved: Since none of the protoclusters share any defining CDS with any other protocluster, it is not annotated as a chemical hybrid, even though the biosynthetic product may or may not be. The two protoclusters form an interleaved candidate cluster, since the core of A overlaps with the core of B. (C) Neighbouring: Neighbouring candidate clusters are defined if the neighbourhoods of two protoclusters, but not their cores, overlap. (D) Singles: If protoclusters do not have any overlap/relation with other protoclusters, the term single candidate cluster is assigned.
In the Regions overview section (Figure 2), a graphical overview showing the location of the identified regions on the chromosome/plasmid/scaffold/contig is displayed. In the detailed view, regions that are located on contig borders are now clearly labelled. This often indicates that parts of the BGC are missing or that several sections of a BGC are located on different contigs and are therefore reported individually (for a more detailed discussion on this phenomenon, please see (42)). For the first time, antiSMASH 5 now offers interactive browsing of the BGCs, including selection of ‘functional’ units (i.e. core enzymes, transporters, etc.) and zooming to individual genes or regions/candidate clusters/protoclusters. Details of the selection are now provided in side panels instead of pop-up windows, using a hierarchical view of the analysis summaries (which can be expanded by clicking ‘+’) to provide additional details. For the display of the PKS/NRPS domain organization, the user can now choose whether to show the domains of all genes in the region or only those of the selected gene(s)/enzyme(s). Furthermore, the information is now organized in ‘tabs’ that do not require scrolling down along an often very long results page.
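A check for regions on contig borders can be sketched in a few lines. How antiSMASH itself decides on the label is not described in the text; the function name, the 0-based end-exclusive coordinates and the rule "touches the first or last base" are assumptions.

```python
from typing import List


def contig_edges(region_start: int, region_end: int, contig_length: int) -> List[str]:
    """Return which contig borders a region reaches.

    Assumes 0-based, end-exclusive coordinates; a region reaching a border
    may be a truncated BGC whose remainder lies on another contig.
    """
    edges: List[str] = []
    if region_start <= 0:
        edges.append("start")
    if region_end >= contig_length:
        edges.append("end")
    return edges
```

A non-empty result is the situation in which the predicted BGC should be read with caution, since genes beyond the border cannot be seen in this record.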
Figure 2. Screenshot of the antiSMASH 5 user interface (example: NCBI-acc Y16952, balhimycin BGC). The new region overview now allows panning/zooming. The candidate cluster and protocluster boxes are explained in the ‘New region concept’ section above. Information about the currently selected gene is displayed at the right in the ‘Gene details’ panel. For PKS or NRPS regions, the detailed domain annotation is displayed; by pressing the tabs, users can select the domain overview (shown) or the ClusterBlast, KnownClusterBlast or SubClusterBlast results. At the right, the structure prediction and details of specificity predictions are displayed upon selecting the plus sign.
CODE REFACTORING AND SPEED-UP
Large parts of the pre-antiSMASH 5 code base were still derived from antiSMASH version 1, which was released in 2011. In order to maintain future compatibility, the antiSMASH code base had to be migrated from Python 2.7, which will reach end-of-life in 2020, to the current versions 3.5–3.7. As this transition required significant modification to the antiSMASH code, we decided to take this as a chance to completely rewrite the software with special consideration of runtime, code stability and code maintainability. A unit test and integration test framework was implemented that covers most parts of the antiSMASH 5 code, allowing much easier debugging and, most importantly, extension of the code, while at the same time ensuring that new features do not negatively impact the results of existing modules. For some of the externally contributed modules (Sandpuma, trans-AT PKS comparisons, terpene PrediCAT), our contributors are currently preparing updated and compliant versions, which will be added to antiSMASH 5 in minor releases once they are finished and tested. Like the earlier antiSMASH versions, antiSMASH 5 provides the analysis results in an interactive webpage and richly annotated GenBank-format files for the whole genome and individual clusters. As a new feature in version 5, all data are also available as a computer-readable JSON container, which allows third-party tools to easily process antiSMASH annotations. This JSON output has superseded some other output types, such as BioSynML and XLS.
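A third-party tool would typically start by loading the JSON container and inspecting its layout. The sketch below deliberately assumes nothing about the antiSMASH JSON schema, which is not described here; it only uses the Python standard library, and the function names and any file name passed to `load_results` are hypothetical.

```python
import json
from typing import Any, Dict


def load_results(path: str) -> Dict[str, Any]:
    """Load a JSON results file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarise(node: Any, depth: int = 2) -> Any:
    """Replace values below `depth` with their type names to reveal the layout.

    Lists are summarised by their first element only.
    """
    if depth == 0:
        return type(node).__name__
    if isinstance(node, dict):
        return {key: summarise(value, depth - 1) for key, value in node.items()}
    if isinstance(node, list):
        return [summarise(node[0], depth - 1)] if node else []
    return type(node).__name__
```

Calling `summarise(load_results(path))` on a downloaded results file prints a compact outline of the keys, which is a reasonable first step before writing code against specific fields.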
In addition to the advantages mentioned above, the code refactoring and cleanup has also led to a significant speed increase of the new version by a factor of 4–11× (depending on genome and selected options); instead of waiting times of several hours, antiSMASH results are now usually delivered within 30–40 min after the start of the job for a typical submission at the public web server.
CONCLUSIONS AND FUTURE PERSPECTIVES
With the help of software like antiSMASH, genome mining for specialized metabolites has established itself as a complementary approach for the identification of novel metabolites, which is routinely used within the natural products research community and increasingly applied in related fields, such as metagenomics, environmental biology or metabolic engineering. With the improvements to the antiSMASH user interface and performance, we keep pace with these developments. Furthermore, the complete refactoring of the antiSMASH 5 code base will allow us to increasingly use antiSMASH as a tool that provides analysis data on which other software can perform additional analyses.
DATA AVAILABILITY
antiSMASH is available from https://antismash.secondarymetabolites.org (bacterial version) or https://fungismash.secondarymetabolites.org (fungal version). The antiSMASH documentation, including a PDF user guide, is available from https://docs.antismash.secondarymetabolites.org. These websites are free and open to all users and there is no login requirement. The antiSMASH source code is available from https://github.com/antismash/antismash. antiSMASH is also available via Docker.
ACKNOWLEDGEMENTS
We thank Justin J.J. van der Hooft for critical comments on the manuscript and providing documentation, and Emilia Palazzotto and Tetiana Gren for helpful discussions and user testing of the new features.
FUNDING
Novo Nordisk Foundation [NNF10CC1016517 to S.Y.L., T.W.; NNF16OC0021746 to T.W.]; Center for Microbial Secondary Metabolites (CeMiSt), Danish National Research Foundation [DNRF137 to T.W.]; Reinhold and Maria Teufel Foundation (to K.S.). Funding for open access charge: The Novo Nordisk Foundation.
Conflict of interest statement. None declared.
REFERENCES
1. Newman, D.J. and Cragg, G.M. (2016) Natural products as sources of new drugs from 1981 to 2014. J. Nat. Prod., 79, 629–661.
2. van der Meij, A., Worsley, S.F., Hutchings, M.I. and van Wezel, G.P. (2017) Chemical ecology of antibiotic production by actinomycetes. FEMS Microbiol. Rev., 41, 392–416.
3. Ziemert, N., Alanjary, M. and Weber, T. (2016) The evolution of genome mining in microbes - a review. Nat. Prod. Rep., 33, 988–1005.
4. Weber, T., Rausch, C., Lopez, P., Hoof, I., Gaykova, V., Huson, D.H. and Wohlleben, W. (2009) CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters. J. Biotechnol., 140, 13–17.
5. Skinnider, M.A., Merwin, N.J., Johnston, C.W. and Magarvey, N.A. (2017) PRISM 3: expanded prediction of natural product chemical structures from microbial genomes. Nucleic Acids Res., 45, W49–W54.
6. Medema, M.H., Blin, K., Cimermancic, P., de Jager, V., Zakrzewski, P., Fischbach, M.A., Weber, T., Takano, E. and Breitling, R. (2011) antiSMASH: rapid identification, annotation, and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res., 39, W339–W346.
7. Blin, K., Medema, M.H., Kazempour, D., Fischbach, M.A., Breitling, R., Takano, E. and Weber, T. (2013) antiSMASH 2.0 - a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res., 41, W204–W212.
8. Weber, T., Blin, K., Duddela, S., Krug, D., Kim, H.U., Bruccoleri, R., Lee, S.Y., Fischbach, M.A., Müller, R., Wohlleben, W. et al. (2015) antiSMASH 3.0 - a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res., 43, W237–W243.
9. Blin, K., Wolf, T., Chevrette, M.G., Lu, X., Schwalen, C.J., Kautsar, S.A., Suarez Duran, H.G., de Los Santos, E.L.C., Kim, H.U., Nave, M. et al. (2017) antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res., 45, W36–W41.
10. Kautsar, S.A., Suarez Duran, H.G., Blin, K., Osbourn, A. and Medema, M.H. (2017) plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res., 45, W55–W63.
11. Blin, K., Kazempour, D., Wohlleben, W. and Weber, T. (2014) Improved lanthipeptide detection and prediction for antiSMASH. PLoS One, 9, e89420.
12. Villebro, R., Shaw, S., Blin, K. and Weber, T. (2019) Sequence-based classification of type II polyketide synthase biosynthetic gene clusters for antiSMASH. J. Ind. Microbiol. Biotechnol., 46, 469–475.
13. Blin, K., Medema, M.H., Kottmann, R., Lee, S.Y. and Weber, T. (2017) The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 45, D555–D559.
14. Blin, K., Pascal Andreu, V., de Los Santos, E.L.C., Del Carratore, F., Lee, S.Y., Medema, M.H. and Weber, T. (2019) The antiSMASH database version 2: a comprehensive resource on secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 47, D625–D630.
15. Medema, M.H., Paalvast, Y., Nguyen, D.D., Melnik, A., Dorrestein, P.C., Takano, E. and Breitling, R. (2014) Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products. PLoS Comput. Biol., 10, e1003822.
16. Alanjary, M., Kronmiller, B., Adamek, M., Blin, K., Weber, T., Huson, D., Philmus, B. and Ziemert, N. (2017) The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res., 45, W42–W48.
17. Blin, K., Pedersen, L.E., Weber, T. and Lee, S.Y. (2016) CRISPy-web: An online resource to design sgRNAs for CRISPR applications. Synth. Syst. Biotechnol., 1, 118–121.
18. Shirley, W.A., Kelley, B.P., Potier, Y., Koschwanez, J.H., Bruccoleri, R. and Tarselli, M. (2018) Unzipping natural products: improved natural product structure predictions by ensemble modeling and fingerprint matching. ChemRxiv doi: https://doi.org/10.26434/chemrxiv.6863864, 26 July 2018, preprint: not peer reviewed.
19. Navarro-Muñoz, J., Selem-Mojica, N., Mullowney, M., Kautsar, S., Tryon, J., Parkinson, E., De Los Santos, E., Yeong, M., Cruz-Morales, P., Abubucker, S. et al. (2018) A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data. bioRxiv doi: https://doi.org/10.1101/445270, 17 October 2018, preprint: not peer reviewed.
20. Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A. et al. (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res., 44, D279–D285.
21. Letunic, I. and Bork, P. (2018) 20 years of the SMART protein domain annotation resource. Nucleic Acids Res., 46, D493–D496.
22. de Jong, A., van Heel, A.J., Kok, J. and Kuipers, O.P. (2010) BAGEL2: mining for bacteriocins in genomic data. Nucleic Acids Res., 38, W647–W651.
23. Yadav, G., Gokhale, R.S. and Mohanty, D. (2009) Towards prediction of metabolic products of polyketide synthases: an in silico analysis. PLoS Comput. Biol., 5, e1000351.
24. Craig, J.W., Cherry, M.A. and Brady, S.F. (2011) Long-chain N-acyl amino acid synthases are linked to the putative PEP-CTERM/exosortase protein-sorting system in Gram-negative bacteria. J. Bacteriol., 193, 5707–5715.
25. Robinson, S.L., Christenson, J.K. and Wackett, L.P. (2018) Biosynthesis and chemical diversity of β-lactone natural products. Nat. Prod. Rep., 36, 458–475.
26. Agarwal, V., Blanton, J.M., Podell, S., Taton, A., Schorn, M.A., Busch, J., Lin, Z., Schmidt, E.W., Jensen, P.R., Paul, V.J. et al. (2017) Metagenomic discovery of polybrominated diphenyl ether biosynthesis by marine sponges. Nat. Chem. Biol., 13, 537–543.
27. Sosio, M., Gaspari, E., Iorio, M., Pessina, S., Medema, M.H., Bernasconi, A., Simone, M., Maffioli, S.I., Ebright, R.H. and Donadio, S. (2018) Analysis of the Pseudouridimycin biosynthetic pathway provides Insights into the formation of C-nucleoside antibiotics. Cell Chem. Biol., 25, 540–549.
28. Bauer, J.S., Ghequire, M.G.K., Nett, M., Josten, M., Sahl, H.-G., De Mot, R. and Gross, H. (2015) Biosynthetic origin of the antibiotic pseudopyronines A and B in Pseudomonas putida BW11M1. Chembiochem, 16, 2491–2497.
29. Luo, H., Hallen-Adams, H.E., Scott-Craig, J.S. and Walton, J.D. (2012) Ribosomal biosynthesis of α-amanitin in Galerina marginata. Fungal Genet. Biol., 49, 123–129.
30. Nagano, N., Umemura, M., Izumikawa, M., Kawano, J., Ishii, T., Kikuchi, M., Tomii, K., Kumagai, T., Yoshimi, A., Machida, M. et al. (2016) Class of cyclic ribosomal peptide synthetic genes in filamentous fungi. Fungal Genet. Biol., 86, 58–70.
31. Ding, W., Liu, W.-Q., Jia, Y., Li, Y., van der Donk, W.A. and Zhang, Q. (2016) Biosynthetic investigation of phomopsins reveals a widespread pathway for ribosomal natural products in Ascomycetes. Proc. Natl. Acad. Sci. U.S.A., 113, 3521–3526.
32. Bushin, L.B., Clark, K.A., Pelczer, I. and Seyedsayamdost, M.R. (2018) Charting an unexplored streptococcal biosynthetic landscape reveals a unique peptide cyclization motif. J. Am. Chem. Soc., 140, 17674–17684.
33. Caruso, A., Bushin, L.B., Clark, K.A., Martinie, R.J. and Seyedsayamdost, M.R. (2019) A radical approach to enzymatic β-Thioether bond formation. J. Am. Chem. Soc., 141, 990–997.
34. Gibson, M.K., Forsberg, K.J. and Dantas, G. (2015) Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J., 9, 207–216.
35. Yi, G., Sze, S.H. and Thon, M.R. (2007) Identifying clusters of functionally related genes in genomes. Bioinformatics, 23, 1053–1060.
36. Inglis, D.O., Binkley, J., Skrzypek, M.S., Arnaud, M.B., Cerqueira, G.C., Shah, P., Wymore, F., Wortman, J.R. and Sherlock, G. (2013) Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae. BMC Microbiol., 13, 91.
37. The Gene Ontology Consortium (2016) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res., 45, D331–D338.
38. Medema, M.H., Kottmann, R., Yilmaz, P., Cummings, M., Biggins, J.B., Blin, K., de Bruijn, I., Chooi, Y.H., Claesen, J., Coates, R.C. et al. (2015) Minimum information about a biosynthetic gene cluster. Nat. Chem. Biol., 11, 625–631.
39. Medema, M.H., Takano, E. and Breitling, R. (2013) Detecting sequence homology at the gene cluster level with MultiGeneBlast. Mol. Biol. Evol., 30, 1218–1223.
40. Baltz, R.H. (2018) Natural product drug discovery in the genomic era: realities, conjectures, misconceptions, and opportunities. J. Ind. Microbiol. Biotechnol., 46, 281–299.
41. Cimermancic, P., Medema, M.H., Claesen, J., Kurita, K., Wieland Brown, L.C., Mavrommatis, K., Pati, A., Godfrey, P.A., Koehrsen, M., Clardy, J. et al. (2014) Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell, 158, 412–421.
42. Blin, K., Kim, H.U., Medema, M.H. and Weber, T. (2017) Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters. Brief. Bioinform., doi:10.1093/bib/bbx146.
a unique peptide cyclization motif J Am Chem Soc 14017674ndash17684
33 CarusoA BushinLB ClarkKA MartinieRJ andSeyedsayamdostMR (2019) A radical approach to enzymatic-Thioether bond formation J Am Chem Soc 141 990ndash997
34 GibsonMK ForsbergKJ and DantasG (2015) Improvedannotation of antibiotic resistance determinants reveals microbialresistomes cluster by ecology ISME J 9 207ndash216
35 YiG SzeSH and ThonMR (2007) Identifying clusters offunctionally related genes in genomes Bioinformatics 23 1053ndash1060
36 InglisDO BinkleyJ SkrzypekMS ArnaudMBCerqueiraGC ShahP WymoreF WortmanJR and SherlockG(2013) Comprehensive annotation of secondary metabolitebiosynthetic genes and gene clusters of Aspergillus nidulans Afumigatus A niger and A oryzae BMC Microbiol 13 91
37 The Gene Ontology Consortium (2016) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338
38 MedemaMH KottmannR YilmazP CummingsMBigginsJB BlinK de BruijnI ChooiYH ClaesenJ
CoatesRC et al (2015) Minimum information about a biosyntheticgene cluster Nat Chem Biol 11 625ndash631
39 MedemaMH TakanoE and BreitlingR (2013) Detectingsequence homology at the gene cluster level with MultiGeneBlastMol Biol Evol 30 1218ndash1223
40 BaltzRH (2018) Natural product drug discovery in the genomic erarealities conjectures misconceptions and opportunities J IndMicrobiol Biotechnol 46 281ndash299
41 CimermancicP MedemaMH ClaesenJ KuritaK WielandBrownLC MavrommatisK PatiA GodfreyPA KoehrsenMClardyJ et al (2014) Insights into secondary metabolism from aglobal analysis of prokaryotic biosynthetic gene clusters Cell 158412ndash421
42 BlinK KimHU MedemaMH and WeberT (2017) Recentdevelopment of antiSMASH and other computational approaches tomine secondary metabolite biosynthetic gene clusters BriefBioinform doi101093bibbbx146
Figure 2. Screenshot of the antiSMASH 5 user interface (example: NCBI-acc Y16952, balhimycin BGC). The new region overview now allows panning/zooming. The candidate cluster and protocluster boxes are explained in the 'new region concept' section above. Information about the currently selected gene is displayed in the 'Gene details' panel at the right. For PKS or NRPS regions, the detailed domain annotation is displayed; by pressing the tabs, users can select the domain overview (shown) or the ClusterBlast, KnownClusterBlast or SubClusterBlast results. At the right, the structure prediction and details of specificity predictions are displayed upon selecting the plus sign.
CODE REFACTORING AND SPEED-UP
Large parts of the pre-antiSMASH 5 code base were still derived from antiSMASH version 1, which was released in 2011. To maintain future compatibility, the antiSMASH code base had to be migrated from Python 2.7, which will reach end-of-life in 2020, to the current versions 3.5–3.7. As this transition required significant modification of the antiSMASH code, we took the opportunity to completely rewrite the software with special attention to runtime, code stability and code maintainability. A unit test and integration test framework was implemented that covers most parts of the antiSMASH 5 code. This makes debugging and, most importantly, extension of the code much easier, while ensuring that new features do not negatively impact the results of existing modules. For some of the externally contributed modules (Sandpuma, trans-AT PKS comparisons, terpene PrediCAT), our contributors are currently preparing updated and compliant versions, which will be added to antiSMASH 5 in minor releases once they are finished and tested. Like the earlier antiSMASH versions, antiSMASH 5 provides the analysis results as an interactive webpage and as richly annotated GenBank-format files for the whole genome and individual clusters. As a new feature in version 5, all data are also available as a computer-readable JSON container, which allows third-party tools to easily process antiSMASH annotations. This JSON output has superseded some other output types, such as BioSynML and XLS.
In addition to the advantages mentioned above, the code refactoring and cleanup has also led to a significant speed increase of the new version, by a factor of 4 to 11 (depending on genome and selected options): instead of waiting times of several hours, antiSMASH results are now usually delivered within 30–40 min after the start of the job for a typical submission at the public web server.
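As an illustration of how a third-party tool might consume the JSON container described in this section, the following minimal Python sketch counts the product types annotated on region features. The sample document and its key names (`records`, `features`, `type`, `qualifiers`, `product`) are assumptions made for this example and should be checked against a real antiSMASH 5 output file; the helper names `load_result` and `count_region_products` are hypothetical and not part of antiSMASH.

```python
import json
from collections import Counter

# Hypothetical, heavily trimmed stand-in for an antiSMASH 5 JSON result.
# The key names used here are assumptions for illustration only; verify
# them against an actual output file before relying on them.
SAMPLE_RESULT = """
{
  "records": [
    {
      "id": "example_record",
      "features": [
        {"type": "region",
         "qualifiers": {"product": ["NRPS", "T3PKS"]}},
        {"type": "CDS",
         "qualifiers": {"locus_tag": ["geneA"]}}
      ]
    }
  ]
}
"""


def load_result(text: str) -> dict:
    """Parse a JSON result document from a string."""
    return json.loads(text)


def count_region_products(result: dict) -> Counter:
    """Count product types on 'region' features across all records."""
    counts = Counter()
    for record in result.get("records", []):
        for feature in record.get("features", []):
            if feature.get("type") != "region":
                continue  # skip CDS and other feature types
            counts.update(feature.get("qualifiers", {}).get("product", []))
    return counts


if __name__ == "__main__":
    products = count_region_products(load_result(SAMPLE_RESULT))
    for product, n in sorted(products.items()):
        print(f"{product}\t{n}")
```

Because the sketch only uses the Python standard library and tolerates missing keys, it can be adapted to a real result file by replacing `SAMPLE_RESULT` with the contents of that file, once the actual key names have been confirmed.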
CONCLUSIONS AND FUTURE PERSPECTIVES
With the help of software like antiSMASH, genome mining for specialized metabolites has established itself as a complementary approach for the identification of novel metabolites, which is routinely used within the natural products research community and increasingly applied in related fields such as metagenomics, environmental biology or metabolic engineering. With the improvements to the antiSMASH user interface and performance, we keep pace with these developments. Furthermore, the complete refactoring of the antiSMASH 5 code base will allow us to increasingly use antiSMASH as a tool that provides analysis data on which other software can perform additional analyses.
DATA AVAILABILITY
antiSMASH is available from https://antismash.secondarymetabolites.org/ (bacterial version) or https://fungismash.secondarymetabolites.org/ (fungal version). The antiSMASH documentation, including a PDF user guide, is available from https://docs.antismash.secondarymetabolites.org/. These websites are free and open to all users and there is no login requirement. The antiSMASH source code is available from https://github.com/antismash/antismash. antiSMASH is also available via Docker.
ACKNOWLEDGEMENTS
We thank Justin J.J. van der Hooft for critical comments on the manuscript and providing documentation, and Emilia Palazzotto and Tetiana Gren for helpful discussions and user testing of the new features.
FUNDING
Novo Nordisk Foundation [NNF10CC1016517 to S.Y.L., T.W.; NNF16OC0021746 to T.W.]; Center for Microbial Secondary Metabolites (CeMiSt), Danish National Research Foundation [DNRF137 to T.W.]; Reinhold and Maria Teufel Foundation (to K.S.). Funding for open access charge: The Novo Nordisk Foundation.
Conflict of interest statement. None declared.
REFERENCES
1. Newman,D.J. and Cragg,G.M. (2016) Natural products as sources of new drugs from 1981 to 2014. J. Nat. Prod., 79, 629–661.
2. van der Meij,A., Worsley,S.F., Hutchings,M.I. and van Wezel,G.P. (2017) Chemical ecology of antibiotic production by actinomycetes. FEMS Microbiol. Rev., 41, 392–416.
3. Ziemert,N., Alanjary,M. and Weber,T. (2016) The evolution of genome mining in microbes - a review. Nat. Prod. Rep., 33, 988–1005.
4. Weber,T., Rausch,C., Lopez,P., Hoof,I., Gaykova,V., Huson,D.H. and Wohlleben,W. (2009) CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters. J. Biotechnol., 140, 13–17.
5. Skinnider,M.A., Merwin,N.J., Johnston,C.W. and Magarvey,N.A. (2017) PRISM 3: expanded prediction of natural product chemical structures from microbial genomes. Nucleic Acids Res., 45, W49–W54.
6. Medema,M.H., Blin,K., Cimermancic,P., de Jager,V., Zakrzewski,P., Fischbach,M.A., Weber,T., Takano,E. and Breitling,R. (2011) antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res., 39, W339–W346.
7. Blin,K., Medema,M.H., Kazempour,D., Fischbach,M.A., Breitling,R., Takano,E. and Weber,T. (2013) antiSMASH 2.0--a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res., 41, W204–W212.
8. Weber,T., Blin,K., Duddela,S., Krug,D., Kim,H.U., Bruccoleri,R., Lee,S.Y., Fischbach,M.A., Muller,R., Wohlleben,W. et al. (2015) antiSMASH 3.0--a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res., 43, W237–W243.
9. Blin,K., Wolf,T., Chevrette,M.G., Lu,X., Schwalen,C.J., Kautsar,S.A., Suarez Duran,H.G., de Los Santos,E.L.C., Kim,H.U., Nave,M. et al. (2017) antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res., 45, W36–W41.
10. Kautsar,S.A., Suarez Duran,H.G., Blin,K., Osbourn,A. and Medema,M.H. (2017) plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res., 45, W55–W63.
11. Blin,K., Kazempour,D., Wohlleben,W. and Weber,T. (2014) Improved lanthipeptide detection and prediction for antiSMASH. PLoS One, 9, e89420.
12. Villebro,R., Shaw,S., Blin,K. and Weber,T. (2019) Sequence-based classification of type II polyketide synthase biosynthetic gene clusters for antiSMASH. J. Ind. Microbiol. Biotechnol., 46, 469–475.
13. Blin,K., Medema,M.H., Kottmann,R., Lee,S.Y. and Weber,T. (2017) The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 45, D555–D559.
14. Blin,K., Pascal Andreu,V., de Los Santos,E.L.C., Del Carratore,F., Lee,S.Y., Medema,M.H. and Weber,T. (2019) The antiSMASH database version 2: a comprehensive resource on secondary metabolite biosynthetic gene clusters. Nucleic Acids Res., 47, D625–D630.
15. Medema,M.H., Paalvast,Y., Nguyen,D.D., Melnik,A., Dorrestein,P.C., Takano,E. and Breitling,R. (2014) Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products. PLoS Comput. Biol., 10, e1003822.
16. Alanjary,M., Kronmiller,B., Adamek,M., Blin,K., Weber,T., Huson,D., Philmus,B. and Ziemert,N. (2017) The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res., 45, W42–W48.
17. Blin,K., Pedersen,L.E., Weber,T. and Lee,S.Y. (2016) CRISPy-web: An online resource to design sgRNAs for CRISPR applications. Synth. Syst. Biotechnol., 1, 118–121.
18. Shirley,W.A., Kelley,B.P., Potier,Y., Koschwanez,J.H., Bruccoleri,R. and Tarselli,M. (2018) Unzipping natural products: improved natural product structure predictions by ensemble modeling and fingerprint matching. ChemRxiv doi: 10.26434/chemrxiv.6863864, 26 July 2018, preprint: not peer reviewed.
19. Navarro-Munoz,J., Selem-Mojica,N., Mullowney,M., Kautsar,S., Tryon,J., Parkinson,E., De Los Santos,E., Yeong,M., Cruz-Morales,P., Abubucker,S. et al. (2018) A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data. bioRxiv doi: 10.1101/445270, 17 October 2018, preprint: not peer reviewed.
20. Finn,R.D., Coggill,P., Eberhardt,R.Y., Eddy,S.R., Mistry,J., Mitchell,A.L., Potter,S.C., Punta,M., Qureshi,M., Sangrador-Vegas,A. et al. (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res., 44, D279–D285.
21. Letunic,I. and Bork,P. (2018) 20 years of the SMART protein domain annotation resource. Nucleic Acids Res., 46, D493–D496.
22. de Jong,A., van Heel,A.J., Kok,J. and Kuipers,O.P. (2010) BAGEL2: mining for bacteriocins in genomic data. Nucleic Acids Res., 38, W647–W651.
23. Yadav,G., Gokhale,R.S. and Mohanty,D. (2009) Towards prediction of metabolic products of polyketide synthases: an in silico analysis. PLoS Comput. Biol., 5, e1000351.
24. Craig,J.W., Cherry,M.A. and Brady,S.F. (2011) Long-chain N-acyl amino acid synthases are linked to the putative PEP-CTERM/exosortase protein-sorting system in Gram-negative bacteria. J. Bacteriol., 193, 5707–5715.
25. Robinson,S.L., Christenson,J.K. and Wackett,L.P. (2018) Biosynthesis and chemical diversity of β-lactone natural products. Nat. Prod. Rep., 36, 458–475.
26. Agarwal,V., Blanton,J.M., Podell,S., Taton,A., Schorn,M.A., Busch,J., Lin,Z., Schmidt,E.W., Jensen,P.R., Paul,V.J. et al. (2017) Metagenomic discovery of polybrominated diphenyl ether biosynthesis by marine sponges. Nat. Chem. Biol., 13, 537–543.
27. Sosio,M., Gaspari,E., Iorio,M., Pessina,S., Medema,M.H., Bernasconi,A., Simone,M., Maffioli,S.I., Ebright,R.H. and Donadio,S. (2018) Analysis of the Pseudouridimycin biosynthetic pathway provides Insights into the formation of C-nucleoside antibiotics. Cell Chem. Biol., 25, 540–549.
28. Bauer,J.S., Ghequire,M.G.K., Nett,M., Josten,M., Sahl,H.-G., De Mot,R. and Gross,H. (2015) Biosynthetic origin of the antibiotic pseudopyronines A and B in Pseudomonas putida BW11M1. Chembiochem, 16, 2491–2497.
29. Luo,H., Hallen-Adams,H.E., Scott-Craig,J.S. and Walton,J.D. (2012) Ribosomal biosynthesis of α-amanitin in Galerina marginata. Fungal Genet. Biol., 49, 123–129.
30. Nagano,N., Umemura,M., Izumikawa,M., Kawano,J., Ishii,T., Kikuchi,M., Tomii,K., Kumagai,T., Yoshimi,A., Machida,M. et al. (2016) Class of cyclic ribosomal peptide synthetic genes in filamentous fungi. Fungal Genet. Biol., 86, 58–70.
31. Ding,W., Liu,W.-Q., Jia,Y., Li,Y., van der Donk,W.A. and Zhang,Q. (2016) Biosynthetic investigation of phomopsins reveals a widespread pathway for ribosomal natural products in Ascomycetes. Proc. Natl. Acad. Sci. U.S.A., 113, 3521–3526.
32. Bushin,L.B., Clark,K.A., Pelczer,I. and Seyedsayamdost,M.R. (2018) Charting an unexplored streptococcal biosynthetic landscape reveals a unique peptide cyclization motif. J. Am. Chem. Soc., 140, 17674–17684.
33. Caruso,A., Bushin,L.B., Clark,K.A., Martinie,R.J. and Seyedsayamdost,M.R. (2019) A radical approach to enzymatic β-Thioether bond formation. J. Am. Chem. Soc., 141, 990–997.
34. Gibson,M.K., Forsberg,K.J. and Dantas,G. (2015) Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J., 9, 207–216.
35. Yi,G., Sze,S.H. and Thon,M.R. (2007) Identifying clusters of functionally related genes in genomes. Bioinformatics, 23, 1053–1060.
36. Inglis,D.O., Binkley,J., Skrzypek,M.S., Arnaud,M.B., Cerqueira,G.C., Shah,P., Wymore,F., Wortman,J.R. and Sherlock,G. (2013) Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae. BMC Microbiol., 13, 91.
37. The Gene Ontology Consortium (2016) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res., 45, D331–D338.
38. Medema,M.H., Kottmann,R., Yilmaz,P., Cummings,M., Biggins,J.B., Blin,K., de Bruijn,I., Chooi,Y.H., Claesen,J., Coates,R.C. et al. (2015) Minimum information about a biosynthetic gene cluster. Nat. Chem. Biol., 11, 625–631.
39. Medema,M.H., Takano,E. and Breitling,R. (2013) Detecting sequence homology at the gene cluster level with MultiGeneBlast. Mol. Biol. Evol., 30, 1218–1223.
40. Baltz,R.H. (2018) Natural product drug discovery in the genomic era: realities, conjectures, misconceptions, and opportunities. J. Ind. Microbiol. Biotechnol., 46, 281–299.
41. Cimermancic,P., Medema,M.H., Claesen,J., Kurita,K., Wieland Brown,L.C., Mavrommatis,K., Pati,A., Godfrey,P.A., Koehrsen,M., Clardy,J. et al. (2014) Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell, 158, 412–421.
42. Blin,K., Kim,H.U., Medema,M.H. and Weber,T. (2017) Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters. Brief. Bioinform., doi:10.1093/bib/bbx146.