Published online 20 November 2014. Nucleic Acids Research, 2015, Vol. 43, Database issue D645–D655. doi: 10.1093/nar/gku1165

The Pathogen-Host Interactions database (PHI-base): additions and future developments

Martin Urban 1,*, Rashmi Pant 2, Arathi Raghunath 2, Alistair G. Irvine 3, Helder Pedro 4 and Kim E. Hammond-Kosack 1

1 Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK, 2 Molecular Connections Private Limited, Basavanagudi, Bangalore 560 004, Karnataka, India, 3 Department of Computational and Systems Biology, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK and 4 European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Received September 19, 2014; Revised October 30, 2014; Accepted October 30, 2014

* To whom correspondence should be addressed. Tel: +44 1582 763133; Fax: +44 1582 760089; Email: martin.urban@rothamsted.ac.uk

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Rapidly evolving pathogens cause a diverse array of diseases and epidemics that threaten crop yield, food security as well as human, animal and ecosystem health. To combat infection, greater comparative knowledge is required on the pathogenic process in multiple species. The Pathogen-Host Interactions database (PHI-base) catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and protist pathogens. Mutant phenotypes are associated with gene information. The included pathogens infect a wide range of hosts including humans, animals, plants, insects, fish and other fungi. The current version, PHI-base 3.6, available at http://www.phi-base.org, stores information on 2875 genes, 4102 interactions, 110 host species, 160 pathogenic species (103 plant, 3 fungal and 54 animal infecting species) and 181 diseases drawn from 1243 references. Phenotypic and gene function information has been obtained by manual curation of the peer-reviewed literature. A controlled vocabulary consisting of nine high-level phenotype terms permits comparisons and data analysis across the taxonomic space. PHI-base phenotypes were mapped via their associated gene information to reference genomes available in Ensembl Genomes. Virulence genes and hotspots can be visualized directly in genome browsers. Future plans for PHI-base include development of tools facilitating community-led curation and inclusion of the corresponding host target(s).

INTRODUCTION

Existing and emerging infectious diseases are a major concern to plant, animal and human health, threaten global food security and increasingly affect the biodiversity of natural ecosystems (1,2). Although the diseased state is rare, myriads of micro-organisms and invertebrate pests have evolved the ability to infect another species, gain sufficient sustenance to colonize their chosen host(s) and then to reproduce and disseminate efficiently to reinitiate the infection process. In most host-pathogen, host-pest and host-parasite encounters, the host survives and the disease symptoms are limited to specific cell layers, tissues or organs. Only a few pathogenic species routinely kill their selected host(s). With the advent of molecular cloning methods 30 years ago, the functional analysis of genes in host-pathogen interactions became feasible. The aim of many of these studies is to identify the molecules and mechanisms involved in the disease formation process in an effort to develop remedial strategies to increase agricultural crop yield, to improve animal or human health or to maintain biodiversity within natural ecosystems. Since the publication of the first functional gene analyses in the early 1980s, which included the molecular characterization of the avrA avirulence gene from the bacterial pathogen Pseudomonas syringae pv. glycinea (PHI-base accession PHI:963) (3,4), many more genes involved in pathogen-host interactions have been identified and the number of publications has steadily increased (Figure 1). Further key events in the history of functional gene analysis of pathogen-host interactions include: in 2005, the listing of >1500 active genome sequencing projects by the Genomes Online Database (GOLD) (5); in 2007, the report of a genome-wide functional analysis study of pathogenicity genes in the rice blast fungus Magnaporthe grisea; in 2010, publication of the first host-induced gene silencing (HIGS) study involving an obligate biotrophic species (6); in 2011, the genome-wide functional analysis of all transcription factors and protein kinases in the cereal infecting fungus Fusarium graminearum (7,8).

Figure 1. Growth of the number of published articles screened by keyword search for PHI-base and the number of phenotypically curated genes. This figure was generated from literature records retrieved at PubMed and Web of Science using the search terms ‘(fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene)’. Key events in the history of functional gene analysis of pathogen-host interactions include: (a) identification of the first avirulence gene (4); (b) >1500 genome sequencing projects listed in the GOLD database (5); (c) genome-wide functional analysis of pathogenicity genes in the rice blast fungus Magnaporthe oryzae; (d) the first host-induced gene silencing (HIGS) study involving an obligate biotrophic species (6); (e) genome-wide functional analysis of all transcription factors and protein kinases predicted in the cereal infecting fungus Fusarium graminearum (7,8).

Established in 2005, the pathogen-host interactions database (PHI-base) contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions. Phenotypes can be assigned to the outcome of such interactions. Within PHI-base, genes are catalogued when their function in the pathogenic process has been tested through gene disruption and/or transcript level alteration experiments. These genes are termed pathogenicity genes if the effect on the phenotype is qualitative (disease/no disease). They are called virulence/aggressiveness genes if the effect is quantitative. Another category of genes increasingly catalogued in PHI-base is effector genes, formerly known as avirulence genes. Effector genes either activate or suppress plant defence responses.

There are five key motivations to improve the data content of PHI-base and its taxonomic coverage. (i) In the post-genomics era, and with the ever cheaper cost of whole-genome sequencing, there is intense interest in comparative pathogen genomics to identify functionally homologous genes as well as species-unique genes. (ii) The breadth and efficiency of both forward and reverse genetics analysis in plant- and animal-infecting pathogenic species has accelerated the pace of discovery, with generated mutants subject to intense investigation and scrutiny. In many interaction studies, model host species are used increasingly to save costs, but these may or may not yield results equivalent to those obtained in the natural host species. Thus, comparisons with known interactions in the natural host can be informative. (iii) Many gene sequences linked to a pathogenic process lack sufficient formal descriptive annotation, such as that provided by Gene Ontology (GO) (9). PHI-base provides a repository for such gene annotation. (iv) Increased species coverage across a wider taxonomic range permits the PHI-base data to be used in silico to predict, with a higher level of confidence, the repertoire of virulence associated genes in more species. (v) Finally, and most importantly, researchers require free and easy access to different types of interaction information to facilitate hypothesis generation and knowledge discovery.

Here we report on a major increase in PHI-base gene content, new database features, integration with complementary databases and use cases. The original release of PHI-base was published in the NAR database issue in 2006 (10). A second NAR article in 2008 reviewed additional data and new features available within PHI-base version 3.0 (11). Since then, usage of PHI-base has grown and the PHI-base website receives about 1500 hits per quarter, excluding internal users, with users located in ∼89 countries. Several other databases provide information which partially overlaps with either the species data or biological information provided within PHI-base. These resources include the Fungal Virulence Factor Database (DFVF) (12), the e-Fungi project (13), Ensembl Genomes (14), the Oomycetes Transcriptomics Database (15), the Eukaryotic Pathogen Database Resources (EuPathDB) (16), FungiDB (17), the Host-Pathogen Interaction database on human viruses (HPIDB) (18), JGI-MycoCosm (19), PHIDIAS (20), PLEXdb (21) and the database on virulence factors of pathogenic bacteria (VFDB) (22). These complementary resources and their specialisms are summarized in Table 1. When used collectively, these databases provide prospective and existing users of PHI-base with a substantially enriched environment to pursue a wide range of simple to advanced in silico analyses on pathogenic organisms and the underlying pathogenic processes.

NEW FEATURES

An expanded taxonomic range and controlled vocabulary

Version 3.0, released in 2007, contained information on bacterial, fungal and oomycete pathogens as well as plant endophytes. Version 3.6 now also includes pathogenic plant infecting nematode and aphid pests and animal/human infecting parasites (Table 2). Between these versions of PHI-base, the total number of pathogenic species has risen from 95 to 160. The number of bacterial pathogens tripled over the same period. In addition, the number of obligate biotrophic species has increased from three to seven. To help PHI-base users become rapidly familiar with the biology of the wider range of pathogens and pests available, a full list of the pathogenic species covered in PHI-base version 3.6 is provided in Supplementary Table S1, along with their NCBI taxon identifier and both the natural and experimental host(s). The number of documented host species naturally infected by each pathogen and the identity of obligate biotrophs among the species are also described. This level of detail is provided to assist users in the selection of pathogenic species to include in comparative genomic analysis. An up-to-date version of Supplementary Table S1 is maintained on the PHI-base ‘About’ website, reflecting the data for each new release.

A new addition requested by users is the consistent use of a controlled vocabulary of high-level phenotyping terms (Table 3). Currently, nine phenotyping terms are used to permit consistent data retrieval, comparative phenomics across a wide taxonomic range and statistical analysis. Only one term is assigned per host-pathogen interaction. An interaction is defined as the function of one gene on one host and one tissue type. The PHI-base phenotype terms selected are routinely used in research articles, but mapping to GO terms is not supported due to their high-level nature. Since 2008, several new techniques for investigating gene product function have become more widely adopted. For example, for some obligate plant infecting pathogens, including Blumeria and Puccinia species which infect specific cereal hosts, a novel technique called host-induced gene silencing (HIGS) is used. In HIGS, an antisense construct is expressed from the host species and used to transiently silence a specific pathogen gene during the infection process which, if successful, results in an altered phenotypic outcome (23). The eight entries PHI:2896 to PHI:2903 were obtained for the Blumeria graminis f. sp. hordei–barley interaction using the new HIGS technique.
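The interaction definition above (one gene, one host, one tissue type, exactly one of nine terms) can be pictured as a small data model. The following Python sketch is illustrative only: the class names, field names and the placeholder records are assumptions made here, not the PHI-base schema or real PHI-base entries.

```python
from dataclasses import dataclass
from enum import Enum


class Phenotype(Enum):
    """The nine high-level phenotype outcome terms defined in Table 3."""
    LOSS_OF_PATHOGENICITY = "loss of pathogenicity"
    REDUCED_VIRULENCE = "reduced virulence"
    UNAFFECTED_PATHOGENICITY = "unaffected pathogenicity"
    INCREASED_VIRULENCE = "increased virulence (hypervirulence)"
    EFFECTOR = "effector (plant avirulence determinant)"
    LETHAL = "lethal"
    ENHANCED_ANTAGONISM = "enhanced antagonism"
    RESISTANT_TO_CHEMICAL = "resistant to chemical"
    SENSITIVE_TO_CHEMICAL = "sensitive to chemical"


@dataclass(frozen=True)
class Interaction:
    """One gene tested on one host and one tissue type (hypothetical fields)."""
    accession: str
    gene: str
    host: str
    tissue: str
    phenotype: Phenotype

    def key(self):
        # The triple that identifies an interaction in the text's definition.
        return (self.gene, self.host, self.tissue)


def find_conflicts(interactions):
    """Return interaction keys that carry more than one phenotype term."""
    terms_by_key = {}
    for item in interactions:
        terms_by_key.setdefault(item.key(), set()).add(item.phenotype)
    return sorted(k for k, terms in terms_by_key.items() if len(terms) > 1)


# Placeholder records, invented for illustration (not real PHI-base data).
records = [
    Interaction("EX:1", "geneA", "barley", "leaf", Phenotype.REDUCED_VIRULENCE),
    Interaction("EX:2", "geneA", "barley", "leaf", Phenotype.LOSS_OF_PATHOGENICITY),
    Interaction("EX:3", "geneB", "barley", "leaf", Phenotype.EFFECTOR),
]
print(find_conflicts(records))
```

Keying on the (gene, host, tissue) triple makes the "only one term per interaction" rule checkable: any key that collects two different terms is reported for review.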

Additional content and species coverage

PHI-base version 3.6 contains information on 2875 genes, 4102 interactions, 110 host species and 160 pathogenic species. The pathogen species include 103 plant, 3 fungal and 54 animal infecting species. The organisms in the database cause 181 different diseases, and the data were obtained from 1243 peer-reviewed references. The functional gene information included was curated from studies published between 1987 and the end of 2013. Details of the host and pathogen species coverage are given in Table 2 and Supplementary Table S1. One-third of the prokaryote interactions now involve a human pathogen, with the highest number of 115 interactions from Salmonella species. For plant infecting bacteria, the highest numbers are 300 and 161 interactions from Xanthomonas and Pseudomonas species, respectively. The fungal pathogen interactions are dominated by the Ascomycetes (67 species) followed by the Basidiomycetes (8 species), providing 2759 and 405 interactions, respectively. The fungal interactions are also predominantly from plant infecting species (2645 interactions) compared to animal/human infecting species (519 interactions). The number of interactions from the eight oomycete species is far lower at 86, which are all from plant infecting species. The newly curated plant infecting nematodes and aphids and animal/human infecting parasites provide 43 interactions from 9 species. The new data are summarized by host type and pathogen species taxonomy in Table 2. The plant pathogen species providing the greatest number of interactions are the cereal infecting fungi Fusarium graminearum, Magnaporthe oryzae and Ustilago maydis, Xanthomonas bacteria, the dicotyledonous infecting fungus Botrytis cinerea and Pseudomonas bacteria. For animal/human infecting species, the greatest number of interactions are provided by the fungi Candida albicans and Cryptococcus neoformans and the bacterium Salmonella enterica (Table 2).
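The counts quoted in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only figures stated in the text; the variable names are chosen here for illustration.

```python
# Figures quoted in the text for PHI-base version 3.6.
pathogen_species = {"plant": 103, "fungal": 3, "animal": 54}
fungal_by_phylum = {"Ascomycetes": 2759, "Basidiomycetes": 405}
fungal_by_host = {"plant": 2645, "animal/human": 519}

total_species = sum(pathogen_species.values())
fungal_total_by_phylum = sum(fungal_by_phylum.values())
fungal_total_by_host = sum(fungal_by_host.values())

print(total_species)
print(fungal_total_by_phylum, fungal_total_by_host)
```

The species breakdown adds up to the stated 160 pathogenic species, and the two ways of partitioning the fungal interactions (by phylum and by host type) give the same total.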

The nine new high-level phenotypic outcome terms are defined in Table 3. These have been included in the advanced search to permit researchers to explore the database across a wide range of taxonomically diverse species which exhibit very varied pathogenic lifestyles. Only the entry types ‘effector’ and ‘enhanced antagonism’ are limited to plant infecting species. In total, 84 interactions from a total of 23 species have the outcome ‘increased virulence (hypervirulence)’. This expanding number is noteworthy and suggests that negative regulation of key pathogenicity processes commonly occurs during the infection and colonization of both plant and animal hosts. Also of interest are the 1224 interactions (29.8% of the entire database content) with the outcome ‘unaffected pathogenicity’. The majority of these have been reported for plant pathogens. These negative outcomes are usually presumed by the authors to indicate that the gene product does not have a role in the pathogenic process under investigation, or to have arisen due to genetic redundancy, i.e. the function of a highly homologous gene replaces the function of the missing gene product under experimental evaluation. In some studies, the inclusion of double-gene deletion results has been able to clarify the situation. For example, the Candida albicans gene PDE1 (PHI:857) has been implicated in virulence. The PDE1 mutant alone is unaffected in pathogenicity. However, the double-gene deletion of PDE1 and PDE2 shows a more severe effect than deletion of the PDE2 (PHI:856) gene on its own (24). In Magnaporthe oryzae (formerly called M. grisea), deletion of the individual genes MoRgs1 (PHI:2192) and MoRgs4 (PHI:2195) led to a reduced-virulence phenotype, but the double-gene deletion rgs1 rgs4 mutant has a more severe ‘loss of pathogenicity’ phenotype (25). In the animal pathogen Vibrio cholerae, the effect of a triple mutation on biofilm formation and virulence was used to test the combined function of tatA (PHI:2415), tatB (PHI:2416) and tatC (PHI:2417), and revealed that this small gene family was required for virulence in mice (25). Going forward, the use of the ‘unaffected pathogenicity’ category in comparative species analyses will be particularly informative when the genes involved are present in only one copy per species. This approach will reveal which genes function in a species-specific or taxon clade-specific manner.

Table 1. Multispecies databases and websites involving plant, human and/or animal infecting pathogens which contain information complementary to the data in PHI-base

Name and ref. [a] | URL (http://) | Comments
Broad-Fungal Genome Initiative | www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative | Genome browsing and comparative analysis for several plant pathogen divisions
DFVF (12) | sysbio.unl.edu/DFVF | Fungal virulence factor database generated using text-mining of the PubMed database and Internet
e-Fungi (13) | www.cs.man.ac.uk/∼cornell/eFungi | Rich source of ESTs obtained by Sanger sequencing
Ensembl Genomes (14) | www.ensemblgenomes.org | Non-vertebrate species genomes portal with links to bacteria, fungi, metazoa, plants and protists
Ensembl Bacteria | bacteria.ensembl.org | Genomes of bacteria and archaea
Ensembl Fungi | fungi.ensembl.org | Genomes of fungal species including fungal pathogens
Ensembl Protists | protists.ensembl.org | Genomes of protist species including Phytophthora
Oomycetes Transcriptomics Database (15) | www.eumicrobedb.org/transcripts | Oomycete genomes and transcriptomics
EuPathDB (16) | eupathdb.org | Human pathogens
FRAC | www.frac.info | All known chemical target sites used commercially for the control of pathogens
FungiDB (17) | fungidb.org | Fungal genomics database providing graphical tools for data mining
HPIDB (18) | agbase.msstate.edu | Fifteen human virus pathogens; protein-protein interaction data
JGI-MycoCosm (19) | genome.jgi.doe.gov/programs/fungi | A genome portal for 100s of pathogenic and non-pathogenic fungal species
Pathogen Portal | www.pathogenportal.org | Emerging or re-emerging pathogens, potential biowarfare or bioterrorism pathogens
PHIDIAS (20) | www.phidias.us | Medical fungal and bacterial pathogens
PhytoPath | www.phytopathdb.org | PhytoPath: 32 fungi, 14 protists, 12 bacterial species linked to PHI-base
PLEXdb (21) | www.plexdb.org | Transcriptomics data only, on plants, pathogens and during interactions
USDA | nt.ars-grin.gov/fungaldatabases | Description of all the known hosts of fungi which infect plants
VFDB (22) | www.mgc.ac.cn/VFs | Virulence factors of human and animal bacterial pathogens

[a] Reference provided where available.

The high-level phenotypic outcomes for all interactions are summarized in Table 4. A total of 120 PHI-base accessions have been assigned the high-level phenotypic outcome ‘Essential (lethal)’. In these studies, mainly two types of experimental data were reported. First, in Aspergillus fumigatus, a promoter replacement strategy was employed to construct conditional mutants. For these mutants, the addition of ammonium into the nitrogen source switches off gene expression, and this allows functional gene tests of essential genes (26). Secondly, in genome-wide gene replacement studies in Gibberella zeae, no transformants were recovered in repeated experiments, while transformants were recovered for many other genes. Thus, the authors considered that the gene's function was ‘essential for life’ (7,8).

A ‘mixed outcome’ of phenotypes can be assigned when the transgenic mutants generated are tested on either multiple host species or different tissues/organs of the same host species. Different outcomes on hosts belonging to different kingdoms potentially indicate a differential host requirement. For example, Fusarium oxysporum is able to systemically infect tomato plants and immune-compromised mice. The PHI-base entries PHI:215, PHI:285 and PHI:315 reveal a differential requirement for cell-signalling and cell wall formation of three genes during the pathogenesis of plant and animal hosts.

Table 2. Interactions in PHI-base version 3.6 grouped by either host species or pathogen species

Host/Entry type                          Interactions
TOTAL [a]                                4102
PROKARYOTES (55) [b]                     804
  Animal hosts (16) [c]                  249 (31%)
    Salmonella spp. (3) [d]              115
  Plant hosts (29)                       555 (69%)
    Xanthomonas spp. (10)                300
    Pseudomonas spp. (7)                 161
    Erwinia amylovora                    29
    Pectobacterium spp. (3)              10
EUKARYOTES (105)                         3298
  Animal hosts (20)                      549 (16.6%)
    Ascomycetes (17)                     375
      Candida spp. (5)                   238
      Aspergillus fumigatus              98
    Basidiomycetes (4)                   144
      Cryptococcus neoformans            136
    Parasitic species (5) [e]            30
  Plant hosts (93)                       2744 (83.2%)
    Ascomycetes (60)                     2384
      Fusaria - cereal infecting (7)     1053
      Fusarium graminearum               1042
      Magnaporthe spp. (3)               575
      Botrytis spp. (2)                  205
      Fusaria - dicot infecting (6)      93
      Cochliobolus (5)                   88
      Alternaria spp. (4)                78
      Colletotrichum (9)                 48
      Stagonospora nodorum               44
      Zymoseptoria tritici               42
    Basidiomycetes (4)                   261
      Ustilago maydis                    243
      Melampsora lini                    7
    Oomycetes (8)                        86
      Phytophthora spp. (5)              53
      Hyaloperonospora spp. (2)          30
    Others (4)                           13
      Aphids (2)                         10
      Nematodes (2)                      3
  Fungal hosts (3)                       4
  Endophyte (1)                          5
    Epichloe festucae                    5

[a] Only highly represented taxon groups are listed. For a complete list of species in the database, see Supplementary Table S1.
[b] The table is divided into prokaryote and eukaryote host species. The species count number is listed in brackets.
[c] Host species are further divided into animal and plant host.
[d] Left-indented genera and species infect or belong to the taxonomic group listed non-indented above. Only main representative organisms are listed.
[e] Parasitic species are Leishmania infantum, L. mexicana, Toxoplasma gondii, Trypanosoma brucei and T. cruzi.
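One way to picture the ‘mixed outcome’ idea is to group interaction records by gene and flag genes whose outcomes differ across hosts or tissues. The Python sketch below is an illustration under assumptions: the record layout, the function name and the example rows are invented here and do not reproduce PHI-base entries or its curation software.

```python
from collections import defaultdict


def genes_with_mixed_outcome(records):
    """Map each gene to its distinct outcomes, keeping only genes with several.

    records: iterable of (gene, host, tissue, phenotype) tuples.
    """
    outcomes = defaultdict(set)
    for gene, _host, _tissue, phenotype in records:
        outcomes[gene].add(phenotype)
    return {g: sorted(p) for g, p in outcomes.items() if len(p) > 1}


# Invented placeholder rows: one gene tested on a plant and an animal host.
example = [
    ("geneX", "tomato", "root", "reduced virulence"),
    ("geneX", "mouse", "systemic", "unaffected pathogenicity"),
    ("geneY", "tomato", "root", "loss of pathogenicity"),
    ("geneY", "mouse", "systemic", "loss of pathogenicity"),
]
print(genes_with_mixed_outcome(example))
```

In this toy input only geneX would be flagged, because geneY shows the same outcome on both hosts.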

Integration with other database sources

PHI-base is a gene-centric database. Each gene has its own PHI-base accession number. One advantage of this design is that phenotypic information is directly linked to a specific gene. This phenotypic information can then easily be mapped to genomes. Additional information such as GO terms and protein structure information is then extracted from other databases. In our current curation, we prioritise the use of UniProt accessions (27) to facilitate subsequent bioinformatics analysis. During the curation process, our biocurators map reported EMBL or GenBank accessions to existing UniProt identifiers where these exist. However, for species where protein accessions are not available in UniProt at the time of curation and authors did not provide GenBank accession numbers in their studies, only limited or no information on the gene/protein can be provided in PHI-base until this information becomes available.

Whole-genome information is increasingly available for plant and animal pathogens. We have mapped phenotypes in PHI-base via their gene accessions to reference genomic sequences available in Ensembl Genomes sites for fungi, protists (including oomycetes) and bacteria (26). In total, 1550 out of 2047 interactions involved in plant pathogenesis from pathogens with an available reference genome have been mapped to Ensembl Genomes. The remainder of the PHI-base accessions are either associated with only genetic data, or the genome sequence information is still missing, or are associated with previously reported sequences and isolates that differ from those in the published reference genomes. Work is continuing to resolve these cases.
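As a rough illustration of the mapping step, the sketch below splits accessions into mapped and unmapped sets and computes the mapped share for the figures quoted above. The dictionary layout, the function names and the placeholder identifiers are assumptions made here; they do not describe the actual PHI-base or Ensembl Genomes pipeline.

```python
def split_by_mapping(accession_to_locus):
    """Separate accessions with a genome locus from those without one."""
    mapped = {a: locus for a, locus in accession_to_locus.items() if locus is not None}
    unmapped = sorted(a for a, locus in accession_to_locus.items() if locus is None)
    return mapped, unmapped


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Placeholder identifiers, invented for illustration only.
toy = {"EX:1": "LOCUS_0001", "EX:2": None, "EX:3": "LOCUS_0002"}
mapped, unmapped = split_by_mapping(toy)
print(len(mapped), unmapped)

# Share of plant-pathogenesis interactions mapped, using the quoted counts.
print(percent(1550, 2047))
```

With the counts from the text, roughly three quarters of the eligible interactions are mapped, leaving 497 cases in the unresolved categories listed above.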

Functional analysis of PHI-base accessions

The entire contents of PHI-base are available to users from the ‘Download’ section, where sequence information is available for 2527 PHI-base accessions. We surveyed the content of PHI-base accessions by cataloguing the protein accessions according to their GO classification using Blast2GO software and standard parameters (28). GO terms were assigned to 63% of PHI-base accessions (Figure 2). For a total of 37% (929 proteins), no GO annotation could be made. Many of these accessions are species-specific proteins and are effectors. The major GO categories assigned included (i) metabolic processes, (ii) cellular processes such as cell communication and (iii) single-organism processes such as cell proliferation, filamentous growth and pigmentation. Microbial pigments in pathogens are known to provide protection against ultraviolet radiation, host-defence products and other stresses encountered during host invasion.
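The annotation coverage quoted above can be reproduced from the two counts given, assuming the percentages are taken over the 2527 accessions with sequence information (an assumption made for this sketch; the text does not state the denominator explicitly).

```python
with_sequence = 2527   # accessions with sequence information
without_go = 929       # proteins for which no GO annotation could be made

with_go = with_sequence - without_go
share_without = round(100 * without_go / with_sequence)
share_with = round(100 * with_go / with_sequence)

print(with_go, share_with, share_without)
```

Under that assumption the two rounded shares agree with the 63% and 37% reported in the text.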

The category ‘cell killing’ was only assigned to six accessions and included Pseudomonas effectors and the Vibrio cholerae enterotoxin. This low number is an unexpected result because, for many of the host-pathogen interactions catalogued in PHI-base, host cell death occurs at some point, i.e. in interactions involving pathogens with a necrotrophic or hemibiotrophic lifestyle.

TECHNICAL DEVELOPMENTS, CURATION AND OUTREACH

Data curation and release management

In the NAR 2008 article (11), we provided the details of the curation procedure in use. This procedure is still in place. However, due to the increasing volume of literature requiring curation (Figure 1), we now use additional procedures. Primarily, papers are found in the literature databases Web of Science and PubMed using the keyword search terms


Table 3. Definitions for the nine high-level phenotype outcomes used in PHI-base

| High-level phenotype outcome (a) | Definition |
| --- | --- |
| Loss of pathogenicity | The transgenic strain fails to cause disease that is observed in the wild type (i.e. qualitative effect). |
| Reduced virulence | The transgenic strain still causes some disease formation but fewer symptoms than the wild-type strain (i.e. a quantitative effect). Synonymous with the term reduced aggressiveness. |
| Unaffected pathogenicity | The transgenic strain, which expresses altered levels of a specific gene product(s), causes the same level of disease compared to the wild-type reference strain. |
| Increased virulence (Hypervirulence) | The transgenic strain causes greater incidence or severity of disease than the wild-type strain. |
| Effector (plant avirulence determinant) | Some effector genes are required to cause disease on susceptible hosts, but most are not. A plant pathogen-specific term which was previously referred to as a corresponding avirulence (Avr) gene. An effector gene is formally identified because its presence leads to the direct or indirect recognition of a pathogen in resistant host genotypes which possess the corresponding disease resistance (R) gene. Positive recognition leads to activation of plant defense and the pathogen either fails to cause disease or causes less disease. In the absence of the pathogen, effector delivery into a healthy plant possessing the corresponding R gene activates plant defense responses. |
| Lethal | The transgenic strain is not viable. The gene product is essential for life of the organism. |
| Enhanced antagonism | The transgenic strain shows greater endophytic biomass in the host and/or the formation of visible disease symptoms. |
| Resistant to chemical | The transgenic strain (b) grows and/or develops normally when exposed to chemistry concentrations that are detrimental to the wild-type strain. |
| Sensitive to chemical | The transgenic strain, which expresses either no or reduced levels of a specific gene product(s) or possesses a specific gene mutation(s), has the same ability (c) as the wild-type strain to grow and develop when exposed to detrimental chemistry concentrations. |

(a) Compared to wild-type reference strain (i.e. a direct isogenic strain comparison).
(b) Molecular studies on natural field isolate population are also considered once the natural target site has been identified.
(c) On rare occasions increased sensitivity to chemistry has been observed.
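The controlled vocabulary in Table 3 lends itself to a closed enumeration. The following is a minimal sketch only: the class name `PhenotypeOutcome` and the helper `parse_outcome` are illustrative names invented here and are not part of PHI-base or its software.

```python
from enum import Enum


class PhenotypeOutcome(Enum):
    """The nine high-level phenotype outcome terms listed in Table 3."""

    LOSS_OF_PATHOGENICITY = "loss of pathogenicity"
    REDUCED_VIRULENCE = "reduced virulence"
    UNAFFECTED_PATHOGENICITY = "unaffected pathogenicity"
    INCREASED_VIRULENCE = "increased virulence (hypervirulence)"
    EFFECTOR = "effector (plant avirulence determinant)"
    LETHAL = "lethal"
    ENHANCED_ANTAGONISM = "enhanced antagonism"
    RESISTANT_TO_CHEMICAL = "resistant to chemical"
    SENSITIVE_TO_CHEMICAL = "sensitive to chemical"


def parse_outcome(label: str) -> PhenotypeOutcome:
    """Map a free-text label onto the controlled vocabulary, or raise ValueError."""
    # Lower-case and collapse runs of whitespace before comparing.
    normalised = " ".join(label.lower().split())
    for term in PhenotypeOutcome:
        # Accept the full term or its short form without the parenthetical.
        short = term.value.split(" (")[0]
        if normalised in (term.value, short):
            return term
    raise ValueError(f"not a controlled phenotype term: {label!r}")
```

Rejecting any label outside the nine terms is one way to keep retrieval consistent across species; a real curation tool might instead offer the terms as a fixed pick-list.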

Table 4. Number of interactions per phenotypic group in animal and plant hosts

| Entry type | Animal host (a) | Plant host |
| --- | --- | --- |
| Loss of pathogenicity | 73 | 404 |
| Reduced virulence (b) | 542 | 1056 |
| Increased virulence | 33 | 51 |
| Essential (lethal) | 46 | 74 |
| Unaffected pathogenicity (c) | 80 | 1144 |
| Effector | 0 | 533 |
| Enhanced antagonism | 0 | 4 |
| Resistance to chemistry | 5 | 30 |
| Sensitive to chemistry (d) | 1 | 7 |

(a) Animal and plant-attacking pathogens are listed with their taxonomy ID and lifestyle in Supplementary Table S1.
(b) The three missing entries in this category have other host types.
(c) One entry in this category has a fungal host.
(d) One entry in this category has a fish host.
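The counts in Table 4 can be cross-checked against two figures quoted in the text: the 84 'increased virulence' interactions and the 29.8% share of 'unaffected pathogenicity' entries. The sketch below simply transcribes the table; the names `TABLE_4`, `row_total` and `share_of_database` are illustrative, and entries with other host types (table footnotes b to d) are not included in the sums.

```python
# Interaction counts transcribed from Table 4: (animal host, plant host).
TABLE_4 = {
    "Loss of pathogenicity": (73, 404),
    "Reduced virulence": (542, 1056),
    "Increased virulence": (33, 51),
    "Essential (lethal)": (46, 74),
    "Unaffected pathogenicity": (80, 1144),
    "Effector": (0, 533),
    "Enhanced antagonism": (0, 4),
    "Resistance to chemistry": (5, 30),
    "Sensitive to chemistry": (1, 7),
}

TOTAL_INTERACTIONS = 4102  # total for PHI-base 3.6 as reported in the text


def row_total(entry_type: str) -> int:
    """Animal-host plus plant-host interactions for one entry type."""
    animal, plant = TABLE_4[entry_type]
    return animal + plant


def share_of_database(entry_type: str) -> float:
    """Percentage of all interactions that fall under one entry type."""
    return 100.0 * row_total(entry_type) / TOTAL_INTERACTIONS


print(row_total("Increased virulence"))                         # 84
print(round(share_of_database("Unaffected pathogenicity"), 1))  # 29.8
```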

'(fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene)' (29). Text mining is not employed because relevant information has to be extracted by analysing figures, tables and text in the peer-reviewed articles. This task can only be done by trained biocurators with a strong understanding of the research area. PHI-base relies heavily on support from the scientific community to suggest relevant articles for curation and for the subsequent quality control of entries. The PHI-base team does not have any individual member solely dedicated to data curation. Instead, team members curate data on a part-time basis and when the need arises. In an effort to close a curation gap for articles published between 1984 and 2014, a collaboration was established with the curation scientists at Molecular Connections, Bangalore, India. The biocurators give priority to author-assigned gene function over computationally transferred annotation such as GO terms. The author-assigned function is frequently extracted from either the title or the abstract. Experts from the scientific community are invited on a regular basis to verify new records before they are uploaded into the database and to provide quality control.
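The keyword search can be read as three groups of alternatives that must all be satisfied. As a rough sketch, the code below rebuilds the query string and applies the same logic to a local piece of text. It assumes a trailing `*` means a prefix wildcard and that matching is case-insensitive; the literature databases may interpret the query differently, and all function names here are illustrative.

```python
import re

# Keyword groups from the curation search: terms are OR-ed within a group,
# and the groups are AND-ed together. A trailing * is treated as a prefix wildcard.
SEARCH_GROUPS = [
    ["fung*", "yeast"],
    ["gene", "factor"],
    ["pathogenicity", "virulen*", "avirulence gene"],
]


def build_query(groups):
    """Render the groups as a single boolean query string."""
    return " and ".join("(" + " or ".join(g) + ")" for g in groups)


def term_pattern(term):
    """Compile one term into a whole-word (or prefix, for *) regular expression."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches(text, groups=SEARCH_GROUPS):
    """True if every group has at least one term present in the text."""
    return all(any(term_pattern(t).search(text) for t in g) for g in groups)
```

A filter like this only narrows the candidate list; as the text notes, the curation itself still depends on biocurators reading figures, tables and text.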

Mapping PHI-base phenotypes to Ensembl Genomes

Through the cross-referencing with Ensembl Genomes (http://ensemblgenomes.org), PHI-base annotations can now be visualized directly in their genomic context, identifying features such as pathogenicity islands through a simple system of colour coding using the nine high-level phenotyping terms. This new way to explore the data in PHI-base is shown in Figure 3. The phenotyping term 'mixed outcome' is also used to identify genes where a range of interaction outcomes have been identified, depending on the host species and/or tissue type evaluated.
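One way to picture the gene-level colour coding is to collapse all interaction outcomes recorded for a gene into a single display term, falling back to 'mixed outcome' when they disagree. This is a sketch under stated assumptions, not the Ensembl Genomes implementation: only the green and orange assignments are taken from the Figure 3 legend, the grey fallback is a placeholder, and the function names are invented for illustration.

```python
# Only the two colours named in the Figure 3 legend come from the paper;
# the fallback colour is a placeholder chosen for this sketch.
KNOWN_COLOURS = {
    "loss of pathogenicity": "green",
    "reduced virulence": "orange",
}
FALLBACK_COLOUR = "grey"  # hypothetical


def gene_level_term(outcomes):
    """Collapse the outcomes of all interactions recorded for one gene."""
    distinct = set(outcomes)
    if not distinct:
        raise ValueError("a gene needs at least one curated interaction")
    if len(distinct) == 1:
        return distinct.pop()
    # Different hosts or tissues gave different outcomes.
    return "mixed outcome"


def display_colour(outcomes):
    """Colour used to draw the gene, given its interaction outcomes."""
    return KNOWN_COLOURS.get(gene_level_term(outcomes), FALLBACK_COLOUR)
```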


Figure 2. GO terms assigned to PHI-base accessions in Version 3.6, mapped to a biological process.

APPLICATIONS OF PHI-BASE

PHI-base use has been cited in over 100 peer-reviewed publications. These publications are listed in year order in the 'About' section of the database. Recently published PHI-base use cases include genome mining and comparative genomics (30,31), the selection and functional testing of candidate virulence factors in newly sequenced fungal and nematode pathogens of agricultural importance (10,11) and studies investigating the subtle differences between pathogen and biocontrol species (23). In Table 5 the main uses of PHI-base are given along with literature examples (14,30-46). In the past 4 years we have observed a gradual shift in PHI-base use, with an increase in the number of larger comparative gene function studies and investigations reporting the in silico prediction of virulence-associated genes.

FORTHCOMING DEVELOPMENTS

Tools for community-led curation

A big challenge facing all biological databases is the growing quantity of data and the relative difficulty of obtaining resources to curate the knowledge that derives from it. For the pathogen-host interaction community, the scale of the problem is considerable (Figure 1). One solution is to encourage community-based curation, particularly by the authors of scientific publications, who may be motivated to have their work correctly represented within the database and who are the experts in their own specialist domains (although they may not be expert in the conventions in use within the database). Inclusion of studies in PHI-base also improves their visibility and accessibility. PHI-base has a curation model based on community contribution, although hitherto this has involved certain collaborators curating many papers in their own area of expertise after prior training in the data entry tools. A more scalable model would allow all users to directly curate their own papers without prior training. A new easy-to-use web-based interface for direct access by the wider community is currently in development.

The PHI-base web-based curation tool will facilitate curation of pathogen-host interactions from peer-reviewed literature into PHI-base by the authors doing the experimental analyses. This curation tool will be based on the recently developed Canto tool, an online tool that supports functional gene annotation (47). Canto is part of the Generic Model Organism Database project, which provides a suite of open software for managing genetic data (http://www.gmod.org). Canto has proven effective for the community-based curation of data for the fission yeast database PomBase (http://www.pombase.org) (48). The PHI-base curation tool will use ontological data from a variety of sources, most notably from the Open Biological and Biomedical Ontologies Foundry (http://www.obofoundry.org) (49). However, some terminology is specific to the nature of the interactions captured in PHI-base, so will require the development of new controlled vocabularies for this purpose. For example, an 'interaction evidence' ontology will be created to specify the evidence for pathogen-host interactions, thus complementing the gene-centric data from the GO. Also, in addition to the controlled high-level vocabulary above describing the phenotype of the pathogen (Table 4), a similar controlled vocabulary can be created to describe the effect the interaction has on the host organism. To ensure quality and consistency of the curated data, all annotations will still be approved by a curator or expert with knowledge of the species and the captured data.


Figure 3. Inspection of gene function using the Ensembl genome browser. (A) Displayed is a small chromosomal region in Magnaporthe oryzae showing two genes involved in pathogenicity (as annotated in PHI-base) in their genomic context (viewable in the Ensembl browser in the transcript display). A colour code indicates the annotated role of each gene: green, 'loss of pathogenicity', and orange, 'reduced virulence'. (B) By selecting each colour-coded MGG transcript ID, information is revealed on the associated gene deletion study curated in the PHI-base database.


Table 5. PHI-base uses that have often appeared in the peer-reviewed literature

| Use case | Type of research study | Example reference |
| --- | --- | --- |
| 1. Annotation and candidate gene selection | Large scale forward genetics screens | (32,33) |
| | Transcriptome studies (RNAseq, microarrays, ESTs) | (34,36,45) |
| | Full and partial genome annotation, genome mining | (30,31,37) |
| 2. Predictive bioinformatics analyses | Networks, protein-protein interaction mapping | (35,38,39) |
| 3. Complementary databases | | (14,40,41) |
| 4. Review articles | | (42,44,46) |
| 5. Single gene function studies | Intra-comparison and inter-comparison of gene mutants within and between species | (43) |

Tools for data mining

We are currently developing a new tool for the analysis and extraction of whole genomic data from plant pathogens as part of the PhytoPath project (http://www.phytopathdb.org), using the data warehousing framework BioMart (50). The tool will allow users to mine genomic data (for sequence and annotation) across multiple species based on PHI-base annotations in conjunction with other annotations. It is expected to be launched before the end of 2015.

Other activities

Our intention is to extend the taxonomic range available within PHI-base to ~200 host-infecting species within the next 2 years. At this level of species coverage, detailed analyses within and between specific groups of pathogens with different infection strategies, host ranges and taxonomic assignments, or between pathogenic and closely related non-pathogenic endophytic or symbiotic lifestyles, should be feasible.

In the next phase of curation, a greater emphasis will be placed on the effector literature, which should increase the number of interactions from bacterial, oomycete and obligate biotrophic species. To accompany this development, the curation of the corresponding host target(s), i.e. the initial molecular partner in the host, has commenced and this important information should soon be available. For example, various bacterial effectors including AvrRpm1 (PHI:977) and AvrRpt2 (PHI:979) are delivered into the plant cytoplasm via the bacterial type III secretion system. These effectors interact with the Arabidopsis protein RIN4 (51). These protein interaction data sets are of growing importance in the analysis of host-pathogen, host-pest and host-parasite interactions, as they typically represent communication events that have co-evolved between biological kingdoms. For example, Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific effectors were found to exhibit an enhanced disease resistance (edr) phenotype to both powdery and downy mildews (52). In addition, by the inclusion of the corresponding host targets, more effectors from obligate biotrophic species can be curated into PHI-base. These effectors are rigorously tested for their role in pathogenicity using a range of other techniques, but not those involving the generation of stable pathogen transformants.

Database access and feedback

PHI-base can be freely accessed at http://www.phi-base.org. The complete database can be downloaded from the 'Download' section. Prior to downloading the entire database to create local Basic Local Alignment Search Tool databases or for other bioinformatics applications (Table 5), users are asked to fill in a registration form. This allows PHI-base to monitor the number of academic and industrial users, a requirement of our sponsors.
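For readers who want to build a local BLAST database from the download, the commands might be assembled as follows. This is a sketch under stated assumptions: it presumes the NCBI BLAST+ programs `makeblastdb` and `blastp` are installed and that the downloaded sequences are available as a protein FASTA file. The file names, database name and e-value cut-off are placeholders chosen here, not values prescribed by PHI-base.

```python
import shlex

# File and database names are placeholders; PHI-base does not prescribe them.
FASTA = "phi_base_proteins.fasta"
DB_NAME = "phi_base_db"


def makeblastdb_cmd(fasta=FASTA, db_name=DB_NAME):
    """Command to index a protein FASTA file as a local BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query, db_name=DB_NAME, evalue="1e-5"):
    """Command to search query proteins against the local database (tabular output)."""
    return ["blastp", "-query", query, "-db", db_name,
            "-evalue", evalue, "-outfmt", "6"]


if __name__ == "__main__":
    # Print the commands; pass the lists to subprocess.run(..., check=True) to execute them.
    print(shlex.join(makeblastdb_cmd()))
    print(shlex.join(blastp_cmd("my_proteome.fasta")))
```

Keeping the commands as argument lists avoids shell quoting problems if file paths contain spaces.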

User support can be obtained by email from contact@phi-base.org. Please use this email address if you wish to provide new data for inclusion in PHI-base, if you are an expert willing to assist with curation, for the nomination of peer-reviewed papers to be curated, or if you can provide suggestions for improvement to the PHI-base website.

To increase the awareness of PHI-base developments within the community and for users to be notified when new releases occur, we have developed a PHI-base user mailing list (users@lists.phi-base.org). Users can subscribe from a link on the PHI-base website in the 'Help' section or directly by going to https://www.lists.rothamsted.ac.uk/mailman/listinfo/users.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts who contributed database annotations from their field of expertise into PHI-base and the members of our Scientific Advisory Board: Dr Michael Csukai (Syngenta, Jealott's Hill, UK), Professor Jonathan Jones (The Sainsbury Laboratory, UK), Dr Leighton Pritchard (The James Hutton, UK) and Professor Pietro Spanu (Imperial College, UK). Dr Jan Taubert (formerly Rothamsted Research) is thanked for his expert support in maintaining the PHI-base parser. We thank Drs Paul Kersey, Uma Maheswari and Dan Staines at the European Bioinformatics Institute (Cambridge) for helpful discussions and for significantly improving the pathogen species content within Ensembl Genomes.

FUNDING

This work is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/I001077/1, BB/K020056/1]. PHI-base receives additional support from the BBSRC as a National Capability [BB/J004383/1]. Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K020056/1].

Conflict of interest statement. None declared.

REFERENCES

1. Dangl JL, Horvath DM and Staskawicz BJ (2013) Pivoting the plant immune system from dissection to deployment. Science, 341, 746-751.
2. Fisher MC, Henk DA, Briggs CJ, Brownstein JS, Madoff LC, McCraw SL and Gurr SJ (2012) Emerging fungal threats to animal, plant and ecosystem health. Nature, 484, 186-194.
3. Napoli C and Staskawicz B (1987) Molecular characterization and nucleic acid sequence of an avirulence gene from race 6 of Pseudomonas syringae pv. glycinea. J. Bacteriol., 169, 572-578.
4. Staskawicz BJ, Dahlbeck D and Keen NT (1984) Cloned avirulence gene of Pseudomonas syringae pv. glycinea determines race-specific incompatibility on Glycine max (L.) Merr. Proc. Natl Acad. Sci. U.S.A., 81, 6024-6028.
5. Liolios K, Tavernarakis N, Hugenholtz P and Kyrpides NC (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res., 34, D332-D334.
6. Nowara D, Gay A, Lacomme C, Shaw J, Ridout C, Douchkov D, Hensel G, Kumlehn J and Schweizer P (2010) HIGS: host-induced gene silencing in the obligate biotrophic fungal pathogen Blumeria graminis. Plant Cell, 22, 3130-3141.
7. Son H, Seo YS, Min K, Park AR, Lee J, Jin JM, Lin Y, Cao P, Hong SY, Kim EK et al. (2011) A phenome-based functional analysis of transcription factors in the cereal head blight fungus, Fusarium graminearum. PLoS Pathog., 7, e1002310.
8. Wang C, Zhang S, Hou R, Zhao Z, Zheng Q, Xu Q, Zheng D, Wang G, Liu H, Gao X et al. (2011) Functional analysis of the kinome of the wheat scab fungus Fusarium graminearum. PLoS Pathog., 7, e1002460.
9. Gene Ontology Consortium (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530-D535.
10. Winnenburg R, Baldwin TK, Urban M, Rawlings C, Kohler J and Hammond-Kosack KE (2006) PHI-base: a new database for pathogen host interactions. Nucleic Acids Res., 34, D459-D464.
11. Winnenburg R, Urban M, Beacham A, Baldwin TK, Holland S, Lindeberg M, Hansen H, Rawlings C, Hammond-Kosack KE and Kohler J (2008) PHI-base update: additions to the pathogen host interaction database. Nucleic Acids Res., 36, D572-D576.
12. Lu T, Yao B and Zhang C (2012) DFVF: database of fungal virulence factors. Database, 2012, doi:10.1093/database/bas032.
13. Hedeler C, Wong HM, Cornell MJ, Alam I, Soanes DM, Rattray M, Hubbard SJ, Talbot NJ, Oliver SG and Paton NW (2007) e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genom., 8, 426.
14. Kersey PJ, Allen JE, Christensen M, Davis P, Falin LJ, Grabmueller C, Hughes DS, Humphrey J, Kerhornou A, Khobova J et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546-D552.
15. Tripathy S, Deo T and Tyler BM (2012) Oomycete Transcriptomics Database: a resource for oomycete transcriptomes. BMC Genom., 13, 303.
16. Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M et al. (2010) EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res., 38, D415-D419.
17. Stajich JE, Harris T, Brunk BP, Brestelli J, Fischer S, Harb OS, Kissinger JC, Li W, Nayak V, Pinney DF et al. (2012) FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res., 40, D675-D681.
18. Kumar R and Nanduri B (2010) HPIDB - a unified resource for host-pathogen interactions. BMC Bioinformatics, 11(Suppl. 6), S16.
19. Grigoriev IV, Nikitin R, Haridas S, Kuo A, Ohm R, Otillar R, Riley R, Salamov A, Zhao X, Korzeniewski F et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699-D704.
20. Xiang Z, Tian Y and He Y (2007) PHIDIAS: a pathogen-host interaction data integration and analysis system. Genome Biol., 8, R150.
21. Dash S, Van Hemert J, Hong L, Wise RP and Dickerson JA (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40, D1194-D1201.
22. Chen L, Xiong Z, Sun L, Yang J and Jin Q (2012) VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res., 40, D641-D645.
23. Lee WS, Hammond-Kosack KE and Kanyuka K (2012) Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing and virus-mediated overexpression of heterologous protein. Plant Physiol., 160, 582-590.
24. Wilson D, Tutulan-Cunita A, Jung W, Hauser NC, Hernandez R, Williamson T, Piekarska K, Rupp S, Young T and Stateva L (2007) Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol. Microbiol., 65, 841-856.
25. Zhang L, Zhu Z, Jing H, Zhang J, Xiong Y, Yan M, Gao S, Wu LF, Xu J and Kan B (2009) Pleiotropic effects of the twin-arginine translocation system on biofilm formation, colonization and virulence in Vibrio cholerae. BMC Microbiol., 9, 114.
26. Hu W, Sillaots S, Lemieux S, Davison J, Kauffman S, Breton A, Linteau A, Xin C, Bowman J, Becker J et al. (2007) Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog., 3, e24.
27. UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191-D198.
28. Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J and Conesa A (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 36, 3420-3435.
29. Baldwin TK, Winnenburg R, Urban M, Rawlings C, Koehler J and Hammond-Kosack KE (2006) The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity. MPMI, 19, 1451-1462.
30. Hane JK, Anderson JP, Williams AH, Sperschneider J and Singh KB (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.
31. Danchin EG, Arguel MJ, Campan-Fournier A, Perfus-Barbeoch L, Magliano M, Rosso MN, Da Rocha M, Da Silva C, Nottet N, Labadie K et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.
32. Jeon J, Park SY, Chi MH, Choi J, Park J, Rho HS, Kim S, Goh J, Yoo S, Choi J et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561-565.
33. Cai Z, Li G, Lin C, Shi T, Zhai L, Chen Y and Huang G (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340-350.
34. Vargas WA, Martin JMS, Rech GE, Rivera LP, Benito EP, Diaz-Minguez JM, Thon MR and Sukno SA (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342-1358.
35. Sperschneider J, Gardiner DM, Taylor JM, Hane JK, Singh KB and Manners JM (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.
36. Thakur K, Chawla V, Bhatti S, Swarnkar MK, Kaur J, Shankar R and Jha G (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre F, Joly DL, Labbe C, Teichmann B, Linning R, Belzile F, Bakkeren G and Belanger RR (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946-1959.

38. Schleker S, Garcia-Garcia J, Klein-Seetharaman J and Oliva B (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991-1018.
39. Liu X, Tang WH, Zhao XM and Chen L (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.
40. Kour A, Greer K, Valent B, Orbach MJ and Soderlund C (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271-278.
41. Bleves S, Dunger I, Walter MC, Frangoulidis D, Kastenmuller G, Voulhoux R and Ruepp A (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671-D676.
42. Van De Wouw AP and Howlett BJ (2011) Fungal pathogenicity genes in the age of 'omics'. Mol. Plant Pathol., 12, 507-514.
43. Doehlemann G, Reissmann S, Assmann D, Fleckenstein M and Kahmann R (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751-766.
44. Dickman MB (2007) Subversion or coersion? Pathogenic determinants in fungal phytopathogens. Fungal Biol. Rev., 21, 125-129.
45. Zhang Y, Zhang K, Fang A, Han Y, Yang J, Xue M, Bao J, Hu D, Zhou B, Sun X et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.
46. Cools HJ and Hammond-Kosack KE (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197-210.
47. Rutherford KM, Harris MA, Lock A, Oliver SG and Wood V (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791-1792.
48. Wood V, Harris MA, McDowall MD, Rutherford K, Vaughan BW, Staines DM, Aslett M, Lock A, Bahler J, Kersey PJ et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695-D699.
49. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251-1255.
50. Kasprzyk A (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.
51. Jones JD and Dangl JL (2006) The plant immune system. Nature, 444, 323-329.
52. Weßling R, Epple P, Altmann S, He Y, Yang L, Henz SR, McDonald N, Wiley K, Bader KC et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microbe, 16, 364-375.


Figure 1. Growth of the number of published articles screened by keyword search for PHI-base and the number of phenotypically curated genes. This figure was generated from literature records retrieved at PubMed and Web of Science using the search terms '(fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene)'. Key events in the history of functional gene analysis of pathogen-host interactions include: (a) identification of the first avirulence gene (4); (b) >1500 genome sequencing projects listed in the GOLD database (5); (c) genome-wide functional analysis of pathogenicity genes in the rice blast fungus Magnaporthe oryzae; (d) the first host-induced gene silencing (HIGS) study involving an obligate biotrophic species (6); (e) genome-wide functional analysis of all transcription factors and protein kinases predicted in the cereal-infecting fungus Fusarium graminearum (7,8).

tion factors and protein kinases in the cereal-infecting fungus Fusarium graminearum (7,8).

Established in 2005, the pathogen-host interactions database (PHI-base) contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions. Phenotypes can be assigned to the outcome of such interactions. Within PHI-base, genes are catalogued when their function in the pathogenic process has been tested through gene disruption and/or transcript level alteration experiments. These genes are termed pathogenicity genes if the effect on the phenotype is qualitative (disease/no disease). They are called virulence/aggressiveness genes if the effect is quantitative. Another category of genes increasingly catalogued in PHI-base is effector genes, formerly known as avirulence genes. Effector genes either activate or suppress plant defence responses.

There are five key motivations to improve the data content of PHI-base and its taxonomic coverage. (i) In the post-genomics era, and with the ever cheaper cost of whole-genome sequencing, there is intense interest in comparative pathogen genomics to identify functionally homologous genes as well as species-unique genes. (ii) The breadth and efficiency of both forward and reverse genetics analysis in plant- and animal-infecting pathogenic species has accelerated the pace of discovery, with generated mutants subject to intense investigation and scrutiny. In many interaction studies, model host species are used increasingly to save costs, but these may or may not yield results equivalent to those obtained in the natural host species. Thus, comparisons with known interactions in the natural host can be informative. (iii) Many gene sequences linked to a pathogenic process lack sufficient formal descriptive annotation such as that provided by Gene Ontology (GO) (9). PHI-base provides a repository for such gene annotation. (iv) Increased species coverage across a wider taxonomic range permits the PHI-base data to be used in silico to predict, with a higher level of confidence, the repertoire of virulence-associated genes in more species. (v) Finally, and most importantly, researchers require free and easy access to different types of interaction information to facilitate hypothesis generation and knowledge discovery.

Here, we report on a major increase in PHI-base gene content, new database features, integration with complementary databases and use cases. The original release of PHI-base was published in the NAR database issue in 2006 (10). A second NAR article in 2008 reviewed additional data and new features available within PHI-base version 3.0 (11). Since then, usage of PHI-base has grown and the PHI-base website receives about 1500 hits per quarter, excluding internal users, with users located in ~89 countries. Several other databases provide information which partially overlaps with either the species data or the biological information provided within PHI-base. These resources include the Fungal Virulence Factor Database (DFVF) (12), the e-Fungi project (13), Ensembl Genomes (14), the Oomycetes Transcriptomics Database (15), the Eukaryotic Pathogen Database Resources (EuPathDB) (16), FungiDB (17), the Host-Pathogen Interaction database on human viruses (HPIDB) (18), JGI-MycoCosm (19), PHIDIAS (20), PLEXdb (21) and the database on virulence factors of pathogenic bacteria (VFDB) (22). These complementary resources and their specialisms are summarized in Table 1. When used collectively, these databases provide prospective and existing users of PHI-base with a substantially enriched environment to pursue a wide range of simple to advanced in silico analyses on pathogenic organisms and the underlying pathogenic processes.

NEW FEATURES

An expanded taxonomic range and controlled vocabulary

Version 3.0, released in 2007, contained information on bacterial, fungal and oomycete pathogens as well as plant endophytes. Version 3.6 now also includes pathogenic plant-infecting nematode and aphid pests and animal/human-infecting parasites (Table 2). Between these versions of PHI-base the total number of pathogenic species has risen from 95 to 160. The number of bacterial pathogens tripled over the same period. In addition, the number of obligate biotrophic species has increased from three to seven. To help PHI-base users become rapidly familiar with the biology of the wider range of pathogens and pests available, a full list of the pathogenic species covered in PHI-base version 3.6 is provided in Supplementary Table S1, along with their NCBI taxon identifier and both the natural and experimental host(s). The number of documented host species naturally infected by each pathogen and the identity of obligate biotrophs among the species are also described. This level of detail is provided to assist users in the selection of pathogenic species to include in comparative genomic analysis. An up-to-date version of Supplementary Table S1 is maintained on the PHI-base 'About' website, reflecting the data for each new release.

A new addition requested by users is the consistent use of a controlled vocabulary of high-level phenotyping terms (Table 3). Currently, nine phenotyping terms are used to permit consistent data retrieval, comparative phenomics across a wide taxonomic range and statistical analysis. Only one term is assigned per host-pathogen interaction. An interaction is defined as the function of one gene on one host and one tissue type. The PHI-base phenotype terms selected are routinely used in research articles, but mapping to GO terms is not supported due to their high-level nature. Since 2008, several new techniques for investigating gene product function have become more widely adopted. For example, for some obligate plant-infecting pathogens, including Blumeria and Puccinia species which infect specific cereal hosts, a novel technique called host-induced gene silencing (HIGS) is used. In HIGS, an antisense construct is expressed from the host species and used to transiently silence a specific pathogen gene during the infection process, which, if successful, results in an altered phenotypic outcome (23). The eight entries PHI:2896 to PHI:2903 were obtained for the Blumeria graminis f. sp. hordei-barley interaction using the new HIGS technique.
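The rule that one interaction is one gene on one host and one tissue type, carrying a single phenotype term, can be expressed as a keyed record. The following is a minimal sketch, not the PHI-base schema: the class names, field names and example values are hypothetical and chosen only to illustrate the constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One gene tested on one host and one tissue type, with one outcome."""

    gene: str
    pathogen: str
    host: str
    tissue: str
    phenotype: str


class InteractionTable:
    """Keeps at most one phenotype term per (gene, host, tissue) combination."""

    def __init__(self):
        self._rows = {}

    def add(self, row: Interaction) -> None:
        key = (row.gene, row.host, row.tissue)
        existing = self._rows.get(key)
        # Refuse a second, conflicting term for the same interaction.
        if existing is not None and existing.phenotype != row.phenotype:
            raise ValueError(f"{key} already has the term {existing.phenotype!r}")
        self._rows[key] = row

    def __len__(self) -> int:
        return len(self._rows)
```

Under this sketch, the same gene tested on a second host or tissue is stored as a separate interaction, which is how one gene can accumulate several, possibly different, outcomes.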

Additional content and species coverage

PHI-base version 36 contains information on 2875 genes4102 interactions 110 host species and 160 pathogenicspecies The pathogen species include 103 plant 3 fun-gal and 54 animal infecting species The organisms in thedatabase cause 181 different diseases and were obtainedfrom 1243 peer-reviewed references The functional geneinformation included was curated from studies publishedbetween 1987 to the end of 2013 Details of the host andpathogen species coverage is given in Table 2 and Supple-mentary Table S1 One-third of the prokaryote interactionsnow involve a human pathogen with the highest numberof 115 interactions from Salmonella species For plant in-fecting bacteria the highest numbers are 300 and 161 in-teractions from Xanthomonas and Pseudomonas species re-spectively The fungal pathogen interactions are dominatedby the Ascomycetes (67 species) followed by the Basid-iomycetes (8 species) providing 2759 and 405 interactionsrespectively The fungal interactions are also predominantlyfrom plant infecting species (2645 interactions) comparedto animalhuman infecting species (519 interactions) Thenumber of interactions from the eight oomycete species isfar lower at 86 which are all from plant infecting speciesThe newly curated plant infecting nematodes and aphidsand animalhuman infecting parasites provide 43 interac-tions from 9 species The new data is summarized by hosttype and pathogen species taxonomy in Table 2 The plantpathogen species providing the greatest number of interac-tions are the cereal infecting fungi Fusarium graminearumMagnaporthe oryzae and Ustilago maydis Xanthomonasbacteria and the dicotyledonous infecting fungus Botrytiscinerea and Pseudomonas bacteria For animalhuman in-fecting species the greatest number of interactions are pro-vided by the fungi Candida albicans and Cryptococcus neo-formans and the bacterium Salmonella entrica (Table 2)

The nine new high-level phenotypic outcome terms are defined in Table 3. These have been included in the advanced search to permit researchers to explore the database across a wide range of taxonomically diverse species which exhibit very varied pathogenic lifestyles. Only the entry types ‘effector’ and ‘enhanced antagonism’ are limited to plant infecting species. In total, 84 interactions from 23 species have the outcome ‘increased virulence (hypervirulence)’. This expanding number is noteworthy and suggests that negative regulation of key pathogenicity processes commonly occurs during the infection and colonization of both plant and animal hosts. Also of interest are the 1224 interactions (29.8% of the entire database content) with the outcome ‘unaffected pathogenicity’. The majority of these have been reported for plant pathogens. These negative outcomes


Table 1. Multispecies databases and websites involving plant, human and/or animal infecting pathogens which contain information complementary to the data in PHI-base

Name and ref.^a | URL (http://) | Comments
Broad-Fungal Genome Initiative | www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative | Genome browsing and comparative analysis for several plant pathogen divisions
DFVF (12) | sysbio.unl.edu/DFVF | Fungal virulence factor database generated using text-mining of the PubMed database and Internet
e-Fungi (13) | www.cs.man.ac.uk/~cornell/eFungi | Rich source of ESTs obtained by Sanger sequencing
Ensembl Genomes (14) | www.ensemblgenomes.org | Non-vertebrate species genomes portal with links to bacteria, fungi, metazoa, plants and protists
Ensembl Bacteria | bacteria.ensembl.org | Genomes of bacteria and archaea
Ensembl Fungi | fungi.ensembl.org | Genomes of fungal species including fungal pathogens
Ensembl Protists | protists.ensembl.org | Genomes of protist species including Phytophthora
Oomycetes Transcriptomics Database (15) | www.eumicrobedb.org/transcripts | Oomycete genomes and transcriptomics
EuPathDB (16) | eupathdb.org | Human pathogens
FRAC | www.frac.info | All known chemical target sites used commercially for the control of pathogens
FungiDB (17) | fungidb.org | Fungal genomics database providing graphical tools for data mining
HPIDB (18) | agbase.msstate.edu | Fifteen human virus pathogens–protein-protein interaction data
JGI-MycoCosm (19) | genome.jgi.doe.gov/programs/fungi | A genome portal for 100s of pathogenic and non-pathogenic fungal species
Pathogen Portal | www.pathogenportal.org | Emerging or re-emerging pathogens, potential biowarfare or bioterrorism pathogens
PHIDIAS (20) | www.phidias.us | Medical fungal and bacterial pathogens
PhytoPath | www.phytopathdb.org | PhytoPath–32 fungi, 14 protists, 12 bacterial species linked to PHI-base
PLEXdb (21) | www.plexdb.org | Transcriptomics data only, on plants, pathogens and during interactions
USDA | nt.ars-grin.gov/fungaldatabases | Description of all the known hosts of fungi which infect plants
VFDB (22) | www.mgc.ac.cn/VFs | Virulence factors of human and animal bacterial pathogens

^a Reference provided where available.

are usually presumed by the authors to indicate that the gene product does not have a role in the pathogenic process under investigation, or to have arisen due to genetic redundancy, i.e. the function of a highly homologous gene replaces the function of the missing gene product under experimental evaluation. In some studies, the inclusion of double-gene deletion results has been able to clarify the situation. For example, the Candida albicans gene PDE1 (PHI:857) has been implicated in virulence. The PDE1 mutant alone is unaffected in pathogenicity. However, the double-gene deletion of PDE1 and PDE2 shows a more severe effect than deletion of the PDE2 (PHI:856) gene on its own (24). In Magnaporthe oryzae (formerly called M. grisea), deletion of the individual genes MoRgs1 (PHI:2192) and MoRgs4 (PHI:2195) led to a reduced-virulence phenotype, but the double-gene deletion rgs1 rgs4 mutant has a more severe ‘loss of pathogenicity’ phenotype (25). In the animal pathogen Vibrio cholerae, the effect of a triple mutation on biofilm formation and virulence was used to test the combined function of tatA (PHI:2415), tatB (PHI:2416) and tatC (PHI:2417), and revealed that this small gene family was required for virulence in mice (25). Going forward, the use of the ‘unaffected pathogenicity’ category in comparative species analyses will be particularly informative when the genes involved are present in only one copy per species. This approach will reveal which genes function in a species-specific or taxon clade-specific manner.

The high-level phenotypic outcomes for all interactions are summarized in Table 4. A total of 120 PHI-base accessions have been assigned the high-level phenotypic outcome ‘Essential (lethal)’. In these studies, mainly two types of experimental data were reported. First, in Aspergillus fumigatus, a promoter replacement strategy was employed to construct conditional mutants. For these mutants, the addition of ammonium into the nitrogen source switches off gene expression, and this allows functional gene tests of essential genes (26). Secondly, in genome-wide gene replacement studies in Gibberella zeae, no transformants were recovered in repeated experiments, while transformants were recovered for many other genes. Thus, the authors considered that the gene’s function was ‘essential for life’ (7,8).
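The totals quoted in the text (84 hypervirulent, 120 essential and 1224 unaffected interactions, the last being 29.8% of the database) can be reproduced from the animal and plant host columns of Table 4. The following is a minimal sketch of that cross-check; the dictionary layout and function names are illustrative assumptions, and only the counts come from the article.

```python
# (animal host, plant host) interaction counts per phenotype, copied from Table 4.
table4 = {
    "Loss of pathogenicity": (73, 404),
    "Reduced virulence": (542, 1056),
    "Increased virulence": (33, 51),
    "Essential (lethal)": (46, 74),
    "Unaffected pathogenicity": (80, 1144),
    "Effector": (0, 533),
    "Enhanced antagonism": (0, 4),
    "Resistance to chemistry": (5, 30),
    "Sensitive to chemistry": (1, 7),
}
TOTAL_INTERACTIONS = 4102  # all interactions in PHI-base 3.6


def combined(phenotype):
    """Animal plus plant host interactions for one phenotype."""
    animal, plant = table4[phenotype]
    return animal + plant


def share_of_database(phenotype):
    """Fraction of all PHI-base interactions carrying this phenotype."""
    return combined(phenotype) / TOTAL_INTERACTIONS


for name in ("Increased virulence", "Essential (lethal)", "Unaffected pathogenicity"):
    print(name, combined(name), f"{share_of_database(name):.1%}")
```

Entries on other host types (see the Table 4 footnotes) are not included in these two columns, so the column sums are not expected to reach 4102.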

A ‘mixed outcome’ of phenotypes can be assigned when the transgenic mutants generated are tested on either multiple host species or different tissues/organs of the same host species. Different outcomes on hosts belonging to different kingdoms potentially indicate a differential host requirement. For example, Fusarium oxysporum is able to systemically infect tomato plants and immune-compromised mice. The PHI-base entries PHI:215, PHI:285 and PHI:315 reveal a differential requirement for cell-signalling and cell wall formation of three genes during the pathogenesis of plant and animal hosts.

Table 2. Interactions in PHI-base version 3.6 grouped by either host species or pathogen species

Host/Entry type | Interactions
TOTAL^a | 4102
PROKARYOTES (55)^b | 804
Animal hosts (16)^c | 249 (31%)
  Salmonella spp. (3)^d | 115
Plant hosts (29) | 555 (69%)
  Xanthomonas spp. (10) | 300
  Pseudomonas spp. (7) | 161
  Erwinia amylovora | 29
  Pectobacterium spp. (3) | 10
EUKARYOTES (105) | 3298
Animal hosts (20) | 549 (16.6%)
  Ascomycetes (17) | 375
    Candida spp. (5) | 238
    Aspergillus fumigatus | 98
  Basidiomycetes (4) | 144
    Cryptococcus neoformans | 136
  Parasitic species (5)^e | 30
Plant hosts (93) | 2744 (83.2%)
  Ascomycetes (60) | 2384
    Fusaria - cereal infecting (7) | 1053
    Fusarium graminearum | 1042
    Magnaporthe spp. (3) | 575
    Botrytis spp. (2) | 205
    Fusaria - dicot infecting (6) | 93
    Cochliobolus (5) | 88
    Alternaria spp. (4) | 78
    Colletotrichum (9) | 48
    Stagonospora nodorum | 44
    Zymoseptoria tritici | 42
  Basidiomycetes (4) | 261
    Ustilago maydis | 243
    Melampsora lini | 7
  Oomycetes (8) | 86
    Phytophthora spp. (5) | 53
    Hyaloperonospora spp. (2) | 30
  Others (4) | 13
    Aphids (2) | 10
    Nematodes (2) | 3
Fungal hosts (3) | 4
Endophyte (1) | 5
  Epichloe festucae | 5

^a Only highly represented taxon groups are listed. For a complete list of species in the database see Supplementary Table S1.
^b The table is divided into prokaryote and eukaryote host species. The species count number is listed in brackets.
^c Host species are further divided into animal and plant hosts.
^d Left-indented genera and species infect or belong to the taxonomic group listed non-indented above. Only main representative organisms are listed.
^e Parasitic species are Leishmania infantum, L. mexicana, Toxoplasma gondii, Trypanosoma brucei and T. cruzi.

Integration with other database sources

PHI-base is a gene-centric database. Each gene has its own PHI-base accession number. One advantage of this design is that phenotypic information is directly linked to a specific gene. This phenotypic information can then easily be mapped to genomes. Additional information, such as GO terms and protein structure information, is then extracted from other databases. In our current curation, we prioritise the use of UniProt accessions (27) to facilitate subsequent bioinformatics analysis. During the curation process, our biocurators map reported EMBL or GenBank accessions to existing UniProt identifiers where these exist. However, for species where protein accessions are not available in UniProt at the time of curation and authors did not provide GenBank accession numbers in their studies, only limited or no information on the gene/protein can be provided in PHI-base until this information becomes available.
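The accession priority described above (UniProt where a mapping exists, otherwise the reported EMBL or GenBank accession, otherwise nothing) can be sketched as a small lookup. This is an illustration only: the function name, the prefix strings and the example identifiers are hypothetical placeholders, and the sketch does not reflect how the PHI-base curation tools are actually implemented.

```python
from typing import Dict, Optional


def resolve_accession(reported: Optional[str],
                      embl_to_uniprot: Dict[str, str]) -> Optional[str]:
    """Pick the accession to store for a curated gene.

    reported        -- EMBL/GenBank accession given by the authors, or None
    embl_to_uniprot -- known mappings from EMBL/GenBank to UniProt identifiers
    """
    if reported is None:
        # No accession in the publication: nothing can be recorded yet.
        return None
    uniprot = embl_to_uniprot.get(reported)
    if uniprot is not None:
        # Preferred case: a UniProt identifier already exists.
        return "UniProt:" + uniprot
    # Fall back to the accession reported by the authors.
    return "EMBL/GenBank:" + reported


# Hypothetical placeholder identifiers, for illustration only.
mapping = {"AB000001": "P00001"}
print(resolve_accession("AB000001", mapping))
print(resolve_accession("AB000002", mapping))
print(resolve_accession(None, mapping))
```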

Whole-genome information is increasingly available for plant and animal pathogens. We have mapped phenotypes in PHI-base via their gene accessions to reference genomic sequences available in the Ensembl Genomes sites for fungi, protists (including oomycetes) and bacteria (14). In total, 1550 out of 2047 interactions involved in plant pathogenesis from pathogens with an available reference genome have been mapped to Ensembl Genomes. The remaining PHI-base accessions are associated with only genetic data, lack genome sequence information, or are associated with previously reported sequences and isolates that differ from those in the published reference genomes. Work is continuing to resolve these cases.

Functional analysis of PHI-base accessions

The entire contents of PHI-base are available to users from the ‘Download’ section, where sequence information is available for 2527 PHI-base accessions. We surveyed the content of PHI-base accessions by cataloguing the protein accessions according to their GO classification, using Blast2GO software with standard parameters (28). GO terms were assigned to 63% of PHI-base accessions (Figure 2). For a total of 37% (929 proteins), no GO annotation could be made. Many of these accessions are species-specific proteins and are effectors. The major GO categories assigned included (i) metabolic processes, (ii) cellular processes such as cell communication and (iii) single-organism processes such as cell proliferation, filamentous growth and pigmentation. Microbial pigments in pathogens are known to provide protection against ultraviolet radiation, host-defence products and other stresses encountered during host invasion.
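As a quick arithmetic check, the annotation percentages follow from the counts given above if they are taken relative to the 2527 accessions with sequence information. That denominator is an assumption made here; the text does not state it explicitly.

```python
# Counts quoted in the text; the denominator is assumed to be the
# 2527 accessions for which sequence information is available.
sequences_available = 2527
without_go = 929

with_go = sequences_available - without_go
pct_without = 100 * without_go / sequences_available
pct_with = 100 * with_go / sequences_available

print(f"with GO terms: {with_go} ({pct_with:.0f}%)")
print(f"without GO terms: {without_go} ({pct_without:.0f}%)")
```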

The category ‘cell killing’ was only assigned to six accessions and included Pseudomonas effectors and the Vibrio cholerae enterotoxin. This low number is an unexpected result because, for many of the host-pathogen interactions catalogued in PHI-base, host cell death occurs at some point, i.e. in interactions involving pathogens with a necrotrophic or hemibiotrophic lifestyle.

TECHNICAL DEVELOPMENTS, CURATION AND OUTREACH

Data curation and release management

In the NAR 2008 article (11), we provided the details of the curation procedure in use. This procedure is still in place. However, due to the increasing volume of literature requiring curation (Figure 1), we now use additional procedures. Primarily, papers are found in the literature databases Web of Science and PubMed using the keyword search terms


Table 3. Definitions for the nine high-level phenotype outcomes used in PHI-base

High-level phenotype outcome^a | Definition
Loss of pathogenicity | The transgenic strain fails to cause disease that is observed in the wild type (i.e. qualitative effect).
Reduced virulence | The transgenic strain still causes some disease formation but fewer symptoms than the wild-type strain (i.e. a quantitative effect). Synonymous with the term reduced aggressiveness.
Unaffected pathogenicity | The transgenic strain, which expresses altered levels of a specific gene product(s), causes the same level of disease compared to the wild-type reference strain.
Increased virulence (Hypervirulence) | The transgenic strain causes greater incidence or severity of disease than the wild-type strain.
Effector (plant avirulence determinant) | Some effector genes are required to cause disease on susceptible hosts, but most are not. A plant pathogen-specific term which was previously referred to as a corresponding avirulence (Avr) gene. An effector gene is formally identified because its presence leads to the direct or indirect recognition of a pathogen in resistant host genotypes which possess the corresponding disease resistance (R) gene. Positive recognition leads to activation of plant defense and the pathogen either fails to cause disease or causes less disease. In the absence of the pathogen, effector delivery into a healthy plant possessing the corresponding R gene activates plant defense responses.
Lethal | The transgenic strain is not viable. The gene product is essential for life of the organism.
Enhanced antagonism | The transgenic strain shows greater endophytic biomass in the host and/or the formation of visible disease symptoms.
Resistant to chemical | The transgenic strain^b grows and/or develops normally when exposed to chemistry concentrations that are detrimental to the wild-type strain.
Sensitive to chemical | The transgenic strain, which expresses either no or reduced levels of a specific gene product(s) or possesses a specific gene mutation(s), has the same ability^c as the wild-type strain to grow and develop when exposed to detrimental chemistry concentrations.

^a Compared to wild-type reference strain (i.e. a direct isogenic strain comparison).
^b Molecular studies on natural field isolate populations are also considered once the natural target site has been identified.
^c On rare occasions, increased sensitivity to chemistry has been observed.

Table 4. Number of interactions per phenotypic group in animal and plant hosts

Entry type | Animal host^a | Plant host
Loss of pathogenicity | 73 | 404
Reduced virulence^b | 542 | 1056
Increased virulence | 33 | 51
Essential (lethal) | 46 | 74
Unaffected pathogenicity^c | 80 | 1144
Effector | 0 | 533
Enhanced antagonism | 0 | 4
Resistance to chemistry | 5 | 30
Sensitive to chemistry^d | 1 | 7

^a Animal and plant-attacking pathogens are listed with their taxonomy ID and lifestyle in Supplementary Table S1.
^b The three missing entries in this category have other host types.
^c One entry in this category has a fungal host.
^d One entry in this category has a fish host.

(fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene) (29). Text mining is not employed because relevant information has to be extracted by analysing figures, tables and text in the peer-reviewed articles. This task can only be done by trained biocurators with a strong understanding of the research area. PHI-base relies heavily on support of the scientific community to suggest relevant articles for curation and for the subsequent quality control of entries. The PHI-base team does not have any individual member solely dedicated to data curation. Instead, team members curate data on a part-time basis and when the need arises. In an effort to close a curation gap for articles published between 1984 and 2014, a collaboration was established with the curation scientists at Molecular Connections, Bangalore, India. The biocurators give priority to author-assigned gene function over computationally transferred annotation such as GO terms. The author-assigned function is frequently extracted from either title or abstract. Experts from the scientific community are invited on a regular basis to verify new records before uploading into the database and to provide quality control.
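The keyword search is a conjunction of three OR-groups. A minimal sketch of applying the same logic to a title or abstract is given below. It assumes the truncated terms are prefix wildcards (fung*, virulen*) and uses plain regular expressions; the actual searches are run in Web of Science and PubMed, whose matching rules (for example stemming) may differ. The names `QUERY` and `matches_query` are illustrative.

```python
import re

# Each inner list is one OR-group; every group must match (AND).
# Truncated terms such as fung* are treated as prefix wildcards.
QUERY = [
    [r"\bfung\w*", r"\byeast\b"],
    [r"\bgene\b", r"\bfactor\b"],
    [r"\bpathogenicity\b", r"\bvirulen\w*", r"\bavirulence gene\b"],
]


def matches_query(text: str) -> bool:
    """Return True if the text satisfies all three keyword groups."""
    return all(
        any(re.search(pattern, text, re.IGNORECASE) for pattern in group)
        for group in QUERY
    )


print(matches_query("A virulence factor of the fungus Candida albicans"))
print(matches_query("Yeast cell cycle gene expression"))
```

A match only nominates a paper for curation; as noted above, the phenotype itself still has to be extracted by a biocurator.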

Mapping PHI-base phenotypes to Ensembl Genomes

Through the cross-referencing with Ensembl Genomes (http://ensemblgenomes.org), PHI-base annotations can now be visualized directly in their genomic context, identifying features such as pathogenicity islands through a simple system of colour coding using the nine high-level phenotyping terms. This new way to explore the data in PHI-base is shown in Figure 3. The phenotyping term ‘mixed outcome’ is also used to identify genes where a range of interaction outcomes have been identified depending on the host species and/or tissue type evaluated.
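The ‘mixed outcome’ rule can be illustrated as a small reduction over the outcomes recorded for one gene. The sketch below is an assumption about how such a display term could be derived, not the Ensembl Genomes implementation. Only the two colours named in the Figure 3 legend are included; the colours of the other terms are not given in this article and are left undefined.

```python
# Colours taken from the Figure 3 legend; other terms are not specified here.
COLOURS = {
    "loss of pathogenicity": "green",
    "reduced virulence": "orange",
}


def display_term(outcomes):
    """Collapse all outcomes recorded for one gene into a single term."""
    distinct = set(outcomes)
    if not distinct:
        return None
    if len(distinct) == 1:
        return next(iter(distinct))
    # Different outcomes on different hosts or tissues.
    return "mixed outcome"


def display_colour(outcomes):
    """Colour for the gene track, or None when no colour is known."""
    return COLOURS.get(display_term(outcomes))


print(display_term(["reduced virulence", "reduced virulence"]))
print(display_term(["reduced virulence", "unaffected pathogenicity"]))
```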


Figure 2. GO terms assigned to PHI-base accessions in Version 3.6 mapped to a biological process.

APPLICATIONS OF PHI-BASE

PHI-base use has been cited in over 100 peer-reviewed publications. These publications are listed in year order in the ‘About’ section of the database. Recently published PHI-base use cases include genome mining and comparative genomics (30,31), the selection and functional testing of candidate virulence factors in newly sequenced fungal and nematode pathogens of agricultural importance (10,11) and studies investigating the subtle differences between pathogen and biocontrol species (23). In Table 5, the main uses of PHI-base are given along with literature examples (14,30–46). In the past 4 years, we have observed a gradual shift in PHI-base use, with an increase in the number of larger comparative gene function studies and investigations reporting the in silico prediction of virulence-associated genes.

FORTHCOMING DEVELOPMENTS

Tools for community-led curation

A big challenge facing all biological databases is the growing quantity of data and the relative difficulty of obtaining resources to curate the knowledge that derives from it. For the pathogen-host interaction community, the scale of the problem is considerable (Figure 1). One solution is to encourage community-based curation, particularly by the authors of scientific publications, who may be motivated to have their work correctly represented within the database and who are the experts in their own specialist domains (although they may not be expert in the conventions in use within the database). Inclusion of studies in PHI-base also improves their visibility and accessibility. PHI-base has a curation model based on community contribution, although hitherto this has involved certain collaborators curating many papers in their own area of expertise after prior training in the data entry tools. A more scalable model would allow all users to directly curate their own papers without prior training. A new easy-to-use web-based interface for direct access by the wider community is currently in development.

The PHI-base web-based curation tool will facilitate curation of pathogen-host interactions from peer-reviewed literature into PHI-base by the authors doing the experimental analyses. This curation tool will be based on the recently developed Canto tool, an online tool that supports functional gene annotation (47). Canto is part of the Generic Model Organism Database project, which provides a suite of open software for managing genetic data (http://www.gmod.org). Canto has proven effective for the community-based curation of data for the fission yeast database PomBase (http://www.pombase.org) (48). The PHI-base curation tool will use ontological data from a variety of sources, most notably from the Open Biological and Biomedical Ontologies Foundry (http://www.obofoundry.org) (49). However, some terminology is specific to the nature of the interactions captured in PHI-base and so will require the development of new controlled vocabularies. For example, an ‘interaction evidence’ ontology will be created to specify the evidence for pathogen-host interactions, thus complementing the gene-centric data from the GO. Also, in addition to the controlled high-level vocabulary described above for the phenotype of the pathogen (Table 4), a similar controlled vocabulary can be created to describe the effect the interaction has on the host organism. To ensure quality and consistency of the curated data, all annotations will still be approved by a curator or expert with knowledge of the species and the captured data.


Figure 3. Inspection of gene function using the Ensembl genome browser. (A) Displayed is a small chromosomal region in Magnaporthe oryzae showing two genes involved in pathogenicity (as annotated in PHI-base) in their genomic context (viewable in the Ensembl browser in the transcript display). A colour code indicates the annotated role of each gene: green, ‘loss of pathogenicity’ and orange, ‘reduced virulence’. (B) By selecting each colour-coded MGG transcript ID, information is revealed on the associated gene deletion study curated in the PHI-base database.


Table 5. PHI-base uses that have often appeared in the peer-reviewed literature

Use case | Type of research study | Example reference
1 Annotation and candidate gene selection | Large scale forward genetics screens | (32,33)
 | Transcriptome studies (RNAseq, microarrays, ESTs) | (34,36,45)
 | Full and partial genome annotation, genome mining | (30,31,37)
2 Predictive bioinformatics analyses | Networks, protein-protein interaction mapping | (35,38,39)
3 Complementary databases | | (14,40,41)
4 Review articles | | (42,44,46)
5 Single gene function studies | Intra-comparison and inter-comparison of gene mutants within and between species | (43)

Tools for data mining

We are currently developing a new tool for the analysis and extraction of whole genomic data from plant pathogens as part of the PhytoPath project (http://www.phytopathdb.org), using the data warehousing framework BioMart (50) and allowing users to mine genomic data (for sequence and annotation) across multiple species based on PHI-base annotations in conjunction with other annotations. The tool is expected to be launched before the end of 2015.

Other activities

Our intention is to extend the taxonomic range available within PHI-base to ~200 host-infecting species within the next 2 years. At this level of species coverage, detailed analyses within and between specific groups of pathogens with different infection strategies, host ranges or taxonomic assignments, or between pathogenic and closely related non-pathogenic, endophytic or symbiotic lifestyles, should be feasible.

In the next phase of curation, a greater emphasis will be placed on the effector literature, which should increase the number of interactions from bacterial, oomycete and obligate biotrophic species. To accompany this development, the curation of the corresponding host target(s), i.e. the initial molecular partner in the host, has commenced and this important information should soon be available. For example, various bacterial effectors, including AvrRpm1 (PHI:977) and AvrRpt2 (PHI:979), are delivered into the plant cytoplasm via the bacterial type III secretion system. These effectors interact with the Arabidopsis protein RIN4 (51). These protein interaction data sets are of growing importance in the analysis of host-pathogen, host-pest and host-parasite interactions, as they typically represent communication events that have co-evolved between biological kingdoms. For example, Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific effectors were found to exhibit an enhanced disease resistance (edr) phenotype to both powdery and downy mildews (52). In addition, by the inclusion of the corresponding host targets, more effectors from obligate biotrophic species can be curated into PHI-base. These effectors are rigorously tested for their role in pathogenicity using a range of other techniques, but not those involving the generation of stable pathogen transformants.

Database access and feedback

PHI-base can be freely accessed at http://www.phi-base.org. The complete database can be downloaded from the ‘Download’ section. Prior to downloading the entire database to create local Basic Local Alignment Search Tool databases or for other bioinformatics applications (Table 5), users are asked to fill in a registration form. This allows PHI-base to monitor the number of academic and industrial users, a requirement by our sponsors.

User support can be obtained from this email: contact@phi-base.org. Please use this email address if you wish to provide new data for inclusion in PHI-base, if you are an expert willing to assist with curation, for the nomination of peer-reviewed papers to be curated or if you can provide suggestions for improvement to the PHI-base website.

To increase the awareness of PHI-base developments within the community and for users to be notified when new releases occur, we have developed a PHI-base user mailing list (users@lists.phi-base.org). Users can subscribe from a link on the PHI-base website in the ‘Help’ section or directly by going to https://www.lists.rothamsted.ac.uk/mailman/listinfo/users.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts who contributed database annotations from their field of expertise into PHI-base and the members of our Scientific Advisory Board: Dr Michael Csukai (Syngenta, Jealott’s Hill, UK), Professor Jonathan Jones (The Sainsbury Laboratory, UK), Dr Leighton Pritchard (The James Hutton Institute, UK) and Professor Pietro Spanu (Imperial College, UK). Dr Jan Taubert (formerly Rothamsted Research) is thanked for his expert support in maintaining the PHI-base parser. We thank Drs Paul Kersey, Uma Maheswari and Dan Staines at the European Bioinformatics Institute (Cambridge) for helpful discussions and for significantly improving the pathogen species content within Ensembl Genomes.

FUNDING

This work is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/I001077/1, BB/K020056/1]. PHI-base receives additional support from the BBSRC as a National Capability [BB/J004383/1]. Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K020056/1].

Conflict of interest statement. None declared.

REFERENCES

1. Dangl,J.L., Horvath,D.M. and Staskawicz,B.J. (2013) Pivoting the plant immune system from dissection to deployment. Science, 341, 746–751.

2. Fisher,M.C., Henk,D.A., Briggs,C.J., Brownstein,J.S., Madoff,L.C., McCraw,S.L. and Gurr,S.J. (2012) Emerging fungal threats to animal, plant and ecosystem health. Nature, 484, 186–194.

3. Napoli,C. and Staskawicz,B. (1987) Molecular characterization and nucleic acid sequence of an avirulence gene from race 6 of Pseudomonas syringae pv. glycinea. J. Bacteriol., 169, 572–578.

4. Staskawicz,B.J., Dahlbeck,D. and Keen,N.T. (1984) Cloned avirulence gene of Pseudomonas syringae pv. glycinea determines race-specific incompatibility on Glycine max (L.) Merr. Proc. Natl Acad. Sci. U.S.A., 81, 6024–6028.

5. Liolios,K., Tavernarakis,N., Hugenholtz,P. and Kyrpides,N.C. (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res., 34, D332–D334.

6. Nowara,D., Gay,A., Lacomme,C., Shaw,J., Ridout,C., Douchkov,D., Hensel,G., Kumlehn,J. and Schweizer,P. (2010) HIGS: host-induced gene silencing in the obligate biotrophic fungal pathogen Blumeria graminis. Plant Cell, 22, 3130–3141.

7. Son,H., Seo,Y.S., Min,K., Park,A.R., Lee,J., Jin,J.M., Lin,Y., Cao,P., Hong,S.Y., Kim,E.K. et al. (2011) A phenome-based functional analysis of transcription factors in the cereal head blight fungus, Fusarium graminearum. PLoS Pathog., 7, e1002310.

8. Wang,C., Zhang,S., Hou,R., Zhao,Z., Zheng,Q., Xu,Q., Zheng,D., Wang,G., Liu,H., Gao,X. et al. (2011) Functional analysis of the kinome of the wheat scab fungus Fusarium graminearum. PLoS Pathog., 7, e1002460.

9. Gene Ontology Consortium (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.

10. Winnenburg,R., Baldwin,T.K., Urban,M., Rawlings,C., Kohler,J. and Hammond-Kosack,K.E. (2006) PHI-base: a new database for pathogen host interactions. Nucleic Acids Res., 34, D459–D464.

11. Winnenburg,R., Urban,M., Beacham,A., Baldwin,T.K., Holland,S., Lindeberg,M., Hansen,H., Rawlings,C., Hammond-Kosack,K.E. and Kohler,J. (2008) PHI-base update: additions to the pathogen host interaction database. Nucleic Acids Res., 36, D572–D576.

12. Lu,T., Yao,B. and Zhang,C. (2012) DFVF: database of fungal virulence factors. Database, 2012, doi:10.1093/database/bas032.

13. Hedeler,C., Wong,H.M., Cornell,M.J., Alam,I., Soanes,D.M., Rattray,M., Hubbard,S.J., Talbot,N.J., Oliver,S.G. and Paton,N.W. (2007) e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genom., 8, 426.

14. Kersey,P.J., Allen,J.E., Christensen,M., Davis,P., Falin,L.J., Grabmueller,C., Hughes,D.S., Humphrey,J., Kerhornou,A., Khobova,J. et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546–D552.

15. Tripathy,S., Deo,T. and Tyler,B.M. (2012) Oomycete Transcriptomics Database: a resource for oomycete transcriptomes. BMC Genom., 13, 303.

16. Aurrecoechea,C., Brestelli,J., Brunk,B.P., Fischer,S., Gajria,B., Gao,X., Gingle,A., Grant,G., Harb,O.S., Heiges,M. et al. (2010) EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res., 38, D415–D419.

17. Stajich,J.E., Harris,T., Brunk,B.P., Brestelli,J., Fischer,S., Harb,O.S., Kissinger,J.C., Li,W., Nayak,V., Pinney,D.F. et al. (2012) FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res., 40, D675–D681.

18. Kumar,R. and Nanduri,B. (2010) HPIDB–a unified resource for host-pathogen interactions. BMC Bioinformatics, 11(Suppl. 6), S16.

19. Grigoriev,I.V., Nikitin,R., Haridas,S., Kuo,A., Ohm,R., Otillar,R., Riley,R., Salamov,A., Zhao,X., Korzeniewski,F. et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699–D704.

20. Xiang,Z., Tian,Y. and He,Y. (2007) PHIDIAS: a pathogen-host interaction data integration and analysis system. Genome Biol., 8, R150.

21. Dash,S., Van Hemert,J., Hong,L., Wise,R.P. and Dickerson,J.A. (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40, D1194–D1201.

22. Chen,L., Xiong,Z., Sun,L., Yang,J. and Jin,Q. (2012) VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res., 40, D641–D645.

23. Lee,W.S., Hammond-Kosack,K.E. and Kanyuka,K. (2012) Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing, and virus-mediated overexpression of heterologous protein. Plant Physiol., 160, 582–590.

24. Wilson,D., Tutulan-Cunita,A., Jung,W., Hauser,N.C., Hernandez,R., Williamson,T., Piekarska,K., Rupp,S., Young,T. and Stateva,L. (2007) Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol. Microbiol., 65, 841–856.

25. Zhang,L., Zhu,Z., Jing,H., Zhang,J., Xiong,Y., Yan,M., Gao,S., Wu,L.F., Xu,J. and Kan,B. (2009) Pleiotropic effects of the twin-arginine translocation system on biofilm formation, colonization, and virulence in Vibrio cholerae. BMC Microbiol., 9, 114.

26. Hu,W., Sillaots,S., Lemieux,S., Davison,J., Kauffman,S., Breton,A., Linteau,A., Xin,C., Bowman,J., Becker,J. et al. (2007) Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog., 3, e24.

27. UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191–D198.

28. Gotz,S., Garcia-Gomez,J.M., Terol,J., Williams,T.D., Nagaraj,S.H., Nueda,M.J., Robles,M., Talon,M., Dopazo,J. and Conesa,A. (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 36, 3420–3435.

29. Baldwin,T.K., Winnenburg,R., Urban,M., Rawlings,C., Koehler,J. and Hammond-Kosack,K.E. (2006) The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity. MPMI, 19, 1451–1462.

30. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.

31. Danchin,E.G., Arguel,M.J., Campan-Fournier,A., Perfus-Barbeoch,L., Magliano,M., Rosso,M.N., Da Rocha,M., Da Silva,C., Nottet,N., Labadie,K. et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.

32. Jeon,J., Park,S.Y., Chi,M.H., Choi,J., Park,J., Rho,H.S., Kim,S., Goh,J., Yoo,S., Choi,J. et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561–565.

33. Cai,Z., Li,G., Lin,C., Shi,T., Zhai,L., Chen,Y. and Huang,G. (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340–350.

34. Vargas,W.A., Martin,J.M.S., Rech,G.E., Rivera,L.P., Benito,E.P., Diaz-Minguez,J.M., Thon,M.R. and Sukno,S.A. (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342–1358.

35. Sperschneider,J., Gardiner,D.M., Taylor,J.M., Hane,J.K., Singh,K.B. and Manners,J.M. (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.

36. Thakur,K., Chawla,V., Bhatti,S., Swarnkar,M.K., Kaur,J., Shankar,R. and Jha,G. (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre,F., Joly,D.L., Labbe,C., Teichmann,B., Linning,R., Belzile,F., Bakkeren,G. and Belanger,R.R. (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946–1959.

38. Schleker,S., Garcia-Garcia,J., Klein-Seetharaman,J. and Oliva,B. (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991–1018.

39. Liu,X., Tang,W.H., Zhao,X.M. and Chen,L. (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.

40. Kour,A., Greer,K., Valent,B., Orbach,M.J. and Soderlund,C. (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271–278.

41. Bleves,S., Dunger,I., Walter,M.C., Frangoulidis,D., Kastenmuller,G., Voulhoux,R. and Ruepp,A. (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671–D676.

42. Van De Wouw,A.P. and Howlett,B.J. (2011) Fungal pathogenicity genes in the age of ‘omics’. Mol. Plant Pathol., 12, 507–514.

43. Doehlemann,G., Reissmann,S., Assmann,D., Fleckenstein,M. and Kahmann,R. (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751–766.

44. Dickman,M.B. (2007) Subversion or coersion? Pathogenic deteminants in fungal phytopathogens. Fungal Biol. Rev., 21, 125–129.

45. Zhang,Y., Zhang,K., Fang,A., Han,Y., Yang,J., Xue,M., Bao,J., Hu,D., Zhou,B., Sun,X. et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.

46. Cools,H.J. and Hammond-Kosack,K.E. (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197–210.

47. Rutherford,K.M., Harris,M.A., Lock,A., Oliver,S.G. and Wood,V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.

48. Wood,V., Harris,M.A., McDowall,M.D., Rutherford,K., Vaughan,B.W., Staines,D.M., Aslett,M., Lock,A., Bahler,J., Kersey,P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.

49. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

50. Kasprzyk,A. (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.

51. Jones,J.D. and Dangl,J.L. (2006) The plant immune system. Nature, 444, 323–329.

52. Weßling,R., Epple,P., Altmann,S., He,Y., Yang,L., Henz,Stefan R., McDonald,N., Wiley,K., Bader,Kai C. et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microb., 16, 364–375.


2006 (10). A second NAR article in 2008 reviewed additional data and new features available within PHI-base version 3.0 (11). Since then, usage of PHI-base has grown and the PHI-base website receives about 1500 hits per quarter, excluding internal users, with users located in ~89 countries. Several other databases provide information which partially overlaps with either the species data or the biological information provided within PHI-base. These resources include the Fungal Virulence Factor Database (DFVF) (12), the e-Fungi project (13), Ensembl Genomes (14), the Oomycetes Transcriptomics Database (15), the Eukaryotic Pathogen Database Resources (EuPathDB) (16), FungiDB (17), the Host-Pathogen Interaction database on human viruses (HPIDB) (18), JGI-MycoCosm (19), PHIDIAS (20), PLEXdb (21) and the database on virulence factors of pathogenic bacteria (VFDB) (22). These complementary resources and their specialisms are summarized in Table 1. When used collectively, these databases provide prospective and existing users of PHI-base with a substantially enriched environment to pursue a wide range of simple to advanced in silico analyses on pathogenic organisms and the underlying pathogenic processes.

NEW FEATURES

An expanded taxonomic range and controlled vocabulary

Version 3.0, released in 2007, contained information on bacterial, fungal and oomycete pathogens as well as plant endophytes. Version 3.6 now also includes pathogenic plant-infecting nematode and aphid pests and animal/human-infecting parasites (Table 2). Between these versions of PHI-base, the total number of pathogenic species has risen from 95 to 160. The number of bacterial pathogens tripled over the same period. In addition, the number of obligate biotrophic species has increased from three to seven. To help PHI-base users become rapidly familiar with the biology of the wider range of pathogens and pests available, a full list of the pathogenic species covered in PHI-base version 3.6 is provided in Supplementary Table S1, along with their NCBI taxon identifier and both the natural and experimental host(s). The number of documented host species naturally infected by each pathogen and the identity of obligate biotrophs among the species are also described. This level of detail is provided to assist users in the selection of pathogenic species to include in comparative genomic analysis. An up-to-date version of Supplementary Table S1 is maintained on the ‘About’ page of the PHI-base website, reflecting the data for each new release.

A new addition requested by users is the consistent use of a controlled vocabulary of high-level phenotyping terms (Table 3). Currently, nine phenotyping terms are used to permit consistent data retrieval, comparative phenomics across a wide taxonomic range and statistical analysis. Only one term is assigned per host-pathogen interaction. An interaction is defined as the function of one gene on one host and one tissue type. The PHI-base phenotype terms selected are routinely used in research articles, but mapping to GO terms is not supported due to their high-level nature. Since 2008, several new techniques for investigating gene product function have become more widely adopted. For example, for some obligate plant-infecting pathogens, including Blumeria and Puccinia species, which infect specific cereal hosts, a novel technique called host-induced gene silencing (HIGS) is used. In HIGS, an antisense construct is expressed from the host species and used to transiently silence a specific pathogen gene during the infection process, which, if successful, results in an altered phenotypic outcome (23). The eight entries PHI:2896 to PHI:2903 were obtained for the Blumeria graminis f. sp. hordei–barley interaction using the new HIGS technique.
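The definition of an interaction given above (one gene, one host, one tissue type, exactly one controlled term) can be pictured as a small record type. The following Python sketch is illustrative only and does not describe the actual PHI-base schema: the class and field names are assumptions, and the nine term labels are taken from Table 3.

```python
from dataclasses import dataclass
from enum import Enum


class Phenotype(Enum):
    """The nine high-level phenotype outcome terms listed in Table 3."""
    LOSS_OF_PATHOGENICITY = "loss of pathogenicity"
    REDUCED_VIRULENCE = "reduced virulence"
    UNAFFECTED_PATHOGENICITY = "unaffected pathogenicity"
    INCREASED_VIRULENCE = "increased virulence (hypervirulence)"
    EFFECTOR = "effector (plant avirulence determinant)"
    LETHAL = "lethal"
    ENHANCED_ANTAGONISM = "enhanced antagonism"
    RESISTANT_TO_CHEMICAL = "resistant to chemical"
    SENSITIVE_TO_CHEMICAL = "sensitive to chemical"


@dataclass(frozen=True)
class Interaction:
    """One gene tested on one host and one tissue type, with exactly one term."""
    gene_accession: str  # hypothetical field name for a PHI-base accession
    host: str
    tissue: str
    phenotype: Phenotype

    def __post_init__(self):
        # Reject free-text outcomes so that retrieval stays consistent.
        if not isinstance(self.phenotype, Phenotype):
            raise TypeError("phenotype must be one of the nine controlled terms")


def parse_phenotype(label: str) -> Phenotype:
    """Map a curated label onto the controlled vocabulary (case-insensitive)."""
    return Phenotype(label.strip().lower())
```

Under these assumptions, a label outside the vocabulary raises a `ValueError` in `parse_phenotype`, which is one simple way to enforce the one-term-per-interaction rule at data entry.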

Additional content and species coverage

PHI-base version 3.6 contains information on 2875 genes, 4102 interactions, 110 host species and 160 pathogenic species. The pathogen species include 103 plant, 3 fungal and 54 animal infecting species. The organisms in the database cause 181 different diseases, and the data were drawn from 1243 peer-reviewed references. The functional gene information included was curated from studies published between 1987 and the end of 2013. Details of the host and pathogen species coverage are given in Table 2 and Supplementary Table S1. One-third of the prokaryote interactions now involve a human pathogen, with the highest number of 115 interactions from Salmonella species. For plant infecting bacteria, the highest numbers are 300 and 161 interactions from Xanthomonas and Pseudomonas species, respectively. The fungal pathogen interactions are dominated by the Ascomycetes (67 species) followed by the Basidiomycetes (8 species), providing 2759 and 405 interactions, respectively. The fungal interactions are also predominantly from plant infecting species (2645 interactions) compared to animal/human infecting species (519 interactions). The number of interactions from the eight oomycete species is far lower at 86, all of which are from plant infecting species. The newly curated plant infecting nematodes and aphids and animal/human infecting parasites provide 43 interactions from 9 species. The new data are summarized by host type and pathogen species taxonomy in Table 2. The plant pathogen species providing the greatest number of interactions are the cereal infecting fungi Fusarium graminearum, Magnaporthe oryzae and Ustilago maydis, Xanthomonas bacteria, and the dicotyledonous infecting fungus Botrytis cinerea and Pseudomonas bacteria. For animal/human infecting species, the greatest number of interactions are provided by the fungi Candida albicans and Cryptococcus neoformans and the bacterium Salmonella enterica (Table 2).
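The counts quoted in this paragraph can be cross-checked against the sub-totals in Table 2. The short Python sketch below is only a reader-side consistency check, with illustrative variable names; every number in it is copied from the text or from Table 2.

```python
# Interaction counts copied from the text above and from Table 2
# (PHI-base version 3.6). Variable names are illustrative only.
TOTAL = 4102
PROKARYOTE = {"animal": 249, "plant": 555}
EUKARYOTE_TOTAL = 3298
ASCOMYCETE = {"animal": 375, "plant": 2384}
BASIDIOMYCETE = {"animal": 144, "plant": 261}


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


prokaryote_total = sum(PROKARYOTE.values())
fungal_by_phylum = (sum(ASCOMYCETE.values()), sum(BASIDIOMYCETE.values()))
fungal_by_host = {
    host: ASCOMYCETE[host] + BASIDIOMYCETE[host] for host in ("plant", "animal")
}

# Each check restates a figure quoted in the text or in Table 2.
assert prokaryote_total + EUKARYOTE_TOTAL == TOTAL
assert fungal_by_phylum == (2759, 405)
assert fungal_by_host == {"plant": 2645, "animal": 519}

print(share(PROKARYOTE["animal"], prokaryote_total),
      share(PROKARYOTE["plant"], prokaryote_total))
```

The two printed shares correspond to the animal-host and plant-host percentages given for prokaryotes in Table 2, and the first of them is the basis for the "one-third" statement above.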

The nine new high-level phenotypic outcome terms are defined in Table 3. These have been included in the advanced search to permit researchers to explore the database across a wide range of taxonomically diverse species which exhibit very varied pathogenic lifestyles. Only the entry types ‘effector’ and ‘enhanced antagonism’ are limited to plant infecting species. In total, 84 interactions from a total of 23 species have the outcome ‘increased virulence (hypervirulence)’. This expanding number is noteworthy and suggests that negative regulation of key pathogenicity processes commonly occurs during the infection and colonization of both plant and animal hosts. Also of interest are the 1224 interactions (29.8% of the entire database content) with the outcome ‘unaffected pathogenicity’. The majority of these have been reported for plant pathogens. These negative outcomes are usually presumed by the authors to indicate that the gene product does not have a role in the pathogenic process under investigation, or to have arisen due to genetic redundancy, i.e. the function of a highly homologous gene replaces the function of the missing gene product under experimental evaluation. In some studies, the inclusion of double-gene deletion results has been able to clarify the situation. For example, the Candida albicans gene PDE1 (PHI:857) has been implicated in virulence. The PDE1 mutant alone is unaffected in pathogenicity. However, the double-gene deletion of PDE1 and PDE2 shows a more severe effect than deletion of the PDE2 (PHI:856) gene on its own (24). In Magnaporthe oryzae (formerly called M. grisea), deletion of the individual genes MoRgs1 (PHI:2192) and MoRgs4 (PHI:2195) led to a reduced-virulence phenotype, but the double-gene deletion rgs1 rgs4 mutant has a more severe ‘loss of pathogenicity’ phenotype (25). In the animal pathogen Vibrio cholerae, the effect of a triple mutation on biofilm formation and virulence was used to test the combined function of tatA (PHI:2415), tatB (PHI:2416) and tatC (PHI:2417), and revealed that this small gene family was required for virulence in mice (25). Going forward, the use of the ‘unaffected pathogenicity’ category in comparative species analyses will be particularly informative when the genes involved are present in only one copy per species. This approach will reveal which genes function in a species-specific or taxon clade-specific manner.

Table 1. Multispecies databases and websites involving plant, human and/or animal infecting pathogens which contain information complementary to the data in PHI-base

Name and ref[a] | URL (http://) | Comments
Broad-Fungal Genome Initiative | www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative | Genome browsing and comparative analysis for several plant pathogen division
DFVF (12) | sysbio.unl.edu/DFVF | Fungal virulence factor database generated using text-mining of the PubMed database and Internet
e-Fungi (13) | www.cs.man.ac.uk/~cornell/eFungi | Rich source of ESTs obtained by Sanger sequencing
Ensembl Genomes (14) | www.ensemblgenomes.org | Non-vertebrate species genomes portal with links to bacteria, fungi, metazoa, plants and protists
  Ensembl Bacteria | bacteria.ensembl.org | Genomes of bacteria and archaea
  Ensembl Fungi | fungi.ensembl.org | Genomes of fungal species including fungal pathogens
  Ensembl Protists | protists.ensembl.org | Genomes of protist species including Phytophthora
Oomycetes Transcriptomics Database (15) | www.eumicrobedb.org/transcripts | Oomycete genomes and transcriptomics
EuPathDB (16) | eupathdb.org | Human pathogens
FRAC | www.frac.info | All known chemical target sites used commercially for the control of pathogens
FungiDB (17) | fungidb.org | Fungal genomics database providing graphical tools for data mining
HPIDB (18) | agbase.msstate.edu | Fifteen human virus pathogens–protein-protein interaction data
JGI-MycoCosm (19) | genome.jgi.doe.gov/programs/fungi | A genome portal for 100s of pathogenic and non-pathogenic fungal species
Pathogen Portal | www.pathogenportal.org | Emerging or re-emerging pathogens, potential biowarfare or bioterrorism pathogens
PHIDIAS (20) | www.phidias.us | Medical fungal and bacterial pathogens
PhytoPath | www.phytopathdb.org | PhytoPath–32 Fungi, 14 Protists, 12 bacterial species linked to PHI-base
PLEXdb (21) | www.plexdb.org | Transcriptomics data only, on plants, pathogens and during interactions
USDA | nt.ars-grin.gov/fungaldatabases | Description of all the known hosts of fungi which infect plants
VFDB (22) | www.mgc.ac.cn/VFs | Virulence factors of human and animal bacterial pathogens

[a] Reference provided where available.

The high-level phenotypic outcomes for all interactions are summarized in Table 4. A total of 120 PHI-base accessions have been assigned the high-level phenotypic outcome ‘Essential (lethal)’. In these studies, mainly two types of experimental data were reported. First, in Aspergillus fumigatus, a promoter replacement strategy was employed to construct conditional mutants. For these mutants, the addition of ammonium into the nitrogen source switches off gene expression, and this allows functional gene tests of essential genes (26). Secondly, in genome-wide gene replacement studies in Gibberella zeae, no transformants were recovered in repeated experiments, while transformants were recovered for many other genes. Thus, the authors considered that the gene’s function was ‘essential for life’ (7,8).
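Several totals quoted in the text (84 ‘increased virulence’, 120 ‘Essential (lethal)’ and 1224 ‘unaffected pathogenicity’ interactions) are row sums over the animal-host and plant-host columns of Table 4. The sketch below shows one way such a summary could be tallied; the cell values are copied from Table 4, while the dictionary layout and function name are assumptions made for illustration.

```python
from collections import Counter

# (entry type, host type) -> number of interactions, copied from three
# rows of Table 4. The data layout is an illustrative assumption.
TABLE4_CELLS = {
    ("increased virulence", "animal"): 33,
    ("increased virulence", "plant"): 51,
    ("essential (lethal)", "animal"): 46,
    ("essential (lethal)", "plant"): 74,
    ("unaffected pathogenicity", "animal"): 80,
    ("unaffected pathogenicity", "plant"): 1144,
}

ALL_INTERACTIONS = 4102  # total for PHI-base version 3.6


def totals_by_entry_type(cells):
    """Sum the animal-host and plant-host counts for each entry type."""
    totals = Counter()
    for (entry_type, _host_type), count in cells.items():
        totals[entry_type] += count
    return totals


totals = totals_by_entry_type(TABLE4_CELLS)
unaffected_share = round(
    100 * totals["unaffected pathogenicity"] / ALL_INTERACTIONS, 1
)
print(dict(totals), unaffected_share)
```

The same grouping applied to full interaction records, rather than to pre-counted cells, would reproduce the layout of Table 4 itself.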

A ‘mixed outcome’ of phenotypes can be assigned when the transgenic mutants generated are tested on either multiple host species or different tissues/organs of the same host species. Different outcomes on hosts belonging to different kingdoms potentially indicate a differential host requirement. For example, Fusarium oxysporum is able to systemically infect tomato plants and immune-compromised mice. The PHI-base entries PHI:215, PHI:285 and PHI:315 reveal a differential requirement for cell-signalling and cell wall formation of three genes during the pathogenesis of plant and animal hosts.

Table 2. Interactions in PHI-base version 3.6 grouped by either host species or pathogen species

Host/Entry type | Interactions
TOTAL[a] | 4102
PROKARYOTES (55)[b] | 804
  Animal hosts (16)[c] | 249 (31%)
    Salmonella spp. (3)[d] | 115
  Plant hosts (29) | 555 (69%)
    Xanthomonas spp. (10) | 300
    Pseudomonas spp. (7) | 161
    Erwinia amylovora | 29
    Pectobacterium spp. (3) | 10
EUKARYOTES (105) | 3298
  Animal hosts (20) | 549 (16.6%)
    Ascomycetes (17) | 375
      Candida spp. (5) | 238
      Aspergillus fumigatus | 98
    Basidiomycetes (4) | 144
      Cryptococcus neoformans | 136
    Parasitic species (5)[e] | 30
  Plant hosts (93) | 2744 (83.2%)
    Ascomycetes (60) | 2384
      Fusaria - cereal infecting (7) | 1053
      Fusarium graminearum | 1042
      Magnaporthe spp. (3) | 575
      Botrytis spp. (2) | 205
      Fusaria - dicot infecting (6) | 93
      Cochliobolus (5) | 88
      Alternaria spp. (4) | 78
      Colletotrichum (9) | 48
      Stagonospora nodorum | 44
      Zymoseptoria tritici | 42
    Basidiomycetes (4) | 261
      Ustilago maydis | 243
      Melampsora lini | 7
    Oomycetes (8) | 86
      Phytophthora spp. (5) | 53
      Hyaloperonospora spp. (2) | 30
    Others (4) | 13
      Aphids (2) | 10
      Nematodes (2) | 3
  Fungal hosts (3) | 4
  Endophyte (1) | 5
    Epichloe festucae | 5

[a] Only highly represented taxon groups are listed. For a complete list of species in the database, see Supplementary Table S1.
[b] The table is divided into prokaryote and eukaryote host species. The species count number is listed in brackets.
[c] Host species are further divided into animal and plant host.
[d] Left-indented genera and species infect or belong to the taxonomic group listed non-indented above. Only main representative organisms are listed.
[e] Parasitic species are Leishmania infantum, L. mexicana, Toxoplasma gondii, Trypanosoma brucei and T. cruzi.
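The ‘mixed outcome’ rule described above can be expressed as a small aggregation: a gene keeps its single outcome when all of its interactions agree, and is labelled ‘mixed outcome’ when they differ across hosts or tissues. The following Python sketch assumes a simple tuple layout for interaction records; the gene names, hosts and outcomes in the usage example are made-up placeholders, not PHI-base data.

```python
from collections import defaultdict


def gene_level_outcome(interactions):
    """Collapse per-interaction outcomes to one label per gene.

    `interactions` is an iterable of (gene, host, tissue, outcome) tuples.
    A gene whose interactions all share one outcome keeps that outcome;
    otherwise it is labelled 'mixed outcome'.
    """
    seen = defaultdict(set)
    for gene, _host, _tissue, outcome in interactions:
        seen[gene].add(outcome)
    return {
        gene: next(iter(outcomes)) if len(outcomes) == 1 else "mixed outcome"
        for gene, outcomes in seen.items()
    }


# Hypothetical records, for illustration only.
records = [
    ("geneA", "plant host", "root", "reduced virulence"),
    ("geneA", "animal host", "whole organism", "unaffected pathogenicity"),
    ("geneB", "plant host", "root", "loss of pathogenicity"),
    ("geneB", "plant host", "leaf", "loss of pathogenicity"),
]
print(gene_level_outcome(records))
```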

Integration with other database sources

PHI-base is a gene-centric database. Each gene has its own PHI-base accession number. One advantage of this design is that phenotypic information is directly linked to a specific gene. This phenotypic information can then easily be mapped to genomes. Additional information such as GO terms and protein structure information is then extracted from other databases. In our current curation, we prioritise the use of UniProt accessions (27) to facilitate subsequent bioinformatics analysis. During the curation process, our biocurators map reported EMBL or GenBank accessions to existing UniProt identifiers where these exist. However, for species where protein accessions are not available in UniProt at the time of curation and the authors did not provide GenBank accession numbers in their studies, only limited or no information on the gene/protein can be provided in PHI-base until this information becomes available.
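The accession priority described above (UniProt first, then a reported EMBL or GenBank accession, otherwise unresolved until more information becomes available) amounts to a simple fallback rule. The sketch below is a hypothetical illustration of that rule, not the curators' actual tooling; the function name, return labels and placeholder identifiers are all assumptions.

```python
from typing import Optional, Tuple


def choose_accession(
    uniprot: Optional[str], embl_or_genbank: Optional[str]
) -> Tuple[Optional[str], str]:
    """Pick the accession to record for a curated gene.

    Returns (accession, source). UniProt is preferred; a reported EMBL or
    GenBank accession is used only when no UniProt identifier exists; if
    neither is available the gene stays unresolved.
    """
    if uniprot:
        return uniprot, "UniProt"
    if embl_or_genbank:
        return embl_or_genbank, "EMBL/GenBank"
    return None, "unresolved"


# Placeholder identifiers, for illustration only.
print(choose_accession("UNIPROT_ID", "GENBANK_ID"))
print(choose_accession(None, "GENBANK_ID"))
print(choose_accession(None, None))
```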

Whole-genome information is increasingly available for plant and animal pathogens. We have mapped phenotypes in PHI-base via their gene accessions to reference genomic sequences available in the Ensembl Genomes sites for fungi, protists (including oomycetes) and bacteria (14). In total, 1550 out of 2047 interactions involved in plant pathogenesis from pathogens with an available reference genome have been mapped to Ensembl Genomes. The remaining PHI-base accessions are associated with only genetic data, still lack genome sequence information, or are associated with previously reported sequences and isolates that differ from those in the published reference genomes. Work is continuing to resolve these cases.

Functional analysis of PHI-base accessions

The entire contents of PHI-base are available to users from the ‘Download’ section, where sequence information is available for 2527 PHI-base accessions. We surveyed the content of PHI-base accessions by cataloguing the protein accessions according to their GO classification, using Blast2GO software and standard parameters (28). GO terms were assigned to 63% of PHI-base accessions (Figure 2). For a total of 37% (929 proteins), no GO annotation could be made. Many of these accessions are species-specific proteins and are effectors. The major GO categories assigned included (i) metabolic processes, (ii) cellular processes such as cell communication and (iii) single-organism processes such as cell proliferation, filamentous growth and pigmentation. Microbial pigments in pathogens are known to provide protection against ultraviolet radiation, host-defence products and other stresses encountered during host invasion.

The category ‘cell killing’ was only assigned to six accessions and included Pseudomonas effectors and the Vibrio cholerae enterotoxin. This low number is an unexpected result because, for many of the host-pathogen interactions catalogued in PHI-base, host cell death occurs at some point, i.e. in interactions involving pathogens with a necrotrophic or hemibiotrophic lifestyle.

TECHNICAL DEVELOPMENTS, CURATION AND OUTREACH

Data curation and release management

In the NAR 2008 article (11), we provided the details of the curation procedure in use. This procedure is still in place. However, due to the increasing volume of literature requiring curation (Figure 1), we now use additional procedures. Primarily, papers are found in the literature databases Web of Science and PubMed using the keyword search terms (fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene) (29). Text mining is not employed because relevant information has to be extracted by analysing figures, tables and text in the peer-reviewed articles. This task can only be done by trained biocurators with a strong understanding of the research area. PHI-base relies heavily on the support of the scientific community to suggest relevant articles for curation and for the subsequent quality control of entries. The PHI-base team does not have any individual member solely dedicated to data curation. Instead, team members curate data on a part-time basis and when the need arises. In an effort to close a curation gap for articles published between 1984 and 2014, a collaboration was established with the curation scientists at Molecular Connections, Bangalore, India. The biocurators give priority to author-assigned gene function over computationally transferred annotation such as GO terms. The author-assigned function is frequently extracted from either the title or the abstract. Experts from the scientific community are invited on a regular basis to verify new records before uploading into the database and to provide quality control.

Table 3. Definitions for the nine high-level phenotype outcomes used in PHI-base

High-level phenotype outcome[a] | Definition
Loss of pathogenicity | The transgenic strain fails to cause disease that is observed in the wild type (i.e. qualitative effect).
Reduced virulence | The transgenic strain still causes some disease formation but fewer symptoms than the wild-type strain (i.e. a quantitative effect). Synonymous with the term reduced aggressiveness.
Unaffected pathogenicity | The transgenic strain which expresses altered levels of a specific gene product(s) causes the same level of disease compared to the wild-type reference strain.
Increased virulence (Hypervirulence) | The transgenic strain causes greater incidence or severity of disease than the wild-type strain.
Effector (plant avirulence determinant) | Some effector genes are required to cause disease on susceptible hosts but most are not. A plant pathogen-specific term which was previously referred to as a corresponding avirulence (Avr) gene. An effector gene is formally identified because its presence leads to the direct or indirect recognition of a pathogen in resistant host genotypes which possess the corresponding disease resistance (R) gene. Positive recognition leads to activation of plant defense and the pathogen either fails to cause disease or causes less disease. In the absence of the pathogen, effector delivery into a healthy plant possessing the corresponding R gene activates plant defense responses.
Lethal | The transgenic strain is not viable. The gene product is essential for life of the organism.
Enhanced antagonism | The transgenic strain shows greater endophytic biomass in the host and/or the formation of visible disease symptoms.
Resistant to chemical | The transgenic strain[b] grows and/or develops normally when exposed to chemistry concentrations that are detrimental to the wild-type strain.
Sensitive to chemical | The transgenic strain which expresses either no or reduced levels of a specific gene product(s), or possesses a specific gene mutation(s), has the same ability[c] as the wild-type strain to grow and develop when exposed to detrimental chemistry concentrations.

[a] Compared to wild-type reference strain (i.e. a direct isogenic strain comparison).
[b] Molecular studies on natural field isolate populations are also considered once the natural target site has been identified.
[c] On rare occasions, increased sensitivity to chemistry has been observed.

Table 4. Number of interactions per phenotypic group in animal and plant hosts

Entry type | Animal host[a] | Plant host
Loss of pathogenicity | 73 | 404
Reduced virulence[b] | 542 | 1056
Increased virulence | 33 | 51
Essential (lethal) | 46 | 74
Unaffected pathogenicity[c] | 80 | 1144
Effector | 0 | 533
Enhanced antagonism | 0 | 4
Resistance to chemistry | 5 | 30
Sensitive to chemistry[d] | 1 | 7

[a] Animal and plant-attacking pathogens are listed with their taxonomy ID and lifestyle in Supplementary Table S1.
[b] The three missing entries in this category have other host types.
[c] One entry in this category has a fungal host.
[d] One entry in this category has a fish host.
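The keyword search used to find candidate papers is a conjunction of three OR-groups. As a rough illustration of how such a boolean filter behaves, the Python sketch below applies it to a title or abstract held locally. It is not the Web of Science or PubMed query syntax, and it assumes that the truncated stems "fung" and "virulen" in the search terms are wildcard terms (written here with a trailing asterisk); the function names are likewise hypothetical.

```python
import re

# Each inner list is an OR-group; the groups are combined with AND.
# A trailing "*" is treated as a wildcard for any word ending (assumption).
QUERY = [
    ["fung*", "yeast"],
    ["gene", "factor"],
    ["pathogenicity", "virulen*", "avirulence gene"],
]


def term_to_regex(term: str) -> str:
    """Turn one search term into a word-boundary-anchored regex."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"


def matches(text: str, query=QUERY) -> bool:
    """True if every OR-group has at least one term present in `text`."""
    lowered = text.lower()
    return all(
        any(re.search(term_to_regex(term), lowered) for term in group)
        for group in query
    )


def to_query_string(query=QUERY) -> str:
    """Render the nested list back into the bracketed boolean form."""
    return " and ".join("(" + " or ".join(group) + ")" for group in query)


print(to_query_string())
print(matches("A fungal gene required for virulence on wheat"))
print(matches("A bacterial gene required for virulence on wheat"))
```

A filter like this only shortlists articles; as noted in the curation procedure, the phenotype information itself still has to be extracted from figures, tables and text by trained biocurators.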

Mapping PHI-base phenotypes to Ensembl Genomes

Through the cross-referencing with Ensembl Genomes (http://ensemblgenomes.org), PHI-base annotations can now be visualized directly in their genomic context, identifying features such as pathogenicity islands through a simple system of colour coding using the nine high-level phenotyping terms. This new way to explore the data in PHI-base is shown in Figure 3. The phenotyping term ‘mixed outcome’ is also used to identify genes where a range of interaction outcomes have been identified, depending on the host species and/or tissue type evaluated.


Figure 2. GO terms assigned to PHI-base accessions in Version 3.6 mapped to a biological process.

APPLICATIONS OF PHI-BASE

PHI-base use has been cited in over 100 peer-reviewed publications. These publications are listed in year order in the ‘About’ section of the database. Recently published PHI-base use cases include genome mining and comparative genomics (30,31), the selection and functional testing of candidate virulence factors in newly sequenced fungal and nematode pathogens of agricultural importance (10,11) and studies investigating the subtle differences between pathogen and biocontrol species (37). In Table 5, the main uses of PHI-base are given along with literature examples (14,30–46). In the past 4 years, we have observed a gradual shift in PHI-base use, with an increase in the number of larger comparative gene function studies and investigations reporting the in silico prediction of virulence-associated genes.

FORTHCOMING DEVELOPMENTS

Tools for community-led curation

A big challenge facing all biological databases is the growing quantity of data and the relative difficulty of obtaining resources to curate the knowledge that derives from it. For the pathogen-host interaction community, the scale of the problem is considerable (Figure 1). One solution is to encourage community-based curation, particularly by the authors of scientific publications, who may be motivated to have their work correctly represented within the database and who are the experts in their own specialist domains (although they may not be expert in the conventions in use within the database). Inclusion of studies in PHI-base also improves their visibility and accessibility. PHI-base has a curation model based on community contribution, although hitherto this has involved certain collaborators curating many papers in their own area of expertise after prior training in the data entry tools. A more scalable model would allow all users to directly curate their own papers without prior training. A new easy-to-use web-based interface for direct access by the wider community is currently in development.

The PHI-base web-based curation tool will facilitate curation of pathogen-host interactions from peer-reviewed literature into PHI-base by the authors doing the experimental analyses. This curation tool will be based on the recently developed Canto tool, an online tool that supports functional gene annotation (47). Canto is part of the Generic Model Organism Database project, which provides a suite of open software for managing genetic data (http://www.gmod.org). Canto has proven effective for the community-based curation of data for the fission yeast database PomBase (http://www.pombase.org) (48). The PHI-base curation tool will use ontological data from a variety of sources, most notably from the Open Biological and Biomedical Ontologies Foundry (http://www.obofoundry.org) (49). However, some terminology is specific to the nature of the interactions captured in PHI-base, so new controlled vocabularies will need to be developed for this purpose. For example, an ‘interaction evidence’ ontology will be created to specify the evidence for pathogen-host interactions, thus complementing the gene-centric data from the GO. Also, in addition to the controlled high-level vocabulary described above for the phenotype of the pathogen (Table 3), a similar controlled vocabulary can be created to describe the effect the interaction has on the host organism. To ensure quality and consistency of the curated data, all annotations will still be approved by a curator or expert with knowledge of the species and the captured data.


Figure 3. Inspection of gene function using the Ensembl genome browser. (A) Displayed is a small chromosomal region in Magnaporthe oryzae showing two genes involved in pathogenicity (as annotated in PHI-base) in their genomic context (viewable in the Ensembl browser in the transcript display). A colour code indicates the annotated role of each gene: green, ‘loss of pathogenicity’ and orange, ‘reduced virulence’. (B) By selecting each colour-coded MGG transcript ID, information is revealed on the associated gene deletion study curated in the PHI-base database.


Table 5. PHI-base uses that have often appeared in the peer-reviewed literature

Use case | Type of research study | Example reference
1 | Annotation and candidate gene selection |
  | Large scale forward genetics screens | (32,33)
  | Transcriptome studies (RNAseq, microarrays, ESTs) | (34,36,45)
  | Full and partial genome annotation, genome mining | (30,31,37)
2 | Predictive bioinformatics analyses |
  | Networks, protein-protein interaction mapping | (35,38,39)
3 | Complementary databases | (14,40,41)
4 | Review articles | (42,44,46)
5 | Single gene function studies |
  | Intra-comparison and inter-comparison of gene mutants within and between species | (43)

Tools for data mining

We are currently developing a new tool for the analysis and extraction of whole genomic data from plant pathogens as part of the PhytoPath project (http://www.phytopathdb.org), using the data warehousing framework BioMart (50). This will allow users to mine genomic data (for sequence and annotation) across multiple species based on PHI-base annotations in conjunction with other annotations. The tool is expected to be launched before the end of 2015.

Other activities

Our intention is to extend the taxonomic range available within PHI-base to ~200 host-infecting species within the next 2 years. At this level of species coverage, detailed analyses within and between specific groups of pathogens with different infection strategies, host ranges or taxonomic assignments, or between pathogenic and closely related non-pathogenic, endophytic or symbiotic lifestyles, should be feasible.

In the next phase of curation, a greater emphasis will be placed on the effector literature, which should increase the number of interactions from bacterial, oomycete and obligate biotrophic species. To accompany this development, the curation of the corresponding host target(s), i.e. the initial molecular partner in the host, has commenced and this important information should soon be available. For example, various bacterial effectors, including AvrRpm1 (PHI:977) and AvrRpt2 (PHI:979), are delivered into the plant cytoplasm via the bacterial type III secretion system. These effectors interact with the Arabidopsis protein RIN4 (51). These protein interaction data sets are of growing importance in the analysis of host-pathogen, host-pest and host-parasite interactions, as they typically represent communication events that have co-evolved between biological kingdoms. For example, Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific effectors were found to exhibit an enhanced disease resistance (edr) phenotype to both powdery and downy mildews (52). In addition, by the inclusion of the corresponding host targets, more effectors from obligate biotrophic species can be curated into PHI-base. These effectors are rigorously tested for their role in pathogenicity using a range of other techniques, but not those involving the generation of stable pathogen transformants.

Database access and feedback

PHI-base can be freely accessed at http://www.phi-base.org. The complete database can be downloaded from the ‘Download’ section. Prior to downloading the entire database to create local Basic Local Alignment Search Tool databases or for other bioinformatics applications (Table 5), users are asked to fill in a registration form. This allows PHI-base to monitor the number of academic and industrial users, a requirement by our sponsors.
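As an illustration of the local-database use mentioned above, the following minimal sketch builds the two command lines that would typically be needed with the NCBI BLAST+ suite. It assumes BLAST+ is the toolkit in use and that the downloaded protein sequences are in FASTA format; the file and database names are hypothetical, and the commands are only printed, not executed.

```python
import shlex

# Hypothetical file names; the actual name of the PHI-base download may differ.
PHI_FASTA = "phi_base_proteins.fasta"
DB_NAME = "phi_base_local"


def makeblastdb_command(fasta_path: str, db_name: str) -> list[str]:
    """Build the BLAST+ command that formats a protein FASTA file as a local database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_command(query_path: str, db_name: str, evalue: float = 1e-5) -> list[str]:
    """Build a blastp search of query proteins against the local database (tabular output)."""
    return [
        "blastp", "-query", query_path, "-db", db_name,
        "-evalue", str(evalue), "-outfmt", "6",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works without BLAST+ installed.
    print(shlex.join(makeblastdb_command(PHI_FASTA, DB_NAME)))
    print(shlex.join(blastp_command("my_candidates.fasta", DB_NAME)))
```

The E-value threshold is an arbitrary illustrative default and should be chosen to suit the analysis.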

User support can be obtained from this email: contact@phi-base.org. Please use this email address if you wish to provide new data for inclusion in PHI-base, if you are an expert willing to assist with curation, for the nomination of peer-reviewed papers to be curated, or if you can provide suggestions for improvement to the PHI-base website.

To increase the awareness of PHI-base developments within the community, and for users to be notified when new releases occur, we have developed a PHI-base user mailing list (users@lists.phi-base.org). Users can subscribe from a link on the PHI-base website in the ‘Help’ section or directly by going to https://www.lists.rothamsted.ac.uk/mailman/listinfo/users.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts who contributed database annotations from their field of expertise into PHI-base, and the members of our Scientific Advisory Board: Dr Michael Csukai (Syngenta, Jealott’s Hill, UK), Professor Jonathan Jones (The Sainsbury Laboratory, UK), Dr Leighton Pritchard (The James Hutton, UK) and Professor Pietro Spanu (Imperial College, UK). Dr Jan Taubert (formerly Rothamsted Research) is thanked for his expert support in maintaining the PHI-base parser. We thank Drs Paul Kersey, Uma Maheswari and Dan Staines at the European Bioinformatics Institute (Cambridge) for helpful discussions and for significantly improving the pathogen species content within Ensembl Genomes.

FUNDING

This work is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/I001077/1, BB/K020056/1]. PHI-base receives additional support from the BBSRC as a National Capability [BB/J004383/1]. Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K020056/1].

Conflict of interest statement. None declared.

REFERENCES

1. Dangl,J.L., Horvath,D.M. and Staskawicz,B.J. (2013) Pivoting the plant immune system from dissection to deployment. Science, 341, 746–751.

2. Fisher,M.C., Henk,D.A., Briggs,C.J., Brownstein,J.S., Madoff,L.C., McCraw,S.L. and Gurr,S.J. (2012) Emerging fungal threats to animal, plant and ecosystem health. Nature, 484, 186–194.

3. Napoli,C. and Staskawicz,B. (1987) Molecular characterization and nucleic acid sequence of an avirulence gene from race 6 of Pseudomonas syringae pv. glycinea. J. Bacteriol., 169, 572–578.

4. Staskawicz,B.J., Dahlbeck,D. and Keen,N.T. (1984) Cloned avirulence gene of Pseudomonas syringae pv. glycinea determines race-specific incompatibility on Glycine max (L.) Merr. Proc. Natl. Acad. Sci. U.S.A., 81, 6024–6028.

5. Liolios,K., Tavernarakis,N., Hugenholtz,P. and Kyrpides,N.C. (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res., 34, D332–D334.

6. Nowara,D., Gay,A., Lacomme,C., Shaw,J., Ridout,C., Douchkov,D., Hensel,G., Kumlehn,J. and Schweizer,P. (2010) HIGS: host-induced gene silencing in the obligate biotrophic fungal pathogen Blumeria graminis. Plant Cell, 22, 3130–3141.

7. Son,H., Seo,Y.S., Min,K., Park,A.R., Lee,J., Jin,J.M., Lin,Y., Cao,P., Hong,S.Y., Kim,E.K. et al. (2011) A phenome-based functional analysis of transcription factors in the cereal head blight fungus, Fusarium graminearum. PLoS Pathog., 7, e1002310.

8. Wang,C., Zhang,S., Hou,R., Zhao,Z., Zheng,Q., Xu,Q., Zheng,D., Wang,G., Liu,H., Gao,X. et al. (2011) Functional analysis of the kinome of the wheat scab fungus Fusarium graminearum. PLoS Pathog., 7, e1002460.

9. Gene Ontology Consortium (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.

10. Winnenburg,R., Baldwin,T.K., Urban,M., Rawlings,C., Kohler,J. and Hammond-Kosack,K.E. (2006) PHI-base: a new database for pathogen host interactions. Nucleic Acids Res., 34, D459–D464.

11. Winnenburg,R., Urban,M., Beacham,A., Baldwin,T.K., Holland,S., Lindeberg,M., Hansen,H., Rawlings,C., Hammond-Kosack,K.E. and Kohler,J. (2008) PHI-base update: additions to the pathogen host interaction database. Nucleic Acids Res., 36, D572–D576.

12. Lu,T., Yao,B. and Zhang,C. (2012) DFVF: database of fungal virulence factors. Database, 2012, doi:10.1093/database/bas032.

13. Hedeler,C., Wong,H.M., Cornell,M.J., Alam,I., Soanes,D.M., Rattray,M., Hubbard,S.J., Talbot,N.J., Oliver,S.G. and Paton,N.W. (2007) e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genom., 8, 426.

14. Kersey,P.J., Allen,J.E., Christensen,M., Davis,P., Falin,L.J., Grabmueller,C., Hughes,D.S., Humphrey,J., Kerhornou,A., Khobova,J. et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546–D552.

15. Tripathy,S., Deo,T. and Tyler,B.M. (2012) Oomycete Transcriptomics Database: a resource for oomycete transcriptomes. BMC Genom., 13, 303.

16. Aurrecoechea,C., Brestelli,J., Brunk,B.P., Fischer,S., Gajria,B., Gao,X., Gingle,A., Grant,G., Harb,O.S., Heiges,M. et al. (2010) EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res., 38, D415–D419.

17. Stajich,J.E., Harris,T., Brunk,B.P., Brestelli,J., Fischer,S., Harb,O.S., Kissinger,J.C., Li,W., Nayak,V., Pinney,D.F. et al. (2012) FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res., 40, D675–D681.

18. Kumar,R. and Nanduri,B. (2010) HPIDB–a unified resource for host-pathogen interactions. BMC Bioinformatics, 11(Suppl. 6), S16.

19. Grigoriev,I.V., Nikitin,R., Haridas,S., Kuo,A., Ohm,R., Otillar,R., Riley,R., Salamov,A., Zhao,X., Korzeniewski,F. et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699–D704.

20. Xiang,Z., Tian,Y. and He,Y. (2007) PHIDIAS: a pathogen-host interaction data integration and analysis system. Genome Biol., 8, R150.

21. Dash,S., Van Hemert,J., Hong,L., Wise,R.P. and Dickerson,J.A. (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40, D1194–D1201.

22. Chen,L., Xiong,Z., Sun,L., Yang,J. and Jin,Q. (2012) VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res., 40, D641–D645.

23. Lee,W.S., Hammond-Kosack,K.E. and Kanyuka,K. (2012) Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing and virus-mediated overexpression of heterologous protein. Plant Physiol., 160, 582–590.

24. Wilson,D., Tutulan-Cunita,A., Jung,W., Hauser,N.C., Hernandez,R., Williamson,T., Piekarska,K., Rupp,S., Young,T. and Stateva,L. (2007) Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol. Microbiol., 65, 841–856.

25. Zhang,L., Zhu,Z., Jing,H., Zhang,J., Xiong,Y., Yan,M., Gao,S., Wu,L.F., Xu,J. and Kan,B. (2009) Pleiotropic effects of the twin-arginine translocation system on biofilm formation, colonization and virulence in Vibrio cholerae. BMC Microbiol., 9, 114.

26. Hu,W., Sillaots,S., Lemieux,S., Davison,J., Kauffman,S., Breton,A., Linteau,A., Xin,C., Bowman,J., Becker,J. et al. (2007) Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog., 3, e24.

27. UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191–D198.

28. Gotz,S., Garcia-Gomez,J.M., Terol,J., Williams,T.D., Nagaraj,S.H., Nueda,M.J., Robles,M., Talon,M., Dopazo,J. and Conesa,A. (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 36, 3420–3435.

29. Baldwin,T.K., Winnenburg,R., Urban,M., Rawlings,C., Koehler,J. and Hammond-Kosack,K.E. (2006) The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity. MPMI, 19, 1451–1462.

30. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.

31. Danchin,E.G., Arguel,M.J., Campan-Fournier,A., Perfus-Barbeoch,L., Magliano,M., Rosso,M.N., Da Rocha,M., Da Silva,C., Nottet,N., Labadie,K. et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.

32. Jeon,J., Park,S.Y., Chi,M.H., Choi,J., Park,J., Rho,H.S., Kim,S., Goh,J., Yoo,S., Choi,J. et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561–565.

33. Cai,Z., Li,G., Lin,C., Shi,T., Zhai,L., Chen,Y. and Huang,G. (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340–350.

34. Vargas,W.A., Martin,J.M.S., Rech,G.E., Rivera,L.P., Benito,E.P., Diaz-Minguez,J.M., Thon,M.R. and Sukno,S.A. (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342–1358.

35. Sperschneider,J., Gardiner,D.M., Taylor,J.M., Hane,J.K., Singh,K.B. and Manners,J.M. (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.

36. Thakur,K., Chawla,V., Bhatti,S., Swarnkar,M.K., Kaur,J., Shankar,R. and Jha,G. (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre,F., Joly,D.L., Labbe,C., Teichmann,B., Linning,R., Belzile,F., Bakkeren,G. and Belanger,R.R. (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946–1959.

38. Schleker,S., Garcia-Garcia,J., Klein-Seetharaman,J. and Oliva,B. (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991–1018.

39. Liu,X., Tang,W.H., Zhao,X.M. and Chen,L. (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.

40. Kour,A., Greer,K., Valent,B., Orbach,M.J. and Soderlund,C. (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271–278.

41. Bleves,S., Dunger,I., Walter,M.C., Frangoulidis,D., Kastenmuller,G., Voulhoux,R. and Ruepp,A. (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671–D676.

42. Van De Wouw,A.P. and Howlett,B.J. (2011) Fungal pathogenicity genes in the age of ‘omics’. Mol. Plant Pathol., 12, 507–514.

43. Doehlemann,G., Reissmann,S., Assmann,D., Fleckenstein,M. and Kahmann,R. (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751–766.

44. Dickman,M.B. (2007) Subversion or coersion? Pathogenic determinants in fungal phytopathogens. Fungal Biol. Rev., 21, 125–129.

45. Zhang,Y., Zhang,K., Fang,A., Han,Y., Yang,J., Xue,M., Bao,J., Hu,D., Zhou,B., Sun,X. et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.

46. Cools,H.J. and Hammond-Kosack,K.E. (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197–210.

47. Rutherford,K.M., Harris,M.A., Lock,A., Oliver,S.G. and Wood,V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.

48. Wood,V., Harris,M.A., McDowall,M.D., Rutherford,K., Vaughan,B.W., Staines,D.M., Aslett,M., Lock,A., Bahler,J., Kersey,P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.

49. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

50. Kasprzyk,A. (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.

51. Jones,J.D. and Dangl,J.L. (2006) The plant immune system. Nature, 444, 323–329.

52. Weßling,R., Epple,P., Altmann,S., He,Y., Yang,L., Henz,Stefan R., McDonald,N., Wiley,K., Bader,Kai C. et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microb., 16, 364–375.


Table 1. Multispecies databases and websites involving plant, human and/or animal infecting pathogens which contain information complementary to the data in PHI-base

Name and ref[a] | URL (http://) | Comments
Broad-Fungal Genome Initiative | www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative | Genome browsing and comparative analysis for several plant pathogen division
DFVF (12) | sysbio.unl.edu/DFVF | Fungal virulence factor database generated using text-mining of the PubMed database and Internet
e-Fungi (13) | www.cs.man.ac.uk/~cornell/eFungi | Rich source of ESTs obtained by Sanger sequencing
Ensembl Genomes (14) | www.ensemblgenomes.org | Non-vertebrate species genomes portal with links to bacteria, fungi, metazoa, plants and protists
Ensembl Bacteria | bacteria.ensembl.org | Genomes of bacteria and archaea
Ensembl Fungi | fungi.ensembl.org | Genomes of fungal species including fungal pathogens
Ensembl Protists | protists.ensembl.org | Genomes of protist species including Phytophthora
Oomycetes Transcriptomics Database (15) | www.eumicrobedb.org/transcripts | Oomycete genomes and transcriptomics
EuPathDB (16) | eupathdb.org | Human pathogens
FRAC | www.frac.info | All known chemical target sites used commercially for the control of pathogens
FungiDB (17) | fungidb.org | Fungal genomics database providing graphical tools for data mining
HPIDB (18) | agbase.msstate.edu | Fifteen human virus pathogens–protein-protein interaction data
JGI-MycoCosm (19) | genome.jgi.doe.gov/programs/fungi | A genome portal for 100s of pathogenic and non-pathogenic fungal species
Pathogen Portal | www.pathogenportal.org | Emerging or re-emerging pathogens, potential biowarfare or bioterrorism pathogens
PHIDIAS (20) | www.phidias.us | Medical fungal and bacterial pathogens
PhytoPath | www.phytopathdb.org | PhytoPath–32 fungi, 14 protists, 12 bacterial species linked to PHI-base
PLEXdb (21) | www.plexdb.org | Transcriptomics data only on plants, pathogens and during interactions
USDA | nt.ars-grin.gov/fungaldatabases | Description of all the known hosts of fungi which infect plants
VFDB (22) | www.mgc.ac.cn/VFs | Virulence factors of human and animal bacterial pathogens

[a] Reference provided where available.

are usually presumed by the authors to indicate that the gene product does not have a role in the pathogenic process under investigation or has arisen due to genetic redundancy, i.e. the function of a highly homologous gene replaces the function of the missing gene product under experimental evaluation. In some studies, the inclusion of double-gene deletion results has been able to clarify the situation. For example, the Candida albicans gene PDE1 (PHI:857) has been implicated in virulence. The PDE1 mutant alone is unaffected in pathogenicity. However, the double-gene deletion of PDE1 and PDE2 shows a more severe effect than deletion of the PDE2 (PHI:856) gene on its own (24). In Magnaporthe oryzae (formerly called M. grisea), deletion of the individual genes MoRgs1 (PHI:2192) and MoRgs4 (PHI:2195) led to a reduced-virulence phenotype, but the double-gene deletion rgs1 rgs4 mutant has a more severe ‘loss of pathogenicity’ phenotype (25). In the animal pathogen Vibrio cholerae, the effect of a triple mutation on biofilm formation and virulence was used to test the combined function of tatA (PHI:2415), tatB (PHI:2416) and tatC (PHI:2417), and revealed that this small gene family was required for virulence in mice (25). Going forward, the use of the ‘unaffected pathogenicity’ category in comparative species analyses will be particularly informative when the genes involved are present in only one copy per species. This approach will reveal which genes function in a species-specific or taxon clade-specific manner.

The high-level phenotypic outcomes for all interactions are summarized in Table 4. A total of 120 PHI-base accessions have been assigned the high-level phenotypic outcome ‘Essential (lethal)’. In these studies, mainly two types of experimental data were reported. First, in Aspergillus fumigatus, a promoter replacement strategy was employed to construct conditional mutants. For these mutants, the addition of ammonium into the nitrogen source switches off gene expression and this allows functional gene tests of essential genes (26). Secondly, in genome-wide gene replacement studies in Gibberella zeae, no transformants were recovered in repeated experiments, while transformants were recovered for many other genes. Thus, authors considered that the gene’s function was ‘essential for life’ (7,8).

A ‘mixed outcome’ of phenotypes can be assigned when the transgenic mutants generated are tested on either multiple host species or different tissues/organs of the same host species. Different outcomes on hosts belonging to different kingdoms potentially indicate a differential host requirement. For example, Fusarium oxysporum is able to systemically infect tomato plants and immune-compromised mice. The PHI-base entries PHI:215, PHI:285 and PHI:315 reveal a differential requirement for cell-signalling and cell wall formation of three genes during the pathogenesis of plant and animal hosts.

Table 2. Interactions in PHI-base version 3.6 grouped by either host species or pathogen species

Host/Entry type | Interactions
TOTAL[a] | 4102
PROKARYOTES (55)[b] | 804
Animal hosts (16)[c] | 249 (31%)
  Salmonella spp. (3)[d] | 115
Plant hosts (29) | 555 (69%)
  Xanthomonas spp. (10) | 300
  Pseudomonas spp. (7) | 161
  Erwinia amylovora | 29
  Pectobacterium spp. (3) | 10
EUKARYOTES (105) | 3298
Animal hosts (20) | 549 (16.6%)
  Ascomycetes (17) | 375
    Candida spp. (5) | 238
    Aspergillus fumigatus | 98
  Basidiomycetes (4) | 144
    Cryptococcus neoformans | 136
  Parasitic species (5)[e] | 30
Plant hosts (93) | 2744 (83.2%)
  Ascomycetes (60) | 2384
    Fusaria - cereal infecting (7) | 1053
    Fusarium graminearum | 1042
    Magnaporthe spp. (3) | 575
    Botrytis spp. (2) | 205
    Fusaria - dicot infecting (6) | 93
    Cochliobolus (5) | 88
    Alternaria spp. (4) | 78
    Colletotrichum (9) | 48
    Stagonospora nodorum | 44
    Zymoseptoria tritici | 42
  Basidiomycetes (4) | 261
    Ustilago maydis | 243
    Melampsora lini | 7
  Oomycetes (8) | 86
    Phytophthora spp. (5) | 53
    Hyaloperonospora spp. (2) | 30
  Others (4) | 13
    Aphids (2) | 10
    Nematodes (2) | 3
Fungal hosts (3) | 4
Endophyte (1) | 5
  Epichloe festucae | 5

[a] Only highly represented taxon groups are listed. For a complete list of species in the database, see Supplementary Table S1.
[b] The table is divided into prokaryote and eukaryote host species. The species count number is listed in brackets.
[c] Host species are further divided into animal and plant host.
[d] Left-indented genera and species infect or belong to the taxonomic group listed non-indented above. Only main representative organisms are listed.
[e] Parasitic species are Leishmania infantum, L. mexicana, Toxoplasma gondii, Trypanosoma brucei and T. cruzi.
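The rule for assigning a ‘mixed outcome’ can be sketched in a few lines. This is only an illustration of the logic described in the text, not PHI-base code: the function name is hypothetical and the example outcomes are invented for demonstration rather than taken from any PHI-base entry.

```python
def summarise_outcome(outcomes_by_host: dict[str, str]) -> str:
    """Return the shared outcome, or 'mixed outcome' if hosts or tissues disagree."""
    distinct = set(outcomes_by_host.values())
    if not distinct:
        raise ValueError("at least one tested host or tissue is required")
    if len(distinct) == 1:
        # Every tested host/tissue gave the same high-level phenotype.
        return next(iter(distinct))
    return "mixed outcome"


# Invented example: one mutant tested on a plant host and an animal host.
example = {"tomato": "reduced virulence", "mouse": "unaffected pathogenicity"}
print(summarise_outcome(example))
```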

Integration with other database sources

PHI-base is a gene-centric database. Each gene has its own PHI-base accession number. One advantage of this design is that phenotypic information is directly linked to a specific gene. This phenotypic information can then easily be mapped to genomes. Additional information, such as GO terms and protein structure information, is then extracted from other databases. In our current curation, we prioritise the use of UniProt accessions (27) to facilitate subsequent bioinformatics analysis. During the curation process, our biocurators map reported EMBL or GenBank accessions to existing UniProt identifiers where these exist. However, for species where protein accessions are not available in UniProt at the time of curation and authors did not provide GenBank accession numbers in their studies, only limited or no information on the gene/protein can be provided in PHI-base until this information becomes available.
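The accession preference described above (use a UniProt identifier where one exists, otherwise keep the reported EMBL or GenBank accession) can be illustrated as follows. This is a minimal sketch under stated assumptions: the cross-reference table, function name and identifiers are hypothetical placeholders, and in practice the mapping is made manually by biocurators.

```python
# Hypothetical cross-reference table: reported EMBL/GenBank accession -> UniProt accession.
# The identifiers are placeholders, not real database records.
EXAMPLE_XREF = {"EXAMPLE_EMBL_1": "EXAMPLE_UNIPROT_1"}


def resolve_accession(reported: str, xref: dict[str, str]) -> tuple[str, str]:
    """Prefer a UniProt identifier; otherwise keep the accession reported by the authors."""
    uniprot = xref.get(reported)
    if uniprot is not None:
        return ("UniProt", uniprot)
    return ("EMBL/GenBank", reported)


print(resolve_accession("EXAMPLE_EMBL_1", EXAMPLE_XREF))
print(resolve_accession("EXAMPLE_EMBL_2", EXAMPLE_XREF))
```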

Whole-genome information is increasingly available for plant and animal pathogens. We have mapped phenotypes in PHI-base via their gene accessions to reference genomic sequences available in Ensembl Genomes sites for fungi, protists (including oomycetes) and bacteria (26). In total, 1550 out of 2047 interactions involved in plant pathogenesis from pathogens with an available reference genome have been mapped to Ensembl Genomes. The remainder of the PHI-base accessions are either associated with only genetic data, or the genome sequence information is still missing, or are associated with previously reported sequences and isolates that differ from those in the published reference genomes. Work is continuing to resolve these cases.

Functional analysis of PHI-base accessions

The entire contents of PHI-base are available to users from the ‘Download’ section, where sequence information is available for 2527 PHI-base accessions. We surveyed the content of PHI-base accessions by cataloguing the protein accessions according to their GO classification, using Blast2GO software and standard parameters (28). GO terms were assigned to 63% of PHI-base accessions (Figure 2). For a total of 37% (929 proteins), no GO annotation could be made. Many of these accessions are species-specific proteins and are effectors. The major GO categories assigned included (i) metabolic processes, (ii) cellular processes, such as cell communication, and (iii) single-organism processes, such as cell proliferation, filamentous growth and pigmentation. Microbial pigments in pathogens are known to provide protection against ultraviolet radiation, host-defence products and other stresses encountered during host invasion.
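The GO coverage figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the percentages are taken relative to the 2527 accessions with sequence information; that denominator is our reading of the text rather than something it states explicitly.

```python
total_with_sequence = 2527  # accessions with sequence information (from the text)
without_go = 929            # accessions for which no GO annotation could be made

# Assumption: every accession with a sequence either received a GO term or did not.
with_go = total_with_sequence - without_go
pct_without = 100 * without_go / total_with_sequence
pct_with = 100 * with_go / total_with_sequence

print(f"with GO: {with_go} ({pct_with:.0f}%), without GO: {without_go} ({pct_without:.0f}%)")
```

Under that assumption the counts reproduce the reported 63% and 37% split.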

The category ‘cell killing’ was only assigned to six accessions and included Pseudomonas effectors and the Vibrio cholerae enterotoxin. This low number is an unexpected result because, for many of the host-pathogen interactions catalogued in PHI-base, host cell death occurs at some point, i.e. in interactions involving pathogens with a necrotrophic or hemibiotrophic lifestyle.

TECHNICAL DEVELOPMENTS, CURATION AND OUTREACH

Data curation and release management

In the NAR 2008 article (11), we provided the details of the curation procedure in use. This procedure is still in place. However, due to the increasing volume of literature requiring curation (Figure 1), we now use additional procedures. Primarily, papers are found in the literature databases Web of Science and PubMed using the keyword search terms (fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene) (29). Text mining is not employed because relevant information has to be extracted by analysing figures, tables and text in the peer-reviewed articles. This task can only be done by trained biocurators with a strong understanding of the research area. PHI-base relies heavily on support of the scientific community to suggest relevant articles for curation and for the subsequent quality control of entries. The PHI-base team does not have any individual member solely dedicated to data curation. Instead, team members curate data on a part-time basis and when the need arises. In an effort to close a curation gap for articles published between 1984 and 2014, a collaboration was established with the curation scientists at Molecular Connections, Bangalore, India. The biocurators give priority to author-assigned gene function over computationally transferred annotation such as GO terms. The author-assigned function is frequently extracted from either title or abstract. Experts from the scientific community are invited on a regular basis to verify new records before uploading into the database and provide quality control.

Table 3. Definitions for the nine high-level phenotype outcomes used in PHI-base

High-level phenotype outcome[a] | Definition
Loss of pathogenicity | The transgenic strain fails to cause disease that is observed in the wild type (i.e. qualitative effect).
Reduced virulence | The transgenic strain still causes some disease formation but fewer symptoms than the wild-type strain (i.e. a quantitative effect). Synonymous with the term reduced aggressiveness.
Unaffected pathogenicity | The transgenic strain, which expresses altered levels of a specific gene product(s), causes the same level of disease compared to the wild-type reference strain.
Increased virulence (Hypervirulence) | The transgenic strain causes greater incidence or severity of disease than the wild-type strain.
Effector (plant avirulence determinant) | Some effector genes are required to cause disease on susceptible hosts, but most are not. A plant pathogen-specific term, which was previously referred to as a corresponding avirulence (Avr) gene. An effector gene is formally identified because its presence leads to the direct or indirect recognition of a pathogen in resistant host genotypes which possess the corresponding disease resistance (R) gene. Positive recognition leads to activation of plant defense and the pathogen either fails to cause disease or causes less disease. In the absence of the pathogen, effector delivery into a healthy plant possessing the corresponding R gene activates plant defense responses.
Lethal | The transgenic strain is not viable. The gene product is essential for life of the organism.
Enhanced antagonism | The transgenic strain shows greater endophytic biomass in the host and/or the formation of visible disease symptoms.
Resistant to chemical | The transgenic strain[b] grows and/or develops normally when exposed to chemistry concentrations that are detrimental to the wild-type strain.
Sensitive to chemical | The transgenic strain, which expresses either no or reduced levels of a specific gene product(s) or possesses a specific gene mutation(s), has the same ability[c] as the wild-type strain to grow and develop when exposed to detrimental chemistry concentrations.

[a] Compared to wild-type reference strain (i.e. a direct isogenic strain comparison).
[b] Molecular studies on natural field isolate population are also considered once the natural target site has been identified.
[c] On rare occasions, increased sensitivity to chemistry has been observed.

Table 4. Number of interactions per phenotypic group in animal and plant hosts

Entry type | Animal host[a] | Plant host
Loss of pathogenicity | 73 | 404
Reduced virulence[b] | 542 | 1056
Increased virulence | 33 | 51
Essential (lethal) | 46 | 74
Unaffected pathogenicity[c] | 80 | 1144
Effector | 0 | 533
Enhanced antagonism | 0 | 4
Resistance to chemistry | 5 | 30
Sensitive to chemistry[d] | 1 | 7

[a] Animal and plant-attacking pathogens are listed with their taxonomy ID and lifestyle in Supplementary Table S1.
[b] The three missing entries in this category have other host types.
[c] One entry in this category has a fungal host.
[d] One entry in this category has a fish host.
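The keyword search described in the curation procedure above can be assembled programmatically, which makes it easier to reuse or extend. The sketch below is illustrative only: the helper names are hypothetical, the truncation wildcards are our reading of the shortened stems in the search terms, and the upper-case AND/OR operators are an assumption about the search interface.

```python
def any_of(*terms: str) -> str:
    """Join alternative terms into one parenthesised OR group."""
    return "(" + " OR ".join(terms) + ")"


def build_query() -> str:
    """Combine the three keyword groups from the curation procedure with AND."""
    groups = [
        any_of("fung*", "yeast"),                                # organism group
        any_of("gene", "factor"),                                # entity group
        any_of("pathogenicity", "virulen*", "avirulence gene"),  # phenotype group
    ]
    return " AND ".join(groups)


print(build_query())
```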

Mapping PHI-base phenotypes to Ensembl Genomes

Through the cross-referencing with Ensembl Genomes (http://ensemblgenomes.org), PHI-base annotations can now be visualized directly in their genomic context, identifying features such as pathogenicity islands, through a simple system of colour coding using the nine high-level phenotyping terms. This new way to explore the data in PHI-base is shown in Figure 3. The phenotyping term ‘mixed outcome’ is also used to identify genes where a range of interaction outcomes have been identified, depending on the host species and/or tissue type evaluated.
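A colour-coding scheme of this kind amounts to a lookup from phenotype term to display colour. The following sketch is hypothetical: the class and function names are our own, the nine terms follow Table 3, and only the two colours named in the Figure 3 caption (green and orange) are filled in. The remaining colours are not given in the text and are deliberately left unset.

```python
from enum import Enum
from typing import Optional


class Phenotype(Enum):
    """The nine high-level phenotype outcomes listed in Table 3."""
    LOSS_OF_PATHOGENICITY = "loss of pathogenicity"
    REDUCED_VIRULENCE = "reduced virulence"
    UNAFFECTED_PATHOGENICITY = "unaffected pathogenicity"
    INCREASED_VIRULENCE = "increased virulence (hypervirulence)"
    EFFECTOR = "effector (plant avirulence determinant)"
    LETHAL = "lethal"
    ENHANCED_ANTAGONISM = "enhanced antagonism"
    RESISTANT_TO_CHEMICAL = "resistant to chemical"
    SENSITIVE_TO_CHEMICAL = "sensitive to chemical"


# Only the colours stated in the Figure 3 caption are known; others are not guessed.
TRACK_COLOUR = {
    Phenotype.LOSS_OF_PATHOGENICITY: "green",
    Phenotype.REDUCED_VIRULENCE: "orange",
}


def colour_for(phenotype: Phenotype) -> Optional[str]:
    """Return the display colour for a phenotype, or None if the text does not give one."""
    return TRACK_COLOUR.get(phenotype)


print(colour_for(Phenotype.LOSS_OF_PATHOGENICITY))
```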


Figure 2. GO terms assigned to PHI-base accessions in Version 3.6, mapped to a biological process.

APPLICATIONS OF PHI-BASE

PHI-base use has been cited in over 100 peer-reviewed publications. These publications are listed in year order in the ‘About’ section of the database. Recently published PHI-base use cases include genome mining and comparative genomics (30,31), the selection and functional testing of candidate virulence factors in newly sequenced fungal and nematode pathogens of agricultural importance (10,11) and studies investigating the subtle differences between pathogen and biocontrol species (23). In Table 5, the main uses of PHI-base are given along with literature examples (14,30–46). In the past 4 years, we have observed a gradual shift in PHI-base use, with an increase in the number of larger comparative gene function studies and investigations reporting the in silico prediction of virulence-associated genes.

FORTHCOMING DEVELOPMENTS

Tools for community-led curation

A big challenge facing all biological databases is the growing quantity of data and the relative difficulty of obtaining resources to curate the knowledge that derives from it. For the pathogen-host interaction community, the scale of the problem is considerable (Figure 1). One solution is to encourage community-based curation, particularly by the authors of scientific publications, who may be motivated to have their work correctly represented within the database and who are the experts in their own specialist domains (although they may not be expert in the conventions in use within the database). Inclusion of studies in PHI-base also improves their visibility and accessibility. PHI-base has a curation model based on community contribution, although hitherto this has involved certain collaborators curating many papers in their own area of expertise after prior training in the data entry tools. A more scalable model would allow all users to directly curate their own papers without prior training. A new easy-to-use web-based interface for direct access by the wider community is currently in development.

The PHI-base web-based curation tool will facilitate curation of pathogen-host interactions from peer-reviewed literature into PHI-base by the authors doing the experimental analyses. This curation tool will be based on the recently developed Canto tool, an online tool that supports functional gene annotation (47). Canto is part of the Generic Model Organism Database project, which provides a suite of open software for managing genetic data (http://www.gmod.org). Canto has proven effective for the community-based curation of data for the fission yeast database PomBase (http://www.pombase.org) (48). The PHI-base curation tool will use ontological data from a variety of sources, most notably from the Open Biological and Biomedical Ontologies Foundry (http://www.obofoundry.org) (49). However, some terminology is specific to the nature of the interactions captured in PHI-base, so will require the development of new controlled vocabularies for this purpose. For example, an ‘interaction evidence’ ontology will be created to specify the evidence for pathogen-host interactions, thus complementing the gene-centric data from the GO. Also, in addition to the controlled high-level vocabulary above describing the phenotype of the pathogen (Table 4), a similar controlled vocabulary can be created to describe the effect the interaction has on the host organism. To ensure quality and consistency of the curated data, all annotations will still be approved by a curator or expert with knowledge of the species and the captured data.


Figure 3. Inspection of gene function using the Ensembl genome browser. (A) Displayed is a small chromosomal region in Magnaporthe oryzae showing two genes involved in pathogenicity (as annotated in PHI-base) in their genomic context (viewable in the Ensembl browser in the transcript display). A colour code indicates the annotated role of each gene: green, ‘loss of pathogenicity’, and orange, ‘reduced virulence’. (B) By selecting each colour-coded MGG transcript ID, information is revealed on the associated gene deletion study curated in the PHI-base database.


Table 5. PHI-base uses that have often appeared in the peer-reviewed literature

Use case | Type of research study | Example reference
1. Annotation and candidate gene selection | Large scale forward genetics screens | (32,33)
 | Transcriptome studies (RNAseq, microarrays, ESTs) | (34,36,45)
 | Full and partial genome annotation, genome mining | (30,31,37)
2. Predictive bioinformatics analyses | Networks, protein-protein interaction mapping | (35,38,39)
3. Complementary databases | | (14,40,41)
4. Review articles | | (42,44,46)
5. Single gene function studies | Intra-comparison and inter-comparison of gene mutants within and between species | (43)

Tools for data mining

We are currently developing a new tool for the analysis and extraction of whole genomic data from plant pathogens as part of the PhytoPath project (http://www.phytopathdb.org), using the data warehousing framework BioMart (50), allowing users to mine genomic data (for sequence and annotation) across multiple species based on PHI-base annotations in conjunction with other annotations. The tool is expected to be launched before the end of 2015.

Other activities

Our intention is to extend the taxonomic range available within PHI-base to ~200 host-infecting species within the next 2 years. At this level of species coverage, detailed analyses within and between specific groups of pathogens with different infection strategies, host ranges, taxonomic assignments, or between pathogenic and closely related non-pathogenic, endophytic or symbiotic lifestyles, should be feasible.

In the next phase of curation, a greater emphasis will be placed on the effector literature, which should increase the number of interactions from bacterial, oomycete and obligate biotrophic species. To accompany this development, the curation of the corresponding host target(s), i.e. initial molecular partner in the host, has commenced and this important information should soon be available. For example, various bacterial effectors, including AvrRpm1 (PHI:977) and AvrRpt2 (PHI:979), are delivered into the plant cytoplasm via the bacterial type III secretion system. These effectors interact with the Arabidopsis protein RIN4 (51). These protein interaction data sets are of growing importance in the analysis of host-pathogen, host-pest and host-parasite interactions, as they typically represent communication events that have co-evolved between biological kingdoms. For example, Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific effectors were found to exhibit an enhanced disease resistance (edr) phenotype to both powdery and downy mildews (52). In addition, by the inclusion of the corresponding host targets, more effectors from obligate biotrophic species can be curated into PHI-base. These effectors are rigorously tested for their role in pathogenicity using a range of other techniques, but not those involving the generation of stable pathogen transformants.

Database access and feedback

PHI-base can be freely accessed at http://www.phi-base.org. The complete database can be downloaded from the ‘Download’ section. Prior to downloading the entire database to create local Basic Local Alignment Search Tool databases or for other bioinformatics applications (Table 5), users are asked to fill in a registration form. This allows PHI-base to monitor the number of academic and industrial users, a requirement by our sponsors.
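As a rough illustration of the local BLAST use case mentioned above, the following Python sketch assembles a `makeblastdb` command for a downloaded protein FASTA file. The file name `phi_base_proteins.fasta` and the database name `phi_base` are hypothetical placeholders rather than names used by PHI-base, and the sketch assumes that NCBI BLAST+ is installed. It only builds the command and does not run it.

```python
from typing import List


def makeblastdb_command(fasta_path: str, db_name: str, dbtype: str = "prot") -> List[str]:
    """Build (but do not run) a BLAST+ makeblastdb command line."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


# Hypothetical file and database names, used only for illustration.
command = makeblastdb_command("phi_base_proteins.fasta", "phi_base")
print(" ".join(command))
```

Once the FASTA file exists locally, the resulting list could be passed to `subprocess.run(command, check=True)`.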

User support can be obtained from this email: contact@phi-base.org. Please use this email address if you wish to provide new data for inclusion in PHI-base, if you are an expert willing to assist with curation, for the nomination of peer-reviewed papers to be curated, or if you can provide suggestions for improvement to the PHI-base website.

To increase the awareness of PHI-base developments within the community and for users to be notified when new releases occur, we have developed a PHI-base user mailing list (users@lists.phi-base.org). Users can subscribe from a link on the PHI-base website in the ‘Help’ section or directly by going to https://www.lists.rothamsted.ac.uk/mailman/listinfo/users.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts who contributed database annotations from their field of expertise into PHI-base, and the members of our Scientific Advisory Board: Dr Michael Csukai (Syngenta, Jealott’s Hill, UK), Professor Jonathan Jones (The Sainsbury Laboratory, UK), Dr Leighton Pritchard (The James Hutton, UK) and Professor Pietro Spanu (Imperial College, UK). Dr Jan Taubert (formerly Rothamsted Research) is thanked for his expert support in maintaining the PHI-base parser. We thank Drs Paul Kersey, Uma Maheswari and Dan Staines at the European Bioinformatics Institute (Cambridge) for helpful discussions and for significantly improving the pathogen species content within Ensembl Genomes.

FUNDING

This work is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/I001077/1, BB/K020056/1]. PHI-base receives additional support from the BBSRC as a National Capability [BB/J004383/1]. Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K020056/1].

Conflict of interest statement. None declared.

REFERENCES

1. Dangl,J.L., Horvath,D.M. and Staskawicz,B.J. (2013) Pivoting the plant immune system from dissection to deployment. Science, 341, 746–751.

2. Fisher,M.C., Henk,D.A., Briggs,C.J., Brownstein,J.S., Madoff,L.C., McCraw,S.L. and Gurr,S.J. (2012) Emerging fungal threats to animal, plant and ecosystem health. Nature, 484, 186–194.

3. Napoli,C. and Staskawicz,B. (1987) Molecular characterization and nucleic acid sequence of an avirulence gene from race 6 of Pseudomonas syringae pv. glycinea. J. Bacteriol., 169, 572–578.

4. Staskawicz,B.J., Dahlbeck,D. and Keen,N.T. (1984) Cloned avirulence gene of Pseudomonas syringae pv. glycinea determines race-specific incompatibility on Glycine max (L.) Merr. Proc. Natl Acad. Sci. U.S.A., 81, 6024–6028.

5. Liolios,K., Tavernarakis,N., Hugenholtz,P. and Kyrpides,N.C. (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res., 34, D332–D334.

6. Nowara,D., Gay,A., Lacomme,C., Shaw,J., Ridout,C., Douchkov,D., Hensel,G., Kumlehn,J. and Schweizer,P. (2010) HIGS: host-induced gene silencing in the obligate biotrophic fungal pathogen Blumeria graminis. Plant Cell, 22, 3130–3141.

7. Son,H., Seo,Y.S., Min,K., Park,A.R., Lee,J., Jin,J.M., Lin,Y., Cao,P., Hong,S.Y., Kim,E.K. et al. (2011) A phenome-based functional analysis of transcription factors in the cereal head blight fungus Fusarium graminearum. PLoS Pathog., 7, e1002310.

8. Wang,C., Zhang,S., Hou,R., Zhao,Z., Zheng,Q., Xu,Q., Zheng,D., Wang,G., Liu,H., Gao,X. et al. (2011) Functional analysis of the kinome of the wheat scab fungus Fusarium graminearum. PLoS Pathog., 7, e1002460.

9. Gene Ontology Consortium (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.

10. Winnenburg,R., Baldwin,T.K., Urban,M., Rawlings,C., Kohler,J. and Hammond-Kosack,K.E. (2006) PHI-base: a new database for pathogen host interactions. Nucleic Acids Res., 34, D459–D464.

11. Winnenburg,R., Urban,M., Beacham,A., Baldwin,T.K., Holland,S., Lindeberg,M., Hansen,H., Rawlings,C., Hammond-Kosack,K.E. and Kohler,J. (2008) PHI-base update: additions to the pathogen host interaction database. Nucleic Acids Res., 36, D572–D576.

12. Lu,T., Yao,B. and Zhang,C. (2012) DFVF: database of fungal virulence factors. Database, 2012, doi:10.1093/database/bas032.

13. Hedeler,C., Wong,H.M., Cornell,M.J., Alam,I., Soanes,D.M., Rattray,M., Hubbard,S.J., Talbot,N.J., Oliver,S.G. and Paton,N.W. (2007) e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genom., 8, 426.

14. Kersey,P.J., Allen,J.E., Christensen,M., Davis,P., Falin,L.J., Grabmueller,C., Hughes,D.S., Humphrey,J., Kerhornou,A., Khobova,J. et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546–D552.

15. Tripathy,S., Deo,T. and Tyler,B.M. (2012) Oomycete Transcriptomics Database: a resource for oomycete transcriptomes. BMC Genom., 13, 303.

16. Aurrecoechea,C., Brestelli,J., Brunk,B.P., Fischer,S., Gajria,B., Gao,X., Gingle,A., Grant,G., Harb,O.S., Heiges,M. et al. (2010) EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res., 38, D415–D419.

17. Stajich,J.E., Harris,T., Brunk,B.P., Brestelli,J., Fischer,S., Harb,O.S., Kissinger,J.C., Li,W., Nayak,V., Pinney,D.F. et al. (2012) FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res., 40, D675–D681.

18. Kumar,R. and Nanduri,B. (2010) HPIDB–a unified resource for host-pathogen interactions. BMC Bioinformatics, 11(Suppl. 6), S16.

19. Grigoriev,I.V., Nikitin,R., Haridas,S., Kuo,A., Ohm,R., Otillar,R., Riley,R., Salamov,A., Zhao,X., Korzeniewski,F. et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699–D704.

20. Xiang,Z., Tian,Y. and He,Y. (2007) PHIDIAS: a pathogen-host interaction data integration and analysis system. Genome Biol., 8, R150.

21. Dash,S., Van Hemert,J., Hong,L., Wise,R.P. and Dickerson,J.A. (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40, D1194–D1201.

22. Chen,L., Xiong,Z., Sun,L., Yang,J. and Jin,Q. (2012) VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res., 40, D641–D645.

23. Lee,W.S., Hammond-Kosack,K.E. and Kanyuka,K. (2012) Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing and virus-mediated overexpression of heterologous protein. Plant Physiol., 160, 582–590.

24. Wilson,D., Tutulan-Cunita,A., Jung,W., Hauser,N.C., Hernandez,R., Williamson,T., Piekarska,K., Rupp,S., Young,T. and Stateva,L. (2007) Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol. Microbiol., 65, 841–856.

25. Zhang,L., Zhu,Z., Jing,H., Zhang,J., Xiong,Y., Yan,M., Gao,S., Wu,L.F., Xu,J. and Kan,B. (2009) Pleiotropic effects of the twin-arginine translocation system on biofilm formation, colonization and virulence in Vibrio cholerae. BMC Microbiol., 9, 114.

26. Hu,W., Sillaots,S., Lemieux,S., Davison,J., Kauffman,S., Breton,A., Linteau,A., Xin,C., Bowman,J., Becker,J. et al. (2007) Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog., 3, e24.

27. UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191–D198.

28. Gotz,S., Garcia-Gomez,J.M., Terol,J., Williams,T.D., Nagaraj,S.H., Nueda,M.J., Robles,M., Talon,M., Dopazo,J. and Conesa,A. (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 36, 3420–3435.

29. Baldwin,T.K., Winnenburg,R., Urban,M., Rawlings,C., Koehler,J. and Hammond-Kosack,K.E. (2006) The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity. MPMI, 19, 1451–1462.

30. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.

31. Danchin,E.G., Arguel,M.J., Campan-Fournier,A., Perfus-Barbeoch,L., Magliano,M., Rosso,M.N., Da Rocha,M., Da Silva,C., Nottet,N., Labadie,K. et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.

32. Jeon,J., Park,S.Y., Chi,M.H., Choi,J., Park,J., Rho,H.S., Kim,S., Goh,J., Yoo,S., Choi,J. et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561–565.

33. Cai,Z., Li,G., Lin,C., Shi,T., Zhai,L., Chen,Y. and Huang,G. (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340–350.

34. Vargas,W.A., Martin,J.M.S., Rech,G.E., Rivera,L.P., Benito,E.P., Diaz-Minguez,J.M., Thon,M.R. and Sukno,S.A. (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342–1358.

35. Sperschneider,J., Gardiner,D.M., Taylor,J.M., Hane,J.K., Singh,K.B. and Manners,J.M. (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.

36. Thakur,K., Chawla,V., Bhatti,S., Swarnkar,M.K., Kaur,J., Shankar,R. and Jha,G. (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre,F., Joly,D.L., Labbe,C., Teichmann,B., Linning,R., Belzile,F., Bakkeren,G. and Belanger,R.R. (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946–1959.

38. Schleker,S., Garcia-Garcia,J., Klein-Seetharaman,J. and Oliva,B. (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991–1018.

39. Liu,X., Tang,W.H., Zhao,X.M. and Chen,L. (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.

40. Kour,A., Greer,K., Valent,B., Orbach,M.J. and Soderlund,C. (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271–278.

41. Bleves,S., Dunger,I., Walter,M.C., Frangoulidis,D., Kastenmuller,G., Voulhoux,R. and Ruepp,A. (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671–D676.

42. Van De Wouw,A.P. and Howlett,B.J. (2011) Fungal pathogenicity genes in the age of ‘omics’. Mol. Plant Pathol., 12, 507–514.

43. Doehlemann,G., Reissmann,S., Assmann,D., Fleckenstein,M. and Kahmann,R. (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751–766.

44. Dickman,M.B. (2007) Subversion or coersion? Pathogenic determinants in fungal phytopathogens. Fungal Biol. Rev., 21, 125–129.

45. Zhang,Y., Zhang,K., Fang,A., Han,Y., Yang,J., Xue,M., Bao,J., Hu,D., Zhou,B., Sun,X. et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.

46. Cools,H.J. and Hammond-Kosack,K.E. (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197–210.

47. Rutherford,K.M., Harris,M.A., Lock,A., Oliver,S.G. and Wood,V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.

48. Wood,V., Harris,M.A., McDowall,M.D., Rutherford,K., Vaughan,B.W., Staines,D.M., Aslett,M., Lock,A., Bahler,J., Kersey,P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.

49. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

50. Kasprzyk,A. (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.

51. Jones,J.D. and Dangl,J.L. (2006) The plant immune system. Nature, 444, 323–329.

52. Weßling,R., Epple,P., Altmann,S., He,Y., Yang,L., Henz,S.R., McDonald,N., Wiley,K., Bader,K.C. et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microb., 16, 364–375.


Table 2. Interactions in PHI-base version 3.6 grouped by either host species or pathogen species

Host/Entry type | Interactions
TOTAL [a] | 4102
PROKARYOTES (55) [b] | 804
  Animal hosts (16) [c] | 249 (31%)
    Salmonella spp. (3) [d] | 115
  Plant hosts (29) | 555 (69%)
    Xanthomonas spp. (10) | 300
    Pseudomonas spp. (7) | 161
    Erwinia amylovora | 29
    Pectobacterium spp. (3) | 10
EUKARYOTES (105) | 3298
  Animal hosts (20) | 549 (16.6%)
    Ascomycetes (17) | 375
      Candida spp. (5) | 238
      Aspergillus fumigatus | 98
    Basidiomycetes (4) | 144
      Cryptococcus neoformans | 136
    Parasitic species (5) [e] | 30
  Plant hosts (93) | 2744 (83.2%)
    Ascomycetes (60) | 2384
      Fusaria - cereal infecting (7) | 1053
      Fusarium graminearum | 1042
      Magnaporthe spp. (3) | 575
      Botrytis spp. (2) | 205
      Fusaria - dicot infecting (6) | 93
      Cochliobolus (5) | 88
      Alternaria spp. (4) | 78
      Colletotrichum (9) | 48
      Stagonospora nodorum | 44
      Zymoseptoria tritici | 42
    Basidiomycetes (4) | 261
      Ustilago maydis | 243
      Melampsora lini | 7
    Oomycetes (8) | 86
      Phytophthora spp. (5) | 53
      Hyaloperonospora spp. (2) | 30
    Others (4) | 13
      Aphids (2) | 10
      Nematodes (2) | 3
  Fungal hosts (3) | 4
  Endophyte (1) | 5
    Epichloe festucae | 5

[a] Only highly represented taxon groups are listed. For a complete list of species in the database, see Supplementary Table S1.
[b] The table is divided into prokaryote and eukaryote host species. The species count number is listed in brackets.
[c] Host species are further divided into animal and plant host.
[d] Indented genera and species infect, or belong to, the taxonomic group listed above them. Only the main representative organisms are listed.
[e] Parasitic species are Leishmania infantum, L. mexicana, Toxoplasma gondii, Trypanosoma brucei and T. cruzi.

ment. For example, Fusarium oxysporum is able to systemically infect tomato plants and immune-compromised mice. The PHI-base entries PHI:215, PHI:285 and PHI:315 reveal a differential requirement for cell-signalling and cell wall formation of three genes during the pathogenesis of plant and animal hosts.

Integration with other database sources

PHI-base is a gene-centric database. Each gene has its own PHI-base accession number. One advantage of this design is that phenotypic information is directly linked to a specific gene. This phenotypic information can then easily be mapped to genomes. Additional information, such as GO terms and protein structure information, is then extracted from other databases. In our current curation, we prioritise the use of UniProt accessions (27) to facilitate subsequent bioinformatics analysis. During the curation process, our biocurators map reported EMBL or GenBank accessions to existing UniProt identifiers, where these exist. However, for species where protein accessions are not available in UniProt at the time of curation and authors did not provide GenBank accession numbers in their studies, only limited or no information on the gene/protein can be provided in PHI-base until this information becomes available.
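The accession prioritisation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cross-reference table `EXAMPLE_XREFS`, its placeholder identifiers and the function `resolve_accession` are hypothetical and do not reflect the actual PHI-base curation tooling.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup from reported GenBank/EMBL accessions to UniProt accessions.
# The identifiers are invented placeholders, not real database records.
EXAMPLE_XREFS: Dict[str, str] = {"GB0001": "UP0001"}


def resolve_accession(reported: Optional[str], xrefs: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (source, accession), preferring UniProt when a mapping exists."""
    if reported is None:
        # No accession was reported, so gene/protein details stay limited.
        return None
    if reported in xrefs:
        return ("UniProt", xrefs[reported])
    # No UniProt entry yet: keep the accession the authors reported.
    return ("GenBank/EMBL", reported)


print(resolve_accession("GB0001", EXAMPLE_XREFS))
print(resolve_accession("GB9999", EXAMPLE_XREFS))
```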

Whole-genome information is increasingly available for plant and animal pathogens. We have mapped phenotypes in PHI-base via their gene accessions to reference genomic sequences available in Ensembl Genomes sites for fungi, protists (including oomycetes) and bacteria (26). In total, 1550 out of 2047 interactions involved in plant pathogenesis from pathogens with an available reference genome have been mapped to Ensembl Genomes. The remainder of the PHI-base accessions are either associated with only genetic data, or the genome sequence information is still missing, or are associated with previously reported sequences and isolates that differ from those in the published reference genomes. Work is continuing to resolve these cases.

Functional analysis of PHI-base accessions

The entire contents of PHI-base are available to users from the ‘Download’ section, where sequence information is available for 2527 PHI-base accessions. We surveyed the content of PHI-base accessions by cataloguing the protein accessions according to their GO classification, using Blast2GO software with standard parameters (28). GO terms were assigned to 63% of PHI-base accessions (Figure 2). For a total of 37% (929 proteins), no GO annotation could be made. Many of these accessions are species-specific proteins and are effectors. The major GO categories assigned included (i) metabolic processes, (ii) cellular processes such as cell communication and (iii) single-organism processes such as cell proliferation, filamentous growth and pigmentation. Microbial pigments in pathogens are known to provide protection against ultraviolet radiation, host-defence products and other stresses encountered during host invasion.

The category ‘cell killing’ was only assigned to six accessions and included Pseudomonas effectors and the Vibrio cholerae enterotoxin. This low number is an unexpected result because, for many of the host-pathogen interactions catalogued in PHI-base, host cell death occurs at some point, i.e. in interactions involving pathogens with a necrotrophic or hemibiotrophic lifestyle.
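The GO coverage figures quoted above can be checked with a short calculation. The sketch below assumes that the percentages refer to the 2527 accessions with sequence information, which the text implies but does not state outright.

```python
accessions_with_sequence = 2527  # accessions with sequence information
without_go = 929                 # proteins with no GO annotation

with_go = accessions_with_sequence - without_go
pct_with = 100 * with_go / accessions_with_sequence
pct_without = 100 * without_go / accessions_with_sequence

print(f"with GO: {with_go} ({pct_with:.0f}%)")          # with GO: 1598 (63%)
print(f"without GO: {without_go} ({pct_without:.0f}%)")  # without GO: 929 (37%)
```

Under that assumption the reported 63% and 37% are consistent with 1598 annotated and 929 unannotated accessions.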

Table 3. Definitions for the nine high-level phenotype outcomes used in PHI-base

High-level phenotype outcome [a] | Definition
Loss of pathogenicity | The transgenic strain fails to cause disease that is observed in the wild type (i.e. qualitative effect).
Reduced virulence | The transgenic strain still causes some disease formation but fewer symptoms than the wild-type strain (i.e. a quantitative effect). Synonymous with the term reduced aggressiveness.
Unaffected pathogenicity | The transgenic strain, which expresses altered levels of a specific gene product(s), causes the same level of disease compared to the wild-type reference strain.
Increased virulence (Hypervirulence) | The transgenic strain causes greater incidence or severity of disease than the wild-type strain.
Effector (plant avirulence determinant) | Some effector genes are required to cause disease on susceptible hosts, but most are not. A plant pathogen-specific term which was previously referred to as a corresponding avirulence (Avr) gene. An effector gene is formally identified because its presence leads to the direct or indirect recognition of a pathogen in resistant host genotypes which possess the corresponding disease resistance (R) gene. Positive recognition leads to activation of plant defense and the pathogen either fails to cause disease or causes less disease. In the absence of the pathogen, effector delivery into a healthy plant possessing the corresponding R gene activates plant defense responses.
Lethal | The transgenic strain is not viable. The gene product is essential for life of the organism.
Enhanced antagonism | The transgenic strain shows greater endophytic biomass in the host and/or the formation of visible disease symptoms.
Resistant to chemical | The transgenic strain [b] grows and/or develops normally when exposed to chemistry concentrations that are detrimental to the wild-type strain.
Sensitive to chemical | The transgenic strain, which expresses either no or reduced levels of a specific gene product(s) or possesses a specific gene mutation(s), has the same ability [c] as the wild-type strain to grow and develop when exposed to detrimental chemistry concentrations.

[a] Compared to wild-type reference strain (i.e. a direct isogenic strain comparison).
[b] Molecular studies on natural field isolate population are also considered once the natural target site has been identified.
[c] On rare occasions increased sensitivity to chemistry has been observed.

Table 4. Number of interactions per phenotypic group in animal and plant hosts

Entry type | Animal host [a] | Plant host
Loss of pathogenicity | 73 | 404
Reduced virulence [b] | 542 | 1056
Increased virulence | 33 | 51
Essential (lethal) | 46 | 74
Unaffected pathogenicity [c] | 80 | 1144
Effector | 0 | 533
Enhanced antagonism | 0 | 4
Resistance to chemistry | 5 | 30
Sensitive to chemistry [d] | 1 | 7

[a] Animal and plant-attacking pathogens are listed with their taxonomy ID and lifestyle in Supplementary Table S1.
[b] The three missing entries in this category have other host types.
[c] One entry in this category has a fungal host.
[d] One entry in this category has a fish host.

TECHNICAL DEVELOPMENTS, CURATION AND OUTREACH

Data curation and release management

In the NAR 2008 article (11) we provided the details of the curation procedure in use. This procedure is still in place. However, due to the increasing volume of literature requiring curation (Figure 1), we now use additional procedures. Primarily, papers are found in the literature databases Web of Science and PubMed using the keyword search terms ‘(fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene)’ (29). Text mining is not employed because relevant information has to be extracted by analysing figures, tables and text in the peer-reviewed articles. This task can only be done by trained biocurators with a strong understanding of the research area. PHI-base relies heavily on support of the scientific community to suggest relevant articles for curation and for the subsequent quality control of entries. The PHI-base team does not have any individual member solely dedicated to data curation. Instead, team members curate data on a part-time basis and when the need arises. In an effort to close a curation gap for articles published between 1984 and 2014, a collaboration was established with the curation scientists at Molecular Connections, Bangalore, India. The biocurators give priority to author-assigned gene function over computationally transferred annotation, such as GO terms. The author-assigned function is frequently extracted from either title or abstract. Experts from the scientific community are invited on a regular basis to verify new records before uploading into the database and to provide quality control.
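The keyword query used to find candidate papers has a simple AND-of-ORs structure, which the following Python sketch applies to a title or abstract held locally. It is a hedged illustration only: the function `matches_query` and the regular-expression rendering of the wildcard terms are assumptions, and the real searches are run through the Web of Science and PubMed interfaces, whose query syntax differs.

```python
import re

# Each inner list is one OR-group; all groups must match (AND).
# Wildcard terms such as "fung*" are rendered here as regular expressions.
GROUPS = [
    [r"fung\w*", r"yeast"],
    [r"gene", r"factor"],
    [r"pathogenicity", r"virulen\w*", r"avirulence gene"],
]


def matches_query(text: str) -> bool:
    """Return True if the text satisfies every OR-group of the keyword query."""
    lowered = text.lower()
    return all(
        any(re.search(rf"\b{term}\b", lowered) for term in group)
        for group in GROUPS
    )


print(matches_query("Deletion of a fungal gene reduces virulence in wheat"))
print(matches_query("A yeast cell cycle factor"))
```

A filter like this only shortlists papers. As the text notes, the phenotype information itself still has to be extracted by trained biocurators.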

Mapping PHI-base phenotypes to Ensembl Genomes

Through the cross-referencing with Ensembl Genomes (http://ensemblgenomes.org), PHI-base annotations can now be visualized directly in their genomic context, identifying features such as pathogenicity islands, through a simple system of colour coding using the nine high-level phenotyping terms. This new way to explore the data in PHI-base is shown in Figure 3. The phenotyping term ‘mixed outcome’ is also used to identify genes where a range of interaction outcomes have been identified depending on the host species and/or tissue type evaluated.
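One way to picture the controlled vocabulary behind this colour coding is as an enumeration of the nine high-level terms from Table 3. The Python sketch below is an assumption-laden illustration: the names `Phenotype`, `KNOWN_COLOURS` and `colour_for` are hypothetical, and only the two colours named in the Figure 3 caption (green and orange) are filled in, because the colours of the other terms are not given in the text.

```python
from enum import Enum
from typing import Optional


class Phenotype(Enum):
    """The nine high-level phenotype outcomes listed in Table 3."""
    LOSS_OF_PATHOGENICITY = "loss of pathogenicity"
    REDUCED_VIRULENCE = "reduced virulence"
    UNAFFECTED_PATHOGENICITY = "unaffected pathogenicity"
    INCREASED_VIRULENCE = "increased virulence (hypervirulence)"
    EFFECTOR = "effector (plant avirulence determinant)"
    LETHAL = "lethal"
    ENHANCED_ANTAGONISM = "enhanced antagonism"
    RESISTANT_TO_CHEMICAL = "resistant to chemical"
    SENSITIVE_TO_CHEMICAL = "sensitive to chemical"


# Only these two colours are stated in the Figure 3 caption; others are left unset.
KNOWN_COLOURS = {
    Phenotype.LOSS_OF_PATHOGENICITY: "green",
    Phenotype.REDUCED_VIRULENCE: "orange",
}


def colour_for(term: str) -> Optional[str]:
    """Validate a term against the vocabulary and return its colour, if known."""
    phenotype = Phenotype(term.strip().lower())  # raises ValueError if not a listed term
    return KNOWN_COLOURS.get(phenotype)


print(colour_for("Loss of pathogenicity"))
print(colour_for("Lethal"))
```

In this sketch the additional term ‘mixed outcome’ is deliberately not part of the enumeration, so it is rejected; a fuller model would need to decide how to represent it.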


Figure 2. GO terms assigned to PHI-base accessions in Version 3.6 mapped to a biological process.

APPLICATIONS OF PHI-BASE

PHI-base use has been cited in over 100 peer-reviewed publications. These publications are listed in year order in the ‘About’ section of the database. Recently published PHI-base use cases include genome mining and comparative genomics (30,31), the selection and functional testing of candidate virulence factors in newly sequenced fungal and nematode pathogens of agricultural importance (10,11), and studies investigating the subtle differences between pathogen and biocontrol species (23). In Table 5, the main uses of PHI-base are given along with literature examples (14,30–46). In the past 4 years, we have observed a gradual shift in PHI-base use, with an increase in the number of larger comparative gene function studies and investigations reporting the in silico prediction of virulence-associated genes.

FORTHCOMING DEVELOPMENTS

Tools for community-led curation

A big challenge facing all biological databases is the grow-ing quantity of data and the relative difficulty of obtain-ing resources to curate the knowledge that derives fromit For the pathogen-host interaction community the scaleof the problem is considerable (Figure 1) One solution isto encourage community-based curation particularly bythe authors of scientific publications who may be moti-vated to have their work correctly represented within thedatabase and who are the experts in their own specialist do-mains (although they may not be expert in the conventionsin use within the database) Inclusion of studies in PHI-base also improves their visibility and accessibility PHI-base has a curation model based on community contribu-tion although hitherto this has involved certain collabo-rators curating many papers in their own area of expertise

after prior training in the data entry tools A more scalablemodel would allow all users to directly curate their own pa-pers without prior training A new easy-to-use web-basedinterface for direct access by the wider community is cur-rently in development

The PHI-base web-based curation tool will facilitate cu-ration of pathogen-host interactions from peer reviewed lit-erature into PHI-base by the authors doing the experimen-tal analyses This curation tool will be based on the recentlydeveloped Canto tool an online tool that supports func-tional gene annotation (47) Canto is part of the GenericModel Organism Database project which provides a suiteof open software for managing genetic data (httpwwwgmodorg) Canto has proven effective for the community-based curation of data for the fission yeast database Pom-Base (httpwwwpombaseorg) (48) The PHI-base cura-tion tool will use ontological data from a variety of sourcesmost notably from the Open Biological and Biomedical On-tologies Foundry (httpwwwobofoundryorg) (49) How-ever some terminology is specific to the nature of the inter-actions captured in PHI-base so will require the develop-ment of new controlled vocabularies for this purpose Forexample an lsquointeraction evidencersquo ontology will be createdto specify the evidence for pathogen-host interactions thuscomplimenting the gene-centric data from the GO Also inaddition to the controlled high level vocabulary above de-scribing the phenotype of the pathogen (Table 4) a similarcontrolled vocabulary can be created to describe the affectthe interaction has on the host organism To ensure qualityand consistency of the curated data all annotations will stillbe approved by a curator or expert with knowledge of thespecies and the captured data

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019

D652 Nucleic Acids Research 2015 Vol 43 Database issue

Figure 3 Inspection of gene function using the Ensembl genome browser (A) Displayed is a small chromosomal region in Magnaporthe oryzae showingtwo genes involved in pathogenicity (as annotated in PHI-base) in their genomic context (viewable in the Ensembl browser in the transcript display) Acolour code indicates the annotated role of each gene green lsquoloss of pathogenicityrsquo and orange lsquoreduced virulencersquo (B) By selecting each colour-codedMGG transcript ID information is revealed on the associated gene deletion study curated in the PHI-base database

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019

Nucleic Acids Research 2015 Vol 43 Database issue D653

Table 5 PHI-base uses that have often appeared in the peer-reviewed literature

Use case Type of research studyExamplereference

1 Annotation and candidate gene selectionLarge scale forward genetics screens (3233)Transcriptome studies (RNAseq microarrays ESTs) (343645)Full and partial genome annotation genome mining (303137)

2 Predictive bioinformatics analyses Networks protein-protein interaction mapping (353839)3 Complementary databases (144041)4 Review articles (424446)5 Single gene function studies

Inter-comparisonand inter-comparison of gene mutants within and between species (43)

Tools for data mining

We are currently developing a new tool for the analysis and extraction of whole genomic data from plant pathogens as part of the PhytoPath project (http://www.phytopathdb.org), using the data warehousing framework BioMart (50). It will allow users to mine genomic data (for sequence and annotation) across multiple species based on PHI-base annotations in conjunction with other annotations. The tool is expected to be launched before the end of 2015.

Other activities

Our intention is to extend the taxonomic range available within PHI-base to ∼200 host-infecting species within the next 2 years. At this level of species coverage, detailed analyses within and between specific groups of pathogens with different infection strategies, host ranges or taxonomic assignments, or between pathogenic and closely related non-pathogenic, endophytic or symbiotic lifestyles, should be feasible.

In the next phase of curation, a greater emphasis will be placed on the effector literature, which should increase the number of interactions from bacterial, oomycete and obligate biotrophic species. To accompany this development, the curation of the corresponding host target(s), i.e. the initial molecular partner in the host, has commenced and this important information should soon be available. For example, various bacterial effectors, including AvrRpm1 (PHI:977) and AvrRpt2 (PHI:979), are delivered into the plant cytoplasm via the bacterial type III secretion system. These effectors interact with the Arabidopsis protein RIN4 (51). Such protein interaction data sets are of growing importance in the analysis of host-pathogen, host-pest and host-parasite interactions, as they typically represent communication events that have co-evolved between biological kingdoms. For example, Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific effectors were found to exhibit an enhanced disease resistance (edr) phenotype to both powdery and downy mildews (52). In addition, by including the corresponding host targets, more effectors from obligate biotrophic species can be curated into PHI-base. These effectors are rigorously tested for their role in pathogenicity using a range of other techniques, but not those involving the generation of stable pathogen transformants.
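Once host targets are curated alongside effectors, questions such as "which host proteins are targeted by more than one effector?" become simple look-ups. The sketch below illustrates this using only the two effectors and the one host target named in the text; the record layout, the `PHI:n` spelling of the accessions and the function name are assumptions, not the PHI-base data model.

```python
from collections import defaultdict

# (effector name, accession, host target) triples taken from the text above.
EFFECTOR_TARGETS = [
    ("AvrRpm1", "PHI:977", "RIN4"),
    ("AvrRpt2", "PHI:979", "RIN4"),
]


def shared_host_targets(records, min_effectors=2):
    """Map each host target to its effectors, keeping targets hit by several."""
    by_target = defaultdict(set)
    for effector, _accession, target in records:
        by_target[target].add(effector)
    return {
        target: sorted(effectors)
        for target, effectors in by_target.items()
        if len(effectors) >= min_effectors
    }


if __name__ == "__main__":
    print(shared_host_targets(EFFECTOR_TARGETS))
```

With the two records shown, RIN4 is reported as a target shared by AvrRpm1 and AvrRpt2.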

Database access and feedback

PHI-base can be freely accessed at http://www.phi-base.org. The complete database can be downloaded from the 'Download' section. Prior to downloading the entire database to create local Basic Local Alignment Search Tool databases or for other bioinformatics applications (Table 5), users are asked to fill in a registration form. This allows PHI-base to monitor the number of academic and industrial users, a requirement by our sponsors.
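As an illustration of the local BLAST use case, the sketch below assembles the command lines for the NCBI BLAST+ programs `makeblastdb` and `blastp`. It assumes that BLAST+ is installed, that the downloaded sequences have been saved as a protein FASTA file, and that the file and database names (`phi_base.fasta`, `phi_base_db`, `my_proteins.fasta`) are hypothetical; the source does not specify the download format.

```python
import shlex


def makeblastdb_command(fasta_path, db_name, dbtype="prot"):
    """Build the command that formats a FASTA file as a local BLAST database."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def blastp_command(query_path, db_name, evalue=1e-5):
    """Build a protein search against the local database, tabular output."""
    return [
        "blastp",
        "-query", query_path,
        "-db", db_name,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


if __name__ == "__main__":
    # Printed rather than executed, so the sketch runs without BLAST+ installed.
    print(shlex.join(makeblastdb_command("phi_base.fasta", "phi_base_db")))
    print(shlex.join(blastp_command("my_proteins.fasta", "phi_base_db")))
```

To run the commands for real, each list can be passed to `subprocess.run(cmd, check=True)`.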

User support can be obtained by email at contact@phi-base.org. Please use this address if you wish to provide new data for inclusion in PHI-base, if you are an expert willing to assist with curation, if you wish to nominate peer-reviewed papers for curation, or if you can suggest improvements to the PHI-base website.

To increase awareness of PHI-base developments within the community, and for users to be notified when new releases occur, we have developed a PHI-base user mailing list (users@lists.phi-base.org). Users can subscribe from a link on the PHI-base website in the 'Help' section or directly by going to https://www.lists.rothamsted.ac.uk/mailman/listinfo/users.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts who contributed database annotations from their field of expertise into PHI-base and the members of our Scientific Advisory Board: Dr Michael Csukai (Syngenta, Jealott's Hill, UK), Professor Jonathan Jones (The Sainsbury Laboratory, UK), Dr Leighton Pritchard (The James Hutton, UK) and Professor Pietro Spanu (Imperial College, UK). Dr Jan Taubert (formerly Rothamsted Research) is thanked for his expert support in maintaining the PHI-base parser. We thank Drs Paul Kersey, Uma Maheswari and Dan Staines at the European Bioinformatics Institute (Cambridge) for helpful discussions and for significantly improving the pathogen species content within Ensembl Genomes.

FUNDING

This work is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/I001077/1, BB/K020056/1]. PHI-base receives additional support from the BBSRC as a National Capability [BB/J004383/1]. Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K020056/1].

Conflict of interest statement. None declared.

REFERENCES

1. Dangl, J.L., Horvath, D.M. and Staskawicz, B.J. (2013) Pivoting the plant immune system from dissection to deployment. Science, 341, 746–751.

2. Fisher, M.C., Henk, D.A., Briggs, C.J., Brownstein, J.S., Madoff, L.C., McCraw, S.L. and Gurr, S.J. (2012) Emerging fungal threats to animal, plant and ecosystem health. Nature, 484, 186–194.

3. Napoli, C. and Staskawicz, B. (1987) Molecular characterization and nucleic acid sequence of an avirulence gene from race 6 of Pseudomonas syringae pv. glycinea. J. Bacteriol., 169, 572–578.

4. Staskawicz, B.J., Dahlbeck, D. and Keen, N.T. (1984) Cloned avirulence gene of Pseudomonas syringae pv. glycinea determines race-specific incompatibility on Glycine max (L.) Merr. Proc. Natl Acad. Sci. U.S.A., 81, 6024–6028.

5. Liolios, K., Tavernarakis, N., Hugenholtz, P. and Kyrpides, N.C. (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res., 34, D332–D334.

6. Nowara, D., Gay, A., Lacomme, C., Shaw, J., Ridout, C., Douchkov, D., Hensel, G., Kumlehn, J. and Schweizer, P. (2010) HIGS: host-induced gene silencing in the obligate biotrophic fungal pathogen Blumeria graminis. Plant Cell, 22, 3130–3141.

7. Son, H., Seo, Y.S., Min, K., Park, A.R., Lee, J., Jin, J.M., Lin, Y., Cao, P., Hong, S.Y., Kim, E.K. et al. (2011) A phenome-based functional analysis of transcription factors in the cereal head blight fungus Fusarium graminearum. PLoS Pathog., 7, e1002310.

8. Wang, C., Zhang, S., Hou, R., Zhao, Z., Zheng, Q., Xu, Q., Zheng, D., Wang, G., Liu, H., Gao, X. et al. (2011) Functional analysis of the kinome of the wheat scab fungus Fusarium graminearum. PLoS Pathog., 7, e1002460.

9. Gene Ontology Consortium (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.

10. Winnenburg, R., Baldwin, T.K., Urban, M., Rawlings, C., Kohler, J. and Hammond-Kosack, K.E. (2006) PHI-base: a new database for pathogen host interactions. Nucleic Acids Res., 34, D459–D464.

11. Winnenburg, R., Urban, M., Beacham, A., Baldwin, T.K., Holland, S., Lindeberg, M., Hansen, H., Rawlings, C., Hammond-Kosack, K.E. and Kohler, J. (2008) PHI-base update: additions to the pathogen host interaction database. Nucleic Acids Res., 36, D572–D576.

12. Lu, T., Yao, B. and Zhang, C. (2012) DFVF: database of fungal virulence factors. Database, 2012, doi:10.1093/database/bas032.

13. Hedeler, C., Wong, H.M., Cornell, M.J., Alam, I., Soanes, D.M., Rattray, M., Hubbard, S.J., Talbot, N.J., Oliver, S.G. and Paton, N.W. (2007) e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genom., 8, 426.

14. Kersey, P.J., Allen, J.E., Christensen, M., Davis, P., Falin, L.J., Grabmueller, C., Hughes, D.S., Humphrey, J., Kerhornou, A., Khobova, J. et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546–D552.

15. Tripathy, S., Deo, T. and Tyler, B.M. (2012) Oomycete Transcriptomics Database: a resource for oomycete transcriptomes. BMC Genom., 13, 303.

16. Aurrecoechea, C., Brestelli, J., Brunk, B.P., Fischer, S., Gajria, B., Gao, X., Gingle, A., Grant, G., Harb, O.S., Heiges, M. et al. (2010) EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res., 38, D415–D419.

17. Stajich, J.E., Harris, T., Brunk, B.P., Brestelli, J., Fischer, S., Harb, O.S., Kissinger, J.C., Li, W., Nayak, V., Pinney, D.F. et al. (2012) FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res., 40, D675–D681.

18. Kumar, R. and Nanduri, B. (2010) HPIDB–a unified resource for host-pathogen interactions. BMC Bioinformatics, 11(Suppl. 6), S16.

19. Grigoriev, I.V., Nikitin, R., Haridas, S., Kuo, A., Ohm, R., Otillar, R., Riley, R., Salamov, A., Zhao, X., Korzeniewski, F. et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699–D704.

20. Xiang, Z., Tian, Y. and He, Y. (2007) PHIDIAS: a pathogen-host interaction data integration and analysis system. Genome Biol., 8, R150.

21. Dash, S., Van Hemert, J., Hong, L., Wise, R.P. and Dickerson, J.A. (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40, D1194–D1201.

22. Chen, L., Xiong, Z., Sun, L., Yang, J. and Jin, Q. (2012) VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res., 40, D641–D645.

23. Lee, W.S., Hammond-Kosack, K.E. and Kanyuka, K. (2012) Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing, and virus-mediated overexpression of heterologous protein. Plant Physiol., 160, 582–590.

24. Wilson, D., Tutulan-Cunita, A., Jung, W., Hauser, N.C., Hernandez, R., Williamson, T., Piekarska, K., Rupp, S., Young, T. and Stateva, L. (2007) Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol. Microbiol., 65, 841–856.

25. Zhang, L., Zhu, Z., Jing, H., Zhang, J., Xiong, Y., Yan, M., Gao, S., Wu, L.F., Xu, J. and Kan, B. (2009) Pleiotropic effects of the twin-arginine translocation system on biofilm formation, colonization, and virulence in Vibrio cholerae. BMC Microbiol., 9, 114.

26. Hu, W., Sillaots, S., Lemieux, S., Davison, J., Kauffman, S., Breton, A., Linteau, A., Xin, C., Bowman, J., Becker, J. et al. (2007) Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog., 3, e24.

27. UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191–D198.

28. Gotz, S., Garcia-Gomez, J.M., Terol, J., Williams, T.D., Nagaraj, S.H., Nueda, M.J., Robles, M., Talon, M., Dopazo, J. and Conesa, A. (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 36, 3420–3435.

29. Baldwin, T.K., Winnenburg, R., Urban, M., Rawlings, C., Koehler, J. and Hammond-Kosack, K.E. (2006) The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity. MPMI, 19, 1451–1462.

30. Hane, J.K., Anderson, J.P., Williams, A.H., Sperschneider, J. and Singh, K.B. (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.

31. Danchin, E.G., Arguel, M.J., Campan-Fournier, A., Perfus-Barbeoch, L., Magliano, M., Rosso, M.N., Da Rocha, M., Da Silva, C., Nottet, N., Labadie, K. et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.

32. Jeon, J., Park, S.Y., Chi, M.H., Choi, J., Park, J., Rho, H.S., Kim, S., Goh, J., Yoo, S., Choi, J. et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561–565.

33. Cai, Z., Li, G., Lin, C., Shi, T., Zhai, L., Chen, Y. and Huang, G. (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340–350.

34. Vargas, W.A., Martin, J.M.S., Rech, G.E., Rivera, L.P., Benito, E.P., Diaz-Minguez, J.M., Thon, M.R. and Sukno, S.A. (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342–1358.

35. Sperschneider, J., Gardiner, D.M., Taylor, J.M., Hane, J.K., Singh, K.B. and Manners, J.M. (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.

36. Thakur, K., Chawla, V., Bhatti, S., Swarnkar, M.K., Kaur, J., Shankar, R. and Jha, G. (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre, F., Joly, D.L., Labbe, C., Teichmann, B., Linning, R., Belzile, F., Bakkeren, G. and Belanger, R.R. (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946–1959.

38. Schleker, S., Garcia-Garcia, J., Klein-Seetharaman, J. and Oliva, B. (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991–1018.

39. Liu, X., Tang, W.H., Zhao, X.M. and Chen, L. (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.

40. Kour, A., Greer, K., Valent, B., Orbach, M.J. and Soderlund, C. (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271–278.

41. Bleves, S., Dunger, I., Walter, M.C., Frangoulidis, D., Kastenmuller, G., Voulhoux, R. and Ruepp, A. (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671–D676.

42. Van De Wouw, A.P. and Howlett, B.J. (2011) Fungal pathogenicity genes in the age of 'omics'. Mol. Plant Pathol., 12, 507–514.

43. Doehlemann, G., Reissmann, S., Assmann, D., Fleckenstein, M. and Kahmann, R. (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751–766.

44. Dickman, M.B. (2007) Subversion or coersion? Pathogenic deteminants in fungal phytopathogens. Fungal Biol. Rev., 21, 125–129.

45. Zhang, Y., Zhang, K., Fang, A., Han, Y., Yang, J., Xue, M., Bao, J., Hu, D., Zhou, B., Sun, X. et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.

46. Cools, H.J. and Hammond-Kosack, K.E. (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197–210.

47. Rutherford, K.M., Harris, M.A., Lock, A., Oliver, S.G. and Wood, V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.

48. Wood, V., Harris, M.A., McDowall, M.D., Rutherford, K., Vaughan, B.W., Staines, D.M., Aslett, M., Lock, A., Bahler, J., Kersey, P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.

49. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L.J., Eilbeck, K., Ireland, A., Mungall, C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

50. Kasprzyk, A. (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.

51. Jones, J.D. and Dangl, J.L. (2006) The plant immune system. Nature, 444, 323–329.

52. Weßling, R., Epple, P., Altmann, S., He, Y., Yang, L., Henz, Stefan R., McDonald, N., Wiley, K., Bader, Kai C. et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microb., 16, 364–375.


Table 3. Definitions for the nine high-level phenotype outcomes used in PHI-base

| High-level phenotype outcome^a | Definition |
|---|---|
| Loss of pathogenicity | The transgenic strain fails to cause disease that is observed in the wild type (i.e. a qualitative effect). |
| Reduced virulence | The transgenic strain still causes some disease formation but fewer symptoms than the wild-type strain (i.e. a quantitative effect). Synonymous with the term reduced aggressiveness. |
| Unaffected pathogenicity | The transgenic strain, which expresses altered levels of a specific gene product(s), causes the same level of disease compared to the wild-type reference strain. |
| Increased virulence (Hypervirulence) | The transgenic strain causes greater incidence or severity of disease than the wild-type strain. |
| Effector (plant avirulence determinant) | Some effector genes are required to cause disease on susceptible hosts but most are not. A plant pathogen-specific term which was previously referred to as a corresponding avirulence (Avr) gene. An effector gene is formally identified because its presence leads to the direct or indirect recognition of a pathogen in resistant host genotypes which possess the corresponding disease resistance (R) gene. Positive recognition leads to activation of plant defense, and the pathogen either fails to cause disease or causes less disease. In the absence of the pathogen, effector delivery into a healthy plant possessing the corresponding R gene activates plant defense responses. |
| Lethal | The transgenic strain is not viable. The gene product is essential for life of the organism. |
| Enhanced antagonism | The transgenic strain shows greater endophytic biomass in the host and/or the formation of visible disease symptoms. |
| Resistant to chemical | The transgenic strain^b grows and/or develops normally when exposed to chemistry concentrations that are detrimental to the wild-type strain. |
| Sensitive to chemical | The transgenic strain, which expresses either no or reduced levels of a specific gene product(s) or possesses a specific gene mutation(s), has the same ability^c as the wild-type strain to grow and develop when exposed to detrimental chemistry concentrations. |

^a Compared to wild-type reference strain (i.e. a direct isogenic strain comparison).
^b Molecular studies on natural field isolate populations are also considered once the natural target site has been identified.
^c On rare occasions, increased sensitivity to chemistry has been observed.
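A controlled vocabulary of this kind is straightforward to enforce in software. The sketch below encodes the nine terms of Table 3 as a Python enumeration and maps free-text labels onto it; the class name, the member names and the normalisation rule (case-insensitive, parenthetical qualifier optional) are illustrative assumptions rather than part of PHI-base.

```python
from enum import Enum


class Phenotype(Enum):
    """The nine high-level phenotype outcomes listed in Table 3."""
    LOSS_OF_PATHOGENICITY = "loss of pathogenicity"
    REDUCED_VIRULENCE = "reduced virulence"
    UNAFFECTED_PATHOGENICITY = "unaffected pathogenicity"
    INCREASED_VIRULENCE = "increased virulence (hypervirulence)"
    EFFECTOR = "effector (plant avirulence determinant)"
    LETHAL = "lethal"
    ENHANCED_ANTAGONISM = "enhanced antagonism"
    RESISTANT_TO_CHEMICAL = "resistant to chemical"
    SENSITIVE_TO_CHEMICAL = "sensitive to chemical"


def parse_phenotype(label):
    """Return the vocabulary term for a label, or raise ValueError."""
    wanted = " ".join(label.lower().split())  # fold case, collapse whitespace
    for term in Phenotype:
        short_form = term.value.split(" (")[0]  # drop any bracketed qualifier
        if wanted in (term.value, short_form):
            return term
    raise ValueError(f"not a high-level phenotype term: {label!r}")
```

Rejecting labels outside the vocabulary at entry time is what keeps annotations comparable across species and curators.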

Table 4. Number of interactions per phenotypic group in animal and plant hosts

| Entry type | Animal host^a | Plant host |
|---|---|---|
| Loss of pathogenicity | 73 | 404 |
| Reduced virulence^b | 542 | 1056 |
| Increased virulence | 33 | 51 |
| Essential (lethal) | 46 | 74 |
| Unaffected pathogenicity^c | 80 | 1144 |
| Effector | 0 | 533 |
| Enhanced antagonism | 0 | 4 |
| Resistance to chemistry | 5 | 30 |
| Sensitive to chemistry^d | 1 | 7 |

^a Animal and plant-attacking pathogens are listed with their taxonomy ID and lifestyle in Supplementary Table S1.
^b The three missing entries in this category have other host types.
^c One entry in this category has a fungal host.
^d One entry in this category has a fish host.
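The counts in Table 4 can be summarised with a few lines of code. The sketch below copies the figures from the table and computes the column totals and the share of each phenotypic group per host type; the variable and function names are illustrative only.

```python
# Entry type -> (animal-host interactions, plant-host interactions), from Table 4.
TABLE_4 = {
    "Loss of pathogenicity": (73, 404),
    "Reduced virulence": (542, 1056),
    "Increased virulence": (33, 51),
    "Essential (lethal)": (46, 74),
    "Unaffected pathogenicity": (80, 1144),
    "Effector": (0, 533),
    "Enhanced antagonism": (0, 4),
    "Resistance to chemistry": (5, 30),
    "Sensitive to chemistry": (1, 7),
}


def host_totals(table):
    """Return (animal total, plant total) summed over all entry types."""
    animal = sum(a for a, _ in table.values())
    plant = sum(p for _, p in table.values())
    return animal, plant


def group_shares(table, entry_type):
    """Return the fraction of each host column that falls in one entry type."""
    animal_total, plant_total = host_totals(table)
    animal, plant = table[entry_type]
    return animal / animal_total, plant / plant_total


if __name__ == "__main__":
    print(host_totals(TABLE_4))
    print(group_shares(TABLE_4, "Reduced virulence"))
```

The columns sum to 780 animal-host and 3303 plant-host interactions; the handful of entries with other host types noted in the footnotes are not included in these totals.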

(fung* or yeast) and (gene or factor) and (pathogenicity or virulen* or avirulence gene) (29). Text mining is not employed because relevant information has to be extracted by analysing figures, tables and text in the peer-reviewed articles. This task can only be done by trained biocurators with a strong understanding of the research area. PHI-base relies heavily on the support of the scientific community to suggest relevant articles for curation and for the subsequent quality control of entries. The PHI-base team does not have any individual member solely dedicated to data curation. Instead, team members curate data on a part-time basis as the need arises. In an effort to close a curation gap for articles published between 1984 and 2014, a collaboration was established with the curation scientists at Molecular Connections, Bangalore, India. The biocurators give priority to author-assigned gene function over computationally transferred annotation such as GO terms. The author-assigned function is frequently extracted from either the title or the abstract. Experts from the scientific community are invited on a regular basis to verify new records before they are uploaded into the database and to provide quality control.
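The keyword search at the start of this paragraph can be mimicked locally to pre-screen titles or abstracts before manual curation. The sketch below is one possible reading of that query, assuming the truncated stems ("fung", "virulen") stand for wildcard prefixes and that simple plurals should also match; it is not the search implementation used by the PHI-base team.

```python
import re

# One pattern per AND-ed group of the query; alternatives inside a group are OR-ed.
QUERY_GROUPS = [
    re.compile(r"\b(fung\w*|yeasts?)\b", re.IGNORECASE),
    re.compile(r"\b(genes?|factors?)\b", re.IGNORECASE),
    re.compile(r"\b(pathogenicity|virulen\w*|avirulence genes?)\b", re.IGNORECASE),
]


def matches_query(text):
    """True when every group of the query finds at least one hit in the text."""
    return all(pattern.search(text) for pattern in QUERY_GROUPS)


if __name__ == "__main__":
    for title in (
        "A fungal gene required for full virulence on wheat",
        "Cell wall biogenesis in yeast",
    ):
        print(matches_query(title), "-", title)
```

A filter like this only narrows the candidate list; as the text notes, deciding what an article actually shows still requires a trained curator reading the figures, tables and text.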

Mapping PHI-base phenotypes to Ensembl Genomes

Through cross-referencing with Ensembl Genomes (http://ensemblgenomes.org), PHI-base annotations can now be visualized directly in their genomic context, identifying features such as pathogenicity islands through a simple system of colour coding using the nine high-level phenotyping terms. This new way to explore the data in PHI-base is shown in Figure 3. The phenotyping term 'mixed outcome' is also used to identify genes where a range of interaction outcomes have been identified, depending on the host species and/or tissue type evaluated.
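The 'mixed outcome' rule can be expressed compactly: a gene keeps its phenotype term when all of its curated interactions agree, and is labelled 'mixed outcome' otherwise. The sketch below illustrates that rule; the function names are hypothetical, and only the two colours named in the Figure 3 legend are included because the text does not give the others.

```python
# Colours stated in the Figure 3 legend; other terms are left unmapped on purpose.
KNOWN_COLOURS = {
    "loss of pathogenicity": "green",
    "reduced virulence": "orange",
}


def gene_display_term(outcomes):
    """Collapse the outcomes of all interactions for one gene into a single term."""
    distinct = {o.strip().lower() for o in outcomes if o.strip()}
    if not distinct:
        raise ValueError("at least one curated outcome is required")
    if len(distinct) == 1:
        return next(iter(distinct))
    return "mixed outcome"


def display_colour(outcomes):
    """Return the browser colour for a gene, or None when it is not known here."""
    return KNOWN_COLOURS.get(gene_display_term(outcomes))
```

For instance, a gene whose deletion reduces virulence on one host but has no effect on another would be reported as 'mixed outcome'.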


Figure 2. GO terms assigned to PHI-base accessions in Version 3.6, mapped to a biological process.

APPLICATIONS OF PHI-BASE

PHI-base use has been cited in over 100 peer-reviewed publications. These publications are listed in year order in the 'About' section of the database. Recently published PHI-base use cases include genome mining and comparative genomics (30,31), the selection and functional testing of candidate virulence factors in newly sequenced fungal and nematode pathogens of agricultural importance (10,11) and studies investigating the subtle differences between pathogen and biocontrol species (23). In Table 5, the main uses of PHI-base are given along with literature examples (14,30–46). In the past 4 years, we have observed a gradual shift in PHI-base use, with an increase in the number of larger comparative gene function studies and investigations reporting the in silico prediction of virulence-associated genes.

FORTHCOMING DEVELOPMENTS

Tools for community-led curation

A major challenge facing all biological databases is the growing quantity of data and the relative difficulty of obtaining resources to curate the knowledge that derives from it. For the pathogen-host interaction community, the scale of the problem is considerable (Figure 1). One solution is to encourage community-based curation, particularly by the authors of scientific publications, who may be motivated to have their work correctly represented within the database and who are the experts in their own specialist domains (although they may not be expert in the conventions in use within the database). Inclusion of studies in PHI-base also improves their visibility and accessibility. PHI-base has a curation model based on community contribution, although hitherto this has involved certain collaborators curating many papers in their own area of expertise after prior training in the data entry tools. A more scalable model would allow all users to directly curate their own papers without prior training. A new easy-to-use web-based interface for direct access by the wider community is currently in development.


26 HuW SillaotsS LemieuxS DavisonJ KauffmanS BretonALinteauA XinC BowmanJ BeckerJ et al (2007) Essential geneidentification and drug target prioritization in Aspergillus fumigatusPLoS Pathog 3 e24

27 UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

28 GotzS Garcia-GomezJM TerolJ WilliamsTD NagarajSHNuedaMJ RoblesM TalonM DopazoJ and ConesaA (2008)High-throughput functional annotation and data mining with theBlast2GO suite Nucleic Acids Res 36 3420ndash3435

29 BaldwinTK WinnenburgR UrbanM RawlingsC KoehlerJand Hammond-KosackKE (2006) The pathogen-host interactionsdatabase (PHI-base) provides insights into generic and novel themesof pathogenicity MPMI 19 1451ndash1462

30 HaneJK AndersonJP WilliamsAH SperschneiderJ andSinghKB (2014) Genome sequencing and comparative genomics ofthe broad host-range pathogen Rhizoctonia solani AG8 PLoS Genet10 e1004281

31 DanchinEG ArguelMJ Campan-FournierAPerfus-BarbeochL MaglianoM RossoMN Da RochaM DaSilvaC NottetN LabadieK et al (2013) Identification of noveltarget genes for safer and more specific control of root-knotnematodes from a pan-genome mining PLoS Pathog 9 e1003745

32 JeonJ ParkSY ChiMH ChoiJ ParkJ RhoHS KimSGohJ YooS ChoiJ et al (2007) Genome-wide functional analysisof pathogenicity genes in the rice blast fungus Nat Genet 39561ndash565

33 CaiZ LiG LinC ShiT ZhaiL ChenY and HuangG (2013)Identifying pathogenicity genes in the rubber tree anthracnose fungusColletotrichum gloeosporioides through random insertionalmutagenesis Microbiol Res 168 340ndash350

34 VargasWA MartinJMS RechGE RiveraLP BenitoEPDiaz-MinguezJM ThonMR and SuknoSA (2012) Plant defensemechanisms are activated during biotrophic and necrotrophicdevelopment of Colletotricum graminicola in maize Plant Physiol158 1342ndash1358

35 SperschneiderJ GardinerDM TaylorJM HaneJK SinghKBand MannersJM (2013) A comparative hidden Markov modelanalysis pipeline identifies proteins characteristic of cereal-infectingfungi BMC Genom 14 807

36 ThakurK ChawlaV BhattiS SwarnkarMK KaurJShankarR and JhaG (2013) De novo transcriptome sequencingand analysis for Venturia inaequalis the devastating apple scabpathogen Plos One 8 e53937

37 LefebvreF JolyDL LabbeC TeichmannB LinningRBelzileF BakkerenG and BelangerRR (2013) The transition froma phytopathogenic smut ancestor to an anamorphic biocontrol agent

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019

Nucleic Acids Research 2015 Vol 43 Database issue D655

deciphered by comparative whole-genome analysis Plant Cell 251946ndash1959

38 SchlekerS Garcia-GarciaJ Klein-SeetharamanJ and OlivaB(2012) Prediction and comparison of Salmonella-human andSalmonella-Arabidopsis interactomes Chem Biodiv 9 991ndash1018

39 LiuX TangWH ZhaoXM and ChenL (2010) A networkapproach to predict pathogenic genes for Fusarium graminearumPLoS One 5 e13021

40 KourA GreerK ValentB OrbachMJ and SoderlundC (2012)MGOS development of a community annotation database forMagnaporthe oryzae MPMI 25 271ndash278

41 BlevesS DungerI WalterMC FrangoulidisD KastenmullerGVoulhouxR and RueppA (2014) HoPaCI-DB host-Pseudomonasand Coxiella interaction database Nucleic Acids Res 42D671ndashD676

42 Van De WouwAP and HowlettBJ (2011) Fungal pathogenicitygenes in the age of lsquoomicsrsquo Mol Plant Pathol 12 507ndash514

43 DoehlemannG ReissmannS AssmannD FleckensteinM andKahmannR (2011) Two linked genes encoding a secreted effectorand a membrane protein are essential for Ustilago maydis-inducedtumour formation Mol Microbiol 81 751ndash766

44 DickmanMB (2007) Subversion or coersion Pathogenicdeteminants in fungal phytopathogens Fungal Biol Rev 21 125ndash129

45 ZhangY ZhangK FangA HanY YangJ XueM BaoJHuD ZhouB SunX et al (2014) Specific adaptation of

Ustilaginoidea virens in occupying host florets revealed bycomparative and functional genomics Nat Commun 5 3849

46 CoolsHJ and Hammond-KosackKE (2013) Exploitation ofgenomics in fungicide research current status and futureperspectives Mol Plant Pathol 14 197ndash210

47 RutherfordKM HarrisMA LockA OliverSG and WoodV(2014) Canto an online tool for community literature curationBioinformatics 30 1791ndash1792

48 WoodV HarrisMA McDowallMD RutherfordKVaughanBW StainesDM AslettM LockA BahlerJKerseyPJ et al (2012) PomBase a comprehensive online resourcefor fission yeast Nucleic Acids Res 40 D695ndashD699

49 SmithB AshburnerM RosseC BardJ BugW CeustersWGoldbergLJ EilbeckK IrelandA MungallCJ et al (2007) TheOBO Foundry coordinated evolution of ontologies to supportbiomedical data integration Nat Biotechnol 25 1251ndash1255

50 KasprzykA (2011) BioMart driving a paradigm change inbiological data management Database 2011 bar049

51 JonesJD and DanglJL (2006) The plant immune system Nature444 323ndash329

52 WeszliglingR EppleP AltmannS HeY YangL HenzStefan RMcDonaldN WileyK BaderKai C et al (2014) Convergenttargeting of a common host protein-network by pathogen effectorsfrom three kingdoms of life Cell Host Microb 16 364ndash375

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019


Figure 2. GO terms assigned to PHI-base accessions in Version 3.6, mapped to a biological process.

APPLICATIONS OF PHI-BASE

PHI-base use has been cited in over 100 peer-reviewed publications. These publications are listed in year order in the ‘About’ section of the database. Recently published PHI-base use cases include genome mining and comparative genomics (30,31), the selection and functional testing of candidate virulence factors in newly sequenced fungal and nematode pathogens of agricultural importance (10,11) and studies investigating the subtle differences between pathogen and biocontrol species (23). In Table 5, the main uses of PHI-base are given along with literature examples (14,30–46). In the past 4 years, we have observed a gradual shift in PHI-base use, with an increase in the number of larger comparative gene function studies and investigations reporting the in silico prediction of virulence-associated genes.

FORTHCOMING DEVELOPMENTS

Tools for community-led curation

A big challenge facing all biological databases is the growing quantity of data and the relative difficulty of obtaining resources to curate the knowledge that derives from it. For the pathogen-host interaction community, the scale of the problem is considerable (Figure 1). One solution is to encourage community-based curation, particularly by the authors of scientific publications, who may be motivated to have their work correctly represented within the database and who are the experts in their own specialist domains (although they may not be expert in the conventions in use within the database). Inclusion of studies in PHI-base also improves their visibility and accessibility. PHI-base has a curation model based on community contribution, although hitherto this has involved certain collaborators curating many papers in their own area of expertise after prior training in the data entry tools. A more scalable model would allow all users to directly curate their own papers without prior training. A new easy-to-use web-based interface for direct access by the wider community is currently in development.

The PHI-base web-based curation tool will facilitate curation of pathogen-host interactions from peer-reviewed literature into PHI-base by the authors doing the experimental analyses. This curation tool will be based on the recently developed Canto tool, an online tool that supports functional gene annotation (47). Canto is part of the Generic Model Organism Database project, which provides a suite of open software for managing genetic data (http://www.gmod.org). Canto has proven effective for the community-based curation of data for the fission yeast database PomBase (http://www.pombase.org) (48). The PHI-base curation tool will use ontological data from a variety of sources, most notably from the Open Biological and Biomedical Ontologies Foundry (http://www.obofoundry.org) (49). However, some terminology is specific to the nature of the interactions captured in PHI-base, so will require the development of new controlled vocabularies for this purpose. For example, an ‘interaction evidence’ ontology will be created to specify the evidence for pathogen-host interactions, thus complementing the gene-centric data from the GO. Also, in addition to the controlled high-level vocabulary above describing the phenotype of the pathogen (Table 4), a similar controlled vocabulary can be created to describe the effect the interaction has on the host organism. To ensure quality and consistency of the curated data, all annotations will still be approved by a curator or expert with knowledge of the species and the captured data.
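The workflow described above (author submission against controlled vocabularies, followed by curator approval) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, field and function names are hypothetical, the evidence terms are invented placeholders, and only the two phenotype terms are taken from this article. It does not reflect the actual Canto or PHI-base data model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical controlled vocabularies. The two phenotype terms appear in this
# article; the evidence terms are invented placeholders for illustration.
PHENOTYPE_TERMS = {"loss of pathogenicity", "reduced virulence"}
EVIDENCE_TERMS = {"gene deletion", "gene silencing"}


@dataclass
class Annotation:
    """One author-submitted pathogen-host interaction annotation."""
    gene: str
    pathogen: str
    host: str
    phenotype: str
    evidence: str
    approved: bool = False  # changed only by the curator step below


def validate(annotation: Annotation) -> List[str]:
    """Return a list of problems; an empty list means all terms are known."""
    problems = []
    if annotation.phenotype not in PHENOTYPE_TERMS:
        problems.append(f"unknown phenotype term: {annotation.phenotype!r}")
    if annotation.evidence not in EVIDENCE_TERMS:
        problems.append(f"unknown evidence term: {annotation.evidence!r}")
    return problems


def approve(annotation: Annotation) -> Annotation:
    """Curator step: approve only annotations that pass the vocabulary checks."""
    problems = validate(annotation)
    if problems:
        raise ValueError("; ".join(problems))
    annotation.approved = True
    return annotation
```

Restricting free text to vocabulary terms at submission time is what would let untrained authors contribute while keeping entries comparable; the final approval flag mirrors the curator sign-off described in the text.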


Figure 3. Inspection of gene function using the Ensembl genome browser. (A) Displayed is a small chromosomal region in Magnaporthe oryzae showing two genes involved in pathogenicity (as annotated in PHI-base) in their genomic context (viewable in the Ensembl browser in the transcript display). A colour code indicates the annotated role of each gene: green, ‘loss of pathogenicity’ and orange, ‘reduced virulence’. (B) By selecting each colour-coded MGG transcript ID, information is revealed on the associated gene deletion study curated in the PHI-base database.


Table 5. PHI-base uses that have often appeared in the peer-reviewed literature

| Use case | Type of research study | Example reference |
| --- | --- | --- |
| 1. Annotation and candidate gene selection | Large scale forward genetics screens | (32,33) |
| | Transcriptome studies (RNAseq, microarrays, ESTs) | (34,36,45) |
| | Full and partial genome annotation, genome mining | (30,31,37) |
| 2. Predictive bioinformatics analyses | Networks, protein-protein interaction mapping | (35,38,39) |
| 3. Complementary databases | | (14,40,41) |
| 4. Review articles | | (42,44,46) |
| 5. Single gene function studies | Intra-comparison and inter-comparison of gene mutants within and between species | (43) |

Tools for data mining

We are currently developing a new tool for the analysis and extraction of whole genomic data from plant pathogens as part of the PhytoPath project (http://www.phytopathdb.org), using the data warehousing framework BioMart (50), allowing users to mine genomic data (for sequence and annotation) across multiple species based on PHI-base annotations in conjunction with other annotations. The tool is expected to be launched before the end of 2015.

Other activities

Our intention is to extend the taxonomic range available within PHI-base to ∼200 host-infecting species within the next 2 years. At this level of species coverage, detailed analyses within and between specific groups of pathogens with different infection strategies, host ranges, taxonomic assignments or between pathogenic and closely related non-pathogenic, endophytic or symbiotic lifestyles should be feasible.

In the next phase of curation, a greater emphasis will be placed on the effector literature, which should increase the number of interactions from bacterial, oomycete and obligate biotrophic species. To accompany this development, the curation of the corresponding host target(s), i.e. initial molecular partner in the host, has commenced and this important information should soon be available. For example, various bacterial effectors including AvrRpm1 (PHI:977) and AvrRpt2 (PHI:979) are delivered into the plant cytoplasm via the bacterial type III secretion system. These effectors interact with the Arabidopsis protein RIN4 (51). These protein interaction data sets are of growing importance in the analysis of host-pathogen, host-pest and host-parasite interactions, as they typically represent communication events that have co-evolved between biological kingdoms. For example, Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific effectors were found to exhibit an enhanced disease resistance (edr) phenotype to both powdery and downy mildews (52). In addition, by the inclusion of the corresponding host targets, more effectors from obligate biotrophic species can be curated into PHI-base. These effectors are rigorously tested for their role in pathogenicity using a range of other techniques, but not those involving the generation of stable pathogen transformants.
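To illustrate why curated host targets are useful, the sketch below groups effectors by the host protein they interact with, so that targets shared by several effectors stand out. The two records restate the AvrRpm1/AvrRpt2 and RIN4 example from the text; the tuple layout and the function names are assumptions made for illustration and are not the PHI-base schema.

```python
from collections import defaultdict

# (accession, effector, host species, host target) tuples.
# Both rows restate the example given in the text; the layout is hypothetical.
INTERACTIONS = [
    ("PHI:977", "AvrRpm1", "Arabidopsis", "RIN4"),
    ("PHI:979", "AvrRpt2", "Arabidopsis", "RIN4"),
]


def effectors_by_host_target(interactions):
    """Group effector names under each (host, target) pair."""
    grouped = defaultdict(list)
    for _accession, effector, host, target in interactions:
        grouped[(host, target)].append(effector)
    return dict(grouped)


def shared_targets(interactions, minimum=2):
    """Return host targets bound by at least `minimum` distinct effectors."""
    grouped = effectors_by_host_target(interactions)
    return {
        key: sorted(set(names))
        for key, names in grouped.items()
        if len(set(names)) >= minimum
    }
```

With the two example records, `shared_targets(INTERACTIONS)` reports RIN4 as a target of both effectors. With many more curated interactions, the same grouping would be one simple way to look for host proteins targeted by effectors from different pathogens.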

Database access and feedback

PHI-base can be freely accessed at http://www.phi-base.org. The complete database can be downloaded from the ‘Download’ section. Prior to downloading the entire database to create local Basic Local Alignment Search Tool databases or for other bioinformatics applications (Table 5), users are asked to fill in a registration form. This allows PHI-base to monitor the number of academic and industrial users, a requirement by our sponsors.

User support can be obtained by email from contact@phi-base.org. Please use this email address if you wish to provide new data for inclusion in PHI-base, if you are an expert willing to assist with curation, for the nomination of peer-reviewed papers to be curated or if you can provide suggestions for improvement to the PHI-base website.

To increase the awareness of PHI-base developments within the community and for users to be notified when new releases occur, we have developed a PHI-base user mailing list (users@lists.phi-base.org). Users can subscribe from a link on the PHI-base website in the ‘Help’ section or directly by going to https://www.lists.rothamsted.ac.uk/mailman/listinfo/users.
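As a rough sketch of the local BLAST use case mentioned above, downloaded records could be written out in FASTA format before being indexed. The format of the PHI-base download is not described here, so the (accession, description, sequence) layout, the function name and the file name below are all assumptions.

```python
def to_fasta(records, width=60):
    """Format (accession, description, sequence) tuples as FASTA text.

    The record layout is hypothetical; adapt it to the downloaded file.
    """
    lines = []
    for accession, description, sequence in records:
        lines.append(f">{accession} {description}")
        # Break each sequence into fixed-width lines, as is usual for FASTA.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


def write_fasta(records, path="phi_base.fasta"):
    """Write the records to a FASTA file (the file name is illustrative)."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(to_fasta(records))
```

A file produced this way could then be indexed with the `makeblastdb` program from the BLAST+ suite, for example `makeblastdb -in phi_base.fasta -dbtype prot` for protein sequences.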

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts who contributed database annotations from their field of expertise into PHI-base and the members of our Scientific Advisory Board: Dr Michael Csukai (Syngenta, Jealott’s Hill, UK), Professor Jonathan Jones (The Sainsbury Laboratory, UK), Dr Leighton Pritchard (The James Hutton, UK) and Professor Pietro Spanu (Imperial College, UK). Dr Jan Taubert (formerly Rothamsted Research) is thanked for his expert support in maintaining the PHI-base parser. We thank Drs Paul Kersey, Uma Maheswari and Dan Staines at the European Bioinformatics Institute (Cambridge) for helpful discussions and for significantly improving the pathogen species content within Ensembl Genomes.

FUNDING

This work is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/I001077/1, BB/K020056/1]. PHI-base receives additional support from the BBSRC as a National Capability [BB/J004383/1]. Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K020056/1].

Conflict of interest statement. None declared.

REFERENCES

1. Dangl,J.L., Horvath,D.M. and Staskawicz,B.J. (2013) Pivoting the plant immune system from dissection to deployment. Science, 341, 746–751.

2. Fisher,M.C., Henk,D.A., Briggs,C.J., Brownstein,J.S., Madoff,L.C., McCraw,S.L. and Gurr,S.J. (2012) Emerging fungal threats to animal, plant and ecosystem health. Nature, 484, 186–194.

3. Napoli,C. and Staskawicz,B. (1987) Molecular characterization and nucleic acid sequence of an avirulence gene from race 6 of Pseudomonas syringae pv. glycinea. J. Bacteriol., 169, 572–578.

4. Staskawicz,B.J., Dahlbeck,D. and Keen,N.T. (1984) Cloned avirulence gene of Pseudomonas syringae pv. glycinea determines race-specific incompatibility on Glycine max (L.) Merr. Proc. Natl Acad. Sci. U.S.A., 81, 6024–6028.

5. Liolios,K., Tavernarakis,N., Hugenholtz,P. and Kyrpides,N.C. (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res., 34, D332–D334.

6. Nowara,D., Gay,A., Lacomme,C., Shaw,J., Ridout,C., Douchkov,D., Hensel,G., Kumlehn,J. and Schweizer,P. (2010) HIGS: host-induced gene silencing in the obligate biotrophic fungal pathogen Blumeria graminis. Plant Cell, 22, 3130–3141.

7. Son,H., Seo,Y.S., Min,K., Park,A.R., Lee,J., Jin,J.M., Lin,Y., Cao,P., Hong,S.Y., Kim,E.K. et al. (2011) A phenome-based functional analysis of transcription factors in the cereal head blight fungus Fusarium graminearum. PLoS Pathog., 7, e1002310.

8. Wang,C., Zhang,S., Hou,R., Zhao,Z., Zheng,Q., Xu,Q., Zheng,D., Wang,G., Liu,H., Gao,X. et al. (2011) Functional analysis of the kinome of the wheat scab fungus Fusarium graminearum. PLoS Pathog., 7, e1002460.

9. Gene Ontology Consortium (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.

10. Winnenburg,R., Baldwin,T.K., Urban,M., Rawlings,C., Kohler,J. and Hammond-Kosack,K.E. (2006) PHI-base: a new database for pathogen host interactions. Nucleic Acids Res., 34, D459–D464.

11. Winnenburg,R., Urban,M., Beacham,A., Baldwin,T.K., Holland,S., Lindeberg,M., Hansen,H., Rawlings,C., Hammond-Kosack,K.E. and Kohler,J. (2008) PHI-base update: additions to the pathogen host interaction database. Nucleic Acids Res., 36, D572–D576.

12. Lu,T., Yao,B. and Zhang,C. (2012) DFVF: database of fungal virulence factors. Database, 2012, doi:10.1093/database/bas032.

13. Hedeler,C., Wong,H.M., Cornell,M.J., Alam,I., Soanes,D.M., Rattray,M., Hubbard,S.J., Talbot,N.J., Oliver,S.G. and Paton,N.W. (2007) e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genom., 8, 426.

14. Kersey,P.J., Allen,J.E., Christensen,M., Davis,P., Falin,L.J., Grabmueller,C., Hughes,D.S., Humphrey,J., Kerhornou,A., Khobova,J. et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546–D552.

15. Tripathy,S., Deo,T. and Tyler,B.M. (2012) Oomycete Transcriptomics Database: a resource for oomycete transcriptomes. BMC Genom., 13, 303.

16. Aurrecoechea,C., Brestelli,J., Brunk,B.P., Fischer,S., Gajria,B., Gao,X., Gingle,A., Grant,G., Harb,O.S., Heiges,M. et al. (2010) EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res., 38, D415–D419.

17. Stajich,J.E., Harris,T., Brunk,B.P., Brestelli,J., Fischer,S., Harb,O.S., Kissinger,J.C., Li,W., Nayak,V., Pinney,D.F. et al. (2012) FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res., 40, D675–D681.

18. Kumar,R. and Nanduri,B. (2010) HPIDB–a unified resource for host-pathogen interactions. BMC Bioinformatics, 11(Suppl. 6), S16.

19. Grigoriev,I.V., Nikitin,R., Haridas,S., Kuo,A., Ohm,R., Otillar,R., Riley,R., Salamov,A., Zhao,X., Korzeniewski,F. et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699–D704.

20. Xiang,Z., Tian,Y. and He,Y. (2007) PHIDIAS: a pathogen-host interaction data integration and analysis system. Genome Biol., 8, R150.

21. Dash,S., Van Hemert,J., Hong,L., Wise,R.P. and Dickerson,J.A. (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40, D1194–D1201.

22. Chen,L., Xiong,Z., Sun,L., Yang,J. and Jin,Q. (2012) VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res., 40, D641–D645.

23. Lee,W.S., Hammond-Kosack,K.E. and Kanyuka,K. (2012) Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing, and virus-mediated overexpression of heterologous protein. Plant Physiol., 160, 582–590.

24. Wilson,D., Tutulan-Cunita,A., Jung,W., Hauser,N.C., Hernandez,R., Williamson,T., Piekarska,K., Rupp,S., Young,T. and Stateva,L. (2007) Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol. Microbiol., 65, 841–856.

25. Zhang,L., Zhu,Z., Jing,H., Zhang,J., Xiong,Y., Yan,M., Gao,S., Wu,L.F., Xu,J. and Kan,B. (2009) Pleiotropic effects of the twin-arginine translocation system on biofilm formation, colonization, and virulence in Vibrio cholerae. BMC Microbiol., 9, 114.

26. Hu,W., Sillaots,S., Lemieux,S., Davison,J., Kauffman,S., Breton,A., Linteau,A., Xin,C., Bowman,J., Becker,J. et al. (2007) Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog., 3, e24.

27. UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191–D198.

28. Gotz,S., Garcia-Gomez,J.M., Terol,J., Williams,T.D., Nagaraj,S.H., Nueda,M.J., Robles,M., Talon,M., Dopazo,J. and Conesa,A. (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 36, 3420–3435.

29. Baldwin,T.K., Winnenburg,R., Urban,M., Rawlings,C., Koehler,J. and Hammond-Kosack,K.E. (2006) The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity. MPMI, 19, 1451–1462.

30. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.

31. Danchin,E.G., Arguel,M.J., Campan-Fournier,A., Perfus-Barbeoch,L., Magliano,M., Rosso,M.N., Da Rocha,M., Da Silva,C., Nottet,N., Labadie,K. et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.

32. Jeon,J., Park,S.Y., Chi,M.H., Choi,J., Park,J., Rho,H.S., Kim,S., Goh,J., Yoo,S., Choi,J. et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561–565.

33. Cai,Z., Li,G., Lin,C., Shi,T., Zhai,L., Chen,Y. and Huang,G. (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340–350.

34. Vargas,W.A., Martin,J.M.S., Rech,G.E., Rivera,L.P., Benito,E.P., Diaz-Minguez,J.M., Thon,M.R. and Sukno,S.A. (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342–1358.

35. Sperschneider,J., Gardiner,D.M., Taylor,J.M., Hane,J.K., Singh,K.B. and Manners,J.M. (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.

36. Thakur,K., Chawla,V., Bhatti,S., Swarnkar,M.K., Kaur,J., Shankar,R. and Jha,G. (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre,F., Joly,D.L., Labbe,C., Teichmann,B., Linning,R., Belzile,F., Bakkeren,G. and Belanger,R.R. (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946–1959.

38. Schleker,S., Garcia-Garcia,J., Klein-Seetharaman,J. and Oliva,B. (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991–1018.

39. Liu,X., Tang,W.H., Zhao,X.M. and Chen,L. (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.

40. Kour,A., Greer,K., Valent,B., Orbach,M.J. and Soderlund,C. (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271–278.

41. Bleves,S., Dunger,I., Walter,M.C., Frangoulidis,D., Kastenmuller,G., Voulhoux,R. and Ruepp,A. (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671–D676.

42. Van De Wouw,A.P. and Howlett,B.J. (2011) Fungal pathogenicity genes in the age of ‘omics’. Mol. Plant Pathol., 12, 507–514.

43. Doehlemann,G., Reissmann,S., Assmann,D., Fleckenstein,M. and Kahmann,R. (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751–766.

44. Dickman,M.B. (2007) Subversion or coersion? Pathogenic deteminants in fungal phytopathogens. Fungal Biol. Rev., 21, 125–129.

45. Zhang,Y., Zhang,K., Fang,A., Han,Y., Yang,J., Xue,M., Bao,J., Hu,D., Zhou,B., Sun,X. et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.

46. Cools,H.J. and Hammond-Kosack,K.E. (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197–210.

47. Rutherford,K.M., Harris,M.A., Lock,A., Oliver,S.G. and Wood,V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.

48. Wood,V., Harris,M.A., McDowall,M.D., Rutherford,K., Vaughan,B.W., Staines,D.M., Aslett,M., Lock,A., Bahler,J., Kersey,P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.

49. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

50. Kasprzyk,A. (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.

51. Jones,J.D. and Dangl,J.L. (2006) The plant immune system. Nature, 444, 323–329.

52. Weßling,R., Epple,P., Altmann,S., He,Y., Yang,L., Henz,S.R., McDonald,N., Wiley,K., Bader,K.C. et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microb., 16, 364–375.


Page 8: The Pathogen-Host Interactions database (PHI-base ...€¦ · The Pathogen-Host Interactions database (PHI-base): additions and future developments Martin Urban1,*, Rashmi Pant2,

D652 Nucleic Acids Research 2015 Vol 43 Database issue

Figure 3 Inspection of gene function using the Ensembl genome browser (A) Displayed is a small chromosomal region in Magnaporthe oryzae showingtwo genes involved in pathogenicity (as annotated in PHI-base) in their genomic context (viewable in the Ensembl browser in the transcript display) Acolour code indicates the annotated role of each gene green lsquoloss of pathogenicityrsquo and orange lsquoreduced virulencersquo (B) By selecting each colour-codedMGG transcript ID information is revealed on the associated gene deletion study curated in the PHI-base database

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019

Nucleic Acids Research 2015 Vol 43 Database issue D653

Table 5 PHI-base uses that have often appeared in the peer-reviewed literature

Use case Type of research studyExamplereference

1 Annotation and candidate gene selectionLarge scale forward genetics screens (3233)Transcriptome studies (RNAseq microarrays ESTs) (343645)Full and partial genome annotation genome mining (303137)

2 Predictive bioinformatics analyses Networks protein-protein interaction mapping (353839)3 Complementary databases (144041)4 Review articles (424446)5 Single gene function studies

Inter-comparisonand inter-comparison of gene mutants within and between species (43)

Tools for data mining

We are currently developing a new tool for the analysisand extraction of whole genomic data from plant pathogensas part of the PhytoPath project (httpwwwphytopathdborg) using the data warehousing framework BioMart (50)allowing users to mine genomic data (for sequence and an-notation) across multiple species based on PHI-base anno-tations in conjunction with other annotations The tool isexpected to be launched before the end of 2015

Other activities

Our intention is to extend the taxonomic range availablewithin PHI-base to sim200 host-infecting species within thenext 2 years At this level of species coverage detailed analy-ses within and between specific groups of pathogens withdifferent infection strategies host ranges taxonomic as-signments or between pathogenic and closely related non-pathogenic endophytic or symbiotic lifestyles should befeasible

In the next phase of curation a greater emphasis will beplaced on the effector literature which should increase thenumber of interactions from bacterial oomycete and ob-ligate biotrophic species To accompany this developmentthe curation of the corresponding host target(s) ie initialmolecular partner in the host has commenced and this im-portant information should soon be available For examplevarious bacterial effectors including AvrRpm1 (PHI977)and AvrRpt2 (PHI979) are delivered into the plant cyto-plasm via the bacterial type III secretion system These ef-fectors interact with the Arabidopsis protein RIN4 (51)These protein interaction data sets are of growing impor-tance in the analysis of host-pathogen host-pest and host-parasite interactions as they typically represent communi-cation events that have co-evolved between biological king-doms For example Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific ef-fectors were found to exhibit an enhanced disease resistance(edr) phenotype to both powdery and downy mildews (52)In addition by the inclusion of the corresponding host tar-gets more effectors from obligate biotrophic species canbe curated into PHI-base These effectors are rigorouslytested for their role in pathogenicity using a range of othertechniques but not those involving the generation of stablepathogen transformants

Database access and feedback

PHI-base can be freely accessed at httpwwwphi-baseorg The complete database can be downloaded fromthe lsquoDownloadrsquo section Prior to downloading the entiredatabase to create local Basic Local Alignment Search Tooldatabases or for other bioinformatics applications (Table 5)users are asked to fill in a registration form This allowsPHI-base to monitor the number of academic and indus-trial users a requirement by our sponsors

User support can be obtained from this emailcontactphi-baseorg Please use this email address ifyou wish to provide new data for inclusion in PHI-baseif you are an expert willing to assist with curation for thenomination of peer-reviewed papers to be curated or if youcan provide suggestions for improvement to the PHI-basewebsite

To increase the awareness of PHI-base developmentswithin the community and for users to be notified whennew releases occur we have developed a PHI-base usermailing list (userslistsphi-baseorg) Users can subscribefrom a link on the PHI-base website in the lsquoHelprsquo sectionor directly by going to httpswwwlistsrothamstedacukmailmanlistinfousers

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts who contributed database annotations from their field of expertise into PHI-base and the members of our Scientific Advisory Board: Dr Michael Csukai (Syngenta, Jealott's Hill, UK), Professor Jonathan Jones (The Sainsbury Laboratory, UK), Dr Leighton Pritchard (The James Hutton Institute, UK) and Professor Pietro Spanu (Imperial College, UK). Dr Jan Taubert (formerly Rothamsted Research) is thanked for his expert support in maintaining the PHI-base parser. We thank Drs Paul Kersey, Uma Maheswari and Dan Staines at the European Bioinformatics Institute (Cambridge) for helpful discussions and for significantly improving the pathogen species content within Ensembl Genomes.

FUNDING

This work is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/I/001077/1, BB/K020056/1]. PHI-base receives additional support from the BBSRC as a National Capability [BB/J/004383/1]. Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K020056/1].

Conflict of interest statement. None declared.

REFERENCES

1. Dangl,J.L., Horvath,D.M. and Staskawicz,B.J. (2013) Pivoting the plant immune system from dissection to deployment. Science, 341, 746–751.

2. Fisher,M.C., Henk,D.A., Briggs,C.J., Brownstein,J.S., Madoff,L.C., McCraw,S.L. and Gurr,S.J. (2012) Emerging fungal threats to animal, plant and ecosystem health. Nature, 484, 186–194.

3. Napoli,C. and Staskawicz,B. (1987) Molecular characterization and nucleic acid sequence of an avirulence gene from race 6 of Pseudomonas syringae pv. glycinea. J. Bacteriol., 169, 572–578.

4. Staskawicz,B.J., Dahlbeck,D. and Keen,N.T. (1984) Cloned avirulence gene of Pseudomonas syringae pv. glycinea determines race-specific incompatibility on Glycine max (L.) Merr. Proc. Natl. Acad. Sci. U.S.A., 81, 6024–6028.

5. Liolios,K., Tavernarakis,N., Hugenholtz,P. and Kyrpides,N.C. (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res., 34, D332–D334.

6. Nowara,D., Gay,A., Lacomme,C., Shaw,J., Ridout,C., Douchkov,D., Hensel,G., Kumlehn,J. and Schweizer,P. (2010) HIGS: host-induced gene silencing in the obligate biotrophic fungal pathogen Blumeria graminis. Plant Cell, 22, 3130–3141.

7. Son,H., Seo,Y.S., Min,K., Park,A.R., Lee,J., Jin,J.M., Lin,Y., Cao,P., Hong,S.Y., Kim,E.K. et al. (2011) A phenome-based functional analysis of transcription factors in the cereal head blight fungus Fusarium graminearum. PLoS Pathog., 7, e1002310.

8. Wang,C., Zhang,S., Hou,R., Zhao,Z., Zheng,Q., Xu,Q., Zheng,D., Wang,G., Liu,H., Gao,X. et al. (2011) Functional analysis of the kinome of the wheat scab fungus Fusarium graminearum. PLoS Pathog., 7, e1002460.

9. Gene Ontology Consortium (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.

10. Winnenburg,R., Baldwin,T.K., Urban,M., Rawlings,C., Kohler,J. and Hammond-Kosack,K.E. (2006) PHI-base: a new database for pathogen host interactions. Nucleic Acids Res., 34, D459–D464.

11. Winnenburg,R., Urban,M., Beacham,A., Baldwin,T.K., Holland,S., Lindeberg,M., Hansen,H., Rawlings,C., Hammond-Kosack,K.E. and Kohler,J. (2008) PHI-base update: additions to the pathogen host interaction database. Nucleic Acids Res., 36, D572–D576.

12. Lu,T., Yao,B. and Zhang,C. (2012) DFVF: database of fungal virulence factors. Database, 2012, doi:10.1093/database/bas032.

13. Hedeler,C., Wong,H.M., Cornell,M.J., Alam,I., Soanes,D.M., Rattray,M., Hubbard,S.J., Talbot,N.J., Oliver,S.G. and Paton,N.W. (2007) e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genom., 8, 426.

14. Kersey,P.J., Allen,J.E., Christensen,M., Davis,P., Falin,L.J., Grabmueller,C., Hughes,D.S., Humphrey,J., Kerhornou,A., Khobova,J. et al. (2014) Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res., 42, D546–D552.

15. Tripathy,S., Deo,T. and Tyler,B.M. (2012) Oomycete Transcriptomics Database: a resource for oomycete transcriptomes. BMC Genom., 13, 303.

16. Aurrecoechea,C., Brestelli,J., Brunk,B.P., Fischer,S., Gajria,B., Gao,X., Gingle,A., Grant,G., Harb,O.S., Heiges,M. et al. (2010) EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res., 38, D415–D419.

17. Stajich,J.E., Harris,T., Brunk,B.P., Brestelli,J., Fischer,S., Harb,O.S., Kissinger,J.C., Li,W., Nayak,V., Pinney,D.F. et al. (2012) FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res., 40, D675–D681.

18. Kumar,R. and Nanduri,B. (2010) HPIDB–a unified resource for host-pathogen interactions. BMC Bioinformatics, 11(Suppl. 6), S16.

19. Grigoriev,I.V., Nikitin,R., Haridas,S., Kuo,A., Ohm,R., Otillar,R., Riley,R., Salamov,A., Zhao,X., Korzeniewski,F. et al. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res., 42, D699–D704.

20. Xiang,Z., Tian,Y. and He,Y. (2007) PHIDIAS: a pathogen-host interaction data integration and analysis system. Genome Biol., 8, R150.

21. Dash,S., Van Hemert,J., Hong,L., Wise,R.P. and Dickerson,J.A. (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res., 40, D1194–D1201.

22. Chen,L., Xiong,Z., Sun,L., Yang,J. and Jin,Q. (2012) VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic Acids Res., 40, D641–D645.

23. Lee,W.S., Hammond-Kosack,K.E. and Kanyuka,K. (2012) Barley stripe mosaic virus-mediated tools for investigating gene function in cereal plants and their pathogens: virus-induced gene silencing, host-mediated gene silencing and virus-mediated overexpression of heterologous protein. Plant Physiol., 160, 582–590.

24. Wilson,D., Tutulan-Cunita,A., Jung,W., Hauser,N.C., Hernandez,R., Williamson,T., Piekarska,K., Rupp,S., Young,T. and Stateva,L. (2007) Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol. Microbiol., 65, 841–856.

25. Zhang,L., Zhu,Z., Jing,H., Zhang,J., Xiong,Y., Yan,M., Gao,S., Wu,L.F., Xu,J. and Kan,B. (2009) Pleiotropic effects of the twin-arginine translocation system on biofilm formation, colonization and virulence in Vibrio cholerae. BMC Microbiol., 9, 114.

26. Hu,W., Sillaots,S., Lemieux,S., Davison,J., Kauffman,S., Breton,A., Linteau,A., Xin,C., Bowman,J., Becker,J. et al. (2007) Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog., 3, e24.

27. UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191–D198.

28. Gotz,S., Garcia-Gomez,J.M., Terol,J., Williams,T.D., Nagaraj,S.H., Nueda,M.J., Robles,M., Talon,M., Dopazo,J. and Conesa,A. (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 36, 3420–3435.

29. Baldwin,T.K., Winnenburg,R., Urban,M., Rawlings,C., Koehler,J. and Hammond-Kosack,K.E. (2006) The pathogen-host interactions database (PHI-base) provides insights into generic and novel themes of pathogenicity. MPMI, 19, 1451–1462.

30. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.

31. Danchin,E.G., Arguel,M.J., Campan-Fournier,A., Perfus-Barbeoch,L., Magliano,M., Rosso,M.N., Da Rocha,M., Da Silva,C., Nottet,N., Labadie,K. et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.

32. Jeon,J., Park,S.Y., Chi,M.H., Choi,J., Park,J., Rho,H.S., Kim,S., Goh,J., Yoo,S., Choi,J. et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561–565.

33. Cai,Z., Li,G., Lin,C., Shi,T., Zhai,L., Chen,Y. and Huang,G. (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340–350.

34. Vargas,W.A., Martin,J.M.S., Rech,G.E., Rivera,L.P., Benito,E.P., Diaz-Minguez,J.M., Thon,M.R. and Sukno,S.A. (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342–1358.

35. Sperschneider,J., Gardiner,D.M., Taylor,J.M., Hane,J.K., Singh,K.B. and Manners,J.M. (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.

36. Thakur,K., Chawla,V., Bhatti,S., Swarnkar,M.K., Kaur,J., Shankar,R. and Jha,G. (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre,F., Joly,D.L., Labbe,C., Teichmann,B., Linning,R., Belzile,F., Bakkeren,G. and Belanger,R.R. (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946–1959.

38. Schleker,S., Garcia-Garcia,J., Klein-Seetharaman,J. and Oliva,B. (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991–1018.

39. Liu,X., Tang,W.H., Zhao,X.M. and Chen,L. (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.

40. Kour,A., Greer,K., Valent,B., Orbach,M.J. and Soderlund,C. (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271–278.

41. Bleves,S., Dunger,I., Walter,M.C., Frangoulidis,D., Kastenmuller,G., Voulhoux,R. and Ruepp,A. (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671–D676.

42. Van De Wouw,A.P. and Howlett,B.J. (2011) Fungal pathogenicity genes in the age of 'omics'. Mol. Plant Pathol., 12, 507–514.

43. Doehlemann,G., Reissmann,S., Assmann,D., Fleckenstein,M. and Kahmann,R. (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751–766.

44. Dickman,M.B. (2007) Subversion or coersion? Pathogenic determinants in fungal phytopathogens. Fungal Biol. Rev., 21, 125–129.

45. Zhang,Y., Zhang,K., Fang,A., Han,Y., Yang,J., Xue,M., Bao,J., Hu,D., Zhou,B., Sun,X. et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.

46. Cools,H.J. and Hammond-Kosack,K.E. (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197–210.

47. Rutherford,K.M., Harris,M.A., Lock,A., Oliver,S.G. and Wood,V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.

48. Wood,V., Harris,M.A., McDowall,M.D., Rutherford,K., Vaughan,B.W., Staines,D.M., Aslett,M., Lock,A., Bahler,J., Kersey,P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.

49. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

50. Kasprzyk,A. (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.

51. Jones,J.D. and Dangl,J.L. (2006) The plant immune system. Nature, 444, 323–329.

52. Weßling,R., Epple,P., Altmann,S., He,Y., Yang,L., Henz,S.R., McDonald,N., Wiley,K., Bader,K.C. et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microb., 16, 364–375.


Table 5. PHI-base uses that have often appeared in the peer-reviewed literature

| Use case | Type of research study | Example reference |
| --- | --- | --- |
| 1. Annotation and candidate gene selection | Large scale forward genetics screens | (32,33) |
| | Transcriptome studies (RNAseq, microarrays, ESTs) | (34,36,45) |
| | Full and partial genome annotation, genome mining | (30,31,37) |
| 2. Predictive bioinformatics analyses | Networks, protein-protein interaction mapping | (35,38,39) |
| 3. Complementary databases | | (14,40,41) |
| 4. Review articles | | (42,44,46) |
| 5. Single gene function studies | Intra-comparison and inter-comparison of gene mutants within and between species | (43) |

Tools for data mining

We are currently developing a new tool for the analysis and extraction of whole genomic data from plant pathogens as part of the PhytoPath project (http://www.phytopathdb.org), using the data warehousing framework BioMart (50). It will allow users to mine genomic data (for sequence and annotation) across multiple species based on PHI-base annotations in conjunction with other annotations. The tool is expected to be launched before the end of 2015.
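To give a feel for what mining via BioMart can look like, the sketch below builds a query document in the general BioMart XML query format using only the Python standard library. It does not contact any server. The dataset name `example_pathogen_dataset`, the filter `phibase_phenotype` and the attribute names are placeholders invented for illustration; the identifiers exposed by the PhytoPath tool may well differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_biomart_query(dataset: str, filters: Dict[str, str], attributes: List[str]) -> str:
    """Return a BioMart-style XML query as a string (names are caller-supplied)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select which columns appear in the output.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # All dataset, filter and attribute names below are hypothetical.
    print(build_biomart_query(
        "example_pathogen_dataset",
        {"phibase_phenotype": "reduced virulence"},
        ["gene_id", "species_name"],
    ))
```

Keeping filters and attributes as plain Python containers makes it easy to swap in the real identifiers once the dataset configuration is known.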

Other activities

Our intention is to extend the taxonomic range available within PHI-base to ∼200 host-infecting species within the next 2 years. At this level of species coverage, detailed analyses within and between specific groups of pathogens with different infection strategies, host ranges, taxonomic assignments or between pathogenic and closely related non-pathogenic, endophytic or symbiotic lifestyles should be feasible.

In the next phase of curation a greater emphasis will beplaced on the effector literature which should increase thenumber of interactions from bacterial oomycete and ob-ligate biotrophic species To accompany this developmentthe curation of the corresponding host target(s) ie initialmolecular partner in the host has commenced and this im-portant information should soon be available For examplevarious bacterial effectors including AvrRpm1 (PHI977)and AvrRpt2 (PHI979) are delivered into the plant cyto-plasm via the bacterial type III secretion system These ef-fectors interact with the Arabidopsis protein RIN4 (51)These protein interaction data sets are of growing impor-tance in the analysis of host-pathogen host-pest and host-parasite interactions as they typically represent communi-cation events that have co-evolved between biological king-doms For example Arabidopsis mutants harbouring T-DNA insertions within different host targets of specific ef-fectors were found to exhibit an enhanced disease resistance(edr) phenotype to both powdery and downy mildews (52)In addition by the inclusion of the corresponding host tar-gets more effectors from obligate biotrophic species canbe curated into PHI-base These effectors are rigorouslytested for their role in pathogenicity using a range of othertechniques but not those involving the generation of stablepathogen transformants

Database access and feedback

PHI-base can be freely accessed at httpwwwphi-baseorg The complete database can be downloaded fromthe lsquoDownloadrsquo section Prior to downloading the entiredatabase to create local Basic Local Alignment Search Tooldatabases or for other bioinformatics applications (Table 5)users are asked to fill in a registration form This allowsPHI-base to monitor the number of academic and indus-trial users a requirement by our sponsors

User support can be obtained from this emailcontactphi-baseorg Please use this email address ifyou wish to provide new data for inclusion in PHI-baseif you are an expert willing to assist with curation for thenomination of peer-reviewed papers to be curated or if youcan provide suggestions for improvement to the PHI-basewebsite

To increase the awareness of PHI-base developmentswithin the community and for users to be notified whennew releases occur we have developed a PHI-base usermailing list (userslistsphi-baseorg) Users can subscribefrom a link on the PHI-base website in the lsquoHelprsquo sectionor directly by going to httpswwwlistsrothamstedacukmailmanlistinfousers

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors would like to thank all the species experts whocontributed database annotations from their field of ex-pertise into PHI-base and the members of our ScientificAdvisory Board Dr Michael Csukai (Syngenta JealottrsquosHill UK) Professor Jonathan Jones (The Sainsbury Lab-oratory UK) Dr Leighton Pritchard (The James HuttonUK) and Professor Pietro Spanu (Imperial College UK)Dr Jan Taubert (formerly Rothamsted Research) is thankedfor his expert support in maintaining the PHI-base parserWe thank Drs Paul Kersey Uma Maheswari and DanStaines at the European Bioinformatics Institute (Cam-bridge) for helpful discussions and for significantly improv-ing the pathogen species content within Ensembl Genomes

FUNDING

This work is supported by the UK Biotechnologyand Biological Sciences Research Council (BBSRC)

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019

D654 Nucleic Acids Research 2015 Vol 43 Database issue

[BBI0010771 BBK0200561] PHI-base receives addi-tional support from the BBSRC as a National Capability[BBJ0043831] Funding for open access charge UKBiotechnology and Biological Sciences Research Council(BBSRC) [BBK0200561]Conflict of interest statement None declared

REFERENCES1 DanglJL HorvathDM and StaskawiczBJ (2013) Pivoting the

plant immune system from dissection to deployment Science 341746ndash751

2 FisherMC HenkDA BriggsCJ BrownsteinJS MadoffLCMcCrawSL and GurrSJ (2012) Emerging fungal threats to animalplant and ecosystem health Nature 484 186ndash194

3 NapoliC and StaskawiczB (1987) Molecular characterization andnucleic acid sequence of an avirulence gene from race 6 ofPseudomonas syringae pv glycinea J Bacteriol 169 572ndash578

4 StaskawiczBJ DahlbeckD and KeenNT (1984) Clonedavirulence gene of Pseudomonas syringae pv glycinea determinesrace-specific incompatibility on Glycine max (L) Merr Proc NatlAcad Sci USA 81 6024ndash6028

5 LioliosK TavernarakisN HugenholtzP and KyrpidesNC (2006)The Genomes On Line Database (GOLD) v2 a monitor of genomeprojects worldwide Nucleic Acids Res 34 D332ndashD334

6 NowaraD GayA LacommeC ShawJ RidoutC DouchkovDHenselG KumlehnJ and SchweizerP (2010) HIGS host-inducedgene silencing in the obligate biotrophic fungal pathogen Blumeriagraminis Plant cell 22 3130ndash3141

7 SonH SeoYS MinK ParkAR LeeJ JinJM LinY CaoPHongSY KimEK et al (2011) A phenome-based functionalanalysis of transcription factors in the cereal head blight fungusFusarium graminearum PLoS Pathog 7 e1002310

8 WangC ZhangS HouR ZhaoZ ZhengQ XuQ ZhengDWangG LiuH GaoX et al (2011) Functional analysis of thekinome of the wheat scab fungus Fusarium graminearum PLoSPathog 7 e1002460

9 Gene Ontology Consortium (2013) Gene Ontology annotations andresources Nucleic Acids Res 41 D530ndashD535

10 WinnenburgR BaldwinTK UrbanM RawlingsC KohlerJ andHammond-KosackKE (2006) PHI-base a new database forpathogen host interactions Nucleic Acids Res 34 D459ndashD464

11 WinnenburgR UrbanM BeachamA BaldwinTK HollandSLindebergM HansenH RawlingsC Hammond-KosackKE andKohlerJ (2008) PHI-base update additions to the pathogen hostinteraction database Nucleic Acids Res 36 D572ndashD576

12 LuT YaoB and ZhangC (2012) DFVF database of fungalvirulence factors Database 2012 doi101093databasebas032

13 HedelerC WongHM CornellMJ AlamI SoanesDMRattrayM HubbardSJ TalbotNJ OliverSG and PatonNW(2007) e-Fungi a data resource for comparative analysis of fungalgenomes BMC Genom 8 426

14 KerseyPJ AllenJE ChristensenM DavisP FalinLJGrabmuellerC HughesDS HumphreyJ KerhornouAKhobovaJ et al (2014) Ensembl Genomes 2013 scaling up access togenome-wide data Nucleic Acids Res 42 D546ndashD552

15 TripathyS DeoT and TylerBM (2012) Oomycete TranscriptomicsDatabase a resource for oomycete transcriptomes BMC Genom 13303

16 AurrecoecheaC BrestelliJ BrunkBP FischerS GajriaBGaoX GingleA GrantG HarbOS HeigesM et al (2010)EuPathDB a portal to eukaryotic pathogen databases Nucleic AcidsRes 38 D415ndashD419

17 StajichJE HarrisT BrunkBP BrestelliJ FischerS HarbOSKissingerJC LiW NayakV PinneyDF et al (2012) FungiDBan integrated functional genomics database for fungi Nucleic AcidsRes 40 D675ndashD681

18 KumarR and NanduriB (2010) HPIDBndasha unified resource forhost-pathogen interactions BMC Bioinformatics 11(Suppl 6) S16

19 GrigorievIV NikitinR HaridasS KuoA OhmR OtillarRRileyR SalamovA ZhaoX KorzeniewskiF et al (2014)

MycoCosm portal gearing up for 1000 fungal genomes NucleicAcids Res 42 D699ndashD704

20 XiangZ TianY and HeY (2007) PHIDIAS a pathogen-hostinteraction data integration and analysis system Genome Biol 8R150

21 DashS Van HemertJ HongL WiseRP and DickersonJA(2012) PLEXdb gene expression resources for plants and plantpathogens Nucleic Acids Res 40 D1194ndashD1201

22 ChenL XiongZ SunL YangJ and JinQ (2012) VFDB 2012update toward the genetic diversity and molecular evolution ofbacterial virulence factors Nucleic Acids Res 40 D641ndashD645

23 LeeWS Hammond-KosackKE and KanyukaK (2012) Barleystripe mosaic virus-mediated tools for investigating gene function incereal plants and their pathogens virus-induced gene silencinghost-mediated gene silencing and virus-mediated overexpression ofheterologous protein Plant Physiol 160 582ndash590

24 WilsonD Tutulan-CunitaA JungW HauserNC HernandezRWilliamsonT PiekarskaK RuppS YoungT and StatevaL(2007) Deletion of the high-affinity cAMP phosphodiesteraseencoded by PDE2 affects stress responses and virulence in Candidaalbicans Mol Microbiol 65 841ndash856

25 ZhangL ZhuZ JingH ZhangJ XiongY YanM GaoSWuLF XuJ and KanB (2009) Pleiotropic effects of thetwin-arginine translocation system on biofilm formationcolonization and virulence in Vibrio cholerae BMC Microbiol 9114

26 HuW SillaotsS LemieuxS DavisonJ KauffmanS BretonALinteauA XinC BowmanJ BeckerJ et al (2007) Essential geneidentification and drug target prioritization in Aspergillus fumigatusPLoS Pathog 3 e24

27 UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

28 GotzS Garcia-GomezJM TerolJ WilliamsTD NagarajSHNuedaMJ RoblesM TalonM DopazoJ and ConesaA (2008)High-throughput functional annotation and data mining with theBlast2GO suite Nucleic Acids Res 36 3420ndash3435

29 BaldwinTK WinnenburgR UrbanM RawlingsC KoehlerJand Hammond-KosackKE (2006) The pathogen-host interactionsdatabase (PHI-base) provides insights into generic and novel themesof pathogenicity MPMI 19 1451ndash1462

30 HaneJK AndersonJP WilliamsAH SperschneiderJ andSinghKB (2014) Genome sequencing and comparative genomics ofthe broad host-range pathogen Rhizoctonia solani AG8 PLoS Genet10 e1004281

31 DanchinEG ArguelMJ Campan-FournierAPerfus-BarbeochL MaglianoM RossoMN Da RochaM DaSilvaC NottetN LabadieK et al (2013) Identification of noveltarget genes for safer and more specific control of root-knotnematodes from a pan-genome mining PLoS Pathog 9 e1003745

32 JeonJ ParkSY ChiMH ChoiJ ParkJ RhoHS KimSGohJ YooS ChoiJ et al (2007) Genome-wide functional analysisof pathogenicity genes in the rice blast fungus Nat Genet 39561ndash565

33 CaiZ LiG LinC ShiT ZhaiL ChenY and HuangG (2013)Identifying pathogenicity genes in the rubber tree anthracnose fungusColletotrichum gloeosporioides through random insertionalmutagenesis Microbiol Res 168 340ndash350

34 VargasWA MartinJMS RechGE RiveraLP BenitoEPDiaz-MinguezJM ThonMR and SuknoSA (2012) Plant defensemechanisms are activated during biotrophic and necrotrophicdevelopment of Colletotricum graminicola in maize Plant Physiol158 1342ndash1358

35 SperschneiderJ GardinerDM TaylorJM HaneJK SinghKBand MannersJM (2013) A comparative hidden Markov modelanalysis pipeline identifies proteins characteristic of cereal-infectingfungi BMC Genom 14 807

36 ThakurK ChawlaV BhattiS SwarnkarMK KaurJShankarR and JhaG (2013) De novo transcriptome sequencingand analysis for Venturia inaequalis the devastating apple scabpathogen Plos One 8 e53937

37 LefebvreF JolyDL LabbeC TeichmannB LinningRBelzileF BakkerenG and BelangerRR (2013) The transition froma phytopathogenic smut ancestor to an anamorphic biocontrol agent

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019

Nucleic Acids Research 2015 Vol 43 Database issue D655

deciphered by comparative whole-genome analysis Plant Cell 251946ndash1959

38 SchlekerS Garcia-GarciaJ Klein-SeetharamanJ and OlivaB(2012) Prediction and comparison of Salmonella-human andSalmonella-Arabidopsis interactomes Chem Biodiv 9 991ndash1018

39 LiuX TangWH ZhaoXM and ChenL (2010) A networkapproach to predict pathogenic genes for Fusarium graminearumPLoS One 5 e13021

40 KourA GreerK ValentB OrbachMJ and SoderlundC (2012)MGOS development of a community annotation database forMagnaporthe oryzae MPMI 25 271ndash278

41 BlevesS DungerI WalterMC FrangoulidisD KastenmullerGVoulhouxR and RueppA (2014) HoPaCI-DB host-Pseudomonasand Coxiella interaction database Nucleic Acids Res 42D671ndashD676

42 Van De WouwAP and HowlettBJ (2011) Fungal pathogenicitygenes in the age of lsquoomicsrsquo Mol Plant Pathol 12 507ndash514

43 DoehlemannG ReissmannS AssmannD FleckensteinM andKahmannR (2011) Two linked genes encoding a secreted effectorand a membrane protein are essential for Ustilago maydis-inducedtumour formation Mol Microbiol 81 751ndash766

44 DickmanMB (2007) Subversion or coersion Pathogenicdeteminants in fungal phytopathogens Fungal Biol Rev 21 125ndash129

45 ZhangY ZhangK FangA HanY YangJ XueM BaoJHuD ZhouB SunX et al (2014) Specific adaptation of

Ustilaginoidea virens in occupying host florets revealed bycomparative and functional genomics Nat Commun 5 3849

46 CoolsHJ and Hammond-KosackKE (2013) Exploitation ofgenomics in fungicide research current status and futureperspectives Mol Plant Pathol 14 197ndash210

47 RutherfordKM HarrisMA LockA OliverSG and WoodV(2014) Canto an online tool for community literature curationBioinformatics 30 1791ndash1792

48 WoodV HarrisMA McDowallMD RutherfordKVaughanBW StainesDM AslettM LockA BahlerJKerseyPJ et al (2012) PomBase a comprehensive online resourcefor fission yeast Nucleic Acids Res 40 D695ndashD699

49 SmithB AshburnerM RosseC BardJ BugW CeustersWGoldbergLJ EilbeckK IrelandA MungallCJ et al (2007) TheOBO Foundry coordinated evolution of ontologies to supportbiomedical data integration Nat Biotechnol 25 1251ndash1255

50 KasprzykA (2011) BioMart driving a paradigm change inbiological data management Database 2011 bar049

51 JonesJD and DanglJL (2006) The plant immune system Nature444 323ndash329

52 WeszliglingR EppleP AltmannS HeY YangL HenzStefan RMcDonaldN WileyK BaderKai C et al (2014) Convergenttargeting of a common host protein-network by pathogen effectorsfrom three kingdoms of life Cell Host Microb 16 364ndash375

Dow

nloaded from httpsacadem

icoupcomnararticle-abstract43D

1D6452438794 by Periodicals Assistant - Library user on 08 February 2019

Page 10: The Pathogen-Host Interactions database (PHI-base ...€¦ · The Pathogen-Host Interactions database (PHI-base): additions and future developments Martin Urban1,*, Rashmi Pant2,

D654 Nucleic Acids Research 2015 Vol 43 Database issue

[BBI0010771 BBK0200561] PHI-base receives addi-tional support from the BBSRC as a National Capability[BBJ0043831] Funding for open access charge UKBiotechnology and Biological Sciences Research Council(BBSRC) [BBK0200561]Conflict of interest statement None declared

REFERENCES1 DanglJL HorvathDM and StaskawiczBJ (2013) Pivoting the

plant immune system from dissection to deployment Science 341746ndash751

2 FisherMC HenkDA BriggsCJ BrownsteinJS MadoffLCMcCrawSL and GurrSJ (2012) Emerging fungal threats to animalplant and ecosystem health Nature 484 186ndash194

3 NapoliC and StaskawiczB (1987) Molecular characterization andnucleic acid sequence of an avirulence gene from race 6 ofPseudomonas syringae pv glycinea J Bacteriol 169 572ndash578

4 StaskawiczBJ DahlbeckD and KeenNT (1984) Clonedavirulence gene of Pseudomonas syringae pv glycinea determinesrace-specific incompatibility on Glycine max (L) Merr Proc NatlAcad Sci USA 81 6024ndash6028

5 LioliosK TavernarakisN HugenholtzP and KyrpidesNC (2006)The Genomes On Line Database (GOLD) v2 a monitor of genomeprojects worldwide Nucleic Acids Res 34 D332ndashD334

6 NowaraD GayA LacommeC ShawJ RidoutC DouchkovDHenselG KumlehnJ and SchweizerP (2010) HIGS host-inducedgene silencing in the obligate biotrophic fungal pathogen Blumeriagraminis Plant cell 22 3130ndash3141

7 SonH SeoYS MinK ParkAR LeeJ JinJM LinY CaoPHongSY KimEK et al (2011) A phenome-based functionalanalysis of transcription factors in the cereal head blight fungusFusarium graminearum PLoS Pathog 7 e1002310

8 WangC ZhangS HouR ZhaoZ ZhengQ XuQ ZhengDWangG LiuH GaoX et al (2011) Functional analysis of thekinome of the wheat scab fungus Fusarium graminearum PLoSPathog 7 e1002460

9 Gene Ontology Consortium (2013) Gene Ontology annotations andresources Nucleic Acids Res 41 D530ndashD535

10 WinnenburgR BaldwinTK UrbanM RawlingsC KohlerJ andHammond-KosackKE (2006) PHI-base a new database forpathogen host interactions Nucleic Acids Res 34 D459ndashD464

11 WinnenburgR UrbanM BeachamA BaldwinTK HollandSLindebergM HansenH RawlingsC Hammond-KosackKE andKohlerJ (2008) PHI-base update additions to the pathogen hostinteraction database Nucleic Acids Res 36 D572ndashD576

12 LuT YaoB and ZhangC (2012) DFVF database of fungalvirulence factors Database 2012 doi101093databasebas032

13 HedelerC WongHM CornellMJ AlamI SoanesDMRattrayM HubbardSJ TalbotNJ OliverSG and PatonNW(2007) e-Fungi a data resource for comparative analysis of fungalgenomes BMC Genom 8 426

14 KerseyPJ AllenJE ChristensenM DavisP FalinLJGrabmuellerC HughesDS HumphreyJ KerhornouAKhobovaJ et al (2014) Ensembl Genomes 2013 scaling up access togenome-wide data Nucleic Acids Res 42 D546ndashD552

15 TripathyS DeoT and TylerBM (2012) Oomycete TranscriptomicsDatabase a resource for oomycete transcriptomes BMC Genom 13303

16 AurrecoecheaC BrestelliJ BrunkBP FischerS GajriaBGaoX GingleA GrantG HarbOS HeigesM et al (2010)EuPathDB a portal to eukaryotic pathogen databases Nucleic AcidsRes 38 D415ndashD419

17 StajichJE HarrisT BrunkBP BrestelliJ FischerS HarbOSKissingerJC LiW NayakV PinneyDF et al (2012) FungiDBan integrated functional genomics database for fungi Nucleic AcidsRes 40 D675ndashD681

18 KumarR and NanduriB (2010) HPIDBndasha unified resource forhost-pathogen interactions BMC Bioinformatics 11(Suppl 6) S16

19 GrigorievIV NikitinR HaridasS KuoA OhmR OtillarRRileyR SalamovA ZhaoX KorzeniewskiF et al (2014)

MycoCosm portal gearing up for 1000 fungal genomes NucleicAcids Res 42 D699ndashD704

20 XiangZ TianY and HeY (2007) PHIDIAS a pathogen-hostinteraction data integration and analysis system Genome Biol 8R150

21 DashS Van HemertJ HongL WiseRP and DickersonJA(2012) PLEXdb gene expression resources for plants and plantpathogens Nucleic Acids Res 40 D1194ndashD1201

22 ChenL XiongZ SunL YangJ and JinQ (2012) VFDB 2012update toward the genetic diversity and molecular evolution ofbacterial virulence factors Nucleic Acids Res 40 D641ndashD645

23 LeeWS Hammond-KosackKE and KanyukaK (2012) Barleystripe mosaic virus-mediated tools for investigating gene function incereal plants and their pathogens virus-induced gene silencinghost-mediated gene silencing and virus-mediated overexpression ofheterologous protein Plant Physiol 160 582ndash590

24 WilsonD Tutulan-CunitaA JungW HauserNC HernandezRWilliamsonT PiekarskaK RuppS YoungT and StatevaL(2007) Deletion of the high-affinity cAMP phosphodiesteraseencoded by PDE2 affects stress responses and virulence in Candidaalbicans Mol Microbiol 65 841ndash856

25 ZhangL ZhuZ JingH ZhangJ XiongY YanM GaoSWuLF XuJ and KanB (2009) Pleiotropic effects of thetwin-arginine translocation system on biofilm formationcolonization and virulence in Vibrio cholerae BMC Microbiol 9114

26 HuW SillaotsS LemieuxS DavisonJ KauffmanS BretonALinteauA XinC BowmanJ BeckerJ et al (2007) Essential geneidentification and drug target prioritization in Aspergillus fumigatusPLoS Pathog 3 e24

27 UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

28 GotzS Garcia-GomezJM TerolJ WilliamsTD NagarajSHNuedaMJ RoblesM TalonM DopazoJ and ConesaA (2008)High-throughput functional annotation and data mining with theBlast2GO suite Nucleic Acids Res 36 3420ndash3435

29 BaldwinTK WinnenburgR UrbanM RawlingsC KoehlerJand Hammond-KosackKE (2006) The pathogen-host interactionsdatabase (PHI-base) provides insights into generic and novel themesof pathogenicity MPMI 19 1451ndash1462

30. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8. PLoS Genet., 10, e1004281.

31. Danchin,E.G., Arguel,M.J., Campan-Fournier,A., Perfus-Barbeoch,L., Magliano,M., Rosso,M.N., Da Rocha,M., Da Silva,C., Nottet,N., Labadie,K. et al. (2013) Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining. PLoS Pathog., 9, e1003745.

32. Jeon,J., Park,S.Y., Chi,M.H., Choi,J., Park,J., Rho,H.S., Kim,S., Goh,J., Yoo,S., Choi,J. et al. (2007) Genome-wide functional analysis of pathogenicity genes in the rice blast fungus. Nat. Genet., 39, 561–565.

33. Cai,Z., Li,G., Lin,C., Shi,T., Zhai,L., Chen,Y. and Huang,G. (2013) Identifying pathogenicity genes in the rubber tree anthracnose fungus Colletotrichum gloeosporioides through random insertional mutagenesis. Microbiol. Res., 168, 340–350.

34. Vargas,W.A., Martin,J.M.S., Rech,G.E., Rivera,L.P., Benito,E.P., Diaz-Minguez,J.M., Thon,M.R. and Sukno,S.A. (2012) Plant defense mechanisms are activated during biotrophic and necrotrophic development of Colletotricum graminicola in maize. Plant Physiol., 158, 1342–1358.

35. Sperschneider,J., Gardiner,D.M., Taylor,J.M., Hane,J.K., Singh,K.B. and Manners,J.M. (2013) A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi. BMC Genom., 14, 807.

36. Thakur,K., Chawla,V., Bhatti,S., Swarnkar,M.K., Kaur,J., Shankar,R. and Jha,G. (2013) De novo transcriptome sequencing and analysis for Venturia inaequalis, the devastating apple scab pathogen. PLoS One, 8, e53937.

37. Lefebvre,F., Joly,D.L., Labbe,C., Teichmann,B., Linning,R., Belzile,F., Bakkeren,G. and Belanger,R.R. (2013) The transition from a phytopathogenic smut ancestor to an anamorphic biocontrol agent deciphered by comparative whole-genome analysis. Plant Cell, 25, 1946–1959.

38. Schleker,S., Garcia-Garcia,J., Klein-Seetharaman,J. and Oliva,B. (2012) Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodiv., 9, 991–1018.

39. Liu,X., Tang,W.H., Zhao,X.M. and Chen,L. (2010) A network approach to predict pathogenic genes for Fusarium graminearum. PLoS One, 5, e13021.

40. Kour,A., Greer,K., Valent,B., Orbach,M.J. and Soderlund,C. (2012) MGOS: development of a community annotation database for Magnaporthe oryzae. MPMI, 25, 271–278.

41. Bleves,S., Dunger,I., Walter,M.C., Frangoulidis,D., Kastenmuller,G., Voulhoux,R. and Ruepp,A. (2014) HoPaCI-DB: host-Pseudomonas and Coxiella interaction database. Nucleic Acids Res., 42, D671–D676.

42. Van De Wouw,A.P. and Howlett,B.J. (2011) Fungal pathogenicity genes in the age of 'omics'. Mol. Plant Pathol., 12, 507–514.

43. Doehlemann,G., Reissmann,S., Assmann,D., Fleckenstein,M. and Kahmann,R. (2011) Two linked genes encoding a secreted effector and a membrane protein are essential for Ustilago maydis-induced tumour formation. Mol. Microbiol., 81, 751–766.

44. Dickman,M.B. (2007) Subversion or coersion? Pathogenic determinants in fungal phytopathogens. Fungal Biol. Rev., 21, 125–129.

45. Zhang,Y., Zhang,K., Fang,A., Han,Y., Yang,J., Xue,M., Bao,J., Hu,D., Zhou,B., Sun,X. et al. (2014) Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Commun., 5, 3849.

46. Cools,H.J. and Hammond-Kosack,K.E. (2013) Exploitation of genomics in fungicide research: current status and future perspectives. Mol. Plant Pathol., 14, 197–210.

47. Rutherford,K.M., Harris,M.A., Lock,A., Oliver,S.G. and Wood,V. (2014) Canto: an online tool for community literature curation. Bioinformatics, 30, 1791–1792.

48. Wood,V., Harris,M.A., McDowall,M.D., Rutherford,K., Vaughan,B.W., Staines,D.M., Aslett,M., Lock,A., Bahler,J., Kersey,P.J. et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res., 40, D695–D699.

49. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.

50. Kasprzyk,A. (2011) BioMart: driving a paradigm change in biological data management. Database, 2011, bar049.

51. Jones,J.D. and Dangl,J.L. (2006) The plant immune system. Nature, 444, 323–329.

52. Weßling,R., Epple,P., Altmann,S., He,Y., Yang,L., Henz,S.R., McDonald,N., Wiley,K., Bader,K.C. et al. (2014) Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microb., 16, 364–375.


