Date post: 12-Mar-2020
THE IPD-MHC DATABASE

James Robinson 1,2, Giuseppe Maccari 3, Ronald E. Bontrop 4, Sam Ho 5, Unni Grimholt 6, Jim Kaufman 7, Lisbeth A. Guethlein 8, Keith Ballingall 9, Steven G. E. Marsh 1,2, John A. Hammond 3

1 Anthony Nolan Research Institute, London, United Kingdom
2 UCL Cancer Institute, London, United Kingdom
3 The Pirbright Institute, Guildford, United Kingdom
4 Biomedical Primate Research Centre, Rijswijk, Netherlands
5 Gift of Life Michigan, Michigan, United States of America
6 Norwegian Veterinary Institute, Oslo, Norway
7 University of Cambridge, Cambridge, United Kingdom
8 Stanford University, Stanford, CA, United States of America
9 The Moredun Research Institute, Edinburgh, United Kingdom

INTRODUCTION
The IPD-MHC Database was first released in 2003 to provide a curated database of Major Histocompatibility Complex sequences from a number of non-human species. The system was modeled on the IPD-IMGT/HLA database but expanded to cover multiple species and nomenclature systems. The initial release contained data on non-human primates, felines and canines. Since then, further data on horses, sheep, swine, cattle, fish and rats have been added. The database has grown rapidly since its release and now contains nearly 7,000 alleles covering over 70 non-human species. The site averages ~1,500 unique visitors a month viewing ~5,000 pages.

This growth has led to a number of challenges in maintaining and developing the site. The bioinformatics requirements of running a locus-specific database across multiple nomenclature systems and locations have affected the performance and sustainability of the current IPD-MHC model. In 2015 a BBSRC Bioinformatics and Biological Resources (BBR) grant (UK) was awarded to fund both essential upgrades and future expansion of IPD-MHC. Work is currently focused on the underlying database structure, the public website and the submission procedures.

The key aims of this first phase are to develop a universal cross-species data submission and display tool, and to streamline and standardize the work of the nomenclature committees in curating the data. This will allow for simpler and more frequent species updates. Future developments aim to incorporate cross-species alignment, primer design tools and Next Generation Sequencing data. Ultimately this will create an improved system capable of coping with a large number of sequence variants, genes and species within the dataset. The project is overseen by a steering committee representing key stakeholders and non-human MHC nomenclature committees. This group is also helping to develop new MHC nomenclature standards and guidelines.

BIOINFORMATICS CHALLENGES
The current work is focussed on redeveloping both the public front-end of the database and the back-end utilised by the curators for the analysis and assignment of allele designations. The new version of the site will contain updated versions of the current tools and will provide additional functionality for both the users and the administrators of the system.

IPD-MHC USER REDEVELOPMENT
The following screenshots show examples of the new site and the development work that is currently underway.

[Screenshots: panels 1 to 4]

Figure 1: Panel 1: The IPD-MHC homepage, left, is being redeveloped to act as a signpost to the group pages and to provide more information on the project. The main page will feature dynamic statistics and tools for requesting new taxonomic designations. Panel 2: The individual group homepages, right, are being redesigned to provide more information and statistics on the species, genes and number of alleles curated by each group. Panels 3 & 4: The current tools, such as the sequence alignments (3), above left, and allele reports (4), above right, are being redesigned with improved functionality whilst retaining the current data.

IPD-MHC CURATOR REDEVELOPMENT
The following screenshots show examples of the new site and the development work that is currently underway.

[Screenshots: panels 5 to 7]

Figure 2: Extensive work is being undertaken to improve the tools for the curators of the data. This aims to improve the flow of data through the curation process and reduce the time from submission to naming and subsequent publication. Panel 5 is an illustration of the new curators' homepage, with enhanced graphics, change logs and improved navigation. Panel 6 shows how each individual allele entry has a refined interface for viewing and processing the data submissions. Panel 7 highlights the tools designed to aid in the assignment of official species designations.
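
The submission, naming and publication steps mentioned in the Figure 2 caption can be pictured as a simple linear workflow with a change log. The Python sketch below is illustrative only: the stage names, the `Submission` class, its fields and the `advance` method are assumptions made for this example and do not describe the actual IPD-MHC implementation, which is planned around PHP, MySQL and Silex.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical curation stages, taken from the wording of the caption."""
    SUBMITTED = "submitted"
    NAMED = "named"
    PUBLISHED = "published"


# Assumed forward-only transitions: submission -> naming -> publication.
NEXT_STAGE = {Stage.SUBMITTED: Stage.NAMED, Stage.NAMED: Stage.PUBLISHED}


@dataclass
class Submission:
    """One submitted sequence moving through curation (illustrative model)."""
    sequence_id: str
    stage: Stage = Stage.SUBMITTED
    change_log: list = field(default_factory=list)

    def advance(self, curator: str) -> Stage:
        """Move to the next stage and record who made the change."""
        if self.stage not in NEXT_STAGE:
            raise ValueError("submission is already published")
        previous = self.stage
        self.stage = NEXT_STAGE[previous]
        self.change_log.append((curator, previous.value, self.stage.value))
        return self.stage


entry = Submission("example-001")   # hypothetical identifier
entry.advance("curator_a")          # submitted -> named
entry.advance("curator_a")          # named -> published
print(entry.stage.value, entry.change_log)
```

Keeping the change log on the record itself is one simple way to support the kind of audit trail the curator homepage is said to display; a production system would more likely store it in a separate database table.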

CONCLUSIONS
The new IPD-MHC system will be a single cloud-based resource utilising PHP, MySQL, Silex and other technologies to offer both the end-users and the curators a more streamlined and user-friendly version of the current system. This work will enable the aims of the IPD-MHC project to be realised and high-quality MHC data from a number of species to be disseminated to the wider scientific community.

Group                        NHP   OLA  BoLA   ELA   SLA   DLA  FISH   FLA   RT1
Species                       52     2     4     1     2     5     2     1     1
Genes                        378     3    27    18    14     4     6     1    21
Alleles                    5,894   127   367    39   303   103   107    20   105
Average Genes per Species      7     2     7    18     7     1     3     1     1
Average Alleles per Gene      16    42    14     2    22    26    17    20     5

Table 1: The IPD-MHC is currently populated with a large amount of data for a number of different projects and species.
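
The two "Average" rows in Table 1 are ratios of the count rows above them (genes divided by species, and alleles divided by genes). As a minimal sketch, the counts can be transcribed and the ratios recomputed in Python; the names `TABLE_1` and `summarise` are illustrative choices for this example, not part of any IPD-MHC tool. The script prints unrounded ratios, so individual values may differ from the rounded figures printed in the table.

```python
# Counts transcribed from Table 1: group -> (species, genes, alleles).
TABLE_1 = {
    "NHP": (52, 378, 5894),
    "OLA": (2, 3, 127),
    "BoLA": (4, 27, 367),
    "ELA": (1, 18, 39),
    "SLA": (2, 14, 303),
    "DLA": (5, 4, 103),
    "FISH": (2, 6, 107),
    "FLA": (1, 1, 20),
    "RT1": (1, 21, 105),
}


def summarise(table):
    """Return per-group (genes/species, alleles/gene) ratios and column totals."""
    ratios = {}
    for group, (species, genes, alleles) in table.items():
        ratios[group] = (genes / species, alleles / genes)
    # zip(*rows) transposes the rows so each column can be summed.
    totals = tuple(sum(column) for column in zip(*table.values()))
    return ratios, totals


ratios, totals = summarise(TABLE_1)
for group, (genes_per_species, alleles_per_gene) in ratios.items():
    print(f"{group:5s} genes/species={genes_per_species:6.2f} "
          f"alleles/gene={alleles_per_gene:6.2f}")
print("totals (species, genes, alleles):", totals)
```

Summing the columns gives 70 species, 472 genes and 7,065 alleles across the nine groups.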

Re-development of IPD-MHC is supported by a BBSRC grant and by The Pirbright Institute and Anthony Nolan

Reg charity no 803716/SC038827 978PA/0416

None of the authors have any conflict of interest to declare.
