Dan Masiga
Molecular Biology and Biotechnology Department
International Centre of Insect Physiology and Ecology, Nairobi, Kenya

The BARCODE Data Standard: Enabling Molecular Diagnostics for Biodiversity

Western and Central Africa: DNA barcoding Meeting
One-day course on DNA barcoding: Practical advice
23rd October 2008

New partners

The Infrastructure of Taxonomy

• Collections and databases of specimens
• Codes of Taxonomic Nomenclature
• Compilations of taxonomic names
• Data repositories (characters, gene sequences, images, trees)
• Monographs
• Floristic and faunistic surveys/inventories
• Revisions
• The (undigitized) Taxonomic Literature

International Nucleotide Sequence Database Collaboration

http://www.insdc.org/

Roles of INSDC

An archival database/repository for nucleotide sequences

[Diagram: outputs of Projects A, B and C are deposited in INSDC and reach users through a common access interface]

• Standardization of data structure, including data items and values
• Assignment of a unique identifier (an accession number) to a sequence

New tools for taxonomy

DNA Barcoding

The ability to compare genotype information across a huge range of organisms is a powerful tool

“Only [27%] of papers had a legitimate specimens examined section, with museum numbers for each voucher, and names of the museums where the specimens used in the study could be examined”

Couplets Consisting of: “Species Name - DNA Sequence”

• Basis of a “look-up table” enabling molecular diagnostic applications
• However, both elements need validation
• Underlying specimens and associated raw sequence data are not typically available for secondary inspection
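
The “look-up table” idea can be sketched in a few lines of Python. This is only an illustration: the species names and sequences are invented placeholders, the 90% identity threshold is an arbitrary assumption, and real barcode matching would use proper alignment against a reference library rather than position-by-position comparison.

```python
from typing import Dict, Optional, Tuple

# Toy reference table of "Species Name - DNA Sequence" couplets.
# Names and sequences are invented placeholders, not real records.
REFERENCE: Dict[str, str] = {
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTCGTAA",
    "Species C": "TTGTACGGAC",
}


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def identify(
    query: str,
    reference: Dict[str, str] = REFERENCE,
    threshold: float = 0.9,  # assumed cut-off, chosen only for the example
) -> Optional[Tuple[str, float]]:
    """Return (species, identity) for the best match, or None if below threshold."""
    best = max(
        ((name, identity(query, seq)) for name, seq in reference.items()),
        key=lambda pair: pair[1],
    )
    return best if best[1] >= threshold else None
```

A query that matches no reference closely enough returns None, which mirrors the point above: the table is only as useful as the validated couplets it contains.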

Problem Areas

TRANSPARENCY AND TRACEABILITY

• Genetic Data Quality
• Specimen Data Quality
• Taxonomy
• Access to Information

Barcoders began calling for a Paradigm Shift

Depositing barcode sequences in a public database, along with primer sequences, trace files and associated quality scores, makes this species identification technique widely accessible. Reference DNA barcode sequences should be derived from, and linked to, specimens of known provenance in web-accessible collections in order to validate this system of molecular diagnostics.

Rationale for Defining “BARCODE” keyword in GenBank

• Provides the community with reference records with verifiable and retrievable data:
– Associated with retrievable voucher specimens (liberally defined: tissue, DNA, etc.)
– Linked to on-line metadata
– Meet an agreed upon standard of taxonomic identification
– Provide an assured level of data completeness
– On an agreed upon gene region
– Recommended for use in identifying unknowns

The Barcode Data Standard

Establishing a new data standard for “BARCODE” keyword records in DDBJ/EMBL/GenBank:

1. Minimum 500bp, <1% ambiguous base calls
2. Double stranded sequence
3. Trace files and associated quality scores
4. Primers used to generate sequence
5. Linkages to:
• A morphological voucher specimen
• Structured reference to collections
• Geospatial reference information
• Valid species name
• Who performed the identification
• Literature citations
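
Criterion 1 is the only item in the list that can be checked from the sequence string alone. A minimal sketch in Python follows, assuming that “ambiguous” means any character other than A, C, G or T (for example N or other IUPAC codes); the function and constant names are made up for the example and are not part of any database's submission tooling.

```python
MIN_LENGTH = 500                 # "Minimum 500bp"
MAX_AMBIGUOUS_FRACTION = 0.01    # "<1% ambiguous base calls"
UNAMBIGUOUS = set("ACGT")        # assumption: everything else counts as ambiguous


def ambiguous_fraction(seq: str) -> float:
    """Fraction of base calls that are not a plain A, C, G or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base not in UNAMBIGUOUS) / len(seq)


def meets_sequence_criteria(seq: str) -> bool:
    """Check length >= 500 bp and strictly less than 1% ambiguous calls."""
    return (
        len(seq) >= MIN_LENGTH
        and ambiguous_fraction(seq) < MAX_AMBIGUOUS_FRACTION
    )
```

The remaining criteria (trace files, primers, voucher linkages) concern metadata attached to the record and cannot be verified this way.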

Features, Qualifiers and Values

The Feature table is updated based on discussions at the International Collaborators meeting of INSDC

NCBI Trace Archive accepts BARCODE as a keyword that identifies “a DNA sequence analysis of a uniform target gene to enable species identification”

Triplet structure for specimen identifiers

/specimen_voucher=“<institution-code>|<collection-code>|<specimen-id>”

<institution-code> - abbreviation of the archiving institution
<collection-code> - collection within the institution (*)
<specimen-id> - specimen identifier within the collection

The above approach is used in DarwinCore/GBIF and is parallel to the Life Science Identifier (LSID), which is an Object Management Group (OMG) standard.

(*) museums & herbaria, culture collections, stock centers, germplasm repositories (seed banks), frozen tissue banks, zoos/aquaria/botanical gardens, DNA banks, personal collections, e-voucher archives
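
The triplet can be split into its three parts with a short Python sketch. The class and function names are hypothetical, and the example codes in the usage note are placeholders. The separator defaults to “|” as written on the slide; it is a parameter because the INSDC feature table documentation writes the same qualifier with colons.

```python
from typing import NamedTuple


class SpecimenVoucher(NamedTuple):
    institution_code: str  # abbreviation of the archiving institution
    collection_code: str   # collection within the institution
    specimen_id: str       # specimen identifier within the collection


def parse_specimen_voucher(value: str, sep: str = "|") -> SpecimenVoucher:
    """Split a full three-part voucher string into its components."""
    # Drop surrounding whitespace and straight or curly quotes.
    cleaned = value.strip().strip('"“”')
    parts = [part.strip() for part in cleaned.split(sep)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty fields, got {value!r}")
    return SpecimenVoucher(*parts)


def format_specimen_voucher(voucher: SpecimenVoucher, sep: str = "|") -> str:
    """Render the triplet back as a /specimen_voucher qualifier."""
    return f'/specimen_voucher="{sep.join(voucher)}"'
```

For instance, parse_specimen_voucher("XYZ|Ento|12345") yields institution code "XYZ", collection code "Ento" and specimen id "12345". This sketch rejects anything other than a complete triplet, so shorter forms would need separate handling.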

Link from GenBank to Museums

www.biorepositories.org

Process Record

Acknowledgments

• Lee Weight, Smithsonian Institution
• Scott Miller, PI CBOL
• David Schindel, Executive Secretary, CBOL
• Sujeevan Ratnasingham, Biodiversity Institute of Ontario (BIO)/BOLD
• Robert Hanner (BIO)
• Organizers: Western and Central Africa DNA barcoding Meeting (NABDA & CBOL Secretariat)

