Ad hoc Open-Ended Working Group on Access and Benefit Sharing
Seventh Meeting, Paris, 2-8 April 2009
Convention on Biological Diversity
Excerpts from: Studies on the Identification, Tracking and Monitoring of Genetic Resources
George M. Garrity, Michigan State University, East Lansing, MI, USA
Lorraine Thompson, USA
David Ussery, Denmark
Norman Paskin, UK
Dwight Baker, USA
Philippe Desmethe, Belgium
David Schindel, USA
Perry Ong, Philippines
The study
Our charge
• To review recent methods of identifying genetic resources directly based on DNA sequences.
• To identify methods of tracking and monitoring genetic resources through the use of persistent globally unique identifiers, including the practicality, feasibility, costs, and benefits of different options.
Our approach
• A design exercise to help develop baseline requirements for such a global tracking system to aid users and providers in complying with CBD ABS objectives.
Key questions and definitions
How are genetic resources defined?
Are genetic resources different from biological resources?
Is the concept universal?
What metadata are associated with genetic resources and how is that metadata defined and supplied?
What are the ramifications of new genomic methods on identifying genetic resources?
What are persistent identifiers?
Which persistent identifiers are used widely?
How do persistent identifiers differ?
Have tracking systems for genetic resources been deployed elsewhere?
What existing knowledge and technologies can be “leveraged” in creating such a system?
Key events in parallel to the CBD timeline
[Figure: timeline, 1980 to 2010. CBD milestones: UNCED Rio de Janeiro, AHWG Biodiversity, CBD enters into force, COP I through COP IX. Events plotted alongside them: TCP/IP, nameserver, DNS, Internet, WWW, UN goes online, Netscape, VoIP, Google/Napster, Internet viruses and worms, blogs begin, YouTube; PubMed, PubMed Central, PMC OAI compliant, PMC OAI requirement, UKPMC; PCR, US Supreme Court decision, GenBank, genetic fingerprinting, ABI Sequence, Human Genome Initiative, INSDC created, RDP created, HGP begins, BLAST, NIH data directive, H. influenzae genome, M. jannaschii genome, E. coli genome, C. elegans genome, M. tuberculosis genome, draft human genome, 100,000 16S rDNA, CBOL begins, human genome finished, 750,000 16S rDNA, NGS, NNGS, 1000 bacterial genomes; URN, PURL, ARK, LIMS.]
An example of a tracking system
Cascading workflow
All genetic resource types
Processed to yield multiple derivative samples
• Most yield small numbers of samples that are discarded. Others can yield 100s to 1000s, some of which may be retained for decades.
Each sample follows a predictable path through the system.
Derivative samples may be stored and reprocessed in the future for a variety of purposes.
Each sample is associated with one or more unique identifiers.*
Each sample is associated with various types of metadata
• Sample source, testing history, contractual rights and obligations, etc.
Entire sample history can be reconstructed “on-the-fly” if identifiers are actionable
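As an illustration of reconstructing a sample history from identifiers, here is a minimal sketch. It assumes each derivative sample records the identifier of the sample it was derived from; the `Sample` class, the `reconstruct_history` function, and the `GR-1` style identifiers are hypothetical and not taken from the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    """Hypothetical record for one sample in a cascading workflow."""
    identifier: str                       # unique identifier of this sample
    parent: Optional[str] = None          # identifier of the sample it derives from
    metadata: Dict[str, str] = field(default_factory=dict)


def reconstruct_history(registry: Dict[str, Sample], identifier: str) -> List[Sample]:
    """Follow parent links back to the original resource, oldest first."""
    history: List[Sample] = []
    seen = set()
    current: Optional[str] = identifier
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        sample = registry[current]        # KeyError if an identifier is unknown
        history.append(sample)
        current = sample.parent
    return list(reversed(history))


# Illustrative data only: one source resource and two derivative samples.
registry = {
    s.identifier: s
    for s in [
        Sample("GR-1", metadata={"source": "field collection"}),
        Sample("GR-1.a", parent="GR-1", metadata={"testing": "primary screen"}),
        Sample("GR-1.a.3", parent="GR-1.a", metadata={"testing": "secondary assay"}),
    ]
}
path = [s.identifier for s in reconstruct_history(registry, "GR-1.a.3")]
print(path)
```

In this sketch the history is never stored as a whole. It is rebuilt on request by following identifiers, which is the sense in which actionable identifiers allow reconstruction "on-the-fly".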
Examples of tracking identifiers
General properties
• Centrally controlled numbering scheme, but semantically laden
• B325,797-001-001 (sequential series, lead number, batch number, sample number)
Microorganisms
• Screening number: 0534-605F (year/week, assay.series)
• Culture collection number (internal/external): XX-YYYYz (collection identifier, sequential number)
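The sketch below shows what "semantically laden" means in practice: the parts of a number such as B325,797-001-001 can be read back out of the string. The mapping of positions to fields (series letter, lead number, batch number, sample number) is an assumption based on the order of the labels on the slide, and the `TrackingNumber` type and `parse_tracking_number` function are hypothetical.

```python
import re
from typing import NamedTuple


class TrackingNumber(NamedTuple):
    series: str   # assumed: leading letter denotes the sequential series
    lead: int     # assumed: lead number, written with a thousands separator
    batch: int    # assumed: first hyphenated group
    sample: int   # assumed: second hyphenated group


# One capital letter, a comma-grouped number, then two hyphenated numeric groups.
_PATTERN = re.compile(r"^([A-Z])(\d{1,3}(?:,\d{3})*)-(\d+)-(\d+)$")


def parse_tracking_number(text: str) -> TrackingNumber:
    """Split a semantically laden tracking number into its assumed parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised tracking number: {text!r}")
    series, lead, batch, sample = match.groups()
    return TrackingNumber(series, int(lead.replace(",", "")), int(batch), int(sample))


parsed = parse_tracking_number("B325,797-001-001")
print(parsed)
```

Because meaning is carried in the string itself, any change to the numbering convention breaks parsers like this one, which is one argument for semantically opaque identifiers backed by metadata.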
Lessons learned
Impact of genome sequencing
Service (2006, 311: 1544) in Science. Reproduced with permission.
The “surprises” keep coming...
Impact of 2nd and 3rd generation sequencing
Method            Read length (bp)       Cost/human genome  Run time (h/GB)  Ease of use
ABI Solid         35                     60,000             42               difficult
Solexa            35                     60,000             56               difficult
454-FLX           240-400                1,000,000          75               difficult
Helicos tSMS      30                     70,000             ~12              easy
PacBio SMRT       100,000                Low                <1               easy
Nanopore methods  Potentially unlimited  Low                >20              easy
ZS Genetics TM    Potentially unlimited  Low                ~14              easy
From: Gupta, PK, Trends in Biotechnology, (2008) 22: 602-611
Cumulative number of published genomes
Source - Liolios et al.
A job unfinished
Source - D. Ussery
A job undone is still worth something...
Source - P. Chain
“Although used every day, identifiers are a mystery to many people, including people responsible for building complex information systems.”
Report of the NISO Identifiers Roundtable, 2006
Identifiers
Essential elements in
• Human - human communications
• Human - machine communications
• Machine - machine communications
Ideally…
• Exist as an unambiguous string
• Context and application dependent
• Actionable
• Resolvable
Other points to consider
• Semantically opaque
• Global or local
• Unique or non-unique
• Unanticipated uses
Persistent identifiers
A name or an identifier for a resource that uniquely identifies that resource and will be forever associated with that resource. It will never be reassigned to any other resource and will not change regardless of where the resource is located or whatever protocol is used to access it.
Use of a well managed persistent identifier rather than a location will ensure that when a document is moved, or its ownership changes, the links to it will remain actionable.
From: Diana Dack, Persistence is a Virtue, Information Online Conference, Sydney, January 2001
The concept of name resolution
[Figure: name resolution maps persistent identifiers (PID1, PID2, PID3) to URLs (URL1, URL2, URL3). The PID identifies the resource; the URL locates it.]
Name registration & name resolution
[Figure: an authority registers PIDs (PID1, PID2, PID3) against URLs (URL1, URL2, URL3) in a global registry, together with key metadata. A user resolves a PID through the registry. The PID identifies, and the URL locates, the resource and its metadata.]
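A minimal in-memory sketch of the registration and resolution roles described above. It assumes a single registry that maps each PID to one current URL plus key metadata; the `Registry` class, its method names, and the `pid:0001` and `example.org` values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Record:
    url: str                                          # where the resource currently lives
    metadata: Dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical global registry: PIDs identify, URLs locate."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def register(self, pid: str, url: str, **metadata: str) -> None:
        # A persistent identifier is never reassigned to another resource.
        if pid in self._records:
            raise ValueError(f"{pid} is already registered")
        self._records[pid] = Record(url, dict(metadata))

    def relocate(self, pid: str, new_url: str) -> None:
        # The resource moved; the PID stays the same, only the location changes.
        self._records[pid].url = new_url

    def resolve(self, pid: str) -> str:
        return self._records[pid].url

    def describe(self, pid: str) -> Dict[str, str]:
        return dict(self._records[pid].metadata)


registry = Registry()
registry.register("pid:0001", "https://example.org/a", source="culture collection")
registry.relocate("pid:0001", "https://example.org/b")
print(registry.resolve("pid:0001"))
```

A link that cites `pid:0001` keeps working after the move because only the registry entry changes. That is the practical difference between identifying a resource and locating it.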
The identifier gradient
A single unambiguous string
• A label that identifies an entity
• ISBN 0-387-98771-1
• ATCC 27126
A numbering scheme
• A method of providing consistent syntax to denote class membership of an entity.
• An arbitrary internal system
• A formal standard or industry convention
• Key point is establishing a 1:1 correspondence between labels and members
Enumeration
• The numbers or labels are simply strings
The identifier gradient
An infrastructure specification
• A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure.
• Actionable identifiers
• URI (URN and URL)
• ISBN numbers as UPC/EAN identifiers
• Does not mandate a method of creating labels
• Does not create a managed environment
The identifier gradient
A fully implemented identifier system
Includes
• Unique identifiers
• A formalized infrastructure
• Management policies for registration, structured interoperable metadata, policy, and governance mechanisms.
Examples
• UPC/EAN barcodes and RFID tags
• Digital object identifiers (digital identifiers of objects)
Syntax of some other PIDs in “common” use
Handles
<Handle> ::= <Handle Prefix> "/" <Handle Suffix>
http://hdl.handle.net/10.1099/ijs.0.64483-0
Persistent URLs
<purl> ::= <protocol>/<resolver>/<name>
http://purl.oclc.org/OCLC/OCLC/PURL/FAQ
LSID (Life Science Identifiers)
urn:<LSID>:<AuthorityID>:<Namespace>:<Object>:<Rev>
http://lsid.biopathways.org/resolver/data/urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2
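The Handle and LSID syntaxes above can be split into their parts with a few lines of code. This is a sketch only: it follows the two grammars as shown on the slide, treats the LSID revision as optional (an assumption), and does no validation beyond structure. The `parse_handle` and `parse_lsid` names are hypothetical.

```python
from typing import NamedTuple, Optional


class Handle(NamedTuple):
    prefix: str
    suffix: str


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_handle(text: str) -> Handle:
    """<Handle> ::= <Handle Prefix> "/" <Handle Suffix>, split at the first slash."""
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {text!r}")
    return Handle(prefix, suffix)


def parse_lsid(text: str) -> LSID:
    """urn:<LSID>:<AuthorityID>:<Namespace>:<Object>:<Rev>, with <Rev> assumed optional."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# The two examples are the identifier portions of the URLs shown on the slide.
handle = parse_handle("10.1099/ijs.0.64483-0")
lsid = parse_lsid("urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2")
print(handle)
print(lsid)
```

In both slide examples the identifier is embedded in an HTTP URL of a resolver (`hdl.handle.net`, `lsid.biopathways.org`), which is what makes an otherwise plain string actionable in a browser.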
Two implementations using DOIs
CrossRef is an independent membership association, founded and directed by STM publishers. Its mission is to connect users to primary research literature through a DOI RA (registration agency) that performs reference cross-linking, subject to publisher access controls. It is the largest and most successful implementation of DOI services.
NamesforLife is an experimental semantic resolution service for dynamic terminologies. It provides a method for persistently linking the occurrence of a biological name or other technical term in third-party content to managed information about its origins, formal definition, current usage, and related information.
On-the-fly look-up services using DOIs
A proposed tracking system
Our recommendations
• Promptly establish the minimum information required for compliance with the IR. Stipulate which documents are mandatory and which are optional.
• Adopt a well-developed and widely used PID system that leverages an existing infrastructure and derives support from multiple sources.
• Consider current and future needs of genetic resource providers and users. Biological and functional diversity must both be accommodated.
• Deploy light-weight applications that use browser technology for interactive use. Publish application program interfaces to support other web services. Develop strong policies governing access and use of the resource to avoid data abuse. Trust is a key element.
• Deploy prototype tracking systems to validate underlying concepts and refine critical elements that will be needed in a fully operational system.