Handle System Overview

February 2011

Larry Lannom
Corporation for National Research Initiatives

http://www.cnri.reston.va.us/
http://www.handle.net/

Corporation for National Research Initiatives

Why Worry About Identifiers?

• Managing increasing amounts of primary and secondary data on the Net over long periods of time

• Managing increasingly complex data relationships on the Net over long periods of time

• When that data, its location(s), responsible parties, and the underlying systems may change dramatically over time

• Science builds on past work and increasingly relies on collaboration within virtual distributed communities

• All of this absolutely requires reliable, long-term persistent references to bind together the distributed data, processes, and parties involved

Role of Identifier Resolution Systems in Information Management on Networks

[Figure: a Client reaches Repositories / Collections of XML-described objects through Resource Discovery (Search Engines, Metadata Databases, Catalogues, Guides, etc.) and through an Identifier Resolution System]

Handle System

• Provides basic identifier resolution system for Internet
• Go from object name to current state data
• Name can persist over changes in location and other attributes
• Logically a single system, but physically and organizationally distributed and highly scalable
• Enables association of one or more typed values, e.g., IP address, public key, URL, with each id
• Optimized for speed and reliability
• Secure resolution with its own PKI as an option
• Open, well-defined protocol and data model
• Provides infrastructure for application domains, e.g., digital libraries & publishing, e-research, id mgmt.

Handle System Usage

• Library of Congress
• DTIC (Defense Technical Information Center)
• IDF (International DOI Foundation)
  – CrossRef (scholarly journal consortium, representing >2K publishers & societies)
  – DataCite (consortium of 9 members from 12 countries started by TIB)
  – EIDR (Entertainment Identifier Registry)
  – mEDRA (Multilingual European DOI Registration Agency)
  – R.R. Bowker (bibliographic data - ISBN)
  – Office of Publications of the European Community (OPOCE)
  – Wanfang Data
• OECD
• National Agricultural Library/USDA
• DSpace (MIT + HP)
• ADL (DoD Advanced Distributed Learning initiative)
• Australian National Data Service (ANDS)
• EPIC (European Persistent Identifier Consortium)
• GENI (Global Environment for Network Innovations)

Handle System Usage (Jan 2011)

• Assigned Prefixes
  – DOI – 211, 323
  – Other – 1,569
• Handles
  – DOI – 49.8 M
  – Other – Additional millions (total per prefix known only to prefix manager)
• Handle Services
  – Global
    • Six service sites (three CNRI, one CrossRef, one CNNIC, one GWDG)
  – Locals
    • >1000 registered LHS’s
• Traffic
  – Global: 100 million per month
  – CNRI-run proxy servers: tens of millions per month

HANDLE.NET Version 7.0

• Major upgrade; released December 2010
• Berkeley DB is default storage system
• Important new features:
  – A single template handle in the form of a base formula will allow any number of extensions to that base to be resolved according to a pattern, without registering each as a handle.
  – Handle values can be signed with "offline" private keys.
  – A new handle value type, 10320/loc, specifies a list of URL locations (including information that differentiates the locations) to which a handle can resolve.
  – A DNS interface means handle servers can be used to host DNS names.

Handle System Software

• Server (v7.0)
  – Java 1.4.2 and higher
• Client Library
  – Java & C versions available
• Proxy servlet
  – Java servlet, typically runs under Apache Tomcat
  – Build your own or use hdl.handle.net
• Misc. CNRI software (admin tools, browser plug-ins, etc.)
• Misc. community software (alternate clients, database modules, etc.)
• All available at www.handle.net
• Alternate complete implementations
  – Two known to CNRI, neither public
  – Both developed from spec, but they talked to us

Handle String

• <prefix> / <suffix>
• Examples
  – 10.1525/bio.2009.59.5.9
  – 4263537/5030
• Character Set: Unicode 2.0
• Encoding: UTF-8
• Prefixes
  – Currently allocating only numeric
  – Any text possible
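
The <prefix> / <suffix> form above can be illustrated with a minimal Python sketch. The function names are hypothetical, and two things are assumptions rather than statements from the slides: that the split happens at the first "/", and that a leading "hdl:" scheme is stripped before splitting.

```python
def split_handle(handle):
    """Split a handle string into (prefix, suffix).

    Assumption: the prefix ends at the FIRST '/', so any later '/'
    characters belong to the suffix.
    """
    # Assumption: tolerate the "hdl:" scheme used elsewhere in the slides.
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError("not of the form <prefix>/<suffix>: %r" % handle)
    return prefix, suffix


def encode_handle(handle):
    """Encode a handle string as UTF-8 bytes, the encoding named on the slide."""
    return handle.encode("utf-8")


print(split_handle("10.1525/bio.2009.59.5.9"))
print(split_handle("4263537/5030"))
```

Splitting at the first "/" keeps suffixes that themselves contain a slash intact.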

Handles Resolve to Typed Data

Handle        Data Type   Handle Data
10.123/456    URL         http://acme.com/...
              URL         http://a-books.com/...
              HS_ADMIN    user123
              XYZ         1001110011110

Handles Resolve to Typed Data

Handle: 10.1525/bio.2009.59.5.9

Data Type   Handle Data
URL         http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
HS_ADMIN    handle=0.na/10.1525; index=200; [delete hdl,add val,read val,modify val,del admin,add admin,list]
10320/loc   <locations chooseby="locatt, country, weighted">
              <location id="1" cr_type="MR-LIST" href="http://mr.crossref.org/iPage?doi=10.1525%2Fbio.2009.59.5.9" weight="1" />
              <location id="2" cr_src="unca" label="SECONDARY_BIOONE" cr_type="MR-LIST" href="http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9" weight="0" />
            </locations>
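
As a rough illustration of "handles resolve to typed data", the sketch below models a handle record as a list of typed values and filters it by type. The `HandleValue` tuple, the index numbers, and the `values_of_type` helper are hypothetical names chosen for this example; the data is the 10.123/456 row set from the first table, with the elided URLs left elided.

```python
from collections import namedtuple

# Hypothetical in-memory model of one typed value in a handle record.
HandleValue = namedtuple("HandleValue", ["index", "type", "data"])

# Values for 10.123/456 as listed on the slide; the index numbers are
# assumptions, since the slide does not show them.
RECORD_10_123_456 = [
    HandleValue(1, "URL", "http://acme.com/..."),
    HandleValue(2, "URL", "http://a-books.com/..."),
    HandleValue(3, "HS_ADMIN", "user123"),
    HandleValue(4, "XYZ", "1001110011110"),
]


def values_of_type(record, wanted_type):
    """Return the values of a record whose type matches, in index order."""
    matching = [value for value in record if value.type == wanted_type]
    return sorted(matching, key=lambda value: value.index)


for value in values_of_type(RECORD_10_123_456, "URL"):
    print(value.index, value.data)
```

A client that only understands URLs can ask for that type and ignore the rest, which is what lets new value types such as 10320/loc be added without disturbing existing clients.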

Handle Resolution

The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.

[Figure: handle services (the GHR and several LHSs); one service expanded into Site 1, Site 2, Site 3, ... Site n; one site expanded into servers #1, #2, #3, #4, ... #n; example handle 123.456/abc with the values "URL 4 http://www.acme.com/" and "URL 8 http://www.ideal.com/"]

1. Client sends request to Global to resolve 0.NA/123 (prefix handle for 123/456)

[Figure: a handle client, asked to resolve hdl:123/456, sends a request to the Global Handle Registry]

2. Global responds with Service Information for 123

[Figure: the Global Handle Registry returns the Service Information record for the Acme Local Handle Service to the handle client that was asked to resolve hdl:123/456]

Service Information - Acme Local Handle Service

[Figure: table of service information listing, for each server in the Primary Site, Secondary Site A, and Secondary Site B, its IP address, port number (2641 for every server), public key, and further fields]


3. Client queries Server 3 in Secondary Site A for 123/456

[Figure: the handle client, asked to resolve hdl:123/456, sends its query to the Acme Local Handle Service, which consists of a Primary Site, Secondary Site A, and Secondary Site B, each with numbered servers; the Global Handle Registry is shown alongside]

4. Server responds with handle data

[Figure: a server of the Acme Local Handle Service (Primary Site, Secondary Site A, Secondary Site B) returns the handle data for hdl:123/456 to the handle client; the Global Handle Registry is shown alongside]
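
The four resolution steps can be sketched in Python with in-memory dictionaries standing in for the Global Handle Registry and a local handle service. Everything here is a simplified illustration, not the handle protocol: the dictionaries, the `resolve` function, and the record layout are hypothetical, and the choice of site and server within the service is left out.

```python
# Hypothetical stand-in for the Global Handle Registry: it maps a prefix
# handle (0.NA/<prefix>) to service information for that prefix.
GLOBAL_HANDLE_REGISTRY = {
    "0.NA/123": {"service": "Acme Local Handle Service"},
}

# Hypothetical stand-in for the local handle services and their handle data.
LOCAL_HANDLE_SERVICES = {
    "Acme Local Handle Service": {
        "123/456": [("URL", "http://acme.com/index.html")],
    },
}


def resolve(handle):
    """Resolve a handle by following the four steps on the slides."""
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    prefix = handle.split("/", 1)[0]

    # Step 1: ask Global for the prefix handle, e.g. 0.NA/123 for 123/456.
    prefix_handle = "0.NA/" + prefix
    # Step 2: Global responds with service information for the prefix.
    service_info = GLOBAL_HANDLE_REGISTRY[prefix_handle]
    # Step 3: query the responsible local service (site and server
    # selection within the service is omitted in this sketch).
    local_service = LOCAL_HANDLE_SERVICES[service_info["service"]]
    # Step 4: the server responds with the handle data.
    return local_service[handle]


print(resolve("hdl:123/456"))
```

The point of the two-level lookup is that the client needs to know only how to reach Global; everything else is discovered from the service information it returns.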

Resolution With a Web Browser

[Figure: a handle client sends an HTTP Get for http://hdl.handle.net/123/456 to a Proxy/Web Server, which performs handle resolution against the Handle System (the GHR and LHSs)]

[Figure: the Handle System returns handle data to the Proxy/Web Server, which answers the handle client with an HTTP Redirect to http://acme.com/index.html]
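
A minimal sketch of the two halves of proxy resolution: building the proxy URL a browser would request, and picking the redirect target from the handle data. The function names are hypothetical, and "redirect to the first URL value" is a simplifying assumption about proxy behavior rather than something the slides state.

```python
from urllib.parse import quote

PROXY_BASE = "http://hdl.handle.net/"


def proxy_url(handle, proxy_base=PROXY_BASE):
    """Build the URL a web browser would request to resolve a handle."""
    # Keep '/' readable; percent-encode characters such as '#' or '?'
    # that would otherwise be interpreted as URL syntax.
    return proxy_base + quote(handle, safe="/")


def redirect_location(handle_values):
    """Pick the target of the HTTP redirect from (type, data) pairs.

    Assumption: the first value of type URL wins.
    """
    for value_type, data in handle_values:
        if value_type == "URL":
            return data
    return None


print(proxy_url("123/456"))
print(redirect_location([("HS_ADMIN", "user123"),
                         ("URL", "http://acme.com/index.html")]))
```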

Resolution with a Handle Client Plug-in

[Figure: a handle client with a plug-in sends a handle resolution request for hdl:123/456 directly to the Handle System (the GHR and LHSs) and receives handle data in return]

Handle Admin via Web Form

[Figure: handle clients administer handles in the Handle System (the GHR and LHSs) through a Web Server and/or Admin Servlets]


Custom Admin Client

[Figure: a custom admin client among the handle clients communicates directly with the Handle System (the GHR and LHSs)]

Handle Administration Embedded in Another Process

Handle Resolution Embedded in Another Process

[Figure: handle clients in which handle administration or handle resolution is embedded in another process that communicates with the Handle System (the GHR and LHSs)]

Template Handles

• An unlimited number of handles are computed on the fly from a single registered template

• Re-write rules and the delimiter can be defined at the prefix level, e.g., for any handle under the prefix 123, use ‘-’ as the delimiter and re-write any URL values

• Any handle under that prefix can be divided into base and extension, e.g., 123/456-abc has a base of 123/456 and an extension of abc. The base is registered.

• The data at 123/456 will then be combined with the extension string (abc) using the re-write rule

• Resolve “123/456-abc” and get back http://repository.com/getobject?id=123/456&part=abc

• Resolve “123/456-def” and get back http://repository.com/getobject?id=123/456&part=def

Template Handles

• Directly results from modularity of the current implementation
• Backend handle storage is pluggable
• A new storage module allows handles to be computed
• The rest of the handle resolution mechanisms are unchanged, only the storage module was enhanced
• Any exception handles can be individually registered to over-ride the template
• Re-write rules at the base level will over-ride the prefix level rules
• Re-write rules use Java regular expression language
• Templates allow handle strings to remain static in reference form while millions of resolution values can be changed at a single stroke
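
The 123/456-abc example can be sketched as follows. This is an illustration only: it uses Python's `re` module as a stand-in for the Java regular expressions the slides mention, and the registered base value, the rule format, and all names are assumptions made for the example, not the server's actual template syntax.

```python
import re

DELIMITER = "-"  # delimiter defined at the prefix level, per the slide

# Registered handles: the base, plus any individually registered exceptions.
# Assumption: the base 123/456 holds this URL value.
REGISTERED = {
    "123/456": "http://repository.com/getobject?id=123/456",
}

# Hypothetical re-write rule: a pattern applied to the extension and a
# replacement appended to the base handle's URL value.
EXTENSION_PATTERN = re.compile(r"^(.+)$")
EXTENSION_REPLACEMENT = r"&part=\1"


def resolve_template(handle):
    """Resolve a handle, computing template extensions on the fly."""
    # Individually registered handles take precedence over the template.
    if handle in REGISTERED:
        return REGISTERED[handle]
    base, separator, extension = handle.partition(DELIMITER)
    if not separator or not extension or base not in REGISTERED:
        return None
    rewritten = EXTENSION_PATTERN.sub(EXTENSION_REPLACEMENT, extension)
    return REGISTERED[base] + rewritten


print(resolve_template("123/456-abc"))
print(resolve_template("123/456-def"))
```

Only 123/456 is stored; changing its one URL value changes what every extension resolves to, which is the "single stroke" property described above.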

Offline Signatures

• Handle values can be signed with "offline" private keys that need not exist on any Internet-connected machine.

• This additional layer of verification has been applied to all entries in the Global Handle Registry.

• Any party that has the authority to create handle records can use this capability to sign their handle records.

• There is a simple (but flexible) API for building handle value digests and signing those digests.
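
The idea of building handle value digests can be sketched with Python's standard `hashlib`. This is not the Handle System API: the function names, the choice of SHA-256, and the way fields are serialized are all assumptions. The signing step itself, which would apply an offline private key to the digest list using a public-key library, is deliberately not shown.

```python
import hashlib


def value_digest(index, value_type, data):
    """Digest one handle value (hypothetical serialization, SHA-256)."""
    digest = hashlib.sha256()
    for field in (str(index), value_type, data):
        encoded = field.encode("utf-8")
        # Length-prefix each field so field boundaries are unambiguous.
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
    return digest.hexdigest()


def build_digests(values):
    """Map each value index to its digest; this mapping is what would be signed."""
    return {index: value_digest(index, value_type, data)
            for index, value_type, data in values}


def matches_digests(values, digests):
    """Check that the current values still match previously built digests."""
    return build_digests(values) == digests


record = [(1, "URL", "http://acme.com/index.html"),
          (100, "HS_ADMIN", "user123")]
signed_digests = build_digests(record)
print(matches_digests(record, signed_digests))
```

Because only the digest list needs a signature, the private key can stay on a machine that never touches the network, while verification needs only the public key and the values as served.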


Multiple Resolution

• Structured alternatives, e.g., multiple locations, in a single handle value
• Include selection criteria in that same value
• Handle client application, e.g., proxy server, performs evaluation
• Type = 10320/loc; value =
    <locations chooseby="locatt, country, weighted">
      <location id="0" href="http://abc…" country="gb" weight="0" />
      <location id="1" href="http://def…" weight="1" />
      <location id="2" href="http://xyz…" weight="1" />
    </locations>
• If the user is in the UK they are redirected to http://abc…; if not, then to either http://def... or http://xyz... at random, 50/50
• Currently deployed in CNRI-run proxies and also available in the open source proxy code
• Approach extensible for future selection methods, e.g., chooseby language or other value known to the proxy
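
The evaluation a proxy performs on a 10320/loc value can be approximated as below. This is a simplified reading of the slide, not the proxy's actual code: the function name and parameters are hypothetical, the full URLs are placeholder stand-ins for the elided ones, and the exact semantics of each chooseby method (especially `locatt`) are assumptions.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the elided URLs on the slide.
EXAMPLE_VALUE = """
<locations chooseby="locatt, country, weighted">
  <location id="0" href="http://abc.example/" country="gb" weight="0" />
  <location id="1" href="http://def.example/" weight="1" />
  <location id="2" href="http://xyz.example/" weight="1" />
</locations>
"""


def choose_location(loc_xml, country=None, locatt=None, rng=random):
    """Try each chooseby method in order; return the chosen href or None.

    country: two-letter country code of the user, if known.
    locatt:  optional (attribute_name, wanted_value) pair (assumed meaning).
    """
    root = ET.fromstring(loc_xml)
    locations = root.findall("location")
    methods = [m.strip() for m in root.get("chooseby", "").split(",")]
    for method in methods:
        if method == "locatt" and locatt is not None:
            name, wanted = locatt
            matches = [loc for loc in locations if loc.get(name) == wanted]
        elif method == "country" and country is not None:
            matches = [loc for loc in locations
                       if (loc.get("country") or "").lower() == country.lower()]
        elif method == "weighted":
            # Locations with weight 0 are never picked by the weighted draw.
            candidates = [loc for loc in locations
                          if float(loc.get("weight", "1")) > 0]
            weights = [float(loc.get("weight", "1")) for loc in candidates]
            matches = (rng.choices(candidates, weights=weights, k=1)
                       if candidates else [])
        else:
            matches = []
        if matches:
            return matches[0].get("href")
    return None  # no method selected anything


print(choose_location(EXAMPLE_VALUE, country="gb"))
```

A method that finds no match simply falls through to the next one, which is how a non-UK user ends up in the weighted draw between the two weight-1 locations.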

Multiple Resolution "Chooseby"

Handle: 10.1525/bio.2009.59.5.9

Data Type   Handle Data
URL         http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9
HS_ADMIN    handle=0.na/10.1525; index=200; [delete hdl,add val,read val,modify val,del admin,add admin,list]
10320/loc   <locations chooseby="locatt, country, weighted">
              <location id="1" cr_type="MR-LIST" href="http://mr.crossref.org/iPage?doi=10.1525%2Fbio.2009.59.5.9" weight="1" />
              <location id="2" cr_src="unca" label="SECONDARY_BIOONE" cr_type="MR-LIST" href="http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9" weight="0" />
            </locations>

The evaluation falls through the first two criteria and the proxy uses 'weighted' as the selection criterion. The first location (http://mr.crossref.org) wins with a weight of 1.

That location goes to a script on the CrossRef site that builds the page a user sees when resolving the DOI name as http://dx.doi.org/10.1525/bio.2009.59.5.9. The page is built to include the original URL value plus the 10320/loc data plus some additional information held by CrossRef.

Multiple Resolution "Chooseby"

The page displayed includes both the original URL and the added BioOne link:

TYPE = URL
VALUE = http://caliber.ucpress.net/doi/abs/10.1525/bio.2009.59.5.9

TYPE = 10320/loc
VALUE = http://www.bioone.org/doi/full/10.1525/bio.2009.59.5.9

Resolving to Metadata: Special Cases

• Use the multiple resolution option (handle value type 10320/loc) to redirect to metadata services
• Allow it to be defined at the prefix level, with individual handle override
• Trigger by content negotiation in HTTP request (linked data)
• Trigger by URL parameters
• Being tested with DOIs
• Test version of dx.doi.org proxy up and running since mid-October
• All non-standard content negotiation requests would go to RA based services, e.g., metadata.crossref.org
• Requested specific metadata through URL parameters, redirected to some service, e.g., EIDR registry

Using a Resolution System With Existing Identifiers

• No lack of identifiers in the world
• Actionable ISBN scheme
  – Example: 10.978.12345/99990
  – The syntax specification, reading from left to right, is:
    • Handle System DOI name prefix = "10."
    • ISBN (GS1) Bookland prefix = "978." or "979."
    • ISBN Publisher prefix = variable length numeric string of 2 to 8 digits
    • Prefix/suffix divider = "/"
    • ISBN Title enumerator and checkdigit = variable length numeric string of 8 to 2 digits
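
The left-to-right syntax specification can be turned into a small conversion sketch. The function name is hypothetical, and the length of the publisher prefix is passed in by the caller because it cannot be derived from the digits alone. The sketch does not validate the ISBN check digit.

```python
def isbn13_to_actionable(isbn13, publisher_len):
    """Build an actionable-ISBN handle from a 13-digit ISBN.

    publisher_len: number of digits in the ISBN publisher prefix (2 to 8),
    supplied by the caller.
    """
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected 13 digits: %r" % isbn13)
    if digits[:3] not in ("978", "979"):
        raise ValueError("Bookland prefix must be 978 or 979")
    if not 2 <= publisher_len <= 8:
        raise ValueError("publisher prefix must be 2 to 8 digits")
    bookland = digits[:3]
    publisher = digits[3:3 + publisher_len]
    title_and_check = digits[3 + publisher_len:]  # 8 to 2 digits remain
    # "10." + Bookland prefix + "." + publisher prefix + "/" + remainder
    return "10.%s.%s/%s" % (bookland, publisher, title_and_check)


print(isbn13_to_actionable("978-12345-9999-0", publisher_len=5))
```

The prefix/suffix divider falls exactly at the boundary between the publisher prefix and the title enumerator, so each publisher ends up with its own handle prefix.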

Handle System Management & Standards

• Specification
  – RFC 3650: Overview
  – RFC 3651: Namespace and Service Definition
  – RFC 3652: Protocol
• DoDI 1322.26
• ISO standards track for DOI
• U.S. Patent 6,135,646
  – Intent was to protect the technology as usage grew
  – Never used by CNRI, but has been referenced by others as prior art
  – It has served its purpose well and it expires in 2013
• HSAC - Handle System Advisory Committee
  – Approx 15 members representing big users
  – Maturation has diminished need for advice
  – Time for the next stage

