
CrossRef Technical Working Group

Presentation from Chuck Koscher at the 2009 Technical Working Group meeting in Cambridge, MA
Page 1: CrossRef Technical Working Group


CrossRef 2009 Annual Member Meeting - Boston Page 1

Agenda

2:00 – 2:20 Status, Changes, Issues - Chuck Koscher

2:20 – 2:45 Metadata Quality - Chuck Koscher

2:45 – 3:00 System rewrite - Chuck Koscher

3:00 – 3:20 New Initiatives - Geoff Bilder

3:20 – 3:30 Coffee and Tea Break

3:30 – 5:00 Publisher system discussions
  PLOS - Richard Cave
  APA - Beverly Jamison
  J. Wiley & Sons - Matt Larson

Page 2: CrossRef Technical Working Group


System status

  Query response time and load (chart; axis years 2002, 2005, 2009)

  10/13/2009: 1.900 sec (heavy load), 1.2 sec (moderate load), 0.680 sec (light load)
  2007: 0.500 sec
  2005: 0.300 sec
  2003: 0.625 sec

Page 3: CrossRef Technical Working Group


System status

  Deposit times (2009)

                  June           July           August         Sept           October
Less than 5 min:  107888 (53 %)  141105 (83 %)  131661 (91 %)  83379 (57 %)   33546 (52 %)
Less than 1 hr:   35189 (17 %)   22389 (13 %)   10753 (7 %)    33829 (23 %)   18165 (28 %)
Less than 6 hr:   31666 (15 %)   3666 (2 %)     903 (0 %)      24201 (16 %)   8037 (12 %)
Less than 12 hr:  23482 (11 %)   181 (0 %)      0 (0 %)        2411 (1 %)     1855 (2 %)
Less than 18 hr:  4019 (1 %)     713 (0 %)      0 (0 %)        968 (0 %)      1950 (3 %)
Less than 24 hr:  0 (0 %)        3 (0 %)        0 (0 %)        0 (0 %)        0 (0 %)
More than 24 hr:  0 (0 %)        1 (0 %)        1 (0 %)        1 (0 %)        0 (0 %)
Total deposits:   203001         168058         143318         144790         63555
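
The rows of the deposit-times table read as disjoint buckets rather than cumulative counts (for July, the seven rows add up exactly to the month's total). Under that assumption, a small sketch can add the first two buckets to estimate the share of deposits finished within one hour. The function and variable names are illustrative only.

```python
# Deposit counts per month, copied from the 2009 deposit-times table.
UNDER_5_MIN = {"June": 107888, "July": 141105, "August": 131661,
               "Sept": 83379, "October": 33546}
UNDER_1_HR = {"June": 35189, "July": 22389, "August": 10753,
              "Sept": 33829, "October": 18165}
TOTAL = {"June": 203001, "July": 168058, "August": 143318,
         "Sept": 144790, "October": 63555}


def share_within_one_hour(month: str) -> float:
    """Percentage of a month's deposits processed in under one hour.

    Assumes the table rows are disjoint buckets, so the first two rows
    can simply be added together.
    """
    done = UNDER_5_MIN[month] + UNDER_1_HR[month]
    return 100.0 * done / TOTAL[month]


if __name__ == "__main__":
    for month in TOTAL:
        print(f"{month}: {share_within_one_hour(month):.1f} % within one hour")
```

Read this way, August is the best month in the table and June the slowest.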

Page 4: CrossRef Technical Working Group


System status

  Operations changes

    Starting to use HAProxy for internal load balancing and redundancy
    Using Alertra for external monitoring
    VMWare virtual servers
    Now migrating Oracle from 9 to 11g (allows active read-only standby)
    Using Jira for all [email protected] activities
    Berkeley DB based service for OpenURL DOI queries (metadata lookups)

  Testing a process for <unstructured_citations>

    Two technologies being used
      refXpress from Inera, which parses a reference and breaks it into parts
      CitationQueryEngine (CQE), an internally developed Lucene-based search

    Trial run
      Number of unstructured citations: 1,158,889
      Number of DOIs processed: 3,150,525
      Number of refXPress DOIs found: 47,165
      Number of CQE DOIs found (score>2.2): 139,721

Page 5: CrossRef Technical Working Group

Sample 1

<citation key="10.1016/S0736-0266(02)00040-2-BIB21">
  <author>Valero-Cuevas</author>
  <cYear>2000</cYear>
  <unstructured_citation>
    Applying principles of robotics to understand the biomechanics, neuromuscular control and clinical rehabilitation of human digits. In: IEEE International Conference on Robotics and Automation, San Francisco, CA, 2000.
  </unstructured_citation>
</citation>

  CQE: score 3.159, refXpress: unparsed, XMLquery: nomatch

Page 6: CrossRef Technical Working Group

Sample 2

<citation key="BIB14">
  <volume_title>Macromolecules 1995</volume_title>
  <author>Butler</author>
  <unstructured_citation>; ; ; ; ; Macromolecules 1995, 28: 6383. </unstructured_citation>
</citation>

  CQE: score 1.89, refXpress: parsed, XMLquery: nomatch
  DOIs listed (in slide order): 10.1021/ma00123a001, 10.1021/ma00123a001, -

Page 7: CrossRef Technical Working Group


Sample 3

<citation key="BIB3">
  <volume_title>Taurine 3: cellular and regulatory mechanisms</volume_title>
  <author>Chen</author>
  <first_page>397</first_page>
  <cYear>1998a</cYear>
  <unstructured_citation>
    1998a. Effect of taurine on human fetal neuron cells: proliferation and differentiation. In: editors. Taurine 3: cellular and regulatory mechanisms. New York: Kluwer Academic/Plenum Publishers. p 397-403.
  </unstructured_citation>
</citation>

  CQE: score 0.48, refXpress: unparsed, XMLquery: nomatch
  DOIs listed (in slide order): 10.1007/s11626-009-9184-7, -, -

Springer has assigned DOIs to Taurine 4, 6 and 7!

Page 8: CrossRef Technical Working Group


Sample 4

<citation key="10.1021/js950353+-BIB19">
  <volume_title>Pharm. Res.</volume_title>
  <author>Shukla</author>
  <volume>8</volume>
  <first_page>1396</first_page>
  <cYear>1991</cYear>
  <unstructured_citation>; Pharm. Res. 1991, 8, 1396-1400.</unstructured_citation>
</citation>

  CQE: score 0.99, refXpress: parsed, XMLquery: nomatch
  DOIs listed (in slide order): 10.1007/BF01067277, 10.1023/A:1015801207091

<query key="MyKey4">
  <issn>0724-8741</issn>
  <journal_title>Pharm. Res.</journal_title>
  <volume>8</volume>
  <first_page>1396</first_page>
  <year>1991</year>
</query>

<query key="10.1021/js950353+-BIB19">
  <volume_title>Pharm. Res.</volume_title>
  <author>Shukla</author>
  <volume>8</volume>
  <first_page>1396</first_page>
  <year>1991</year>
</query>

Page 9: CrossRef Technical Working Group

Sample 5

<citation key="b64_1025">
  <unstructured_citation>
    Xu C, Taoka S, Crofts AR, Govindjee (1991) Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids. Biochim Biophys Acta 1098: 32-40
  </unstructured_citation>
</citation>

  CQE: score 2.39, refXpress: semi-parsed, XMLquery: nomatch
  DOIs listed (in slide order): 10.1016/0167-4838(91)90582-K, 10.1016/0005-2728(91)90006-A

<journal_title>Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>C</given_name>
    <surname>Xu </surname>
  </contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title>Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids</article_title>

<journal_title>Biochimica et Biophysica Acta (BBA) - Bioenergetics</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>C</given_name>
    <surname>Xu </surname>
  </contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title>Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids</article_title>

Page 10: CrossRef Technical Working Group

Sample 6

<citation key="b53_366">
  <unstructured_citation>
    53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T. Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .
  </unstructured_citation>
</citation>

  CQE: score 3.41, refXpress: semi-parsed, XMLquery: nomatch
  DOIs listed (in slide order): 10.1034/j.1399-3011.1999.00077.x, 10.1034/j.1399-3011.1999.00076.x

<doi type="journal_article">10.1034/j.1399-3011.1999.00076.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>383</first_page>
<last_page>392</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>The effect of conformation on the membrane permeation of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides</article_title>

<doi type="journal_article">10.1034/j.1399-3011.1999.00077.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>403</first_page>
<last_page>413</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>The effect of conformation of the acyloxyalkoxy-based cyclic prodrugs of opioid peptides on their membrane permeability</article_title>
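
The trial run counted a CQE hit only when its score exceeded 2.2. As a rough illustration of that cut-off, the sketch below applies it to the CQE scores reported for the six sample citations. The function name is hypothetical, and nothing here reflects how CQE computes a score.

```python
CQE_THRESHOLD = 2.2  # cut-off quoted for the trial run ("score>2.2")

# CQE scores reported on the slides for sample citations 1 to 6.
SAMPLE_SCORES = {1: 3.159, 2: 1.89, 3: 0.48, 4: 0.99, 5: 2.39, 6: 3.41}


def accepted_samples(scores: dict, threshold: float = CQE_THRESHOLD) -> list:
    """Return the sample numbers whose score is strictly above the threshold."""
    return sorted(n for n, score in scores.items() if score > threshold)


if __name__ == "__main__":
    print("Samples above the cut-off:", accepted_samples(SAMPLE_SCORES))
```

Samples 5 and 6 clear the cut-off, yet each of their result lines lists two different DOIs.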

Page 11: CrossRef Technical Working Group


Changes (problems)

  Notable software error in the past 12 months

    URLs in Handle were rewritten with an older value (affected some publishers who had deposited as-crawled URLs AND did URL mods via ownership transfer)

  Medium-big changes

    Book volume-title / author / year rule: match on (only) book title DOIs (sample 2)
    Added a false positive prevention rule:

      IF an (XML) query contains an article title and that title is not an exact match with the deposited title, DO NOT MATCH, except if author and first-page are an EXACT match
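
The false positive prevention rule can be read as a small predicate. The sketch below is one interpretation, not the production matcher: the dictionary field names are hypothetical, and "exact match" is taken to mean string equality after trimming whitespace, which is an assumption.

```python
from typing import Optional


def _norm(value: Optional[str]) -> Optional[str]:
    """Trim surrounding whitespace; None means the field was not supplied."""
    return value.strip() if value is not None else None


def title_rule_allows_match(query: dict, deposited: dict) -> bool:
    """Apply the rule: a mismatched article title blocks the match unless
    author and first page both match exactly."""
    q_title = _norm(query.get("article_title"))
    if q_title is None:
        return True  # the rule only applies when the query carries a title
    if q_title == _norm(deposited.get("article_title")):
        return True
    author, page = _norm(query.get("author")), _norm(query.get("first_page"))
    return (
        author is not None
        and page is not None
        and author == _norm(deposited.get("author"))
        and page == _norm(deposited.get("first_page"))
    )


if __name__ == "__main__":
    # Values taken from sample 6; the query title differs from both deposits.
    query = {"article_title": "The effect of conformation on the membrane permeability of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides",
             "author": "Gudmundsson", "first_page": "383"}
    rec_76 = {"article_title": "The effect of conformation on the membrane permeation of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides",
              "author": "Gudmundsson", "first_page": "383"}
    rec_77 = {"article_title": "The effect of conformation of the acyloxyalkoxy-based cyclic prodrugs of opioid peptides on their membrane permeability",
              "author": "Gudmundsson", "first_page": "403"}
    print(title_rule_allows_match(query, rec_76), title_rule_allows_match(query, rec_77))
```

With the sample 6 values, the record starting on page 383 stays eligible through the author and first-page exception, while the record starting on page 403 is rejected.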

  Small-medium changes

    Matching special characters in author names
    Matching compound surnames
    Removed ability to avoid conflicts
    DOI character limits: "a-z", "A-Z", "0-9" and "-._;()/"

  Title lock-down (ISSN check disallowing a deposit)
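
The DOI character limits listed under the small-medium changes translate directly into a character class. The sketch below reports any character outside that set. The helper name is hypothetical, and whether the deposit system enforces the limit in this way is not stated on the slide.

```python
import re

# Character set from the slide: a-z, A-Z, 0-9 and -._;()/
ALLOWED_CHAR = re.compile(r"[A-Za-z0-9\-._;()/]")


def disallowed_characters(doi: str) -> list:
    """Return the characters of `doi` outside the allowed set,
    in order of first appearance and without repeats."""
    found = []
    for ch in doi:
        if not ALLOWED_CHAR.fullmatch(ch) and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    for doi in ("10.1046/j.1475-1313.1995.9590568m.x",
                "10.1016/0275-5408(95)90568-M",
                "10.1023/A:1015801207091"):
        print(doi, disallowed_characters(doi))
```

The third DOI, which appears in the sample 4 results, contains a colon. That suggests the limit is aimed at newly deposited DOIs rather than existing ones, though the slide does not say so.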

Page 12: CrossRef Technical Working Group


Issues

  Ongoing …

  Too many alternative (publication) titles. Be CAREFUL!!! They can really mess up title fuzzy matching (we do have a Schematron monitor)

  Deleting DOIs:
    Change the publication title
    Change the DOI’s title (article title) to the DOI itself
    Remove optional metadata
    Set publication date to the deletion date

  Conflicts

    Conflicts reduce matching rates!

  Timestamps

    DOIs are deposited with a timestamp to ensure the latest metadata gets inserted. Timestamps are essential when we have to re-process deposits. Problems occur when DOI ownership changes (e.g. what is the timestamp?)

    Solution: CrossRef will provide a means to retrieve the current timestamp.

New
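
To illustrate why a new owner needs to know the current timestamp, here is a minimal sketch of a "latest timestamp wins" rule. It assumes a deposit is applied only when its timestamp is strictly greater than the stored one. The in-memory dictionary and the function name are stand-ins, not the actual deposit system.

```python
def apply_deposit(store: dict, doi: str, timestamp: int, metadata: dict) -> bool:
    """Insert or update a DOI record if the deposit is newer than what is stored.

    Returns True when the record was written, False when the deposit was
    ignored because its timestamp is not greater than the stored one.
    """
    current = store.get(doi)
    if current is not None and timestamp <= current["timestamp"]:
        return False
    store[doi] = {"timestamp": timestamp, "metadata": metadata}
    return True


if __name__ == "__main__":
    store = {}
    doi = "10.1046/j.1475-1313.1995.9590568m.x"
    # Original owner deposits with a large timestamp.
    print(apply_deposit(store, doi, 20090601120000, {"url": "old"}))
    # A new owner who does not know the stored value picks a smaller one.
    print(apply_deposit(store, doi, 2009, {"url": "new"}))
    # Once the current timestamp is known, the deposit can exceed it.
    print(apply_deposit(store, doi, store[doi]["timestamp"] + 1, {"url": "new"}))
```

Re-processing old deposits is safe under this rule, because a replayed deposit never overwrites newer metadata.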

Page 13: CrossRef Technical Working Group


Conflicts

===========================================
Created: 2006-04-04 04:10:03.0 ConfID: 263262 CauseID: 246648646 OtherID: 64341060,
JT: Ophthalmic and Physiological Optics
MD: Brown, 15 ,3,163,1995,Differences in visual acuity between the eyes: determination of normal limits in a clinical population
DOI: 10.1046/j.1475-1313.1995.9590568m.x(85579-R 263262-null )

DOI: 10.1016/0275-5408(95)90568-M
===========================================

<query key="MyKey1" enable-multiple-hits="false">
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <first_page>163</first_page>
  <year>1995</year>
</query>

Match Fails

<query key="MyKey1" status="unresolved" fl_count="0">
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <first_page>163</first_page>
  <year>1995</year>
</query>

enable-multiple-hits="true"

<query key="MyKey1" status="multiresolved" fl_count="2">
  <doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
  <issn type="print">02755408</issn>
  <issn type="electronic">14751313</issn>
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <issue>3</issue>
  <first_page>163</first_page>
  <year>1995</year>
  <publication_type>full_text</publication_type>
</query>
<query key="MyKey1" status="multiresolved" fl_count="0">
  <doi type="journal_article">10.1016/0275-5408(95)90568-M</doi>
  <issn type="print">02755408</issn>
  <issn type="electronic">14751313</issn>
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <issue>3</issue>
  <first_page>163</first_page>
  <year>1995</year>
  <publication_type>full_text</publication_type>
</query>
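
A client can spot a conflict in query results by grouping the returned query elements by key and checking the status attribute. The sketch below uses Python's standard xml.etree.ElementTree on an abridged copy of the multiresolved result. The body wrapper is borrowed from the "Match Succeeds" result shown later so the text parses as one document, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged from the slide: two rows returned for the same query key.
RESULTS = """<body>
  <query key="MyKey1" status="multiresolved" fl_count="2">
    <doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
  </query>
  <query key="MyKey1" status="multiresolved" fl_count="0">
    <doi type="journal_article">10.1016/0275-5408(95)90568-M</doi>
  </query>
</body>"""


def hits_by_key(xml_text: str) -> dict:
    """Group result rows by query key, keeping the status and every DOI returned."""
    hits = {}
    for query in ET.fromstring(xml_text).iter("query"):
        entry = hits.setdefault(
            query.get("key"), {"status": query.get("status"), "dois": []}
        )
        doi = query.findtext("doi")
        if doi:
            entry["dois"].append(doi.strip())
    return hits


if __name__ == "__main__":
    for key, entry in hits_by_key(RESULTS).items():
        if entry["status"] == "multiresolved":
            print(f"{key}: conflict between {', '.join(entry['dois'])}")
```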

Page 14: CrossRef Technical Working Group


Conflicts

Page 15: CrossRef Technical Working Group


Conflicts

  What to do?

Wiley/Blackwell owns this journal

Resolve_conflit.txt

H:[email protected];op=PRIMARY
10.1046/j.1475-1313.1995.9590568m.x

Process log (email)

<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic>
  <submission_id>923604608</submission_id>
  <record_diagnostic doi="10.1046/j.1475-1313.1995.9590568m.x">
    <conflict status="Success" ids="85579,263262">
      <msg>Marked as alias</msg>
      <doi_list>
        <doi>10.1016/0275-5408(95)90568-M</doi>
      </doi_list>
    </conflict>
  </record_diagnostic>
</doi_batch_diagnostic>
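
The process log returned by email is plain XML, so its outcome can be checked programmatically. The sketch below reads the diagnostic shown on this slide with Python's standard xml.etree.ElementTree (the XML declaration is dropped for brevity). Labelling the record's DOI as "primary" and the listed DOI as "aliases" is an interpretation of the PRIMARY operation and the "Marked as alias" message. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DIAGNOSTIC = """<doi_batch_diagnostic>
  <submission_id>923604608</submission_id>
  <record_diagnostic doi="10.1046/j.1475-1313.1995.9590568m.x">
    <conflict status="Success" ids="85579,263262">
      <msg>Marked as alias</msg>
      <doi_list>
        <doi>10.1016/0275-5408(95)90568-M</doi>
      </doi_list>
    </conflict>
  </record_diagnostic>
</doi_batch_diagnostic>"""


def summarize_diagnostic(xml_text: str) -> list:
    """Return one summary dict per <conflict> element in the process log."""
    rows = []
    root = ET.fromstring(xml_text)
    for record in root.iter("record_diagnostic"):
        for conflict in record.iter("conflict"):
            rows.append({
                "primary": record.get("doi"),
                "status": conflict.get("status"),
                "conflict_ids": (conflict.get("ids") or "").split(","),
                "message": conflict.findtext("msg"),
                "aliases": [d.text.strip() for d in conflict.iter("doi") if d.text],
            })
    return rows


if __name__ == "__main__":
    for row in summarize_diagnostic(DIAGNOSTIC):
        print(row)
```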

Page 16: CrossRef Technical Working Group

Conflicts

<query key="MyKey1" enable-multiple-hits="false">
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <first_page>163</first_page>
  <year>1995</year>
</query>

Match Succeeds !!!

<body>
  <query key="MyKey1" status="resolved" fl_count="2">
    <doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
    <issn type="print">02755408</issn>
    <issn type="electronic">14751313</issn>
    <journal_title match="exact">Ophthalmic and Physiological Optics</journal_title>
    <author match="exact">Brown</author>
    <volume match="exact">15</volume>
    <issue>3</issue>
    <first_page match="exact">163</first_page>
    <year match="exact">1995</year>
    <publication_type>full_text</publication_type>
  </query>
</body>

Page 17: CrossRef Technical Working Group


Conflicts

  What do YOU need to do

  1.  Go to http://www.crossref.org/06members/59conflict.html

  2.  Determine the nature of your conflicts

      1.  If they only involve your own DOIs
            Construct the necessary conflict resolution files and upload them using doi.crossref.org
            Use the screens at the doi.crossref.org Metadata Admin tab to fix them

      2.  If they involve someone else’s DOIs
            Construct the necessary conflict resolution files
            Email them to [email protected]

Audits will be coming next year and un-resolved conflicts may co$t you

Page 18: CrossRef Technical Working Group


Metadata Quality

  Metadata quality is good enough for linking (besides conflict problems) … but it is not good enough for other purposes (display).

  No Volume                      3,055,090
  No Issue                       5,582,856
  No Page                        1,359,241
  No Author                      3,807,396
  No Article Title                 988,764
  One Author                    16,139,751
  No First Name                  4,835,479
  Initial Only                  12,039,157
  DOI Total                     38,193,723

  Schematron rules

  Contributor checks
    Alert if only single author is present (not reported but recorded)
    Alert if only first initial is deposited
    Check for numbers in given name / surname
    Check for punctuation in given name / surname; currently checks for: _\/*@()[]
    Check for ndash in name
    Check for Jr or Sr in surname
    Alert if all caps
    Alert if more than 3 spaces are present
    Alert if space in surname when no given name is present
    Alert if surname ends with jr, JR
    Alert if surname contains 'et. al.'
    Alert if surname/given name contains &amp; or &amp;# (malformed entity)
    Alert if multiple ??? are present

  Page Ranges
    Alert for _ or - in first or last page
    Alert if first and last page are identical

  Edition / Issue info
    Check for 'edition' in <edition>
    Check for 'issue' in <issue>
    Check for 'no' or 'number' in volume/issue/edition

  Citation Checks
    All surname checks
    All page range checks
    Year range check

  Article Title
    Check for single word title
    Alert if all caps
    Alert if title name contains &amp; or &amp;# (malformed entity)

  Other
    Alert for year beyond current year
    Alert if neither first page nor author is present
    Alert if more than 2 alternate titles
    Alert if DOI contains a character not in the allowed set
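
The real checks are Schematron rules. Purely as an illustration, a few of the contributor and page-range checks listed above can be approximated in Python. The function names and alert strings are invented for this sketch, "first initial only" is assumed to mean a single letter with an optional period, and the all-caps test is assumed to require at least two adjacent letters so that initials such as "O.S." are not flagged.

```python
import re

PUNCTUATION = re.compile(r"[_\\/*@()\[\]]")  # set listed on the slide: _\/*@()[]
SINGLE_INITIAL = re.compile(r"[A-Za-z]\.?")
TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def name_alerts(given_name: str, surname: str) -> list:
    """Approximate a few of the contributor checks for one name."""
    alerts = []
    if SINGLE_INITIAL.fullmatch(given_name.strip()):
        alerts.append("only first initial deposited")
    for label, value in (("given name", given_name), ("surname", surname)):
        if re.search(r"\d", value):
            alerts.append(f"number in {label}")
        if PUNCTUATION.search(value):
            alerts.append(f"punctuation in {label}")
        if value.isupper() and TWO_LETTERS.search(value):
            alerts.append(f"all caps {label}")
    if re.search(r"\b(jr|sr)\b", surname, re.IGNORECASE):
        alerts.append("Jr or Sr in surname")
    return alerts


def page_alerts(first_page: str, last_page: str) -> list:
    """Approximate the two page-range checks."""
    alerts = []
    if any(ch in page for page in (first_page, last_page) for ch in "_-"):
        alerts.append("_ or - in first or last page")
    if first_page == last_page:
        alerts.append("first and last page identical")
    return alerts


if __name__ == "__main__":
    print(name_alerts("C", "Xu"))
    print(name_alerts("John", "SMITH JR"))
    print(page_alerts("32-40", ""))
```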

Page 19: CrossRef Technical Working Group

Metadata Quality

  What else is bad quality?

<pages>
  <first_page>305???306</first_page>
</pages>

  224,000 DOIs with bad page number (really affects matching)

  DOI links that still work: 14,985 journals crawled in 2009

  69.25% are confirmed good, 22.8% unconfirmed, 5% confirmed not good

  sum(dois)    sum(checked)  sum(confirmed)  sum(semiconfirmed)  sum(nonconfirmed)  sum(bad)  sum(login)
  25,977,348   361,514       206,204         44,168              82,565             1,140     16,950

  Western Journal of Medicine  10.1136/ewjm.172.6.364   http://www.pubmedcentral.nih.gov/
  Western Journal of Medicine  10.1136/ewjm.172.2.84    http://www.pubmedcentral.nih.gov/
  Western Journal of Medicine  10.1136/ewjm.172.1.43    http://www.pubmedcentral.nih.gov/
  Western Journal of Medicine  10.1136/ewjm.172.1.61-a  http://www.pubmedcentral.nih.gov/
  Western Journal of Medicine  10.1136/ewjm.174.2.103   http://www.pubmedcentral.nih.gov/
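
The slide does not say how the three link-check percentages were derived. One reading that reproduces them to within rounding is: good = confirmed + semiconfirmed, unconfirmed = nonconfirmed, and not good = bad + login, each divided by the number of DOIs checked. The sketch below works through that arithmetic. The grouping is an assumption.

```python
# Counts from the crawl summary on the slide.
CHECKED = 361_514
CONFIRMED = 206_204
SEMICONFIRMED = 44_168
NONCONFIRMED = 82_565
BAD = 1_140
LOGIN = 16_950


def pct(count: int, total: int = CHECKED) -> float:
    """Share of checked DOIs, as a percentage."""
    return 100.0 * count / total


# Assumed grouping of the raw counts into the three reported categories.
good = pct(CONFIRMED + SEMICONFIRMED)
unconfirmed = pct(NONCONFIRMED)
not_good = pct(BAD + LOGIN)

if __name__ == "__main__":
    print(f"confirmed good:     {good:.2f} %")
    print(f"unconfirmed:        {unconfirmed:.2f} %")
    print(f"confirmed not good: {not_good:.2f} %")
```

The five result columns do not add up to sum(checked), so some checked DOIs evidently fall outside these categories.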

Page 20: CrossRef Technical Working Group


Metadata Quality

  Schematron reports are run once a week.

From: <[email protected]>
Date: October 3, 2009 12:42:27 PM EDT
To: <[email protected]>, <[email protected]>
Subject: Schematron Report for prefix(es) 10.1109 [email protected]

The results of a weekly metadata quality check are listed below. The affected DOIs were deposited successfully but the metadata attached to the DOI may need some attention.

http://www.crossref.org/schematron/data/st_20091003_5431.xml
http://www.crossref.org/schematron/data/st_20091003_5347.xml
http://www.crossref.org/schematron/data/st_20091003_5430.xml
http://www.crossref.org/schematron/data/st_20091003_5348.xml
http://www.crossref.org/schematron/data/st_20091003_5411.xml

<person_name sequence="first" contributor_role="author">
  <given_name>AdriËnne M.</given_name>
  <surname>Mendrik $^*$</surname>
</person_name>

http://ftp.crossref.org/schematron/data/st_20091010_2004.xml
http://ftp.crossref.org/schematron/data/st_20091010_3553.xml
http://ftp.crossref.org/schematron/data/st_20091003_5411.xml
http://ftp.crossref.org/schematron/data/st_20091003_59687.xml
http://ftp.crossref.org/schematron/data/st_20091003_5837.xml

Page 21: CrossRef Technical Working Group


System rewrite

  May 2008: Board endorses plan to address a significant rewrite/upgrade

  June 2008-Feb 2009: TWG subgroup (rewrite2) meets to define requirements and other project parameters
    Oct: Scenario options documented and cost comparisons profiled; started negotiations with Atypon re: new contract
    Nov: Report presented to board and to rewrite2 group for direction and validation

  Dec 08-May 09: Negotiations with Atypon

  Oct 12, 09: New contract signed

Core Needs

•  That CrossRef should ultimately own the intellectual property in the software at the heart of its operations
•  That CrossRef should not risk or jeopardize the reliability and throughput offered by the existing system
•  That CrossRef should remain free to develop further applications for other purposes which need to interface to the reference-linking systems and/or its data
•  Recognized that CrossRef is not likely to establish internal resources sufficient to manage independently the development and maintenance of a system of this magnitude

Page 22: CrossRef Technical Working Group


System rewrite

  [Timeline chart, 2009 to 2011. Systems shown: Existing System (EDS); EDS mods to use NQS; New Query System (NQS); New Deposit System (NDS). Transaction labels: both query and deposit transactions; deposit transactions; query transactions.]

  NQS will make use of the existing Oracle database (minimal mods to the schema)

  EDS will communicate with NQS via JMI (Java Message Interface)

  May use the Spring framework; if not initially, then more likely later on (NDS)

  NDS will include significant data model and process changes
    Title management
    Conflicts
    Oracle schema cleanup

  NQS/NDS combined will allow integration of currently stand-alone functions (OAI-PMH)

  After NQS/NDS: possibly augment/replace back end database (satellite DBs)

Page 23: CrossRef Technical Working Group


System rewrite: Current organization

  [Architecture diagram; labels recovered from the slide]

  Load balancer: HAProxy
  URL labels: www.crossref.org, /openurl, /iPage, /query/xref.cgi, /*, oai.crossref.org/OAIHandler, doi.crossref.org /*
  Application servers: System (resin), shown twice; openurl, iPage, SIGG, other (Tomcat), shown twice; www (Apache)
  Databases: Oracle (prime), Oracle (passive-stndby), Oracle (CMD)
  Indexes and stores: Lucene (2), BerkeleyDB (2)
  Replication labels: Daily Replication, Constant Replication
  Batch applications: Deposit Processor (Java app), Stored Query Processor (Java app)

Page 24: CrossRef Technical Working Group


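System rewrite: New organization

  [Architecture diagram; labels recovered from the slide]

  Load balancers: HAProxy, HAProxy (standby)
  URL labels: www.crossref.org/openurl, doi.crossref.org/*, oai.crossref.org/OAIHandler, /iPage, /query/xref.cgi, /*
  Application servers: www (Apache); NQS (Spring) (Tomcat); EDS (resin)
  NQS functions: Query Dispatch, Citation Lookup, Metadata Access, Metadata, Persistent Data Access
  Databases: Oracle (prime), Oracle (active-stndby), Oracle (CMD)
  Indexes and stores: Lucene, BerkeleyDB
  Replication label: Constant Replication
  Batch application: Deposit Processor (Java app)
  Connection label: Direct JDBC
  Annotation: Hisham?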

Page 25: CrossRef Technical Working Group


New initiatives, technical perspectives.

… Geoff
