CrossRef 2009 Annual Member Meeting - Boston Page 1
Agenda
2:00 – 2:20 Status, Changes, Issues - Chuck Koscher
2:20 – 2:45 Metadata Quality - Chuck Koscher
2:45 – 3:00 System rewrite - Chuck Koscher
3:00 – 3:20 New Initiatives - Geoff Bilder
3:20 – 3:30 Coffee and Tea Break
3:30 – 5:00 Publisher system discussions:
   PLOS - Richard Cave
   APA - Beverly Jamison
   J. Wiley & Sons - Matt Larson
System status
Query response time by load:
2003: 0.625 sec
2005: 0.300 sec
2007: 0.500 sec
10/13/2009: 1.900 sec (heavy load), 1.2 sec (moderate load), 0.680 sec (light load)
System status
Deposit times (2009)
Processing time    June            July            August          Sept            October
Less than 5 min    107888 (53 %)   141105 (83 %)   131661 (91 %)   83379 (57 %)    33546 (52 %)
Less than 1 hr     35189 (17 %)    22389 (13 %)    10753 (7 %)     33829 (23 %)    18165 (28 %)
Less than 6 hr     31666 (15 %)    3666 (2 %)      903 (0 %)       24201 (16 %)    8037 (12 %)
Less than 12 hr    23482 (11 %)    181 (0 %)       0 (0 %)         2411 (1 %)      1855 (2 %)
Less than 18 hr    4019 (1 %)      713 (0 %)       0 (0 %)         968 (0 %)       1950 (3 %)
Less than 24 hr    0 (0 %)         3 (0 %)         0 (0 %)         0 (0 %)         0 (0 %)
More than 24 hr    0 (0 %)         1 (0 %)         1 (0 %)         1 (0 %)         0 (0 %)
Total deposits     203001          168058          143318          144790          63555
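The percentages in this table appear to be truncated rather than rounded. For example, August's 131661 of 143318 is 91.87 %, but it is shown as 91 %. They also appear to be taken against the "Total deposits" row. The minimal Python sketch below reproduces them from the counts. It also includes a hypothetical `share_within` helper for cumulative figures such as "finished within an hour". The dictionary layout and helper names are illustrative only.

```python
from typing import Dict, List, Tuple

# Rows of the 2009 deposit-time table: per-bucket counts and the "Total deposits" figure.
BUCKETS = ["< 5 min", "< 1 hr", "< 6 hr", "< 12 hr", "< 18 hr", "< 24 hr", "> 24 hr"]
DEPOSITS: Dict[str, Tuple[List[int], int]] = {
    "June":    ([107888, 35189, 31666, 23482, 4019, 0, 0], 203001),
    "July":    ([141105, 22389, 3666, 181, 713, 3, 1], 168058),
    "August":  ([131661, 10753, 903, 0, 0, 0, 1], 143318),
    "Sept":    ([83379, 33829, 24201, 2411, 968, 0, 1], 144790),
    "October": ([33546, 18165, 8037, 1855, 1950, 0, 0], 63555),
}


def truncated_percent(count: int, total: int) -> int:
    """Whole-number percentage rounded down (integer division avoids float error)."""
    return 100 * count // total


def bucket_percentages(month: str) -> List[int]:
    """Per-bucket percentages for one month, computed against the published total."""
    counts, total = DEPOSITS[month]
    return [truncated_percent(c, total) for c in counts]


def share_within(month: str, n_buckets: int) -> int:
    """Hypothetical helper: truncated % of a month's deposits done within the first n buckets."""
    counts, total = DEPOSITS[month]
    return truncated_percent(sum(counts[:n_buckets]), total)


if __name__ == "__main__":
    for month in DEPOSITS:
        print(f"{month:8} {bucket_percentages(month)}  within 1 hr: {share_within(month, 2)} %")
```

The sketch divides by the published totals, not by the sum of the buckets. The two differ slightly for June.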
System status
Operations changes:
• Starting to use HAProxy for internal load balancing and redundancy
• Using Alertra for external monitoring
• VMware virtual servers
• Now migrating Oracle from 9 to 11g (allows an active read-only standby)
• Using Jira for all [email protected] activities
• Berkeley DB based service for OpenURL DOI queries (metadata lookups)
Testing a process for matching <unstructured_citation> elements. Two technologies are being used:
• refXpress from Inera, which parses a reference and breaks it into its parts
• CitationQueryEngine (CQE), an internally developed Lucene-based search
Trial run:
Number of unstructured citations: 1,158,889
Number of DOIs processed: 3,150,525
Number of DOIs found by refXpress: 47,165
Number of DOIs found by CQE (score > 2.2): 139,721
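The trial figures can be turned into hit rates, and the CQE cut-off can be written as a simple filter. This is a minimal Python sketch. It assumes the 1,158,889 unstructured citations are the right denominator; the slide does not state which base it uses. It also assumes "score>2.2" means strictly greater than 2.2. The function names are hypothetical.

```python
from typing import List

# Figures from the trial run above.
UNSTRUCTURED_CITATIONS = 1_158_889
REFXPRESS_DOIS = 47_165
CQE_DOIS = 139_721           # only CQE candidates above the score cut-off
CQE_THRESHOLD = 2.2          # "score>2.2" from the trial run


def accept_cqe(score: float, threshold: float = CQE_THRESHOLD) -> bool:
    """Keep a CitationQueryEngine candidate only if its score is strictly above the cut-off."""
    return score > threshold


def hit_rate(found: int, total: int = UNSTRUCTURED_CITATIONS) -> float:
    """DOIs found as a percentage of unstructured citations, rounded to 2 decimals."""
    return round(100 * found / total, 2)


def accepted(scores: List[float]) -> List[float]:
    """Scores that would pass the CQE cut-off, in their original order."""
    return [s for s in scores if accept_cqe(s)]


if __name__ == "__main__":
    print("refXpress:", hit_rate(REFXPRESS_DOIS), "%")
    print("CQE:", hit_rate(CQE_DOIS), "%")
    # CQE scores quoted in the citation examples that follow
    print(accepted([3.159, 1.89, 0.48, 0.99, 2.39, 3.41]))
```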
<citation key="10.1016/S0736-0266(02)00040-2-BIB21">
  <author>Valero-Cuevas</author>
  <cYear>2000</cYear>
  <unstructured_citation> Applying principles of robotics to understand the biomechanics, neuromuscular control and clinical rehabilitation of human digits. In: IEEE International Conference on Robotics and Automation, San Francisco, CA, 2000. </unstructured_citation>
</citation>
CQE: score 3.159, refXpress: unparsed, XMLquery: nomatch
1
<citation key="BIB14">
  <volume_title>Macromolecules 1995</volume_title>
  <author>Butler</author>
  <unstructured_citation>; ; ; ; ; Macromolecules 1995, 28: 6383. </unstructured_citation>
</citation>
CQE: score 1.89, refXpress: parsed, XMLquery: nomatch
DOIs found: CQE 10.1021/ma00123a001; refXpress 10.1021/ma00123a001; XMLquery -
2
CQE: score 0.48, refXpress: unparsed, XMLquery: nomatch
DOIs found: CQE 10.1007/s11626-009-9184-7; refXpress -; XMLquery -
<citation key="BIB3">
  <volume_title>Taurine 3: cellular and regulatory mechanisms</volume_title>
  <author>Chen</author>
  <first_page>397</first_page>
  <cYear>1998a</cYear>
  <unstructured_citation> 1998a. Effect of taurine on human fetal neuron cells: proliferation and differentiation. In: editors. Taurine 3: cellular and regulatory mechanisms. New York: Kluwer Academic/Plenum Publishers. p 397-403. </unstructured_citation>
</citation>
Springer has assigned DOIs to Taurine 4, 6 and 7!
3
CQE: score 0.99, refXpress: parsed, XMLquery: nomatch
DOIs found: CQE 10.1007/BF01067277; refXpress 10.1023/A:1015801207091
<citation key="10.1021/js950353+-BIB19">
  <volume_title>Pharm. Res.</volume_title>
  <author>Shukla</author>
  <volume>8</volume>
  <first_page>1396</first_page>
  <cYear>1991</cYear>
  <unstructured_citation>; Pharm. Res. 1991, 8, 1396-1400.</unstructured_citation>
</citation>
<query key="MyKey4">
  <issn>0724-8741</issn>
  <journal_title>Pharm. Res.</journal_title>
  <volume>8</volume>
  <first_page>1396</first_page>
  <year>1991</year>
</query>
<query key="10.1021/js950353+-BIB19">
  <volume_title>Pharm. Res.</volume_title>
  <author>Shukla</author>
  <volume>8</volume>
  <first_page>1396</first_page>
  <year>1991</year>
</query>
4
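In the Pharm. Res. example, the citation carries the journal name in <volume_title>. The query built from it therefore also uses <volume_title> and finds nothing, whereas the journal-style MyKey4 query uses <journal_title>. The Python sketch below shows one possible conversion rule. It is our own heuristic, not CrossRef's: a <volume_title> accompanied by both <volume> and <first_page> is treated as a journal title. `citation_to_query` is a hypothetical name, and the sketch does not add the ISSN that MyKey4 carries.

```python
import xml.etree.ElementTree as ET

# Child elements copied unchanged from <citation> to <query>.
COPY_AS_IS = ("author", "volume", "issue", "first_page")

PHARM_RES_CITATION = """
<citation key="10.1021/js950353+-BIB19">
  <volume_title>Pharm. Res.</volume_title>
  <author>Shukla</author>
  <volume>8</volume>
  <first_page>1396</first_page>
  <cYear>1991</cYear>
  <unstructured_citation>; Pharm. Res. 1991, 8, 1396-1400.</unstructured_citation>
</citation>
"""


def citation_to_query(citation_xml: str) -> ET.Element:
    """Build a <query> element from a deposited <citation> (hypothetical heuristic)."""
    cit = ET.fromstring(citation_xml.strip())
    query = ET.Element("query", key=cit.get("key", ""))
    journal = cit.findtext("journal_title")
    volume_title = cit.findtext("volume_title")
    # Heuristic: a title that comes with a volume and a first page looks like a journal.
    looks_like_journal = cit.find("volume") is not None and cit.find("first_page") is not None
    if journal or (volume_title and looks_like_journal):
        ET.SubElement(query, "journal_title").text = (journal or volume_title).strip()
    elif volume_title:
        ET.SubElement(query, "volume_title").text = volume_title.strip()
    for name in COPY_AS_IS:
        value = cit.findtext(name)
        if value:
            ET.SubElement(query, name).text = value.strip()
    year = cit.findtext("cYear")      # citations use <cYear>, queries use <year>
    if year:
        ET.SubElement(query, "year").text = year.strip()
    return query


if __name__ == "__main__":
    print(ET.tostring(citation_to_query(PHARM_RES_CITATION), encoding="unicode"))
```

The Taurine 3 citation has a <first_page> but no <volume>, so under this rule it would keep its <volume_title>.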
<citation key="b64_1025">
  <unstructured_citation> Xu C, Taoka S, Crofts AR, Govindjee (1991) Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids. Biochim Biophys Acta 1098: 32-40 </unstructured_citation>
</citation>
CQE: score 2.39, refXpress: semi-parsed, XMLquery: nomatch
DOIs found: CQE 10.1016/0167-4838(91)90582-K; refXpress 10.1016/0005-2728(91)90006-A
5
<journal_title> Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology </journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>C</given_name>
    <surname>Xu </surname>
  </contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title> Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids </article_title>

<journal_title> Biochimica et Biophysica Acta (BBA) - Bioenergetics </journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>C</given_name>
    <surname>Xu </surname>
  </contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title> Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids </article_title>
<citation key="b53_366">
  <unstructured_citation> 53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T. Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 . </unstructured_citation>
</citation>
CQE: score 3.41, refXpress: semi-parsed, XMLquery: nomatch
DOIs found: CQE 10.1034/j.1399-3011.1999.00077.x; refXpress 10.1034/j.1399-3011.1999.00076.x
<doi type="journal_article"> 10.1034/j.1399-3011.1999.00076.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>383</first_page>
<last_page>392</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title> The effect of conformation on the membrane permeation of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides </article_title>

<doi type="journal_article"> 10.1034/j.1399-3011.1999.00077.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>403</first_page>
<last_page>413</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title> The effect of conformation of the acyloxyalkoxy-based cyclic prodrugs of opioid peptides on their membrane permeability </article_title>
6
Changes (problems)
Notable software error this past 12 months
URLs in the Handle system were rewritten with an older value (this affected some publishers who had deposited as-crawled URLs AND made URL modifications via ownership transfer)
Medium-to-large changes:
• Book volume-title/author/year rule: match only on book-title DOIs (sample 2)
• Added a false-positive prevention rule: IF an (XML) query contains an article title and that title is not an exact match with the deposited title, DO NOT MATCH, unless the author and first page are an EXACT match
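The false-positive prevention rule is a small decision function. The Python sketch below is written under stated assumptions. "Exact" is taken literally, with no case or whitespace normalization. The rule only vetoes a candidate: it returns False to block a match, and True means the normal matching logic decides. The `Metadata` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    """Hypothetical record shape: only the fields this rule looks at."""
    article_title: Optional[str] = None
    author: Optional[str] = None       # first author's surname
    first_page: Optional[str] = None


def allowed_by_title_rule(query: Metadata, deposit: Metadata) -> bool:
    """Return False when the rule forbids the match; True lets other rules decide."""
    if query.article_title is None:
        return True   # the rule only applies when the query carries an article title
    if query.article_title == deposit.article_title:
        return True   # exact title match: nothing to prevent
    # Titles differ: allow only if BOTH author and first page match exactly.
    return (
        query.author is not None
        and query.author == deposit.author
        and query.first_page is not None
        and query.first_page == deposit.first_page
    )


if __name__ == "__main__":
    cited = Metadata("...membrane permeability...", "Gudmundsson", "383")
    print(allowed_by_title_rule(cited, Metadata("...membrane permeation...", "Gudmundsson", "383")))
    print(allowed_by_title_rule(cited, Metadata("...membrane permeability...", "Gudmundsson", "403")))
```

With the Gudmundsson pair shown earlier, the article on pages 383-392 survives despite its slightly different title. The companion article starting on page 403 is blocked.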
Small-to-medium changes:
• Matching special characters in author names
• Matching compound surnames
• Removed the ability to avoid conflicts
• DOI character limits: "a-z", "A-Z", "0-9" and "-._;()/"
• Title lock-down (an ISSN check can disallow a deposit)
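The DOI character limit can be checked with a simple set test. This minimal Python sketch assumes the limit applies to the whole DOI string of a new deposit. Older DOIs cited earlier in this deck, such as 10.1023/A:1015801207091, contain a colon, so the rule presumably does not apply retroactively. The function name is hypothetical.

```python
import string
from typing import List

# Characters allowed in a DOI under the limits listed above.
ALLOWED_DOI_CHARS = frozenset(string.ascii_letters + string.digits + "-._;()/")


def invalid_doi_chars(doi: str) -> List[str]:
    """Return the distinct disallowed characters in `doi`, sorted (empty list means OK)."""
    return sorted({ch for ch in doi if ch not in ALLOWED_DOI_CHARS})


if __name__ == "__main__":
    for doi in ("10.1016/0275-5408(95)90568-M", "10.1023/A:1015801207091"):
        print(doi, invalid_doi_chars(doi) or "ok")
```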
Issues
Ongoing:
Too many alternative (publication) titles: be CAREFUL! They can really mess up fuzzy title matching (we do have a Schematron monitor).
Deleting DOIs:
• Change the publication title
• Change the DOI's title (article title) to the DOI itself
• Remove optional metadata
• Set the publication date to the deletion date
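The four deletion steps can be pictured as a transformation of a metadata record. This minimal Python sketch holds the record as a flat dict. The field names, the set of fields that survive, and the replacement publication title are all hypothetical, since the slide does not say what the new publication title should be.

```python
from datetime import date
from typing import Dict

# Hypothetical: fields kept on a deleted DOI; everything else counts as optional metadata.
REQUIRED_FIELDS = ("doi", "publication_title", "article_title", "publication_date")


def mark_deleted(record: Dict[str, str], new_publication_title: str,
                 deleted_on: date) -> Dict[str, str]:
    """Return a copy of `record` rewritten following the four deletion steps."""
    out = {k: v for k, v in record.items() if k in REQUIRED_FIELDS}  # remove optional metadata
    out["publication_title"] = new_publication_title                  # change the publication title
    out["article_title"] = record["doi"]                              # title becomes the DOI itself
    out["publication_date"] = deleted_on.isoformat()                  # date becomes the deletion date
    return out


if __name__ == "__main__":
    rec = {"doi": "10.5555/example", "publication_title": "Some Journal",
           "article_title": "Some article", "publication_date": "1995", "volume": "15"}
    print(mark_deleted(rec, "Deleted DOIs", date(2009, 10, 13)))
```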
Conflicts
Conflicts reduce matching rates!
Timestamps
DOIs are deposited with a timestamp to ensure that the latest metadata gets inserted. Timestamps are essential when we have to reprocess deposits. Problems occur when DOI ownership is transferred (e.g., what is the current timestamp?).
Solution: CrossRef will provide a means to retrieve the current timestamp.
New
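The timestamp rule and the ownership-transfer problem fit in a few lines. This Python sketch makes three assumptions. Timestamps are integers in a form such as yyyymmddhhmmss. A deposit whose timestamp is not greater than the stored one is rejected. `current_timestamp` stands in for the proposed retrieval function. All names are hypothetical.

```python
from typing import Dict, Tuple

# doi -> (timestamp of the last accepted deposit, metadata)
Store = Dict[str, Tuple[int, dict]]


def apply_deposit(store: Store, doi: str, timestamp: int, metadata: dict) -> bool:
    """Accept a deposit only if its timestamp is newer than the one on file (assumed rule)."""
    current = store.get(doi)
    if current is not None and timestamp <= current[0]:
        return False  # stale or duplicate deposit: keep the existing metadata
    store[doi] = (timestamp, metadata)
    return True


def current_timestamp(store: Store, doi: str) -> int:
    """Hypothetical lookup so a new owner can choose a later timestamp (0 if unknown)."""
    return store[doi][0] if doi in store else 0


if __name__ == "__main__":
    s: Store = {}
    apply_deposit(s, "10.5555/x", 20060404041003, {"owner": "old"})
    # New owner stamps with an older clock value: rejected.
    print(apply_deposit(s, "10.5555/x", 20050101000000, {"owner": "new"}))
    # New owner first retrieves the current timestamp: accepted.
    print(apply_deposit(s, "10.5555/x", current_timestamp(s, "10.5555/x") + 1, {"owner": "new"}))
```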
===========================================
Created: 2006-04-04 04:10:03.0
ConfID: 263262 CauseID: 246648646 OtherID: 64341060
JT: Ophthalmic and Physiological Optics
MD: Brown, 15 ,3,163,1995,Differences in visual acuity between the eyes: determination of normal limits in a clinical population
DOI: 10.1046/j.1475-1313.1995.9590568m.x(85579-R 263262-null )
DOI: 10.1016/0275-5408(95)90568-M
===========================================
Conflicts
Query:
<query key="MyKey1" enable-multiple-hits="false">
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <first_page>163</first_page>
  <year>1995</year>
</query>
Result with enable-multiple-hits="true" (both conflicting DOIs are returned):
<query key="MyKey1" status="multiresolved" fl_count="2">
  <doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
  <issn type="print">02755408</issn>
  <issn type="electronic">14751313</issn>
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <issue>3</issue>
  <first_page>163</first_page>
  <year>1995</year>
  <publication_type>full_text</publication_type>
</query>
<query key="MyKey1" status="multiresolved" fl_count="0">
  <doi type="journal_article">10.1016/0275-5408(95)90568-M</doi>
  <issn type="print">02755408</issn>
  <issn type="electronic">14751313</issn>
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <issue>3</issue>
  <first_page>163</first_page>
  <year>1995</year>
  <publication_type>full_text</publication_type>
</query>
Result with enable-multiple-hits="false":
<query key="MyKey1" status="unresolved" fl_count="0">
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <first_page>163</first_page>
  <year>1995</year>
</query>
Match fails
Conflicts
What to do?
Wiley/Blackwell owns this journal
<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic>
  <submission_id>923604608</submission_id>
  <record_diagnostic doi="10.1046/j.1475-1313.1995.9590568m.x">
    <conflict status="Success" ids="85579,263262">
      <msg>Marked as alias</msg>
      <doi_list>
        <doi>10.1016/0275-5408(95)90568-M</doi>
      </doi_list>
    </conflict>
  </record_diagnostic>
</doi_batch_diagnostic>
H:[email protected];op=PRIMARY 10.1046/j.1475-1313.1995.9590568m.x
Resolve_conflit.txt
Conflicts
Process log (email)
<query key="MyKey1" enable-multiple-hits="false">
  <journal_title>Ophthalmic and Physiological Optics</journal_title>
  <author>Brown</author>
  <volume>15</volume>
  <first_page>163</first_page>
  <year>1995</year>
</query>
<body>
  <query key="MyKey1" status="resolved" fl_count="2">
    <doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
    <issn type="print">02755408</issn>
    <issn type="electronic">14751313</issn>
    <journal_title match="exact">Ophthalmic and Physiological Optics</journal_title>
    <author match="exact">Brown</author>
    <volume match="exact">15</volume>
    <issue>3</issue>
    <first_page match="exact">163</first_page>
    <year match="exact">1995</year>
    <publication_type>full_text</publication_type>
  </query>
</body>
Match succeeds!
Conflicts
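The conflict behaviour shown above can be modelled in a toy Python sketch, under two assumptions. First, a conflict means two DOIs whose match key (journal title, author, volume, first page, year) is identical. Second, a DOI marked as an alias is skipped during matching, so the primary DOI resolves on its own. CrossRef's real matcher is fuzzier, and every name here is hypothetical.

```python
from typing import AbstractSet, Iterable, List, NamedTuple, Tuple


class Deposit(NamedTuple):
    doi: str
    journal_title: str
    author: str
    volume: str
    first_page: str
    year: str


Key = Tuple[str, str, str, str, str]


def match_key(journal_title: str, author: str, volume: str, first_page: str, year: str) -> Key:
    """Simplified match key: exact volume/page/year, case-insensitive title and author."""
    return (journal_title.lower(), author.lower(), volume, first_page, year)


def lookup(deposits: Iterable[Deposit], key: Key,
           aliases: AbstractSet[str] = frozenset(),
           enable_multiple_hits: bool = False) -> Tuple[str, List[str]]:
    """Return (status, DOIs) in the spirit of the query results shown above."""
    hits = [
        d.doi for d in deposits
        if d.doi not in aliases
        and match_key(d.journal_title, d.author, d.volume, d.first_page, d.year) == key
    ]
    if len(hits) == 1:
        return "resolved", hits
    if len(hits) > 1 and enable_multiple_hits:
        return "multiresolved", hits
    return "unresolved", []


OPO = "Ophthalmic and Physiological Optics"
DEPOSITS = [
    Deposit("10.1046/j.1475-1313.1995.9590568m.x", OPO, "Brown", "15", "163", "1995"),
    Deposit("10.1016/0275-5408(95)90568-M", OPO, "Brown", "15", "163", "1995"),
]

if __name__ == "__main__":
    key = match_key(OPO, "Brown", "15", "163", "1995")
    print(lookup(DEPOSITS, key))                                            # conflict: no match
    print(lookup(DEPOSITS, key, enable_multiple_hits=True))                 # both DOIs returned
    print(lookup(DEPOSITS, key, aliases={"10.1016/0275-5408(95)90568-M"}))  # primary DOI only
```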
What do YOU need to do?
1. Go to http://www.crossref.org/06members/59conflict.html
2. Determine the nature of your conflicts:
   a. If they only involve your own DOIs:
      Construct the necessary conflict resolution files and upload them using doi.crossref.org
      Use the screens on the Metadata Admin tab at doi.crossref.org to fix them
   b. If they involve someone else's DOIs:
      Construct the necessary conflict resolution files
      Email them to [email protected]
Conflicts
Audits will be coming next year, and unresolved conflicts may co$t you.
Metadata Quality
Metadata quality is good enough for linking (apart from conflict problems), but it is not good enough for other purposes such as display.
Number of DOIs with missing or weak metadata:
No volume: 3,055,090
No issue: 5,582,856
No page: 1,359,241
No author: 3,807,396
No article title: 988,764
One author: 16,139,751
No first name: 4,835,479
Initial only: 12,039,157
Total DOIs: 38,193,723
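The counts are easier to compare as shares of all DOIs. This small Python sketch uses dictionary keys that simply mirror the table headings.

```python
from typing import Dict

# Counts from the metadata quality table above (number of DOIs).
TOTAL_DOIS = 38_193_723
COUNTS: Dict[str, int] = {
    "No Volume": 3_055_090,
    "No Issue": 5_582_856,
    "No Page": 1_359_241,
    "No Author": 3_807_396,
    "No Article Title": 988_764,
    "One Author": 16_139_751,
    "No First Name": 4_835_479,
    "Initial Only": 12_039_157,
}


def percent_of_total(counts: Dict[str, int], total: int = TOTAL_DOIS) -> Dict[str, float]:
    """Each count as a percentage of all DOIs, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    # Largest gaps first.
    for name, pct in sorted(percent_of_total(COUNTS).items(), key=lambda kv: -kv[1]):
        print(f"{name:17} {pct:5.1f} %")
```

For instance, the 16,139,751 single-author DOIs are about 42.3 % of the 38,193,723 total.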
Schematron rules
Contributor checks:
• Alert if only a single author is present (not reported, but recorded)
• Alert if only a first initial is deposited
• Check for numbers in given name / surname
• Check for punctuation in given name / surname (currently checks for: _\/*@()[])
• Check for ndash in name
• Check for Jr or Sr in surname
• Alert if all caps
• Alert if more than 3 spaces are present
• Alert if there is a space in the surname when no given name is present
• Alert if surname ends with jr, JR
• Alert if surname contains 'et. al.'
• Alert if surname/given name contains & or &# (malformed entity)
• Alert if multiple ??? are present
Page ranges:
• Alert for _ or - in first or last page
• Alert if first and last page are identical
Edition / issue info:
• Check for 'edition' in <edition>
• Check for 'issue' in <issue>
• Check for 'no' or 'number' in volume/issue/edition
Citation checks:
• All surname checks
• All page range checks
• Year range check
Article title:
• Check for single-word title
• Alert if all caps
• Alert if title contains & or &# (malformed entity)
Other:
• Alert for year beyond current year
• Alert if neither first page nor author is present
• Alert if more than 2 alternate titles
• Alert if DOI contains a character that is not allowed
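Several of the contributor checks above can be written as plain string tests, shown here in Python rather than as Schematron rules. This minimal sketch makes three assumptions. "All caps" means two or more letters with no lowercase letters. Any "&" is treated as a possible malformed entity. An initial-only given name is a single letter, optionally followed by a period. The function names are hypothetical.

```python
import re
from typing import List

PUNCTUATION = set("_\\/*@()[]")               # the characters listed above
INITIAL_ONLY = re.compile(r"[A-Za-z]\.?")     # e.g. "C" or "C."


def field_alerts(label: str, value: str) -> List[str]:
    """Checks shared by given name and surname."""
    alerts = []
    if any(ch.isdigit() for ch in value):
        alerts.append(f"{label}: contains a number")
    bad = sorted(PUNCTUATION.intersection(value))
    if bad:
        alerts.append(f"{label}: contains punctuation {''.join(bad)}")
    letters = [ch for ch in value if ch.isalpha()]
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        alerts.append(f"{label}: all caps")
    if value.count(" ") > 3:
        alerts.append(f"{label}: more than 3 spaces")
    if "&" in value:
        alerts.append(f"{label}: possible malformed entity")
    return alerts


def contributor_alerts(given: str, surname: str) -> List[str]:
    """Run the shared checks on both names, then the name-specific ones."""
    alerts = field_alerts("given_name", given) + field_alerts("surname", surname)
    if INITIAL_ONLY.fullmatch(given.strip()):
        alerts.append("given_name: initial only")
    if "et. al." in surname.lower():
        alerts.append("surname: contains 'et. al.'")
    return alerts


if __name__ == "__main__":
    print(contributor_alerts("AdriËnne M.", "Mendrik $^*$"))
    print(contributor_alerts("C", "Xu"))
```

Applied to a deposited name such as given name "AdriËnne M." and surname "Mendrik $^*$", only the "*" in the surname is flagged.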
<pages>
  <first_page>305???306</first_page>
</pages>
What else is bad quality?
Metadata Quality
224,000 DOIs with a bad page number (this really affects matching)
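The page-range rules from the Schematron list can be sketched in Python and applied to values such as the <first_page> above. The "?" check is our addition. It assumes that runs of "?" mark a character lost in conversion, as in 305???306. The function name is hypothetical.

```python
from typing import List, Optional


def page_alerts(first_page: str, last_page: Optional[str] = None) -> List[str]:
    """Page-range checks: '_' or '-' inside a page value, '?' debris, identical pages."""
    alerts = []
    for label, value in (("first_page", first_page), ("last_page", last_page)):
        if value is None:
            continue
        for ch in "_-":
            if ch in value:
                alerts.append(f"{label}: contains '{ch}'")
        if "?" in value:  # assumption: '?' usually marks a character lost in conversion
            alerts.append(f"{label}: contains '?'")
    if last_page is not None and first_page == last_page:
        alerts.append("first and last page are identical")
    return alerts


if __name__ == "__main__":
    print(page_alerts("305???306"))
    print(page_alerts("383-392"))
    print(page_alerts("32", "40"))
```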
DOI links that still work: 14,985 journals crawled in 2009
69.25% are confirmed good, 22.8% unconfirmed, 5% confirmed not good
sum(dois): 25,977,348
sum(checked): 361,514
sum(confirmed): 206,204
sum(semiconfirmed): 44,168
sum(nonconfirmed): 82,565
sum(bad): 1,140
sum(login): 16,950
Western Journal of Medicine  10.1136/ewjm.172.6.364  http://www.pubmedcentral.nih.gov/
Western Journal of Medicine  10.1136/ewjm.172.2.84  http://www.pubmedcentral.nih.gov/
Western Journal of Medicine  10.1136/ewjm.172.1.43  http://www.pubmedcentral.nih.gov/
Western Journal of Medicine  10.1136/ewjm.172.1.61-a  http://www.pubmedcentral.nih.gov/
Western Journal of Medicine  10.1136/ewjm.174.2.103  http://www.pubmedcentral.nih.gov/
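The three headline percentages can be reproduced from the link-check sums. This Python sketch rests on an assumption: "confirmed good" combines sum(confirmed) and sum(semiconfirmed), and "confirmed not good" combines sum(bad) and sum(login). That grouping is our reading of the table and is not stated on the slide. Shares are computed in truncated basis points to avoid float rounding.

```python
# Link-check totals from the table above.
CHECKED = 361_514
CONFIRMED, SEMICONFIRMED, NONCONFIRMED, BAD, LOGIN = 206_204, 44_168, 82_565, 1_140, 16_950


def share_bp(part: int, whole: int = CHECKED) -> int:
    """Share in basis points (hundredths of a percent), truncated toward zero."""
    return 10_000 * part // whole


good = share_bp(CONFIRMED + SEMICONFIRMED)   # assumption: "confirmed good" = confirmed + semiconfirmed
unconfirmed = share_bp(NONCONFIRMED)
not_good = share_bp(BAD + LOGIN)             # assumption: "not good" = bad + login

if __name__ == "__main__":
    print(f"good {good / 100:.2f} %, unconfirmed {unconfirmed / 100:.2f} %, "
          f"not good {not_good / 100:.2f} %")
```

Under this grouping the sketch gives 69.25 %, 22.83 % and 5.00 %, which matches the rounded figures quoted on the slide.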
Metadata Quality
Schematron reports are run once a week.

From: <[email protected]>
Date: October 3, 2009 12:42:27 PM EDT
To: <[email protected]>, <[email protected]>
Subject: Schematron Report for prefix(es) 10.1109 [email protected]

The results of a weekly metadata quality check are listed below. The affected DOIs were deposited successfully, but the metadata attached to the DOI may need some attention.
http://www.crossref.org/schematron/data/st_20091003_5431.xml
http://www.crossref.org/schematron/data/st_20091003_5347.xml
http://www.crossref.org/schematron/data/st_20091003_5430.xml
http://www.crossref.org/schematron/data/st_20091003_5348.xml
http://www.crossref.org/schematron/data/st_20091003_5411.xml
<person_name sequence="first" contributor_role="author"> <given_name>AdriËnne M.</given_name> <surname>Mendrik $^*$</surname> </person_name>
http://ftp.crossref.org/schematron/data/st_20091010_2004.xml
http://ftp.crossref.org/schematron/data/st_20091010_3553.xml
http://ftp.crossref.org/schematron/data/st_20091003_5411.xml
http://ftp.crossref.org/schematron/data/st_20091003_59687.xml
http://ftp.crossref.org/schematron/data/st_20091003_5837.xml
System rewrite
May 2008: Board endorses plan to address a significant rewrite/upgrade
June 2008-Feb 2009: TWG subgroup (rewrite2) meets to define requirements and other project parameters
Oct 2008: Scenario options documented and cost comparisons profiled; started negotiations with Atypon re: new contract
Nov 2008: Report presented to the board and to the rewrite2 group for direction and validation
Dec 2008-May 2009: Negotiations with Atypon
Oct 12, 2009: New contract signed
Core Needs
• That CrossRef should ultimately own the intellectual property in the software at the heart of its operations
• That CrossRef should not risk or jeopardize the reliability and throughput offered by the existing system
• That CrossRef should remain free to develop further applications for other purposes which need to interface to the reference-linking systems and/or its data
• Recognition that CrossRef is not likely to establish internal resources sufficient to independently manage the development and maintenance of a system of this magnitude
System rewrite
Timeline chart (2009, 2010, 2011):
Existing System (EDS): both query and deposit transactions; EDS mods to use NQS
New Query System (NQS): query transactions
New Deposit System (NDS): deposit transactions
NQS will make use of the existing Oracle database (minimal mods to the schema)
EDS will communicate with NQS via JMS (Java Message Service)
May use the Spring Framework; if not initially, then more likely later on (NDS)
NDS will include significant data model and process changes: title management, conflicts, Oracle schema cleanup
NQS/NDS combined will allow integration of currently stand-alone functions (OAI-PMH)
After NQS/NDS: possibly augment/replace back end database (satellite DBs)
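Under the plan, EDS hands query transactions to NQS over a message interface. The toy Python sketch below illustrates the request/reply shape of that hand-off. It uses an in-process `queue.Queue` and a worker thread as stand-ins for a real message broker. The class and function names, the query-string format, and the in-memory index are all hypothetical.

```python
import queue
import threading
from typing import Callable, Optional


class QueryService:
    """Toy stand-in for NQS: a worker thread consumes query messages and replies."""

    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        self.requests: queue.Queue = queue.Queue()
        self._lookup = lookup
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self) -> None:
        while True:
            query, reply_to = self.requests.get()
            if reply_to is None:          # shutdown message
                return
            reply_to.put(self._lookup(query))

    def stop(self) -> None:
        self.requests.put((None, None))
        self._worker.join()


def eds_send_query(nqs: QueryService, query: str, timeout: float = 5.0) -> Optional[str]:
    """Toy stand-in for EDS: send one query message and block until NQS replies."""
    reply_to: queue.Queue = queue.Queue(maxsize=1)
    nqs.requests.put((query, reply_to))
    return reply_to.get(timeout=timeout)


if __name__ == "__main__":
    index = {"ophthalmic and physiological optics|15|163|1995":
             "10.1046/j.1475-1313.1995.9590568m.x"}
    nqs = QueryService(index.get)
    print(eds_send_query(nqs, "ophthalmic and physiological optics|15|163|1995"))
    nqs.stop()
```

Keeping the reply queue per request lets many EDS callers share one NQS request channel without mixing up answers.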
System rewrite: current organization
Diagram components:
HAProxy
System (Resin): www.crossref.org /openurl, /iPage, /query/xref.cgi, /*; oai.crossref.org/OAIHandler
System (Resin): doi.crossref.org /*
openurl, iPage, SIGG, other (Tomcat), two instances
www (Apache)
Oracle (prime), Oracle (passive standby), Oracle (CMD)
Lucene (2), BerkeleyDB (2); replication: daily and constant
Deposit Processor (Java app)
Stored Query Processor (Java app)
System rewrite: new organization
Hisham?
Diagram components:
HAProxy, HAProxy (standby)
Entry points: www.crossref.org/openurl, doi.crossref.org/*, oai.crossref.org/OAIHandler, /iPage, /query/xref.cgi, /*
www (Apache)
NQS (Spring) (Tomcat)
Query Dispatch, Citation Lookup, Metadata Access, Metadata, Persistent Data Access
EDS (Resin)
Deposit Processor (Java app)
Direct JDBC
Oracle (prime), Oracle (active standby), Oracle (CMD)
Lucene, BerkeleyDB; constant replication
New initiatives, technical perspectives.
… Geoff