Towards Data Attribution & Citation in the Life Sciences Philip E. Bourne UCSD pbourne@ucsd

Post on 01-Feb-2016


Towards Data Attribution & Citation in the Life Sciences

Philip E. Bourne, UCSD

pbourne@ucsd.edu

8/22/11 Data Attribution and Citation

Life Science Data Repositories

NLM is the elephant in the room .. However ..

There are thousands of community-maintained efforts – all want an NAR publication

The ability to cite and attribute the data is highly variable:

– DOIs assigned in some cases, but not used

– Attribution is through the metadata in most cases

– Citation is typically by the associated literature reference if it exists, and/or a database identifier

– The use of data repositories such as Dryad is compelling for the long tail problem

– Data journals are on the horizon
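The variability described above can be made concrete with a small record type. This is only a sketch: the class name DataCitation and its fields are hypothetical, chosen to mirror the three citation routes the slide names (literature reference, database identifier, DOI), and do not describe any repository's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical record of the ways one data set can be cited."""
    repository: str                        # e.g. "PDB"
    identifier: Optional[str] = None       # database identifier, e.g. "1TIM"
    doi: Optional[str] = None              # DOI, if one was assigned
    literature_ref: Optional[str] = None   # associated paper, if it exists

    def citation_routes(self) -> List[str]:
        """List every route by which this data set can currently be cited."""
        routes = []
        if self.literature_ref:
            routes.append(f"literature: {self.literature_ref}")
        if self.identifier:
            routes.append(f"identifier: {self.repository}:{self.identifier}")
        if self.doi:
            routes.append(f"doi: {self.doi}")
        return routes


# An entry that has only a database identifier has a single citation route.
entry = DataCitation(repository="PDB", identifier="1TIM")
print(entry.citation_routes())
```

Keeping all three fields optional reflects the point of the slide: no single route can be assumed to exist for every data set.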


Consider the PDB as a Use Case

Oldest data resource in biology?

A resource used by ~200,000 individuals per month – an increasing number of them school kids!

A resource distributing worldwide the equivalent of ¼ of the Library of Congress each month

A bicoastal/worldwide resource

1TB


PDB Typical Growth Curve – But the Complexity!

[Figure: number of released entries (y-axis) by year (x-axis)]


People are doing more with the data

The numbers of visits and page views are growing faster than the number of unique visitors

The Data May Save Lives?

Structure Summary page activity for H1N1 influenza-related structures

[Figure: page activity over time, Jan. 2008 to Jul. 2010; an asterisk on the plot refers to the CDC link below]

1RUZ: 1918 H1 Hemagglutinin

3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir

* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm

PDB Data Attribution and Citation

About 25% of our budget has been spent on data remediation – multiple versions supported – the copy of record (as defined by the publication) is always available

Can't publish unless data are deposited – motivated by the community – very good data-to-publication correspondence

Data objects are discrete and we assign DOIs – but they are not used – database identifiers preferred
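To show how close the two citation forms are, here is a minimal sketch that converts between a PDB identifier and a DOI. It assumes the 10.2210/pdbXXXX/pdb pattern used for PDB entry DOIs and four-character identifiers; the slides do not state either, and the function names are hypothetical.

```python
import re

# Assumption: a PDB identifier is four characters, a digit followed by
# three alphanumerics (e.g. 1RUZ, 3B7E).
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")

# Assumption: PDB entry DOIs follow the pattern 10.2210/pdbXXXX/pdb.
_PDB_DOI = re.compile(r"10\.2210/pdb([0-9][a-z0-9]{3})/pdb")


def pdb_doi(pdb_id: str) -> str:
    """Build the DOI string for a PDB identifier under the assumed pattern."""
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_to_pdb_id(doi: str) -> str:
    """Recover the PDB identifier from a DOI of the assumed pattern."""
    match = _PDB_DOI.fullmatch(doi.lower())
    if match is None:
        raise ValueError(f"not a PDB entry DOI: {doi!r}")
    return match.group(1).upper()


print(pdb_doi("1RUZ"))
print(doi_to_pdb_id("10.2210/pdb3b7e/pdb"))
```

Under these assumptions the DOI is a mechanical wrapper around the identifier, which may help explain why authors keep citing the shorter database identifier.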


Ah yes .. But the CD4 Story…

The Knowledge and Data Cycle

0. Full text of PLoS papers stored in a database

1. A link brings up figures from the paper

2. Clicking the paper figure retrieves data from the PDB, which is analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

PLoS Comp. Biol. 2005 1(3) e34

Literature/Data Integration

1. User clicks on content

2. Metadata and webservices to data provide an interactive view that can be annotated

3. Selecting features provides a data/knowledge mashup

4. Analysis leads to new content I can share

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
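The link above and the literature/data steps before it suggest a two-way mapping between paper figures and PDB entries. The sketch below is illustrative only: the FigureIndex class, the paper label "example-paper", and the "http" scheme are assumptions, and the literature.do address is copied from the slide and may no longer resolve.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlencode, urlsplit

# Host and path are taken from the slide; the scheme is an assumption.
LITERATURE_VIEW = "http://www.rcsb.org/pdb/explore/literature.do"


def literature_view_url(structure_id: str) -> str:
    """Build the literature-view link for one PDB entry."""
    return f"{LITERATURE_VIEW}?{urlencode({'structureId': structure_id})}"


def structure_id_from_url(url: str) -> str:
    """Read the structureId query parameter back out of such a link."""
    return parse_qs(urlsplit(url).query)["structureId"][0]


class FigureIndex:
    """Hypothetical two-way index between paper figures and PDB entries."""

    def __init__(self) -> None:
        self._by_figure = defaultdict(set)  # (paper, figure) -> PDB ids
        self._by_entry = defaultdict(set)   # PDB id -> (paper, figure) pairs

    def link(self, paper: str, figure: str, pdb_id: str) -> None:
        """Record that a figure in a paper depicts a PDB entry."""
        self._by_figure[(paper, figure)].add(pdb_id)
        self._by_entry[pdb_id].add((paper, figure))

    def entries_for(self, paper: str, figure: str) -> list:
        """Literature to data: which entries does this figure show?"""
        return sorted(self._by_figure.get((paper, figure), ()))

    def figures_for(self, pdb_id: str) -> list:
        """Data to literature: which figures show this entry?"""
        return sorted(self._by_entry.get(pdb_id, ()))


index = FigureIndex()
index.link("example-paper", "Figure 1", "1TIM")
print(index.entries_for("example-paper", "Figure 1"))
print(literature_view_url("1TIM"))
```

Storing both directions is what lets a composite view link from literature text to the PDB and back again.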

Example of Interoperability: The Database View

BMC Bioinformatics 2010 11:220

Example of Interoperability – The Literature View

From Anita de Waard, Elsevier

Acknowledgements

Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK
