Date posted: 09-May-2015 | Category: Education | Uploaded by: philip-bourne
Using Open Access Content: Ten Simple Observations
SciVee & Beyond the PDF
Philip E. Bourne
University of California San Diego
www.sdsc.edu/pb
http://www.slideshare.net/pebourne/p-lo-s
My Two Lectures
1. The promise - Open Access, Open Science with particular reference to PLoS
2. The fulfillment - What Open Access facilitates and examples of how it benefits science
• What you might get from this lecture:
– How others are using open science including open access content
– Ideas for how you might use the content
Today’s Exemplars
http://www.mendeley.com/
http://getutopia.com/documents/
http://www.scivee.tv/node/17389
Let me Start with a Few Observations
Observation 1. Scientific culture is causing us to try to write more and read more
You Cannot Possibly Read a Fraction of the Papers You Should
write more and read more Renear & Palmer 2009 Science 325:828-832
Scanning More Reading Less
Renear & Palmer 2009 Science 325:828-832
And So…
• There has been a paradigm shift that places more emphasis on writing and less on reading – witness blogs, the use of literature aggregators (e.g. PubMed), h-index scores, etc.
• We need help in assimilating knowledge
Observation 2
In 1993 there were very few electronic journals; by 2003 nearly all were online; by 2013 there will be little or no paper
Most traditional publishers have only really achieved an electronic, print-like experience – the power of the medium is there for the taking
Observation 3. The Sociology of Scientific Disciplines is Different
Observation 4:
• The biomedical sciences are progressive:
– Alternative business models have gained ground, e.g. Open Access
– Databases are becoming more like journals and journals are becoming more like databases
– New modes of knowledge and data access are gaining some ground, e.g.
• Textpresso – ontology-based mining and retrieval system
• iHOP – Information Hyperlinked over Proteins
Observation 5. I Believe Open Access, IF Fully Accepted, Could Profoundly Change Scholarly Discourse
It remains a big IF
Open Access: Taking Full Advantage of the Content, PLoS Comp. Biol. 2008 4(3) e1000037
It’s Happening in the Closed Access Space
• A very clever idea – the App model
• Leverage content
• Provide an open API
• Get the community to do all the work
• Drive folks to buy content
Why Don’t We Have Such Developments in OA?
Growth of PubMed Central
Open access could profoundly change scholarly discourse
Open Access (Creative Commons License)
1. All published materials available online, free to all (author pays model)
2. Unrestricted access to all published material in various formats, e.g. XML, provided attribution is given to the original author(s)
3. Copyright remains with the author
Observation 6
A biological database is not really that different from a biological journal – this can be exploited
PLoS Comp. Biol. 2005 1(3) e34
The Data Knowledge Cycle
Figure labels: Biocuration, Electronic Supplements
Databases versus journals
Both Are Under Stress
• PubMed contains ~21M entries (May 2011)
• ~100,000 papers indexed per month
• In Feb 2009:
– 67,406,898 interactive searches were done
– 92,216,786 entries were viewed
• 1330 databases reported in NAR 2011
• MetaBase (http://biodatabase.org) reports 2,651 entries, edited 12,587 times
PLoS Comp. Biol. 2005 1(3) e34
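A back-of-the-envelope reading of the PubMed figures above shows why we need help assimilating knowledge. This minimal sketch only rearranges the numbers on the slide; it mixes a Feb 2009 usage snapshot with a circa-2011 indexing rate, so the derived values are orders of magnitude, not measurements.

```python
# Figures copied from the slide; derived values are rough orders of magnitude.
SEARCHES_FEB_2009 = 67_406_898
ENTRIES_VIEWED_FEB_2009 = 92_216_786
DAYS_IN_FEB_2009 = 28               # 2009 was not a leap year
PAPERS_INDEXED_PER_MONTH = 100_000  # "~100,000 papers indexed per month"


def pubmed_load() -> dict[str, float]:
    """Derive per-day and per-search rates from the slide's PubMed figures."""
    searches_per_day = SEARCHES_FEB_2009 / DAYS_IN_FEB_2009
    views_per_search = ENTRIES_VIEWED_FEB_2009 / SEARCHES_FEB_2009
    papers_per_year = PAPERS_INDEXED_PER_MONTH * 12
    papers_per_day = papers_per_year / 365
    return {
        "searches_per_day": searches_per_day,
        "views_per_search": views_per_search,
        "papers_per_year": papers_per_year,
        "papers_per_day": papers_per_day,
    }


if __name__ == "__main__":
    for name, value in pubmed_load().items():
        print(f"{name}: {value:,.2f}")
```

This works out to roughly 2.4 million searches per day, about 1.37 entries viewed per search, and on the order of 3,300 newly indexed papers per day.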
Some More Comparisons
Journals:
• Have a pretty standardized interface
• Have a business model
• The quality is declining as numbers increase (?)
• Audience believes they are sustainable
Databases:
• Efforts to make the interfaces different!
• Little attempt at a business model compared to the Web 2.0 world
• Quality is increasing (?)
• Not well sustained
PLoS Comp. Biol. 2008. 4(7): e1000136
Some More Comparisons
Journals:
• New publishing models, e.g. open access, self-publishing, open review
• Web 2.0 influence, e.g. social networks
• Use of rich media
• The review process is failing
• New metrics
Databases:
• Read and write, e.g. Wikis
• New services, e.g. RESTful, widgets
• Use of rich media
• Crowd review emerging
Duh
• If we need to acquire more knowledge quickly
• If more literature and data are becoming open
• If both are under stress
• Why don’t we merge journals and databases for a new learning experience?
The Test Bed
http://www.wwpdb.org/
http://www.plos.org/ http://www.pubmedcentral.nih.gov/
Merge journals and databases
The World Wide Protein Data Bank
• The single worldwide repository for data on the structure of biological macromolecules
• Vital for drug discovery and the life sciences
• 38 years old
• Free to all
http://www.wwpdb.org
The World Wide Protein Data Bank
• Paper not published unless data are deposited – strong data to literature correspondence
• Highly structured data conforming to an extensive ontology
• DOIs assigned to every structure
http://www.wwpdb.org
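Because every PDB entry carries a DOI, a paper can cite a structure the same way it cites another paper. A minimal sketch, assuming the wwPDB DOI pattern 10.2210/pdbXXXX/pdb (XXXX being the four-character PDB ID) and resolution through https://doi.org/; the helper names are hypothetical, and only classic four-character IDs are handled.

```python
import re

# Classic PDB IDs: a digit 1-9 followed by three letters or digits (e.g. 1TIM).
CLASSIC_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Build a PDB entry DOI, assuming the 10.2210/pdbXXXX/pdb pattern."""
    if not CLASSIC_PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a classic four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.upper()}/pdb"


def doi_url(doi: str) -> str:
    """Turn a DOI into a resolvable link via the doi.org proxy."""
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    print(doi_url(pdb_doi("1tim")))
```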
The PLoS/PMC Corpus – Under the Hood
• Conforms well/partially to the NLM DTD – little markup of content
• PMC – some PDFs!
• The lack of conformance will come back to haunt us!
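To make "little markup of content" concrete: NLM DTD markup captures document structure such as titles, figures and captions, but an identifier mentioned in a caption remains plain text. This sketch parses a tiny hypothetical article fragment with Python's standard xml.etree.ElementTree; the sample XML is invented for illustration and omits the namespaces and DOCTYPE a real PLoS or PMC file would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment using NLM/JATS-style element names (no namespaces).
SAMPLE_ARTICLE = """<article>
  <front><article-meta><title-group>
    <article-title>A hypothetical structure paper</article-title>
  </title-group></article-meta></front>
  <body><sec>
    <p>See <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
    <fig id="f1"><label>Figure 1</label>
      <caption><p>Active site of triosephosphate isomerase (PDB 1TIM).</p></caption>
    </fig>
  </sec></body>
</article>"""


def summarize(xml_text: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the article title and (figure id, caption text) pairs."""
    root = ET.fromstring(xml_text)
    title = (root.findtext(".//article-title") or "").strip()
    figures = []
    for fig in root.iter("fig"):
        caption = fig.find("caption")
        # Flatten nested caption markup and normalize whitespace.
        text = " ".join("".join(caption.itertext()).split()) if caption is not None else ""
        figures.append((fig.get("id", ""), text))
    return title, figures


if __name__ == "__main__":
    print(summarize(SAMPLE_ARTICLE))
```

The PDB ID 1TIM comes back only as part of a caption string; nothing in the markup says it is a database identifier.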
Journal | Database
Author submission via the Web | Depositor submission via the Web
Syntax checking | Syntax checking
Review by scientists & editors | Review by annotators
Corrections by author | Corrections by depositor
Publish – Web accessible | Release – Web accessible
Similar Processes Lead to Similar Resources
So the processes are not that dissimilar; it is the final product that is perceived so differently
Even that might be changing slowly?
PLoS Comp. Biol. 2008 4(12) e1000247
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Merged: The Database View
Merged: The Literature View
Nucleic Acids Research 2008 36(S2) W385-389
http://biolit.ucsd.edu
ICTP Trieste, December 10, 2007
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
The Near Future
1. User reads a paper
2. Clicks on a figure; the figure can be manipulated, annotated, interrogated
3. Clicking the figure gives a composite database/journal view
4. This takes you to yet more papers or databases
The Knowledge and Data Cycle
http://biolit.ucsd.edu
Enhanced modes of learning
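The step where a figure leads to PDB data depends on finding structure identifiers in the text. A minimal heuristic sketch, not BioLit's actual method: it assumes classic PDB IDs (a digit 1-9 followed by three alphanumerics), skips all-digit tokens such as years, and builds links with the RCSB structure-page pattern https://www.rcsb.org/structure/<ID>, which is our assumption rather than something from the slides. It still lets through false positives such as "5kDa", so a real system would check candidates against the PDB.

```python
import re

# Candidate classic PDB IDs: a digit 1-9 and three alphanumerics, as a whole word.
CANDIDATE = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def find_pdb_ids(caption: str) -> list[str]:
    """Heuristically extract PDB IDs from a figure caption, in first-seen order.

    All-digit tokens (years, counts) are skipped; other false positives such
    as "5kDa" still get through and would need checking against the PDB.
    """
    found: list[str] = []
    for token in CANDIDATE.findall(caption):
        if token.isdigit():
            continue
        pdb_id = token.upper()
        if pdb_id not in found:
            found.append(pdb_id)
    return found


def structure_links(caption: str) -> dict[str, str]:
    """Map each candidate ID to an RCSB structure page (URL pattern assumed)."""
    return {pid: f"https://www.rcsb.org/structure/{pid}" for pid in find_pdb_ids(caption)}


if __name__ == "__main__":
    caption = "Superposition of 1TIM and 8tim (data from 2008); see also 1TIM."
    print(structure_links(caption))
```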
Observation 7: This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process
Merge journals and databases – requires semantic enrichment
Word 2007 Add-in for authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
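Automated term recognition of this kind can be pictured as dictionary lookup followed by inline markup. This sketch is a hypothetical stand-in, not the add-in's code or its OOXML format: the `<term>` element, the `PDB:` prefix and the two-entry dictionary are invented for illustration (GO:0006096 is the Gene Ontology term for glycolytic process, and 1TIM is a triosephosphate isomerase structure).

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mini-dictionary: surface term (lower case) -> identifier.
TERMS = {
    "glycolysis": "GO:0006096",               # Gene Ontology: glycolytic process
    "triosephosphate isomerase": "PDB:1TIM",  # illustrative structure link
}


def annotate(text: str, terms: dict[str, str] = TERMS) -> str:
    """Wrap recognized terms in hypothetical <term id="..."> elements."""
    # Try longer terms first so multi-word terms beat shorter overlapping ones.
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
    )
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))  # untouched text, XML-escaped
        ident = terms[match.group(1).lower()]
        parts.append(f"<term id={quoteattr(ident)}>{escape(match.group(1))}</term>")
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(annotate("Triosephosphate isomerase catalyses a step in glycolysis."))
```

Mapping a protein name to a single structure, as done here, is a simplification an author would need to correct; that is one reason author involvement helps.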
Challenges
• Author use
– Familiarity with ontologies, terms
– Agreement between co-authors
• End-use of semantically enriched manuscript
– Combine with NLM XML standard
• Article Authoring Add-in
Challenges: Author Use
IF one or more publishers fast-tracked papers that had semantic markup, I would argue it would catch on in no time
Observation 8: There Are Some Simple Things We Can Do to Mine the Corpus
Where We Would Like to Be: Data Clustering via the Literature
Figure panels: Immunology Literature; Cardiac Disease Literature; Shared Function
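One simple first step toward data clustering via the literature is to compare the sets of terms extracted from two bodies of literature and link them when the overlap is large. This sketch uses Jaccard similarity, which is our choice rather than anything prescribed on the slide; the term sets and the 0.2 threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_by_shared_terms(corpora: dict[str, set[str]], threshold: float = 0.2):
    """Return (name_a, name_b, score, shared_terms) for collections that overlap enough."""
    links = []
    for name_a, name_b in combinations(sorted(corpora), 2):
        terms_a, terms_b = corpora[name_a], corpora[name_b]
        score = jaccard(terms_a, terms_b)
        if score >= threshold:
            links.append((name_a, name_b, round(score, 2), sorted(terms_a & terms_b)))
    return links


# Hypothetical term sets extracted from three bodies of literature.
CORPORA = {
    "Immunology": {"apoptosis", "cytokine signaling", "inflammation", "T cell activation"},
    "Cardiac disease": {"apoptosis", "fibrosis", "inflammation", "ion channel"},
    "Structural biology": {"crystallography", "ion channel", "refinement"},
}

if __name__ == "__main__":
    for link in link_by_shared_terms(CORPORA):
        print(link)
```

With these toy sets, only the immunology and cardiac disease collections are linked, through their shared terms apoptosis and inflammation.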
Observation 9: The Use of Rich Media is Underutilized
Yes, YouTube Can Increase the Rate of Discovery
Pubcast – Video Integrated with the Full Text of the Paper
Proposal - The TeachU Workflow
Platforms: Mac, PC; Android, iPhone, Windows Phone 7
Step 1: presenter starts PowerPoint
Step 2: presenter starts recording on smart phone
Step 3: presenter stops recording and initiates upload
Step 4: slides are uploaded
Step 5: slides and podcast are automatically synchronized (sync file)
Step 6: listener plays back synchronized presentation
Figure labels: Slides, Website, Sync File, Podcast
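The automatic synchronization of slides and podcast boils down to knowing, for any playback time, which slide should be on screen. A minimal sketch, assuming a hypothetical sync file with one mm:ss start time per slide in slide order (the actual TeachU sync format is not described here); a binary search over the start times picks the current slide.

```python
from bisect import bisect_right


def parse_sync(text: str) -> list[float]:
    """Parse a hypothetical sync file: one 'mm:ss' start time per slide, in slide order."""
    times = []
    for line in text.strip().splitlines():
        minutes, seconds = line.strip().split(":")
        times.append(int(minutes) * 60 + float(seconds))
    if not times or times[0] != 0.0 or times != sorted(times):
        raise ValueError("start times must begin at 0:00 and never decrease")
    return times


def slide_at(start_times: list[float], t: float) -> int:
    """Return the 1-based number of the slide showing at playback time t (seconds)."""
    if t < 0:
        raise ValueError("playback time must be non-negative")
    # bisect_right counts how many slides have started at or before t.
    return bisect_right(start_times, t)


SYNC_FILE = """0:00
0:42.5
1:35
3:00"""

if __name__ == "__main__":
    starts = parse_sync(SYNC_FILE)
    for t in (0, 42.5, 100, 500):
        print(f"{t:>6}s -> slide {slide_at(starts, t)}")
```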
Lessons
• It is a form of expression the current YouTubers embrace and may become as ubiquitous as papers and slide presentations in the next few years
• We are reinventing television
• It’s only going to work if it is easy to publish and the reward is obvious
Observation 10: Scientific Reproducibility Requires We Publish Workflows
Yes The Workflow is Real
Reproducibility
• My views of reproducibility:
– We all stress its importance, but the only time it is tested is when something is truly novel or an error is suspected
– Reproducibility covers a spectrum of meaning: by whom, and with how much effort
– The longer the time lag, the less likely something is to be reproducible
Workflow Tools Might be the Answer
Taverna
Wings
Consider an Example: Our Own Experience in Capturing the Scientific Process to Make It Open and Reproducible
• It’s hard and embarrassing
• We have a working prototype using Wings
• I can feel the potential productivity gains
• My students are more doubtful
• It’s been a lot of fun and will enable us to improve our processes regardless of the workflow system itself
Problems with Publishing Workflows
• Workflows are not linear
• The workflow-to-paper relationship is not 1:1
• Confidentiality
• Peer review
• Infrastructure
• Community acceptance
• Reward system
• No publisher seems willing to touch them
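That workflows are not linear is easy to see once a workflow is written down as a dependency graph. The sketch below is not how Taverna or Wings represent workflows; it is a plain Python dependency map with hypothetical step names, ordered with the standard library's graphlib.TopologicalSorter (Python 3.9+), which groups steps into waves that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis workflow: each step maps to the steps it depends on.
WORKFLOW = {
    "fetch_structures": set(),
    "fetch_papers": set(),
    "extract_figures": {"fetch_papers"},
    "align_structures": {"fetch_structures"},
    "compare": {"align_structures", "extract_figures"},
    "plot": {"compare"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; steps in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(WORKFLOW), start=1):
        print(i, wave)
```

Two independent branches (papers and structures) only meet at the comparison step, so a single linear "methods section" flattens information that the graph keeps.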
Where Will It All End?
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.html
General References
• What Do I Want from the Publisher of the Future PLoS Comp Biol 6(5): e1000787
• Fourth Paradigm: Data Intensive Scientific Discovery http://research.microsoft.com/enus/collaboration/fourthparadigm/
References to Exemplars
• Semantic Biochemical Journal, 2010: Using Utopia
• Article of the Future, Cell, 2009
• Prospect, Royal Society of Chemistry, 2009
• Adventures in Semantic Publishing, Oxford U, 2009
• The Structured Digital Abstract, Seringhaus/Gerstein, 2008
• CWA Nanopublications, 2010
• https://sites.google.com/site/beyondthepdf/
• https://sites.google.com/site/futureofresearchcommunications/
Acknowledgements
• BioLit Team
– Lynn Fink
– Parker Williams
– Marco Martinez
– Rahul Chandran
– Greg Quinn
• Microsoft Scholarly Communications
– Pablo Fernicola
– Lee Dirks
– Savas Parastatidis
– Alex Wade
– Tony Hey
• wwPDB Team
– Boki Beran
– Wolfgang Bluhm
– Andreas Prlic
– Greg Quinn
– Peter Rose
– Ben Yukich
– Chunxiao Zhu
http://biolit.ucsd.edu
http://www.codeplex.com/ucsdbiolit