
Content Mining: a short introduction to practices and policies

Date post: 11-Jan-2016
Description:
Content Mining: a short introduction to practices and policies. Summary of a study for the Publishing Research Consortium into Journal Article Mining, by Eefke Smit and Maurits van der Graaf (2011). Full study available on the PRC website.
Transcript

STM Future Lab

Content Mining
A short introduction to practices and policies
Summary of a study for the Publishing Research Consortium into Journal Article Mining,
by Eefke Smit and Maurits van der Graaf (2011)

Full study available on PRC website

Let's start with a potential user (1)
Use case 1: keeping up to date
- Since 1982: 90,000 journal articles on neuroregeneration (e.g. spinal cord injury)
- New articles: on average 22 journal articles per day on neuroregeneration

Prof. Joost Verhaagen PhD, Netherlands Institute for Neuroscience, Amsterdam

Let's start with a potential user (2)
Use case 2: information needed as a result of laboratory experiments
- Which molecules play a role in this process?
- Typical outcome of an experiment: hundreds of molecules show enhanced activity
- Next step: how to filter out the relevant molecules?
- For each of these molecules, you would like a meta-analysis of what is already known about it in other processes

Prof. Joost Verhaagen PhD, Netherlands Institute for Neuroscience, Amsterdam
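
As a rough illustration of the filtering step in use case 2, the sketch below indexes which articles mention each candidate molecule and ranks the candidates by how much literature already exists on them. It is a minimal sketch, not the method used by the researcher quoted above: the article texts are invented sentences, `XYZ1` is a made-up candidate, and the function names `index_mentions` and `rank_by_evidence` are hypothetical.

```python
import re
from collections import defaultdict


def index_mentions(articles, molecules):
    """Map each molecule name to the set of article ids whose text mentions it."""
    patterns = {
        m: re.compile(r"\b" + re.escape(m) + r"\b", re.IGNORECASE)
        for m in molecules
    }
    index = defaultdict(set)
    for article_id, text in articles.items():
        for molecule, pattern in patterns.items():
            if pattern.search(text):
                index[molecule].add(article_id)
    return index


def rank_by_evidence(index, molecules):
    """Sort candidates by number of mentioning articles, most documented first."""
    return sorted(molecules, key=lambda m: (-len(index.get(m, ())), m))


# Assumption: a toy corpus of invented sentences standing in for article text.
articles = {
    "a1": "BDNF promotes axon growth after spinal cord injury.",
    "a2": "Expression of BDNF and GAP43 increases during regeneration.",
    "a3": "GAP43 is a marker of growing axons.",
    "a4": "BDNF signalling in memory formation.",
}
# Assumption: candidate list from an experiment; XYZ1 is a made-up name.
candidates = ["BDNF", "GAP43", "XYZ1"]

index = index_mentions(articles, candidates)
ranking = rank_by_evidence(index, candidates)
for molecule in ranking:
    print(molecule, sorted(index.get(molecule, ())))
```

With hundreds of candidates and tens of thousands of articles, the same index gives a first pass at which molecules already have a literature trail to read up on and which are essentially unknown.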

The essence of TDM (text and data mining) is:
So much information to analyze: can a machine do this for him?

What TDM looks like:
Text mining tool for semantic search by PubTator, see http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/tutorial/index.html

Typical text mining consists of
- Processing large corpora of text in an automated way
- To identify entities, instances, actions, relationships and patterns and do further analysis (e.g. on assertions or sentiments)
- Examples: genes, proteins, gene-disease patterns, compound properties, chemical structures, side effects of drugs
Text mining output typically consists of a new metadata layer for the information:
- Article clusters and categorisations, indexes
- Topical maps, to show the occurrence of topics and their inter-relationships
- Databases with facts, patterns, relationships, statements, assertions, properties found in the articles
- Visualisations like graphs, mappings, plot-graphs and topical maps
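
The two halves of this description, identifying entities and relationships and then emitting a metadata layer, can be sketched in a few lines. This is a deliberately simple dictionary lookup with sentence-level co-occurrence, assumed here for illustration only; production tools such as the one mentioned above use far richer methods. The lexicon, the sample text and the function names `tag_entities` and `extract_relations` are all hypothetical.

```python
import re
from itertools import product

# Assumption: a tiny hand-made lexicon; real systems use curated vocabularies.
LEXICON = {
    "gene": ["BRCA1", "TP53"],
    "disease": ["breast cancer", "lung cancer"],
}


def tag_entities(text):
    """Return (start, end, type, surface) annotations found by dictionary lookup."""
    annotations = []
    for entity_type, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, re.IGNORECASE):
                annotations.append(
                    (match.start(), match.end(), entity_type, match.group(0))
                )
    return sorted(annotations)


def extract_relations(text):
    """Pair every gene with every disease mentioned in the same sentence."""
    relations = set()
    # Naive sentence split on end punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = tag_entities(sentence)
        genes = {s for _, _, t, s in found if t == "gene"}
        diseases = {s.lower() for _, _, t, s in found if t == "disease"}
        relations.update(product(genes, diseases))
    return sorted(relations)


# Assumption: sample text written for this example, not taken from an article.
text = (
    "Mutations in BRCA1 raise the risk of breast cancer. "
    "TP53 is altered in lung cancer and breast cancer."
)

annotations = tag_entities(text)      # the entity layer, with character offsets
relations = extract_relations(text)   # candidate gene-disease pairs
print(annotations)
print(relations)
```

The annotations are stand-off metadata: they point into the article by character offset rather than altering it. The relation pairs are the raw material for the fact databases and topical maps listed above. Co-occurrence only suggests a relationship, so the pairs are candidates to be checked, not assertions.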

Study commissioned in 2011 by the Publishing Research Consortium
Authors:
- Eefke Smit, International Association of STM Publishers
- Maurits van der Graaf, Pleiade Management & Consultancy
Two parts:
- Qualitative study: 29 interviews with experts in academia, research, libraries, vendors and publishers
- Quantitative study: survey among publishers (members of Crossref and STM), 190 responses
Full report on the PRC website, www.publishingresearch.net
Article in the first issue of 2012 of Learned Publishing

Optimists and Pessimists on TDM
Skeptics about TDM:
- Has always over-promised
- Only in specialized fields
- Tools still complicated
- Manual curation necessary
- High investments
- Domain dependent
- No common dictionary
- Overambition in the promise of knowledge discovery
Optimists about TDM:
- Vast digital corpus available and growing
- More and more application areas (business, legal, social, etc.)
- Tools improving fast
- Manual work reduced
- Public domain or domain precision
- Processing power less of a problem, analytical tools better, visualisation adds to analysis

Publishers are optimistic:
Opinions/expectations for Content Mining in the next 3 years

Publishers are optimistic, continued:
Opinions/expectations for Content Mining on scholarly content in the next 3 years

... but publishers do not yet get many mining requests from 3rd parties:

Publishers are liberal in allowing mining:
How case-by-case requests are treated

... and plan more mining themselves:
for retrieval and navigation

Cross-sector solutions to facilitate Content Mining better
Suggestions made by experts during the interviews:

1. Standardization of Content Formats
2. One Content Mining platform
3. Commonly agreed access and permission terms
4. One window for mining permissions
5. Collaboration with national libraries

(ad 3: most interviewed experts do NOT see Open Access as a related issue; access terms also relate to datafile delivery or mining on the platform itself)

Survey results for the 5 suggestions for cross-sector solutions

Standardisation is most preferred, of content formats and of APIs
Top 3 for all respondents:
1. Standardisation of Formats
2. One Mining Platform
3. Agreed Permission Terms

Top 3 for experts only:
1. Standardisation of Formats
2. Agreed Permission Terms
3. One Mining Platform
Experts believe less in one platform and support standardisation even more strongly, not just for content but also for APIs.

Progress since 2011 on collective solutions facilitating TDM:
- Commonly agreed permission terms:
  - STM standard clause for non-commercial TDM on subscribed content
  - STM standard clause for pharma TDM via the Pharma-Documentation-Ring (PDR)
- One window for mining permissions across publishers:
  - PLS/CLA prototype, specially serving small and medium-sized publishers
- Obtain minable content in one place and provide standardized APIs and content formats:
  - CCC content mining platform
  - Crossref project Prospect: single point of web access for multiple publishers

Questions?

Eefke Smit
Director Standards and Technology
International Association of STM Publishers

