
Defragmentation:

Maximising the Use of

Existing Knowledge

Jan Velterop — APE 2015 — Berlin 21 January 2015

Open Access…

…is not the goal

It is a means

to reach the goal

And the goal is…?

Maximal usefulness of existing

scientific research results in order to

achieve:

efficient, fast, and effective new

knowledge creation and discovery

i.e. highest

possible return on

public investment

optimal dissemination…

…of knowledge

The ultimate goal, to which

Open Access is merely a

means, may not be widely

understood – by publishers

The ultimate goal, to which

Open Access is merely a

means, may not be widely

understood

That may be why there are

a lot of different

interpretations of what

Open Access actually is

(in spite of the clear definition given in the

Budapest Open Access Initiative)

The fact that not all published

research is accessible to all

researchers leads to ‘lamp

post research’

Lamp post research

Looking merely at the literature that

one can access – which is not

necessarily the literature that is

potentially important to one’s research

Lamp post research:

Publicatarrh

&

Datarrhoea

[Chart: number of abstracts in PubMed, per year and cumulative; 11,135,542 cumulative]

…averaging more than 2 abstracts added every minute in 2014…
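To put that rate in perspective: two abstracts a minute is over a million a year. A trivial back-of-the-envelope check (my arithmetic, not from the slides):

```python
# What does "more than 2 abstracts per minute" add up to over a year?
minutes_per_year = 60 * 24 * 365            # 525,600
abstracts_per_year = 2 * minutes_per_year   # at exactly 2 per minute
print(f"{abstracts_per_year:,} abstracts/year")  # -> 1,051,200
```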

On the impossibility of being expert

BMJ 2010; 341 doi: http://dx.doi.org/10.1136/bmj.c6815 (Published 14 December 2010)

More scientific and medical papers are being

published now than ever before.

Authors Alan G Fraser and Frank D Dunstan

think that new strategies are needed to deal with

this avalanche of information

new strategies are needed

How does a researcher

decide what’s ‘relevant’

anyway?

How are we filtering or

choosing?

Possible

solutions?

Every problem has its solution

Possible solutions?

Publish fewer articles? Don’t be ridiculous!

Find better ways to decide what’s truly relevant? Now you’re talking!

First

create an overview…

…only then

start digging

We need the equivalent of aerial surveys

— ‘knowledge drones’? —

Some of my professors were already known as

‘knowledge drones’ :-)

How might we create overviews?

Getting the picture from a large number of data points

‘Whole-o-gram’

Getting a better picture from even more data points

Homing in

on detail

It’s not just about finding

information

It’s also – and possibly more –

about the value & power of

‘recombinant knowledge’

Saving significant time-to-knowledge

“Chronic immune activation is the primary driver in HIV pathogenesis”

Arriving at this conclusion after reading 221 papers (review in Frontiers in Immunology): weeks

After analysis in BRAIN: 4 minutes

What stands in the way?

different…

• publishers

• journals

• platforms

• licences

• formats

• silos

• languages

First of all: fragmentation

And also, of course: (lack of) access

Not to the whole article…

…but to the data and assertions buried in it

Plenty of initiatives to find stuff:

• PubChase – Open Access Biomedical Journal

Reference Library

• Paperity

• SciLit – Database of Scientific and Scholarly

Literature

• Google Scholar

• Et cetera

Some go further:

• Europe PubMed Central – offering semantic

tools

Not many are ‘true’ open access:

All full-text articles in PubMed Central: 3,087,430 (100%)

of which with a CC licence: 366,973 (11.9%)

of which with a CC-BY licence: 270,114 (8.7%)

(Europe PMC, 19 December 2014: “The majority of articles in PMC are subject to traditional copyright restrictions”)
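The shares follow directly from the counts above; a quick check of the chart’s arithmetic (nothing assumed beyond the numbers shown):

```python
# Recompute the licence shares from the Europe PMC counts
total = 3_087_430    # all full-text articles in PMC
cc    = 366_973      # with any CC licence
cc_by = 270_114      # with a CC-BY licence

print(f"CC licence: {cc / total:.1%}")      # -> 11.9%
print(f"CC-BY licence: {cc_by / total:.1%}")  # -> 8.7%
```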

What we need is information

extracted from as many articles as

possible

The more we have, the ‘sharper’

the knowledge picture

Fragmentation and lack of access are

encumbrances to seamless knowledge-

pattern-analyses and themed collection

building (e.g. of graphs)…

…which are fast becoming an absolute

necessity due to the vast amounts of

published material, growing every year,

and, of course, in the aggregate

“As the rate of publishing accelerates,

the need for computational support to

work out which articles to read, and how

to interpret, reproduce and validate the

claims they contain is growing.”

Quote from ‘Lazarus’:

http://www.bbsrc.ac.uk/pa/grants/AwardDetails.aspx?FundingReference=BB/L005298/1

Traditional publications are aimed at

consumption by humans;

“stories that persuade with data”*

Not easily amenable to

machine-processing

* Anita de Waard, Elsevier

In the life-science literature, we typically find:

• drug-like molecules represented as illustrations;

• biochemical properties as tables or graphs;

• protein/DNA sequences buried amongst text;

• references and citations with arcane formats;

• other objects of biological interest being given

ambiguous names.

And, horrors like this (from PLOS, h/t Peter Murray-Rust):

+ (plus underscored) isn’t the same as ± (plus-minus)!
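Why that matters for machine processing: the two glyphs are entirely different code points, and a parser reads what is in the file, not what the typesetter intended. A minimal illustration:

```python
import unicodedata

# Visually similar in print, but distinct characters to a machine:
for ch in "+±":
    print(f"{ch!r}  U+{ord(ch):04X}  {unicodedata.name(ch)}")
# '+'  U+002B  PLUS SIGN
# '±'  U+00B1  PLUS-MINUS SIGN
```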

This creates the need to:

• re-type figures from tables;

• chase citations through digital libraries;

• redraw molecules by hand;

• et cetera.

tedious, error-prone, wasteful

scientists should be able to use their precious time better

Via Utopia Documents (UD), LAZARUS ‘resurrects’ knowledge from being buried in articles:

• entities (‘concepts’, incl. synonyms, e.g. proteins)

• phrases, statements, assertions (e.g. triples)

• molecules (incl. Markush structure groups)

• graphs

• tables

http://utopiadocs.com


These are captured – with their provenance, e.g.

DOI – in a ‘Knowledge Graph’ of their relationships

When assertions are captured, they are compared to

the Knowledge Graph and labelled as ‘new’ (to the

Graph) or ‘already found earlier’

“Lazarus to harness the crowd reading life-

science articles to resurrect the swathes of

legacy data buried in charts, tables, diagrams

and free-text, to liberate processable data into a

shared resource that benefits the community.”

“…activities currently carried out anyway by

individuals for their own purposes (annotating,

cross-referencing articles with databases,

organising collections of articles).”

VHL protein binds to HIF-α which is ubiquitinated and tagged for degradation in the proteasome.

These ‘assertions’ form the ‘knowledge

profile’ of an article, and are added to a

growing ‘knowledge graph’ which can

be analysed for trends, clusters, areas

of intensive activity, et cetera.
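A minimal sketch, assuming nothing about the actual Lazarus/Utopia data model, of how the VHL sentence above could be decomposed into triples, stamped with provenance, and labelled ‘new’ or ‘already found earlier’ against the growing graph (the DOI and the decomposition are illustrative only):

```python
# Toy knowledge graph: (subject, predicate, object) -> source DOIs
knowledge_graph = {}

def add_assertion(triple, doi):
    """Label a triple 'new' or 'already found earlier', then record its provenance."""
    status = "already found earlier" if triple in knowledge_graph else "new"
    knowledge_graph.setdefault(triple, []).append(doi)
    print(f"{status}: {triple}  (from {doi})")

doi = "10.xxxx/example"  # hypothetical provenance DOI
add_assertion(("VHL protein", "binds to", "HIF-α"), doi)
add_assertion(("HIF-α", "is", "ubiquitinated"), doi)
add_assertion(("HIF-α", "tagged for degradation in", "proteasome"), doi)
```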

Some other initiatives to bring

the open literature together so

that it can be used for large

scale semantic analyses:

libraccess.org

The goal of Libraccess is to

aggregate, de-duplicate, clean and

index scientific resources in open

access repositories, from

all countries, from all disciplines,

and make them available to all,

through a website and with APIs.
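How the aggregate-and-de-duplicate step might look; a toy sketch under the assumption that records are keyed on a normalised DOI where available, else on a cleaned title (Libraccess’s actual method isn’t described here):

```python
# De-duplicate records harvested from multiple repositories
def dedup_key(record):
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Fall back to the title, stripped of case, spacing and punctuation
    return ("title", "".join(c for c in record["title"].lower() if c.isalnum()))

def deduplicate(records):
    seen = {}
    for rec in records:                 # keep the first copy of each work
        seen.setdefault(dedup_key(rec), rec)
    return list(seen.values())

records = [
    {"doi": "10.1136/bmj.c6815", "title": "On the impossibility of being expert"},
    {"doi": "10.1136/BMJ.C6815", "title": "On the impossibility of being expert."},
]
print(len(deduplicate(records)))  # -> 1
```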

Research Pad

Open Access Journal Reference Library

(www.researchpad.co)

Converting all that’s open (CC-BY) into ePub format

for tablets and smartphones.

What I find most interesting, however, is their plan*

to make the whole body of all literature that’s openly

accessible available in XML for semantic analysis†

* being worked on as we speak, they confirmed to me

† I hope they will add the ‘knowledge profiles’ of paywalled articles created by Lazarus


Thank you

Jan Velterop — APE 2015 — Berlin 21 January 2015

[email protected]

