Cochrane workshop2016

Date post: 14-Feb-2017
Upload: petermurrayrust
Transcript
Page 1: Cochrane workshop2016

Workshop overview

• Y/our backgrounds and interests and what we want
• How does mining work and what can it do for YOU/Cochrane?
• Demonstration with emphasis on dictionaries.
• What would YOU like a system to do?
• Your dictionary/ies in action
• Advanced (chemistry, diagram mining)

• ANY early adopter can obtain our (Open) software and run it at home for any resource (medical, agricultural, government, climate, etc.). We will help you during the next 24 hours.

• All material CC BY.

Page 2: Cochrane workshop2016

Cochrane UK & Ireland Symposium 2016,

Birmingham, UK, 2016-03-15

Let the Machine Help with your Systematic Reviews

Peter Murray-Rust [1,2]

Christopher Kittel [2]

[1] University of Cambridge
[2] The ContentMine

Simple, Universal, Knowledge creation and re-use

Page 3: Cochrane workshop2016

The Right to Read is the Right to Mine*

*Peter Murray-Rust, 2011

http://contentmine.org

Page 4: Cochrane workshop2016

Resources

• Europe PubMedCentral http://europepmc.org/
• ContentMine toolkit https://github.com/ContentMine/
• Wikidata: https://www.wikidata.org/wiki/Wikidata:Main_Page
• Hypothes.is https://hypothes.is/ [1]

• Etherpad: http://pads.cottagelabs.com/p/cochrane2016

• Note: early adopters can obtain our (Open) software and run it at home…

[1] Not used in the Cochrane Birmingham workshop

Page 5: Cochrane workshop2016

Europe PubMedCentral

Page 6: Cochrane workshop2016
Page 7: Cochrane workshop2016

CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature

[Diagram. Labels recoverable from the figure:
• Sources: EPMC, arXiv, CORE, HAL, (UNIV repos), ToC services, Daily Crawl, Publisher Sites (PLoS ONE, BMC, PeerJ… Nature, IEEE, Elsevier…)
• Tools: getpapers (query), quickscrape (crawl), norma (Normalizer, Structurer, Semantic Tagger), ami (search, Lookup)
• Inputs: PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS, URLs, DOIs
• Semantic ScholarlyHTML (30,000 pages/day): abstract, methods, references, captioned figures (Fig. 1), HTML tables; Text, Data, Figures
• COMMUNITY: scrapers, queries, taggers, plugins (Chem, Phylo, Trials, Crystal, Plants), dictionaries, Visualization and Analysis
• Outputs: catalogue, Facts]

Page 8: Cochrane workshop2016

Dictionaries!

Page 9: Cochrane workshop2016

MINING with sections and dictionaries

[Diagram. Labels: document sections (abstract, methods, references, captioned figures such as Fig. 1, HTML tables), shown twice; Dict A; Dict B; ImageCaption; TableCaption.]

[W3C Annotation / https://hypothes.is/ ]
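
Mining with sections and dictionaries can be pictured as searching each section of a paper for every term in each dictionary, and recording every hit together with the section it came from. The following is a minimal Python sketch under stated assumptions: the paper is already split into named sections, and the function name `find_hits`, the sample sentences and the two tiny dictionaries are invented for illustration. It is not the ami implementation.

```python
import re


def find_hits(sections, dictionaries):
    """Yield (section, dictionary, term) for each whole-word, case-insensitive match."""
    for section_name, text in sections.items():
        for dict_name, terms in dictionaries.items():
            for term in terms:
                # Lookarounds stop a term matching inside a longer word.
                pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
                if re.search(pattern, text, flags=re.IGNORECASE):
                    yield (section_name, dict_name, term)


# Invented sample paper, already split into sections.
sections = {
    "abstract": "We report abacavir use in patients with cortical blindness.",
    "methods": "Patients received abatacept weekly.",
}
# Two tiny illustrative dictionaries ("Dict A" and "Dict B").
dictionaries = {
    "disease": ["cortical blindness", "corpus luteum cyst"],
    "inn": ["abacavir", "abatacept"],
}

hits = list(find_hits(sections, dictionaries))
for hit in hits:
    print(hit)
```

Keeping the section name in each hit is what lets a reviewer ask, for example, for drugs mentioned in the methods rather than anywhere in the paper.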

Page 10: Cochrane workshop2016

Disease Dictionary (ICD-10)

<dictionary title="disease">
  <entry term="1p36 deletion syndrome"/>
  <entry term="1q21.1 deletion syndrome"/>
  <entry term="1q21.1 duplication syndrome"/>
  <entry term="3-methylglutaconic aciduria"/>
  <entry term="3mc syndrome"/>
  <entry term="corpus luteum cyst"/>
  <entry term="cortical blindness"/>

SELECT DISTINCT ?thingLabel WHERE {
  ?thing wdt:P494 ?wd .
  ?thing wdt:P279 wd:Q12136 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

wdt:P494 = ICD-10 (P494) identifier
wd:Q12136 = disease (Q12136) abnormal condition that affects the body of an organism

Wikidata ontology for disease
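
A dictionary like the one on this slide can be generated from a list of labels, such as the ?thingLabel values returned by the query. The following is a minimal Python sketch, not the ContentMine tooling: the function name `build_dictionary` is hypothetical, the labels are copied from the excerpt rather than fetched, and only the `<dictionary title="…">` / `<entry term="…"/>` layout is taken from the slide.

```python
import xml.etree.ElementTree as ET


def build_dictionary(title, labels):
    """Return dictionary XML (a string) with one <entry> per unique label.

    Assumption for this sketch: labels are lower-cased, de-duplicated
    and sorted, which mirrors the look of the slide's excerpt.
    """
    root = ET.Element("dictionary", {"title": title})
    unique_terms = {label.strip().lower() for label in labels if label.strip()}
    for term in sorted(unique_terms):
        ET.SubElement(root, "entry", {"term": term})
    return ET.tostring(root, encoding="unicode")


# Labels standing in for query results (note the duplicate in different case).
labels = ["Cortical blindness", "corpus luteum cyst", "cortical blindness"]
xml_text = build_dictionary("disease", labels)
print(xml_text)
```

Building the XML through a library rather than string concatenation avoids the quoting mistakes that are easy to make when typing entries by hand.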

Page 11: Cochrane workshop2016

• ChEBI (chemicals at EBI) ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/names_3star.tsv.gz

• combined with WIKIDATA: World Health Organisation International Nonproprietary Name (P2275)

* => 4947 items in the dictionary (inn.xml)

DRUGS

<dictionary title="inn">
  <entry term="(r)-fenfluramine"/>
  <entry term="abacavir"/>
  <entry term="abafungin"/>
  <entry term="abafungina"/>
  <entry term="abafungine"/>
  <entry term="abafunginum"/>
  <entry term="abamectin"/>
  <entry term="abarelix"/>
  <entry term="abatacept"/>

Page 12: Cochrane workshop2016

<dictionary title="funders">
  <!-- from http://help.crossref.org/funder-registry with thanks -->
  <entry id="http://dx.doi.org/10.13039/100001436" term="1675 Foundation"/>
  <entry id="http://dx.doi.org/10.13039/100004343" term="3M"/>
  <entry id="http://dx.doi.org/10.13039/501100005957" term="8020 Promotion Foundation"/>
  <entry id="http://dx.doi.org/10.13039/501100007139" term="A Richer Life Foundation"/>
  <entry id="http://dx.doi.org/10.13039/100006543" term="A World Celiac Community Foundation"/>
  <entry id="http://dx.doi.org/10.13039/100001962" term="A-T Children's Project"/>
  <entry id="http://dx.doi.org/10.13039/100008456" term="A. Alfred Taubman Medical Research Institute"/>

11566 entries

Funders Dictionary

Page 13: Cochrane workshop2016

Dengue Mosquito

Page 14: Cochrane workshop2016

<dictionary name="genus">
  <entry term="Aa"/>
  <entry term="Aaaba"/>
  <entry term="Aacanthocnema"/>
  <entry term="Aaosphaeria"/>
  <entry term="Aaptos"/>
  <entry term="Aaptosyax"/>
  <entry term="Aaroniella"/>
  <entry term="Aaronsohnia"/>
  <entry term="Abablemma"/>

Genera from NCBI TaxDump

Page 15: Cochrane workshop2016

<dictionary title="hgnc">
  <entry term="A1BG" name="alpha-1-B glycoprotein"/>
  <entry term="A1BG-AS1" name="A1BG antisense RNA 1"/>
  <entry term="A1CF" name="APOBEC1 complementation factor"/>
  <entry term="A2M" name="alpha-2-macroglobulin"/>
  <entry term="A2M-AS1" name="A2M antisense RNA 1 (head to head)"/>
  <entry term="A2ML1" name="alpha-2-macroglobulin-like 1"/>
  <entry term="A2ML1-AS1" name="A2ML1 antisense RNA 1"/>

Human Genes (HGNC)

Page 16: Cochrane workshop2016

<entry term="Aaas" name="achalasia, adrenocortical insufficiency, alacrimia"/>
<entry term="Aacs" name="acetoacetyl-CoA synthetase"/>
<entry term="Aadac" name="arylacetamide deacetylase (esterase)"/>
<entry term="Aadacl2" name="arylacetamide deacetylase-like 2"/>
<entry term="Aadacl3" name="arylacetamide deacetylase-like 3"/>
<entry term="Aadat" name="aminoadipate aminotransferase"/>
<entry term="Aaed1" name="AhpC/TSA antioxidant enzyme domain containing 1"/>
<entry term="Aagab" name="alpha- and gamma-adaptin binding protein"/>
<entry term="Aak1" name="AP2 associated kinase 1"/>
<entry term="Aamdc" name="adipogenesis associated Mth938 domain containing"/>
<entry term="Aamp" name="angio-associated migratory protein"/>

Mouse genes (JAXson)

Page 17: Cochrane workshop2016

Ebola!

Page 18: Cochrane workshop2016

<dictionary title="tropicalVirus">
  <entry term="ZIKV" name="Zika virus"/>
  <entry term="Zika" name="Zika virus"/>
  <entry term="DENV" name="Dengue virus"/>
  <entry term="Dengue" name="Dengue virus"/>
  <entry term="CHIKV" name="Chikungunya virus"/>
  <entry term="Chikungunya" name="Chikungunya virus"/>
  <entry term="WNV" name="West Nile virus"/>
  <entry term="West Nile" name="West Nile virus"/>
  <entry term="YFV" name="Yellow fever virus"/>
  <entry term="Yellow fever" name="Yellow fever virus"/>
  <entry term="HPV" name="Human papilloma virus"/>
  <entry term="Human papilloma virus" name="Human papilloma virus"/>
</dictionary>

Terms co-occurring with “Zika”
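
Co-occurrence can be counted by mapping each synonym to its canonical name and then counting the pairs of names found in the same document. The sketch below is a hedged Python illustration: the dictionary is a shortened copy of the tropicalVirus one, while the two sample sentences and the function names are assumptions made for the example, not part of the ContentMine software.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

DICTIONARY_XML = """
<dictionary title="tropicalVirus">
  <entry term="ZIKV" name="Zika virus"/>
  <entry term="Zika" name="Zika virus"/>
  <entry term="DENV" name="Dengue virus"/>
  <entry term="Dengue" name="Dengue virus"/>
  <entry term="CHIKV" name="Chikungunya virus"/>
</dictionary>
"""


def load_terms(xml_text):
    """Map each term (synonym) to its canonical name."""
    root = ET.fromstring(xml_text)
    return {entry.get("term"): entry.get("name") for entry in root.iter("entry")}


def names_in(text, term_to_name):
    """Return the canonical names whose terms occur as whole words in text."""
    found = set()
    for term, name in term_to_name.items():
        if re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", text, re.IGNORECASE):
            found.add(name)
    return found


def cooccurrence(texts, term_to_name):
    """Count how many texts mention each unordered pair of names."""
    counts = Counter()
    for text in texts:
        for pair in combinations(sorted(names_in(text, term_to_name)), 2):
            counts[pair] += 1
    return counts


texts = [  # invented sample sentences
    "ZIKV and DENV share a mosquito vector.",
    "Dengue, Zika and CHIKV were all screened.",
]
counts = cooccurrence(texts, load_terms(DICTIONARY_XML))
print(counts.most_common())
```

Because "ZIKV" and "Zika" both resolve to "Zika virus", the two sentences count towards the same pair even though they use different spellings.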

Page 19: Cochrane workshop2016

<dictionary title="cochrane">
  <entry term="Cochrane Library"/>
  <entry term="Cochrane Reviews"/>
  <entry term="Cochrane Central Register of Controlled Trials"/>
  <entry term="Cochrane"/>
  <entry term="randomize"/>
  <entry term="meta-analysis"/>
  <entry term="Embase"/>
  <entry term="MEDLINE"/>
  <entry term="eligibility"/>
  <entry term="exclusion"/>
  <entry term="outcome"/>
  <entry term="Review Manager"/>
  <entry term="STATA"/>
  <entry term="RCT"/>
</dictionary>

Terms lexically related to “meta-analysis”

Page 20: Cochrane workshop2016

Mining strategy

• Discover, negotiate permissions => bibliography
• Crawl / Scrape (download) documents AND supplemental
• Normalize: PDF => XML
• Index: facets => Facts and snippets (“entities”)
• Interpret/analyze entities => relationships, aggregations (“Transformative”)
• Publish

Page 21: Cochrane workshop2016

[Diagram repeated from Page 7: CONTENTMINE, Complete OPEN Platform for Mining Scientific Literature (getpapers, quickscrape, norma, ami)]

Page 22: Cochrane workshop2016

Demo

PMR runs getpapers and ami

Chris runs Python visualization of drug co-occurrence

Page 23: Cochrane workshop2016

Systematic Reviews

Can we:

• eliminate true negatives automatically?
• extract data from formulaic language?
• mine diagrams?
• annotate existing sources?
• forward-reference clinical trials?

Page 24: Cochrane workshop2016

Polly has 20 seconds to read this paper…

…and 10,000 more

Page 25: Cochrane workshop2016

ContentMine software can do this in a few minutes

Polly: “there were 10,000 abstracts and due to time pressures, we split this between 6 researchers. It took about 2-3 days of work (working only on this) to get through ~1,600 papers each. So, at a minimum this equates to 12 days of full-time work (and would normally be done over several weeks under normal time pressures).”
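
The scale of the screening task can be checked with simple arithmetic. This Python sketch uses only figures from the slides (20 seconds per paper, 10,000 abstracts, 6 researchers, 2-3 days each); the 8-hour working day is an assumption added for the conversion.

```python
ABSTRACTS = 10_000
SECONDS_PER_ABSTRACT = 20      # "Polly has 20 seconds to read this paper"
RESEARCHERS = 6
DAYS_PER_RESEARCHER = (2, 3)   # "about 2-3 days of work" each
HOURS_PER_WORKING_DAY = 8      # assumption, not from the slides

# Time needed if every abstract gets exactly 20 seconds.
reading_hours = ABSTRACTS * SECONDS_PER_ABSTRACT / 3600
reading_days = reading_hours / HOURS_PER_WORKING_DAY

# Share per researcher and total effort in person-days.
per_researcher = ABSTRACTS / RESEARCHERS
person_days = tuple(RESEARCHERS * days for days in DAYS_PER_RESEARCHER)

print(f"Pure reading time: {reading_hours:.1f} h (~{reading_days:.1f} working days)")
print(f"Abstracts per researcher: {per_researcher:.0f}")
print(f"Total effort: {person_days[0]}-{person_days[1]} person-days")
```

The lower bound of the total matches the 12 days Polly quotes, and the even split comes out close to her figure of roughly 1,600 papers each.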

Page 26: Cochrane workshop2016

400,000 Clinical Trials in 10 government registries

Mapping trials => papers

http://www.trialsjournal.com/content/16/1/80

2009 => 2015. What’s happened in the last 6 years?

Search the whole scientific literature for “2009-0100068-41”
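
Searching the literature for a trial identifier amounts to scanning text for a fixed string, or for strings of the same shape. Here is a minimal Python sketch, assuming identifiers shaped like the one on the slide (four digits, a hyphen, a run of digits, a hyphen, two digits). The pattern, the function name and the sample sentences are illustrative; real registries each have their own formats.

```python
import re

# Assumed shape, modelled only on the slide's example "2009-0100068-41".
TRIAL_ID = re.compile(r"(?<!\d)\d{4}-\d{6,7}-\d{2}(?!\d)")


def find_trial_ids(documents):
    """Map each identifier found to the list of document keys that mention it."""
    mentions = {}
    for key, text in documents.items():
        # set() so a paper repeating the identifier is listed only once.
        for trial_id in set(TRIAL_ID.findall(text)):
            mentions.setdefault(trial_id, []).append(key)
    return mentions


documents = {  # invented sample texts
    "paper-A": "The trial was registered as 2009-0100068-41 before enrolment.",
    "paper-B": "No registration number is reported.",
    "paper-C": "Follow-up of trial 2009-0100068-41 at five years.",
}
mentions = find_trial_ids(documents)
print(mentions)
```

Inverting the result (identifier to papers) is what supports mapping trials => papers: a registered trial with no mentions is a candidate for follow-up.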

Page 27: Cochrane workshop2016
Page 28: Cochrane workshop2016

Diagram Mining

Page 29: Cochrane workshop2016

[Plot: Ln bacterial load per fly (y-axis) against days post-infection (x-axis, 0 to 5)]

Bitmap Image and Tesseract OCR
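
Once OCR has read the tick labels of a plot, data values can be recovered from pixel positions by linear interpolation between two labelled ticks. This Python sketch covers that single step only, under stated assumptions: the pixel coordinates are invented, the function name `make_axis_map` is hypothetical, and the OCR call itself (for example via Tesseract) is not shown.

```python
def make_axis_map(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value on a linear axis."""
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must sit at different pixels")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Invented calibration: two tick labels per axis with their pixel positions.
# x-axis: days post-infection; y-axis: ln bacterial load per fly.
x_of = make_axis_map(pixel_a=100, value_a=0.0, pixel_b=600, value_b=5.0)
# Image y-pixels grow downward, so the larger value sits at the smaller pixel.
y_of = make_axis_map(pixel_a=400, value_a=9.0, pixel_b=150, value_b=11.5)

# A plotted point detected at pixel (300, 250), converted to data coordinates.
point = (x_of(300), y_of(250))
print(point)
```

A misread tick label would shift every recovered value, so checking that successive ticks are evenly spaced is a sensible guard before trusting the calibration.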

Page 30: Cochrane workshop2016
Page 31: Cochrane workshop2016
Page 32: Cochrane workshop2016
