
Arcas: Using Python to access open research literature · The illustrated guide to a Ph.D. Matt...

Date post: 24-Sep-2020
Arcas: Using Python to access open research literature @NikoletaGlyn
Transcript
Page 1: Arcas: Using Python to access open research literature · The illustrated guide to a Ph.D. Matt Might

Arcas: Using Python to access open research literature

@NikoletaGlyn

Page 2:
Page 3:

The illustrated guide to a Ph.D.

Matt Might

http://matt.might.net/articles/phd-school-in-pictures/

Page 4:
Page 5:

ARTICLE

JOURNAL REVIEW

PUBLISHED

Page 6:

Sustainable Software

Page 7:
Page 8:
Page 9:

0.5 min + 100 × 1.5 min + 10 × 0.5 min = 155.5 min ⇒ 2 h and 35.5 min

Page 10:

API

Page 12:
Page 13:

15 min + 1 min + 50 min = 66 min ⇒ 1 h and 6 min

Page 17:

[Diagram: six APIs (API1 to API6), each with its own Query and its own XML]

Page 18:

[Diagram: ARCAS alongside the six APIs (API1 to API6), each with its own Query and its own XML]

Page 19:

$ pip install arcas

Page 20:

>>> import arcas
>>> api = arcas.Arxiv()
>>> # Build the query URL from the search parameters
>>> parameters = api.parameters_fix(
...     title='sustainable software', records=1, start=1)
>>> url = api.create_url_search(parameters)
>>> # Send the request and parse the response
>>> request = api.make_request(url)
>>> root = api.get_root(request)
>>> raw_article = api.parse(root)
>>> # Convert the first result to a data frame and write it to JSON
>>> article = api.to_dataframe(raw_article[0])
>>> api.export(article, "result.json")

Page 21:

{"key":{"0":"Ahern2013"},
"unique_key":{"0":"698d27415f69258ef122f46b184a77e0"},
"title":{"0":"VisIt: Experiences with Sustainable Software"},
"author":{"0":"Sean Ahern","1":"Eric Brugger"},
"abstract":{"0":" The success of the VisIt visualization..."},
"date":{"0":2013},
"journal":{"0":"arXiv"},
"provenance":{"0":"arXiv"}}
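The exported record above is laid out column by column: each field maps a row label ("0", "1", ...) to a value. As a minimal sketch of reading such a file back, the snippet below parses the JSON shown on the slide with the standard library only. The record is pasted in as a string (with the abstract left truncated as on the slide) so that the example does not depend on a `result.json` file being present; the variable names are illustrative and not part of Arcas.

```python
import json

# The record exactly as shown on the slide (abstract left truncated).
exported = '''
{"key":{"0":"Ahern2013"},
 "unique_key":{"0":"698d27415f69258ef122f46b184a77e0"},
 "title":{"0":"VisIt: Experiences with Sustainable Software"},
 "author":{"0":"Sean Ahern","1":"Eric Brugger"},
 "abstract":{"0":" The success of the VisIt visualization..."},
 "date":{"0":2013},
 "journal":{"0":"arXiv"},
 "provenance":{"0":"arXiv"}}
'''

record = json.loads(exported)

# Each field is a mapping from row label to value, so index by "0" for
# single-valued fields and walk the labels in numeric order for authors.
title = record["title"]["0"]
year = record["date"]["0"]
authors = [record["author"][k] for k in sorted(record["author"], key=int)]

print(title, year, authors)
```

To read a file written by `api.export`, `json.load(open("result.json"))` could replace the `json.loads` call, assuming the file has the layout shown on the slide.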

Page 22:

>>> for p in [arcas.Arxiv, arcas.Nature, arcas.Ieee, arcas.Plos]:
...     api = p()
...     parameters = api.parameters_fix(
...         title='sustainable software', records=1, start=1)
...     url = api.create_url_search(parameters)
...     request = api.make_request(url)
...     root = api.get_root(request)
...     raw_article = api.parse(root)
...     try:
...         for art in raw_article:
...             article = api.to_dataframe(art)
...             # One output file per provider, named after its class
...             api.export(article, "result_from_{}.json".format(
...                 api.__class__.__name__))
...     except TypeError:
...         # Skip this provider if a TypeError is raised here
...         pass

Page 23:

15 min + 5 min = 20 min
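The three time estimates on pages 9, 13 and 23 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the slides do not say what each term stands for, so the terms are copied as written and the variables are named after the page they appear on. The helper `as_hours_minutes` is introduced here for illustration.

```python
def as_hours_minutes(total_minutes):
    """Split a duration in minutes into whole hours and remaining minutes."""
    hours, minutes = divmod(total_minutes, 60)
    return int(hours), minutes


# Terms exactly as written on the slides (all values in minutes).
page_9 = 0.5 + 100 * 1.5 + 10 * 0.5
page_13 = 15 + 1 + 50
page_23 = 15 + 5

for label, total in [("page 9", page_9), ("page 13", page_13), ("page 23", page_23)]:
    hours, minutes = as_hours_minutes(total)
    print("{}: {} min = {} h {} min".format(label, total, hours, minutes))
```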

Page 24:

[Figure: Articles per Year (N = 87); x-axis: year (2000 to 2018), y-axis: number of records]

Page 25:

[Figure: number of records per year (2000 to 2016), grouped by provenance: IEEE, arXiv, PLOS]

Page 26:
Page 27:

Birgit Penzenstadler

Page 28:

Arcas

tools.py

doc/

arcas.readthedocs.io/

ieee, nature, arxiv, . . .

test ieee, test nature, test arxiv, . . .

Page 29:

$ arcas_scrape --version
Arcas 0.0.3
$ arcas_scrape -p arxiv -t "Sustainable Software" -r 1
http://export.arxiv.org/api/query?search_query=ti:Sustainable Software&max_results=1&start=1
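The URL printed by `arcas_scrape` shows how a title search maps onto the arXiv API: a `ti:` prefix inside `search_query`, plus `max_results` and `start`. A minimal sketch of building such a URL with the standard library follows. It is not Arcas's own implementation; the function `arxiv_search_url` is a hypothetical helper written for this example, and only the base URL and parameter names are taken from the output above.

```python
from urllib.parse import urlencode

ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"


def arxiv_search_url(title, records=1, start=1):
    """Build an arXiv API query URL for a title search (hypothetical helper)."""
    query = {
        "search_query": "ti:{}".format(title),
        "max_results": records,
        "start": start,
    }
    # safe=":" keeps the colon of the "ti:" prefix readable; spaces are
    # written as "+" by urlencode, whereas the slide shows a literal space.
    return ARXIV_ENDPOINT + "?" + urlencode(query, safe=":")


url = arxiv_search_url("Sustainable Software", records=1, start=1)
print(url)
```

Encoding the query string this way avoids sending a raw space in the URL, which some HTTP clients reject.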

Page 30:

@NikoletaGlyn

https://github.com/ArcasProject/Arcas

https://nikoleta-v3.github.io

