Accessing Historical Data · Why gather digitized historical data en masse? • It can let you grab...

Date post: 26-Jul-2020
Accessing Historical Data en masse Ian Milligan Assistant Professor
Transcript
Page 1: Accessing Historical Data · Why gather digitized historical data en masse? • It can let you grab data from across the globe for minimal extra effort; • When digitized, it can

Accessing Historical Data en masse

Ian Milligan, Assistant Professor

Hello!

• Who am I?

• Ian Milligan (Assistant Professor, University of Waterloo)

• Canadian, digital, youth, and web archives.

[email protected]

• @ianmilligan1

• All slides will be available at http://ianmilligan.ca/getting-data/, along with links to tutorials and data

Why gather digitized historical data en masse?

• It can let you grab data from across the globe for minimal extra effort;

• When digitized, it can save time + effort (no more right clicking);

• It can let you explore extremely large datasets to find patterns, inferences, etc. in bodies of text that you couldn’t otherwise read!

Pitfalls?

• Digitization has proceeded unevenly: it requires institutional money and support, so it replicates the holdings of elite + western institutions;

• We may not know how it works: how Optical Character Recognition (OCR) produced the plain text, what biases shaped the collection, etc.

Pitfalls?

[Figure: average number of appearances, divided by year (y-axis, 0 to 6), plotted against years 1997 to 2010 (x-axis), with one series each for the Globe, Star, Telegram, Gazette, and Citizen. Annotations: "Gap between appearance and usage in ProQuest dissertations"; "Impact of Pages of the Past and Canada's Heritage Online"; "Pre–Pages of the Past and Canada's Heritage Online".]

Handle with Care

But still, these sources present considerable power when used by the right historians

(you!)

Different Methods

• The Dream Case

• Application Programming Interfaces (APIs)

• Scraping Data yourself (Outwit Hub)

• Computational Methods (Python, Bash, Programming Historian)

• HistoryCrawler Virtual Machine

The Dream Case

• A Dream:

• For you to find on your own websites;

• And for you to create for others if you make databases…

The Dream Case

• Examples

• http://edh-www.adw.uni-heidelberg.de/home

• http://www.cwgc.org/find-war-dead.aspx

• Lexis|Nexis

• Sometimes limited (e.g. CWGC to 50,000 records, Lexis|Nexis to a few hundred), which means you need to run multiple searches

http://adamcrymble.blogspot.ca/2014/01/does-your-online-collection-need-api.html

Or maybe you just want a few documents?

Worth bookmarking

• Google Books Advanced Search: http://books.google.com/advanced_book_search

• Internet Archive Advanced Search: http://archive.org/advancedsearch.php

• Hathi Trust Advanced Search: http://babel.hathitrust.org/cgi/ls?a=page;page=advanced

• (Let’s Visit Each)

Google Books

• In ‘advanced search,’ select ‘Full view only’

• Do a search; pre-1923 content will be most fruitful

Internet Archive

Hathi Trust

• The world’s backup drive for libraries - 4.5+ billion pages!

Also..

• Sometimes a colleague might have compiled this data for you…

• Shawn Graham (Carleton) has compiled a great list: https://github.com/hist3907b-winter2015/module2-findingdata

But if the dream case doesn’t work out, it’s OK.

Different Methods

• The Dream Case

• Application Programming Interfaces (APIs)

• Scraping Data yourself (Outwit Hub)

• Computational Methods (Python, Bash, Programming Historian)

• HistoryCrawler Virtual Machine

(this one is a bit difficult, but it helps us get some foundational concepts)

Application Programming Interfaces

• API - programs talking to each other

• In our context, it’s a way to send an HTTP request and get some responses

• (this is relatively complex, but will make more sense as we proceed through the workshop)

APIs

• JSON format: machine-readable, rather than a human-readable format like HTML.

• So if I own 3 iPhones and an iPad (I don’t), I’d structure it like this

{ "iphones" : "3", "ipads" : "1" }
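
To see why this format is handy, here is a minimal sketch that reads the slide's example with Python's standard `json` module. The variable names are only illustrative.

```python
import json

# The slide's example as a JSON document (note that the counts are strings).
raw = '{ "iphones" : "3", "ipads" : "1" }'

devices = json.loads(raw)  # parse the JSON text into a Python dictionary

# The values arrive as strings here, so convert them before doing arithmetic.
total = int(devices["iphones"]) + int(devices["ipads"])
print(devices["iphones"], devices["ipads"], total)
```

Once the text is parsed, each value can be looked up by its key, which is what makes an API response easy to work with in bulk.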

APIs (added &fmt=json)

Good intro - start studying URLs

http://search.canadiana.ca/search?df=1800&dt=1900&q=psycholog*&fmt=json

http://search.canadiana.ca/support/api [instructions]
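
One way to "study" a URL like this is to take it apart. The sketch below uses only Python's standard library to split the example search URL into its host, path, and query parameters. Reading `df` and `dt` as "date from" and "date to" is an assumption based on the values, not something the slide states.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://search.canadiana.ca/search?df=1800&dt=1900&q=psycholog*&fmt=json"

parts = urlsplit(url)
params = parse_qs(parts.query)  # each value is a list, because a key may repeat

print(parts.netloc)  # the host we are talking to
print(parts.path)    # the endpoint on that host
for name, values in params.items():
    print(name, "=", values[0])
```

Nothing here touches the network; it only shows that the query is a set of name/value pairs that you can change one at a time.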

URLs

• 939 pages of results (!)

• Each document in this case has a unique record key

• But we do figure out the URL formula

• http://search.canadiana.ca/search/X?df=1800&dt=1900&q=psycholog*&fmt=json

• And solve for X, where X is a value between 1 and 939

URLs

• http://search.canadiana.ca/search/1?df=1800&dt=1900&q=psycholog*&fmt=json

• http://search.canadiana.ca/search/2?df=1800&dt=1900&q=psycholog*&fmt=json

• http://search.canadiana.ca/search/3?df=1800&dt=1900&q=psycholog*&fmt=json

• http://search.canadiana.ca/search/4?df=1800&dt=1900&q=psycholog*&fmt=json
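
Rather than typing hundreds of these by hand, the formula can be turned into a loop. This is a minimal sketch, assuming the page number simply replaces X as the slides suggest; the names `BASE`, `QUERY`, and `page_url` are illustrative.

```python
from urllib.parse import urlencode

BASE = "http://search.canadiana.ca/search"
QUERY = {"df": "1800", "dt": "1900", "q": "psycholog*", "fmt": "json"}
LAST_PAGE = 939  # the page count reported for this query on the slide


def page_url(page):
    """Return the results URL for one page, following the slide's formula."""
    # safe="*" keeps the wildcard readable instead of percent-encoding it.
    return f"{BASE}/{page}?{urlencode(QUERY, safe='*')}"


urls = [page_url(n) for n in range(1, LAST_PAGE + 1)]
print(urls[0])
print(len(urls))
```

The list `urls` could then be handed to a downloader one entry at a time, ideally with a pause between requests.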

URLs

"contributor" : [ "oocihm", 3837,

EACH item on these pages has a unique number that it, and only it, has. If we can get a list of those oocihm numbers, we could get EVERY full text item in a database.

URLs

• How do we get those key values? (stay tuned)

• But once we have them, we’d see that we have a list of files like:

• http://eco.canadiana.ca/view/X/?r=0&s=1&fmt=json&api_text=1 (where X is the item's oocihm key)

• So a URL like http://eco.canadiana.ca/view/oocihm.16278/?r=0&s=1&fmt=json&api_text=1 would get the full text of an item.

• You’d have to automate this to get all full text sources having to do with psychology. But how?
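
As a rough idea of what that automation might look like, the sketch below fills the URL pattern from the slide with each key in a list. The helper names are hypothetical, `fetch_record` is defined but not called here, and the service may well have changed since this talk, so treat it as an illustration of the pattern rather than a working client.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the slide; {key} stands in for X.
VIEW_TEMPLATE = "http://eco.canadiana.ca/view/{key}/?r=0&s=1&fmt=json&api_text=1"


def full_text_url(key):
    """Build the full-text URL for one record key."""
    return VIEW_TEMPLATE.format(key=key)


def fetch_record(key, timeout=30):
    """Download and parse one record (makes a network request)."""
    with urlopen(full_text_url(key), timeout=timeout) as response:
        return json.load(response)


keys = ["oocihm.16278"]  # in practice, the full list of keys gathered from the search pages
for key in keys:
    print(full_text_url(key))
```

With a real list of keys, you would call `fetch_record` inside the loop and save each result to disk.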

Downloading all the files

• We can turn to some other resources - which are a useful demonstration of how DH involves code sharing

• http://ianmilligan.ca/2014/01/07/historians-love-json-or-one-quick-example-of-why-it-rocks/

• https://canzac.wordpress.com/2014/09/02/canadiana-in-context/

• I’ll explain code and share

Different Methods

• The Dream Case

• Application Programming Interfaces (APIs)

• Scraping Data yourself (Outwit Hub)

• Computational Methods (Python, Bash, Programming Historian)

• HistoryCrawler Virtual Machine

Outwit Hub

• A free software suite that finds ‘structure’ in web pages and grabs the information that you’re looking for.

• Free in limited version.

• https://www.outwit.com/products/hub/

Outwit Hub

• A starting database to try this out on: Suda On Line, a 10th-century Byzantine Greek historical encyclopedia

• http://www.stoa.org/sol/

Outwit Hub

Adler Number
Begins after: Adler number: </strong>
Ends before: <br/>

Translation
Begins after: <div class="translation">
Ends before: </div>

Outwit Hub

• Step One: install Outwit Hub

• Step Two: paste URL into the bar at the top of the page

• Step Three: click ‘scrapers,’ then ‘new,’ give it a name.

• Step Four: Say no thanks to buying it (at least now).

Outwit Hub

Adler Number
Begins after: Adler number: </strong>
Ends before: <br/>

Translation
Begins after: <div class="translation">
Ends before: </div>
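
The "begins after / ends before" idea is simple enough to sketch in a few lines of Python. This is not how Outwit Hub is implemented, just the same logic written out; the function name and the sample markup are invented for illustration.

```python
def extract_between(text, begins_after, ends_before):
    """Return every stretch of text found between the two markers."""
    found = []
    position = 0
    while True:
        start = text.find(begins_after, position)
        if start == -1:
            break  # no further opening marker
        start += len(begins_after)
        end = text.find(ends_before, start)
        if end == -1:
            break  # opening marker with no closing marker
        found.append(text[start:end].strip())
        position = end + len(ends_before)
    return found


# Invented sample markup, shaped like the markers on the slide.
sample = (
    '<strong>Adler number: </strong>alpha,1<br/>'
    '<div class="translation">An example translation.</div>'
)

adler = extract_between(sample, "Adler number: </strong>", "<br/>")
translation = extract_between(sample, '<div class="translation">', "</div>")
print(adler, translation)
```

Each scraper row in Outwit Hub corresponds to one call like these: a column name plus a pair of markers.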

Outwit Hub

• Press ‘Catch’ if you want to keep going with other websites

• CATCH moves it into your memory

• Or you can press ‘Export’ when you’re done to generate a spreadsheet

• (Do a second search for ‘rome’ and see it auto catch)

It’s a good introduction, but sometimes you need better tools…

Different Methods

• The Dream Case

• Application Programming Interfaces (APIs)

• Scraping Data yourself (Outwit Hub)

• Computational Methods (Python, Bash, Programming Historian)

• HistoryCrawler Virtual Machine

The Dreaded Command Line

• Most of these programs are based in a UNIX environment

• Ian Milligan and James Baker (British Library), “Introduction to the Bash Command Line.” http://programminghistorian.org/lessons/intro-to-bash

The Dreaded Command Line

• Not so bad once you get into it!

• Allows you to run some pretty fine-tuned commands, and begin to rapidly move around your computer.

• Does have a learning curve, but it is worth it.

Basic Programming

• ProgrammingHistorian.org

• Basic programming techniques with an applied perspective

• Not general examples, but specific ones.

Wget

• A powerful tool for retrieving online material

• Command line only (!)

• Easy way to install on OS X:

• Install homebrew (one line to install at brew.sh)

• and then ‘brew install wget’

To install on all platforms: http://programminghistorian.org/lessons/automated-downloading-with-wget

The Internet Archive

• 15 PB of awesome historical, cultural sources

• But occasionally cumbersome to access en masse

Wget and the Internet Archive

• http://blog.archive.org/2012/04/26/downloading-in-bulk-using-wget/

• Let’s grab all the files relating to a given collection

Finding a Collection

• The Boston Public Library Anti-Slavery Collection

• https://archive.org/details/bplscas

• (but there are many other ones)

Finding a Collection

• Everything in the Internet Archive has a unique URL, like this: http://archive.org/details/[IDENTIFIER]

• So an item might be: http://archive.org/details/lettertowilliaml00doug

• And the collection is: http://archive.org/details/bplscas/

Finding a Collection

• Create a directory to store all our files

• Visit the advanced search page (http://archive.org/advancedsearch.php)

• Click on ‘collection’ - big list loads. Click on ‘bplscas’ and then search

Finding a Collection

• 8,265 results. That’d be a lot of ‘right clicking’ to download.

• We confirm that this is indeed what we want.

• So we go back.

Finding a Collection

• So we do this

• Scroll down and do a search for “collection: bplscas”. Sort by “date asc” (ascending dates) and select CSV FORMAT.

• Number of results: 7971

• Click ‘search’ and download the file

Finding a Collection

• Looks like this. It’s one line per file.

• The first one is dialoguscreatura00nico

• Put that into the search bar, press enter.. and voila..

Finding a Collection

• We can now download every single entry in that list - in this case, everything within the Boston Public Library Anti-Slavery Collection.

• We can decide if we want every single format (probably not), or perhaps just the TXT files, or the PDFs, etc.

Finding a Collection

• Step One: Open the CSV file and delete the first line that reads ‘identifier’

• Step Two: Save it as a text file - itemlist.txt

• Step Three: use WGET. Copy commands from the Internet Archive.. :)
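
Steps One and Two can also be done with a short script instead of by hand. This is a sketch under the assumption that the downloaded CSV has a single column whose header row reads identifier; the function name and the file name `search.csv` are made up for the example.

```python
import csv


def csv_to_itemlist(csv_path, out_path):
    """Write one identifier per line, skipping the 'identifier' header row."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            # Skip blank rows and the header row.
            if not row or row[0] == "identifier":
                continue
            dst.write(row[0] + "\n")
            count += 1
    return count
```

Calling `csv_to_itemlist("search.csv", "itemlist.txt")` would produce the itemlist.txt that the wget commands expect, and return the number of identifiers written.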

Example Commands

• All files:

• wget -r -H -nc -np -nH --cut-dirs=2 -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/'

• Certain file formats

• wget -r -H -nc -np -nH --cut-dirs=2 -A .pdf,.epub -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/'

Our command

• We just want the TXT files

• wget -r -H -nc -np -nH --cut-dirs=2 -A .txt -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/'
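
If the string of flags looks opaque, it may help to see the same command assembled piece by piece. The sketch below builds the argument list in Python with a comment on each flag; `build_wget_command` is a hypothetical helper, and the line that would actually start the download is left commented out.

```python
import subprocess


def build_wget_command(accept=None, itemlist="./itemlist.txt"):
    """Assemble the wget call from the slides as a list of arguments."""
    cmd = [
        "wget",
        "-r",            # recursive retrieval
        "-H",            # allow the download to span to other hosts
        "-nc",           # no-clobber: skip files that already exist locally
        "-np",           # never ascend to the parent directory
        "-nH",           # do not create a directory named after the host
        "--cut-dirs=2",  # drop the first two remote path components when saving
    ]
    if accept:
        cmd += ["-A", ",".join(accept)]  # keep only files with these suffixes
    cmd += [
        "-e", "robots=off",  # ignore robots.txt
        "-l1",               # recursion depth of one
        "-i", itemlist,      # read the identifiers from this file
        "-B", "http://archive.org/download/",  # base URL each identifier is resolved against
    ]
    return cmd


txt_only = build_wget_command(accept=[".txt"])
print(" ".join(txt_only))

# To actually start the download (it can run for a long time):
# subprocess.run(txt_only, check=True)
```

Swapping `accept=[".txt"]` for `accept=[".pdf", ".epub"]`, or leaving it out, reproduces the other two commands above.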

Exploring

• Now we have LOTS of text files. Or PDFs. Or EPUBs. Or whatever we want for whatever purposes.

Programmatically Interacting

• Caleb McDaniel’s “Data Mining the Internet Archive Collection” at http://programminghistorian.org/lessons/data-mining-the-internet-archive

• Uses the Python programming language to download metadata (information about information)

Programmatically Interacting

• It goes through and grabs the MARC data (library records) for everything in the Anti-Slavery Collection

• It is decently documented, but we don’t have time to walk through it today. However, we can steal his code.

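Stealing Code

#!/usr/bin/python

import internetarchive
import time

# Failures are appended here so that one bad item does not stop the run.
error_log = open('bpl-marcs-errors.log', 'a')

# Every item in the Boston Public Library Anti-Slavery Collection.
search = internetarchive.search_items('collection:bplscas')

for result in search:
    itemid = result['identifier']
    item = internetarchive.get_item(itemid)
    marc = item.get_file(itemid + '_marc.xml')
    try:
        # The slide cuts off after "try:". The rest of this loop is an assumed
        # completion: download the file, log any failure, pause between requests.
        marc.download()
    except Exception as e:
        error_log.write('Could not download ' + itemid + ': %s\n' % e)
    else:
        time.sleep(1)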

Programmatically Interacting

• We save this file into a new directory (slavery-marc) and then run it.

• BORROWING CODE IS OK.

• On command line we could type:

• python ia-download.py

Programmatically Interacting

• The results!

• Using his pymarc script to generate location data.

Programmatically Interacting

• Other tools

• Adam Crymble, “Downloading Multiple Records Using Query Strings.” [http://programminghistorian.org/lessons/downloading-multiple-records-using-query-strings]

Two Main Programs

• obo.py

• (which contains definitions for several functions that you call)

• download-searches.py

• Where you can swap in your own query, then fetch and download the files, all without visiting the site

Different Methods

• The Dream Case

• Application Programming Interfaces (APIs)

• Scraping Data yourself (Outwit Hub)

• Computational Methods (Python, Bash, Programming Historian)

• HistoryCrawler Virtual Machine

HistoryCrawler

• Download link: http://ianmilligan.ca/historycrawler [link to repository at York University, Toronto]

• Instructions: http://williamjturkel.net/2014/09/09/creating-the-historycrawler-virtual-machine/

• Solving problems of dependencies and reproducibility by working in a virtual environment for scholars

HistoryCrawler

• Step One: Download HistoryCrawler201407-32b.ova from previous links

• Step Two: Install Oracle VM Virtual Box (https://www.virtualbox.org/)

• Step Three: File —> Import Appliance —> Select the ova file to generate your machine

• Step Four: Press ‘start.’ You may have to wait ~ 1-2 minutes.

• Step Five: password is ‘go’

Tutorials

• Mary Beth Start (PhD Candidate, Western University, Ontario): http://marybethstart.wordpress.com/2014/09/09/getting-started-virtualbox-and-historycrawler/

• William Turkel (Associate Professor, Western University, Ontario): http://williamjturkel.net/how-to/#virtualmachine

HistoryCrawler: A platform for teaching?

• Does require a decent computer to run it on

• But

• eliminates problems of dependencies;

• installation issues;

• gets everybody on same platform;

• allows for sharing and reproducibility of research inputs/outputs;

• Still in progress - would love any feedback.

Conclusions, Questions & Your Own Data?

Ian Milligan Assistant Professor

