Community-Supported Data Repositories in Paleoecology and Paleoclimatology: The ‘Middle Tail’ between Geoscientific Users and Geoinformatics
Neotoma DB (www.neotomadb.org), C4P
Jack Williams, Allan Ashworth, Brian Bills, Jessica Blois, Don Charles, Simon Goring, Russ Graham, Eric Grimm, Alison Smith, & Mark Uhen
Part I: Building the Middle Tail: Community-Led Data Repositories
Part II: Interconnecting the Middle Tail: Cyberinfrastructure for the Paleogeosciences
Many Big Questions require assembly of individual paleorecords into larger networks
Do global temperatures lead or lag CO2 during deglaciations?
[Figure: Global temperatures & CO2, 22 ka to 0 ka. Shakun et al. (2012) Nature]
How far and fast can species migrate when climates change?
[Figure: Spruce distributions, last glacial maximum to present. Map panels: 21,000; 15,000; 11,000; 7,000; Modern. Legend: Spruce Pollen (%), Ice, No Data. Williams et al. (2004) Ecological Monographs]
Paleoecological Data: Key characteristics
• ‘Long Tail’: Collected in the field by small scientific teams. Scientists vary w.r.t. data management expertise, capacity, interest
• Highly valuable: specimens & samples collected decades ago are still analyzed
• Distributed scientific expertise: by proxy type, region, time period, and/or taxonomic group
[Figure: Data Size plotted against Datasets, from “Big Data” to the “Long Tail”]
Solution: Community-Led Data Repositories (COLDARs) as ‘middle tail’ for long-tail data
Key Characteristics
Open Data
Curated by Community
Added Value by serving community-specific needs (e.g. age models, taxonomy)
Paleobiology DB (paleobiodb.org)
Neotoma DB (www.neotomadb.org)
[Figure: value chain from small data to BIG DATA. Labels: findable; accessible; re-usable; interoperable; identification, persistence; authorization, protocols; context, provenance; harmonized, community governance & input]
“… data have no value or meaning in isolation; they exist within a knowledge infrastructure — an ecology of people, practices, technologies, institutions, material objects, and relationships.” - C.L. Borgman
Moving up the Value Chain: Generic Depositories vs. Community-Led Repositories
[Figure: Generic Depositories vs. Community-Led Repositories. Modified from K. Lehnert]
Neotoma Paleoecology Database: Community-Led Repository for Quaternary and Pliocene Data
Design Concepts
• Spatiotemporal Database: species occurrences & abundances in space & time
• Age Controls and Age Models stored
• Centralized IT and Distributed Scientific Governance: Neotoma is composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP)
• Open Data: accessible via Neotoma Explorer, APIs, and the neotoma R package
• Broad User Community: Paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, …
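To make the spatiotemporal design concept concrete, here is a minimal Python sketch of an occurrence record and a combined taxon, age-window, and bounding-box query. It is an illustration only: the `Occurrence` class, its field names, the `select` helper, and all example values are assumptions invented for this sketch, not Neotoma's actual schema or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Occurrence:
    """One taxon observation at a place and time (hypothetical schema)."""
    taxon: str
    latitude: float    # decimal degrees, north positive
    longitude: float   # decimal degrees, east positive
    age_yr_bp: float   # age assigned by the record's age model, years before present
    abundance: float   # e.g. percent of a pollen sum


def select(records: List[Occurrence], taxon: str,
           min_age: float, max_age: float,
           south: float, north: float,
           west: float, east: float) -> List[Occurrence]:
    """Return records of `taxon` inside an age window and a bounding box."""
    return [
        r for r in records
        if r.taxon == taxon
        and min_age <= r.age_yr_bp <= max_age
        and south <= r.latitude <= north
        and west <= r.longitude <= east
    ]


# Invented example values, for illustration only (not real data).
records = [
    Occurrence("Picea", 45.0, -90.0, 15000.0, 32.5),
    Occurrence("Picea", 60.0, -110.0, 7000.0, 18.0),
    Occurrence("Quercus", 40.0, -85.0, 7000.0, 25.0),
]

# Spruce records between 10,000 and 21,000 years ago inside a lat/lon box.
hits = select(records, "Picea", 10000.0, 21000.0, 30.0, 50.0, -100.0, -80.0)
```

Storing the age alongside each occurrence is what lets a single query slice the same records by space, by time, or by both, which is the kind of question the spruce-migration maps pose.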
Neotoma Domain
• Time: Late Neogene (~last 5 million years)
  • Most records: 10⁴-10⁵ yrs
• Space: North American to Global
• Paleoecological Data
  • Plants & pollen
  • Vertebrates
  • Ostracodes
  • Diatoms
  • Insects
  • Testate Amoebae
  • Physical Sedimentology
[Figure: Temporal Domains of Paleoecological Databases. Brewer et al. 2012 TREE]
Neotoma Uploads, Citations, and Usage
[Figures: Recent uploads to Neotoma; Pubs Citing Neotoma & Constituent DBs. Last updated: July 2015]
2014 Usage Statistics
• Neotoma Explorer: 1,918 unique users
• Neotoma APIs: 1,562 unique users
• Neotoma APIs: 241,469 requests
Neotoma Software Ecosystem
[Diagram centered on Neotoma DB; legend: Exists / In Development]
• Data Preparation & Submission: Tilia, Data Submission Web Application
• Data Search & Retrieval: Neotoma Explorer, APIs, neotoma (R)
• Data Exploration & Visualization: Ice Age Mapper, Niche Viewer, Stratigraphic Diagrams, Explorer
• Data Archival: Downloadable Database Snapshots
Neotoma Governance (Proposed)
[Organizational diagram; recoverable labels and name lists:]
• Neotoma Leadership Council
• Executive Team: Grimm, Williams + 1 more
• Developer Team: Bills (lead), Anderson, Buckland, Davis, Goring, Grimm, Roth, Williams
• Data Stewards, by constituent group: Amoebae, Diatoms, Insects, Middens, Pollen, Plant Macros, Vertebrates, Biomarkers, Isotopes, Taphonomy, Ostracodes
• Users & Informaticists
• Paleobiological Data Consortium
• Training Workshops
• Name lists shown in the diagram:
  • Graham, Blois, Davis, Barnosky, Colburn, Etnier, Jacisin, Maguire, Milideo, Smith, Warren
  • Josh Miller, Russ Graham
  • Grimm, Williams, Bills + 1 Developer & 3 Data Stewards
  • Bob Booth
  • Betancourt, Holmgren, Latorre, Rylander
  • Ashworth, Buckland, Punel
  • Alison Smith, Brandon Curry
  • Don Charles, Sonja Hausmann
  • Bob Booth
  • Suzanne Pilaar Birch, Chris Widja
  • Jon Nichols
  • Grimm, Bradshaw, Giesecke, Williams, Goring, Evans, Fletcher, Hopf, Markgraf, McGeever, Mitchell
Next Challenge: Organizing and Interconnecting the Middle Tail
C4P CINERGI Catalog: 224 Databases, 23 with geologic time metadata
C4P CINERGI: http://pivots2.azurewebsites.net/c4p.html#pv-file-selection
EarthCube RCN: Cyberinfrastructure for Paleobioscience (C4P)
Goals
• Build new partnerships and collaborations among geoscientists and technologists
• Survey and catalog existing resources
• Share news of the latest advances in cyberscience and paleogeoinformatics
• Facilitate development of common standards and semantic frameworks
Activities
• Webinars & YouTube Channel: https://www.youtube.com/user/cyber4paleo
• CINERGI Catalog of paleoresources (databases, software, etc.) http://earthcube.org/content/cinergi-c4p-resource-viewer
• Paleobiology Workshop (May 2014)
• Geochronology Workshop (Oct 2014)
• Early Career Workshops – GSA 2014, 2015
• New Initiatives: Paleobiological Data Consortium (Neotoma/PBDB/…), PBDB-iDigBio, Open Core Data (CDSCO/IEDA/Neotoma/…)
Paleobiological Data Consortium
• Share best practices & protocols
• Build compatibility between geo- & bioinformatics
[Diagram: group labels Community, Geodata, Open-Source, Biodata. Participants shown: Paleobiology DB; NOW DB; Continental Scientific Drilling Office (CDSCO); Digimorph; NOAA Paleoclimatology; DarwinCore; iDigPaleo; MorphoBank; Neotoma DB; VertNet; Early Career Members-at-Large; ROpenSci; GBIF/BISON; STEPPE; Open Geospatial Consortium; Integrated Earth Data Alliance; iDigBio]
Current & Future Neotoma, C4P, & PDC Activities
1. Data Uploads (Neotoma; e.g. MIOMAP, Mexican Quaternary Mammal DB, ongoing)
2. All Hands Neotoma Workshop at AGU (Neotoma; Dec 2015)
3. One-Stop Queries for Neotoma & Paleobio DBs (Harmonized APIs & R packages) (PDC, ongoing)
4. Hackathon for Paleobiological Data (C4P; Summer 2016, invitations TBD!)
5. New tools for data visualization & exploration (Neotoma Taxa Mapper & Niche Viewer)
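As a rough illustration of what a "one-stop query" over harmonized sources could involve (item 3 above), the Python sketch below renames fields from two differently structured result sets into one common schema and then filters the merged records. Everything here is hypothetical: the source labels, the field names in `FIELD_MAPS`, the `harmonize` and `one_stop_query` helpers, and the example rows are invented for the sketch and do not reflect the real Neotoma or Paleobiology DB schemas or APIs.

```python
from typing import Dict, Iterable, List

# Hypothetical field mappings: source field name -> common field name.
# Neither mapping reflects a real database schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"TaxonName": "taxon", "SiteLat": "lat", "SiteLon": "lon", "AgeBP": "age"},
    "source_b": {"name": "taxon", "latitude": "lat", "longitude": "lon", "age_yr": "age"},
}


def harmonize(source: str, rows: Iterable[dict]) -> List[dict]:
    """Rename each row's fields to the common schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    out = []
    for row in rows:
        record = {common: row[original] for original, common in mapping.items()}
        record["source"] = source
        out.append(record)
    return out


def one_stop_query(results_by_source: Dict[str, Iterable[dict]], taxon: str) -> List[dict]:
    """Merge harmonized rows from every source and keep a single taxon."""
    merged: List[dict] = []
    for source, rows in results_by_source.items():
        merged.extend(harmonize(source, rows))
    return [r for r in merged if r["taxon"] == taxon]


# Invented example rows, for illustration only (not real data).
rows_a = [{"TaxonName": "Picea", "SiteLat": 45.0, "SiteLon": -90.0, "AgeBP": 15000}]
rows_b = [
    {"name": "Picea", "latitude": 50.0, "longitude": -100.0, "age_yr": 12000},
    {"name": "Bison", "latitude": 42.0, "longitude": -98.0, "age_yr": 9000},
]

result = one_stop_query({"source_a": rows_a, "source_b": rows_b}, "Picea")
```

Keeping a `source` tag on every harmonized record preserves provenance, so a user can still trace each row back to the repository that curated it.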
Sounds great! What’s in it for me?
1. Interested in using Neotoma to archive your data and make it available to others?
  • Catch me after session
  • Talk to a Data Steward
  • WebEx training for new Stewards
2. Interested in using Neotoma & other paleobio resources?
  • Neotoma Explorer walkthrough exercise: http://serc.carleton.edu/neotoma/activities.html
  • neotoma (R) paper (Goring et al. 2015 Open Quaternary)
  • User workshops: ESA2016, IBS2017
  • Hackathon Summer 2016
3. Interested in integrating your resource (software/DBs) with Neotoma & other paleobio resources?
  • Catch me after session
  • Hackathon Summer 2016
This talk represents the work of many
Neotoma PIs & Developers: Eric C. Grimm, Russ Graham, Mike Anderson, Allan Ashworth, Brian Bills, Jessica Blois, Bob Booth, Ed Davis, Don Charles, Simon Goring, Steve Jackson, Alison Smith, Jack Williams
C4P RCN Steering Committee: Kerstin Lehnert, David Anderson, Doug Fils, Leslie Hsu, Chris Jenkins, Anders Noren, Tom Olszewski, Dena Smith, Mark Uhen, Jack Williams
Neotoma DB: NSF-Geoinformatics
C4P: NSF-EarthCube
Paleobiological Data Consortium: Mark Uhen, Jack Williams, Brian Bills, Jessica Blois, Ed Davis, Simon Goring, Russ Graham, Michael McClennen, Shanan Peters, Alison Smith
Paleobio Data Consortium: NSF-EarthCube