DCXL: Digital Curation for Excel
Carly Strasser UC3, California Digital Library [email protected]
Funders: Gordon & Betty Moore Foundation, Microsoft Research
22 Sept 2011 UC3 Webinar Series California Digital Library
Build on existing cyberinfrastructure
Create new cyberinfrastructure
Support communities
Community Engagement
Roadmap
1. An overview: why is DCXL needed?
2. Goals of DCXL project
3. Progress & future plans
4. How to get involved in DCXL
Digital data +
Complex workflows
Data
Maximum Likelihood estimation
Matrix Models
Images Tables Paper
Models
UGLY TRUTH
Most Earth | Environmental | Ecological scientists…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
5shortessays.blogspot.com
From Stephanie Hampton (2010) ESA Workshop on Best Practices
2 tables Random notes
From Stephanie Hampton (2010) ESA Workshop on Best Practices
Wash Cres Lake Dec 15 Dont_Use.xls
Collaboration and Data Sharing
What is this?
The path of research products
Data
Metadata
Recreated from Klump et al. 2006
noaa.gov www.collectionconnection.alcts.ala.org
www.flickr.com/photos/csessums
blog.seattlepi.com blog.disorder2order.com
Data Sharing
Data Management
Data Reuse
Data
Metadata
Recreated from Klump et al. 2006
The path of research products
noaa.gov www.collectionconnection.alcts.ala.org
digital-servers.com
Barriers
Cost
• Software, hardware
• Personnel
• Time
cultblender.wordpress.com
ttatteredntornprims.blogspot.com/
Barriers
Cost: time, personnel, software, hardware
Culture of Science
• Not the norm
• Lack of training
• Disparate data
free-photos.biz
Barriers
Cost: time, personnel, software, hardware
Culture of Science
Loss of rights or benefits
• Conflict
• Missed opportunities
• Misuse of data
colouringbook.org
wattsupwiththat.com
Barriers
Cost: time, personnel, software, hardware
Culture of Science
Loss of rights or benefits
Lack of incentives
• Reward structure
• Few requirements
• Time consuming & expensive
georgevanantwerp.com
Roadmap
1. An overview: why is DCXL needed?
2. DCXL project overview
3. Progress & future plans
4. How to get involved in DCXL
DCXL Project Goals
“A transformation in the conduct of a segment of scientific research by enabling and promoting publishing, sharing, and archiving of tabular data”
• Increase:
– interoperability = Sharing
– publishability = Publishing
– archivability = Archiving
• Focus on atmospheric, ecological, hydrological, and oceanographic data
DCXL Project Goals
Open Source & Free Excel Add-in
• Add-in: software program that extends the capabilities of larger programs (from www.webopedia.com)
• Complements basic Excel functionality
www.ablebits.com
DCXL Add-in Goals
Archiving
Sharing
Publishing
Easier
Harder
DCXL Project Deliverables
• Excel add-in
• Publicly available source code
• Technical documentation
• End user documentation
• Publicly available requirements
• Community
storageplusgulfport.com
DCXL Project Outcomes
• Enable citation & allow credit
• Enable policy enactment
• Enable re-use by eliminating barriers
• Save time for researcher
• Encourage creation of extensions
Process
Assess needs
• Quantitative
– Surveys
– Quick poll
• Qualitative
– Interviews
Process
Assess needs Gather requirements
Recruitment tools
• DCXL/data management seminars
• Listservs & email
• Blog, Facebook, Twitter
• Face-to-face interactions
• Flyers
Process
Assess needs Gather requirements
Locations
• Conferences
• UC campus visits
• Remote/web-based
Process
Assess needs Gather requirements
Stakeholders & contributors
• Libraries
• Scientists
• Repositories
• Experts: MSR, GBMF
• Personnel on related projects
Process
Requirements
Quick poll Survey Interview
Email Seminars Flyers
Social media
Social media, emails, campus visits
Social media, emails
Scientists
Data Centers Libraries
Funders Related projects
CDL
Implementation
Assess needs Gather requirements Build requirements document
Implementation
Assess needs Gather requirements Build requirements document Build community
Libraries Scientists Repositories Programmers/Developers
Timeline
2011
26 Sept: DCXL Kickoff Meeting
7 Oct: Finalize Requirements Gathering Framework
9 Nov: 1st draft of Requirements to MSR
30 Nov: 2nd draft of Requirements to MSR
5-9 Dec: AGU Meeting, San Francisco
15 Dec: Final Requirements to MSR
2012
16 Jan: Receive Excel Add-in Version 1
23 Jan: Rollout Excel Add-in Version 1
16-19 Feb: AAAS meeting: Add-in user testing
20-24 Feb: Ocean Sciences meeting: Add-in user testing
26 Feb: 1st Draft of updated Requirements based on Version 1 to MSR
2 Apr: Deliver updated Requirements based on Version 1 to MSR
28 May: Receive Excel Add-in Version 2
29 May-24 Jun: User testing of Version 2
25 Jun: Rollout Excel Add-in Version 2
7-10 July: CSEE meeting: Add-in debut & demo
13 July: Final code, technical documentation, and requirements published
31 July: End user documentation published
Roadmap
1. An overview: why is DCXL needed?
2. DCXL project overview
3. Progress & future plans
4. How to get involved in DCXL
Ecological Society of America Summer 2011 Meeting
ESA Overview
• Everyone uses Excel
– Most use Excel for organizing raw data
– Most import spreadsheets into other programs for analysis
– ~75% are embarrassed about using Excel
• Excitement about open source
• Minimal knowledge about data management, organization, and archiving
• 55 surveys from diverse group
[Figure: Operating System. Categories: Mac, PC, Linux.]
[Figure: Use Excel for... Categories: Organization, Visualization, Statistics, Other Analyses, Sharing. Axis: # respondents (out of 55).]
[Figure: How often do you use Excel? Axis labels: Never, Rarely, Every day. Axis: # respondents.]
[Figure: What features are used in Excel? Categories: Multiple Tables, Multiple Tabs, Pivot Tables, Headers, Embedded formulas, Macros, Cell shading, Comments. Axis: Percent.]
American Fisheries Society Summer 2011 Meeting
Ray Troll (trollart.com)
AFS Overview
• Everyone uses Excel
• Most use it only for data organization and sharing
• 36 surveys from diverse group
• Heavy MS Access use
• 100% PC
[Figure: How often do you use Excel? Axis labels: Rarely, Every day. Axis: # respondents.]
[Figure: Tasks performed in Excel? Categories: Organizing data, Visualizing data, Statistics, Simple Calculations, Sharing data. Axis: % respondents (n = 36).]
[Figure: What should the add-in help you do? Categories: Organize my data for my own use, Organize my data for others to use more easily, Archive my data, Create metadata, Share my data publicly, No opinion. Axis: % respondents.]
AFS Overview
• Everyone uses Excel
• Most use it only for data organization and sharing
• 36 surveys from diverse group
• Heavy MS Access use
• 100% PC
• Data hoarders
Myoverstuffedbookshelf.blogspot.com
Roadmap
1. An overview: why is DCXL needed?
2. DCXL project overview
3. Progress & future plans
4. How to get involved in DCXL
Get Involved
dcxl.cdlib.org
Now: General info, Blog, Forum, Calendar
Later: Requirements, Documentation
Get Involved
@dcxlCDL
www.facebook.com/DCXLatCDL
Acknowledgements
• CDL: Rachael Hu, Trisha Cruse, John Kunze, Tracy Seneca
• MSR: Lee Dirks
• GBMF: Chris Mentzel
Carly Strasser [email protected]