BEER* workshop 1300 – Raymond Pollard – Being a Data Scientist is FUN! ~1345 – Robert Groman...

Date post: 27-Mar-2015
Upload: william-mcfadden
Transcript
Page 1: BEER* workshop 1300 – Raymond Pollard – Being a Data Scientist is FUN! ~1345 – Robert Groman – Has data management gone mainstream? ~1430 – Gwen Moncoiffé

BEER* workshop

1300 – Raymond Pollard – Being a Data Scientist is FUN!

~1345 – Robert Groman – Has data management gone mainstream?

~1430 – Gwen Moncoiffé – Data integration made easier

~1515 – break

~1530 – Todd O’Brien – Better data, better science

~1600 – Discussion session – your input to the cookbook and the way forward

*Being Efficient and Environmentally Responsible

IMBER BEER Imbizo

• Welcome
• Times and discussion - ask (or write down) pertinent questions - this is a workshop
• Tea/coffee/BEER
• Who are we

CROZEX and Crozet (Possession Island)

IMBER Data Management

• Data Management Committee
• Arrange data by Project
• First task is to engage researchers and educate them about how good organization of data will benefit them

• Before, during and after the project field phase

The bottom line

• DM cannot be an afterthought

• If you give DM some thought when you first plan a project, it will be
  – relatively straightforward
  – not too much effort
  – remarkably useful to all participants
  – valuable to those who come after

DM topics (data management)

• Cookbook (http://planktondata.net/imber/)

• Recognition for DM
• Data Scientist
• Best Practice (e.g. BCO-DMO*)
• Data and Metadata (e.g. CSR) cruise summary
• Data Centers – national (e.g. BODC)
• Data Centers – specialist (e.g. OBIS, CCHDO, COPEPOD)

• IMBER Data Portal

*BCO-DMO: Biological and Chemical Oceanography Data Management Office

Writing papers

• Writing papers is an essential part of a researcher’s job

• Writing papers is time consuming
• Writing papers is tedious/boring
• Writing papers needs attention to detail

• Publications are a legacy of your research

Data management

• Data management is an essential part of a researcher’s job

• Data management is time consuming

• Data management is tedious/boring

• Data management needs attention to detail

• Data sets are a legacy of your research

So why do we accept that we must write papers, but treat DM as the poor relation?

• Because everybody else does!
• Because we get recognition for publishing
• But not for DM - seek to change this
• But in fact:
  – Our published interpretation may be wrong
  – A good data set can be reinterpreted (..Fe)
• So the data set is a more objective legacy of a cruise (say), which cost a huge amount and cannot easily be repeated

Recognition for DM

• Carrots and sticks
• SCOR is considering how to allocate DOIs (Digital Object Identifiers) to data sets
  – At what level?
  – Quality control?

• Put it on your CV
  – Act as Data Scientist for a project/cruise
  – Breadth of interest
  – Management experience
  – Contribute to promotion/pay rise

Being a Data Scientist is FUN!

Raymond Pollard

So, what is a Data Scientist?

• The Data Scientist is someone who helps and advises the project/cruise Principal Scientist and researchers to document their data sets so that they are properly described

• The DS also interacts with PIs and Data Specialists to calibrate, validate, save and archive data

• Why is it FUN? - because you learn so much yourself by having to talk to people

• Can be full or part-time; paid or unpaid; hire, cajole or volunteer

Key role 1 - talking to people

• Find out what they do and how they document it - methods, accuracy, …

• What do they need from others - positions, water temperature, …

• How do they store and back up their data? Do they back it up??!

• What do they do with the data - calibrate, compare, sort, …

Range of data

• Be aware of the huge range of data types and quantities. People are blinkered by their own experience
• E.g. volumes:
  – Nutrients - 24 values per CTD cast
  – T&S - 5,000 to 100,000 values per cast
  – Turbulence - millions
• Storage:
  – Nutrients - PC spreadsheet
  – T&S, navigation - central workstation
  – Turbulence - dedicated workstation

Key role 2 - helping PIs

• back up their data
  – paper copies
  – copy to central server
• document their data, e.g.
  – help with metadata
  – create forms for them

• obtain data from others for them

• by masterminding an Event Log
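The slide does not say how backups should be made. As one possible sketch, assuming the central server is visible from the ship's PCs as an ordinary directory (the file and directory names below are hypothetical), a DS could copy a PI's file and confirm the copy is byte-identical by comparing SHA-256 checksums:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir, then check the copy matches the original."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves modification times
    if sha256_of(src) != sha256_of(dest):
        raise OSError(f"checksum mismatch after copying {src}")
    return dest


if __name__ == "__main__":
    # Demonstration in a throwaway directory; names are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "pi_spreadsheet.csv"
        src.write_text("example,data\n1,2\n")
        copy = backup_file(src, Path(tmp) / "central_server")
        print(copy.name, sha256_of(copy) == sha256_of(src))
```

Comparing checksums rather than just file sizes catches silent corruption during the copy; paper copies remain a separate, independent safeguard.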

Key role 3 - documentation

• Document as much as possible yourself
• Take copies of PIs’ handwritten records
• Use a USB stick to copy their spreadsheets
  – be diplomatic
  – assure them you will NOT copy to others
  – emphasize the value of duplication

• Create your own summary spreadsheets

Key role 4 - assist Principal Scientist

• Help PS enforce unique referencing

• Maintain and post an Event Log
  – of stations occupied
  – accurate station times and positions, etc.
• Quietly advise PS if a PI is not coping
  – with data rate
  – with documentation

• Prepare or help PS prepare CSR
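One way to combine unique referencing with the Event Log is to have the log itself issue every reference, so no two events can share one. The sketch below is illustrative only: the `EventLog` class, the `<cruise>-<nnn>` numbering scheme and the field names are assumptions, not a format prescribed in the talk or by any data centre.

```python
import csv
import io
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    event_id: str    # unique reference issued by the log (hypothetical scheme)
    station: str
    activity: str    # e.g. "CTD", "net tow"
    time_utc: datetime
    lat: float       # decimal degrees, north positive
    lon: float       # decimal degrees, east positive


class EventLog:
    """Minimal cruise event log; every event gets a reference nobody else holds."""

    def __init__(self, cruise_id: str) -> None:
        self.cruise_id = cruise_id
        self._events: list[Event] = []

    def add(self, station: str, activity: str, time_utc: datetime,
            lat: float, lon: float) -> Event:
        # Reject naive times so every entry is unambiguously UTC.
        if time_utc.tzinfo is None:
            raise ValueError("event time must be timezone-aware")
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"position out of range: {lat}, {lon}")
        # Sequential numbering within one cruise guarantees uniqueness.
        event_id = f"{self.cruise_id}-{len(self._events) + 1:03d}"
        event = Event(event_id, station, activity,
                      time_utc.astimezone(timezone.utc), lat, lon)
        self._events.append(event)
        return event

    def to_csv(self) -> str:
        """Render the log, ordered by time, as CSV text for posting."""
        buf = io.StringIO()
        fields = ["event_id", "station", "activity", "time_utc", "lat", "lon"]
        writer = csv.DictWriter(buf, fieldnames=fields)
        writer.writeheader()
        for e in sorted(self._events, key=lambda ev: ev.time_utc):
            row = asdict(e)
            row["time_utc"] = e.time_utc.strftime("%Y-%m-%dT%H:%M:%SZ")
            writer.writerow(row)
        return buf.getvalue()


if __name__ == "__main__":
    log = EventLog("CRUISE01")  # hypothetical cruise identifier
    log.add("S1", "CTD", datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc), -46.0, 52.0)
    log.add("S1", "net tow", datetime(2020, 1, 1, 13, 30, tzinfo=timezone.utc), -46.0, 52.0)
    print(log.to_csv())
```

Issuing references centrally means PIs label samples with the reference the log gives them rather than inventing their own; requiring timezone-aware times is one way to enforce accurate station times.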

Why can’t the PS do most DS tasks?

• Not his priority (optimize cruise program)

• Maybe not his forte
• Too much work

Possible role 5 - primary data

• Scientists often seem to assume that universally required data (time, navigation, CTD depth, temp, surface and met data) appears from thin air

• In fact, those data need careful calibration

• The DS may need to do this if no one else is responsible, or at least check it

• e.g. WHPO => CCHDO GEOTRACES (Chris Measures)

What does the DS gain?

• Broadening your experience, learning from other PIs

• Advancing your own DM skills
• Great management training! (listening to others, looking for problems)
• Looks great on your CV
• You might even get paid

