Date post: 21-Jan-2016
Data Quality Resources in Species Occurrence Digitization. Allan Koch Veiga, Etienne Americo Cartolano Jr, Antonio Mauro Saraiva. Agricultural Automation Laboratory – LAA, Computing Engineering Dept., Engineering School, Universidade de São Paulo, Brazil
Transcript
Page 1: Data Quality Resources in Species Occurrence Digitization Allan Koch Veiga Etienne Americo Cartolano Jr Antonio Mauro Saraiva Agricultural Automation.

Data Quality Resources in Species Occurrence Digitization

Allan Koch Veiga
Etienne Americo Cartolano Jr
Antonio Mauro Saraiva

Agricultural Automation Laboratory – LAA
Computing Engineering Dept., Engineering School
Universidade de São Paulo, Brazil

Page 2

Outline

Background

Biodiversity Data Digitizer (BDD) & IABIN

Data Quality Methodology

Data Quality Tools
    BDD Geo Tool
    BDD Taxon Tool

Conclusion

Page 3

Background

Importance of Species Occurrence Data
    GBIF Portal
    IABIN Portal

Data quality impacts the uses of data

Location | Taxonomic data domains
    Georeferencing | Identification are two major causes of error in species occurrence data

Need to improve Data Quality (DQ)

Page 4

Data Quality & IABIN-PTN

Inter-American Biodiversity Information Network (IABIN)
Pollinators Thematic Network (PTN)
    GEF-funded project (2006-2011) (~$180k)
    11 countries in Latin America
    ~400,000 records

Responsibilities
    Development of tools for data digitization and integration
    Data Digitization Training and support
    Reviewing proposals, reports, data

Close contact with data owners / providers

Page 5

Data Quality & IABIN-PTN

Opportunities & needs
    Discuss digitization issues with the grantees
    Standards: importance and role (TDWG)
    Data quality: concepts

Improve data quality
    Provide mechanisms integrated into digitization tools rather than isolated tools

Page 6

Biodiversity Data Digitizer (BDD)

Designed for easy:
    Digitization
    Manipulation
    Publication

Rich data content
    FAO-GEF pollinator project
    Darwin Core
    EOL/Plinian Core
    Interaction Extension
    FAO Deficit Protocol
    FAO Monitoring Protocol
    MRTG Schema
    Dublin Core

Demo: Thu

Page 7

DQ Assessment Methodology

What is Data Quality?

Data Domain (context): Location Data Domain

Dimension (aspect): Completeness, Consistency, Credibility, Accuracy, Precision

Problem (error patterns): Missing value, Incorrect value, Nonatomic value, Inconsistent value, Information contamination

[Diagram: error patterns attached to each dimension of the location data domain]
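The assessment model on this slide (a data domain, a dimension, and an error pattern) can be made concrete with a small sketch. Everything in it is an assumption made for illustration and is not taken from BDD: the record uses Darwin Core field names, and the rules that decide when a value counts as missing, incorrect, or nonatomic are simple stand-ins. Because the transcript does not preserve which error pattern belongs to which dimension, the sketch reports only the domain, the field, and the pattern.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    domain: str   # data domain (context), e.g. "location"
    field: str    # field in which the problem was found
    problem: str  # error pattern, e.g. "missing value"


def assess_location(record: dict) -> list[Finding]:
    """Return the error patterns found in a location record (hypothetical rules)."""
    findings = []

    def report(field, problem):
        findings.append(Finding("location", field, problem))

    # Missing value: the field is absent, None, or blank.
    for field in ("country", "municipality", "decimalLatitude", "decimalLongitude"):
        value = record.get(field)
        if value is None or str(value).strip() == "":
            report(field, "missing value")

    # Incorrect value: a coordinate outside its valid range.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        report("decimalLatitude", "incorrect value")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        report("decimalLongitude", "incorrect value")

    # Assumed rule: a comma in a single-valued field means several values
    # were packed into it (for example "Campinas, São Paulo").
    municipality = record.get("municipality") or ""
    if "," in municipality:
        report("municipality", "nonatomic value")

    return findings


record = {
    "country": "Brazil",
    "municipality": "Campinas, São Paulo",
    "decimalLatitude": -122.9,
    "decimalLongitude": None,
}
for finding in assess_location(record):
    print(finding.field, "->", finding.problem)
```

A fuller version would also tag each finding with a dimension such as completeness or accuracy once that mapping is known.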

Page 8

DQ Management Methodology

How to improve the DQ?

Reducing Errors
    Prevention
    Detection and Correction

Error prevention is considered superior to error detection

Page 9

Resources to Improve DQ on BDD

Tools to prevent errors in occurrence data digitization

Integrated into the BDD species occurrence data-entry interface

BDD Geo Tool: prevents location data digitization errors

BDD Taxon Tool: prevents taxonomic data digitization errors

Page 10

BDD Geo Tool
Step 1 of 3 – Primary Data

Page 11

BDD Geo Tool
Step 2 of 3 – Data Source

Page 12

BDD Geo Tool
Step 3 of 3 – Uncertainty

Page 13

BDD Geo Tool
Location data form is filled

Page 14

BDD Geo Tool

Improved

Completeness: adds data not available before (e.g. lat/long, municipality)

Consistency: consistent data obtained from a consistent source (avoiding errors like lat: 0, long: 0, municipality: New Orleans)

Credibility: associates data with a credible source (BioGeomancer, Google, GeoNames)

Accuracy: more accurate than using the center of mass of a region

Precision: uncertainty indicator increases data fitness for use
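As a rough illustration of the improvements listed above, the sketch below fills location fields from a single gazetteer match, records the source, and attaches an uncertainty value. It is a minimal sketch under stated assumptions and does not describe how the BDD Geo Tool is implemented: the shape of `place`, the bounding-box values, and the rule that sets the uncertainty to the distance from the box center to its farthest corner are all hypothetical. The output keys are Darwin Core term names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fill_location(place, source):
    """Build location fields from a gazetteer match (hypothetical input shape).

    `place` holds a name and a bounding box (south, west, north, east).
    """
    south, west, north, east = place["bbox"]
    lat = (south + north) / 2
    lon = (west + east) / 2
    if lat == 0 and lon == 0:
        # Prevention: never store the placeholder coordinate lat: 0, long: 0.
        raise ValueError("refusing the placeholder coordinate lat: 0, long: 0")
    # Assumed rule: the uncertainty radius must cover the whole bounding box,
    # so take the distance to the farthest of its four corners.
    radius = max(
        haversine_m(lat, lon, corner_lat, corner_lon)
        for corner_lat in (south, north)
        for corner_lon in (west, east)
    )
    return {
        "municipality": place["name"],
        "decimalLatitude": round(lat, 5),
        "decimalLongitude": round(lon, 5),
        "coordinateUncertaintyInMeters": math.ceil(radius),
        "georeferenceSources": source,
    }


# Made-up bounding box for illustration; not real gazetteer output.
match = {"name": "Example municipality", "bbox": (-22.9, -47.9, -22.5, -47.5)}
print(fill_location(match, "GeoNames"))
```

Because every field comes from the same match, the municipality and the coordinates cannot disagree, which is the consistency gain the slide describes.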

Page 15

BDD Taxon Tool
Step 1 of 2 – Taxonomic Name Selection

Page 16

BDD Taxon Tool
Step 2 of 2 – Taxonomic Hierarchy Selection

Page 17

BDD Taxon Tool
Taxonomic data form is filled

Page 18

BDD Taxon Tool

Improved

Completeness: taxonomic hierarchy is filled from a taxon name

Consistency: consistent data are obtained from a consistent source (Catalogue of Life)

Credibility: data are associated with a credible source (Catalogue of Life)

Accuracy: avoids spelling mistakes and incorrect taxonomic hierarchies

Precision: suggests complete scientific names
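The two steps of the Taxon Tool (choose a name, then take its hierarchy) can be sketched as follows. This is only an illustration under assumptions, not the tool's actual code: the hard-coded `CHECKLIST` stands in for a query to a source such as the Catalogue of Life, and `suggest_names` and `fill_taxon` are hypothetical helpers. Suggestions combine prefix matching with fuzzy matching from Python's standard `difflib` module.

```python
import difflib

# Tiny stand-in for a taxonomic checklist; a real tool would query a service
# such as the Catalogue of Life instead of a hard-coded dictionary.
_BEE = {
    "kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
    "order": "Hymenoptera", "family": "Apidae",
}
CHECKLIST = {
    "Apis mellifera": {**_BEE, "genus": "Apis"},
    "Bombus terrestris": {**_BEE, "genus": "Bombus"},
    "Melipona quadrifasciata": {**_BEE, "genus": "Melipona"},
}


def suggest_names(typed, limit=5):
    """Suggest complete scientific names for partial or misspelled input."""
    typed = typed.strip()
    names = list(CHECKLIST)
    # Step 1a: names that start with what the user typed so far.
    prefix_hits = sorted(n for n in names if n.lower().startswith(typed.lower()))
    # Step 1b: names close to the input, which catches spelling mistakes.
    fuzzy_hits = difflib.get_close_matches(typed, names, n=limit, cutoff=0.8)
    # Merge both lists, dropping duplicates while keeping order.
    return list(dict.fromkeys(prefix_hits + fuzzy_hits))[:limit]


def fill_taxon(scientific_name):
    """Return the taxonomic fields for a name selected from the suggestions."""
    hierarchy = CHECKLIST[scientific_name]  # KeyError for names not in the list
    return {"scientificName": scientific_name, **hierarchy}


print(suggest_names("Apis mel"))
print(suggest_names("Apis melifera"))
print(fill_taxon("Apis mellifera"))
```

Since the user selects a name instead of typing the hierarchy by hand, a misspelled genus or a mismatched family cannot enter the record.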

Page 19

Conclusion

Integrated existing techniques, tools, and credible data sources into a species occurrence data-entry tool

Improved completeness, consistency, accuracy and precision of species occurrence data

Error prevention in taxonomic and location data

Tools available for an audience with little familiarity with data digitization and DQ

Page 20

Conclusion

Next steps

Other tools, techniques, dimensions, error patterns, and domains of data quality in biodiversity are yet to be explored and added

Work on error correction in existing data

Spreadsheet-based data correction

Suggestions and collaboration are welcome!

Page 21

Acknowledgements

IABIN – PTN
    Laurie Adams (P2), Mike Ruggiero (ITIS), Mike Frame, Liz Sellers and Ben Wheeler (USGS)
    Pedro Correa (University of São Paulo)
    All data grantees

FAO-UNEP-GEF Pollinator project in Brazil
    Barbara Gemmill-Herren (FAO)
    Ministry of the Environment - Brazil
    All data grantees

Page 22

Thank you

Allan Koch Veiga

Etienne Americo Cartolano

Antonio Mauro Saraiva

Agricultural Automation Laboratory – LAA
Computing Engineering Dept., Engineering School
Universidade de São Paulo, Brazil

