GCE Data Toolbox -- metadata-based tools for automated data processing and analysis Wade Sheldon...

Date post: 20-Jan-2016
Page 1: GCE Data Toolbox -- metadata-based tools for automated data processing and analysis Wade Sheldon University of Georgia GCE-LTER.

GCE Data Toolbox -- metadata-based tools for automated data processing and analysis

Wade Sheldon

University of Georgia

GCE-LTER

Rationale

Data processing, quality control, data analysis and metadata generation are traditionally carried out as separate activities, often in different time frames, using different technologies.

Problems:
- Metadata may not reflect all processing steps
- Much routine data analysis is done without Q/C or metadata
- No economy of scale, which leads to “one-off” solutions

Metadata generation should ideally occur throughout the data cycle and “inform” data analysis.

Design Goals

Develop Integrated Storage Standard
- Tabular Data
- QA/QC Information
- Metadata (overall data set & columns/attributes)

Develop Software to Support Standard
- Code Library/API
- User Interfaces

Apply Technology to Acquire, Manage, Distribute GCE-LTER Data

Explore Use as Prototype Technology for Metadata-based Data Processing, Synthesis

Storage Standard

Developed Using MATLAB®
- Local expertise, large scientific user base
- Cross-platform (Win32, Solaris, *nix, Mac OS X)
- Rapid development environment
- Supports multiple interfaces (interactive command line, batch-mode scripts, GUI, WWW)
- Good interoperability with other technologies (Java, Perl, SQL)

Defined “GCE Data Structure” Spec. (based on MATLAB/C structures)
- Structure with 17 named fields
- Specific content rules for each field (software validation)
- Combines data, metadata, QA/QC, processing history

Storage Standard

| Category | Field | Description |
|---|---|---|
| Structure Info | title | Title of the Overall Data Set |
| | version | List of Toolbox Versions Used |
| | datafile | List of Data Files Processed |
| | createdate | Date of Creation |
| | editdate | Date of Last Edit |
| | history | Processing History |
| Metadata | metadata | General Metadata (parseable array) |
| | name | Column Names |
| | description | Column Descriptions |
| | units | Column Units |
| | datatype | Physical Data Types (Storage types) |
| | variabletype | Logical Data Types (Variable types) |
| | numbertype | Numerical Types |
| | precision | Decimal Places to Display |
| Data Table | values | Table of Data Values (numerical, text) |
| QA/QC Info | criteria | QA/QC Criteria |
| | flags | QA/QC Flags Assigned |

GCE Data Structure Specification (v1.1)
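To make the field layout concrete, here is a minimal Python sketch of a structure holding the 17 fields of the specification, plus a validator for a few content rules. The toolbox itself is written in MATLAB; the class name `GceDataStructure`, the `validate` function, the per-field Python types, the example type codes, and the specific rules checked (one entry per column in each column-level field, equal-length data columns, flag arrays matching column length) are illustrative assumptions, not the toolbox's actual validation code.

```python
from dataclasses import dataclass, field

# Illustrative Python analogue of the 17-field GCE Data Structure (v1.1).
# Field names follow the specification; types and rules are assumptions.
@dataclass
class GceDataStructure:
    # Structure info
    title: str = ""
    version: list = field(default_factory=list)       # toolbox versions used
    datafile: list = field(default_factory=list)      # data files processed
    createdate: str = ""
    editdate: str = ""
    history: list = field(default_factory=list)       # (date, description) pairs
    # Metadata: general metadata, then one entry per column for the rest
    metadata: list = field(default_factory=list)      # assumed (category, field, value) triples
    name: list = field(default_factory=list)
    description: list = field(default_factory=list)
    units: list = field(default_factory=list)
    datatype: list = field(default_factory=list)      # storage type per column
    variabletype: list = field(default_factory=list)  # logical type per column
    numbertype: list = field(default_factory=list)
    precision: list = field(default_factory=list)     # decimal places to display
    # Data table: one list of values per column
    values: list = field(default_factory=list)
    # QA/QC info: one criteria string and one list of row flags per column
    criteria: list = field(default_factory=list)
    flags: list = field(default_factory=list)

# Fields that must hold exactly one entry per data column (assumed rule).
COLUMN_FIELDS = ("name", "description", "units", "datatype", "variabletype",
                 "numbertype", "precision", "values", "criteria", "flags")

def validate(ds):
    """Return a list of rule violations; an empty list means the structure passed."""
    errors = []
    ncols = len(ds.name)
    for fname in COLUMN_FIELDS:
        n = len(getattr(ds, fname))
        if n != ncols:
            errors.append(f"{fname}: expected one entry per column ({ncols}), found {n}")
    if len({len(col) for col in ds.values}) > 1:
        errors.append("values: columns have unequal numbers of rows")
    for col, colflags in zip(ds.values, ds.flags):
        # An empty flag list means "no flags assigned"; otherwise one flag per row.
        if colflags and len(colflags) != len(col):
            errors.append("flags: flag array does not match column length")
    return errors

ds = GceDataStructure(title="Example", name=["Temp"], description=["Water temperature"],
                      units=["degC"], datatype=["f"], variabletype=["data"],
                      numbertype=["continuous"], precision=[1],
                      values=[[20.5, 21.0]], criteria=[""], flags=[[]])
print(validate(ds))
```

Keeping data, metadata, QA/QC and history in one validated container is what lets every function that modifies the values also update the matching metadata.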

Software – GCE Data Toolbox

Core Function Library
- Create, Validate Structures
- Import Data, Metadata (ASCII, MATLAB, SQL)
- Manipulate Data, Metadata (unit conversions, add/delete/update)
- Export Data, Metadata (various formats)
- Dynamic, Rule-based QA/QC Flagging
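Rule-based flagging can be illustrated with a small Python sketch in which each column carries a criteria string that is evaluated against its values to produce flags. The rule syntax (`x<0='I';x>40='Q'`, with `x` standing for each value), the flag codes, and the functions `parse_criteria` and `apply_flags` are hypothetical; only simple numeric comparisons are supported.

```python
import operator
import re

# Hypothetical rule syntax: "x<0='I';x>40='Q'" assigns flag I to values below 0
# and flag Q to values above 40 (x stands for each value in the column).
RULE = re.compile(r"^\s*x\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*=\s*'(\w)'\s*$")
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def parse_criteria(criteria):
    """Parse a semicolon-separated criteria string into (test, limit, flag) rules."""
    rules = []
    for part in criteria.split(";"):
        if not part.strip():
            continue  # tolerate empty parts such as a trailing semicolon
        match = RULE.match(part)
        if match is None:
            raise ValueError(f"unrecognized rule: {part!r}")
        op, limit, flag = match.groups()
        rules.append((OPS[op], float(limit), flag))
    return rules

def apply_flags(values, criteria):
    """Return one flag string per value ('' = no rule matched); missing values get ''."""
    rules = parse_criteria(criteria)
    flags = []
    for v in values:
        if v is None:
            flags.append("")
            continue
        # Concatenate the codes of every rule this value satisfies.
        flags.append("".join(flag for test, limit, flag in rules if test(v, limit)))
    return flags

print(apply_flags([-1.5, 20.0, 45.2, None], "x<0='I';x>40='Q'"))
```

Because flags are derived from stored criteria rather than edited by hand, they can simply be recomputed whenever values or criteria change, which is one plausible reading of "dynamic" flagging.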

Self-documenting Processing
- Operation Logging (Processing History)
- Transparent Metadata Creation/Updating
- Dynamic (JIT) Metadata Generation for Columns
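One way to get operation logging without extra effort from the user is to wrap each data-manipulation function so that it appends a dated entry to the processing history and updates the edit date. The Python sketch below assumes a data set stored as a dict with `name`, `units`, `values`, `history` and `editdate` keys; the `logged` decorator and `delete_column` function are hypothetical and not part of the toolbox.

```python
import copy
import functools
import inspect
from datetime import datetime, timezone

def logged(description):
    """Decorator that records each operation in the data set's processing history.

    `description` is a format string filled from the wrapped function's arguments.
    """
    def wrap(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def inner(ds, *args, **kwargs):
            # Operate on a deep copy so the caller's data set is left unchanged.
            result = func(copy.deepcopy(ds), *args, **kwargs)
            bound = signature.bind(ds, *args, **kwargs)
            stamp = datetime.now(timezone.utc).strftime("%d-%b-%Y %H:%M:%S")
            result["history"].append((stamp, description.format(**bound.arguments)))
            result["editdate"] = stamp
            return result
        return inner
    return wrap

@logged("deleted column '{colname}'")
def delete_column(ds, colname):
    """Remove one column together with its per-column metadata (hypothetical layout)."""
    idx = ds["name"].index(colname)
    for key in ("name", "units", "values"):
        del ds[key][idx]
    return ds

original = {"name": ["Temp", "Sal"], "units": ["degC", "PSU"],
            "values": [[20.1], [31.0]], "history": [], "editdate": ""}
print(delete_column(original, "Sal")["history"])
```

Putting the logging in the wrapper rather than in each function means the history cannot be forgotten, which is the point of "self-documenting" processing.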

Support for Metadata “Templating”
- Application of Boilerplate Metadata based on Parameter Matching
- Supports Rapid Documentation of Routine Data Sources
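Metadata templating can be sketched as a lookup table of boilerplate column descriptors keyed by a matching parameter, here simply the column name. The template contents, the attribute list and `apply_template` are hypothetical examples; attributes a column already has are left alone, so only missing or blank metadata is filled in.

```python
# Hypothetical template for one routine data source (a weather station);
# column names and descriptors are illustrative, not an actual GCE template.
TEMPLATE = {
    "Temp_Air": {"description": "Air temperature", "units": "degC",
                 "variabletype": "data", "precision": 1},
    "RH": {"description": "Relative humidity", "units": "%",
           "variabletype": "data", "precision": 0},
}
ATTRIBUTES = ("description", "units", "variabletype", "precision")

def apply_template(columns, template):
    """Copy boilerplate descriptors onto columns whose names match a template entry.

    Only attributes that are missing or blank are filled; existing metadata is kept.
    """
    documented = []
    for column in columns:
        merged = dict(column)
        entry = template.get(column["name"], {})
        for attr in ATTRIBUTES:
            if attr in entry and merged.get(attr) in (None, ""):
                merged[attr] = entry[attr]
        documented.append(merged)
    return documented

imported = [{"name": "Temp_Air", "units": ""}, {"name": "Battery", "units": "V"}]
for column in apply_template(imported, TEMPLATE):
    print(column)
```

One template per instrument or data source means each new file from that source is documented the moment it is imported.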

Software – GCE Data Toolbox

Support for Analysis
- Descriptive Statistics, Reports
- Visualization, Mapping

Support for Synthesis

Composite Data Set Creation
- Multiple Data Set Merge/Concatenation
- Relational Join
- Metadata Content Meshing
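As an illustration of a relational join combined with simple metadata meshing, the Python sketch below performs an inner join of two column-oriented data sets on a shared key column, carries each column's units along with its data, and refuses to join keys whose units differ. The dict layout and `inner_join` are assumptions for illustration; the sketch does not handle duplicate column names or repeated keys in the right-hand data set, and real metadata meshing would merge far more than the titles.

```python
def inner_join(left, right, key):
    """Inner-join two column-oriented data sets on a shared key column.

    Each data set is a dict with "title", "name", "units" and "values" (one list per column).
    Assumes key values are unique in `right` and that non-key column names do not collide.
    """
    li, ri = left["name"].index(key), right["name"].index(key)
    if left["units"][li] != right["units"][ri]:
        raise ValueError(f"key column {key!r} has different units in the two data sets")
    lookup = {k: row for row, k in enumerate(right["values"][ri])}
    # Pairs of (left row, right row) whose key values match.
    matches = [(lrow, lookup[k]) for lrow, k in enumerate(left["values"][li]) if k in lookup]
    extra = [c for c in range(len(right["name"])) if c != ri]  # right-hand columns to append
    return {
        # Simplified metadata meshing: combine titles, carry units with each column.
        "title": f"{left['title']} joined with {right['title']}",
        "name": left["name"] + [right["name"][c] for c in extra],
        "units": left["units"] + [right["units"][c] for c in extra],
        "values": [[col[l] for l, _ in matches] for col in left["values"]]
                  + [[right["values"][c][r] for _, r in matches] for c in extra],
    }

stations = {"title": "Station locations", "name": ["Site", "Lat"], "units": ["none", "degrees"],
            "values": [["A", "B", "C"], [31.40, 31.45, 31.50]]}
salinity = {"title": "Salinity survey", "name": ["Site", "Sal"], "units": ["none", "PSU"],
            "values": [["B", "C", "D"], [20.1, 25.3, 30.0]]}
print(inner_join(stations, salinity, "Site"))
```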

Data Set Summarization
- Statistical Data Reduction/Re-sampling
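Statistical data reduction typically means grouping rows and summarizing each group. A minimal Python sketch, using the standard `statistics` module, aggregates values by a key and then uses the calendar date as the key to re-sample sub-daily readings to daily statistics. The function names and the choice of statistics are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def aggregate(keys, values):
    """Count, mean, min and max of `values` for each distinct key; None values are ignored."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        if value is not None:
            groups[key].append(value)
    return {key: {"n": len(vals), "mean": mean(vals), "min": min(vals), "max": max(vals)}
            for key, vals in groups.items()}

def daily_stats(timestamps, values):
    """Re-sample sub-daily observations to daily statistics (grouping on calendar date)."""
    return aggregate([ts.date() for ts in timestamps], values)

readings = [datetime(2003, 6, 1, 0, 0), datetime(2003, 6, 1, 12, 0), datetime(2003, 6, 2, 0, 0)]
print(daily_stats(readings, [1.0, 3.0, None]))
```

Reporting the count `n` alongside each statistic keeps the reduced data set honest about how many observations were behind each value.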

Data Set Standardization
- Unit Conversions (automatic, interactive)
- Template-based Semantic Mapping
- Automatic Semantic Mediation (prototype stage)
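Automatic unit conversion can be driven by the units stored in column metadata: look up a conversion for the (current, target) unit pair, apply it to the values, and update the units field in the same step so data and metadata stay consistent. In this Python sketch the conversion table, the unit strings (`ft3/s`, `degF`, ...) and `convert_column` are hypothetical; every conversion is assumed to be linear (new = old * scale + offset).

```python
# Hypothetical conversion table: (from_units, to_units) -> (scale, offset),
# applied as new = old * scale + offset. Unit strings are illustrative.
CONVERSIONS = {
    ("ft", "m"): (0.3048, 0.0),
    ("ft3/s", "m3/s"): (0.3048 ** 3, 0.0),  # stream discharge, cubic feet per second
    ("degF", "degC"): (5 / 9, -32 * 5 / 9),
}

def convert_column(column, to_units):
    """Convert a column's values and update its units metadata in one step."""
    pair = (column["units"], to_units)
    if pair not in CONVERSIONS:
        raise KeyError(f"no conversion defined from {pair[0]} to {pair[1]}")
    scale, offset = CONVERSIONS[pair]
    converted = dict(column)
    converted["values"] = [None if v is None else v * scale + offset
                           for v in column["values"]]
    converted["units"] = to_units  # metadata stays in sync with the data
    return converted

flow = {"name": "Discharge", "units": "ft3/s", "values": [100.0, None]}
print(convert_column(flow, "m3/s"))
```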

Software – User Interfaces

Unattended Batch Mode Processing

Interactive Command Line Processing (conventional MATLAB UI)
- Full help text for each function
- Well-defined input/output arguments

GUI Applications
- Standard Forms, Dialogs, Controls
- No MATLAB Experience Required

WWW – MATLAB Web Server
- HTML Forms, Querystring Input
- HTML Pages and/or Static File Output
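For a WWW interface, HTML form fields arrive as querystring parameters that must be validated and mapped onto function arguments. A Python sketch using the standard `urllib.parse.parse_qs`; the parameter names (`dataset`, `format`, `columns`), the allowed formats, and `parse_request` are hypothetical and do not describe the MATLAB Web Server interface.

```python
from urllib.parse import parse_qs

# Hypothetical set of output formats a form might offer.
ALLOWED_FORMATS = {"csv", "tab", "mat"}

def parse_request(querystring):
    """Map querystring parameters onto validated arguments for a data request."""
    params = parse_qs(querystring)  # e.g. {"format": ["CSV"], "columns": ["Temp,Sal"]}
    if "dataset" not in params:
        raise ValueError("missing required parameter: dataset")
    fmt = params.get("format", ["csv"])[0].lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {fmt}")
    columns = params.get("columns", [""])[0]
    return {
        "dataset": params["dataset"][0],
        "format": fmt,
        # Comma-separated column list; an empty selection means "all columns".
        "columns": [c for c in columns.split(",") if c],
    }

print(parse_request("dataset=example&format=CSV&columns=Temp,Sal"))
```

The same validated argument dict could then be passed to the functions used from batch scripts and the command line, so the web form adds no separate processing path.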

Command-Line Interface

GUI Applications

WWW Interface

Current Applications

Automated Data Processing
- Direct data import from data logger files, WWW data sources (USGS), SQL queries
- Automatic metadata creation (templates, data mining)
- Rule-based QA/QC flagging

Data Set Packaging
- Batch processing to create/update data, metadata products
- On-demand generation of data, metadata, stat reports in custom formats (end-user scripts, GUI applications, WWW forms)

Current Applications

Data Exploration/Analysis by PIs
- Descriptive Statistics based on attribute metadata
- Visualization with Interactive Filtering (Frequency Histograms, 2D Plots, Map Plots)

Data Reduction/Re-sampling to Provide Customized Data at Various “Scales”
- Aggregated Statistics
- Binned Statistics
- Query/Filtering (sub-selection)
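Binned statistics with query filtering can be sketched in a few lines of Python: rows are first sub-selected with a predicate, then values of one column are averaged within fixed-width bins of another (for example, a measurement averaged over 1 m depth intervals). The function `binned_means`, its `keep` predicate and the fixed-width binning scheme are illustrative assumptions.

```python
import math
from statistics import mean

def binned_means(x, y, bin_width, start=0.0, keep=lambda xv, yv: True):
    """Mean of y within fixed-width bins of x, using only rows that pass `keep`.

    Returns {(lower, upper): mean} ordered by bin; rows with missing values are skipped.
    """
    bins = {}
    for xv, yv in zip(x, y):
        if xv is None or yv is None or not keep(xv, yv):
            continue  # query/filtering step: drop missing or rejected rows
        index = math.floor((xv - start) / bin_width)
        bins.setdefault(index, []).append(yv)
    return {(start + i * bin_width, start + (i + 1) * bin_width): mean(vals)
            for i, vals in sorted(bins.items())}

depth = [0.5, 1.5, 1.8, 2.2]
value = [10.0, 20.0, 30.0, 40.0]
print(binned_means(depth, value, 1.0))
print(binned_means(depth, value, 1.0, keep=lambda d, v: d < 2.0))
```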

Current Applications

Data Harvesting (GCE)
- USGS Data (WWW real-time, daily, finalized data)
- Campbell Scientific Data Arrays (post-processing triggered after LoggerNet Retrieval)
- Sea-Bird Hydrographic Data

USGS Data Harvesting Service for HydroDB
- Weekly harvest for 31 stations/7 LTER Sites
- Automatic Resampling, Unit Conversions, Q/C

Availability

Description, Screen-shots, Fully-functional Toolbox Available on WWW:
http://gce-lter.marsci.uga.edu/lter/research/tools/data_toolbox.htm

Requires MATLAB 5.3, 6.0, 6.5 (any platform)

“Public” Version Compiled

Source Code Requests Considered on Case-by-Case Basis

Future Development Plans

EML 2.0 Support

Metadata-mediated Data Set Integration
- Unit conversions
- Re-sampling

More WWW Interface Development

