GCE Data Toolbox -- metadata-based tools for automated data processing and analysis
Wade Sheldon
University of Georgia
GCE-LTER
Rationale
- Data processing, quality control, data analysis and metadata generation traditionally carried out as separate activities, often in different time frames using different technologies
- Problems:
  - Metadata may not reflect all processing steps
  - Much routine data analysis done w/o Q/C, metadata
  - No economy of scale – leads to “one-off” solutions
- Metadata generation should ideally occur throughout the data cycle and “inform” data analysis
Design Goals
- Develop Integrated Storage Standard
  - Tabular Data
  - QA/QC Information
  - Metadata (overall data set & columns/attributes)
- Develop Software to Support Standard
  - Code Library/API
  - User Interfaces
- Apply Technology to Acquire, Manage, Distribute GCE-LTER Data
- Explore Use as Prototype Technology for Metadata-based Data Processing, Synthesis
Storage Standard
- Developed Using MATLAB®
  - Local expertise, large scientific user base
  - Cross-platform (Win32, Solaris, *nix, Mac OS X)
  - Rapid development environment
  - Supports multiple interfaces (interactive command line, batch-mode scripts, GUI, WWW)
  - Good interoperability with other technologies (Java, PERL, SQL)
- Defined “GCE Data Structure” Spec. (based on MATLAB/C structures)
  - Structure with 17 named fields
  - Specific content rules for each field (software validation)
  - Combines data, metadata, QA/QC, processing history
Storage Standard
Category        Field         Description
--------------  ------------  -------------------------------------
Structure Info  title         Title of the Overall Data Set
                version       List of Toolbox Versions Used
                datafile      List of Data Files Processed
                createdate    Date of Creation
                editdate      Date of Last Edit
                history       Processing History
Metadata        metadata      General Metadata (parseable array)
                name          Column Names
                description   Column Descriptions
                units         Column Units
                datatype      Physical Data Types (Storage types)
                variabletype  Logical Data Types (Variable types)
                numbertype    Numerical Types
                precision     Decimal Places to Display
Data Table      values        Table of Data Values (numerical, text)
QA/QC Info      criteria      QA/QC Criteria
                flags         QA/QC Flags Assigned
GCE Data Structure Specification (v1.1)
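
As a rough illustration of how the 17 fields fit together, the sketch below models the structure as a Python dictionary and checks two consistency rules. It is not the toolbox's MATLAB implementation: the function names `new_structure` and `validate`, the sample values, and the rules themselves are assumptions made for illustration. Only the field names come from the specification.

```python
# Field names are taken from the GCE Data Structure specification (v1.1).
# Everything else in this sketch is a hypothetical illustration.
FIELDS = [
    "title", "version", "datafile", "createdate", "editdate", "history",
    "metadata", "name", "description", "units", "datatype", "variabletype",
    "numbertype", "precision", "values", "criteria", "flags",
]

# Fields assumed (for this sketch) to hold exactly one entry per data column.
PER_COLUMN = ["name", "description", "units", "datatype", "variabletype",
              "numbertype", "precision", "values", "criteria", "flags"]


def new_structure(title):
    """Return an empty structure; list-valued fields start out empty."""
    s = {field: [] for field in FIELDS}
    s["title"] = title
    s["createdate"] = s["editdate"] = ""
    return s


def validate(s):
    """Return a list of problems; an empty list means the structure passed."""
    problems = [f"missing field: {f}" for f in FIELDS if f not in s]
    problems += [f"unknown field: {f}" for f in s if f not in FIELDS]
    if problems:
        return problems
    ncols = len(s["name"])
    for field in PER_COLUMN:
        if len(s[field]) != ncols:
            problems.append(
                f"{field}: expected {ncols} entries, got {len(s[field])}")
    return problems


# One-column example with made-up content.
s = new_structure("Example salinity data")
s["name"] = ["Salinity"]
s["description"] = ["Practical salinity"]
s["units"] = ["PSU"]
s["datatype"] = ["floating-point"]   # assumed label, not a toolbox code
s["variabletype"] = ["data"]         # assumed label
s["numbertype"] = ["continuous"]     # assumed label
s["precision"] = [2]
s["values"] = [[31.2, 35.8]]         # one list of values per column
s["criteria"] = ["x<0='I'"]          # rule syntax invented for this sketch
s["flags"] = [["", ""]]

problems = validate(s)
```

Keeping column metadata, values, and flags in parallel per-column lists is what lets a single validation pass confirm that the data and their documentation have not drifted apart.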
Software – GCE Data Toolbox
- Core Function Library
  - Create, Validate Structures
  - Import Data, Metadata (ASCII, MATLAB, SQL)
  - Manipulate Data, Metadata (unit conversions, add/delete/update)
  - Export Data, Metadata (various formats)
  - Dynamic, Rule-based QA/QC Flagging
- Self-documenting Processing
  - Operation Logging (Processing History)
  - Transparent Metadata Creation/Updating
  - Dynamic (JIT) Metadata Generation for Columns
- Support for Metadata “Templating”
  - Application of Boilerplate Metadata based on Parameter Matching
  - Supports Rapid Documentation of Routine Data Sources
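
Rule-based flagging combined with operation logging can be pictured with a minimal Python sketch. This is an illustration under assumptions, not the toolbox's code: the helper names `apply_flag_rules` and `log_step`, the flag codes "I" and "Q", and the thresholds are all hypothetical.

```python
from datetime import datetime


def apply_flag_rules(values, rules):
    """Return one flag string per value; each matching rule adds its code."""
    return ["".join(code for code, test in rules if test(v)) for v in values]


def log_step(history, message):
    """Append a timestamped entry to the processing history."""
    stamp = datetime.now().isoformat(timespec="seconds")
    history.append((stamp, message))


# Hypothetical rules: each pairs a flag code with a test on a single value.
rules = [
    ("I", lambda x: x < 0),    # invalid: negative salinity
    ("Q", lambda x: x > 40),   # questionable: unusually high salinity
]

salinity = [31.2, 35.8, -1.0, 52.4]
history = []

flags = apply_flag_rules(salinity, rules)
flagged = sum(1 for f in flags if f)
log_step(history, f"flagged {flagged} of {len(flags)} values using {len(rules)} rules")
```

Because the flags are derived from the stored rules rather than entered by hand, they can be recomputed whenever the values or the rules change, and each run leaves a history entry behind.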
Software – GCE Data Toolbox
- Support for Analysis
  - Descriptive Statistics, Reports
  - Visualization, Mapping
- Support for Synthesis
  - Composite Data Set Creation
    - Multiple Data Set Merge/Concatenation
    - Relational Join
    - Metadata Content Meshing
  - Data Set Summarization
    - Statistical Data Reduction/Re-sampling
  - Data Set Standardization
    - Unit Conversions (automatic, interactive)
    - Template-based Semantic Mapping
    - Automatic Semantic Mediation (prototype stage)
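
A unit conversion that keeps the metadata in step with the data might look like the following Python sketch. The column layout (a dictionary with `name`, `units`, and `values` keys) and the function `convert_units` are assumptions for illustration; the toolbox's own conversion functions are not shown in this presentation.

```python
def convert_units(column, factor, offset, new_units):
    """Return a converted copy of a column plus a note for the history log.

    The input column is left unchanged, so the original can still be audited.
    """
    converted = dict(column)
    converted["values"] = [v * factor + offset for v in column["values"]]
    converted["units"] = new_units
    note = f"converted {column['name']} from {column['units']} to {new_units}"
    return converted, note


# Feet to meters: 1 ft = 0.3048 m exactly.
depth_ft = {"name": "Depth", "units": "ft", "values": [0.0, 10.0, 100.0]}
depth_m, note = convert_units(depth_ft, factor=0.3048, offset=0.0, new_units="m")
```

Returning the note alongside the converted column makes it easy to append the operation to a processing history, so the units recorded in the metadata always match the stored values.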
Software – User Interfaces
- Unattended Batch Mode Processing
- Interactive Command Line Processing (conventional MATLAB UI)
  - Full help text for each function
  - Well-defined input/output arguments
- GUI Applications
  - Standard Forms, Dialogs, Controls
  - No MATLAB Experience Required
- WWW – MATLAB Web Server
  - HTML Forms, Querystring Input
  - HTML Pages and/or Static File Output
Command-Line Interface
GUI Applications
WWW Interface
Current Applications
- Automated Data Processing
  - Direct data import from data logger files, WWW data sources (USGS), SQL queries
  - Automatic metadata creation (templates, data mining)
  - Rule-based QA/QC flagging
- Data Set Packaging
  - Batch processing to create/update data, metadata products
  - On-demand generation of data, metadata, stat reports in custom formats (end-user scripts, GUI applications, WWW forms)
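
Template-based metadata creation, where boilerplate is applied to columns whose names match known parameters, can be sketched in Python as follows. The template contents, the column names, and the function `apply_template` are hypothetical; the toolbox's actual template format is not described here.

```python
# Hypothetical boilerplate metadata, keyed by column (parameter) name.
TEMPLATES = {
    "Salinity": {"units": "PSU", "description": "Practical salinity"},
    "Temp_Water": {"units": "deg C", "description": "Water temperature"},
}


def apply_template(column_names, templates):
    """Return per-column metadata, using boilerplate where a name matches.

    Unmatched columns get blank entries so they stand out for manual review.
    """
    blank = {"units": "", "description": ""}
    return [dict(templates.get(name, blank), name=name) for name in column_names]


meta = apply_template(["Salinity", "Temp_Water", "Mystery"], TEMPLATES)
```

For a routine data source such as a data logger with a fixed set of parameters, a lookup of this kind documents every known column at import time and leaves only new columns to describe by hand.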
Current Applications
- Data Exploration/Analysis by PIs
  - Descriptive Statistics based on attribute metadata
  - Visualization with Interactive Filtering (Frequency Histograms, 2D Plots, Map Plots)
- Data Reduction/Re-sampling to Provide Customized Data at Various “Scales”
  - Aggregated Statistics
  - Binned Statistics
  - Query/Filtering (sub-selection)
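
Binned statistics of the kind listed above reduce a dense series to one summary value per interval. The Python sketch below shows the idea with fixed-width bins and a mean; the function name `binned_mean` and the sample data are assumptions for illustration, not the toolbox's implementation.

```python
from statistics import mean


def binned_mean(times, values, bin_width):
    """Group values into fixed-width bins by time and average each bin.

    Returns a dict mapping each bin's start time to the mean of its values.
    """
    bins = {}
    for t, v in zip(times, values):
        start = (t // bin_width) * bin_width
        bins.setdefault(start, []).append(v)
    return {start: mean(vs) for start, vs in sorted(bins.items())}


# Hourly readings reduced to 3-hour means.
hours = [0, 1, 2, 3, 4, 5]
temps = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
result = binned_mean(hours, temps, bin_width=3)
```

Changing `bin_width` yields the same data at a coarser or finer scale, which is the sense in which re-sampling provides customized data for different uses.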
Current Applications
- Data Harvesting (GCE)
  - USGS Data (WWW real-time, daily, finalized data)
  - Campbell Scientific Data Arrays (post-processing triggered after LoggerNet Retrieval)
  - Sea-Bird Hydrographic Data
- USGS Data Harvesting Service for HydroDB
  - Weekly harvest for 31 stations/7 LTER Sites
  - Automatic Resampling, Unit Conversions, Q/C
Availability
- Description, Screen-shots, Fully-functional Toolbox Available on WWW: http://gce-lter.marsci.uga.edu/lter/research/tools/data_toolbox.htm
- Requires MATLAB 5.3, 6.0, 6.5 (any platform)
- “Public” Version Compiled
- Source Code Requests Considered on Case-by-Case Basis
Future Development Plans
- EML 2.0 Support
- Metadata-mediated Data Set Integration
  - Unit conversions
  - Re-sampling
- More WWW Interface Development