Long Term Ecological Research Network Office
Trends Project
Spaghetti & Linguine
(aka Trends Data Store)
Mark [email protected]
14 September 2006
LNO NIS
Table of Contents
• Background
• System Architecture
• System Workflow and Architecture Details
• Demonstration Screen Examples
Message from IMExec - Feb 2006
• “IMExec suggests that this activity be used to scope and determine the feasibility of using EML in the development of NIS modules for solving general synthesis problems.”
• “The premise of this project is that EML will adequately describe the data set (e.g., entities, attributes, physical characteristics) to allow the capture of distributed data sets into a central SQL database.”
• “Determining the nature of this model for dynamic data delivery – whether it is more site-loaded or more (network) service-loaded – is critical.”
• “IMExec suggests that the near-term Trends NIS module activity be focused on development of a prototype for demonstration at the ASM in September.”
Prerequisites
• Site data is documented with “rich” and “complete” EML
• Time-series data must be captured as “snap shots” for EML temporal coverage – i.e., no “continuous end date”
• Site data is open and accessible through a standard protocol such as HTTP
• Site EML documents are harvested on a regular basis into the LTER Metacat
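The snapshot prerequisite lends itself to an automated check. The sketch below is a minimal Python illustration, assuming EML 2.x documents whose temporal coverage is a `rangeOfDates` with `calendarDate` bounds; the functions `snapshot_end_date` and `is_snapshot` and the sample document (including its dates) are hypothetical, not part of the Trends code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: site EML whose temporal coverage is a closed
# "snapshot" range (packageId and dates are illustrative only).
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
    packageId="knb-lter-sev.354.3" system="knb">
  <dataset>
    <coverage>
      <temporalCoverage>
        <rangeOfDates>
          <beginDate><calendarDate>1989-01-01</calendarDate></beginDate>
          <endDate><calendarDate>2005-12-31</calendarDate></endDate>
        </rangeOfDates>
      </temporalCoverage>
    </coverage>
  </dataset>
</eml:eml>"""


def snapshot_end_date(eml_text):
    """Return the end date of the first temporal range, or None if open-ended."""
    root = ET.fromstring(eml_text)
    # EML 2.x child elements are unqualified, so a plain path works below
    # the namespaced <eml:eml> root element.
    node = root.find(".//temporalCoverage/rangeOfDates/endDate/calendarDate")
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


def is_snapshot(eml_text):
    """Hypothetical rule: only coverage with a fixed end date qualifies."""
    return snapshot_end_date(eml_text) is not None


if __name__ == "__main__":
    print(snapshot_end_date(SAMPLE_EML))
```

A harvest process could skip or flag any document for which `is_snapshot` returns False, since an open-ended coverage cannot say which observations a given revision contains.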
What is EML?
Ecological Metadata Language is…
• An ecological metadata standard
• Very extensible; it can be used to describe many different types of data
• Comprehensive and supports a rich set of constructs to fully describe data, including
– how to access distributed data
– its logical and physical structure
• Defined by an XML Schema
• For further information:
– http://knb.ecoinformatics.org/software/eml/
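Because EML records both the logical structure (attribute names) and the physical layout (header lines, delimiter, online location) of a table, a client can read a site's text file without site-specific code. A minimal sketch, assuming a trimmed EML 2.x fragment (required elements such as `attributeDefinition` and `measurementScale` are omitted, so it is not schema-valid); the packageId, URL, sample row, and helper names are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical, trimmed EML 2.x fragment (not schema-valid): the packageId,
# URL, and attribute subset are illustrative only.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
    packageId="knb-lter-site.1.1" system="knb">
  <dataset>
    <dataTable>
      <entityName>MCM Canada Glacier Wind</entityName>
      <physical>
        <objectName>canada_wind.csv</objectName>
        <dataFormat>
          <textFormat>
            <numHeaderLines>1</numHeaderLines>
            <attributeOrientation>column</attributeOrientation>
            <simpleDelimited><fieldDelimiter>,</fieldDelimiter></simpleDelimited>
          </textFormat>
        </dataFormat>
        <distribution><online><url>http://example.org/canada_wind.csv</url></online></distribution>
      </physical>
      <attributeList>
        <attribute><attributeName>date_time</attributeName></attribute>
        <attribute><attributeName>wdir</attributeName></attribute>
        <attribute><attributeName>wspd</attributeName></attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml:eml>"""


def describe_table(eml_text):
    """Collect the logical and physical description of the first dataTable."""
    table = ET.fromstring(eml_text).find(".//dataTable")
    text_format = table.find("physical/dataFormat/textFormat")
    return {
        "entity": table.findtext("entityName"),
        "url": table.findtext("physical/distribution/online/url"),
        "header_lines": int(text_format.findtext("numHeaderLines", "0")),
        "delimiter": text_format.findtext("simpleDelimited/fieldDelimiter", ","),
        "attributes": [a.findtext("attributeName")
                       for a in table.iterfind("attributeList/attribute")],
    }


def read_rows(description, text):
    """Parse delimited text using only what the EML says about its layout."""
    lines = text.splitlines()[description["header_lines"]:]
    reader = csv.reader(lines, delimiter=description["delimiter"])
    return [dict(zip(description["attributes"], row)) for row in reader]


if __name__ == "__main__":
    desc = describe_table(SAMPLE_EML)
    print(read_rows(desc, "date_time,wdir,wspd\n2005-01-01 00:15,182,4.2\n"))
```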
What is Metacat?
Metacat is…
• A storage system for metadata and data (optimized for use with EML)
• Built on top of a relational database system using Java servlets
• Requires metadata to be in XML format
• Provides a customizable web interface
• Supports point-to-point replication
• For further information:
– http://knb.ecoinformatics.org/software/metacat/
Trends Data Store Architecture
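[Architecture diagram (labelled “Trends Data Warehouse”): site sources (Source A, Source B, Source C) publish EML that the Metacat/Harvester collects; the Dataset Registry and EML Parser/Loader move registered data into the Primary Database (1°, source data); Data Integration/Transformation, 1° → f(x) → 2°, writes to the Secondary Database (2°, derived data), which the Store Front serves as HTML and SOAP; the EML Factory uses Trends Metadata (derived metadata, source provenance, integration methods, Trends contact) to produce EML.xml for Metacat.]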
Generalized Workflow
1. Sites collect and document time-series data (e.g., climate, socioeconomics, …)
2. Sites update EML with a new revision
3. EML is harvested into Metacat
4. EML Loader/Parser loads the new/updated dataset into the primary database
5. Data integration/transformation converts “raw” data into “derived” data
6. Derived data is stored in the secondary database
7. EML is generated for the derived data and stored in Metacat
8. Derived data is made available to the store front
Decomposed Workflow
LTER Site Data Collection
• Time-series data
– Physical environment (e.g., climate, …)
– Human population and economy
– Biogeochemistry
– Biotic structure
• Data/metadata
– Relational database
– Spreadsheet
– Text file
– HTML/XML
EML, Metacat, and the Harvester
• EML Package ID: knb-lter-site.XX.YY (e.g., knb-lter-sev.354.1, knb-lter-sev.354.2, knb-lter-sev.354.3)
• Metacat stores the XML of EML; new revisions take precedence – old revisions are deprecated, but not deleted
• Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat
[Diagram: Sources A, B, and C publish EML, which the Metacat/Harvester collects; this layer is marked “independent of the Trends Project”.]
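The package ID convention and the “newest revision takes precedence” rule can be sketched as follows. This is an in-memory illustration under stated assumptions, not Metacat's or the Harvester's actual API: `PackageId`, `RevisionStore`, `harvest`, and the `fetch` callback (standing in for an HTTP pull from the site) are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional


class PackageId(NamedTuple):
    """EML package ID of the form scope.identifier.revision."""
    scope: str
    identifier: int
    revision: int

    @classmethod
    def parse(cls, text: str) -> "PackageId":
        # rsplit from the right so hyphens in the scope ("knb-lter-sev") survive
        scope, identifier, revision = text.rsplit(".", 2)
        return cls(scope, int(identifier), int(revision))


class RevisionStore:
    """Hypothetical in-memory stand-in for Metacat's revision behaviour:
    every revision is kept (old ones deprecated, not deleted), and the
    highest revision takes precedence."""

    def __init__(self) -> None:
        self.docs: Dict[PackageId, str] = {}

    def insert(self, package_id: str, eml_xml: str) -> None:
        self.docs[PackageId.parse(package_id)] = eml_xml

    def latest(self, scope: str, identifier: int) -> Optional[PackageId]:
        same = [p for p in self.docs if p.scope == scope and p.identifier == identifier]
        return max(same, key=lambda p: p.revision, default=None)


def harvest(store: RevisionStore, site_listing: Iterable[str],
            fetch: Callable[[str], str]) -> List[str]:
    """One scheduled harvest pass: pull and insert only revisions not yet stored."""
    pulled = []
    for package_id in site_listing:
        if PackageId.parse(package_id) not in store.docs:
            store.insert(package_id, fetch(package_id))
            pulled.append(package_id)
    return pulled


if __name__ == "__main__":
    store = RevisionStore()
    fetch = lambda pid: f"<eml packageId='{pid}'/>"  # placeholder for an HTTP GET
    harvest(store, ["knb-lter-sev.354.1", "knb-lter-sev.354.2"], fetch)
    print(harvest(store, ["knb-lter-sev.354.2", "knb-lter-sev.354.3"], fetch))
    print(store.latest("knb-lter-sev", 354))
```

Running the harvest on a timer and comparing against already-stored IDs keeps each pass idempotent: a revision that is already present is never pulled twice.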
EML Loader/Parser
• Dataset registry identifies Trends data in Metacat
• New revisions assert a “new” data load. The EML parser/loader:
– Translates the site EML into RDBMS DDL
– Creates a new DB table in the primary database based on the revision
– Loads the new data into the primary database
– Triggers the next step of the workflow
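[Diagram: Sources A, B, and C publish EML to the Metacat/Harvester; the Dataset Registry identifies Trends datasets, and the EML Parser/Loader loads them into the primary database (1°).]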
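A minimal sketch of the loader, assuming the site EML has already been parsed into (attributeName, measurementScale) pairs plus data rows, and using SQLite in place of the primary RDBMS. The registry contents, the scale-to-type mapping, and the table-naming rule (one table per revision, named after the package ID) are assumptions for illustration.

```python
import sqlite3
from typing import Iterable, Sequence, Tuple

# Hypothetical dataset registry: (scope, identifier) pairs flagged as Trends data.
REGISTRY = {("knb-lter-site", 1)}

# Hypothetical mapping from EML measurementScale names to SQLite column types.
SCALE_TO_SQL = {"dateTime": "TEXT", "ratio": "REAL", "interval": "REAL",
                "nominal": "TEXT", "ordinal": "TEXT"}


def table_name(package_id: str) -> str:
    """One table per revision: knb-lter-site.1.2 -> knb_lter_site_1_2."""
    return package_id.replace("-", "_").replace(".", "_")


def eml_to_ddl(package_id: str, attributes: Sequence[Tuple[str, str]]) -> str:
    """Translate EML (attributeName, measurementScale) pairs into CREATE TABLE DDL."""
    columns = ", ".join(f'"{name}" {SCALE_TO_SQL[scale]}' for name, scale in attributes)
    return f'CREATE TABLE "{table_name(package_id)}" ({columns})'


def load_revision(conn: sqlite3.Connection, package_id: str,
                  attributes: Sequence[Tuple[str, str]],
                  rows: Iterable[tuple]) -> bool:
    """Create the revision's table in the primary DB and load its rows."""
    scope, identifier, _revision = package_id.rsplit(".", 2)
    if (scope, int(identifier)) not in REGISTRY:
        return False  # not registered as Trends data: nothing to load
    conn.execute(eml_to_ddl(package_id, attributes))
    placeholders = ", ".join("?" for _ in attributes)
    conn.executemany(
        f'INSERT INTO "{table_name(package_id)}" VALUES ({placeholders})', rows)
    conn.commit()
    return True  # a caller would now trigger the transformation step
```

Because each revision gets its own table, an older revision's data stays queryable after a new revision is loaded, much as Metacat deprecates but keeps old EML.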
Data Transformation
• Primary DB (1°) stores site data in native schema
• Transformation module reads native schema, performs transformation/integration, and writes to global schema
• Secondary DB (2°) stores derived data in consistent global schema

1° → f(x) → 2°

Source table (1°): MCM Canada Glacier Wind
Column | Description | Units / interval
date_time | Timestamp of observation | 15 min interval
wdir | Wind direction (azimuth) |
wdirstd | Standard deviation of wind direction |
wspd | Wind speed | meters/second
wspdmax | Maximum wind speed | meters/second
wpsdmin | Minimum wind speed | meters/second

Derived datasets (2°), each a daily timestamp and a value:
• Wind direction (knb-eco-trends.1.1)
• Wind direction std dev (knb-eco-trends.2.1)
• Wind speed max (knb-eco-trends.5.1)
• …

Transformation is “triggered by data load”.
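The f(x) step depends on the variable. The sketch below assumes two plausible aggregation rules that the slide does not spell out: a daily maximum (as for wind speed max, knb-eco-trends.5.1) and a daily vector mean for wind direction, which avoids averaging 350° and 10° to 180°. Timestamps are assumed to be strings beginning with `YYYY-MM-DD`; the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (timestamp "YYYY-MM-DD hh:mm", value)


def daily_max(rows: Iterable[Row]) -> List[Row]:
    """Assumed rule: daily maximum, e.g. 15-minute wspdmax -> daily wind speed max."""
    best: Dict[str, float] = {}
    for timestamp, value in rows:
        day = timestamp[:10]
        best[day] = max(value, best.get(day, value))
    return sorted(best.items())


def daily_mean_direction(rows: Iterable[Row]) -> List[Row]:
    """Assumed rule: daily vector (circular) mean of azimuths in degrees."""
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for timestamp, degrees in rows:
        acc = sums[timestamp[:10]]
        acc[0] += math.sin(math.radians(degrees))
        acc[1] += math.cos(math.radians(degrees))
    # atan2 of the summed unit vectors gives the mean bearing; rounding removes
    # floating-point noise, and % 360 maps the result into [0, 360).
    return sorted((day, round(math.degrees(math.atan2(s, c)), 6) % 360)
                  for day, (s, c) in sums.items())
```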
Global Schema
Table knb_eco_trends_1_1 (columns: scope, identifier, revision)
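One possible reading of the global schema, sketched with SQLite: every derived dataset gets a table named after its package ID (knb-eco-trends.1.1 becomes knb_eco_trends_1_1), and each row carries the scope, identifier, and revision of the source package alongside a daily timestamp and value. The provenance interpretation of scope/identifier/revision, the timestamp/value columns, and the helper `store_derived` are assumptions.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical global-schema DDL: scope/identifier/revision are assumed to
# identify the source package revision; timestamp/value hold the daily series.
GLOBAL_DDL = """CREATE TABLE IF NOT EXISTS "{table}" (
    scope      TEXT    NOT NULL,   -- source scope, e.g. knb-lter-sev
    identifier INTEGER NOT NULL,
    revision   INTEGER NOT NULL,
    timestamp  TEXT    NOT NULL,   -- daily timestamp, YYYY-MM-DD
    value      REAL,
    PRIMARY KEY (scope, identifier, revision, timestamp)
)"""


def store_derived(conn: sqlite3.Connection, derived_id: str, source_id: str,
                  daily_rows: Iterable[Tuple[str, float]]) -> str:
    """Write daily (timestamp, value) pairs into the derived dataset's table."""
    table = derived_id.replace("-", "_").replace(".", "_")
    scope, identifier, revision = source_id.rsplit(".", 2)
    conn.execute(GLOBAL_DDL.format(table=table))
    conn.executemany(
        f'INSERT OR REPLACE INTO "{table}" VALUES (?, ?, ?, ?, ?)',
        [(scope, int(identifier), int(revision), ts, v) for ts, v in daily_rows])
    conn.commit()
    return table
```

Keeping the same five columns in every derived table is what makes the secondary database “consistent”: the store front can query any derived dataset the same way.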
EML for the “derived” data
• EML Factory generates EML metadata for the derived data and inserts it into Metacat
• Derived data is now accessible through the Metacat user interface
[Diagram: secondary database (2°), EML Factory, Trends Metadata (derived metadata, source provenance, integration methods, Trends contact), and Metacat/Harvester; the EML Factory produces EML.xml for Metacat.]
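A minimal sketch of the EML Factory using Python's `xml.etree.ElementTree`. It assumes the EML 2.0.1 namespace and records source provenance and the integration method as plain `para` text inside `methods/methodStep/description`; the output is a simplified record that has not been validated against the EML schema, and the function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.0.1"  # assumed EML version


def make_derived_eml(package_id, title, source_ids, method_text, contact_org):
    """Hypothetical factory: minimal EML for a derived dataset (not schema-validated)."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "knb"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title            # derived metadata
    for role in ("creator", "contact"):                      # Trends contact
        party = ET.SubElement(dataset, role)
        ET.SubElement(party, "organizationName").text = contact_org
    step = ET.SubElement(ET.SubElement(dataset, "methods"), "methodStep")
    description = ET.SubElement(step, "description")
    ET.SubElement(description, "para").text = method_text   # integration method
    for source in source_ids:                                # source provenance
        ET.SubElement(description, "para").text = f"Derived from {source}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_derived_eml("knb-eco-trends.5.1", "Wind speed max (daily)",
                           ["knb-lter-site.1.2"],
                           "Daily maximum of 15-minute wspdmax values.",
                           "LTER Network Office"))
```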
Store Front
• Store Front provides an API to the derived data products in the secondary DB
• HTML – today
• Web service – tomorrow
• Issues:
– Authentication
– Authorization
– Provenance
– Quality
– Interactive plots
[Diagram: secondary database (2°) → Store Front, delivering HTML and SOAP]
http://fire.lternet.edu/Trends (beta site location)
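For the “HTML today” path, a store front can be as small as a function that renders one derived table as an HTML page. A minimal sketch, assuming a SQLite table with `timestamp` and `value` columns; the `PUBLISHED` allowlist is a hypothetical stand-in for the authorization issue listed above, and it also keeps arbitrary table names out of the SQL string.

```python
import html
import sqlite3

# Hypothetical allowlist of derived tables the store front may serve.
PUBLISHED = {"knb_eco_trends_1_1"}


def render_html(conn: sqlite3.Connection, table: str, title: str) -> str:
    """Render one derived dataset as a simple HTML table, oldest day first."""
    if table not in PUBLISHED:
        raise KeyError(f"unknown derived dataset: {table}")
    rows = conn.execute(
        f'SELECT timestamp, value FROM "{table}" ORDER BY timestamp').fetchall()
    body = "".join(f"<tr><td>{html.escape(ts)}</td><td>{value}</td></tr>"
                   for ts, value in rows)
    safe_title = html.escape(title)
    return (f"<html><head><title>{safe_title}</title></head><body>"
            f"<h1>{safe_title}</h1>"
            f"<table><tr><th>Timestamp</th><th>Value</th></tr>{body}</table>"
            f"</body></html>")
```

A later web-service front end could reuse the same query and allowlist and change only the serialization.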
HTML Store Front (evolution in progress)
Animated Workflow
[Animated version of the architecture diagram (Sources A, B, C; Metacat/Harvester; Dataset Registry; EML Parser/Loader; 1° → f(x) → 2°; EML Factory with Trends Metadata and EML.xml; Store Front serving HTML and SOAP), stepping through Steps 1 to 6]
Thank You – The End