
3SAQS Technical Workshop October 31 – November 1, 2013 Data Warehouse Status and Planning Update

Page 1: 3SAQS Technical Workshop October 31 – November 1, 2013 Data Warehouse Status and Planning Update

3SAQS Technical Workshop
October 31 – November 1, 2013

Data Warehouse Status and Planning Update

Zac Adelman (UNC-IE)
Shawn McClure (CSU-CIRA)
Tom Moore (WGA-WRAP)

Summary of Past Quarter Activities

• Researched and experimented with large data transfer technologies (iRODS, Globus Connect, etc.)

• Configured a large dual RAID array on the primary file server (~20TB) and designed a third RAID array to bring the total storage capacity to 50TB+

• Imported the WestJump source data files onto the primary file server and organized them into a uniform folder structure (meteorology, emissions, results)

• Created an FTP site on the primary file server for facilitating direct, basic access to the source data files

• Made available the current inventory of source data files on the new FTP site

• Began the design of the content, format, and coding protocols for submitting model results and other data to the TSDW

• Began the design of the schema and code infrastructure for the “project overview and tracking” system

• Continued to refine the database, software, and website infrastructure supporting the data warehouse

• Continued to refine various pre-processing components
o XML Generator for metadata
o Boundary Conditions Generator
o CAMx Post Processing Utility
o RDBMS data import system

• Refined the logical and physical file system design

• Refined the data verification and validation system

Operational Website Components

• User Login Form

• User Registration/Modification Form

• User Profile/Account Form

• User Feedback Form

• Dataset Request Form

• Database Query Wizard

• Raw Data Download

• Interactive Charts

• Dynamic Contour Maps

• Site Metadata Reports

• Monitoring Site Metadata Browser

• File Explorer

• FTP site

Authentication and Authorization System

Possible Future Website Components

• Modeled Emissions Summary Tool

• Modeled-to-Observed Data Comparison Tool

• Air Quality Summary Reports

o Visibility

o Deposition

o Ozone

o Other

• Model Data Mapping Tool

• Source Apportionment Tool

• Various Unpublished Monitoring Data Tools

• Backend Web Services and Processing Components

Summary of Coming Activities

• Conduct additional use case tests

• Finalize the large data transfer system

• Import preexisting/legacy air quality studies and results

• Commence production-level data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)

• Design visualization and analysis tools for modeling results and performance evaluation

• Design the “project overview and tracking” interface for the TSDW website

TSDW Architecture Diagram - Overview

[Diagram not reproduced. Labels recoverable from the slide: Source Data; Data Files; NPS, BLM, USFS, EPA, States; IRMA, NRIS, AQS; Other Data Systems; Users and Providers; Data Acquisition and Import System; TSDW Data Management; RDBMS Spatial DB; TSDW Software Libraries; Data Access Layer; Generalized Software Libraries; Air Quality-Specific Software Libraries; Third Party Software Libraries; Web Services; Standard API (JSON, HTTP, OGC, XML); External TSDW Interface; TSDW Website; Data Services; TSDW FTP Site]

TSDW Data Flow Diagram - Overview

[Diagram not reproduced. Labels recoverable from the slide:

· Data Sources: State & Local Agencies, EPA, Mexico, Canada, etc
· Emissions Inventories, Source Categories: Point & Area Sources, Oil and Gas, Biogenic, Fire (anthro, natural), etc
· Emissions Modeling (e.g. SMOKE); BEIS; MOVES
· Meteorological Inputs: Weather Observations, Landuse/Landcover, Initial Conditions, Physics Options
· Meteorological Models (e.g. WRF, MM5); Met Data Processing (e.g. MCIP2)
· Model Inputs: Land Use & Cover, Boundary Conditions, Initial Conditions, Photolysis Rates
· Model-Ready Processing (e.g. reformatting, regridding), shown three times; Model-Ready Input Data; Data Provider Processing
· Photochemical Grid Modeling: CMAQ, CAMx
· Air Quality Modelers; Gridded Model Results; DBMS-Ready Model Results
· Monitoring Data: 3SAQS, AQS, IMPROVE, CASTNet
· Three State Data Warehouse: File Server, Database Server, Website, Web Services
· Planners, Stakeholders, and Users; Products, Reports, and Analyses; Oil and Gas Permits Recommendations]

TSDW Use Cases

Definition of "Use Case": A list of steps defining the interactions between a user and a system to achieve a specific goal. The "user" can be a human or an external system, depending on context.

Scopes of Use Cases: The subset of users to which the functionality of a given use case is made available

• Internal: The TSDW administration and development team
• External: A subset of external users that have been granted a specific role
• Public: The general public - anyone who visits the TSDW website

Potential User Roles:

• Administrators
• Project Managers
• Project Team Members
• Stakeholders
• Data Providers
• Planners
• Public
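
The scope and role definitions above could be modeled roughly as follows. This is only a sketch: the slide does not say which role falls under which scope, so the `ROLE_SCOPE` mapping, the `can_run` helper, and the assumption that scopes nest (Internal users can also reach External and Public use cases) are all hypothetical.

```python
from enum import Enum


class Scope(Enum):
    INTERNAL = 1  # TSDW administration and development team
    EXTERNAL = 2  # external users granted a specific role
    PUBLIC = 3    # anyone who visits the TSDW website


# Hypothetical assignment of the slide's roles to scopes (not from the source).
ROLE_SCOPE = {
    "Administrators": Scope.INTERNAL,
    "Project Managers": Scope.EXTERNAL,
    "Project Team Members": Scope.EXTERNAL,
    "Stakeholders": Scope.EXTERNAL,
    "Data Providers": Scope.EXTERNAL,
    "Planners": Scope.EXTERNAL,
    "Public": Scope.PUBLIC,
}


def can_run(role: str, use_case_scope: Scope) -> bool:
    """Assume scopes nest: a role may run any use case whose scope is
    equal to or broader than the role's own scope."""
    return ROLE_SCOPE[role].value <= use_case_scope.value
```

Under these assumptions, `can_run("Planners", Scope.EXTERNAL)` is true while `can_run("Public", Scope.EXTERNAL)` is false.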

Use Case Description

Obtain and Manage Model Input Data (Scope: Internal)

1. Obtain model input data from data provider(s)
2. Copy model input data files to file server
3. Organize model input data on the file server
   a. File and folder naming convention
   b. Physical file system organization (what developers see)
   c. Logical file system organization (what the user sees)
   d. Dataset partitioning (temporal, spatial, functional, etc.)
4. Perform periodic backup of "active" model input data
5. Perform periodic archival of "inactive" model input data
6. Track and manage the versioning of the model input data
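
One way to picture the split between the logical and physical file system organization in step 3 is a small path-mapping function. The slides do not give the actual layout, so the root `/data/tsdw`, the `project/category/...` logical form, and the `physical_path` helper are assumptions made for illustration; only the three category names come from the deck.

```python
from pathlib import PurePosixPath

# Hypothetical physical root on the file server.
PHYSICAL_ROOT = PurePosixPath("/data/tsdw")
# Folder categories named in the deck.
CATEGORIES = {"meteorology", "emissions", "results"}


def physical_path(logical: str) -> PurePosixPath:
    """Map a logical path of the assumed form 'project/category/...'
    to a physical path of the assumed form 'root/category/project/...'."""
    parts = PurePosixPath(logical).parts
    if len(parts) < 3 or parts[1] not in CATEGORIES or ".." in parts:
        raise ValueError(f"unexpected logical path: {logical}")
    project, category, *rest = parts
    return PHYSICAL_ROOT / category / project / PurePosixPath(*rest)
```

Keeping the mapping in one function means the physical layout can change (for example, when a new RAID array is added) without changing what users see.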

Use Case Description

Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)

1. An administrator locates the desired root folder in the file system
2. An administrator executes the XML Generator program to produce XML files containing file metadata
3. (Ideally, the above two tasks could be automatically run as a "cron" task on a regular, periodic basis, rather than as a two-step manual process.)
4. The File Indexing Utility (FIU) processes the newly-generated files to extract the relevant file metadata
5. The FIU updates the RDBMS with the file metadata
6. The new file metadata is automatically reflected in the TSDW File Explorer Tool

Dependencies:

· The XML File Metadata Generator program
· The File Indexing Utility (FIU)
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing source file metadata
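
As a rough illustration of steps 1 and 2, the sketch below walks a root folder and emits one XML element per file. It is not the TSDW XML Generator: the element and attribute names (`files`, `file`, `path`, `size_bytes`, `modified_utc`) and the function name are invented here, and the real program may record different metadata.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_file_metadata(root_dir: str) -> ET.Element:
    """Walk root_dir and return an XML tree with one <file> element per file."""
    root = ET.Element("files", {"root": os.path.abspath(root_dir)})
    for folder, _subdirs, names in os.walk(root_dir):
        for name in sorted(names):
            path = os.path.join(folder, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            ET.SubElement(root, "file", {
                "path": os.path.relpath(path, root_dir),  # relative to the root folder
                "size_bytes": str(info.st_size),
                "modified_utc": modified.isoformat(),
            })
    return root


def write_file_metadata(root_dir: str, xml_path: str) -> None:
    """Serialize the metadata tree to an XML file for a later indexing step."""
    ET.ElementTree(build_file_metadata(root_dir)).write(
        xml_path, encoding="utf-8", xml_declaration=True
    )
```

A script like this is easy to schedule, which is the automation step 3 hopes for.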

Use Case Description

Download Model Input Data from TSDW, Online Method (Scope: External)

1. User logs into the TSDW website
2. User fills out the Dataset Request form
3. The user is redirected to the Dataset Request confirmation message/page
4. The DR form is passed to the Dataset Packaging System (DPS)
5. The DPS registers metadata about the request into the RDBMS
6. The DPS locates the physical files that are needed to fulfill the order
7. The DPS assembles, organizes, and compresses the component files into a downloadable "package"
8. The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
9. The DPS registers metadata about the package (including the "PackageID") into the RDBMS
10. The DPS notifies the requesting user of the package's availability
11. The user logs back into the TSDW website (if necessary)
12. The user initiates a session of the Dataset Transfer System (DTS) to download the files
13. The DTS registers metadata about the package "receipt" into the RDBMS
14. The DIS notifies the appropriate TSDW administrator(s) of the download

Dependencies:

· Dataset Request Form
· Dataset Request confirmation message/page
· Dataset Packaging System (DPS) (could be one-and-the-same with iRODS or Globus)
· Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
· Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
· A high volume data transfer program such as iRODS or Globus Connect Server
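
Steps 7 through 9 (assemble, assign a PackageID, register metadata) might look something like the sketch below. It stands in for the DPS only loosely: the `package` table, its columns, the use of a random UUID as the PackageID, and SQLite as the RDBMS are all assumptions, and file names are assumed to be unique within one package.

```python
import sqlite3
import tarfile
import uuid
from pathlib import Path


def build_package(files, out_dir, db):
    """Compress the given files into one archive and register it in the database.

    files:   list of file paths to include (assumed to have unique names)
    out_dir: folder that receives the archive
    db:      open sqlite3 connection standing in for the RDBMS
    """
    package_id = uuid.uuid4().hex  # hypothetical PackageID format
    archive = Path(out_dir) / f"{package_id}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for f in files:
            tar.add(str(f), arcname=Path(f).name)
    db.execute(
        "CREATE TABLE IF NOT EXISTS package "
        "(package_id TEXT PRIMARY KEY, archive TEXT, file_count INTEGER)"
    )
    db.execute(
        "INSERT INTO package VALUES (?, ?, ?)",
        (package_id, str(archive), len(files)),
    )
    db.commit()
    return package_id
```

Returning the PackageID lets later steps (notification, download receipt, and the upload form's reference to the input data used) refer to the same record.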

Use Case Description

"Download" Model Input Data from TSDW, Offline Method (Scope: External)

1. User logs into the TSDW website
2. User fills out the Dataset Request form
3. The DR form is passed to the Dataset Packaging System (DPS)
4. The DPS registers metadata about the request into the RDBMS
5. The DPS locates the physical files that are needed to fulfill the order
6. The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
7. The DPS registers metadata about the package (including the "PackageID") into the RDBMS
8. The DPS notifies the requesting user of the order receipt and future hard drive shipment
9. The DPS sends a list of the files that comprise the order to a TSDW administrator
10. A TSDW administrator copies the selected files onto a hard disk drive (HDD) or drives
11. A TSDW administrator mails the drive(s) to the requesting user
12. A TSDW administrator records the shipment in the RDBMS

Dependencies:

· Dataset Request Form
· Dataset Request confirmation message/page
· Dataset Packaging System (DPS)
· Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
· Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
· A manual process for copying data files onto hard disks and mailing them to users
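
The file list sent to the administrator in step 9 could carry sizes and checksums so the copy onto the drive can be verified afterwards. The slides mention only a list of files; the checksum column and the `file_manifest` helper below are additions made for illustration.

```python
import hashlib
from pathlib import Path


def file_manifest(paths):
    """Return one (name, size_bytes, sha256_hex) tuple per file."""
    rows = []
    for p in map(Path, paths):
        digest = hashlib.sha256()
        with p.open("rb") as fh:
            # Read in 1 MiB chunks so large model files need not fit in memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        rows.append((p.name, p.stat().st_size, digest.hexdigest()))
    return rows
```

Running the same function against the files on the drive and comparing the two manifests is a simple check before mailing.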

Use Case Description

Download Boundary Conditions Generator (Scope: External)

1. User logs into the TSDW website
2. User navigates to the Modeling Utilities section of the website
3. User fills out the Boundary Conditions Generator (BCG) download form
4. The BCG download form is passed to the Utility Tracking System (UTS)
5. The UTS extracts information from the metadata file associated with the current BCG
6. The UTS associates this metadata with the appropriate User record in the RDBMS
7. The UTS redirects the user to a download link for the BCG
8. The user downloads the BCG and any associated instructions and configuration files
9. The DIS notifies the appropriate TSDW administrator(s) of the download

Dependencies:

· Boundary Conditions Generator (BCG) program
· BCG user guide
· BCG download form
· BCG download confirmation message/page and installation file link
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing BCG download metadata

Use Case Description

Download the CAMx Post-Processing Utility (Scope: External)

1. User logs into the TSDW website
2. User navigates to the Modeling Utilities section of the website
3. User fills out the CAMx Post-Processing Utility (CPPU) download form
4. The CPPU download form is passed to the Utility Tracking System (UTS)
5. The UTS extracts information from the metadata file associated with the current CPPU
6. The UTS associates this metadata with the appropriate User record in the RDBMS
7. The UTS redirects the user to a download link for the CPPU
8. The user downloads the CPPU and any associated instructions and configuration files
9. The DIS notifies the appropriate TSDW administrator(s) of the download

Dependencies:

· CAMx Post-Processing Utility (CPPU) program
· CPPU user guide
· CPPU download form
· CPPU download confirmation message/page and installation file link
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing CPPU download metadata
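
For both utility downloads (BCG and CPPU), steps 5 and 6 amount to recording which version each user received. A minimal sketch, assuming SQLite in place of the RDBMS and a hypothetical `utility_download` table; the real UTS schema is not described in the slides.

```python
import sqlite3


def record_download(db, user_id, utility, version):
    """Associate a utility version with a user record."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS utility_download ("
        "user_id TEXT, utility TEXT, version TEXT, "
        "downloaded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    db.execute(
        "INSERT INTO utility_download (user_id, utility, version) VALUES (?, ?, ?)",
        (user_id, utility, version),
    )
    db.commit()


def versions_downloaded(db, user_id, utility):
    """List the distinct versions of a utility that a user has downloaded."""
    rows = db.execute(
        "SELECT DISTINCT version FROM utility_download "
        "WHERE user_id = ? AND utility = ? ORDER BY version",
        (user_id, utility),
    )
    return [r[0] for r in rows]
```

A record like this is what would let an uploaded result's "Version ID" be checked against what the user actually downloaded.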

Use Case Description

Upload Model Results (Scope: External)

1. User logs into the TSDW website
2. User navigates to the Modeling Results Upload section of the website
3. User fills out the Modeling Results Upload form
   a. User provides a standard description of the model results
   b. User provides the "PackageID" of the model input data used
   c. User provides the Boundary Conditions Generator "Version ID", if relevant
   d. User provides the CAMx Post-Processing Utility "Version ID", if relevant
   e. User selects the files to upload
   f. User clicks the "Submit" button on the form
4. The Modeling Results Upload form is passed to the Data Import System (DIS)
5. The data files are uploaded and cataloged by the DIS
6. The DIS creates a unique "DatasetID" that will be linked to this upload throughout its lifecycle
7. The DIS registers metadata about the upload (including the "DatasetID") into the RDBMS
8. The DIS notifies the uploading user of the upload success or failure (generally, its "status")
9. The DIS places the file(s) into the appropriate location(s) on the TSDW file system
10. The DIS notifies the appropriate TSDW administrator(s) of the upload

Dependencies:

· Modeling Results Upload (MRU) form
· MRU system
· Appropriate RDBMS schema and SQL scripts/commands for managing MRU metadata
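
The form fields in step 3 and the status reported in step 8 suggest a simple validate-then-register shape. The sketch below is illustrative only: the field names, the `received`/`failed` status values, and the UUID-based DatasetID are assumptions, not the MRU system's actual design.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelResultsUpload:
    description: str                   # standard description of the model results
    package_id: str                    # PackageID of the model input data used
    files: List[str] = field(default_factory=list)
    bcg_version: Optional[str] = None  # Version ID, if relevant
    cppu_version: Optional[str] = None # Version ID, if relevant


def register_upload(form: ModelResultsUpload) -> dict:
    """Check required fields and, if they pass, assign a DatasetID."""
    problems = []
    if not form.description.strip():
        problems.append("description is required")
    if not form.package_id.strip():
        problems.append("package_id is required")
    if not form.files:
        problems.append("at least one file is required")
    if problems:
        return {"status": "failed", "problems": problems, "dataset_id": None}
    return {"status": "received", "problems": [], "dataset_id": uuid.uuid4().hex}
```

The returned dictionary is the kind of payload that could drive both the user notification and the administrator notification.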

Use Case Description

Import Database-Ready Model Results (Scope: Internal)

1. An administrator locates the newly-imported model results (which have been generated by the CPPU and uploaded to the TSDW)
2. An administrator executes the appropriate scripts/commands using the Data Import System (DIS)
3. The DIS reads and imports the database-ready model results into the RDBMS
   a. The DIS verifies that all the necessary metadata is present in the RDBMS
   b. The DIS transforms the data into the appropriate schema for import
   c. The DIS maps source codes and names to internal codes and names, as needed
   d. The DIS imports the data from the source file(s) into the RDBMS
   e. The DIS makes/updates the appropriate metadata records in the RDBMS for tracking the imported model dataset
   f. The imported model results become automatically available via the relevant tools on the TSDW website

Dependencies:

· The CAMx Post-Processing Utility (CPPU) for generating the database-ready model results
· The Data Import System (DIS)
· Appropriate RDBMS schema and SQL scripts/commands for managing Model Results metadata
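
Sub-steps 3a through 3e can be sketched as a single import function. Everything concrete here is assumed for illustration: SQLite stands in for the RDBMS, the `dataset` and `result` tables and the `PARAMETER_MAP` code mapping are invented, and real model results would have many more columns.

```python
import sqlite3

# Hypothetical mapping from source parameter codes to internal names (step 3c).
PARAMETER_MAP = {"O3": "ozone", "PM25": "pm2_5"}


def create_schema(db):
    db.execute("CREATE TABLE IF NOT EXISTS dataset "
               "(dataset_id TEXT PRIMARY KEY, imported INTEGER DEFAULT 0)")
    db.execute("CREATE TABLE IF NOT EXISTS result "
               "(dataset_id TEXT, parameter TEXT, value REAL)")


def import_results(db, dataset_id, rows):
    """Import a list of (source_code, value) rows for a registered dataset."""
    # 3a: the dataset's metadata must already be present.
    found = db.execute("SELECT 1 FROM dataset WHERE dataset_id = ?",
                       (dataset_id,)).fetchone()
    if found is None:
        raise LookupError(f"no metadata registered for dataset {dataset_id}")
    # 3c: reject codes that cannot be mapped rather than importing them as-is.
    unknown = sorted({code for code, _ in rows if code not in PARAMETER_MAP})
    if unknown:
        raise ValueError(f"unmapped source codes: {unknown}")
    # 3b, 3d, 3e: transform, insert, and update tracking metadata in one transaction.
    with db:
        db.executemany(
            "INSERT INTO result VALUES (?, ?, ?)",
            [(dataset_id, PARAMETER_MAP[code], float(value)) for code, value in rows],
        )
        db.execute("UPDATE dataset SET imported = 1 WHERE dataset_id = ?",
                   (dataset_id,))
    return len(rows)
```

Doing the insert and the metadata update in one transaction means a failed import leaves the dataset marked as not imported.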

Use Case Description

Visualize and Analyze Monitoring Data (Scope: External)

1. User logs into the TSDW website
2. The user chooses an appropriate visualization and/or analysis tool to use
3. Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
4. The tool displays monitoring data in various output products, such as:
   a. Data summary tables
   b. Bar charts
   c. Line charts
   d. Pie charts
   e. Contour maps

Dependencies:

· An appropriate collection of monitoring data
· Specific design specifications for monitoring data output products
· An appropriate collection of online visualization tools and technologies

Use Case Description

Visualize and Analyze Model Results (Scope: External)

1. The user logs into the TSDW website
2. The user chooses an appropriate visualization and analysis tool to use
3. Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
4. The tool displays model performance and evaluation results in various output products, such as:
   a. Normalized mean error and bias
   b. Mean normalized error and bias
   c. Root mean square error
   d. Correlation coefficients
   e. Soccer plots
   f. Box and whisker plots
   g. Bugle plots
   h. Spatial statistical plots
   i. Spatial concentration plots with observation overlays

Dependencies:

· An appropriate collection of model results data
· Specific design specifications for model results output products
· An appropriate collection of online visualization tools and technologies
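
The statistics in items a through d have commonly used definitions for paired model (M) and observed (O) values, which the sketch below follows: NMB = Σ(M−O)/ΣO, NME = Σ|M−O|/ΣO, MNB = mean((M−O)/O), MNE = mean(|M−O|/O), RMSE = sqrt(mean((M−O)²)), and the Pearson correlation r. The slides do not state which formulas the TSDW tools use, so treat this as an assumption; values are returned as fractions rather than percentages, and observations are assumed to be nonzero.

```python
import math


def performance_stats(model, obs):
    """Compute paired model-performance statistics.

    model, obs: equal-length sequences of paired values; obs must be nonzero
    and both series must vary (otherwise r is undefined).
    """
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be equal-length, non-empty sequences")
    n = len(obs)
    diffs = [m - o for m, o in zip(model, obs)]
    sum_obs = sum(obs)
    mean_m, mean_o = sum(model) / n, sum_obs / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return {
        "NMB": sum(diffs) / sum_obs,
        "NME": sum(abs(d) for d in diffs) / sum_obs,
        "MNB": sum(d / o for d, o in zip(diffs, obs)) / n,
        "MNE": sum(abs(d) / o for d, o in zip(diffs, obs)) / n,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "r": cov / math.sqrt(var_m * var_o),
    }
```

The normalized-mean forms divide by the summed observations, so they are less sensitive to individual low observed values than the mean-normalized forms.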

Use Case Description

View Project Data and Metadata (Scope: External)

1. A user logs into the TSDW website
2. The user navigates to the Projects and Studies section of the TSDW website
3. The user views metadata associated with the projects that he/she has permission to view
   a. Name, purpose, description
   b. Contact information: project manager(s), contractors, etc.
   c. Associated datasets: Model input data downloaded, model results uploaded, etc.
   d. Analysis products: Charts, graphs, summaries, etc.
4. The user views data associated with the projects that he/she has permission to view
   a. Model input data
      i. Meteorological inputs
      ii. Emissions inputs
      iii. Initial and Boundary Conditions
      iv. Ancillary inputs (land use, land cover, photolysis)
   b. Model configuration metadata
   c. Model results
      i. Gridded results
      ii. Observation-paired results
   d. Monitoring data

Dependencies:

· Appropriate RDBMS schema and SQL scripts/commands for managing Project metadata
o Projects
o Users
o Downloaded/Uploaded Datasets
o Documents
o Analysis products
· An online user interface for the Projects and Studies section of the TSDW website

Use Case Summary

• Obtain and Manage Model Input Data (Scope: Internal)

• Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)

• Download Model Input Data from TSDW, Online Method (Scope: External)

• "Download" Model Input Data from TSDW, Offline Method (Scope: External)

• Download Boundary Conditions Generator (Scope: External)

• Download the CAMx Post-Processing Utility (Scope: External)

• Upload Model Results (Scope: External)

• Import Database-Ready Model Results (Scope: Internal)

• Visualize and Analyze Monitoring Data (Scope: External)

• Visualize and Analyze Model Results (Scope: External)

• View Project Data and Metadata (Scope: External)

Thanks.

Review of the 3SDW Overall System Ecosystem and Architecture

[Diagram not reproduced. Labels recoverable from the slide:

· Organizations: WGA, NPS, CIRA
· Stages: Acquisition, Integration, Management, Distribution, Presentation
· Activities: Identification, Acquisition, Pre- and Post-processing, Extraction; Verification, Validation, QA/QC, Mapping, Flagging, Transformation; Storage, Backup, Restore, Security, Summarizing, Statistics; Searching, Querying, Filtering, Aggregating, Formatting, Packaging; Charting, Graphing, Mapping, Analyzing
· Architecture, Design, Implementation, Management, and Operation
· Guidance, Requirements, Feedback, Funding
· Users: Modelers, Planners, Managers
· Outputs: Tools, Results, Documents, Raw Data
· Data: Monitored (AQS, VIEWS); Modeled (WestJump, future modeling, etc); Aerosol, Deposition, Gaseous, Emissions, Met, Air Quality
· 3SDW]

User Login Form

User Registration/Modification Form

User Profile/Account Form

User Feedback Form

Dataset Request Form

Raw Data Download (Query Wizard)

Time Series Charts (Query Wizard)

Dynamic Contour Maps (Query Wizard)

Site Metadata Report (Query Wizard)

Monitoring Site Browser

Modeled Emissions Summary Tool

Modeled-to-Obs Comparison Tool

Air Quality Summary Reports

Model Data Mapping Tool

Future Online Visualization Tools

First TSDW Modeling Use Case Report and Results

First Use Case - Beta Test Steps

• Testers visited the TSDW website and registered with the system to create an account

• Testers visited the Data Request web page and entered their requests for the WestJump Base08b dataset

• Each request was stored in the database

• The system determined whether or not each request could be automatically filled or had to be manually assembled

• The system sent emails to the appropriate TSDW team members to notify them of the data requests

• TSDW team members assembled the dataset requests (copied the relevant data files onto hard drives)

• The datasets (hard drives) were delivered to the beta testers

• The system updated the dataset requests to reflect their “filled” status
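
The slides do not say how the system decided between automatic and manual fulfillment. One plausible rule is a size cutoff, sketched below; the 50 GB limit, the `online`/`offline` labels, and the function name are all hypothetical.

```python
# Hypothetical cutoff: requests above this total size go to manual (hard drive) assembly.
ONLINE_LIMIT_BYTES = 50 * 1024**3


def fulfillment_method(file_sizes, limit=ONLINE_LIMIT_BYTES):
    """Return 'online' if the requested files fit under the limit, else 'offline'."""
    total = sum(file_sizes)
    return "online" if total <= limit else "offline"
```

A real rule might also consider the requester's network connection or the current transfer queue.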

First Use Case - Beta Test Steps (cont’d)

• Using the delivered datasets, testers ran the models and generated results

• Testers returned the model output results to the 3SDW

• The test results were assessed by TSDW team members

• Testing outcomes were summarized for the May 3-State AQ Study Technical Workshop

• The TSDW team refines the dataset ordering, download, packaging, and delivery system according to lessons learned

• The TSDW team develops the next Use Case testing scenario(s)

Summer (June – October) 2013 3SAQS Technical Work Review

Data Warehouse Activities

Summary of Coming Activities

• Implement the collaborative components of the warehouse

• Implement the ongoing news and updates section

• 3SDW on-line for NEPA air quality analysis projects by end of October

• Out-bound data delivery and in-bound data ingestion for NEPA and other air quality studies

• Data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)

• Plans for storage/access/visualization for modeling results and evaluation tools

• Store UT BLM ARMS and other studies’ data in 3SDW after evaluation using protocols

Testing and Refinement Help

• All users, collaborators, and partners can help with testing

• Please report bugs – don’t endure them

• Use the website Feedback form

• Send direct email to team members

• Provide as much information as possible up-front

• Stay abreast of ongoing additions and updates

• Be an active part of the design process - make suggestions for features and refinements

• Don’t assume it can’t be done

• Don’t assume it can be done

