3SAQS Technical Workshop
October 31 – November 1, 2013
Data Warehouse Status and Planning Update
Zac Adelman (UNC-IE), Shawn McClure (CSU-CIRA), Tom Moore (WGA-WRAP)
• Researched and experimented with large data transfer technologies (iRODS, Globus Connect, etc.)
• Configured a large dual RAID array on the primary file server (~20TB) and designed a third RAID array to bring the total storage capacity to 50TB+
• Imported the WestJump source data files onto the primary file server and organized them into a uniform folder structure (meteorology, emissions, results)
• Created an FTP site on the primary file server for facilitating direct, basic access to the source data files
• Made available the current inventory of source data files on the new FTP site
• Began the design of the content, format, and coding protocols for submitting model results and other data to the TSDW
• Began the design of the schema and code infrastructure for the “project overview and tracking” system
• Continued to refine the database, software, and website infrastructure supporting the data warehouse
• Continued to refine various pre-processing components
o XML Generator for metadata
o Boundary Conditions Generator
o CAMx Post Processing Utility
o RDBMS data import system
• Refined the logical and physical file system design
• Refined the data verification and validation system
Summary of Past Quarter Activities
Operational Website Components
• User Login Form
• User Registration/Modification Form
• User Profile/Account Form
• User Feedback Form
• Dataset Request Form
• Database Query Wizard
• Raw Data Download
• Interactive Charts
• Dynamic Contour Maps
• Site Metadata Reports
• Monitoring Site Metadata Browser
• File Explorer
• FTP site
Authentication and Authorization System
Possible Future Website Components
• Modeled Emissions Summary Tool
• Modeled-to-Observed Data Comparison Tool
• Air Quality Summary Reports
o Visibility
o Deposition
o Ozone
o Other
• Model Data Mapping Tool
• Source Apportionment Tool
• Various Unpublished Monitoring Data Tools
• Backend Web Services and Processing Components
• Conduct additional use case tests
• Finalize the large data transfer system
• Import preexisting/legacy air quality studies and results
• Commence production-level data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)
• Design visualization and analysis tools for modeling results and performance evaluation
• Design the “project overview and tracking” interface for the TSDW website
Summary of Coming Activities
TSDW Architecture Diagram - Overview
Standard API
Data Acquisition and Import System
Data Access Layer
Generalized Software Libraries
Air Quality-Specific Software Libraries
Web Services
RDBMS Spatial DB
TSDW Website
TSDW Software Libraries
External TSDW Interface
Source Data
TSDW Data Management
NPS BLM USFS EPA States
IRMA NRIS AQS
Users and Providers
Other Data Systems
Standard API JSON HTTP OGC XML
Data Files
Data Services
TSDW FTP Site
Third Party Software Libraries
TSDW Data Flow Diagram - Overview
Photochemical Grid Modeling
CMAQ CAMx
Emissions Inventories
Source Categories
· Point & Area Sources
· Oil and Gas
· Biogenic
· Fire (anthro, natural)
· etc.
Meteorological Inputs
Weather Observations
Landuse/Landcover
Initial Conditions
Physics Options
Data Sources
· State & Local Agencies
· EPA
· Mexico
· Canada
· etc.
Emissions Modeling (e.g. SMOKE)
BEIS MOVES
Model-Ready Processing (e.g. reformatting, regridding)
Model-Ready Processing (e.g. reformatting, regridding)
Model-Ready Processing (e.g. reformatting, regridding)
Model Inputs
Land Use & Cover
Boundary Conditions
Initial Conditions
Photolysis Rates
DBMS-Ready Model Results
Three State Data Warehouse
File Server, Database Server, Website, Web Services
Gridded Model Results
Air Quality Modelers
Meteorological Models (e.g. WRF, MM5)
Met Data Processing (e.g. MCIP2)
Model-Ready Input Data
Planners, Stakeholders, and Users
Products, Reports, and Analyses
Oil and Gas Permits Recommendations
Monitoring Data
3SAQS AQS
IMPROVE CASTNet
Data Provider Processing
Definition of "Use Case": A list of steps defining the interactions between a user and a system to achieve a specific goal. The "user" can be a human or an external system, depending on context.
Scopes of Use Cases: The subset of users to which the functionality of a given use case is made available
• Internal: The TSDW administration and development team
• External: A subset of external users that have been granted a specific role
• Public: The general public - anyone who visits the TSDW website
Potential User Roles:
• Administrators
• Project Managers
• Project Team Members
• Stakeholders
• Data Providers
• Planners
• Public
TSDW Use Cases
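The scope and role definitions above can be sketched as a simple access check. This is a minimal illustration only: the assignment of roles to scopes in `ROLE_SCOPES` and the function name `can_run` are assumptions, since the slides list the roles and scopes but do not say which role receives which scope.

```python
from enum import Enum


class Scope(Enum):
    INTERNAL = "Internal"   # TSDW administration and development team
    EXTERNAL = "External"   # external users granted a specific role
    PUBLIC = "Public"       # anyone who visits the TSDW website


# Hypothetical role-to-scope assignment (not specified in the slides).
ROLE_SCOPES = {
    "Administrators": {Scope.INTERNAL, Scope.EXTERNAL, Scope.PUBLIC},
    "Project Managers": {Scope.EXTERNAL, Scope.PUBLIC},
    "Project Team Members": {Scope.EXTERNAL, Scope.PUBLIC},
    "Stakeholders": {Scope.EXTERNAL, Scope.PUBLIC},
    "Data Providers": {Scope.EXTERNAL, Scope.PUBLIC},
    "Planners": {Scope.EXTERNAL, Scope.PUBLIC},
    "Public": {Scope.PUBLIC},
}


def can_run(role, use_case_scope):
    """Return True if the role may run a use case with the given scope.

    Unknown roles are treated like the general public.
    """
    return use_case_scope in ROLE_SCOPES.get(role, {Scope.PUBLIC})
```

For example, `can_run("Planners", Scope.EXTERNAL)` is true under this assumed mapping, while `can_run("Public", Scope.INTERNAL)` is false.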
Use Case Description
Obtain and Manage Model Input Data (Scope: Internal)
1. Obtain model input data from data provider(s)
2. Copy model input data files to file server
3. Organize model input data on the file server
   a. File and folder naming convention
   b. Physical file system organization (what developers see)
   c. Logical file system organization (what the user sees)
   d. Dataset partitioning (temporal, spatial, functional, etc.)
4. Perform periodic backup of "active" model input data
5. Perform periodic archival of "inactive" model input data
6. Track and manage the versioning of the model input data
Use Case Description
Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
1. An administrator locates the desired root folder in the file system
2. An administrator executes the XML Generator program to produce XML files containing file metadata
3. (Ideally, the above two tasks could be automatically run as a "cron" task on a regular, periodic basis, rather than as a two-step manual process.)
4. The File Indexing Utility (FIU) processes the newly-generated files to extract the relevant file metadata
5. The FIU updates the RDBMS with the file metadata
6. The new file metadata is automatically reflected in the TSDW File Explorer Tool
Dependencies:
· The XML File Metadata Generator program
· The File Indexing Utility (FIU)
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing source file metadata
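As a rough illustration of steps 1 and 2, the sketch below walks a root folder and writes one XML document describing every file found. It is not the actual XML Generator: the element and attribute names (`files`, `file`, `path`, `size_bytes`, `modified_utc`) and the function names are assumptions chosen for the example.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def harvest_metadata(root_folder):
    """Walk root_folder and return an XML element describing every file."""
    root = ET.Element("files", {"root": os.path.abspath(root_folder)})
    for dirpath, dirnames, filenames in os.walk(root_folder):
        dirnames.sort()  # visit subfolders in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            ET.SubElement(root, "file", {
                "path": os.path.relpath(path, root_folder),
                "size_bytes": str(info.st_size),
                "modified_utc": modified.isoformat(),
            })
    return root


def write_metadata(root_folder, xml_path):
    """Harvest metadata under root_folder and save it as an XML file."""
    tree = ET.ElementTree(harvest_metadata(root_folder))
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
```

A script like this could be scheduled as the "cron" task mentioned in step 3, with a separate indexing step reading the XML output and updating the database.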
Use Case Description
Download Model Input Data from TSDW, Online Method (Scope: External)
1. User logs into the TSDW website
2. User fills out the Dataset Request (DR) form
3. The user is redirected to the Dataset Request confirmation message/page
4. The DR form is passed to the Dataset Packaging System (DPS)
5. The DPS registers metadata about the request into the RDBMS
6. The DPS locates the physical files that are needed to fulfill the order
7. The DPS assembles, organizes, and compresses the component files into a downloadable "package"
8. The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
9. The DPS registers metadata about the package (including the "PackageID") into the RDBMS
10. The DPS notifies the requesting user of the package's availability
11. The user logs back into the TSDW website (if necessary)
12. The user initiates a session of the Dataset Transfer System (DTS) to download the files
13. The DTS registers metadata about the package "receipt" into the RDBMS
14. The DTS notifies the appropriate TSDW administrator(s) of the download
Dependencies:
· Dataset Request Form
· Dataset Request confirmation message/page
· Dataset Packaging System (DPS) (could be one-and-the-same with iRODS or Globus)
· Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
· Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
· A high volume data transfer program such as iRODS or Globus Connect Server
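The packaging steps (assemble and compress the files, create a unique PackageID, collect metadata for the RDBMS) might look roughly like the sketch below. It is a hedged illustration, not the DPS itself: the UUID-based PackageID format, the `.tar.gz` archive type, the function name `build_package`, and the metadata keys are all assumptions.

```python
import tarfile
import uuid
from pathlib import Path


def build_package(file_paths, base_dir, out_dir):
    """Compress the requested files into one archive and return its metadata.

    file_paths: files located for the order (all under base_dir)
    base_dir:   root of the source data, stripped from names in the archive
    out_dir:    existing folder where the package is written
    """
    package_id = uuid.uuid4().hex  # hypothetical PackageID format
    archive = Path(out_dir) / f"package_{package_id}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in file_paths:
            relative = Path(path).relative_to(base_dir)
            tar.add(str(path), arcname=relative.as_posix())
    # This dictionary stands in for the record registered in the RDBMS.
    return {
        "package_id": package_id,
        "archive": str(archive),
        "file_count": len(file_paths),
        "size_bytes": archive.stat().st_size,
    }
```

For datasets in the multi-terabyte range, a single compressed archive may be impractical, which is one reason the slides point to iRODS or Globus for the transfer itself.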
Use Case Description
"Download" Model Input Data from TSDW, Offline Method (Scope: External)
1. User logs into the TSDW website
2. User fills out the Dataset Request (DR) form
3. The DR form is passed to the Dataset Packaging System (DPS)
4. The DPS registers metadata about the request into the RDBMS
5. The DPS locates the physical files that are needed to fulfill the order
6. The DPS creates a unique "PackageID" that will be linked with this package throughout its lifecycle
7. The DPS registers metadata about the package (including the "PackageID") into the RDBMS
8. The DPS notifies the requesting user of the order receipt and future hard drive shipment
9. The DPS sends a list of the files that comprise the order to a TSDW administrator
10. A TSDW administrator copies the selected files onto a hard disk drive (HDD) or drives
11. A TSDW administrator mails the drive(s) to the requesting user
12. A TSDW administrator records the shipment in the RDBMS
Dependencies:
· Dataset Request Form
· Dataset Request confirmation message/page
· Dataset Packaging System (DPS)
· Appropriate RDBMS schema and SQL scripts/commands for managing Dataset Request metadata
· Appropriate RDBMS schema for associating Dataset Requests with Users and Projects
· A manual process for copying data files onto hard disks and mailing them to users
Use Case Description
Download Boundary Conditions Generator (Scope: External)
1. User logs into the TSDW website
2. User navigates to the Modeling Utilities section of the website
3. User fills out the Boundary Conditions Generator (BCG) download form
4. The BCG download form is passed to the Utility Tracking System (UTS)
5. The UTS extracts information from the metadata file associated with the current BCG
6. The UTS associates this metadata with the appropriate User record in the RDBMS
7. The UTS redirects the user to a download link for the BCG
8. The user downloads the BCG and any associated instructions and configuration files
9. The DIS notifies the appropriate TSDW administrator(s) of the download
Dependencies:
· Boundary Conditions Generator (BCG) program
· BCG user guide
· BCG download form
· BCG download confirmation message/page and installation file link
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing BCG download metadata
Use Case Description
Download the CAMx Post-Processing Utility (Scope: External)
1. User logs into the TSDW website
2. User navigates to the Modeling Utilities section of the website
3. User fills out the CAMx Post-Processing Utility (CPPU) download form
4. The CPPU download form is passed to the Utility Tracking System (UTS)
5. The UTS extracts information from the metadata file associated with the current CPPU
6. The UTS associates this metadata with the appropriate User record in the RDBMS
7. The UTS redirects the user to a download link for the CPPU
8. The user downloads the CPPU and any associated instructions and configuration files
9. The DIS notifies the appropriate TSDW administrator(s) of the download
Dependencies:
· CAMx Post-Processing Utility (CPPU) program
· CPPU user guide
· CPPU download form
· CPPU download confirmation message/page and installation file link
· The appropriate RDBMS schema, SQL scripts, and software libraries for managing CPPU download metadata
Use Case Description
Upload Model Results (Scope: External)
1. User logs into the TSDW website
2. User navigates to the Modeling Results Upload section of the website
3. User fills out the Modeling Results Upload form
   a. User provides a standard description of the model results
   b. User provides the "PackageID" of the model input data used
   c. User provides the Boundary Conditions Generator "Version ID", if relevant
   d. User provides the CAMx Post-Processing Utility "Version ID", if relevant
   e. User selects the files to upload
   f. User clicks the "Submit" button on the form
4. The Modeling Results Upload form is passed to the Data Import System (DIS)
5. The data files are uploaded and cataloged by the DIS
6. The DIS creates a unique "DatasetID" that will be linked to this upload throughout its lifecycle
7. The DIS registers metadata about the upload (including the "DatasetID") into the RDBMS
8. The DIS notifies the uploading user of the upload success or failure (generally, its "status")
9. The DIS places the file(s) into the appropriate location(s) on the TSDW file system
10. The DIS notifies the appropriate TSDW administrator(s) of the upload
Dependencies:
· Modeling Results Upload (MRU) form
· MRU system
· Appropriate RDBMS schema and SQL scripts/commands for managing MRU metadata
Use Case Description
Import Database-Ready Model Results (Scope: Internal)
1. An administrator locates the newly-imported model results (which have been generated by the CPPU and uploaded to the TSDW)
2. An administrator executes the appropriate scripts/commands using the Data Import System (DIS)
3. The DIS reads and imports the database-ready model results into the RDBMS
   a. The DIS verifies that all the necessary metadata is present in the RDBMS
   b. The DIS transforms the data into the appropriate schema for import
   c. The DIS maps source codes and names to internal codes and names, as needed
   d. The DIS imports the data from the source file(s) into the RDBMS
   e. The DIS makes/updates the appropriate metadata records in the RDBMS for tracking the imported model Dataset
   f. The imported model results become automatically available via the relevant tools on the TSDW website
Dependencies:
· The CAMx Post-Processing Utility (CPPU) for generating the database-ready model results
· The Data Import System (DIS)
· Appropriate RDBMS schema and SQL scripts/commands for managing Model Results metadata
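Sub-steps a through e of the import can be sketched against an in-memory SQLite database. Everything specific here is an assumption made for illustration: the table and column names, the `PARAMETER_CODES` mapping, and the function names are hypothetical, and the production RDBMS and schema are not described in the slides.

```python
import sqlite3

# Hypothetical mapping from source species names to internal parameter codes.
PARAMETER_CODES = {"O3": "ozone", "PM25": "pm25_mass"}


def create_schema(conn):
    """Create two illustrative tables: dataset tracking and imported results."""
    conn.executescript("""
        CREATE TABLE datasets (
            dataset_id TEXT PRIMARY KEY,
            status     TEXT NOT NULL DEFAULT 'uploaded',
            row_count  INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE results (
            dataset_id TEXT REFERENCES datasets(dataset_id),
            site       TEXT,
            timestamp  TEXT,
            parameter  TEXT,
            value      REAL
        );
    """)


def import_results(conn, dataset_id, rows):
    """Import (site, timestamp, source_name, value) rows for one dataset."""
    # (a) Verify the dataset's metadata is already registered.
    found = conn.execute(
        "SELECT 1 FROM datasets WHERE dataset_id = ?", (dataset_id,)
    ).fetchone()
    if found is None:
        raise ValueError(f"no metadata registered for dataset {dataset_id!r}")
    # (b, c) Transform rows to the target schema and map source names.
    records = []
    for site, timestamp, source_name, value in rows:
        if source_name not in PARAMETER_CODES:
            raise KeyError(f"unmapped source parameter {source_name!r}")
        records.append(
            (dataset_id, site, timestamp, PARAMETER_CODES[source_name], float(value))
        )
    # (d, e) Insert the data and update tracking metadata in one transaction.
    with conn:
        conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", records)
        conn.execute(
            "UPDATE datasets SET row_count = row_count + ?, status = 'imported' "
            "WHERE dataset_id = ?",
            (len(records), dataset_id),
        )
    return len(records)
```

Validating and mapping every row before the first insert means a bad file leaves the database untouched, which keeps the tracking metadata consistent with what was actually imported.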
Use Case Description
Visualize and Analyze Monitoring Data (Scope: External)
1. User logs into the TSDW website
2. The user chooses an appropriate visualization and/or analysis tool to use
3. Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
4. The tool displays monitoring data in various output products, such as:
   a. Data summary tables
   b. Bar charts
   c. Line charts
   d. Pie charts
   e. Contour maps
Dependencies:
· An appropriate collection of monitoring data
· Specific design specifications for monitoring data output products
· An appropriate collection of online visualization tools and technologies
Use Case Description
Visualize and Analyze Model Results (Scope: External)
1. The user logs into the TSDW website
2. The user chooses an appropriate visualization and analysis tool to use
3. Using the tool, the user specifies spatial, temporal, and other dimensional filters for the data as well as display and formatting options
4. The tool displays model performance and evaluation results in various output products, such as:
   a. Normalized mean error and bias
   b. Mean normalized error and bias
   c. Root mean square error
   d. Correlation coefficients
   e. Soccer plots
   f. Box and whisker plots
   g. Bugle plots
   h. Spatial statistical plots
   i. Spatial concentration plots with observation overlays
Dependencies:
· An appropriate collection of model results data
· Specific design specifications for model results output products
· An appropriate collection of online visualization tools and technologies
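The statistics in items a through d can be computed from paired modeled and observed values as in the sketch below. It uses the commonly used definitions of these metrics, expressed as fractions rather than percentages; the function name `performance_stats` and the dictionary keys are illustrative, and the sketch assumes all observations are positive and that neither series is constant.

```python
import math


def performance_stats(modeled, observed):
    """Return model performance statistics for paired values.

    NMB  = sum(M - O) / sum(O)        normalized mean bias
    NME  = sum|M - O| / sum(O)        normalized mean error
    MNB  = mean((M - O) / O)          mean normalized bias
    MNE  = mean(|M - O| / O)          mean normalized error
    RMSE = sqrt(mean((M - O)^2))      root mean square error
    r    = Pearson correlation coefficient
    """
    if len(modeled) != len(observed) or not observed:
        raise ValueError("modeled and observed must be non-empty and paired")
    n = len(observed)
    diffs = [m - o for m, o in zip(modeled, observed)]
    sum_obs = sum(observed)
    mean_m = sum(modeled) / n
    mean_o = sum_obs / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modeled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return {
        "NMB": sum(diffs) / sum_obs,
        "NME": sum(abs(d) for d in diffs) / sum_obs,
        "MNB": sum(d / o for d, o in zip(diffs, observed)) / n,
        "MNE": sum(abs(d) / o for d, o in zip(diffs, observed)) / n,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "r": cov / math.sqrt(var_m * var_o),
    }
```

With modeled values `[12, 18, 33]` and observed values `[10, 20, 30]`, the normalized mean bias is 3/60 = 0.05 and the normalized mean error is 7/60, which shows how over- and under-predictions cancel in the bias but not in the error.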
Use Case Description
View Project Data and Metadata (Scope: External)
1. A user logs into the TSDW website
2. The user navigates to the Projects and Studies section of the TSDW website
3. The user views metadata associated with the projects that he/she has permission to view
   a. Name, purpose, description
   b. Contact information: project manager(s), contractors, etc.
   c. Associated datasets: Model input data downloaded, model results uploaded, etc.
   d. Analysis products: Charts, graphs, summaries, etc.
4. The user views data associated with the projects that he/she has permission to view
   a. Model input data
      i. Meteorological inputs
      ii. Emissions inputs
      iii. Initial and Boundary Conditions
      iv. Ancillary inputs (land use, land cover, photolysis)
   b. Model configuration metadata
   c. Model results
      i. Gridded results
      ii. Observation-paired results
   d. Monitoring data
Dependencies:
· Appropriate RDBMS schema and SQL scripts/commands for managing Project metadata
o Projects
o Users
o Downloaded/Uploaded Datasets
o Documents
o Analysis products
· An online user interface for the Projects and Studies section of the TSDW website
• Obtain and Manage Model Input Data (Scope: Internal)
• Harvest File Metadata Using the XML Metadata Generator (Scope: Internal)
• Download Model Input Data from TSDW, Online Method (Scope: External)
• "Download" Model Input Data from TSDW, Offline Method (Scope: External)
• Download Boundary Conditions Generator (Scope: External)
• Download the CAMx Post-Processing Utility (Scope: External)
• Upload Model Results (Scope: External)
• Import Database-Ready Model Results (Scope: Internal)
• Visualize and Analyze Monitoring Data (Scope: External)
• Visualize and Analyze Model Results (Scope: External)
• View Project Data and Metadata (Scope: External)
Use Case Summary
Thanks.
Identification, Acquisition, Pre- and Post-processing, Extraction
Storage, Backup, Restore, Security, Summarizing, Statistics
Searching, Querying, Filtering, Aggregating, Formatting, Packaging
Charting, Graphing, Mapping, Analyzing
Architecture, Design, Implementation, Management, and Operation
WGA NPS CIRA
Acquisition Integration Management Distribution Presentation
Verification, Validation, QA/QC, Mapping, Flagging, Transformation
Tools
3SDW
Results
Documents
Raw Data
Modelers
Planners
Managers
Guidance, Requirements, Feedback, Funding
Monitored
Modeled
Aerosol
Deposition
Gaseous
Emissions
Met
Air Quality
AQS, VIEWS
WestJump, future modeling, etc
Review of the 3SDW Overall System Ecosystem and Architecture
User Login Form
User Registration/Modification Form
User Profile/Account Form
User Feedback Form
Dataset Request Form
Raw Data Download (Query Wizard)
Time Series Charts (Query Wizard)
Dynamic Contour Maps (Query Wizard)
Site Metadata Report (Query Wizard)
Monitoring Site Browser
Modeled Emissions Summary Tool
Modeled-to-Obs Comparison Tool
Air Quality Summary Reports
Model Data Mapping Tool
Future Online Visualization Tools
First TSDW Modeling Use Case Report and Results
• Testers visited the TSDW website and registered with the system to create an account
• Testers visited the Data Request web page and entered their requests for the WestJump Base08b dataset
• Each request was stored in the database
• The system determined whether or not each request could be automatically filled or had to be manually assembled
• The system sent emails to the appropriate TSDW team members to notify them of the data requests
• TSDW team members assembled the dataset requests (copied the relevant data files onto hard drives)
• The datasets (hard drives) were delivered to the beta testers
• The system updated the dataset requests to reflect their “filled” status
First Use Case - Beta Test Steps
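The request lifecycle in the beta test (store the request, decide between automatic and manual fulfilment, mark it "filled") can be sketched as a small record type. The slides do not state how the system made the automatic-versus-manual decision, so the size cutoff `AUTO_FILL_LIMIT_BYTES`, the class and field names, and the example values below are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: larger requests are assembled by hand onto hard drives.
AUTO_FILL_LIMIT_BYTES = 50 * 1024**3


@dataclass
class DatasetRequest:
    request_id: int
    user_email: str
    dataset: str
    size_bytes: int
    status: str = "submitted"
    history: list = field(default_factory=list)

    def route(self):
        """Decide between automatic packaging and manual assembly."""
        if self.size_bytes <= AUTO_FILL_LIMIT_BYTES:
            method = "automatic"
        else:
            method = "manual"
        self.history.append(f"routed:{method}")
        return method

    def mark_filled(self):
        """Record that the dataset has been delivered to the requester."""
        self.status = "filled"
        self.history.append("filled")
```

A request for a multi-terabyte dataset would route to manual assembly under this assumed cutoff, matching the hard-drive delivery used in the beta test.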
• Using the delivered datasets, testers ran the models and generated results
• Testers returned the model output results to the 3SDW
• The test results were assessed by TSDW team members
• Testing outcomes were summarized for the May 3-State AQ Study Technical Workshop
• The TSDW team refines the dataset ordering, download, packaging, and delivery system according to lessons learned
• The TSDW team develops the next Use Case testing scenario(s)
First Use Case - Beta Test Steps (cont’d)
Summer (June – October) 2013 3SAQS Technical Work Review
Data Warehouse Activities
• Implement the collaborative components of the warehouse
• Implement the ongoing news and updates section
• 3SDW on-line for NEPA air quality analysis projects by end of October
• Out-bound data delivery and in-bound data ingestion for NEPA and other air quality studies
• Data warehouse operations (hosting, data analysis and processing, maintenance, et cetera)
• Plans for storage/access/visualization for modeling results and evaluation tools
• Store UT BLM ARMS and other studies’ data in 3SDW after evaluation using protocols
Summary of Coming Activities
Testing and Refinement Help
• All users, collaborators, and partners can help with testing
• Please report bugs – don’t endure them
• Use the website Feedback form
• Send direct email to team members
• Provide as much information as possible up-front
• Stay abreast of ongoing additions and updates
• Be an active part of the design process - make suggestions for features and refinements
• Don’t assume it can’t be done
• Don’t assume it can be done