Quality Assurance Project Plan
Project 13‐TN2
Development of an IDL‐based geospatial data processing framework for
meteorology and air quality modeling
Daniel Tong, Cooperative Institute for Climate and Satellites, University of Maryland, College Park, Maryland
Hyun Cheol Kim and Fantine Ngan, Cooperative Institute for Climate and Satellites, University of Maryland, College Park, Maryland
and Pius Lee, Air Resources Laboratory, NOAA, College Park, Maryland
Summary of Project
QAPP Category Number: III
Type of Project: Research or Development (Modeling)
QAPP Requirements: This QAPP includes descriptions of the project and
objectives; organization and responsibilities; geospatial data processing
framework; computational algorithms; GIS and satellite data; quality metrics;
reporting; and references.
QA Requirements: Audits of Data Quality: Cat III = 10% Required
Report of QA Findings: In final report
Distribution List
Gary McGaughey, Project Manager, Texas Air Quality Research Program
Cyril Durrenberger, Quality Assurance Project Plan Officer, Texas Air Quality Research Program
Bright Dornblaser, Project Liaison, Texas Commission on Environmental Quality
Chris Owen, Quality Assurance Project Plan Officer, Texas Commission on Environmental Quality
Maria Stanzione, Program Grant Manager, Texas Air Quality Research Program
1. PROJECT DESCRIPTION AND OBJECTIVES
1.1 Problem Statement
Fast and accurate handling of Geographic Information System (GIS) data and satellite data is
essential in regional meteorological and chemical modeling and for data analysis. Accurate land
use information is particularly important in meteorological simulations for land surface
exchanges. It is also crucial in air quality simulations for spatial allocation of emission sources.
There has been increasing demand for geospatial data processing tools as finer resolution air
quality simulations become more commonplace.
The Texas Commission on Environmental Quality (TCEQ) has been considering utilizing fine
resolution land use land cover (LULC) data and satellite‐based remote sensing data for
meteorology and air quality simulations. Under contract to TCEQ, Byun et al. (2007, 2008)
have incorporated a high resolution LULC dataset from the University of Texas Center for Space
Research (UTCSR) (Wells, 2006) and a Texas Forest Service (TFS) dataset (Cheng and Byun, 2008)
into the Fifth-Generation Penn State/NCAR Mesoscale Model (MM5) (Grell et al., 1994).
A similar approach using fine resolution LULC data (30-m Texas LULC generated by Texas A&M)
has also been applied to the Weather Research and Forecasting (WRF) model (Skamarock et al.,
2008), the successor to MM5 (Byun et al., 2011). Satellite data are also an important source of
model inputs, and they form one of the bases for model performance evaluation. Precedents for
such applications are evident in several previous TCEQ projects, e.g., the implementations of
Sea Surface Temperature (SST) (Byun et al., 2008) and soil moisture (Byun et al., 2011).
Many of these previous approaches were successful. However, there was some concern
regarding the general geospatial data processing procedures they used: each focused on
developing a project-specific data processing tool, and there was no integrated effort to
handle a variety of GIS and/or satellite data sets in a unified framework. Such a framework
would facilitate speedy incorporation of new data formats. As finer resolution data have
become increasingly available, current data processing tools are too slow or run into memory
problems when processing very large data sets. For example, the current National Land Cover
Database (NLCD, http://www.mrlc.gov/index.php) provides 30-m LULC data covering the
Contiguous United States (CONUS); at 161,190 x 104,424 (roughly 16.8 billion) pixels, it
requires a large computer with long run times and substantial memory to process. Usually, it is
impossible to load the whole data set into computer memory, so random access and extraction
of local subsets are fundamental capabilities of a successful data processing tool. The CONUS
road network data in vectorized polyline format is another example; it typically contains more
than a million entities. Efficient tools to handle such data are therefore necessary.
1.2 Project Objectives
This project investigates basic computational algorithms to handle GIS data and satellite data. It
develops a set of generalized libraries within a geospatial data processing framework aiming for
more efficient and accurate processing of geospatial data. We will utilize the Interactive Data
Language (IDL), by EXELIS Visual Information Solutions, to build geospatial processing libraries.
An IDL-based Geospatial Data Processor (IGDP) has been created by the Air Resources
Laboratory, National Oceanic and Atmospheric Administration (ARL/NOAA). It can process GIS
data in both vector formats (e.g., ESRI shapefiles (.SHP)) and raster formats (e.g., Geo Tagged
Image File Format (GeoTIFF) and ERDAS IMAGINE (.IMG)) for any given domain. Processing
speeds will be improved through selective use of polygon-clipping routines and other
algorithms optimized for particular applications. The raster tool will be developed using a
histogram reverse-indexing method that enables fast access to grouped pixels; it generates
statistics of pixel values within each grid cell with improved speed and enhanced control of
memory usage. The spatial allocation tool, which uses the polygon clipping algorithms, requires
substantial computational power to calculate the fractional weighting between GIS polygons
(and/or polylines) and grid cells. To address both processing speed and computational accuracy,
an efficient polygon/polyline clipping algorithm is crucial, and a key to faster spatial allocation
is to optimize the computational iterations in both the polygon clipping and the map projection
calculations.
The project has the following specific objectives:
1. To conduct a literature search for summarizing and comparing available GIS data
processing algorithms. Advantages and constraints from each algorithm will be
described.
2. To develop an optimized geospatial data processing tool that can handle raster data
formats (e.g., pixels) and vector data formats (polylines and polygons) with improved
processing speed and accuracy, for any given target domain.
3. To collect and to process sample GIS and satellite data. Applications will include a spatial
regridding method on emissions and satellite data, such as the Moderate Resolution
Imaging Spectroradiometer (MODIS) Aerosol Optical Depth (AOD), the Ozone
Monitoring Instrument (OMI), and the Global Ozone Monitoring Experiment
(GOME)‐2 NO2 column data.
4. To perform an engineering test with processed fine resolution LULC data.
5. To draft a final report that documents all work performed in support of the project.
2. ORGANIZATION AND RESPONSIBILITIES
2.1 Personnel and Responsibilities
This project is a collaborative effort among Drs. Daniel Tong, Hyun Cheol Kim, and Fantine Ngan
of the University of Maryland at College Park (UMD) and Dr. Pius Lee of the Air Resources
Laboratory (ARL) of the National Oceanic and Atmospheric Administration (NOAA). Dr. Daniel
Tong, Research Associate Professor in the Cooperative Institute for Climate and Satellites, UMD
at College Park, is the Principal Investigator for the project. Drs. Hyun Cheol Kim and Fantine
Ngan of the Cooperative Institute for Climate and Satellites, UMD at College Park, will serve as
Co-Principal Investigators. Dr. Pius Lee, ARL National Air Quality Forecasting Capability Project
Leader, will also serve as a Co-Principal Investigator for the project. Project participants and
their responsibilities are provided in Table 1 below. Drs. Daniel Tong and Pius Lee will have
overall oversight of quality assurance.
Table 1. Project participants and their affiliations and key responsibilities.
Participant (Organization): Key Responsibilities

Daniel Tong (UMD): Principal Investigator with overall responsibility for preparation of the
emissions test runs, including the quality assurance and quality control activities.

Hyun Cheol Kim (UMD): Co-Principal Investigator with overall responsibility for the GIS data
processing algorithm review; development of the raster and vector data processing tools;
development of the spatial data regridding method and the satellite data processing; and
documentation and training for the newly developed tools, including quality assurance and
quality control activities.

Fantine Ngan (UMD): Co-Principal Investigator with overall responsibility for model simulation
and result evaluation, including quality assurance and quality control activities.

Pius Lee (ARL/NOAA): Co-Principal Investigator with overall responsibility for quality assurance
and quality control activities.

Gary McGaughey (University of Texas at Austin): AQRP Project Manager who oversees that the
grantees achieve satisfactory completion of the project.
2.2 Schedule
The schedule for specific tasks is listed in Table 2.
Table 2. Schedule of project activities.
ID Task 2/13 3/13 4/13 5/13 6/13 8/13 9/13 10/13 11/13
1 Literature Review for basic algorithms X X X
2 Collection of GIS and satellite data X X
3 Development of raster tool X X X X
4 Development of vector tool X X X X
5 Applications (e.g. spatial regridding of emissions and satellite data) X X X X
6 Documentation and training (manuals on IDL library and packages) X X X X X X
7 Test run (engineering test) X X X
8 Reporting X X X X X X X X X
3. SCIENTIFIC APPROACH: Computational algorithms and data
Appropriate handling of GIS data is crucial in air quality simulations, especially in preparation of
emissions data. However, tools for GIS data have scarcely been developed in the air quality
scientific community. In most cases, current solutions for GIS data processing include the
PC-based ArcGIS tools and/or the U.S. EPA Spatial Allocator. Both tools have clear limitations
for the seamless processing of GIS data: compatibility across platforms; flexibility with various
data formats; and, most importantly, processing speed for fine resolution data. In order to overcome
these limitations, we will design a generalized GIS data processing library and package to
process not only GIS data, but also emissions and satellite data.
Most GIS data come in three types: point (pixel), line (polyline), and area (polygon). Pixel data
are usually called a raster dataset, a point-based dataset without effective area: the value of
each pixel represents the value at the center point of the pixel's footprint. Polyline and/or
polygon data have an effective length or area, and because a polygon is simply a closed
polyline, similar routines can be used for both. In order to convert polygon or pixel data to a
gridded format, we need to know how these irregular polygons overlap each grid cell, and how
many pixels fall inside each grid cell. A polygon clipping algorithm and a pixel grouping
algorithm are therefore key components of the new GIS data processing tool that we will develop.
3.1 Polygon clipping algorithms
Traditionally, polygon clipping algorithms have been used in computer graphics to clip out the portions of a polygon that lie outside the window of the output device, preventing undesirable display effects. More recently, advanced computer graphics has used polygon clipping to render 3D images through hidden-surface removal and to produce high-quality surface detail using techniques such as beam tracing. It is also used to distribute the objects of a scene to appropriate processors in multiprocessor ray-tracing systems to improve rendering speed.
Because clipping an arbitrary polygon against an arbitrary polygon is a basic routine in computer graphics and may be applied thousands of times, the efficiency of these routines is extremely important. To achieve good results, several polygon clipping algorithms have been developed, ranging from simplified algorithms that can clip only regular (e.g., convex) polygons to more complicated algorithms that can handle more general polygons (e.g., concave or self-intersecting polygons). Each algorithm has its own advantages and disadvantages in processing efficiency and flexibility. We describe two such algorithms:
(1) Sutherland-Hodgman algorithm
The Sutherland-Hodgman polygon clipping algorithm (Sutherland and Hodgman, 1974) works by extending each edge of the convex clip polygon in turn and selecting only those vertices of the subject polygon that lie on the visible side. The algorithm
begins with an input list of all vertices in the subject polygon. One side of the clip polygon is extended infinitely in both directions, and the path of the subject polygon is traversed. Vertices from the input list are inserted into an output list if they lie on the visible side of the extended clip polygon line, and new vertices are added to the output list where the subject polygon path crosses the extended clip polygon line. This process is repeated iteratively for each clip polygon side, using the output list from one stage as the input list for the next. Once all sides of the clip polygon have been processed, the final list of vertices defines a new, entirely visible polygon. These steps are shown in Fig. 1. This is a very fast and efficient algorithm, but it applies only when the clip polygon is convex.
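To make the procedure concrete, the following is a minimal IDL sketch of Sutherland-Hodgman clipping. This is our own illustrative code with hypothetical routine names (SH_INSIDE, SH_INTERSECT, SH_CLIP), not the actual IGDP routine; it assumes the clip polygon is convex with vertices ordered counter-clockwise:

FUNCTION SH_INSIDE, p, a, b
  ; TRUE if point p lies on or to the left of the directed clip edge a->b
  ; (clip polygon vertices are assumed ordered counter-clockwise)
  RETURN, (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) GE 0d
END

FUNCTION SH_INTERSECT, p, q, a, b
  ; Intersection of segment p-q with the infinite line through a and b
  ; (assumes p and q lie on opposite sides, so d1-d2 is nonzero)
  d1 = (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])
  d2 = (b[0]-a[0])*(q[1]-a[1]) - (b[1]-a[1])*(q[0]-a[0])
  RETURN, p + (d1/(d1-d2))*(q-p)
END

FUNCTION SH_CLIP, subject, clip
  ; Clip a subject polygon against a convex clip polygon
  ; (Sutherland and Hodgman, 1974). Both inputs are [2,n] vertex arrays.
  out = subject
  nc  = N_ELEMENTS(clip[0,*])
  FOR i = 0L, nc-1 DO BEGIN                  ; one pass per clip edge
    a   = clip[*,i]
    b   = clip[*,(i+1) MOD nc]
    inp = out                                ; output of last pass feeds the next
    out = !NULL
    ni  = N_ELEMENTS(inp)/2
    IF ni EQ 0 THEN BREAK                    ; subject entirely clipped away
    FOR k = 0L, ni-1 DO BEGIN
      p = inp[*,k]
      q = inp[*,(k+1) MOD ni]
      IF SH_INSIDE(p, a, b) THEN BEGIN
        out = [[out], [p]]                   ; keep visible vertex
        IF ~SH_INSIDE(q, a, b) THEN $
          out = [[out], [SH_INTERSECT(p, q, a, b)]]   ; edge leaves visible side
      ENDIF ELSE IF SH_INSIDE(q, a, b) THEN $
        out = [[out], [SH_INTERSECT(p, q, a, b)]]     ; edge enters visible side
    ENDFOR
  ENDFOR
  RETURN, out
END

Because each pass needs only one inside test and one line intersection per vertex, the cost grows linearly with the number of subject vertices times the number of clip edges, which is why the algorithm is fast when the clip polygon (e.g., a rectangular grid cell) is convex.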
(2) Vatti clipping algorithm
The Vatti clipping algorithm (Vatti, 1992) is a more complicated and generalized polygon clipping algorithm. It allows clipping of any number of arbitrarily shaped subject polygons by any number of arbitrarily shaped clip polygons. Unlike the Sutherland-Hodgman algorithm, the Vatti algorithm does not restrict the types of polygons that can be used as subjects or clips; even complex (e.g., self-intersecting) polygons, and polygons with holes, can be processed. The algorithm supports the Boolean clipping operations "intersection", "difference", "union", and "exclusive or". It is generally applicable only in 2D space.
Compared to the Sutherland-Hodgman algorithm, the Vatti clipping algorithm is a more complete polygon clipping algorithm with more functionality, but its extra features are sometimes beyond our scope and cause an inevitable loss of processing efficiency. Therefore, for the development of the IGDP, we utilize both polygon clipping algorithms, the simple Sutherland-Hodgman algorithm and the complex Vatti algorithm, selecting between them according to the features required so as to optimize the processing time of GIS data.
Figure 1. Steps of Sutherland‐Hodgman polygon clipping algorithm (http://www.cs.helsinki.fi/group/goa/viewing/leikkaus/intro2.html)
3.2 Raster data handling procedures
The raster data processing tool uses the histogram reverse-indexing method of the IDL
HISTOGRAM function and is capable of fast access to grouped pixels. For each grid cell, the
raster tool provides a histogram and statistics of the pixels inside that cell. Figure 2 shows an
example of 30-m NLCD LULC data in a 4-km grid cell near the Houston region. Typically, around
18,000 pixels fall within a 4-km grid cell, and the histogram in the right panel shows the pixel
count distribution over LULC types in a single grid cell.
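As an illustration of this idiom, the following minimal IDL sketch (with hypothetical variable names: lulc is an array of integer LULC class codes and cell_id is a same-sized array mapping each pixel to a target grid-cell index) groups all pixels by grid cell in a single pass and then tabulates the per-cell LULC histogram:

; Minimal sketch of the histogram reverse-indexing idiom.
; Assumed (hypothetical) inputs:
;   lulc    - array of integer LULC class codes (e.g., NLCD)
;   cell_id - same-sized array giving the target grid-cell index of each pixel
n_cells = MAX(cell_id) + 1L

; One pass groups every pixel by its grid cell; REVERSE_INDICES (ri)
; records which pixels fall into each histogram bin.
h = HISTOGRAM(cell_id, MIN=0, MAX=n_cells-1, BINSIZE=1, REVERSE_INDICES=ri)

FOR j = 0L, n_cells-1 DO BEGIN
  IF h[j] EQ 0 THEN CONTINUE                 ; empty grid cell
  pix = ri[ri[j]:ri[j+1]-1]                  ; indices of all pixels in cell j
  lulc_hist = HISTOGRAM(lulc[pix], MIN=0)    ; per-cell LULC class counts
  ; ... compute statistics (dominant class, class fractions, etc.) here
ENDFOR

Because the reverse-index array is built once for the whole raster, each grid cell's pixels are retrieved by simple index slicing rather than by repeatedly scanning the full array, which is what provides the speed and memory control described above.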
Figure 2. 30-m NLCD land use land cover data set near Houston (left), and an example pixel
distribution from a 4-km grid cell (right).
3.3 Application of IGDP ‐ Satellite data regridding
Regridding of model output or satellite data across different map projection settings is very
important for inter-comparisons of modeled results and/or satellite products. Simple
interpolation may generate an approximate result, but it fails to conserve mass. For more
accurate remapping of spatial data, we need to know the exact overlap fractions between the
initial data cells and the target grid cells. The spatial allocator (i.e., the vector tool) of the IGDP
can provide these exact fractions using the polygon clipping algorithm. Figure 3 shows an
example of this "conservative remapping" method. To regrid 4-km data onto 12-km grid cells
exactly, we need to know the overlapping fraction of each original cell with the target cell.
Using the spatial allocator from the IGDP vector tool, one can perform regridding calculations
with the necessary accuracy.
Figure 3. Example of "conservative spatial regridding". In order to calculate the exact total
value in a 12-km cell (j), we need to know the overlapping fraction of every 4-km cell (i), and
then sum the weighted contributions.
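Written out, the conservative remapping of Figure 3 is an overlap-weighted sum. Using $f_{ij}$ for the fraction of 4-km cell $i$'s area that falls inside 12-km cell $j$ (our illustrative notation, not the report's), an extensive quantity such as an emission total $E$ remaps as

$$E_{j} = \sum_{i} f_{ij}\,E_{i},$$

which conserves the total because $\sum_{j} f_{ij} = 1$ for every source cell fully covered by the target grid, while an intensive quantity such as a concentration $C$ remaps as the area-weighted mean

$$C_{j} = \frac{\sum_{i} f_{ij}\,A_{i}\,C_{i}}{\sum_{i} f_{ij}\,A_{i}},$$

where $A_{i}$ is the area of source cell $i$. The polygon clipping algorithm supplies the exact $f_{ij}$.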
4. QUALITY METRICS
Tool evaluation: The goal of this project is to develop an accurate and efficient tool to process
geospatial and satellite data, which are commonly used for model input and model evaluation.
The performance of the geospatial data processing tool will therefore be evaluated (1) by
comparison with geospatial data output processed using traditional tools, such as ArcGIS, the
EPA Spatial Allocator, and the terrain processor of the WRF Preprocessing System (WPS), and
(2) by estimating the tool's processing speed for varying model domain configurations.
Processed data will be compared with geographical spatial maps: comparing processed water
or ocean fractions against GIS coastline data and/or hydrology GIS data (e.g., rivers and lakes)
is one of the simplest and most effective ways to check the accuracy of the new tool, especially
its ability to handle map projections properly. Fine resolution LULC data, 30-m or finer (if
available), will be processed as model inputs for a fine resolution simulation (i.e., a 4-km
simulation) and compared with the default dataset currently used in the WRF-ARW model.
Spatial areas with significant differences will be identified and described through both
graphical and statistical comparisons. Four spatial graphics will be compared in regions with
complicated coastlines (e.g., the Houston-Galveston region): data processed with the new tool,
data processed with the model terrain processor (e.g., WRF WPS), unprocessed fine resolution
raw data, and coastline GIS data (e.g., shapefile (.SHP)). Statistics of the data processed by the
new tool and by the traditional tool will be compared using the histogram distributions of the
LULC type indices.
Engineering test: An engineering test will be performed for a short time period to ensure the
quality of the newly developed IGDP GIS data processing tool. Meteorology and/or chemistry
model simulations using high resolution LULC, and additional fine resolution data if available,
will be performed, and a general evaluation against observations will be conducted.
Comparisons between old/new or coarse/fine resolution inputs will be presented, but detailed
investigation of any scientific findings is beyond the scope of this project. Table 3 summarizes
the configuration of the WRF model for the meteorology simulation, which follows TCEQ SIP
modeling settings. We will choose domain settings identical to those of previous studies to
minimize model setup effort and to allow the previous model results to serve as the baseline
for this evaluation. Model simulations with old and new inputs will be compared by computing
the mean deviation and root mean square deviation between the two model outputs. We will
also compute statistics, including mean bias (MB) and root mean square error (RMSE), against
surface observational data.
Mean Deviation (MD):

$$MD = \frac{1}{N}\sum_{i=1}^{N}\left(M1_{i} - M2_{i}\right)$$

Root Mean Square Deviation (RMSD):

$$E_{RMSD} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(M1_{i} - M2_{i}\right)^{2}\right]^{1/2}$$

Mean Bias (MB):

$$MB = \frac{1}{N}\sum_{i=1}^{N}\left(M_{i} - O_{i}\right)$$

Root Mean Square Error (RMSE):

$$E_{RMSE} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(M_{i} - O_{i}\right)^{2}\right]^{1/2}$$
where M is the model value, O is the measured value, and N is the number of data points; M1
and M2 denote simulations with the new and old inputs, respectively.
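As a simple illustration, these four statistics reduce to a few array operations in IDL. The sketch below uses hypothetical variable names: m1 and m2 are co-located model values from the new-input and old-input runs, and obs holds the corresponding observations, all 1-D arrays of equal length:

; Minimal sketch of the four evaluation statistics.
; Assumed (hypothetical) inputs: m1, m2 = model values from runs with
; new and old inputs; obs = co-located observations.
n    = N_ELEMENTS(m1)
md   = TOTAL(m1 - m2) / n                    ; Mean Deviation
rmsd = SQRT(TOTAL((m1 - m2)^2) / n)          ; Root Mean Square Deviation
mb   = TOTAL(m1 - obs) / n                   ; Mean Bias
rmse = SQRT(TOTAL((m1 - obs)^2) / n)         ; Root Mean Square Error
PRINT, 'MD=', md, ' RMSD=', rmsd, ' MB=', mb, ' RMSE=', rmse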
Table 3. Model configuration and domain nesting of the WRF model.

Domain name: NA36 | SUS12 | TX04
Resolution: 36 km | 12 km | 4 km
Domain coverage: Continental US | Texas & adjoining states | Eastern Texas
Horizontal grid: 162 x 128 | 174 x 138 | 216 x 288
Initialization: NAM + NCEP daily SST | Run in 2-way nesting | Nest-down of SUS12
Microphysics: WSM5a | WSM6b
Cloud scheme: KFc | None
Radiation scheme: RRTMd for longwave radiation; MM5 (Dudhia)e for shortwave radiation
PBL scheme: YSUf scheme
Land surface model: 5-layer slab modelg
Nudging: 3D grid nudging (no nudging of mass fields within PBL)

a WRF Single-Moment 5-class (Hong et al., 2004). b WRF Single-Moment 6-class (Hong and Lim, 2006). c Kain-
Fritsch scheme (Kain, 2004). d Rapid Radiative Transfer Model scheme (Mlawer et al., 1997). e Dudhia (1989). f
Yonsei University scheme (Hong et al., 2006). g 5-layer soil temperature model (Grell et al., 1994).
5. DATA ANALYSIS, INTERPRETATION, AND MANAGEMENT
In addition to the development of a geospatial data processing tool, three types of data will be
collected and archived for the tool's operational tests and performance evaluations. (1) Various
GIS data will be collected and utilized, in both vector and raster formats: GIS shapefiles for
population, census tracts, road networks, railroads, etc., will be used to evaluate the polygon
and/or polyline clipping capability, and various fine resolution land use land cover datasets will
be used to test the tool's raster data handling capability. (2) Various satellite data, such as
MODIS AOD, OMI/GOME-2 NO2 column data, and/or geostationary Sea Surface Temperature
(SST) data, will be collected and utilized to investigate the spatial regridding capability of the
newly developed geospatial data tool. (3) A short-term engineering model run and its outputs
will be produced and archived. This engineering run is intended to evaluate the basic
performance of the geospatial data processing tool and to generate an example of the model's
input data. The run is not intended to produce a best-effort simulation with scientific meaning,
but it will be evaluated with reasonable methods and discussed with respect to the overall
quality of the model inputs and potential future project topics. Simulation data will be
evaluated by comparison with a base run (e.g., with LULC data processed by a traditional tool)
and with observational data, as described in the previous section.
6. REPORTING
A technical work plan (statement of work, quality assurance project plan, budget, and budget
justification) will be submitted by February 15, 2013. Monthly technical reports will be
prepared and submitted by the 8th of each month, with accompanying financial reports
submitted by the 12th of each month, throughout the duration of the project. The literature
review of basic GIS data processing algorithms (e.g., polygon clipping algorithms) and the
inter-comparison of their performance will be described in detail in the final report. Manuals
for the IDL routine library will also be included in the final report. Engineering test run results
with a simple evaluation will be documented in the final project report. A final technical report
will be submitted by November 30, 2013, preceded by a draft final report on October 21, 2013.
During or after completion of the project, the investigators anticipate the preparation of
conference presentations and manuscripts for submission to appropriate peer‐reviewed
journals in the field. Drs. Daniel Tong and Pius Lee will supervise the completion of all reports,
presentations, and manuscripts, which will be collaborative efforts between the UMD and the
ARL/NOAA team.
7. REFERENCES
Byun, Daewon, S. Kim, F.-Y. Cheng, H.-C. Kim, and F. Ngan, 2007: Improved Modeling Inputs:
Land Use and Sea-Surface Temperature. Final Report, Texas Commission on
Environmental Quality, August 2007, 33 pp.
Byun, Daewon, F. Ngan, F.-Y. Cheng, H.-C. Kim, and S. Kim, 2008: Improvement of MM5 Surface
Characteristics. Final Report, Texas Commission on Environmental Quality, August 2008,
44 pp.
Byun, D. W., F. Ngan, and H. C. Kim, 2011: Improvement of Meteorological Modeling by
Accurate Prediction of Soil Moisture in the Weather Research and Forecasting (WRF)
Model. Final Report, Texas Commission on Environmental Quality, March 2011, 46 pp.
Cheng, F.-Y., and D. W. Byun, 2008: Application of high resolution land use and land cover data
for atmospheric modeling in the Houston-Galveston Metropolitan Area: Part I,
meteorological simulation results. Atmos. Environ., 42, 7795-7810.
Grell, G. A., J. Dudhia, and D. Stauffer, 1994: A description of the fifth-generation Penn
State/NCAR mesoscale model (MM5). NCAR Technical Note NCAR/TN-398+STR.
Nielsen-Gammon, J. W., 2001: Initial modeling of the August 2000 Houston-Galveston ozone
episode. Report to the Technical Analysis Division, Texas Natural Resource Conservation
Commission, December 2001.
Sutherland, I. E., and G. W. Hodgman, 1974: Reentrant polygon clipping. Comm. of the ACM,
17, 32-42, doi:10.1145/360767.360802.
Vatti, B. R., 1992: A generic solution to polygon clipping. Comm. of the ACM, 35(7), 56-63,
doi:10.1145/129902.129906.
Wells, G., 2006: The New Eastern Texas Land Use Land Cover Classification Project. HARC
Project Contract number H-46-T28-2004-T2, UT-Austin Center for Space Research,
Austin, Texas.