
PURDUE UNIVERSITY GRADUATE SCHOOL

Thesis/Dissertation Acceptance

Thesis/Dissertation Agreement, Publication Delay, and Certification/Disclaimer (Graduate School Form 32) adheres to the provisions of

Department

Youn Shik Park

Development and Enhancement of Web-based Tools to Develop Total Maximum Daily Load

Doctor of Philosophy

Bernard A. Engel

Jonathan M. Harbor

Indrajeet Chaubey

Venkatesh M. Merwade

Bernard A. Engel

Bernard A. Engel 03/26/2014


DEVELOPMENT AND ENHANCEMENT OF WEB-BASED TOOLS

TO DEVELOP TOTAL MAXIMUM DAILY LOAD

A Dissertation

Submitted to the Faculty

of

Purdue University

by

Youn Shik Park

In Partial Fulfillment of the

Requirements for the Degree

of

Doctor of Philosophy

May 2014

Purdue University

West Lafayette, Indiana


To my parents, brother, and N. Kim


ACKNOWLEDGEMENTS

I was able to finish my dissertation and Ph.D. program only with the many people I met, to whom I am deeply thankful. My days here were happier than they could otherwise have been, because I was with people who raised me up when I was frustrated.

I would like to express my deepest gratitude to my advisor, Dr. Bernie Engel, for his excellent guidance, patience, and support, for giving me opportunities to try what I wanted to try, and for his suggestions for resolving the problems I had. I was here for only several years, but I am convinced that these years with him are enough to change the next thirty years of my life. I am also grateful to Dr. Jon Harbor, Dr. Indrajeet Chaubey, and Dr. Venkatesh Merwade for serving on my research committee and for their excellent comments and suggestions.

I would like to thank Larry Theller (also known as Uncle Larry) for his assistance with web programming and GIS; it was delightful to travel to Chicago and Ann Arbor for projects. I am grateful to Barbara Davies and Rebecca Peer for their kindness (and for the five-dollar-per-month coffee as well).


I would also like to thank Dr. Kyoung Jae Lim, who introduced me to the Ph.D. program at Purdue University and helped me gain the programming skills for my research at Purdue University.

Nayoung Kim, you are the person to whom I am most thankful. Thank you. I have been seizing the days with you, and I will continue to, but I won’t let the days seize me.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1. INTRODUCTION
1.1 Problem Statement
1.2 Objectives
1.2.1 Proposed Objectives for Annual Load Estimation
1.2.2 Proposed Objectives for Enhancement and Development of TMDL Models
1.3 Dissertation Organization
1.4 References

CHAPTER 2. ANALYSIS FOR REGRESSION MODEL BEHAVIOR BY SAMPLING STRATEGY FOR ANNUAL LOAD ESTIMATION
2.1 Abstract
2.2 Introduction
2.3 Methodology
2.3.1 Water Quality Data
2.3.2 Subsampling Methods and Regression Models
2.4 Results and Discussions
2.4.1 Ratio Comparison of Sampling Strategies and Regression Models
2.4.2 Water Quality Data from High Flow
2.4.3 Improvement of Annual Load Estimation
2.5 Conclusions
2.6 References

CHAPTER 3. IDENTIFYING THE CORRELATION BETWEEN WATER QUALITY DATA AND LOADEST MODEL BEHAVIOR
3.1 Abstract
3.2 Introduction
3.3 Methodology
3.3.1 Water Quality Data Statistics for Annual Load Estimates
3.3.2 Water Quality Data Selection for LOADEST runs
3.4 Results and Discussions
3.4.1 Required Statistics for Annual Load Estimates
3.4.2 Mean Flow in Calibration Data and Annual Load Estimates
3.4.3 Improvement of the Poorest Annual Load Estimates
3.5 Conclusions
3.6 References

CHAPTER 4. A WEB TOOL FOR STORET/WQX WATER QUALITY DATA RETRIEVAL AND BEST MANAGEMENT PRACTICE SCENARIO IDENTIFICATION
4.1 Abstract
4.2 Introduction
4.3 Methodology
4.3.1 Module Development to Use Water Quality Data
4.3.2 Module Development to Suggest BMP Scenarios
4.4 Application of the Web Tool
4.5 Conclusions
4.6 References

CHAPTER 5. A WEB MODEL TO ESTIMATE THE IMPACT OF BEST MANAGEMENT PRACTICES
5.1 Abstract
5.2 Introduction
5.3 Methodology
5.3.1 Annual Direct Runoff Computations
5.3.2 Web Interfaces and CLIGEN Use
5.3.3 Auto-Calibration Modules
5.3.4 Optimization of Best Management Practices
5.4 Results
5.4.1 Annual Direct Runoff Computations
5.4.2 Application of STEPL WEB
5.5 Conclusions
5.6 References

CHAPTER 6. SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
6.1 Summary
6.2 Conclusions
6.3 Recommendations for Future Research

APPENDICES
Appendix A Statistics of USGS Stations
Appendix B 95% Confidence Intervals of Ratios
Appendix C Relationships between Flow, Concentration, Logarithm Flow, Logarithm Load, and Squared Logarithm Load

VITA


LIST OF TABLES

Table 2.1 Number of Stations and Total Years
Table 2.2 Improvement of Annual Load Estimates by Increasing PCH Data
Table 3.1 Various water quality data for LOADEST uses
Table 3.2 Statistics in Calibration and Estimation Data
Table 3.3 Daily Sediment Data from USGS Stations
Table 3.4 Comparison of Errors between Regression Equation and All Data
Table 3.5 Improvement of the Poorest Load Estimates by MFC Fitting
Table 4.1 BMP Categories for Each Flow Regime (USEPA, 2007)
Table 4.2 Default BMP Costs for Landuses
Table 4.3 Target Load, 90th Percentile Load, and Required Reduction Percentage
Table 5.1 Daily Precipitation Collection from NCDC
Table 5.2 Annual Precipitation and Direct Runoff Computations
Table 5.3 Landuse Distribution for Study Watershed
Table 5.4 STEPL WEB Parameters (Default / Calibrated)
Table 5.5 Annual Direct Runoff, Baseflow, and Sediment Load

Appendix Table
Table A 1. Statistics of USGS Stations

LIST OF FIGURES

Figure 2.1 Locations of Water Quality Data Stations
Figure 2.2 Correlation Coefficients for Concentrations of Water Quality Parameters with Streamflow
Figure 2.3 Example of Subsampled Dataset
Figure 2.4 Examples of Pollutant Load Estimates (Sed: sediment, Phs: phosphorus, Mn: monthly, Bi: biweekly, Wk: weekly, Fx: fixed interval sampling, St: fixed interval sampling with storm events, 0 and 1: LOADEST model number 0 and 1)
Figure 2.5 Mean and Width of 95% Confidence Intervals by Percentage of Calibration Data from High Flow
Figure 2.6 Seasonal Variation in Blanchard River near Findlay, Ohio Nitrogen Data from the National Center for Water Quality Research of Heidelberg University
Figure 3.1 Correlation between Errors and Mean of Flow in Calibration Data
Figure 3.2 Correlation between Mean of Flows in Calibration and Estimation Data
Figure 3.3 Required MFC by Regression Equation
Figure 3.4 Comparison of Slopes from Linear Regression Formula (LRS in equation 3.7) and Calibrated LOADEST Model (a1 in equation 3.4)
Figure 3.5 Annual Sediment Load Estimates by MFCs
Figure 3.6 Load Estimate Improvement when Excluding Water Quality Samples (USGS Station Number 02119400, Monthly Fixed Sampling Strategy on 19th of Every Month)
Figure 4.1 Google Maps Interface to Retrieve USGS Water Quality Data
Figure 4.2 Schematic Depicting Web-based Tool Access of Water Quality Data from EPA STORET/WQX Location Database and Web Access to WQP
Figure 4.3 Flow Data Collection by USGS Flow Station Location Tool
Figure 4.4 Water Quality Data Collection by WQP Location Tool
Figure 4.5 Load Duration Curve for the Study Watershed
Figure 5.1 Location Map of NCDC and CLIGEN Stations within Indiana
Figure 5.2 Annual Direct Runoff, Groundwater, and Pollutant Load Computation in STEPL WEB
Figure 5.3 Comparison of Annual Direct Runoff by Different Approaches
Figure 5.4 Landuses of Tippecanoe River at North Webster Watershed

Appendix Figure
Figure B 1. Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies
Figure B 2. Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies Supplemented with Stratified Sampling
Figure B 3. Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies
Figure B 4. Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies
Figure B 5. Ratio Comparison of Nitrogen Estimation by Fixed Sampling Frequencies
Figure B 6. Ratio Comparison of Nitrogen Estimations by Fixed Sampling Frequencies
Figure C 1. Scatter plot of flow, concentration, and load with PCH of 0%
Figure C 2. Scatter plot of flow, concentration, and load with PCH of 14%

ABSTRACT

Park, Youn Shik. Ph.D., Purdue University, May 2014. Development and Enhancement of Web-based Tools to Develop Total Maximum Daily Load. Major Professor: Bernard A. Engel.

Flow and load duration curves (FDCs and LDCs) are commonly used to develop total maximum daily loads (TMDLs). A web-based tool was previously developed to facilitate the development of FDCs and LDCs, allowing use of USGS streamflow data via web access. In the research reported here, the tool was upgraded to retrieve water quality data from STORET/WQX and USGS, because obtaining water quality data often requires significant effort. Additional tools were also developed to assist in decision making for the selection of best management practices (BMPs).

The Web-based LDC Tool employs LOADEST and LOADIN to estimate daily pollutant loads from intermittent water quality data; therefore, LOADEST and LOADIN were evaluated for annual pollutant load estimation. Daily nitrogen, phosphorus, and sediment concentration data were collected and subsampled using six sampling strategies. Because the water quality parameters showed different relationships with streamflow under each sampling strategy, it was concluded that the pollutant regression model should be selected according to the water quality parameter. In addition, water quality data used to estimate annual pollutant loads need to include an appropriate proportion of samples from storm events: datasets in which 20-30% of the water quality data came from high flows (i.e. the upper 10 percent of flows for a given analysis period) provided the sediment and phosphorus load estimates closest to the measured loads.

To act on the loads exceeding standards and the required pollutant reductions identified by the Web-based LDC Tool, a model capable of simulating BMPs was required. The Spreadsheet Tool for the Estimation of Pollutant Load (STEPL), a spreadsheet model that estimates annual pollutant loads, was evaluated as the basis for such a BMP model. STEPL computes annual direct runoff using the Soil Conservation Service Curve Number (SCS-CN) method with average rainfall per event. Annual direct runoff computed with the EPA STEPL approach differed substantially from annual direct runoff computed by the conventional application of the SCS-CN method. In contrast, annual direct runoff computed from daily precipitation data generated by CLIGEN showed smaller differences than the values computed with the EPA STEPL approach. Therefore, a web-based model to simulate BMPs, STEPL WEB, was developed that computes annual direct runoff using daily precipitation data generated by CLIGEN. STEPL WEB establishes a priority list of BMPs based on implementation cost per mass of pollutant reduction and then performs iterative simulations to identify the most cost-effective BMP implementation plans.
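The idea of ranking BMPs by implementation cost per mass of pollutant reduced can be illustrated with a short greedy sketch in Python. Everything in it is hypothetical and for illustration only: the `Bmp` class and its fields, the percent-reduction formula, and the assumption that BMP reductions simply add together. It does not reproduce the iterative model simulations that STEPL WEB performs.

```python
from dataclasses import dataclass


@dataclass
class Bmp:
    """Hypothetical BMP record used only for this illustration."""
    name: str
    cost: float          # annualized implementation cost (e.g. USD/yr)
    reduction_kg: float  # pollutant load reduction (kg/yr), assumed known


def required_reduction_pct(existing_load: float, target_load: float) -> float:
    """Percent reduction needed to bring an existing load down to a target load."""
    if existing_load <= target_load:
        return 0.0
    return (existing_load - target_load) / existing_load * 100.0


def greedy_bmp_plan(bmps, required_kg):
    """Select BMPs in ascending cost per kg reduced until required_kg is reached.

    Reductions are simply summed, i.e. interactions between BMPs are ignored.
    Returns (selected names, total reduction in kg, total cost).
    """
    ranked = sorted(
        (b for b in bmps if b.reduction_kg > 0),
        key=lambda b: b.cost / b.reduction_kg,
    )
    selected, total_kg, total_cost = [], 0.0, 0.0
    for bmp in ranked:
        if total_kg >= required_kg:
            break  # requirement already met; stop adding BMPs
        selected.append(bmp.name)
        total_kg += bmp.reduction_kg
        total_cost += bmp.cost
    return selected, total_kg, total_cost


if __name__ == "__main__":
    existing, target = 12000.0, 9000.0
    print(required_reduction_pct(existing, target))  # 25.0
    candidates = [
        Bmp("A", 6000.0, 2000.0),   # 3 USD per kg
        Bmp("B", 1000.0, 1000.0),   # 1 USD per kg
        Bmp("C", 10000.0, 2000.0),  # 5 USD per kg
    ]
    print(greedy_bmp_plan(candidates, existing - target))  # (['B', 'A'], 3000.0, 7000.0)
```

If the candidate BMPs cannot close the gap, the function simply returns all of them; a real planning tool would need to flag that case and account for BMPs that treat the same source area.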


CHAPTER 1. INTRODUCTION

1.1 Problem Statement

A Total Maximum Daily Load (TMDL) is the maximum amount of a pollutant that a waterbody can receive while still meeting water quality (WQ) standards; TMDLs are used to preserve and regulate water quality in watersheds in the USA. Section 303(d) of the Clean Water Act requires that states and other defined authorities with contaminated waters establish priority rankings and develop TMDL plans to meet identified water quality standards.

Watershed models are commonly used to develop TMDLs and to evaluate existing pollutant loads. One of the greatest benefits of watershed models is that they can account for the specific characteristics and conditions of a watershed using temporal and spatial data. However, this is also somewhat of a disadvantage, because such models require not only a wide range of inputs but also expertise to prepare those inputs and to use the models. Moreover, models differ in purpose, applicability, and uncertainty, and they rely on different structures and assumptions to represent complex natural systems. Because each model has its own advantages and disadvantages, it may be necessary to combine two or more models to solve a particular problem (Babbar-Sebens and Karthikeyan, 2009; Shen and Zhao, 2010; Cleland, 2003; USEPA, 2008).

Load duration curve (LDC) analysis is a relatively simple approach that requires less input and effort than watershed models: it needs only streamflow and WQ datasets to identify existing and allowable pollutant loads. LDC analysis is a statistical approach based on the cumulative frequency distribution of streamflow and the pollutant loads associated with that streamflow (USEPA, 2007).
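The LDC idea can be made concrete with a minimal Python sketch. It assumes flows in cubic feet per second, a WQ criterion in mg/L, and the Weibull plotting position rank / (n + 1) for exceedance percentages; the Web-based LDC Tool's own implementation is not shown in this document and may use a different plotting position. The function names are hypothetical. The factor 2.446575 converts 1 ft³/s × 1 mg/L into kg/day (28.3168 L/s × 86,400 s/day ÷ 10⁶ mg/kg).

```python
CFS_MGL_TO_KG_PER_DAY = 2.446575  # 1 ft^3/s at 1 mg/L, expressed in kg/day


def flow_duration(flows):
    """Return (exceedance %, flow) pairs ordered from highest to lowest flow.

    Uses the Weibull plotting position rank / (n + 1), an assumed choice.
    """
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ordered, start=1)]


def load_duration(flows, criterion_mg_l):
    """Allowable daily load (kg/day) at each flow-exceedance percentage."""
    return [
        (pct, q * criterion_mg_l * CFS_MGL_TO_KG_PER_DAY)
        for pct, q in flow_duration(flows)
    ]


def observed_load(flow_cfs, conc_mg_l):
    """Observed daily load (kg/day) from a paired flow and concentration sample."""
    return flow_cfs * conc_mg_l * CFS_MGL_TO_KG_PER_DAY
```

An observed load plotted above the allowable curve at the same exceedance percentage indicates a load exceeding the criterion at that flow condition.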

A web-based tool to develop flow duration curves (FDCs) and LDCs was developed by Kim et al. (2012). The Web-based LDC Tool (https://engineering.purdue.edu/~ldc/) can use not only the user’s own streamflow data but also USGS streamflow data retrieved via web access, and it simplifies FDC and LDC development with user-friendly tabular and graphical interfaces. However, the tool required users to prepare their own WQ data, which are fundamental to LDC development. Because collecting WQ data often requires significant effort, the tool needed to be enhanced to access WQ data from STORET/WQX and USGS.

To estimate annual pollutant loads and the required load reductions, WQ data must have the same temporal resolution as the streamflow data with which they are paired. However, WQ data are typically intermittent. The tool employs LOADEST to estimate daily pollutant loads from intermittent WQ data; however, the LOADEST regression models need to be evaluated to identify which of them are appropriate for annual pollutant load estimation and which shows the ‘best fit’ to the ‘true load’. Therefore, the LOADEST regression models and another regression model, used in the Web-based Load Interpolation Tool (LOADIN), a web-based tool to interpolate pollutant loads, were evaluated in this study. Besides the choice of regression model, sampling frequency is also assumed to be an influential factor in the estimation of annual pollutant loads. Therefore, various sampling frequency strategies were evaluated together with the regression models to identify the number of WQ samples per year required for reasonable annual pollutant load estimation. For this evaluation, daily WQ data were collected from 21 stations for total nitrogen, 69 stations for phosphorus, and 211 stations for suspended sediment. The collected daily WQ data were artificially degraded (subsampled) to evaluate the effect of different sampling frequencies (i.e. weekly, biweekly, and monthly fixed sampling intervals) on LOADEST and LOADIN results.
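The degrade-and-refit experiment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in this study: the regression is the functional form of LOADEST model 1, ln(L) = a0 + a1 ln(Q), fitted by ordinary least squares rather than LOADEST's AMLE/MLE estimators, no retransformation bias correction is applied, monthly sampling is approximated by a fixed 30-day interval, and the ratio of estimated to measured annual load is used only as an illustrative accuracy measure. Function names are hypothetical.

```python
import math


def subsample_fixed(days, interval, start=0):
    """Keep every `interval`-th day, mimicking fixed-interval sampling
    (e.g. 7 for weekly, 14 for biweekly, roughly 30 for monthly)."""
    return days[start::interval]


def fit_log_rating_curve(samples):
    """OLS fit of ln(load) = a0 + a1 * ln(flow) to (flow, load) pairs.

    Same functional form as LOADEST model 1, but plain OLS, not AMLE/MLE.
    """
    xs = [math.log(q) for q, _ in samples]
    ys = [math.log(load) for _, load in samples]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def annual_load_ratio(daily, interval, start=0):
    """Estimated / measured annual load for one fixed-interval strategy.

    `daily` holds (flow, load) pairs for every day of the year.
    """
    a0, a1 = fit_log_rating_curve(subsample_fixed(daily, interval, start))
    # Predict a load for every day from its flow, then sum to an annual total.
    estimated = sum(math.exp(a0 + a1 * math.log(q)) for q, _ in daily)
    measured = sum(load for _, load in daily)
    return estimated / measured
```

With real data, repeating `annual_load_ratio` over different start days gives a distribution of ratios for each sampling strategy; a ratio near 1 indicates an annual estimate close to the measured load.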

A simple LDC analysis allows a user to identify sources that contribute to pollutant loads exceeding the TMDL and can help in suggesting Best Management Practices (BMPs) (i.e. for nonpoint or point source pollution) to reduce pollutant loads in a watershed. However, the original Web-based LDC Tool was not designed to simulate BMPs. To supplement the Web-based LDC Tool with BMP simulation, the Spreadsheet Tool for the Estimation of Pollutant Load (STEPL) was selected; STEPL is a spreadsheet-based model that estimates surface runoff, sediment load, nutrient loads, and 5-day biological oxygen demand (BOD5). The database incorporated in the model is useful not only for annual runoff and pollutant load estimation but also for simulating the implementation of various BMPs. Although STEPL provides annual runoff and pollutant load estimates, it has limitations arising from the way it calculates runoff. The model calculates runoff with the SCS Curve Number (CN) method using a default initial abstraction, and it does not adjust the CN or the SCS-CN equation to reflect a changed initial abstraction. Additionally, STEPL uses ‘average rainfall per event’ rather than actual rainfall data to reproduce annual runoff, so it is important to evaluate whether an alternative approach based on actual rainfall data improves the runoff estimates. To address these limitations, a web-based model was developed that integrates a revised STEPL with the Web-based LDC Tool.
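Why an event-average rainfall depth can misrepresent annual runoff is easiest to see numerically. The Python sketch below uses the standard SCS-CN equations in inches, S = 1000/CN − 10, Ia = λS (λ = 0.2 conventionally), and Q = (P − Ia)² / (P − Ia + S) for P > Ia. The event-average function is a simplified stand-in, assumed here to be the runoff from the mean rain-day depth multiplied by the number of rain days; it is not STEPL's exact calculation. Function names are hypothetical.

```python
def scs_runoff(p_in, cn, ia_ratio=0.2):
    """Direct runoff depth (in) for one rainfall depth (in) by the SCS-CN method.

    S = 1000/CN - 10; Ia = ia_ratio * S (0.2 is the conventional ratio).
    """
    s = 1000.0 / cn - 10.0
    ia = ia_ratio * s
    if p_in <= ia:
        return 0.0  # all rainfall is absorbed by the initial abstraction
    return (p_in - ia) ** 2 / (p_in - ia + s)


def annual_runoff_daily(daily_precip_in, cn, ia_ratio=0.2):
    """Annual runoff summed over daily rainfall depths (e.g. CLIGEN output)."""
    return sum(scs_runoff(p, cn, ia_ratio) for p in daily_precip_in)


def annual_runoff_event_average(daily_precip_in, cn, ia_ratio=0.2):
    """Simplified event-average stand-in (an assumption, not STEPL's code):
    runoff from the mean rain-day depth times the number of rain days."""
    rain_days = [p for p in daily_precip_in if p > 0]
    if not rain_days:
        return 0.0
    mean_event = sum(rain_days) / len(rain_days)
    return len(rain_days) * scs_runoff(mean_event, cn, ia_ratio)
```

For CN = 75 and rain days of 0.2, 0.2, 0.2, and 3.0 in, the daily sum gives about 0.96 in of runoff, while the event-average stand-in gives about 0.06 in, because averaging spreads the one large storm below the initial abstraction. Since Q(P) is convex in P, this stand-in can never exceed the daily sum for the same rain days (Jensen's inequality).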

1.2 Objectives

The overall goal of this study was to develop a framework for TMDL development using web-based tools that include the Web-based LDC Tool with LOADEST and a Web-based STEPL. First, water quality sampling frequency strategies and regression models for interpolating intermittent WQ data and estimating annual pollutant loads were evaluated. Second, the Web-based LDC Tool was improved to collect WQ data via web access, and the Web-based STEPL was developed to estimate annual pollutant loads and to simulate BMP effects.

1.2.1 Proposed Objectives for Annual Load Estimation

The LOADEST model has been used with intermittent water quality data collected at various sampling frequencies, such as monthly (Tamm et al. 2008; Qian et al. 2007; Das et al. 2011; Brigham et al. 2009), biweekly (Schuster et al. 2011; Dornblaser and Striegl, 2009), weekly (Duan et al. 2012), or intervals of a few days (Horowitz, 2003; Raymond and Oh, 2007; Heimann et al. 2011). The predictive ability of pollutant load regression models under various sampling strategies needs to be evaluated to identify the correlation between LOADEST model behavior and water quality datasets, and the results can then be incorporated into the development of the Web-based LDC Tool.

The specific objectives of the study were to:

1. Evaluate water quality sampling frequency strategies and 10 regression models for estimating annual pollutant loads. The regression models evaluated in this study were nine LOADEST regression models (numbered 1 to 9) and one regression model using a genetic algorithm (LOADIN).

2. Identify the correlation between LOADEST model behavior and water quality datasets with various proportions of water quality data from storm events.

1.2.2 Proposed Objectives for Enhancement and Development of TMDL Models

The Web-based LDC Tool provides various benefits for FDC and LDC development: it derives streamflow data from the USGS data server, integrates with LOADEST, and allows additional analyses (e.g. seasonal variations and surface flow separation). However, the Web-based LDC Tool needed to be enhanced to improve its ease of use and to allow further analyses such as BMP suggestion or simulation. STEPL, the other model used in this study, has the benefits that most of its inputs are provided by a model database and that it can simulate the impacts of various BMPs. However, the model needed to be modified to improve its annual load estimates and to be developed into a web-based model that interacts with the Web-based LDC Tool. The specific objectives of the study were to:

1. Improve the Web-based LDC Tool to allow collection of USGS and STORET/WQX WQ data via web access.

2. Develop a module to suggest BMP scenarios that reduce annual pollutant loads to meet desired WQ goals.

3. Develop a web-based model to estimate annual pollutant loads and to simulate BMPs that achieve the required reduction in annual pollutant load.

1.3 Dissertation Organization

The dissertation contains six chapters. This chapter introduces the research needs and objectives and provides general background on pollutant regression models, LDC, and STEPL. Chapters 2-5 discuss in detail the methods and results related to the objectives proposed in the previous section. These chapters were prepared in the style of journal manuscripts, so each contains an introduction, methods, results, and conclusions. Chapter 2 evaluates the predictive ability of pollutant load regression models under various sampling strategies for annual pollutant load estimation. Chapter 3 suggests an approach to preparing water quality datasets for annual pollutant load estimation using LOADEST. Chapter 4 covers the enhancement of the Web-based LDC Tool to allow automated collection of water quality data via web access and to identify BMPs that achieve the required pollutant load reduction. Chapter 5 examines the annual direct runoff approach in EPA STEPL and covers the development of a web-based model capable of simulating annual pollutant load reductions by BMPs. Chapter 6 summarizes the research, presents overall conclusions, and provides recommendations for future studies.


1.4 References

Babbar-Sebens, M., and R. Karthikeyan. 2009. Consideration of sample size for estimating contaminant load reductions using load duration curves. Journal of Hydrology. 372: 118-123.

Brigham, M. E., Wentz, D. A., Aiken, G. R., and D. P. Krabbenhoft. 2009. Mercury cycling in stream ecosystems. 1. Water column chemistry and transport. Environmental Science and Technology. 43: 2720-2725.

Chen, D., Lu, J., Wang, H., Shen, Y., and D. Gong. 2011. Combined inverse modeling approach and load duration curve method for variable nitrogen total maximum daily load development in an agricultural watershed. Environmental Science and Pollution Research. 18: 1405-1413.

Das, S. K., Ng, A. W. M., and B. J. C. Perera. 2011. Assessment of nutrient and sediment loads in the Yarra River catchment. 19th International Congress on Modeling and Simulation, Perth, Australia. 3490-3496.

Dornblaser, M. M., and R. G. Striegl. 2007. Nutrient (N, P) loads and yields at multiple scales and subbasin types in the Yukon River basin, Alaska. Journal of Geophysical Research. 112: G04S57.

Duan, S., Kaushal, S. S., Groffman, P. M., Band, L. E., and K. T. Belt. 2012. Phosphorus export across an urban to rural gradient in the Chesapeake Bay watershed. Journal of Geophysical Research. 117: G01025.

Heimann, D. C., Sprague, L. A., and D. W. Blevins. 2011. Trends in suspended-sediment loads and concentrations in the Mississippi River Basin, 1950-2009. U. S. Geological Survey Scientific Investigations Report 2011-5200.

Horowitz, A. J. 2003. An evaluation of sediment rating curves for estimating suspended sediment concentrations for subsequent flux calculations. Hydrological Processes. 17: 3387-3409.

Kim, J., Engel, B. A., Park, Y. S., Theller, L., Chaubey, I., Kong, D. S., and K. J. Lim. 2012. Development of Web-based Load Duration Curve System for analysis of total maximum daily load and water quality characteristics in a waterbody. Journal of Environmental Management. 97: 46-55.

Qian, Y., Migliaccio, K. W., Wan, Y., and Y. Li. 2007. Trend analysis of nutrient concentrations and loads in selected canals of the Southern Indian River Lagoon, Florida. Water, Air, and Soil Pollution. 186: 195-208.

Raymond, P. A., McClelland, J. W., Holmes, R. M., Zhulidov, A. V., Mull, K., Peterson, B. J., Striegl, R. G., Aiken, G. R., and T. Y. Gurtovaya. 2007. Flux and age of dissolved organic carbon exported to the Arctic Ocean: A carbon isotopic study of the five largest arctic rivers. Global Biogeochemical Cycles. 21: GB4011.

Schuster, P. F., Striegl, R. G., Aiken, G. R., Krabbenhoft, D. P., Dewild, J. F., Butler, K., Kanmark, B., and M. Dornblaser. 2011. Mercury export from the Yukon River basin and potential response to a changing climate. Environmental Science and Technology. 45: 9262-9267.

Shen, J., and Y. Zhao. 2010. Combined Bayesian statistics and load duration curve method for bacteria nonpoint source loading estimation. Water Research. 44: 77-84.

Tamm, T., Noges, T., Jarvet, A., and F. Bouraoui. 2008. Contributions of DOC from surface and groundflow into Lake Vortsjarv (Estonia). Hydrobiologia. 599: 213-220.

USEPA. 2008. Handbook for developing watershed TMDLs. Washington, DC: U. S. Environmental Protection Agency.


CHAPTER 2. ANALYSIS FOR REGRESSION MODEL BEHAVIOR BY SAMPLING

STRATEGY FOR ANNUAL LOAD ESTIMATION

2.1 Abstract

Water quality data are typically collected less frequently than streamflow data because of collection and analysis costs, so water quality values may need to be estimated for unsampled days. Regression models, which are widely used and require relatively small amounts of data, are often used to interpolate water quality data from streamflow data. However, there is a need to evaluate how well regression models represent pollutant loads from intermittent water quality datasets. Both the specific

regression model used and the water quality data frequency are important factors in

pollutant load estimation. In this study, nine regression models from LOADEST and one

regression model from LOADIN were evaluated with subsampled water quality datasets

from daily measured water quality datasets of nitrogen, phosphorus, and sediment. Each

water quality parameter had different correlations to streamflow, and the subsampled

water quality datasets had various proportions of storm samples. Regression model behavior differed not only by water quality parameter but also by the proportion of storm samples. The regression models from LOADEST provided accurate and precise

annual sediment and phosphorus load estimates when the water quality data included 20-

40% storm samples. LOADIN provided more accurate and precise annual nitrogen load

estimates than LOADEST. In addition, the results indicate that the availability of water

quality data from storm events was crucial in annual pollutant load estimation using

pollutant regression models, and that accuracy increased if water quality data

extrapolation was avoided.

2.2 Introduction

The Total Maximum Daily Load (TMDL) is a standard measure used to regulate water quality in watersheds in the USA. Section 303(d) of the Clean Water Act (U. S. Senate, 2002) specifies that states and other designated authorities with impaired waters must establish priority rankings and develop TMDL plans to meet identified water quality standards. A TMDL plan specifies the total load of a pollutant that a watershed can receive without violating water quality standards. TMDL planning involves identifying pollutant sources, monitoring the water body, and mitigating pollutant sources so that loads do not exceed the standard (Babbar-Sebens and Karthikeyan, 2009; Henjum, 2010). An appropriate sampling strategy is therefore important for the water quality monitoring on which TMDL planning is based.

Various sampling strategies are commonly employed to collect water quality data, including time-based, streamflow-based, and combined time- and streamflow-composited sampling (King and Harmel, 2003). Burn (1990) described three sampling strategies: fixed frequency sampling, stratified fixed frequency sampling, and real-time updated stratified sampling. Fixed frequency sampling is time-based; regular sampling (Kronvang and Bruhn, 1996) refers to sampling conducted at equal time intervals. Stratified sampling is conducted according to streamflow condition (e.g., high or low streamflow).

Water quality samples for estimating annual pollutant loads are typically collected less frequently than streamflow data because of the cost of collection and analysis. If the water quality data are insufficient to estimate annual loads, values need to be estimated with an appropriate method for the days on which samples are not available. Regression model (rating curve) methods for estimating water quality parameters on unsampled days are based on a relationship between concentration (or load) and streamflow. These methods began as simple linear relationships between concentration and streamflow but have since been modified to incorporate logarithmic transformations, seasonal variability, and other terms (Cohn et al., 1992; Gilroy et al., 1990; Johnson, 1979; Robertson and Richards, 2000). Regression models, which require relatively small amounts of data, have come to be widely used and are often applied to small datasets collected over several years (Robertson and Richards, 2000). Although these methods may produce large errors in some cases, they can provide unbiased load estimates with relatively low variance (Cohn et al., 1992). Regression models calibrated with water quality data collected biweekly or monthly with storm chasing have often provided acceptable load estimates (Horowitz, 2003; Robertson and Roerish, 1999; Robertson, 2003). In other cases, regression models have provided inaccurate and imprecise load estimates when the water quality data were not normally distributed (Henjum et al., 2010) or when the number of water quality samples was small (Horowitz et al., 2001; Johnes, 2007). In addition, regression models have behaved differently at different sites (Phillips et al., 1999).

The LOAD ESTimator (LOADEST; Runkel et al., 2004) and the Web-based Load Interpolation Tool (LOADIN; Park et al., 2012) are used in the Web-based Load Duration Curve Tool (Web-based LDC Tool; Kim et al., 2012), which helps develop Total Maximum Daily Loads (TMDLs) and determine required pollutant load reductions. Because obtaining water quality data is difficult and costly, water quality data may need to be interpolated from streamflow data. The Web-based LDC Tool provides LOADEST and LOADIN runs as options for generating daily pollutant loads from intermittent water quality data, so that the reduction of pollutant loads required to meet water quality standards can be calculated. The required reduction for each of the five streamflow regimes (high flows, moist conditions, mid-range flows, dry conditions, and low flows) in a load duration curve (LDC) is calculated by comparing the sum of the target loads with the sum of the current loads in that regime. It was therefore necessary to evaluate how well the regression models represent annual pollutant loads from intermittent water quality datasets.
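The regime-by-regime reduction described above can be sketched in Python. This is a minimal illustration, not the Web-based LDC Tool's code. It assumes the regime boundaries commonly used in USEPA (2007) load duration curve guidance (high flows 0-10%, moist conditions 10-40%, mid-range flows 40-60%, dry conditions 60-90%, low flows 90-100% of flow exceedance), a Weibull plotting position for exceedance, and a required reduction computed as (sum of current loads − sum of target loads) / sum of current loads × 100. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed load duration curve regimes: (name, lower %, upper %) of flow exceedance.
REGIMES = [
    ("high flows", 0.0, 10.0),
    ("moist conditions", 10.0, 40.0),
    ("mid-range flows", 40.0, 60.0),
    ("dry conditions", 60.0, 90.0),
    ("low flows", 90.0, 100.0),
]


def exceedance_percent(flows: Sequence[float]) -> List[float]:
    """Percent of time each daily flow is equaled or exceeded (Weibull: rank / (n + 1))."""
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i], reverse=True)
    pct = [0.0] * n
    for rank, i in enumerate(order, start=1):
        pct[i] = 100.0 * rank / (n + 1)
    return pct


def regime_of(pct: float) -> str:
    """Name of the flow regime that contains an exceedance percentage."""
    for name, lower, upper in REGIMES:
        if lower <= pct < upper:
            return name
    return REGIMES[-1][0]  # pct == 100 falls in the last regime


def required_reductions(flows: Sequence[float],
                        current_loads: Sequence[float],
                        target_loads: Sequence[float]) -> Dict[str, float]:
    """Percent reduction needed in each regime that contains at least one day."""
    totals = {name: [0.0, 0.0] for name, _, _ in REGIMES}  # [current, target]
    for pct, cur, tgt in zip(exceedance_percent(flows), current_loads, target_loads):
        regime = regime_of(pct)
        totals[regime][0] += cur
        totals[regime][1] += tgt
    return {
        name: max(0.0, 100.0 * (cur - tgt) / cur)  # no reduction if already below target
        for name, (cur, tgt) in totals.items()
        if cur > 0
    }
```

In this sketch a regime whose current loads already meet the target is reported as a 0% reduction.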

Thus, the objectives of the study were: 1) to evaluate the ability of pollutant load regression models to predict annual pollutant loads (i.e., sums of daily pollutant loads); 2) to investigate the terms affecting the behavior of the regression models; and 3) to explore various sampling strategies with the regression models.

2.3 Methodology

LOADEST, one of the regression model methods for water quality data, has been widely used to interpolate or estimate daily pollutant loads for various water quality parameters (Duan et al., 2013; Foster and Kenney, 2010; Spencer et al., 2009; Stenback et al., 2011; Raymond et al., 2007; Eshleman et al., 2008). LOADEST estimates constituent loads in streams given a time series of streamflow, additional data variables, and constituent concentrations, and it requires a minimum of 12 water quality samples (Dornblaser and Striegl, 2007; Runkel et al., 2004). The model calibrates regression coefficients using one of three methods: Adjusted Maximum Likelihood Estimation (AMLE), Maximum Likelihood Estimation (MLE), and Least Absolute Deviation (LAD) (Runkel et al., 2004). The MLE method assumes a linear relationship and normally distributed data. The AMLE method uses a similar approach but is a nearly unbiased estimator of the mean (Cohn et al., 1992). Both methods assume that the model residuals follow a normal distribution, and AMLE allows the use of water quality datasets containing censored data. When the residuals are not assumed to follow a normal distribution, LAD is the alternative; it assumes only that the errors are independent and identically distributed random variables (Powell, 1984). LOADEST has 11 pollutant regression models (Equations 2.1-2.11) for estimating daily pollutant loads. Two of these (model numbers 10 and 11) are for specific periods defined by the user (Runkel et al., 2004) and thus were not appropriate for this study. One of the remaining nine regression models (model numbers 1 to 9) can be selected automatically, by setting the model number to 0, or manually, by setting the model number to one of 1 to 9.

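$$\ln L = a_0 + a_1 \ln Q$$ Eq. 2.1

$$\ln L = a_0 + a_1 \ln Q + a_2 (\ln Q)^2$$ Eq. 2.2

$$\ln L = a_0 + a_1 \ln Q + a_2\,\mathit{dtime}$$ Eq. 2.3

$$\ln L = a_0 + a_1 \ln Q + a_2 \sin(2\pi\,\mathit{dtime}) + a_3 \cos(2\pi\,\mathit{dtime})$$ Eq. 2.4

$$\ln L = a_0 + a_1 \ln Q + a_2 (\ln Q)^2 + a_3\,\mathit{dtime}$$ Eq. 2.5

$$\ln L = a_0 + a_1 \ln Q + a_2 (\ln Q)^2 + a_3 \sin(2\pi\,\mathit{dtime}) + a_4 \cos(2\pi\,\mathit{dtime})$$ Eq. 2.6

$$\ln L = a_0 + a_1 \ln Q + a_2 \sin(2\pi\,\mathit{dtime}) + a_3 \cos(2\pi\,\mathit{dtime}) + a_4\,\mathit{dtime}$$ Eq. 2.7

$$\ln L = a_0 + a_1 \ln Q + a_2 (\ln Q)^2 + a_3 \sin(2\pi\,\mathit{dtime}) + a_4 \cos(2\pi\,\mathit{dtime}) + a_5\,\mathit{dtime}$$ Eq. 2.8

$$\ln L = a_0 + a_1 \ln Q + a_2 (\ln Q)^2 + a_3 \sin(2\pi\,\mathit{dtime}) + a_4 \cos(2\pi\,\mathit{dtime}) + a_5\,\mathit{dtime} + a_6\,\mathit{dtime}^2$$ Eq. 2.9

$$\ln L = a_0 + a_1\,\mathit{per} + a_2 \ln Q + a_3 \ln Q \cdot \mathit{per}$$ Eq. 2.10

$$\ln L = a_0 + a_1\,\mathit{per} + a_2 \ln Q + a_3 \ln Q \cdot \mathit{per} + a_4 (\ln Q)^2 + a_5 (\ln Q)^2 \cdot \mathit{per}$$ Eq. 2.11

where L is the load, a0 to a6 are coefficients, Q is streamflow, dtime is decimal time, and per is the period defined by the user.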
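To make the structure of Equations 2.1-2.9 concrete, the sketch below builds the explanatory variables of each model for one day and evaluates ln L for given coefficients. It is illustrative only: LOADEST itself centers ln Q and dtime before fitting and converts ln L back to load with a bias correction, and both steps are omitted here. The names `MODEL_TERMS`, `design_row`, and `predict_ln_load` are hypothetical.

```python
import math
from typing import Callable, Dict, List


def _sin(dtime: float) -> float:
    return math.sin(2 * math.pi * dtime)


def _cos(dtime: float) -> float:
    return math.cos(2 * math.pi * dtime)


# Explanatory variables (excluding the intercept a0) of LOADEST models 1-9,
# written as functions of lnQ (q) and dtime (t); no centering is applied here.
MODEL_TERMS: Dict[int, Callable[[float, float], List[float]]] = {
    1: lambda q, t: [q],
    2: lambda q, t: [q, q ** 2],
    3: lambda q, t: [q, t],
    4: lambda q, t: [q, _sin(t), _cos(t)],
    5: lambda q, t: [q, q ** 2, t],
    6: lambda q, t: [q, q ** 2, _sin(t), _cos(t)],
    7: lambda q, t: [q, _sin(t), _cos(t), t],
    8: lambda q, t: [q, q ** 2, _sin(t), _cos(t), t],
    9: lambda q, t: [q, q ** 2, _sin(t), _cos(t), t, t ** 2],
}


def design_row(model: int, flow: float, dtime: float) -> List[float]:
    """One row [1, terms...] of the design matrix for a single day."""
    return [1.0] + MODEL_TERMS[model](math.log(flow), dtime)


def predict_ln_load(model: int, coefs: List[float], flow: float, dtime: float) -> float:
    """ln L for one day given the coefficients a0, a1, ... of the chosen model."""
    row = design_row(model, flow, dtime)
    if len(coefs) != len(row):
        raise ValueError(f"model {model} needs {len(row)} coefficients, got {len(coefs)}")
    return sum(a * x for a, x in zip(coefs, row))
```

Model 9 yields seven columns (seven coefficients), and only models 2, 5, 6, 8, and 9 carry the (ln Q)² term, which is the grouping used later in this chapter.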

LOADIN estimates pollutant loads using streamflow and intermittent water quality data,

similar to LOADEST. LOADIN has a regression equation (Equation 2.12) composed of

three terms; the first term is for the pollutant loads correlated to streamflow, and the other

terms contain decimal time for the pollutant loads varying with time (e.g. season). The

coefficients of the regression model are calibrated by a genetic algorithm.

Eq. 2.12

where Li is the estimated load at time step i, Qi is streamflow at time step i, dectime is decimal time at time step i, and a1 to a8 are coefficients.

In this study, LOADEST and LOADIN were evaluated with subsampled water quality

datasets from measured daily water quality data.

2.3.1 Water Quality Data

A measured ‘true load’ was required to evaluate how well pollutant load regression models perform with intermittent water quality data and to examine which sampling strategies are appropriate for estimating annual pollutant loads. Daily water quality data for sediment, phosphorus, and nitrogen were obtained from the USGS Water-Quality Data for the Nation (http://waterdata.usgs.gov/nwis/qw) and the National Center for Water Quality Research of Heidelberg University (http://www.heidelberg.edu/academiclife/distinctive/ncwqr) (Fig. 2.1). The daily records ranged from 1 to 37 years in length; most stations had 1-10 years of data for sediment and 1-5 years for phosphorus and nitrogen (Table 2.1).

Figure 2.1 Locations of Water Quality Data Stations

Table 2.1 Number of Stations and Total Years

Parameter     1-5 years   6-10 years   11-20 years   21-37 years   Total
Sediment      97 (187)    104 (972)    5 (79)        5 (159)       211 (1397)
Phosphorus    52 (101)    7 (55)       5 (79)        5 (159)       69 (394)
Nitrogen      10 (21)     1 (7)        5 (79)        5 (159)       21 (266)

Values are numbers of stations; total years of daily data are in parentheses.

The daily water quality parameters used in the study showed different relationships with streamflow, and correlation coefficients (Equation 2.13) were calculated to quantify the relationship between streamflow and concentration data (Fig. 2.2). Compared with phosphorus and nitrogen concentrations, sediment concentrations were more strongly correlated with streamflow, while nitrogen concentrations showed a relatively weak relationship with streamflow.

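$$r = \frac{\sum_{i=1}^{n} (C_i - \bar{C})(F_i - \bar{F})}{\sqrt{\sum_{i=1}^{n} (C_i - \bar{C})^2}\,\sqrt{\sum_{i=1}^{n} (F_i - \bar{F})^2}}$$ Eq. 2.13

where r is the correlation coefficient, Ci is concentration at time step i, C̄ is the mean of the concentration data, Fi is streamflow at time step i, F̄ is the mean of the streamflow data, and n is the number of observations.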
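Equation 2.13 is the Pearson correlation coefficient. A direct transcription in Python follows; the function name is hypothetical.

```python
import math
from typing import Sequence


def correlation(conc: Sequence[float], flow: Sequence[float]) -> float:
    """Pearson correlation between concentration (C) and streamflow (F), Eq. 2.13."""
    if len(conc) != len(flow) or len(conc) < 2:
        raise ValueError("need two equal-length series with at least two values")
    c_bar = sum(conc) / len(conc)
    f_bar = sum(flow) / len(flow)
    numerator = sum((c - c_bar) * (f - f_bar) for c, f in zip(conc, flow))
    denominator = (math.sqrt(sum((c - c_bar) ** 2 for c in conc))
                   * math.sqrt(sum((f - f_bar) ** 2 for f in flow)))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.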

Figure 2.2 Correlation Coefficients for Concentrations of Water Quality Parameters with

Streamflow

2.3.2 Subsampling Methods and Regression Models

To examine sampling strategies for estimating annual loads with regression models, all of the daily water quality data collected were artificially degraded (subsampled) according to six sampling strategies. The first three were fixed interval sampling at weekly, biweekly, and monthly frequencies; the other three used the same fixed interval frequencies supplemented with stratified sampling (fixed interval with storm event sampling strategies). Storm samples in this study were defined as water quality data collected at the peak flow of each hydrograph in the high flow regime, which is the upper 10 percent of streamflow for a given analysis period (USEPA, 2007). Water quality data at the peaks of hydrographs in the Moist Conditions (10-40%), Mid-Range Flows (40-60%), Dry Conditions (60-90%), and Low Flows (90-100%) regimes were not designated as storm samples.
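The storm sample definition can be operationalized as follows. This sketch rests on assumptions the text does not spell out: the high flow threshold is taken as the flow equaled or exceeded 10% of the time in the record, each uninterrupted run of days at or above that threshold is treated as one hydrograph, and the day with the largest flow in each run is its peak. The function names are hypothetical.

```python
from typing import List, Sequence


def high_flow_threshold(flows: Sequence[float], pct: float = 10.0) -> float:
    """Flow equaled or exceeded `pct` percent of the time (simple empirical quantile)."""
    ranked = sorted(flows, reverse=True)
    k = max(1, int(len(ranked) * pct / 100.0))
    return ranked[k - 1]


def storm_sample_days(flows: Sequence[float], pct: float = 10.0) -> List[int]:
    """Indices of the peak day of each run of consecutive high flow days."""
    threshold = high_flow_threshold(flows, pct)
    peaks: List[int] = []
    start = None
    # A trailing -inf sentinel closes a run that reaches the end of the record.
    for i, q in enumerate(list(flows) + [float("-inf")]):
        if q >= threshold and start is None:
            start = i
        elif q < threshold and start is not None:
            peaks.append(max(range(start, i), key=lambda j: flows[j]))
            start = None
    return peaks
```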

Each sampling strategy was applied repeatedly with different beginning dates so that all of the water quality data were used in the subsampling process. For instance, the first dataset for the weekly fixed interval sampling strategy comprised the water quality data from every Monday, while the second comprised the data from every Tuesday; the weekly fixed interval strategy therefore produced seven water quality datasets. Similarly, the biweekly fixed interval strategy was subsampled by day of the week at fourteen-day intervals. The first dataset for this strategy comprised the data from every other Monday, starting with the first Monday in the measured record, and the eighth dataset also comprised every other Monday but started with the second Monday in the record. Thus, fourteen datasets were subsampled for the biweekly fixed interval strategy. For the monthly strategy, the data on the 1st day of each month were subsampled for the first dataset, the data on the 2nd day of each month for the second dataset, and so on through the 28th day, the largest day number that occurs in every month; twenty-eight datasets were therefore subsampled for the monthly fixed interval strategy. For the fixed interval with storm event strategies, the storm samples were added to each of these fixed interval datasets. In total, ninety-eight water quality datasets were created for each sampling station by subsampling the measured water quality dataset.
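The subsampling scheme can be sketched with Python's standard `datetime` module. The sketch returns day indices for each of the 49 fixed interval datasets (7 weekly, 14 biweekly, 28 monthly) and a second set of 49 with storm sample days added, giving the 98 datasets per station described above. Dataset labels and function names are hypothetical, and the record is assumed to be a gap-free daily series.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Sequence


def fixed_interval_datasets(days: Sequence[date]) -> Dict[str, List[int]]:
    """Index lists for the weekly (7), biweekly (14), and monthly (28) datasets."""
    datasets: Dict[str, List[int]] = {}
    for weekday in range(7):  # 0 = Monday ... 6 = Sunday
        same_weekday = [i for i, d in enumerate(days) if d.weekday() == weekday]
        datasets[f"Wk-{weekday}"] = same_weekday
        # Every other occurrence, starting from the first or from the second one.
        datasets[f"Bi-{weekday}-a"] = same_weekday[0::2]
        datasets[f"Bi-{weekday}-b"] = same_weekday[1::2]
    for day_of_month in range(1, 29):  # 28 is the largest day present in every month
        datasets[f"Mn-{day_of_month}"] = [i for i, d in enumerate(days) if d.day == day_of_month]
    return datasets


def with_storm_samples(datasets: Dict[str, List[int]],
                       storm_days: Iterable[int]) -> Dict[str, List[int]]:
    """Fixed interval datasets supplemented with the storm sample days."""
    storms = set(storm_days)
    return {f"{name}-St": sorted(set(idx) | storms) for name, idx in datasets.items()}


# Example: one hypothetical year of daily dates starting 1 January 2010.
example_days = [date(2010, 1, 1) + timedelta(days=k) for k in range(365)]
fixed = fixed_interval_datasets(example_days)
all_datasets = {**fixed, **with_storm_samples(fixed, storm_days=[3, 200])}
```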

Figure 2.3 Example of Subsampled Dataset

Ten regression models were evaluated with the subsampled water quality datasets. Nine

regression models were from LOADEST (model numbers 1 to 9), and one regression

model was from LOADIN. In addition, LOADEST was run with the subsampled water

quality datasets by setting model number to ‘0’ to explore accuracy and precision of

annual pollutant load estimates for the “best” regression model automatically selected by

LOADEST. Measured daily water quality datasets were used to calculate the true

measured loads by direct numeric integration (Equation 2.14).

$$L_{annual} = \frac{1}{\mathrm{Num.Yr}} \sum_{i=1}^{n} Q_i\, C_i$$ Eq. 2.14

where i is the day, n is the number of days, Qi is streamflow on day i, Ci is concentration on day i, and Num.Yr is the number of years.
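Equation 2.14 can be written directly in code. The units are an assumption here: streamflow in cubic meters per second and concentration in milligrams per liter (the units mentioned later in this chapter), converted to kilograms per day with the factor 86.4 (1 m3/s × 1 mg/L = 1 g/s = 86.4 kg/day). The function name is hypothetical.

```python
from typing import Sequence

# 1 m3/s x 1 mg/L = 1 g/s = 86,400 g/day = 86.4 kg/day
KG_PER_DAY_PER_M3S_MGL = 86.4


def measured_annual_load(flow_m3s: Sequence[float],
                         conc_mg_l: Sequence[float],
                         num_years: float) -> float:
    """Mean annual load (kg/yr) from paired daily streamflow and concentration."""
    if len(flow_m3s) != len(conc_mg_l):
        raise ValueError("streamflow and concentration series must be the same length")
    if num_years <= 0:
        raise ValueError("num_years must be positive")
    total = sum(q * c * KG_PER_DAY_PER_M3S_MGL for q, c in zip(flow_m3s, conc_mg_l))
    return total / num_years
```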

To evaluate the annual loads estimated by the regression models, a ratio was calculated for each annual pollutant load estimate: the estimated annual pollutant load divided by the measured annual pollutant load (Equation 2.15). The ratio is 1.0 when the estimated annual load equals the measured annual load, smaller than 1.0 when a regression model underestimates the load, and greater than 1.0 when it overestimates the load.

$$\mathrm{Ratio} = \frac{\text{Estimated annual load}}{\text{Measured annual load}}$$ Eq. 2.15

A large number of pollutant load estimates were computed. For each subsampled water quality dataset, LOADEST was executed ten times: once for each of model numbers 1 to 9, to evaluate the nine regression models, and once with model number 0, to investigate how well LOADEST selects a regression model. The number of pollutant load estimates for one measured water quality dataset was therefore 1,078 (i.e., (7 (weekly) + 14 (biweekly) + 28 (monthly)) × 2 (fixed interval only or fixed interval with stratified sampling) × 11 (LOADIN + 10 LOADEST model numbers)).

Therefore, the total number of load estimates was 227,458 for sediment, 74,382 for

phosphorus, and 22,638 for nitrogen.
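The bookkeeping behind these totals can be checked with a few lines of Python, using the station counts from Table 2.1. The variable names are illustrative.

```python
# Datasets per station: fixed interval frequencies, each with and without storm samples.
DATASETS_PER_FREQUENCY = {"weekly": 7, "biweekly": 14, "monthly": 28}
STRATEGY_VARIANTS = 2  # fixed interval only; fixed interval with storm samples
MODEL_RUNS = 11        # LOADIN + LOADEST model numbers 0 to 9
STATIONS = {"sediment": 211, "phosphorus": 69, "nitrogen": 21}  # Table 2.1

per_station = sum(DATASETS_PER_FREQUENCY.values()) * STRATEGY_VARIANTS * MODEL_RUNS
totals = {parameter: n * per_station for parameter, n in STATIONS.items()}
grand_total = sum(totals.values())
```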

Both accuracy and precision are important in pollutant load estimation, because accuracy

represents the degree of systematic error, and precision indicates the degree of dispersion

or range (Phillips et al., 1999; Preston et al., 1989). Pollutant load estimates need to be

accurate (low bias) and to be precise (low variance). Accuracy was evaluated with the

mean of ratios, and precision was evaluated with the 95% confidence interval of ratios.
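The ratio of Equation 2.15 and the two summary statistics can be sketched as follows. The chapter does not state how the 95% confidence interval was computed; this sketch assumes a normal-approximation interval for the mean ratio (mean ± 1.96 × standard deviation / √n). If the interval was instead meant to bracket 95% of the individual ratios, a 2.5th-97.5th percentile range would replace it. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def ratio(estimated: float, measured: float) -> float:
    """Eq. 2.15: estimated annual load divided by measured annual load."""
    if measured <= 0:
        raise ValueError("measured load must be positive")
    return estimated / measured


def accuracy_precision(ratios: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """(mean ratio, width of an approximate 95% CI of the mean ratio)."""
    if len(ratios) < 2:
        raise ValueError("need at least two ratios")
    mean = statistics.fmean(ratios)
    half_width = z * statistics.stdev(ratios) / math.sqrt(len(ratios))
    return mean, 2.0 * half_width
```

A mean close to 1.0 indicates accuracy (low bias), and a narrow width indicates precision (low variance).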

2.4 Results and Discussions

2.4.1 Ratio Comparison of Sampling Strategies and Regression Models

The study computed 324,478 pollutant load estimates with regression models (Appendix B), and three distinct features were observed. The first was that pollutant load estimates for LOADEST model number ‘0’, which is intended to estimate pollutant loads using the best regression model for the water quality dataset (Runkel et al., 2004), were not necessarily more accurate or precise than estimates from manually selected models. For instance, sediment load estimates with LOADEST model number ‘0’ were less accurate and less precise than those with LOADEST model number 1 (Fig. 2.4; comparison of ‘Sed-Wk-Fx-0’ and ‘Sed-Wk-Fx-1’). Not surprisingly, automatic model selection often did provide more accurate and precise load estimates than manual selection. LOADEST model number ‘0’ selects one of the nine regression models based on the Akaike Information Criterion (AIC), computed for each model from the regression model parameters and the residuals of the AMLE calibration (Runkel et al., 2004). Because the AIC measures how well a model fits the calibration data, with a penalty for the number of parameters, rather than how well it reproduces the measured annual load, the selected model can still produce inaccurate or imprecise estimates.
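The idea behind automatic model selection can be illustrated with a simplified sketch. It departs from LOADEST in stated ways: coefficients are fitted by ordinary least squares on ln L rather than by AMLE, ln Q and dtime are centered on their sample means (LOADEST uses its own centering), and the Gaussian form AIC = n ln(RSS/n) + 2k is used. The function names and the synthetic data in `demo` are hypothetical. The AIC ranks models only by penalized fit to the calibration sample.

```python
import math
from typing import Dict, Tuple

import numpy as np


def design_matrix(model: int, lnq: np.ndarray, dtime: np.ndarray) -> np.ndarray:
    """Columns [1, terms...] of LOADEST-style models 1-9 (ln Q and dtime mean-centered)."""
    q = lnq - lnq.mean()
    t = dtime - dtime.mean()
    s, c = np.sin(2 * np.pi * dtime), np.cos(2 * np.pi * dtime)
    terms = {
        1: [q], 2: [q, q**2], 3: [q, t], 4: [q, s, c], 5: [q, q**2, t],
        6: [q, q**2, s, c], 7: [q, s, c, t], 8: [q, q**2, s, c, t],
        9: [q, q**2, s, c, t, t**2],
    }[model]
    return np.column_stack([np.ones_like(q)] + terms)


def aic_ols(model: int, lnq: np.ndarray, dtime: np.ndarray, ln_load: np.ndarray) -> float:
    """Gaussian AIC of an OLS fit; k counts the coefficients plus the error variance."""
    x = design_matrix(model, lnq, dtime)
    coef, *_ = np.linalg.lstsq(x, ln_load, rcond=None)
    resid = ln_load - x @ coef
    n, p = x.shape
    rss = float(resid @ resid)
    return n * math.log(rss / n) + 2 * (p + 1)


def select_model(lnq, dtime, ln_load) -> Tuple[int, Dict[int, float]]:
    """Model number (1-9) with the lowest AIC, plus all scores."""
    scores = {m: aic_ols(m, lnq, dtime, ln_load) for m in range(1, 10)}
    return min(scores, key=scores.get), scores


def demo(seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Hypothetical loads with a seasonal cycle and a power-law dependence on flow."""
    rng = np.random.default_rng(seed)
    k = np.arange(200)
    dtime = 2000.0 + 0.02 * k    # about four years of roughly weekly samples
    lnq = 1.5 * np.sin(0.7 * k)  # arbitrary streamflow variation (log space)
    ln_load = 1.0 + 1.2 * lnq + 0.8 * np.sin(2 * np.pi * dtime) + rng.normal(0.0, 0.05, k.size)
    return select_model(lnq, dtime, ln_load)
```

With seasonal synthetic data, a model containing the sine and cosine terms wins, but nothing in this criterion compares the fitted models against measured annual loads.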

The second feature was that including water quality data from storm events improved the accuracy and precision of pollutant load estimates. This feature was evident for all three water quality parameters. For instance, the estimates ‘Sed-Wk-St-1’ displayed higher accuracy (i.e., a mean ratio closer to 1.0) and precision (i.e., a narrower 95% CI) than the estimates ‘Sed-Wk-Fx-1’. This feature also indicates that extrapolation needs to be avoided: the fixed interval strategies supplemented with stratified sampling include the water quality data for the maximum streamflow, so the regression models calibrated with them do not need to extrapolate beyond the range of streamflow in the calibration data.

The last feature was that a higher sampling frequency did not necessarily improve the accuracy and precision of estimated loads. Monthly subsampled datasets contained fewer water quality data than weekly subsampled datasets, but they nevertheless sometimes led to more accurate and precise pollutant load estimates. For instance, the estimate ‘Phs-Wk-St-2’ displayed lower accuracy and precision than both ‘Phs-Bi-St-2’ and ‘Phs-Mn-St-2’. The accuracy of ‘Sed-Wk-St-1’ was higher than that of ‘Sed-Mn-St-1’, but its precision was lower. Therefore, a more extensive water quality dataset did not necessarily lead to more accurate and/or more precise pollutant load estimates.

Figure 2.4 Examples of Pollutant Load Estimates (Sed: sediment, Phs: phosphorus, Mn: monthly, Bi: biweekly, Wk: weekly, Fx: fixed interval sampling, St: fixed interval sampling with storm events, 0 and 1: LOADEST model numbers 0 and 1)

2.4.2 Water Quality Data from High Flow

The results in the previous section showed that: (1) a more extensive sampling strategy does not necessarily lead to accurate and precise annual pollutant load estimates with pollutant regression models; and (2) the use of water quality data from storm events improves pollutant load estimates. Therefore, an analysis was performed to explore how the proportion of water quality data from storm events influences load estimation.

The results showed that water quality data for high flow conditions play an important role in annual pollutant load estimation. The results were therefore categorized into seven groups based on the percentage of calibration data from the high flow regime (PCH) (Fig. 2.5). The regression models behaved differently depending on the water quality parameter and the PCH.
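PCH, as defined above, is the share of calibration samples whose streamflow falls in the high flow regime. In the sketch below the high flow threshold is passed in (for example, the flow equaled or exceeded 10% of the time), and the seven groups are assumed to be 10-percentage-point bins with the last one open-ended; the chapter does not list the group boundaries. Function names are hypothetical.

```python
from typing import Sequence


def pch(sample_flows: Sequence[float], high_flow_threshold: float) -> float:
    """Percent of calibration samples taken at or above the high flow threshold."""
    if not sample_flows:
        raise ValueError("no calibration samples")
    n_high = sum(1 for q in sample_flows if q >= high_flow_threshold)
    return 100.0 * n_high / len(sample_flows)


def pch_group(value: float, width: float = 10.0, n_groups: int = 7) -> str:
    """Label of the assumed PCH bin: 0-10%, 10-20%, ..., 50-60%, then >=60%."""
    i = min(int(value // width), n_groups - 1)
    lower = i * width
    if i == n_groups - 1:
        return f">={lower:.0f}%"
    return f"{lower:.0f}-{lower + width:.0f}%"
```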

In sediment load estimation, the regression models displayed two different ratio trends against the PCH (Fig. 2.5(a) and (b)). The four LOADEST regression models numbered 1, 3, 4, and 7 (the first group of regression models) underestimated loads when the PCH was smaller than 20% and overestimated loads when the PCH was greater than 20%. Water quality data with a PCH of 20-30% provided the estimated sediment loads closest to the measured loads. The LOADEST regression models numbered 2, 5, 6, 8, and 9 (the second group of regression models) typically overestimated sediment loads (Fig. 2.5(a)) and showed the narrowest 95% CI when the PCH was 30-40% (Fig. 2.5(b)).

Similar to sediment load estimation, two ratio trends were identified in annual phosphorus load estimation. The first group of regression models (LOADEST regression models numbered 1, 3, 4, and 7) behaved as they did in annual sediment load estimation. The second group (LOADEST regression models numbered 2, 5, 6, 8, and 9) typically overestimated loads, but their ratios of annual phosphorus loads were close to 1.0 when the PCH was greater than 20% (Fig. 2.5(c)). Moreover, the 95% CIs for all LOADEST regression models were narrow when the PCH was greater than 20% (Fig. 2.5(d)).

LOADIN was more sensitive to storm samples than LOADEST and displayed low precision in annual sediment and phosphorus load estimates (Fig. 2.5(a), (b), (c), and (d)). The second group of regression models (model numbers 2, 5, 6, 8, and 9) showed greater mean ratios than the first group in annual sediment and phosphorus load estimation. The difference between the two groups is that the second group includes the ‘squared logarithm of streamflow’ term. The forms of the regression models in LOADEST were determined from various functions of streamflow and time (Cohn et al., 1992; Crawford, 1991; Helsel and Hirsch, 2002; Runkel et al., 2004). LOADEST model number 1 (the simplest model) consists of streamflow and two coefficients, and LOADEST model number 9 (the most complex model) consists of six variables (e.g., logarithm of streamflow, squared logarithm of streamflow, and time) and seven coefficients. A regression model composed of streamflow and two coefficients has been shown to provide reasonable sediment load estimates (Crawford, 1991), and a regression model composed of six variables and seven coefficients has been shown to provide reasonable phosphorus load estimates (Cohn et al., 1992). However, in this study the second group of regression models, which include the ‘squared logarithm of streamflow’ term (model numbers 2, 5, 6, 8, and 9), tended to overestimate loads and showed low accuracy and precision. Therefore, it was concluded that equations with the ‘squared logarithm of streamflow’ term should be carefully evaluated before they are used to estimate annual pollutant loads.

Unlike for annual sediment and phosphorus load estimates, the LOADEST regression models mostly overestimated nitrogen loads, and the annual nitrogen load estimates with a PCH of 50-60% were closest to the measured loads (Fig. 2.5(e)). The 95% CIs for all LOADEST regression models were wider than those for LOADIN, indicating that LOADIN had higher precision in annual nitrogen load estimation than LOADEST. Including more high flow water quality data from storm events led to more precise annual nitrogen load estimates with LOADEST, as the 95% CI narrowed with increasing PCH (Fig. 2.5(f)). While the LOADEST regression models overestimated annual nitrogen loads and displayed low precision, the LOADIN estimates were notably close to the measured annual nitrogen loads: the mean ratios were typically close to 1.0 (Fig. 2.5(e)), and the 95% CIs were typically narrower than those for LOADEST (Fig. 2.5(f)).

The nitrogen data collected in the study displayed seasonal variation or weak relationships with streamflow (Figs. 2.2 and 2.6). Like LOADEST, LOADIN identifies the relationship between streamflow and pollutant loads. The regression model in LOADIN is composed of two functions. One is a function of streamflow and model coefficients that represents pollutant loads varying with streamflow; the other is a function of streamflow, decimal time, and model coefficients that represents pollutant loads varying with time (or season). These regression models are calibrated against pollutant loads, not pollutant concentrations. Because a load is the product of concentration and streamflow, a seasonal change in concentration changes the load in proportion to streamflow. Therefore, the term representing time variability in pollutant loads needs to be the product of ‘decimal time’, ‘streamflow’, and a ‘coefficient to calibrate’, not merely the product of ‘decimal time’ and a ‘coefficient to calibrate’. LOADIN has a term of the form ‘decimal time’ × ‘streamflow’ × ‘coefficient to calibrate’ for seasonality; LOADEST does not.

The regression models used in the study relate volumetric streamflow rate (e.g., cubic meters per second) to pollutant load, the product of pollutant concentration (milligrams per liter) and streamflow. Thus, if concentration is proportional to streamflow, the pollutant load at high concentration and high streamflow is much greater than the pollutant load at low concentration and low streamflow. In other words, pollutant load would increase with streamflow as a power function (in that case, load proportional to the square of streamflow) rather than linearly; this assumption corresponds to the LOADEST regression models, which are linear in the logarithms of load and streamflow. However, the nitrogen data typically did not have a proportional relationship with streamflow; thus, LOADEST led to less accurate and less precise annual nitrogen load estimates.
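The contrast described above can be checked numerically. Because load is the product of concentration and streamflow, a concentration proportional to streamflow (C = kQ) gives L = kQ², a straight line of slope 2 on log-log axes, which is the form captured by a model such as ln L = a0 + a1 ln Q; a concentration independent of streamflow gives slope 1. The flows and coefficients below are hypothetical, and unit conversion is omitted because it only shifts the intercept.

```python
import math
from typing import Sequence


def loglog_slope(flow: Sequence[float], load: Sequence[float]) -> float:
    """Least-squares slope of ln(load) on ln(flow), i.e. a1 in ln L = a0 + a1 ln Q."""
    x = [math.log(q) for q in flow]
    y = [math.log(v) for v in load]
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


flows = [1.0, 2.0, 5.0, 10.0, 50.0]            # hypothetical streamflows
proportional = [q * (0.3 * q) for q in flows]  # C = 0.3 Q, so L = 0.3 Q^2
flow_independent = [q * 4.0 for q in flows]    # C = 4, so L = 4 Q
```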

(a) Mean of Ratio for Sediment

(b) Width of 95% Confidence Intervals of Ratio for Sediment

(c) Mean of Ratio for Phosphorus

(d) Width of 95% Confidence Intervals of Ratio for Phosphorus

(e) Mean of Ratio for Nitrogen

(f) Width of 95% Confidence Intervals of Ratio for Nitrogen

Figure 2.5 Mean and Width of 95% Confidence Intervals by Percentage of Calibration Data from High Flow

Figure 2.6 Seasonal Variation in Blanchard River near Findlay, Ohio Nitrogen Data from

the National Center for Water Quality Research of Heidelberg University

2.4.3 Improvement of Annual Load Estimation

The preceding results indicated that: (1) the use of water quality data from storm events improves annual pollutant load estimates; and (2) the PCH plays an important role in annual pollutant load estimation. Therefore, an analysis of the influence of water quality data from high flows was performed using some of the poorest pollutant load estimates for each water quality parameter.

Three water quality datasets for each water quality parameter were selected from those yielding the ‘poorest’ pollutant load estimates. The ‘Est. 0’ rows in Table 2.2 are the pollutant load estimates using the subsampled datasets unchanged from the subsampling process. The sampling strategy for sediment datasets 1 and 3 was the monthly fixed interval sampling strategy, and that for dataset 2 was the biweekly fixed interval sampling strategy. For phosphorus, the sampling strategies were the biweekly (dataset 1), weekly (dataset 2), and monthly (dataset 3) fixed interval sampling strategies. All datasets for nitrogen were from the monthly fixed interval sampling strategy. The regression models for these datasets were from the second group of LOADEST regression models (model numbers 2, 5, 6, 8, and 9). Although the datasets had different sampling frequencies, they shared three features: they came from fixed interval sampling strategies, their models belonged to the second group described in the previous section, and their PCH was smaller than 10%. High flow water quality data were then added in ‘Est. 1’ through ‘Est. 3’; in other words, the PCH was intentionally increased to improve the pollutant load estimates. The water quality data for the maximum streamflow were added in ‘Est. 1’ to investigate model behavior when extrapolation is avoided.

For the nine subsampled water quality datasets (‘Est. 0’ in Table 2.2), the second group of regression models overestimated sediment loads by a factor of more than ten and overestimated phosphorus and nitrogen loads by factors of roughly 170 to 660. However, including water quality data for the maximum streamflow (i.e., avoiding extrapolation) substantially improved all of the pollutant load estimates (compare ‘Est. 0’ and ‘Est. 1’ for each dataset). Pollutant load estimates improved further from ‘Est. 1’ to ‘Est. 3’ as the PCH increased; however, a large proportion of water quality data from storm events led to overestimated pollutant loads (e.g., compare ‘Est. 2’ and ‘Est. 3’ of dataset 1 for phosphorus).

Compared with the first group of regression models, the second group was less accurate and less precise and overestimated sediment and phosphorus loads. However, all nine pollutant load estimates by these regression models improved when water quality data for the maximum streamflow were added. This suggests that extrapolation might lead to inaccurate and imprecise pollutant load estimates by the second group of regression models.

Table 2.2 Improvement of Annual Load Estimates by Increasing PCH Data

Parameter (models)     Est.*   Dataset 1         Dataset 2         Dataset 3
                               PCH (%)  Ratio    PCH (%)  Ratio    PCH (%)  Ratio
Sediment (2, 9, 5)     0       0.0      18.0     7.7      17.3     8.3      16.4
                       1       14.3     0.6      20       0.7      15.4     1.0
                       2       53.9     1.0      33.3     0.9      20.0     1.0
                       3       65.7     1.0      45.5     1.0      -        -
Phosphorus (2, 6, 8)   0       6.7      291.1    7.7      658.9    5.8      422.3
                       1       10.2     0.7      12.7     0.9      13.1     0.8
                       2       21.1     1.0      18.6     1.1      27.1     1.0
                       3       27.6     1.3      37.2     1.0      -        -
Nitrogen (6, 8, 9)     0       8.3      571.9    0.0      189.8    0.0      169.4
                       1       15.4     1.1      14.3     0.7      14.3     0.8
                       2       35.3     1.0      40       1.0      45.5     1.0
                       3       47.6     1.0      52       1.1      53.9     1.1

*Est.: pollutant load estimation number; the numbers in parentheses are the regression models for datasets 1, 2, and 3, respectively.

2.5 Conclusions

Water quality samples are typically collected less frequently than streamflow because of the cost of collection and analysis. If the samples are insufficient, water quality data need to be interpolated using an appropriate method. Regression models are based on a relationship between concentration (or load) and streamflow and can be used to interpolate or generate water quality data from streamflow data. The Web-based LDC Tool employs LOADEST to develop TMDLs and to calculate the reduction of pollutant loads required to meet standard pollutant loads. The regression models in LOADEST therefore needed to be evaluated with various water quality datasets. In this study, the LOADEST regression models and the LOADIN regression model were evaluated with six sampling strategies for three water quality parameters. The water quality data collected in the study were subsampled to investigate the influence of the sampling strategy (i.e., weekly, biweekly, and monthly fixed interval sampling) and of the inclusion of storm events (i.e., fixed interval sampling supplemented with stratified storm event sampling).

It was concluded that: 1) using more extensive water quality data does not necessarily lead to more precise and accurate estimates of annual pollutant loads by regression models; 2) the water quality data used to estimate annual pollutant loads need to include an appropriate proportion of data from storm events; 3) extrapolation needs to be avoided when regression models are used to estimate annual pollutant loads, that is, the calibration data should include water quality data for the maximum streamflow; and 4) a regression model needs to be selected based on the behavior of the water quality parameter.

Regression models were evaluated with large datasets from six sampling strategies, and several regression models demonstrated better accuracy and precision than the others. In addition, the proportion of water quality data from storm events appropriate for accurately estimating pollutant loads with these regression models was identified. However, further study is required to investigate the relationship between regression model behavior and calibration datasets with various proportions of water quality data from storm events, using the regression model that provided the most accurate and precise pollutant load estimates.


2.6 References



118-123.

Burn, D. H., 1990. Real-time sampling strategies for estimating nutrient loadings. Journal

of Water Resources Planning and Management 116(6), 727-741.

Cohn, T. A., Caulder, D. L., Gilroy, E. J., Zynjuk, L. D., Summers, R. M., 1992. The

validity of a simple statistical model for estimating fluvial constituent loads: an

empirical study involving nutrient loads entering Chesapeake Bay. Water Resources

Research 28(9), 2353-2363.

Crawford, C. G., 1991. Estimation of suspended-sediment rating curves and mean

suspended-sediment loads. Journal of Hydrology 129, 331-348.

Dornblaser, M. M., Striegl, R. G., 2007. Nutrient (N,P) loads and yields at multiple scales

and subbasin types in the Yukon River basin, Alaska. Journal of Geophysical

Research 112, G04S57.

Duan, W., Takara, K., He, B., Luo, P., Nover, D., Yamashiki, Y., 2013. Spatial and

temporal trends in estimates of nutrients and suspended sediment loads in the Ishikari

River, Japan, 1985 to 2010. Science of the Total Environment 461-462, 499-508.

Eshleman, K. N., Kline, K. M., Morgan, R. P., Castro, N. M., Legley, T. L., 2008.

Contemporary trends in the acid-base status of two acid-sensitive streams in Western

Maryland. Environmental Science and Technology 42, 56-61.


Foster, K., Kenney, T. A., 2010. Dissolved-solids load in Henrys Fork upstream from the

confluence with Antelope Wash, Wyoming, water years 1970-2009. U.S. Geological

Survey Scientific Investigations Report 2010-5048, Reston, Virginia.

Gilroy, E. J., Hirsch, R. M., Cohn, T. A., 1990. Mean square error of regression-based

constituent transport estimates. Water Resources Research 26(9), 2069-2077.

Haggard, B. E., Soerens, T. S., Green, W. R., Richards, R. P., 2003. Using regression

methods to estimate stream phosphorus loads at the Illinois River, Arkansas. Applied

Engineering in Agriculture 19(2), 187-194.

Helsel, D. R., Hirsch, R. M., 2002. Statistical methods in water resources. U.S.

Geological Survey Techniques and Methods, Book 4, Chap. A3, Reston, Virginia.

Henjum, M. B., Hozalski, R. M., Wennen, C. R., Novak, P. J., Arnold, W. A., 2010. A

comparison of total maximum daily load (TMDL) calculations in urban streams using

near real-time and periodic sampling data. Journal of Environmental Monitoring 12,

234-241.

Horowitz, A. J., 2001. Estimating suspended sediment and trace element fluxes in large

river basins: methodological considerations as applied to the NASQAN programme.

Hydrological Processes 15, 1107-1132.

Horowitz, A. J., 2003. An evaluation of sediment rating curves for estimating suspended

sediment concentrations for subsequent flux calculations. Hydrological Processes 17,

3387-3409.

Johnes, P. J., 2007. Uncertainties in annual riverine phosphorus load estimation: impact

of load estimation methodology, sampling frequency, baseflow index and catchment

population density. Journal of Hydrology 332, 241-258.


Johnson, A. H., 1979. Estimating solute transport in streams from grab samples. Water

Resources Research 15(5), 1224-1228.

Kim, J., Engel, B. A., Park, Y. S., Theller, L., Chaubey, I., Kong, D. S., Lim, K. J., 2012.

Development of web-based load duration curve system for analysis of total maximum

daily load and water quality characteristics in a waterbody. Journal of Environmental

Management 97, 46-55.

King, K. W., Harmel, R. D., 2003. Considerations in selecting a water quality sampling

strategy. Transactions of the ASAE 46(1), 63-73.

Kronvang, B., Bruhn, A. J., 1996. Choice of sampling strategy and estimation method for

calculating nitrogen and phosphorus transport in small lowland streams. Hydrological

Processes 10, 1483-1501.

Park, Y. S., Chaubey, I., Lim, K. J., Engel, B. A., 2012. Development of a web-based

pollutant load interpolation tool using an optimization algorithm. American Society of

Agricultural and Biological Engineers Annual International Meeting. Paper Number:

121337988.

Phillips, J. M., Webb, B. W., Walling, D. E., Leeks, L., 1999. Estimating the suspended

sediment loads of rivers in the LOIS study area using infrequent samples.


Powell, J. L., 1984. Least absolute deviations estimation for the censored regression

model. Journal of Econometrics 25, 303-325.

Preston, S. D., Bierman, V. J., Silliman, S. E., 1989. An evaluation of methods for the

estimation of tributary mass loads. Water Resources Research 25(6), 1379-1389.


Raymond, P. A., McClelland, J. W., Holmes, R. M., Zhulidov, A. V., Mull, K., Peterson,

B. J., Striegl, R. G., Aiken, G. R., Gurtovaya, T. Y., 2007. Flux and age of dissolved

organic carbon exported to the Arctic Ocean: A carbon isotopic study of the five

largest arctic rivers. Global Biogeochemical Cycles 21, GB4011.

Robertson, D. M., 2003. Influence of different temporal sampling strategies on estimating

total phosphorus and suspended sediment concentration and transport in small streams.

Journal of the American Water Resources Association, 1281-1308.

Robertson, D. M., Roerish, E. E., 1999. Influence of various water quality sampling

strategies on load estimates for small streams. Water Resources Research 35(12),

3747-3759.

Robertson, D. M., Richards, K. D., 2000. Influence of different temporal sampling

strategies on estimating loads and maximum concentrations in small streams.

Proceedings of the National Water Quality Monitoring Council National Monitoring

Conference, Austin TX. 209-223.

Runkel, R. L., Crawford, C. G., Cohn, T. A., 2004. Load Estimator (LOADEST): A

Fortran program for estimating constituent loads in streams and rivers. U.S.

Geological Survey Techniques and Methods, Book 4, Chap. A5, Reston, Virginia.

Spencer, R. G. M., Aiken, G. R., Butler, K. D., Dornblaser, M. M., Striegl, R. G., Hernes,

P. J., 2009. Utilizing chromophoric dissolved organic matter measurements to derive

export and reactivity of dissolved organic carbon exported to the Arctic Ocean: A case

study of the Yukon River, Alaska. Geophysical Research Letters 36, L06401.

Stenback, G. A., Crumpton, W. G., Schilling, K. E., Helmers, M. J., 2011. Rating curve

estimation of nutrient loads in Iowa rivers. Journal of Hydrology 396, 158-169.


USEPA. 2007. An Approach for using load duration curves in the development of

TMDLs. Watershed Branch (4530T), Office of Wetlands, Ocean and Watersheds, U.S.

Environmental Protection Agency, 1200 Pennsylvania Ave., Northwest.

USGS. 2001. Effect of storm-sampling frequency on estimation of water-quality loads

and trends in two tributaries to Chesapeake Bay in Virginia. Water-Resources

Investigations Report 01-4136, Richmond, Virginia.

U. S. Senate. 2002. Federal water pollution control act. U. S. Senate, Dirksen Senate

Office Bldg. Washington, DC.


CHAPTER 3. IDENTIFYING THE CORRELATION BETWEEN WATER QUALITY DATA AND LOADEST MODEL BEHAVIOR

3.1 Abstract

Water quality samples are typically collected less frequently than flow because water quality sampling is costly. LOADEST is used to predict water quality concentrations (or loads) on days when flow data are measured, so that the water quality data are sufficient for annual pollutant load estimation. However, the water quality data requirements for accurate pollutant load estimation still need to be identified. Measured daily sediment data were collected from 211 stream records. Annual sediment loads estimated with LOADEST from subsampled data were compared to the measured annual sediment loads (true loads). The mean flow in the calibration data was correlated with model behavior, and a regression equation was developed to compute the mean flow required in the calibration data to best calibrate the LOADEST regression model coefficients. LOADEST runs were performed to investigate the correlation between the mean flow in the calibration data and model behavior as daily water quality data were subsampled. LOADEST runs whose calibration data contained sediment concentrations at the flows suggested by the regression equation produced small errors in annual sediment load estimates. Moreover, use of more extensive water quality data only occasionally led to annual load estimates with small errors.


3.2 Introduction

Water quality samples are collected less frequently than flow because water quality sampling requires significant labor and the samples are costly to collect and analyze. Therefore, water quality samples are collected using various sampling strategies based on flow, time, or a combination of flow and time (Burn, 1990; King and Harmel, 2003). Fixed frequency sampling strategies collect samples based on time, at equal time intervals (e.g., 52, 26, and 12 samples per year for weekly, biweekly, and monthly sampling, respectively), while stratified sampling strategies collect samples based on flow volume (e.g., every 10 mm of volumetric depth). Water quality samples may not be consecutive or may not cover the range of flow data, so a straightforward annual load estimate (e.g., the sum of daily loads) may not be possible. Thus, water quality concentrations typically need to be estimated for days on which samples were not collected (Robertson, 2003).
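The contrast between fixed frequency and stratified (flow-proportional) sampling can be sketched in Python. This is a minimal illustration, not the procedure used in the study. It assumes a daily streamflow series in m³/s and a drainage area in km², and the function names are hypothetical. The stratified rule takes one sample whenever cumulative runoff depth passes another 10 mm increment, capped at one sample per day as a simplification.

```python
def fixed_interval_days(n_days: int, interval: int, offset: int = 0) -> list[int]:
    """Day indices sampled every `interval` days (e.g., 7 = weekly), starting at `offset`."""
    return list(range(offset, n_days, interval))


def stratified_sample_days(daily_flow_cms: list[float], area_km2: float,
                           depth_mm: float = 10.0) -> list[int]:
    """Day indices on which cumulative runoff depth passes another `depth_mm` increment.

    Daily runoff depth (mm) = Q [m^3/s] * 86,400 s / (area [km^2] * 1e6 m^2/km^2) * 1,000 mm/m.
    Simplification: at most one sample per day; whole increments beyond the first
    are discarded and only the remainder carries over to the next day.
    """
    days = []
    accumulated = 0.0
    for day, q in enumerate(daily_flow_cms):
        accumulated += q * 86_400 / (area_km2 * 1e6) * 1_000
        if accumulated >= depth_mm:
            days.append(day)
            accumulated %= depth_mm
    return days


# Weekly and biweekly fixed-interval schedules over a 364-day year
print(len(fixed_interval_days(364, 7)), len(fixed_interval_days(364, 14)))  # 52 26
# 6 m^3/s from 86.4 km^2 is 6 mm/day, so the 10 mm increment is crossed on days 1 and 3
print(stratified_sample_days([6.0, 6.0, 6.0, 6.0], 86.4))  # [1, 3]
```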

Regression models (rating curves) are used to predict water quality concentrations (or loads) on days when flow data are measured. Regression models have been used extensively for this purpose. They have evolved from simple linear forms to logarithmic transformations and to forms that account for seasonal variability (Cohn et al., 1992b; Gilroy et al., 1990; Johnson, 1979; Robertson and Richards, 2000). Previous studies have used a range of water quality sampling frequencies to predict pollutant loads with regression models (Coynel et al., 2004; Henjum et al., 2010; Horowitz, 2003; Johnes, 2007; Kronvang and Bruhn, 1996; Robertson, 2003; Robertson and Roerish, 1999), in order to determine which sampling frequencies are appropriate for regression model use.
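A basic rating curve of this kind can be sketched as an ordinary least squares fit of ln(load) on ln(flow), followed by back-transformation with Duan's smearing estimator to reduce retransformation bias. This is a simplified stand-in and does not reproduce LOADEST's procedure. LOADEST calibrates with AMLE, MLE, or LAD and offers additional terms, such as seasonal and time terms. The function names are hypothetical.

```python
import math


def fit_rating_curve(flows: list[float], loads: list[float]) -> tuple[float, float, float]:
    """Fit ln(L) = b0 + b1 * ln(Q) by ordinary least squares.

    Returns (b0, b1, smearing), where `smearing` is Duan's nonparametric
    retransformation bias correction factor, mean(exp(residual)).
    """
    x = [math.log(q) for q in flows]
    y = [math.log(load) for load in loads]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    smearing = sum(math.exp(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)) / n
    return b0, b1, smearing


def predict_loads(flows: list[float], b0: float, b1: float, smearing: float) -> list[float]:
    """Back-transformed load predictions for days with flow but no water quality sample."""
    return [smearing * math.exp(b0 + b1 * math.log(q)) for q in flows]


# Synthetic samples that follow L = 3 * Q**1.5 exactly
b0, b1, smear = fit_rating_curve([1.0, 2.0, 4.0, 8.0], [3.0, 3 * 2 ** 1.5, 24.0, 3 * 8 ** 1.5])
print(round(b1, 6), round(math.exp(b0), 6), round(smear, 6))  # 1.5 3.0 1.0
```

With real data the residuals are not zero. In that case the smearing factor exceeds 1 and corrects the tendency of the plain back-transform to underestimate mean loads.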


Several approaches have been suggested to determine the number of water quality samples required (Equations 3.1-3.3). The equations determine the number of samples (n) needed to estimate the mean concentration within a margin of error (d), and they are functions of Student's t value and other statistical parameters. The equations require an initial estimate of sample size (n₀) to compute the number of samples (n); iterations are necessary until n corresponds to n₀. Consider a case requiring a “0.05 (α) level of significance with a 90 percent chance (β = 0.1) of detecting a mean significantly different within 0.04 mg/l (d)”. Assume an initial estimate of 12 samples (n₀ = 12, thus v = 11), and assume that the sample standard deviation (s’) equals the population standard deviation (S = 0.05 mg/l). The number of samples would then be 9 using Equation 3.1 (n = 8.12), 9 using Equation 3.2 (n = 8.31), and 19 using Equation 3.3 (n = 18.39). However, these equations determine the number of samples required to estimate the mean concentration within a margin of error; they were not developed for load estimation with regression models. In addition, the equations require assumptions about the degrees of freedom (v), the sample standard deviation (s’), and the population standard deviation (S, i.e., the standard deviation of the true water quality concentrations).

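$$n = \frac{\left(t_{\alpha/2,\,v}\,S/d\right)^2}{1 + \frac{1}{N}\left(t_{\alpha/2,\,v}\,S/d\right)^2} \qquad \text{Eq. 3.1 (Cochran, 1963)}$$

$$n = \left(\frac{t_{\alpha/2,\,v}\,s'}{d}\right)^2 \qquad \text{Eq. 3.2 (USEPA, 1997)}$$

$$n = \left(\frac{s'}{d}\right)^2\left(t_{\alpha/2,\,v} + t_{\beta,\,v}\right)^2 \qquad \text{Eq. 3.3 (Zar, 1984)}$$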


where n is the number of samples, n₀ is the initial estimate of n, N is the total number of possible observations, t is Student's t value, s’ is the sample standard deviation, S is the population standard deviation, d is the absolute margin of error, α is the probability of committing a Type I error, β is the probability of committing a Type II error, and v is the degrees of freedom (v = n₀ - 1).
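The iteration described above can be checked numerically. The sketch assumes the following forms:

- Eq. 3.1: n = n₀'/(1 + n₀'/N), where n₀' = (t S/d)².
- Eq. 3.2: n = (t s'/d)².
- Eq. 3.3: n = (s'/d)²(t_α + t_β)².

It also assumes a two-sided α, a one-sided β, and v = n₀ - 1. For Equation 3.1 only, it assumes N = 365 daily observations, which the text does not state. Student's t quantiles are computed with the standard library (Simpson integration plus bisection) rather than a statistics package. The function names are hypothetical.

```python
import math


def t_cdf(x: float, v: int, steps: int = 1000) -> float:
    """P(T <= x) for Student's t with v degrees of freedom (x >= 0).

    Integrates the t density from 0 to x with composite Simpson's rule
    (steps must be even) and adds 0.5 by symmetry.
    """
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)

    def pdf(u: float) -> float:
        return c * (1 + u * u / v) ** (-(v + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_upper(tail: float, v: int) -> float:
    """Critical t value with upper-tail probability `tail`, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, v) > tail:
            lo = mid  # tail still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


def sample_size(method: str, sd: float, d: float, alpha: float = 0.05,
                beta: float = 0.10, big_n: float = math.inf,
                n0: int = 12, max_iter: int = 50) -> tuple[float, int]:
    """Iterate n0 -> n until ceil(n) equals n0; returns (n, ceil(n))."""
    for _ in range(max_iter):
        v = n0 - 1
        t_a = t_upper(alpha / 2, v)  # two-sided significance level
        if method == "cochran":      # Eq. 3.1 with finite population correction
            base = (t_a * sd / d) ** 2
            n = base / (1 + base / big_n)
        elif method == "usepa":      # Eq. 3.2
            n = (t_a * sd / d) ** 2
        elif method == "zar":        # Eq. 3.3 with one-sided beta
            n = (sd / d) ** 2 * (t_a + t_upper(beta, v)) ** 2
        else:
            raise ValueError(f"unknown method: {method}")
        if math.ceil(n) == n0:
            return n, n0
        n0 = math.ceil(n)
    raise RuntimeError("sample size iteration did not converge")


for method, big_n in [("cochran", 365), ("usepa", math.inf), ("zar", math.inf)]:
    n, n_samples = sample_size(method, sd=0.05, d=0.04, big_n=big_n)
    print(method, round(n, 2), n_samples)
```

With these inputs the iteration settles at n ≈ 8.12, 8.31, and 18.4 (9, 9, and 19 samples), in line with the worked example. The last value differs from 18.39 in the second decimal, presumably because the text used three-decimal table values of t.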

Robertson (2003) indicated that the most extensive sampling strategies did not always lead to accurate load estimates. Other studies have examined how much of the water quality data should come from storm events. Haggard et al. (2003) defined storm events as times when the flow stage exceeded 1.5 m; the daily flow stage was less than 1.5 m approximately 80 % of the time during their study period. USGS (2001) used the program PART (USGS, 1998) to separate streamflow into runoff and baseflow. Water quality data collected on days when baseflow was less than 60 % of streamflow were then designated as water quality data from storm events. These studies found that water quality datasets in which fifty percent of the samples came from storm events led to accurate and precise load estimates (Haggard et al., 2003; USGS, 2001). In the previous chapter, approximately twenty to thirty percent of the samples were found to need to come from storm events for accurate and precise annual pollutant load estimates.
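The two storm-event definitions can be expressed as simple classification rules. In this sketch the baseflow series is assumed to come from a separate baseflow-separation step; PART itself is not reproduced. The thresholds are the 60 % and 1.5 m values cited above, and the function names are hypothetical.

```python
def is_storm_day_baseflow(streamflow: float, baseflow: float, fraction: float = 0.60) -> bool:
    """USGS (2001)-style rule: a storm day if baseflow is less than 60 % of streamflow."""
    return baseflow < fraction * streamflow


def is_storm_day_stage(stage_m: float, threshold_m: float = 1.5) -> bool:
    """Haggard et al. (2003)-style rule: a storm event if flow stage exceeds 1.5 m."""
    return stage_m > threshold_m


def percent_storm_samples(sample_days: list[int], streamflow: list[float],
                          baseflow: list[float]) -> float:
    """Percentage of sampled days that the baseflow rule classifies as storm days."""
    storm = sum(1 for day in sample_days
                if is_storm_day_baseflow(streamflow[day], baseflow[day]))
    return 100.0 * storm / len(sample_days)


flow = [10.0, 10.0, 10.0, 10.0]
base = [9.0, 5.0, 8.0, 2.0]  # assumed output of a separate baseflow-separation step
print(percent_storm_samples([0, 1, 2, 3], flow, base))  # 50.0
```

The percentage computed this way depends directly on the storm definition chosen. This is one reason results expressed as a "percentage of storm samples" are not directly comparable across studies.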

LOAD ESTimator (LOADEST; Runkel et al., 2004) has 11 regression models to estimate constituent loads in streams and rivers using streamflow, constituent concentration, and regression model coefficients. The model calibrates the regression model coefficients using one of three statistical methods: Adjusted Maximum Likelihood Estimation (AMLE), Maximum Likelihood Estimation (MLE), and Least Absolute Deviation (LAD) (Runkel et al., 2004). The AMLE and MLE methods are appropriate when the calibration model errors (residuals) follow a normal distribution (Cohn et al., 1992b; Helsel and Hirsch, 2002). LAD does not require normality; it assumes only that the errors are independently and identically distributed random variables (Powell, 1983). The model has been used to estimate daily pollutant loads for various water quality parameters with various sample sizes, or sampling strategies (Table 3.1).

Table 3.1 Water quality data used in previous LOADEST applications

| Water Quality Parameter | Sample Size | Period | Num. of Sites | Reference |
|---|---|---|---|---|
| Mercury | 30-47 samples (monthly sampling) | 2002-2006 | 8 | Brigham et al. (2009) |
| Suspended sediment | ~30 samples (6-8 per year) | 2001-2005 | 5 | Dornblaser and Striegl (2009) |
| Chromophoric dissolved organic matter | 39 samples | 2004-2005 | 1 | Spencer et al. (2009) |
| NOx-N, NH3-N, Total Phosphorus | 88-155 samples (monthly sampling) | 1992-2006 | 18 | Carey et al. (2011) |
| Total Nitrogen, Total Phosphorus, Total Suspended Solids | Monthly sampling | 1970-2009 | 12 | Das et al. (2011) |
| Total Nitrogen | 54-152 samples | 12-22 years | 18 | Oh and Sankarasubramanian (2011) |
| Soluble reactive phosphorus, Total phosphorus | Weekly sampling | 1998-2007 | 8 | Duan et al. (2012) |

Chapter 2 showed that water quality data should include an appropriate proportion of samples from storm events rather than a fixed number of water quality samples. In other words, an appropriate water quality sampling strategy is required to accurately estimate annual pollutant loads using approaches like LOADEST. Therefore, the objectives of this study were to: 1) identify the correlation between LOADEST model behavior and water quality datasets with various proportions of water quality data from storm events, and 2) suggest an approach to prepare water quality datasets for annual pollutant load estimation using LOADEST.

3.3 Methodology

3.3.1 Water Quality Data Statistics for Annual Load Estimates

In Chapter 2, daily sediment data were collected for 211 streams from the USGS Water-Quality Data for the Nation (http://waterdata.usgs.gov/nwis/qw) and from the National Center for Water Quality Research of Heidelberg University (http://www.heidelberg.edu/academiclife/distinctive/ncwqr). The daily data were subsampled using six sampling strategies, 9 regression models were run in LOADEST, and the estimated annual sediment loads were compared to the measured annual sediment loads. Regression model number 3 (Equation 3.4) provided the most accurate and precise annual sediment load estimates, so it was selected for use in this study.

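$$\ln(L) = a_0 + a_1 \ln Q + a_2\, dtime \qquad \text{Eq. 3.4 (Runkel et al. 2004)}$$

$$\tilde{S} = \bar{S} + \frac{\sum_{i=1}^{n}\left(S_i - \bar{S}\right)^3}{2\sum_{i=1}^{n}\left(S_i - \bar{S}\right)^2} \qquad \text{Eq. 3.5 (Cohn et al. 1992a)}$$

where:

- L is load.
- $\ln Q_i = S_i - \tilde{S}$ is the centered natural logarithm of streamflow.
- dtime is decimal time minus the center of decimal time.
- $a_x$ are the coefficients to calibrate.
- $S_i = \ln(\text{streamflow}_i)$.
- $\bar{S}$ is the mean of ln(streamflow).
- $\tilde{S}$ is the center of ln(streamflow).
- n is the number of calibration observations.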
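Equations 3.4 and 3.5 can be evaluated directly once the coefficients are known. The sketch below computes the center of ln(streamflow) and evaluates model 3 for given coefficients. It does not calibrate the coefficients; LOADEST uses AMLE for that. It assumes dtime is already centered and uses hypothetical function names and coefficient values.

```python
import math


def log_flow_center(flows: list[float]) -> float:
    """Center of ln(streamflow), Eq. 3.5: S_bar + sum((S_i - S_bar)^3) / (2 * sum((S_i - S_bar)^2))."""
    s = [math.log(q) for q in flows]
    s_bar = sum(s) / len(s)
    num = sum((si - s_bar) ** 3 for si in s)
    den = 2 * sum((si - s_bar) ** 2 for si in s)
    return s_bar + num / den


def model3_ln_load(q: float, dtime: float, coeffs: tuple[float, float, float],
                   center: float) -> float:
    """Eq. 3.4: ln(L) = a0 + a1 * lnQ + a2 * dtime, where lnQ = ln(q) - center."""
    a0, a1, a2 = coeffs
    return a0 + a1 * (math.log(q) - center) + a2 * dtime


# Right-skewed log flows (S = 0, 0, 3) pull the center above the mean (1.0)
center = log_flow_center([1.0, 1.0, math.exp(3.0)])
print(round(center, 6))  # 1.5
```

For symmetric log flows the skewness term vanishes and the center equals the mean. Centering mainly reduces the correlation between the coefficient estimates.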


The number of subsampled water quality datasets for each stream was 98, i.e., (7 weekly + 14 biweekly + 28 monthly subsampled datasets) × 2 (with or without storm event samples). Therefore, 20,678 annual sediment load estimates (98 subsampled datasets × 211 streams) from Chapter 2 were used in this study.
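The dataset counts can be reproduced arithmetically under one interpretation that the text does not state explicitly. Under this interpretation, each fixed-interval strategy yields one subsample per possible starting day, and the monthly strategy is taken as a 28-day interval. The sketch is illustrative only.

```python
def fixed_interval_subsamples(n_days: int, interval: int) -> list[list[int]]:
    """One subsample per possible start offset (0 .. interval - 1)."""
    return [list(range(offset, n_days, interval)) for offset in range(interval)]


INTERVALS = {"weekly": 7, "biweekly": 14, "monthly": 28}  # monthly = 28 days (assumed)

# x 2: each subsample is used with and without supplementary storm event samples
per_stream = 2 * sum(len(fixed_interval_subsamples(365, k)) for k in INTERVALS.values())
total = per_stream * 211  # 211 streams
print(per_stream, total)  # 98 20678
```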

The annual sediment load estimates from the regression model were explored to identify which factors (statistics) of the model inputs affected the annual load estimates. LOADEST requires two inputs: one to calibrate the regression model coefficients (i.e., water quality and streamflow data) and the other to estimate daily loads (i.e., streamflow data). Thus, various statistics were derived from the subsampled input datasets (Table 3.2). The statistics for the calibration and estimation data listed in Table 3.2 were considered potentially correlated with the annual sediment load estimates.


Table 3.2 Statistics in Calibration and Estimation Data

| Variable | From Calibration Data | From Estimation Data |
|---|---|---|
| Q¹ | Minimum, maximum, mean, standard deviation | Minimum, maximum, mean, standard deviation |
| C² | Minimum, maximum, mean, standard deviation | Minimum, maximum, mean, standard deviation |
| Q, C, and L³ | Correlation coefficients of: Q and C, log(Q) and C, (log(Q))² and C, Q and L, log(Q) and L, (log(Q))² and L. Coefficients of determination of: Q and C, log(Q) and C, (log(Q))² and C, Q and L, log(Q) and L, (log(Q))² and L. Percentage of Q with C data in the high, moist, mid-range, dry, and low flow regimes⁴ | |
| Q ratios | Minimum Q in calibration data / minimum Q in estimation data; maximum Q in calibration data / maximum Q in estimation data; mean Q in calibration data / mean Q in estimation data; standard deviation of Q in calibration data / standard deviation of Q in estimation data | |

¹Q: streamflow data; ²C: water quality data (i.e., concentration); ³L: load (measured streamflow multiplied by water quality data); ⁴Flow regimes: defined by flow frequencies (USEPA, 2007)
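The flow-regime percentages and the calibration/estimation ratios in Table 3.2 can be sketched as follows. The regime boundaries follow the usual USEPA (2007) load duration curve intervals:

- high flows: 0-10 % exceedance
- moist conditions: 10-40 %
- mid-range flows: 40-60 %
- dry conditions: 60-90 %
- low flows: 90-100 %

Computing exceedance as the share of estimation-period flows equal to or greater than a sample's flow is an assumption, as are the function names. The "high" percentage corresponds to the PCH used later in this chapter.

```python
from statistics import mean, stdev

# Upper bounds of flow-duration intervals (% of time a flow is equaled or exceeded)
REGIMES = [("high", 10), ("moist", 40), ("mid-range", 60), ("dry", 90), ("low", 100)]


def exceedance_percent(q: float, estimation_flows: list[float]) -> float:
    """Percentage of estimation-period flows that equal or exceed q."""
    return 100.0 * sum(1 for f in estimation_flows if f >= q) / len(estimation_flows)


def flow_regime(q: float, estimation_flows: list[float]) -> str:
    p = exceedance_percent(q, estimation_flows)
    for name, upper in REGIMES[:-1]:
        if p < upper:
            return name
    return "low"


def regime_percentages(calibration_flows: list[float],
                       estimation_flows: list[float]) -> dict[str, float]:
    """Percentage of calibration samples (Q with C data) in each flow regime; 'high' is the PCH."""
    counts = {name: 0 for name, _ in REGIMES}
    for q in calibration_flows:
        counts[flow_regime(q, estimation_flows)] += 1
    return {name: 100.0 * c / len(calibration_flows) for name, c in counts.items()}


def calibration_ratios(calibration_flows: list[float],
                       estimation_flows: list[float]) -> dict[str, float]:
    """Calibration / estimation ratios of the minimum, maximum, mean, and standard deviation of Q."""
    return {
        "min": min(calibration_flows) / min(estimation_flows),
        "max": max(calibration_flows) / max(estimation_flows),
        "mean": mean(calibration_flows) / mean(estimation_flows),  # MFC / MFE
        "std": stdev(calibration_flows) / stdev(estimation_flows),
    }


estimation = [float(q) for q in range(1, 101)]  # synthetic daily flows
calibration = [100.0, 95.0, 50.0, 5.0]          # flows on sampled days
print(regime_percentages(calibration, estimation))
# {'high': 50.0, 'moist': 0.0, 'mid-range': 25.0, 'dry': 0.0, 'low': 25.0}
```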

3.3.2 Water Quality Data Selection for LOADEST Runs

The 20,678 annual sediment load estimates from LOADEST regression model number 3 were analyzed with the input data statistics (Table 3.2). Following this analysis, additional LOADEST runs were needed to investigate how regression model behavior depends on the characteristics of the calibration data. Thus, 5 USGS stations were selected from the 211 streams. The selected USGS stations had long-term daily water quality data, and their drainage areas ranged from 12.5 km² to 814,810 km² (Table 3.3). LOADEST requires at least 12 water quality data to calibrate the model coefficients. It is also limited to a maximum of 2,440 water quality samples, approximately 6.7 years of daily data (Runkel et al. 2004). The subsampling methods in the study included using all data (i.e., the calibration data period and interval are the same as the estimation data period and interval). To stay within this limit, each water quality dataset collected in the study was therefore split into two datasets. For instance, a daily water quality dataset of 10 years was split into two water quality datasets of 5 years. Therefore, 10 water quality datasets were prepared from the 5 USGS stations.

Table 3.3 Daily Sediment Data from USGS Stations

| Station Number | Station Name | Data Period | Drainage Area (km²) |
|---|---|---|---|
| 02119400 | Third Creek near Stony Point, NC | 1959-1968 | 12.5 |
| 07287150 | Abiaca Creek near Seven Pines, MS | 1993-2002 | 246.6 |
| 03265000 | Stillwater River at Pleasant Hill, OH | 1967-1973 | 1,302.8 |
| 12334550 | Clark Fork at Turah Bridge nr Bonner, MT | 1993-2002 | 9,430.1 |
| 06486000 | Missouri River at Sioux City, IA | 1992-1999 | 814,810.3 |

3.4 Results and Discussions

3.4.1 Required Statistics for Annual Load Estimates

The mean flows in the calibration data (MFCs) were correlated with the errors in estimated pollutant loads (Equation 3.6). There were no notable correlations between the errors and the other statistics listed in Table 3.2. Moreover, the other statistics did not show correlations with annual sediment load estimates when the estimates were categorized by streamflow flashiness, drainage area, or geographic location (state).

Eq. 3.6

MFCs were correlated with annual sediment load estimates: LOADEST underestimated loads with small MFCs and overestimated loads with large MFCs (Figure 3.1). This correlation (or trend) was identified through analysis of the load estimates for the 211 streams. Chapter 2 showed that the proportion of water quality data from storm events used to calibrate the model was correlated with errors in annual sediment load estimates. The correlation between load estimates and MFCs is consistent with these results, since calibration datasets with larger MFCs contained more water quality data from storm events.

Figure 3.1 Correlation between Errors and Mean of Flow in Calibration Data: (a) USGS Station Number 01357500, (b) USGS Station Number 01463500, (c) USGS Station Number 01470500, (d) USGS Station Number 01481000

Although the correlation was readily identified, the MFC at which the annual sediment load error was 0 % differed among the 211 streams (see Figure 3.1). For instance, annual sediment load estimates matched the measured loads when the MFC was approximately 650 cubic meters per second (cms) for USGS Station 01463500, but approximately 14 cms for USGS Station 01481000 (Figure 3.1). Therefore, annual sediment load estimates with errors from -10 % to +10 % were taken to be acceptable. These estimates were extracted to investigate the correlation between MFCs and the characteristics of the 211 streams. The MFCs of these estimates were correlated with the mean flow in the estimation data (MFE), and the MFCs were slightly greater than the MFEs (Figure 3.2).


Figure 3.2 Correlation between Mean of Flows in Calibration and Estimation Data

As a correlation between MFC and MFE was found, a linear regression to estimate a

required MFC was derived (Equation 3.8) using the formula for linear regression

(Equation 3.7).

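$$LRS = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \frac{\sum y - LRS\sum x}{n} \qquad \text{Eq. 3.7}$$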

Eq. 3.8


where y is the dependent variable, x is the independent variable, n is the number of (x, y) data pairs, LRS is the linear regression slope, and b is the intercept.

The required MFCs were computed with the regression equation using the MFEs from the 211 streams. The coefficient of determination (R²) between these required MFCs and the MFCs from the subsampled water quality data in the 211 streams was 0.98 (Figure 3.3).
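Equation 3.7 and the coefficient of determination can be sketched in a few lines. The (MFE, MFC) pairs below are made-up values, not the 211-stream data. R² is computed here as 1 - SS_res/SS_tot, which is an assumption about how the reported 0.98 was obtained.

```python
def linear_regression(x: list[float], y: list[float]) -> tuple[float, float]:
    """Eq. 3.7: least-squares slope (LRS) and intercept (b) of y on x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    lrs = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - lrs * sx) / n
    return lrs, b


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical (MFE, MFC) pairs in cms, not data from this study
mfe = [1.0, 2.0, 3.0, 4.0]
mfc = [3.0, 5.0, 7.0, 9.0]
lrs, b = linear_regression(mfe, mfc)
required = [lrs * q + b for q in mfe]  # "required MFC" from the fitted line
print(lrs, b, r_squared(mfc, required))  # 2.0 1.0 1.0
```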

Figure 3.3 Required MFC by Regression Equation


3.4.2 Mean Flow in Calibration Data and Annual Load Estimates

LOADEST runs were performed to identify the correlation between MFCs and load estimates. The 10 water quality datasets from the 5 USGS stations were subsampled based on flow magnitude. The water quality datasets were ordered by date, so each dataset was first sorted by flow in two ways, ascending and descending, before the data were subsampled.

The first subsampled dataset from the ascending dataset consisted of the 12 smallest flows from the original dataset, together with their water quality data as calibration data. Twelve was used because LOADEST requires at least 12 water quality samples with flow data. This first subsampled dataset had the minimum possible MFC for the original dataset. The second subsampled dataset consisted of the 42 smallest flows, i.e., 30 flows added to the first subsampled dataset. In the same manner, the third subsampled dataset consisted of the 72 smallest flows. In other words, 30 flow data were added in each subsampling step until all data were included.

This approach was used to explore how LOADEST performed with data biased toward low flow. The water quality data at the largest flows were added only in the last subsampling step. As a result, the model extrapolated beyond the calibration flow range for every subsampled dataset except the last one, which used all data.

For the descending dataset, the first subsampled dataset consisted of the 12 greatest flows from the original dataset, and the second subsampled dataset consisted of the 42 greatest flows. As with the ascending method, 30 data were added in each step until the calibration data included all data. The first subsampled dataset had the maximum possible MFC for the original dataset. The load estimates were not extrapolated above the largest calibration flow, but the data were biased toward high flow (storm events). These subsampling methods are not practical in a water quality monitoring program, but they were used to evaluate how model behavior changes with MFC.
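The nested ascending and descending subsampling can be sketched as below, using synthetic (flow, concentration) pairs. Subset sizes follow the text: 12, 42, 72, and so on, with the last subset containing all data. The function name and data are hypothetical.

```python
from statistics import mean


def nested_subsamples(pairs: list[tuple[float, float]], ascending: bool = True,
                      first: int = 12, step: int = 30) -> list[list[tuple[float, float]]]:
    """Nested calibration subsets of (flow, concentration) pairs sorted by flow.

    Sizes are 12, 42, 72, ... and the last subset always contains all data.
    """
    if len(pairs) < first:
        raise ValueError("LOADEST needs at least 12 samples")
    ordered = sorted(pairs, key=lambda p: p[0], reverse=not ascending)
    sizes = list(range(first, len(ordered), step)) + [len(ordered)]
    return [ordered[:k] for k in sizes]


data = [(float(q), 1.0) for q in range(1, 101)]  # synthetic flows 1..100
for label, asc in [("ascending", True), ("descending", False)]:
    subsets = nested_subsamples(data, ascending=asc)
    mfcs = [mean(q for q, _ in s) for s in subsets]  # MFC of each calibration subset
    print(label, [len(s) for s in subsets], mfcs)
```

For flows 1 to 100, the ascending MFCs rise from 6.5 to 50.5 and the descending MFCs fall from 94.5 to 50.5. The two sequences bracket the full-data MFC from both sides.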

Regression models predict loads based on the relationship between flow and water quality data. The slope (e.g., a1 in Equation 3.4) is a key factor in the relationship between flow and concentration (or load). Therefore, the slope of the measured data used to calibrate the LOADEST model coefficients was compared to the calibrated slope (a1) in LOADEST. In other words, linear regression slopes (LRSs) between flow and concentration in the calibration data were computed using Equation 3.7. The slope coefficients (a1) of the LOADEST regression model (Equation 3.4) were then extracted from all LOADEST runs.

The ten water quality datasets from the five streams were subsampled, and LOADEST was run with each subsampled dataset. The five USGS stations selected in the study had different geographic locations and drainage areas. However, the model responded similarly to changes in MFC at all stations. Both the LRSs (of the calibration data) and a1 (from LOADEST) fluctuated when MFCs were too small or too great (Figure 3.4), and the errors of the load estimates fluctuated accordingly. In other words, the model showed low precision with data biased toward either low or high flows (Figure 3.5). This indicates that the true load cannot be reliably reproduced with water quality datasets biased toward low or high flow, and such datasets could lead to load estimates that differ significantly from true loads. The regression model relies only on flow and water quality data, so load estimates close to true loads cannot be expected if those data are too biased to represent the true load.

It might be thought that using more water quality samples to estimate loads would lead to load estimates closer to true loads. This premise was examined by using all data to calibrate the regression model coefficients. Although load estimates using all data were close to the measured annual loads, use of all data did not necessarily lead to the smallest error in annual sediment load estimates. For instance, the errors of load estimates using all data were -13.0 % and -1.7 % for USGS Station Numbers 02119400 (Figure 3.5 (a)) and 06486000 (Figure 3.5 (b)), respectively. Meanwhile, many subsampled datasets produced load estimates with smaller errors than the load estimates using all data. Moreover, in seven of the ten datasets, load estimates whose MFCs were close to the required MFCs from the regression equation (Equation 3.8, Table 3.4) had smaller errors in magnitude than those using all data. Therefore, a water quality dataset should consist of samples associated with appropriate flows (e.g., flows that yield the required MFC from Equation 3.8) rather than data from extensive sampling strategies (e.g., daily water quality data collection).

The percentage of calibration data from high flow conditions (PCH) was computed for each sediment load estimate (Table 3.4). High flow was defined as the upper 10 % of flows for a given analysis period (USEPA, 2007). PCHs for the datasets based on the regression equation ranged from 16.7 % (Station 12334550, 1998-2002) to 36.8 % (Station 02119400, 1959-1963). Haggard et al. (2003) and USGS (2001) suggest that 50 % of the water quality samples used in estimating loads should come from storm events; however, their definitions of storm events differ from the one used here. Haggard et al. (2003) defined storm events as flow stages exceeding 1.5 m, which occurred on approximately 20 % of days during their study period. In contrast, storm events in this study (i.e., PCH) were defined as the upper 10 % of flows for a given period.

The results presented here indicate that the regression equation approach (Equation 3.8) can be used in water quality monitoring programs intended to estimate annual sediment loads with approaches like LOADEST, as follows:

1) Compute MFC₀ using the regression equation with the mean flow of historical data before initiating the water quality monitoring program,

2) Collect a few water quality samples based on MFC₀,

3) Compute MFCᵢ using the regression equation with the mean flow observed since the beginning of the water quality monitoring program,

4) Collect water quality samples at low flow if the MFC of the samples collected so far is greater than MFCᵢ, and collect water quality samples at high flow (storm events) if it is smaller than MFCᵢ,

5) Repeat steps 3) and 4) until the end of the water quality monitoring program.

However, collecting water quality samples only in the flow regime closest to the required MFC should be avoided, because the resulting dataset would be biased toward a single flow regime.
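Steps 3) and 4) amount to a simple decision rule, sketched here. The slope and intercept are hypothetical placeholders, because the fitted coefficients of Equation 3.8 are not reproduced in this section. The sketch also does not enforce the caution above about avoiding a single flow regime.

```python
from statistics import mean

# Hypothetical placeholders: the fitted coefficients of Eq. 3.8 are not reproduced here
HYPOTHETICAL_LRS = 1.1
HYPOTHETICAL_B = 0.0


def required_mfc(mean_flow: float, lrs: float = HYPOTHETICAL_LRS,
                 b: float = HYPOTHETICAL_B) -> float:
    """Linear regression of Eq. 3.8: required MFC from the mean flow observed so far."""
    return lrs * mean_flow + b


def next_sampling_target(sampled_flows: list[float], observed_flows: list[float],
                         lrs: float = HYPOTHETICAL_LRS, b: float = HYPOTHETICAL_B) -> str:
    """Steps 3) and 4): compare the MFC of the samples so far with the required MFC."""
    target = required_mfc(mean(observed_flows), lrs, b)
    current = mean(sampled_flows)
    if current > target:
        return "low flow"
    if current < target:
        return "high flow (storm event)"
    return "on target"


observed = [8.0, 10.0, 12.0, 30.0]  # daily mean flows since the program began (cms)
samples = [8.0, 10.0]               # flows at which samples were collected so far
print(next_sampling_target(samples, observed))  # high flow (storm event)
```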




Figure 3.4 Comparison of Slopes from Linear Regression Formula (LRS in Equation 3.7) and Calibrated LOADEST Model (a1 in Equation 3.4)




Figure 3.5 Annual Sediment Load Estimates by MFCs


Table 3.4 Comparison of Errors between Regression Equation and All Data

| USGS Station Number (Data Period) | Regression¹ Error of Load Estimates (%) | Regression¹ PCH (%) | All Data Error of Load Estimates (%) | All Data PCH (%) |
|---|---|---|---|---|
| 02119400 (1959-1963) | 6.4 | 36.8 | -13.0 | 11.1 |
| 02119400 (1964-1968) | 2.5 | 36.2 | -8.7 | 10.3 |
| 07287150 (1993-1997) | 13.0 | 16.8 | 22.4 | 10.1 |
| 07287150 (1998-2002) | 8.1 | 17.4 | -5.1 | 10.1 |
| 03265000 (1967-1969) | 14.8 | 20.3 | -29.5 | 10.1 |
| 03265000 (1970-1973) | -10.8 | 18.8 | -39.7 | 10.2 |
| 12334550 (1993-1997) | 7.5 | 17.1 | -16.6 | 10.1 |
| 12334550 (1998-2002) | 0.7 | 16.7 | -12.7 | 10.3 |
| 06486000 (1992-1995) | -2.9 | 18.4 | -1.7 | 10.1 |
| 06486000 (1995-1999) | -5.8 | 21.7 | -2.8 | 10.1 |

¹Regression: load estimates when MFCs were close to the required MFC computed by the regression equation
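The comparison in Table 3.4 can be summarized by transcribing the table values and counting the rows in which the regression-based error is smaller in magnitude. This is a check on the tabulated numbers only.

```python
# (error with regression-based MFC, error with all data), in %, transcribed from Table 3.4
TABLE_3_4 = {
    "02119400 (1959-1963)": (6.4, -13.0),
    "02119400 (1964-1968)": (2.5, -8.7),
    "07287150 (1993-1997)": (13.0, 22.4),
    "07287150 (1998-2002)": (8.1, -5.1),
    "03265000 (1967-1969)": (14.8, -29.5),
    "03265000 (1970-1973)": (-10.8, -39.7),
    "12334550 (1993-1997)": (7.5, -16.6),
    "12334550 (1998-2002)": (0.7, -12.7),
    "06486000 (1992-1995)": (-2.9, -1.7),
    "06486000 (1995-1999)": (-5.8, -2.8),
}

smaller = [k for k, (reg, all_data) in TABLE_3_4.items() if abs(reg) < abs(all_data)]
mae_reg = sum(abs(reg) for reg, _ in TABLE_3_4.values()) / len(TABLE_3_4)
mae_all = sum(abs(all_data) for _, all_data in TABLE_3_4.values()) / len(TABLE_3_4)
print(len(smaller), round(mae_reg, 2), round(mae_all, 2))
```

The regression-based MFC gives the smaller absolute error in seven of the ten rows. Its mean absolute error is about 7.3 %, versus 15.2 % for all data. The three exceptions are 07287150 (1998-2002) and both 06486000 periods.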

3.4.3 Improvement of the Poorest Annual Load Estimates

The regression equation approach for estimating a required MFC can be applied from the beginning of a water quality monitoring program. However, an approach is also needed for applying the regression equation to water quality datasets that have already been collected. When the regression equation is applied from the beginning of a monitoring program, water quality samples can be added based on MFCs. However, a monitoring program may already have finished, or a water quality dataset may have been collected by others (e.g., EPA, USGS, etc.). In these cases, water quality data cannot be added to obtain the required MFC. Another way to obtain the required MFC is to exclude water quality data from the original dataset.

To explore this concept, five of the poorest load estimates from Chapter 2 were selected, each from a dataset with enough water quality data to allow some data to be excluded when estimating loads. LOADEST runs were performed for the five water quality datasets with water quality samples excluded. The water quality datasets came from monthly or biweekly fixed-interval sampling strategies and contained 84, 120, or 261 water quality samples (Table 3.5). All MFCs from the calibration data were smaller than the required MFCs based on the regression equation. Therefore, the water quality samples associated with the smallest flows were removed until the required MFC was reached, and LOADEST runs were performed for the reduced water quality datasets.
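The exclusion step can be illustrated with a short Python sketch. It assumes each sample is a (flow, concentration) pair and that samples are removed one at a time, lowest flow first, until the mean flow of the remaining calibration data reaches the required MFC; the function name and this one-at-a-time stopping rule are illustrative assumptions rather than the exact procedure used in the study, and the LOADEST runs themselves are not shown.

```python
from statistics import mean


def exclude_low_flow_samples(samples, required_mfc):
    """Drop the lowest-flow samples until the mean calibration flow (MFC)
    is at least the required MFC (assumed one-at-a-time rule).

    samples: list of (flow_cms, concentration) tuples.
    Returns the retained samples, ordered from highest to lowest flow.
    """
    # Sort from highest to lowest flow so the lowest-flow sample is last.
    kept = sorted(samples, key=lambda s: s[0], reverse=True)
    # Stop once the MFC is reached or only one sample would remain.
    while len(kept) > 1 and mean(f for f, _ in kept) < required_mfc:
        kept.pop()  # remove the sample with the smallest flow
    return kept


# Hypothetical example: four samples and a required MFC of 5.0 cms.
data = [(1.0, 20.0), (2.0, 35.0), (3.0, 40.0), (10.0, 90.0)]
retained = exclude_low_flow_samples(data, required_mfc=5.0)
print(len(retained), mean(f for f, _ in retained))
```

Removing samples from the low-flow end only raises the MFC, which matches the situation described above where every original MFC was below the required MFC.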

Load estimates improved as water quality samples were removed. For instance, load estimates using the original dataset showed a large error; however, the error became smaller as the MFC increased through exclusion of the water quality data associated with the smallest flows (Figure 3.6). For the five water quality datasets, the errors ranged from 132.8 % to 223.0 % with the original datasets, while they ranged from -27.0 % to 1.7 % once the LOADEST calibration datasets reached the required MFCs from the regression equation. The original datasets were biased toward low flow; in other words, their MFCs were smaller than the required MFCs. The results indicate that a water quality dataset needs to contain an appropriate proportion of water quality data from low and high flows rather than simply a large number of water quality samples, because the datasets with fewer data but an appropriate MFC had smaller errors than the datasets with more data (Table 3.5). The PCHs of the original datasets (‘Original’ in Table 3.5) were approximately 10 %, while they need to be 20-30 % (Chapter 2). The PCHs of the water quality datasets based on the regression equation were approximately 30 % (26.3-30.8 %), except for the dataset from station 05291000, whose PCH was 13.6 %.

Figure 3.6 Load Estimate Improvement when Excluding Water Quality Samples (USGS

Station Number 02119400, Monthly Fixed Sampling Strategy on 19th

of Every Month)


Table 3.5 Improvement of the Poorest Load Estimates by MFC Fitting

USGS Station Number           MFE¹   R. MFC²   MFC³ (Error, %)              Num. Data⁴ (PCH, %)
(Sampling Strategy)                            Original⁵     Regression⁶    Original⁵    Regression⁶
02119400 (Monthly on 18th)    0.18   0.36      0.19 (195.5)  0.35 (1.7)     120 (10.0)   45 (26.7)
02119400 (Monthly on 19th)    0.18   0.36      0.22 (223.0)  0.36 (-3.0)    120 (12.5)   57 (26.3)
02119400 (Monthly on 20th)    0.18   0.36      0.21 (132.8)  0.36 (-13.0)   120 (12.5)   52 (28.8)
02119400 (Biweekly on 12th)   0.18   0.36      0.17 (144.7)  0.36 (1.4)     261 (7.7)    65 (30.8)
05291000 (Monthly on 25th)    1.43   2.50      1.78 (204.0)  2.51 (-27.0)   84 (9.5)     59 (13.6)

¹MFE: Mean Flow (cms) of Estimation Data; ²R. MFC: Required MFC (cms) by the Regression Equation (Equation 3.8); ³MFC: Mean Flow (cms) of Calibration Data; ⁴Num. Data: Number of Data in Calibration Data; ⁵Original: Calibration Data from Chapter 2; ⁶Regression: Calibration Data after the Exclusion of Minimum Flow Data

3.5 Conclusions

Regression models are used to estimate pollutant loads or concentrations for a given time series, including annual loads. LOADEST is applied to various water quality parameters with various sample sizes. In this study, one of the LOADEST regression models was evaluated with various sample sizes. Several distinct features were found: 1) the mean flow of the data used to calibrate the regression model coefficients (MFC) was correlated with the mean flow of the period used to estimate pollutant loads (MFE), 2) the load estimates differed significantly from the measured loads if the MFC was too small or too large, 3) using all data with the same data interval and period as the estimation data did not lead to the smallest error against the measured load, 4) a water quality dataset with an appropriate MFC showed smaller errors than a dataset with a larger amount of data but biased toward low or high flows, and 5) excluding water quality data to fit the required MFC improved annual load estimates. The results imply that a water quality dataset needs to represent the distribution of the given data; in other words, it should not be biased toward a particular flow regime.

Calibration data need to be representative of the watershed characteristics or of the period used to estimate pollutant loads. In other words, calibration data for pollutant regression models need to include an appropriate portion of storm samples so that they are not biased toward specific flow conditions. This is why water quality datasets collected less frequently but including an appropriate portion of storm samples led to smaller errors in predicted annual pollutant loads than datasets collected more frequently but including an inappropriate portion of storm samples.

A regression equation was developed to compute the required mean flow of the calibration data. The regression equation is applicable from the beginning of water quality monitoring programs. Furthermore, the regression equation approach can be used to guide the exclusion of water quality data when a water quality dataset has already been collected. The regression equation is expected to be useful when the purpose is to estimate annual sediment loads using LOADEST.


3.6 References

Brigham, M. E., Wentz, D. A., Aiken, G. R., Krabbenhoft, D. P., 2009. Mercury cycling in stream ecosystems. 1. Water column chemistry and transport. Environmental Science & Technology 43, 2720-2725.

Burn, D. H., 1990. Real-time sampling strategies for estimating nutrient loadings. Journal of Water Resources Planning and Management 116(6), 727-741.

Carey, R. O., Migliaccio, K. W., Brown, M. T., 2011. Nutrient discharges to Biscayne Bay, Florida: Trends, loads, and a pollutant index. Science of the Total Environment 409, 530-539.

Cochran, W. G., 1963. Sampling Techniques (Second Edition). John Wiley and Sons, New York, New York. Chapter 4. The Estimation of Sample Size. 75.

Cohn, T. A., Caulder, D. L., Gilroy, E. J., Zynjuk, L. D., Summers, R. M., 1992a. The validity of a simple statistical model for estimating fluvial constituent loads: an empirical study involving nutrient loads entering Chesapeake Bay. Water Resources Research 28(9), 2353-2363.

Cohn, T. A., Gilroy, E. J., Baier, W. G., 1992b. Estimating fluvial transport of trace constituents using a regression model with data subject to censoring. Paper presented at the Joint Statistical Meetings, Am. Stat. Assoc., Boston.

Coynel, A., Schafer, J., Hurtrez, J., Dumas, J., Etcheber, H., Blanc, G., 2004. Sampling frequency and accuracy of SPM flux estimates in two contrasted drainage basins. Science of the Total Environment 330, 233-247.

Das, S. K., Ng, A. W. M., Perera, B. J. C., 2011. Assessment of nutrient and sediment loads in the Yarra River catchment. 19th International Congress on Modelling and Simulation, Perth, Australia. 3490-3496.

Dornblaser, M. M., Striegl, R. G., 2009. Suspended sediment and carbonate transport in the Yukon River Basin, Alaska: Fluxes and potential future responses to climate change. Water Resources Research 45, W06411, doi:10.1029/2008WR007546.

Duan, S., Kaushal, S. S., Groffman, P. M., Band, L. E., Belt, K. T., 2012. Phosphorus

export across an urban to rural gradient in the Chesapeake Bay watershed. Journal of

Geophysical Research 117, G01025, doi:10.1029/2011JG001782.

Gilroy, E. J., Hirsch, R. M., Cohn, T. A., 1990. Mean square error of regression-based constituent transport estimates. Water Resources Research 26(9), 2069-2077.

Haggard, B. E., Soerens, T. S., Green, W. R., Richards, R. P., 2003. Using regression

methods to estimate stream phosphorus loads at the Illinois River, Arkansas. Applied

Engineering in Agriculture 19(2), 187–194.

Helsel, D. R., Hirsch, R. M., 2002. Statistical Methods in Water Resources. Techniques of Water-Resources Investigations of the United States Geological Survey, Book 4, Hydrologic Analysis and Interpretation, 352.

Henjum, M. B., Hozalski, R. M., Wennen, C. R., Novak, P. J., Arnold, W. A., 2010. A

comparison of total maximum daily load (TMDL) calculations in urban streams

using near real-time and periodic sampling data. Journal of Environmental

Monitoring 12, 234-241.


Horowitz, A. J., 2003. An evaluation of sediment rating curves for estimating suspended

sediment concentrations for subsequent flux calculations. Hydrological Processes 17,

3387-3409.

Johnson, A. H., 1979. Estimating solute transport in streams from grab samples. Water

Resources Research 15(5), 1224-1228.

Johnes, P. J., 2007. Uncertainties in annual riverine phosphorus load estimation: Impact

of load estimation methodology, sampling frequency, baseflow index and catchment

population density. Journal of Hydrology 332, 241-258.

King, K. W., Harmel, R. D., 2003. Considerations in selecting a water quality sampling strategy. Transactions of the ASAE 46(1), 63-73.

Kronvang, B., Bruhn, A. J., 1996. Choice of sampling strategy and estimation method for

calculating nitrogen and phosphorus transport in small lowland streams.


Oh, J., Sankarasubramanian, A., 2011. Interannual hydroclimatic variability and its

influence on winter nutrients variability over the southeast United States. Hydrology

and Earth System Sciences Discussions 8, 10935-10971.

Powell, J. L., 1984. Least absolute deviations estimation for the censored regression model. Journal of Econometrics 25, 303-325.

Robertson, D. M., 2003. Influence of different temporal sampling strategies on estimating

total phosphorus and suspended sediment concentration and transport in small

streams. Journal of the American Water Resources Association 39(5), 1281-1308.


Robertson, D. M., Roerish, E. D., 1999. Influence of various water quality sampling strategies on load estimates for small streams. Water Resources Research 35(12), 3747-3759.

Robertson, D. M., Richards, K. D., 2000. Influence of different temporal sampling

strategies on estimating loads and maximum concentrations in small streams.

Proceedings of the National Water Quality Monitoring Council National Monitoring

Conference, Austin TX. 209-223.

Runkel, R. L., Crawford, C. G., Cohn, T. A., 2004. Load Estimator (LOADEST): A


Geological Survey Techniques and Methods, Book 4, Chap. A5

Spencer, R. G. M., Aiken, G. R., Butler, K. D., Dornblaser, M. M., Striegl, R. G., Hernes, P. J., 2009. Utilizing chromophoric dissolved organic matter measurements to derive export and reactivity of dissolved organic carbon exported to the Arctic Ocean: A case study of the Yukon River, Alaska. Geophysical Research Letters 36, L06401, doi:10.1029/2008GL036831.

USEPA (U.S. Environmental Protection Agency), 1997. Monitoring guidance for

determining the effectiveness of nonpoint source controls. EPA/841-B-96-004, U.S.

EPA Office of Water Nonpoint Source Control Branch. Washington D.C.

USEPA (U.S. Environmental Protection Agency), 2007. An approach for using load

duration curves in the development of TMDLs. Washington, DC 20460.

USGS (U. S. Geological Survey), 1998. Computer programs for describing the recession

of ground-water discharge and for estimating mean ground-water recharge and

discharge from streamflow records-update. Water-Resources Investigation Report

98-4148. Reston, Virginia.


USGS (U. S. Geological Survey), 2001. Effect of storm-sampling frequency on

estimation of water-quality loads and trends in two tributaries to Chesapeake Bay in

Virginia. Water-Resources Investigations Report 01-4136. Richmond, Virginia.

Zar, J. H., 1984. Biostatistical Analysis (Second Edition). Prentice Hall, New Jersey.

Chapter 8. One-Sample Hypotheses. 110.


CHAPTER 4. A WEB TOOL FOR STORET/WQX WATER QUALITY DATA

RETRIEVAL AND BEST MANAGEMENT PRACTICE SCENARIO

IDENTIFICATION

4.1 Abstract

The Total Maximum Daily Load (TMDL) is a water quality standard used to regulate and protect the quality of water in streams, rivers, and lakes. A wide range of approaches is currently used to develop TMDLs for impaired streams and rivers. Flow and load duration curves (FDC and LDC) have been used in many states, along with other models and approaches, to evaluate the relationship between flow and pollutant loading. A web-based LDC Tool had been developed to facilitate development of FDCs and LDCs and to support other hydrologic analyses. In this study, the FDC and LDC tool was enhanced to allow collection of water quality data via the web and to assist in establishing cost-effective best management practice (BMP) implementations. The enhanced web-based tool uses water quality data from the US Geological Survey and from the Water Quality Portal via web access. Moreover, the web-based tool identifies the pollutant reductions required to meet standard loads and suggests a BMP scenario to achieve this reduction, based on the ability of BMPs to reduce pollutant loads and on BMP establishment and maintenance costs. In this study, flow and water quality data were collected via web access to develop an LDC and to identify the required load reductions. BMP scenario suggestions are based on the EPA Spreadsheet Tool for the Estimation of Pollutant Load model, with the goal of achieving the required pollutant reduction at the least cost.

Keywords: Flow Duration Curve, Load Duration Curve, STORET/WQX, Total

Maximum Daily Load

4.2 Introduction

The Total Maximum Daily Load (TMDL) is the maximum load of a pollutant that a water body can receive while still meeting water quality standards, and it is used to regulate the water quality of streams. Section 303(d) of the Clean Water Act requires states and other defined authorities to develop lists of impaired waters, to establish priority rankings, and to develop TMDL plans for those waters (USEPA, 2008). A wide range of approaches is currently used to develop TMDLs for impaired streams. Models differ in purpose, applicability, and uncertainty, and they use various structures and assumptions to represent the complex systems found in nature. In other words, each model has advantages and disadvantages, and it may be necessary to combine two or more models to solve a problem (Babbar-Sebens and Karthikeyan, 2009; Shen and Zhao, 2010; Cleland, 2003; USEPA, 2008). Load duration curves (LDCs) have been used in many states along with other models or approaches to evaluate the relationship between flow and pollutant loading (Alaska Department of Environmental Conservation, 2008; Babbar-Sebens and Karthikeyan, 2009; Chen et al., 2011; IDEM, 2007; Minnesota Pollution Control Agency, 2010; New Hampshire Department of Environmental Services, 2008; Tennessee Department of Environment and Conservation, 2005; Texas Institute for Applied Environmental Research, 2010).

Flow duration curves (FDCs) and LDCs can be plotted either manually (Kim and Yoon, 2011; Babbar-Sebens and Karthikeyan, 2009; Shen and Zhao, 2010) or by computer programs (Johnson et al., 2009; Kim et al., 2012). The steps to plot the curves are: 1) collecting flow and water quality data, 2) converting the chronologically ordered flow data into a cumulative frequency distribution, 3) calculating standard pollutant loads by multiplying the numerical target concentration by the flow data, 4) calculating existing pollutant loads by multiplying the observed water quality concentrations by the corresponding flow data, and 5) plotting the FDC and LDC against cumulative frequency (Shen and Zhao, 2010). This is a relatively simple approach for developing TMDLs, as little expertise and time are required to prepare input data and to develop FDCs and LDCs. However, it requires historical streamflow records and water quality data paired with those streamflow data. Both datasets need to be imported into spreadsheets and manipulated before the remaining steps can be followed to develop FDCs and LDCs.
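Steps 2 to 4 above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by the web-based tool: it assumes daily flows in m³/s, concentrations in mg/l, loads in metric tons per day, and the Weibull plotting position (rank divided by n + 1) for cumulative frequency; the function names are hypothetical. Plotting (step 5) is omitted.

```python
def duration_curve(flows_cms, target_mg_per_l):
    """Steps 2-3: rank daily flows and compute the target (standard) load.

    Returns (exceedance_pct, flow_cms, target_load_t_per_day) tuples,
    ordered from the highest to the lowest flow.
    """
    n = len(flows_cms)
    rows = []
    # Rank flows from highest to lowest (ties simply keep sorted order).
    for rank, q in enumerate(sorted(flows_cms, reverse=True), start=1):
        # Weibull plotting position (assumed): percent of time exceeded.
        exceedance = 100.0 * rank / (n + 1)
        # m3/s x mg/l = g/s; x 86,400 s/day / 1,000,000 g/t = t/day.
        rows.append((exceedance, q, q * target_mg_per_l * 0.0864))
    return rows


def observed_load(flow_cms, conc_mg_per_l):
    """Step 4: existing load (t/day) for one paired flow/concentration sample."""
    return flow_cms * conc_mg_per_l * 0.0864


# Hypothetical daily flows (m3/s) with the 46.0 mg/l TSS target of Section 4.4.
curve = duration_curve([1.0, 10.0, 5.0], 46.0)
for exceedance, flow, load in curve:
    print(f"{exceedance:5.1f} %  {flow:6.2f} m3/s  {load:8.3f} t/day")
```

Observed loads from step 4 would be plotted at the exceedance percentage of the flow on their sampling date, so they can be compared directly with the target load curve.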

A web-based LDC Tool (https://engineering.purdue.edu/~ldc/; Kim et al., 2012) has been developed to automate the generation of FDCs and LDCs. The Web-based LDC Tool has numerous benefits. Because it is web-based, it is easy to use and requires no installation or additional software. Further, it provides a user-friendly Google Maps interface for identifying sampling stations. The tool accepts not only a user's own streamflow data, the fundamental input for FDCs and LDCs, but also U.S. Geological Survey (USGS) streamflow data retrieved from the USGS data server for stations selected via the Google Maps interface.

In addition, the web-based tool employs LOAD ESTimator (LOADEST; Runkel et al., 2004). Water quality measurement is generally costly and involves many procedures, so the available data may not be sufficient to perform a TMDL analysis. LOADEST can be used to generate daily pollutant loads from intermittent water quality data. The web-based tool interacts with LOADEST by preparing its inputs and plotting the LDC with the LOADEST results.

The web-based tool provides various benefits: it develops FDCs and LDCs, retrieves streamflow data from the USGS data server, integrates with LOADEST, and allows additional analyses (e.g., seasonal variations and surface flow separation). However, users were required to prepare water quality data themselves, even though water quality data are an essential input for LDC development. The tool also needed to be upgraded to improve ease of use and to support Best Management Practice (BMP) recommendations. Thus, the objectives of the study were to: 1) enhance the Web-based LDC Tool to allow collection of water quality data via web access, and 2) develop a tool module to suggest BMP scenarios that reduce annual pollutant loads to meet the required pollutant load reduction.

4.3 Methodology

4.3.1 Module Development to Use Water Quality Data

Two datasets are required to develop an LDC: flow data and water quality data. While the web-based tool provided access to USGS streamflow data via a Google Maps interface, users had to prepare water quality data manually. Water quality data can be prepared by the users, or the data can be collected from the USGS, the US Environmental Protection Agency (EPA), or the Water Quality Portal (WQP). Therefore, three modules were developed to automate the collection of water quality data and to facilitate use of the data in the web-based tool.

The first module requests and receives USGS water quality data, which are provided through web access (http://waterdata.usgs.gov/nwis/qw). Similar to the handling of USGS streamflow data, a module with a Google Maps interface displaying the locations of USGS water quality stations was developed (Fig. 4.1). When the user finds and selects the USGS station of interest on the Google Maps interface, the module requests and receives water quality data from the USGS server. Because the USGS dataset contains various water quality parameters, the user selects one parameter, and the data for that parameter are then made available in the web-based tool without any manual processing (e.g., data formatting).

The second module uploads and uses water quality data from the EPA My WATERS Mapper (http://watersgeo.epa.gov/mwm/) application. The My WATERS Mapper provides STOrage and RETrieval (STORET) data, and the web site allows downloading a water quality data file formatted as comma-separated values (CSV), extensible markup language (XML), or keyhole markup language (KML). The file downloaded from the EPA My WATERS Mapper contains various water quality parameters. Therefore, a module was developed to upload the data file, extract the water quality parameter of interest, and make the extracted data available in the web-based tool without any manual processing.
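A minimal Python sketch of the extraction step for an uploaded CSV file is shown below. The column names (Parameter, Date, Value) are hypothetical placeholders, since the layout of the My WATERS Mapper file is not described here; only the filtering logic (keep one parameter, drop non-numeric results, sort by date) is illustrated.

```python
import csv
import io


def extract_parameter(csv_text, parameter, param_col="Parameter",
                      date_col="Date", value_col="Value"):
    """Return sorted (date, value) pairs for one water quality parameter.

    The default column names are hypothetical; adjust them to the header
    of the downloaded file.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(param_col) or "").strip()
        if name.lower() != parameter.lower():
            continue  # a different parameter
        try:
            value = float(row.get(value_col) or "")
        except ValueError:
            continue  # blank or non-numeric result
        records.append((row.get(date_col) or "", value))
    # ISO dates (YYYY-MM-DD) sort chronologically as strings.
    records.sort()
    return records


sample = (
    "Date,Parameter,Value\n"
    "2007-11-28,Total suspended solids,12\n"
    "2000-01-01,Total phosphorus,0.2\n"
    "1999-04-06,Total suspended solids,100\n"
    "2001-05-01,Total suspended solids,\n"
)
tss = extract_parameter(sample, "total suspended solids")
print(tss)
```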

The third module requests and receives water quality data from the Water Quality Portal (WQP; http://www.waterqualitydata.us/), which provides water quality data from the USGS, the USEPA, and the National Water Quality Monitoring Council. The WQP allows not only downloading water quality data but also requesting and receiving data over the web using location information (i.e., “Water Monitoring Location” and “Organization ID”). A monitoring location database was built from the EPA STORET/WQX locations (http://www.epa.gov/storet/) and stored on a Purdue University server. A module was developed to display the STORET/WQX locations, request water quality data from the WQP, extract the water quality parameters of interest, and convert data formats (Fig. 4.2).
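The request to the WQP can be sketched as building a query URL for one monitoring location and one parameter. The endpoint path and the parameter names (siteid, characteristicName, organization, mimeType) reflect our understanding of the public WQP web services and are assumptions to be checked against the current WQP documentation; the site and organization identifiers in the example are hypothetical.

```python
from urllib.parse import urlencode

# Assumed WQP web-service endpoint for result data; verify against the
# current WQP documentation before use.
WQP_RESULT_URL = "https://www.waterqualitydata.us/data/Result/search"


def wqp_result_url(site_id, characteristic, organization=None):
    """Build a WQP request URL for one monitoring location and parameter."""
    params = {
        "siteid": site_id,  # assumed form "<OrganizationID>-<SiteID>"
        "characteristicName": characteristic,
        "mimeType": "csv",
    }
    if organization:
        params["organization"] = organization
    return WQP_RESULT_URL + "?" + urlencode(params)


# Hypothetical STORET/WQX location taken from the location database.
url = wqp_result_url("21IND-0001", "Total suspended solids", "21IND")
print(url)
```

The resulting URL could then be opened with `urllib.request.urlopen` and the CSV response filtered for the parameter of interest before format conversion; this sketch only builds the request and makes no network call.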

In brief, the web-based tool has been upgraded: 1) to use the USGS water quality data for

the entire U.S. via web access, 2) to upload water quality data downloaded from the EPA

My WATERS Mapper, and 3) to use water quality data from STORET/WQX for the

entire U.S. via web access.


Figure 4.1 Google Maps Interface to Retrieve USGS Water Quality Data


Figure 4.2 Schematic Depicting Web-based Tool Access of Water Quality Data from

EPA STORET/WQX Location Database and Web Access to WQP

4.3.2 Module Development to Suggest BMP Scenarios

Flow data can be analyzed to evaluate hydrologic conditions and to interpret water quality data. The FDC is divided into five regimes (or zones) according to the percentage of time a flow is equaled or exceeded: high flows (0-10%), moist conditions (10-40%), mid-range flows (40-60%), dry conditions (60-90%), and low flows (90-100%). Streamflow patterns are related to watershed characteristics (Chen et al., 2011; Sigua and Tweedale, 2003; May et al., 2001; Hsu et al., 2010; Grizzetti et al., 2005). The five flow regimes in the FDC are helpful in understanding watershed characteristics because the FDC shows both the magnitude and the frequency of flows. The FDC and LDC are fundamental analyses in TMDL development because they identify the specific flow regimes in which water quality standards are violated.
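Assigning an exceedance percentage to one of the five flow regimes is a simple lookup, sketched below in Python. How values falling exactly on a boundary (e.g., 10 %) are assigned is not specified in the text; this sketch assumes they belong to the drier regime, and the names are illustrative.

```python
from bisect import bisect_right

# Upper bounds (percent of time exceeded) of the first four regimes.
REGIME_BOUNDS = [10.0, 40.0, 60.0, 90.0]
REGIME_NAMES = ["High Flows", "Moist Conditions", "Mid-range Flows",
                "Dry Conditions", "Low Flows"]


def flow_regime(exceedance_pct):
    """Map a flow exceedance percentage (0-100) to its FDC regime."""
    if not 0.0 <= exceedance_pct <= 100.0:
        raise ValueError("exceedance must be between 0 and 100 percent")
    # bisect_right sends a value exactly on a bound to the drier regime
    # (an assumption; the text does not define boundary handling).
    return REGIME_NAMES[bisect_right(REGIME_BOUNDS, exceedance_pct)]


print(flow_regime(25.0))
```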


An LDC for a water quality criterion is developed by multiplying streamflow by the water quality target (USEPA, 2007; Cleland, 2003). Pollutant loads exceeding standards in each zone imply potentially different pollutant sources, although the LDC approach does not require any information about watershed pollutant sources, such as landuse types. For instance, if pollutant loads exceed the standard loads in ‘Dry Conditions’ or ‘Low Flows’, the exceedances can be attributed to point sources or, in an agricultural watershed, potentially to livestock. Pollutant delivery by runoff from riparian areas, from impervious areas in urban watersheds during light rain, or from saturated soils may cause pollutant loads to exceed standards in ‘Moist Conditions’ or ‘Mid-range Flows’. Pollutant loads exceeding standards in ‘High Flows’ typically result from streambank erosion, channel processes, and non-point source (NPS) pollutant loads (USEPA, 2007).

Because the sources and causes of pollutant loads exceeding standards differ among flow regimes, a BMP scenario needs to be based on the flow regime in which standards are exceeded (Table 4.1). For instance, if pollutant loads exceed the standards in the ‘High-Flow’ regime, BMPs categorized as ‘Post Development BMPs’, ‘Streambank Stabilization’, or ‘Erosion Control Program’ can be selected (e.g., diversion, streambank stabilization and fencing, porous pavement). If pollutant loads exceed the standards in the ‘Dry Condition’ regime, BMPs categorized as ‘Riparian Buffer Protection’ or ‘Municipal Wastewater Treatment Plant’ can be selected (e.g., filter strip, runoff management system, waste storage facility) (USEPA, 2007).


Table 4.1 BMP Categories for Each Flow Regime (USEPA, 2007)

Flow regimes: High-Flow; Moist Conditions; Mid-range Flow; Dry Conditions; Low Flow
Implementation opportunities: Post Development BMPs; Streambank Stabilization; Erosion Control Program; Riparian Buffer Protection; Municipal Wastewater Treatment Plant

The LDC approach allows a simple analysis to determine whether pollutant loads are exceeded in any of the five flow regimes and to identify potential BMPs for the problems identified, since the sources and causes of pollutant loads differ among flow regimes.

The EPA Spreadsheet Tool for the Estimation of Pollutant Load (EPA STEPL; Tetra Tech, 2011) is a spreadsheet model that computes annual runoff, sediment load, nutrient loads, and 5-day biochemical oxygen demand (BOD5). The spreadsheet tool allows evaluation of various BMP implementations and Low Impact Development (LID) practices so that the pollutant load reduction from BMPs or LIDs can be computed. The spreadsheet tool has a BMP database with 54 BMP efficiencies for nitrogen, phosphorus, BOD, and sediment load reductions. The BMPs were categorized into the five implementation categories in Table 4.1.

The Web-based LDC Tool was enhanced to identify the required pollutant reduction percentage for each flow regime and then to list the BMPs corresponding to the flow regimes responsible for exceeding the standards. The BMP lists are associated with each landuse because the BMPs in the EPA STEPL database are categorized by landuse type (Cropland, Forest, Feedlots, and Urban).

After the BMP lists are established, the web-based tool computes BMP implementation costs ($c_t$) using a cost function (Arabi et al., 2006) that requires the establishment cost ($c_0$), the ratio of annual maintenance cost to establishment cost ($r_m$), the interest rate ($s$), and the BMP design life ($t_d$).

$$c_t = c_0 \left[ 1 + r_m \sum_{i=1}^{t_d} \frac{1}{(1+s)^{i}} \right] \qquad \text{(Eq. 4.1)}$$

BMP establishment costs and ratios of annual maintenance cost to establishment cost were collected from various documents (Table 4.2), but actual costs may differ for a given watershed. Therefore, the module displays the BMP costs from the database as defaults and allows the user to update them before the module suggests BMP scenarios for each landuse.
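The cost function can be sketched in Python under the assumption that Eq. 4.1 takes the form c_t = c_0 [1 + r_m Σ_{i=1}^{t_d} (1 + s)^(-i)], i.e., the establishment cost plus the present value of annual maintenance over the design life. How the tool converts c_t into the annual costs ($/ha/year) reported in Section 4.4 is not described here, so the sketch stops at c_t; the 10-year design life in the example is hypothetical.

```python
def bmp_total_cost(c0, rm, s, td):
    """Cost function in the assumed form
    c_t = c_0 * [1 + r_m * sum_{i=1..t_d} 1 / (1 + s)**i].

    c0: establishment cost ($/ha); rm: annual maintenance cost as a
    fraction of c0; s: interest rate; td: design life (years).
    """
    # Present value of one unit of annual maintenance over the design life.
    pv_factor = sum(1.0 / (1.0 + s) ** i for i in range(1, td + 1))
    return c0 * (1.0 + rm * pv_factor)


# Reduced tillage defaults from Table 4.2 (7 $/ha, 1 %) with the 4.5 %
# interest rate of Section 4.4; the 10-year design life is hypothetical.
print(round(bmp_total_cost(7.0, 0.01, 0.045, 10), 2))
```

Discounting makes maintenance in later years count for less, so a higher interest rate lowers c_t for the same maintenance ratio.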

After BMP implementation costs are estimated by the cost function (Eq. 4.1), the web-based tool establishes BMP scenarios for each landuse based on the least BMP implementation cost per unit of BMP efficiency. In other words, the web-based tool computes the BMP implementation cost for a pollutant reduction of 1 percent and ranks the BMPs by that cost.
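Ranking BMPs by cost per 1 percent of pollutant reduction can be sketched as follows. The example uses the annual costs ($7/ha/year) and sediment reduction efficiencies (75 % and 65 %) that Section 4.4 reports for ‘reduced tillage systems’ and ‘filter strip’; the function name and tuple layout are assumptions.

```python
def rank_bmps(bmps):
    """Order BMPs by implementation cost per 1 % of pollutant reduction.

    bmps: list of (name, annual_cost_per_ha, reduction_pct) tuples.
    BMPs with no reduction are skipped to avoid division by zero.
    """
    scored = [(cost / pct, name) for name, cost, pct in bmps if pct > 0]
    return [name for _, name in sorted(scored)]


cropland = [("filter strip", 7.0, 65.0), ("reduced tillage systems", 7.0, 75.0)]
print(rank_bmps(cropland))  # ['reduced tillage systems', 'filter strip']
```

With equal costs, the BMP with the higher efficiency has the lower cost per percent of reduction, which is why ‘reduced tillage systems’ ranks first.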


Table 4.2 Default BMP Costs for Landuses

Landuse    BMP                                            Establishment   Maintenance Cost        Reference
                                                          Cost ($/ha)     (% of Establishment
                                                                          Cost)
Cropland   Contour Farming                                15              1                       Pertsova, 2007
           Filter strip¹                                  21              10                      Buckner, 2001
           Reduced tillage systems                        7               1                       Kieser & Associates, 2008
Forest     Site preparation/hydro mulch/seed/fertilizer   3,707           1                       USEPA, 2005
           Site preparation/straw/crimp/net               35,481          1                       GLEC, 2008
Feedlots   Filter strip¹                                  21              10                      Buckner, 2001
Urban      Alum treatment                                 1,112           0                       Wisconsin DNR, 2003
           Grass swales                                   1,730           5                       USEPA, 1999
           Infiltration Basin                             7,413           3                       USEPA, 1999
           Infiltration Trench                            22,239          5                       USEPA, 1999
           Porous Pavement                                592,015         1                       King and Hagan, 2011
           Sand Filter                                    25,946          12                      USEPA, 1999
           Vegetated Filter Strips                        2,224           4                       USEPA, 1999
           Weekly Street Sweeping                         14,947          7                       King and Hagan, 2011
           Wetland Detention                              6,178           2                       USEPA, 1999

¹Filter strip: the ratio of contributing drainage area to filter strip area is assumed to be 40:1.


4.4 Application of the Web Tool

A watershed was selected to demonstrate use of the web-based tool to develop an LDC, compute the required pollutant reduction percentage, and recommend a BMP scenario to reduce pollutant loads to TMDL levels. A 254 km² watershed in northeast Indiana was selected. The landuses in the watershed are 33.7 % cropland (90 km²), 33.9 % pastureland (90.5 km²), 6.3 % urban (17.0 km²), and 7.7 % forest (20.6 km²). Flow data were collected from USGS station number 04177870 (Fish Creek near Artic, Indiana; Fig. 4.3) and ranged from 0.1 m³/s to 38.5 m³/s. Total suspended solids data were collected by the module from EPA STORET (Fig. 4.4), and the data ranged from 4.0 mg/l to 100.0 mg/l. The flow data period was from 1998-04-08 to 2007-12-05, and the water quality data period was from 1999-04-06 to 2007-11-28. Therefore, the data from 1999-04-06 to 2007-11-28, the period covered by both datasets, were selected to develop the LDC, and the water quality target for total suspended solids was set to 46.0 mg/l (IDEM, 2013). An interest rate of 4.5 % was used in the cost function to estimate BMP implementation costs.

The LDC was developed using the web-based tool (Fig. 4.5). Sediment loads in the watershed exceeded standards in the High-Flow and Moist-Condition regimes, and the required pollutant reductions were 48.1 % in the High-Flow and 31.9 % in the Moist-Condition flow regimes (Table 4.3). The web-based tool established BMP scenarios to reduce sediment in both the High-Flow and Moist-Condition flow regimes. In the BMP scenario for cropland, the most cost-effective BMP was ‘reduced tillage systems’ with an estimated annual cost of $7/ha/year, followed by ‘filter strip’ (estimated annual cost of $7/ha/year) and ‘contour farming’ (estimated annual cost of $16/ha/year) as the second and third most cost-effective BMPs for cropland, respectively. Although ‘reduced tillage systems’ and ‘filter strip’ had the same estimated annual cost ($7/ha/year), ‘reduced tillage systems’ was more cost-effective because it reduces sediment losses by 75 %, while ‘filter strip’ reduces them by 65 %, according to the EPA STEPL BMP database. In the BMP scenario for urban land, the most cost-effective BMP was ‘vegetated filter strips’ ($5/ha/year), and ‘grass swales’ ($397/ha/year) was the second most cost-effective BMP.

The web-based tool identifies BMP scenarios for each landuse; however, the area to which a BMP is applied (BMPapplied) also needs to be optimized to identify the most cost-effective BMP implementation plan. The most cost-effective of the identified BMPs is applied first. If applying the first BMP to all available area of the associated landuse (i.e., the maximum level of BMP application) does not meet the required pollutant load reduction, the second most cost-effective BMP needs to be applied. Therefore, iterative simulations with another model are required to evaluate the impacts of the BMPs and BMPapplied.
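The iterative procedure can be sketched as a greedy loop in Python. The `evaluate` argument stands in for a model run such as EPA STEPL, which is not reproduced here; the toy linear model, the BMP names "A" and "B", and the 1 km² step size are hypothetical and only illustrate the control flow: enlarge the applied area of the most cost-effective BMP up to its maximum, then move to the next BMP, and stop once the required reduction is met.

```python
def apply_bmps_iteratively(ranked_bmps, max_area_km2, required_pct,
                           evaluate, step_km2=1.0):
    """Greedy BMP application (illustrative sketch).

    ranked_bmps: BMP names, most cost-effective first.
    max_area_km2: dict of BMP name -> maximum area that may receive it.
    evaluate: stand-in for a model run; takes a dict of BMP name ->
        applied area (km2) and returns the percent load reduction.
    Returns (applied_areas, target_met).
    """
    applied = {}
    if evaluate(applied) >= required_pct:
        return applied, True
    for name in ranked_bmps:
        area = 0.0
        while area < max_area_km2[name]:
            # Enlarge the applied area by one step, capped at the maximum.
            area = min(area + step_km2, max_area_km2[name])
            applied[name] = area
            if evaluate(applied) >= required_pct:
                return applied, True
        # This BMP is at its maximum area; move on to the next BMP.
    return applied, False


# Hypothetical toy model: each km2 of "A" removes 1 % and of "B" 0.5 %.
def toy_model(areas):
    rates = {"A": 1.0, "B": 0.5}
    return sum(rates[name] * km2 for name, km2 in areas.items())


areas, met = apply_bmps_iteratively(["A", "B"], {"A": 10.0, "B": 10.0},
                                    12.0, toy_model)
print(areas, met)
```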

To demonstrate these iterative simulations, the EPA STEPL model was applied with the BMP scenario for cropland, since cropland is one of the dominant landuses in the watershed. The BMPapplied for the first BMP (‘reduced tillage systems’) was limited to 54 km² (60 % of cropland); in other words, it was assumed that 40 % of cropland already had this BMP or would not adopt it.

During modeling, the BMPapplied for ‘reduced tillage systems’ was increased iteratively until the estimated sediment reduction exceeded the required reduction (48.1 %) or until the BMPapplied reached the specified maximum (54 km²). The estimated sediment reduction was 42.3 % when the BMPapplied of ‘reduced tillage systems’ was 54 km². Therefore, the second most cost-effective BMP, ‘filter strip’, was applied in combination with ‘reduced tillage systems’. The estimated sediment reduction was 48.3 % when ‘reduced tillage systems’ on 54 km² and ‘filter strip’ on 36 km² were applied to cropland. No further simulation (e.g., increasing BMPapplied or applying more BMPs) was required, since the estimated sediment reduction met the required sediment reduction. The estimated annual cost was $62,968: $37,781 for ‘reduced tillage systems’ applied to 54 km² and $25,187 for ‘filter strip’ applied to 36 km².

Figure 4.3 Flow Data Collection by USGS Flow Station Location Tool

Figure 4.4 Water Quality Data Collection by WQP Location Tool

Figure 4.5 Load Duration Curve for the Study Watershed

Table 4.3 Target Load, 90th Percentile Load, and Required Reduction Percentage

Flow Regime        Target Load    90th Percentile Load    Required Reduction
                   (tons/day)     (tons/day)              (%)
High-Flow          42.1           81.1                    48.1
Moist-Condition    11.6           17.0                    31.9
Mid-range Flow     4.6            3.5                     0.0
Dry-Condition      1.8            1.2                     0.0
Low-Flow           0.8            0.2                     0.0
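The required reduction percentages in Table 4.3 appear to follow the usual load duration curve calculation. The reduction is the excess of the 90th percentile load over the target load, expressed as a percentage of the 90th percentile load. It is zero where the target is not exceeded. The Python sketch below assumes that formula; the function name `required_reduction` is illustrative. It reproduces the high-flow value of 48.1%. For the moist-condition regime, the tabulated loads give 31.8% rather than 31.9%, presumably because the tool works with unrounded loads.

```python
def required_reduction(target_load, percentile_load):
    """Percent reduction needed to bring a 90th percentile load down to the target load.

    Returns 0.0 when the observed load is already at or below the target.
    """
    if percentile_load <= target_load:
        return 0.0
    return (percentile_load - target_load) / percentile_load * 100.0


# Loads (tons/day) from Table 4.3: (target load, 90th percentile load)
regimes = {
    "High-Flow": (42.1, 81.1),
    "Moist-Condition": (11.6, 17.0),
    "Mid-range Flow": (4.6, 3.5),
    "Dry-Condition": (1.8, 1.2),
    "Low-Flow": (0.8, 0.2),
}
for name, (target, p90) in regimes.items():
    print(f"{name}: {required_reduction(target, p90):.1f}%")
```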

4.5 Conclusions

Section 303(d) of the Clean Water Act requires states and other authorities to develop lists of impaired waters, establish priority rankings, and develop TMDL plans. Flow and load duration curves are currently used in a wide range of approaches to TMDL implementation. The curves can be used to identify the sources of pollutant loads, compute the required pollutant reduction percentage relative to standard pollutant loads, and establish BMP scenarios. Although the LDC approach is simple, LDC development can be time-consuming and prone to human error.

A web-based tool was developed previously to simplify the LDC development process. However, that tool automated only flow data retrieval (from USGS), although both flow and water quality data are essential for LDC development. The web-based tool was therefore upgraded to use water quality data from USGS and from WQP, which provides STORET/WQX water quality data for any location in the United States. The tool now also provides Google Maps interfaces to display and select water quality locations of interest. In TMDL implementation, required pollutant reductions must be computed and then used to establish BMP scenarios that meet the standard loads. Pollutant loads that exceed standards in different flow regimes imply different pollutant sources, so different BMPs need to be applied to address each source. BMP implementation costs are needed to identify cost-effective BMP implementation plans. Therefore, the web-based tool does three things. It identifies the required pollutant reduction for each flow regime. It lists the BMPs able to reduce pollutant loads in the flow regimes where standards are exceeded. It identifies the least-cost BMP for each landuse.

With these upgrades, the web-based tool is expected to be useful for collecting water quality data, identifying pollutant sources, and computing required pollutant reductions. It can also be used to identify BMP scenarios for model simulation when developing watershed management plans.

Currently, the web-based tool suggests BMP scenarios to meet the standard load for a single water quality parameter, such as nitrogen, phosphorus, BOD, or sediment. The tool will therefore be upgraded to suggest BMP scenarios for cases in which two or more water quality parameters must be considered. The tool identifies the least-cost BMPs for each landuse; however, the choice of BMPs and the areas to which they are applied need to be optimized for cost-effective BMP implementation. Therefore, the Web-based LDC Tool will be integrated with a hydrologic/water quality model and optimization code to identify cost-effective BMP implementations.

4.6 References

Alaska Department of Environmental Conservation, February 2008. Total maximum

daily load (TMDL) for fecal coliform in the waters of Pederson Hill Creek in Juneau,

Alaska.

Arabi, M., Govindaraju, R. S., Hantush, M. M., 2006. Cost-effective allocation of

watershed management practices using a genetic algorithm. Water Resources

Research 42, W10429, DOI: 10.1029/2006WR004931.



118-123.

Buckner, E.R., 2001. An Evaluation of the Use of Vegetative Filter Strips on Agricultural

Lands in the Upper Wabash River Basin. Dissertation, Purdue University, West

Lafayette, Indiana, UMI Microform 3037543.

Chen, D., Lu, J., Wang, H., Shen, Y., Gong, D., 2011. Combined inverse modeling

approach and load duration curve method for variable nitrogen total maximum daily

load development in an agricultural watershed. Environmental Science and Pollution

Research 18, 1405-1413.

Cleland, B., 2003. TMDL development from the “Bottom Up” - Part III: duration curves

and wet-weather assessments. National TMDL Science and Policy 2003 - WEF

Specialty Conference. Chicago, IL.

Great Lakes Environmental Center (GLEC), December 2008. National level assessment

of water quality impairments related to forest roads and their prevention by best

management practices.

Grizzetti, B., Bouraoui, F., de Marsily, G., Bidoglio, G., 2005. A statistical method for

source apportionment of riverine nitrogen loads. Journal of Hydrology 304, 302-315.

Hsu, T., Kin, J., Lee, T., Zhang, H. X., Lu, S. L., 2010. A storm event-based approach to

TMDL development. Environmental Monitoring and Assessment 163, 81-94.

Indiana Department of Environmental Management (IDEM), July 2007. Total maximum

daily load for Escherichia coli (E. coli) for the East Fork Whitewater River

Watershed, Wayne, Union, Fayette, and Franklin Counties.

Indiana Department of Environmental Management (IDEM), 2013. Water quality targets.

Available at <http://www.in.gov/idem/nps/3484.htm>. Accessed in October 2013.

Johnson, S. L., Whiteaker, T., Maidment, D. R., 2009. A tool for automated load duration

curve creation. Journal of the American Water Resources Association 45(3): 654-663.

Kieser & Associates, February 2007. Modeling of agricultural BMP scenarios in the Paw

Paw River Watershed using the Soil and Water Assessment Tool (SWAT).

Kim, G., Yoon, J., 2011. Development and application of total coliform load duration

curve for the Geum River, Korea. Korean Society of Civil Engineers, Journal of

Civil Engineering 15(2), 239-244.

Kim, J., Engel, B. A., Park, Y. S., Theller, L., Chaubey, I., Kong, D. S., Lim, K. J., 2012.

Development of Web-based Load Duration Curve System for analysis of total

maximum daily load and water quality characteristics in a waterbody. Journal of

Environmental Management 97, 46-55.

King, D., Hagan, P., October 2011. Costs of stormwater management practices in

Maryland Counties. University of Maryland Center for Environmental Science.

May, L., House, W. A., Bowes, M., McEvoy, J., 2001. Seasonal export of phosphorus

from a lowland catchment: upper River Cherwell in Oxfordshire, England. The

Science of the Total Environment 269, 117-130.

Minnesota Pollution Control Agency, June 2010. Rabbit River turbidity total maximum

daily load report.

New Hampshire Department of Environmental Services, April 2008. Total maximum

daily load (TMDL) study for waterbodies in the Vicinity of the I-93 Corridor from

Massachusetts to Manchester, NH: North Tributary to Canobie Lake in Windham,

NH.

Pertsova, C. C., 2007. Ecological Economics Research Trends. Chapter 3. A recent trend

in ecological economic research: Quantifying the benefits and costs of improving

ecosystem services, 57.

Runkel, R. L., Crawford, C. G., Cohn, T. A., 2004. Load estimator (LOADEST): A



Sigua, G. C., Tweedale, W. A., 2003. Watershed scale assessment of nitrogen and

phosphorus loadings in the Indian River Lagoon basin, Florida. Journal of

Environmental Management 67, 363-372.

Shen, J., Zhao, Y., 2010. Combined Bayesian statistics and load duration curve method

for bacteria nonpoint source loading estimation. Water Research 44, 77-84.

Tennessee Department of Environment and Conservation, July 2005. Total maximum

daily load (TMDL) for low dissolved oxygen & nutrients in the Upper Duck River

Watershed (HUC 06040002) Bedford, Coffee, Marshall, & Maury Counties,

Tennessee.

Tetra Tech Inc., 2011. User’s guide spreadsheet tool for the estimation of pollutant load

(STEPL) version 4.1. Tetra Tech, Inc. 10306 Eaton Place, Suite 340 Fairfax, VA

22003.

Texas Institute for Applied Environmental Research, March 2010. Technical support

document for bacteria TMDLs Segment 0822A-Cottonwood Branch & Segment

0822B-Grapevine Creek.

USEPA, 1999. Preliminary data summary of urban storm water best management

practices, August 1999. United States Environmental Protection Agency, Office of

Water (4303) Washington.

USEPA, April 2005. National management measures to control nonpoint source pollution

from forestry. U. S. Environmental Protection Agency. Washington, DC 20460.

USEPA, 2007. An approach for using load duration curves in the development of

TMDLs. U. S. Environmental Protection Agency. Washington, DC 20460.

USEPA, 2008. Handbook for developing watershed TMDLs. U. S. Environmental

Protection Agency. Washington, DC 20460.

Wisconsin Department of Natural Resources (Wisconsin DNR), March 2003. Alum

treatments to control phosphorus in lakes.

CHAPTER 5. A WEB MODEL TO ESTIMATE THE IMPACT OF BEST

MANAGEMENT PRACTICES

5.1 Abstract

The Spreadsheet Tool for the Estimation of Pollutant Load (STEPL) can be used in Total Maximum Daily Load (TMDL) processes because the model can simulate the impacts of various best management practices (BMPs) and low impact development (LID) practices. The model computes annual direct runoff using the Soil Conservation Service Curve Number (SCS-CN) method with average rainfall per event, which is not a typical use of the SCS-CN method. Five SCS-CN based approaches to computing annual direct runoff were investigated to explore differences in estimated annual direct runoff. The approaches used daily precipitation data for twelve stations in Indiana, both collected from the National Climatic Data Center and generated by the CLIGEN model. The approach used within EPA STEPL produced annual direct runoff that differed greatly from that computed with the conventional use of the SCS-CN method. A web-based model (STEPL WEB) was developed with an updated approach to estimate annual direct runoff. The model was also integrated with the Web-based Load Duration Curve Tool, which identifies least-cost BMPs for each landuse. The integrated model then optimizes BMP selection to identify the most cost-effective BMP implementations. The integrated tools provide an easy-to-use approach to performing TMDL analysis and identifying cost-effective approaches to controlling nonpoint source pollution.

5.2 Introduction

Section 303(d) of the Clean Water Act requires states and other defined authorities to develop lists of impaired rivers and streams, to establish priority rankings for the listed waters, and to develop Total Maximum Daily Loads (TMDLs). Various models have been used not only to develop TMDLs but also to identify strategies to attain pollutant load limits, with plans typically identifying Best Management Practices (BMPs) to reduce loads (Kang et al., 2006; Patil and Deng, 2011; Pease et al., 2010; Richards et al., 2008).

One such model is the Spreadsheet Tool for the Estimation of Pollutant Load (STEPL), which computes annual direct runoff, sediment load, nutrient loads, and 5-day biological oxygen demand (BOD5) (Tetra Tech, 2011). The model estimates annual non-point source (NPS) pollutant loads. It also estimates the impacts of various BMP implementations and Low Impact Development (LID) practices, so that pollutant load reductions for BMPs or LIDs can be computed (Commonwealth Biomonitoring, 2009; FDEP, 2009; Keegstra et al., 2012; Tetra Tech, 2011). EPA STEPL requires landuse and Hydrologic Soil Group (HSG) to define the Curve Number (CN), because it computes direct runoff in a watershed with the Soil Conservation Service Curve Number (SCS-CN) method (USDA, 1985). Landuse categories in the model are urban, cropland, pastureland, forest, and user defined.

However, the model calculates annual direct runoff with an approach that differs from the conventional one, and annual direct runoff is a key parameter in estimating annual NPS pollutant loads. Thus, there is a need to explore the reliability and consistency of the runoff, pollutant loads, and BMP impacts predicted by EPA STEPL.

The first unconventional process in EPA STEPL is that it computes direct runoff from an 'average rainfall per event'. This value is calculated from 'Rainfall', 'Rain Days', 'Precipitation Correction Factor', and 'Number of Rain Days Correction Factor' for each county (Tetra Tech, 2011). In hydrologic models, however, the SCS-CN method is typically used to simulate event-based or daily direct runoff from specific daily rainfall (Arnold et al., 1998; Knisel, 1980; Lim et al., 2006; Williams and LaSeur, 1976; Williams et al., 2000). EPA STEPL calculates direct runoff for a single rainfall value (i.e., 'average rainfall per event') with the SCS-CN method. It then multiplies that runoff by 'Rain Days' and 'Number of Rain Days Correction Factor' to estimate annual direct runoff. This approach may not accurately reproduce long-term annual direct runoff, because the relationships among CN, rainfall, and runoff are not linear (Tedela et al., 2012; USDA, 1986). Runoff computed from an average rainfall value therefore differs from the sum of runoff computed for individual rainfall events.
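The Python sketch below shows why the nonlinearity matters. It applies the standard SCS-CN relationship, Q = (P - Ia)² / (P - Ia + S) with S = 25400/CN - 254 (mm) and Ia = 0.2S, in two ways. First, it computes runoff for each event separately and sums the results. Second, it computes runoff once for the mean event depth and multiplies by the number of events. The rainfall depths are hypothetical values chosen for illustration, not station data.

```python
def scs_runoff(p_mm, cn, lam=0.2):
    """SCS-CN direct runoff (mm) for one rainfall depth, with Ia = lam * S."""
    s = 25400.0 / cn - 254.0   # potential maximum retention (mm)
    ia = lam * s               # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Hypothetical rainfall depths (mm) for the rain days of one year
rain_events = [2.0, 5.0, 8.0, 12.0, 20.0, 35.0, 60.0]
cn = 85

# Conventional use: runoff for each event, then summed
event_sum = sum(scs_runoff(p, cn) for p in rain_events)

# Average-event shortcut: runoff for the mean event times the number of events
mean_p = sum(rain_events) / len(rain_events)
average_based = scs_runoff(mean_p, cn) * len(rain_events)

print(f"event-by-event sum: {event_sum:.1f} mm")
print(f"average-event shortcut: {average_based:.1f} mm")
```

For these hypothetical events, the average-event shortcut returns much less runoff than the event-by-event sum. Small events below the initial abstraction contribute nothing, while large events contribute disproportionately, and averaging hides both effects.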

The second unconventional process employed in STEPL is the selection of the default

initial abstraction coefficient (λ) for the SCS-CN method. The SCS-CN method equations

are as follows (USDA, 1986).

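$$Q = \frac{(P - I_a)^2}{P - I_a + S}, \qquad P > I_a \tag{5.1}$$

$$Q = 0, \qquad P \le I_a \tag{5.2}$$

$$S = \frac{25400}{CN} - 254 \tag{5.3}$$

$$I_a = \lambda S \tag{5.4}$$

Where, Q is direct runoff (mm), P is precipitation (mm), Ia is initial abstraction (mm), S is potential maximum retention (mm), CN is the curve number, and λ is the initial abstraction coefficient.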

The initial abstraction coefficient typically varies from 0.0 to 0.2 (Baltas et al., 2007; Shi et al., 2009). Published and currently used CN tables typically assume an initial abstraction coefficient of 0.2. If a different coefficient is used, the CN values need to be adjusted (Lim et al., 2006; Woodward et al., 2003). The EPA STEPL default for the initial abstraction coefficient is 0.0. However, the denominator of equation 5.1 in EPA STEPL is fixed as "P + 0.8S", which is inconsistent with a coefficient of 0.0. Further, STEPL does not adjust CNs when users change the initial abstraction. EPA STEPL does allow the initial abstraction coefficient to be changed, but leaving it at the default value will likely overestimate runoff and therefore annual pollutant loads. With the denominator fixed, direct runoff calculated with an initial abstraction of 0.0S is greater than direct runoff calculated with 0.2S. Most significantly, it is incorrect to use a modified initial abstraction with the CNs provided in standard references, because those CN values are for an initial abstraction of 0.2S (USDA, 1986). Nevertheless, EPA STEPL estimates annual direct runoff using an initial abstraction of 0.0S together with the CNs provided in standard references. If initial abstraction is adjusted, both CN and S need to be adjusted accordingly. The initial abstraction coefficient of 0.2 was determined empirically (USDA, 1986), and equations 5.1-5.4 need to be modified when other coefficient values are used (Lim et al., 2006; Woodward et al., 2003).
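The effect of the EPA STEPL default can be checked numerically. The sketch assumes, as described above, that STEPL keeps the denominator at P + 0.8S while setting the initial abstraction to 0.0S. Under that assumption, the numerator grows from (P - 0.2S)² to P², so runoff always increases. The Python functions below are illustrative (their names are hypothetical) and use CN = 85 with hypothetical event depths.

```python
def stepl_default_runoff(p_mm, cn):
    """Runoff (mm) with Ia = 0.0*S but the denominator kept at P + 0.8S."""
    s = 25400.0 / cn - 254.0
    return p_mm ** 2 / (p_mm + 0.8 * s)


def standard_runoff(p_mm, cn):
    """Standard SCS-CN runoff (mm) with Ia = 0.2*S, the basis of tabulated CNs."""
    s = 25400.0 / cn - 254.0
    ia = 0.2 * s
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm + 0.8 * s)  # P - Ia + S equals P + 0.8S here


for p in (10.0, 25.0, 50.0):  # hypothetical event depths (mm)
    print(f"P = {p:4.1f} mm: Ia = 0.0S -> {stepl_default_runoff(p, 85):5.2f} mm, "
          f"Ia = 0.2S -> {standard_runoff(p, 85):5.2f} mm")
```

The gap is largest for small events. Under the standard method these events produce little or no runoff, while the default setting still produces runoff from every rainfall event.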

EPA STEPL is a spreadsheet model that can be used for annual pollutant load estimation

and simulation of BMP impacts. Our interest in EPA STEPL was driven by our search for

a model capable of simulating BMPs for use with the Web-based Load Duration Curve

Tool (Web-based LDC Tool; https://engineering.purdue.edu/~ldc/). The Web-based LDC

Tool identifies pollutant loads exceeding standards and computes the required pollutant

reduction to meet the standard loads. Moreover, the Web-based LDC Tool identifies

BMPs with the least cost for each landuse, but the BMPs and area to which these

practices should be applied need to be optimized for cost-effective BMP implementation.

Therefore, the objectives of the study were: 1) to examine and correct the annual direct

runoff approach in EPA STEPL, and 2) to develop a web-based model capable of

simulating pollutant load reductions for BMPs.

5.3 Methodology

The study had two purposes. The first was to examine the annual direct runoff approach in EPA STEPL and identify the impacts of its assumptions on runoff computations and corresponding pollutant loads. The second was to develop a web-based model that uses a load duration curve tool and the corrected STEPL to identify appropriate BMPs and simulate their effects on pollutant load reduction. Five approaches to estimating annual direct runoff with the SCS-CN method were explored, using daily precipitation data collected from twelve National Climatic Data Center (NCDC, www.ncdc.noaa.gov) stations. The first approach obtained annual runoff by aggregating daily direct runoff computed with the SCS-CN method from NCDC daily precipitation data. The second and third approaches represented the EPA STEPL approach with initial abstraction coefficients of 0.0 and 0.2, respectively. In the fourth approach, daily precipitation data were generated with the CLIGEN model (Nicks and Lane, 1989). Annual direct runoff was then computed by aggregating the daily direct runoff obtained from these data. The fifth approach used the EPA STEPL model itself.

For the second objective of the study, a web-based model was developed based on a corrected EPA STEPL model. The web-based model provides web interfaces and employs the CLIGEN model to generate daily precipitation data. Modules to calibrate model parameters and to optimize BMPs were developed and integrated into the web-based model.

5.3.1 Annual Direct Runoff Computations

Daily precipitation data were collected from the NCDC to explore approaches to computing annual direct runoff from measured daily precipitation. Daily precipitation data were also generated with the CLIGEN model using inputs collected from the United States Department of Agriculture (www.ars.usda.gov/Research/docs.htm?docid=18094) (Table 5.1, Figure 5.1). Twelve NCDC stations providing long-term daily data were selected within Indiana, together with twelve CLIGEN stations at the same locations as the NCDC stations.

Five approaches to computing annual direct runoff were established to investigate differences in annual direct runoff among methods:

1) using daily precipitation data with equations 5.1-5.4;
2) using average rainfall per event with equations 5.5-5.7 (described below) and an initial abstraction coefficient of 0.0 (i.e., the original EPA STEPL method);
3) using average rainfall per event with equations 5.5-5.7 and an initial abstraction coefficient of 0.2 (i.e., the corrected EPA STEPL method);
4) using daily precipitation data generated by the CLIGEN model with equations 5.1-5.4; and
5) using the EPA STEPL model.

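$$ARE = \frac{R \times CF_R}{RD \times CF_{RD}} \tag{5.5}$$

$$Q_{ARE} = \frac{(ARE - \lambda S)^2}{ARE + 0.8S} \tag{5.6}$$

$$Q_{annual} = Q_{ARE} \times RD \times CF_{RD} \tag{5.7}$$

Where, ARE is average rainfall per event (mm), R is annual rainfall (mm), RD is the number of rain days per year, CF_R is the rainfall correction factor, CF_RD is the rain day correction factor, Q_ARE is direct runoff for the average rainfall event (mm), and Q_annual is annual direct runoff (mm).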

The first approach used equations 5.1-5.4 to compute daily direct runoff depth from NCDC daily precipitation data, with an initial abstraction coefficient of 0.2. In other words, this approach represents the general use of the SCS-CN method to compute annual direct runoff depth.

The second and third approaches represent the annual direct runoff depth computations in EPA STEPL. The EPA STEPL precipitation database was not used for these approaches, because the purpose of this step was to compare approaches. Had that database been used, differences in precipitation data would have led to different annual direct runoff estimates, and the approaches would not have been comparable. Therefore, the required inputs for the second and third approaches (i.e., annual rainfall, rain days, and the correction factors for equation 5.5) were computed from the NCDC daily precipitation data for each location. The rainfall correction factor is the proportion of annual precipitation that falls on days with more than 5 millimeters of precipitation. It was computed by dividing the sum of daily precipitation values greater than 5 millimeters by the sum of all daily precipitation values. The rain day correction factor is the proportion of rain days with precipitation greater than 5 millimeters (Tetra Tech, 2011). Average rainfall per event (ARE) was computed using equation 5.5. An initial abstraction coefficient of 0.0 was used in equation 5.6 for the second approach, and a coefficient of 0.2 for the third approach.
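The Python sketch below derives the STEPL-style inputs described above from a daily record. It rests on several assumptions made for illustration. ARE equals annual rainfall times the rainfall correction factor, divided by rain days times the rain day correction factor. Annual runoff equals the runoff of ARE multiplied by rain days and the rain day correction factor. The denominator stays at ARE + 0.8S, and rain days are taken to be days with nonzero precipitation. The function name is hypothetical, and the daily record is made up.

```python
def stepl_style_annual_runoff(daily_p_mm, cn, lam, threshold_mm=5.0):
    """Annual direct runoff (mm) from one year of daily precipitation, STEPL style.

    Illustrative assumptions: rain days are days with P > 0, the correction
    factors use the 5 mm threshold, and the denominator is fixed at ARE + 0.8S.
    """
    rain = [p for p in daily_p_mm if p > 0.0]
    big = [p for p in rain if p > threshold_mm]
    if not big:
        return 0.0                                # no events above the threshold
    annual_rain = sum(rain)
    rain_days = len(rain)
    cf_rain = sum(big) / annual_rain              # rainfall correction factor
    cf_days = len(big) / rain_days                # rain day correction factor
    are = annual_rain * cf_rain / (rain_days * cf_days)  # average rainfall per event
    s = 25400.0 / cn - 254.0                      # potential maximum retention (mm)
    ia = lam * s                                  # initial abstraction (mm)
    q_event = 0.0 if are <= ia else (are - ia) ** 2 / (are + 0.8 * s)
    return q_event * rain_days * cf_days          # scale by effective rain days


# Hypothetical single-year daily record (mm)
daily = [0.0, 2.0, 0.0, 8.0, 12.0, 0.0, 30.0, 1.0, 45.0]
for lam in (0.0, 0.2):
    print(lam, round(stepl_style_annual_runoff(daily, 85, lam), 2))
```

Under these definitions, ARE reduces algebraically to the mean depth of the days above the 5 mm threshold. This makes clear that the approach replaces the full distribution of daily rainfall with a single representative event.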

The fourth approach computed annual direct runoff with equations 5.1-5.4 using daily precipitation data generated by CLIGEN. It was identical to the first approach except for the source of the daily precipitation data. The fifth approach used the EPA STEPL model to compute annual direct runoff. The model database provided annual rainfall, rainfall correction factor, rain days, and rain day correction factor for the counties in which the NCDC stations are located.

Table 5.1 Daily Precipitation Collection from NCDC

Station Number   Station Name                           Period
USC00120676      Berne WWTP                             1949-1999
USC00121747      Columbus                               1921-2010
USC00121869      Crane NSA                              1943-1959
USW00014848      South Bend Michiana Regional Airport   1948-2012
USC00128999      Valparaiso Waterworks                  1985-2000
USC00129138      Wabash                                 1989-2004
USC00129430      West Lafayette 6 NW                    1989-2012
USC00129678      Winchester AAP 3                       1989-2012
USC00121229      Cambridge City 3 N                     1975-1992
USC00123547      Greensburg                             1933-1941
USC00127125      Princeton 1 W                          1898-1952
USC00127875      Scottsburg                             1897-2000

Figure 5.1 Location Map of NCDC and CLIGEN Stations within Indiana

5.3.2 Web Interfaces and CLIGEN Use

The EPA STEPL model includes a database that provides input parameters describing precipitation and soil erosion estimation, and the model has a user-friendly interface within Microsoft Excel. A web-based model (STEPL WEB; https://engineering.purdue.edu/~ldc/STEPL/) was developed following a program flow similar to that of EPA STEPL. Subwatershed characteristics are defined first, and then watershed inputs and BMPs are identified in turn. However, HTML interfaces were created instead of Visual Basic and MS Excel interfaces. The core engine was programmed in FORTRAN to perform the calculations that EPA STEPL performs in MS Excel. A number of Python CGI scripts and JavaScript-based HTML pages were programmed to handle inputs from databases.

EPA STEPL computes annual direct runoff using equations 5.5-5.7; in the web-based tool, this was replaced with two approaches. One uses 0.2S for the initial abstraction, because commonly available CNs are based on an initial abstraction of 0.2S. The other employs CLIGEN to generate long-term daily precipitation data. Zhang and Garbrecht (2003) reported that CLIGEN reproduced means and standard deviations of daily precipitation amounts reasonably well when 100 years of precipitation data were generated. The model also performed well for annual amounts, monthly amounts, and number of events when 20 years of precipitation data were generated (Elliot and Arnold, 2001). Lim and Engel (2003) employed the model to generate climate data for a web-based model. Therefore, the web-based model employs CLIGEN for daily precipitation data generation. It provides 2,368 locations with CLIGEN inputs collected from the United States Department of Agriculture (http://www.ars.usda.gov/Research/docs.htm?docid=18094). The daily precipitation data generated by CLIGEN are used to estimate annual direct runoff in STEPL WEB.

5.3.3 Auto-Calibration Modules

STEPL WEB computes annual direct runoff using the SCS-CN method, and the annual contribution to shallow groundwater using soil infiltration fractions for precipitation. It computes annual pollutant loads by multiplying pollutant coefficients by annual direct runoff and groundwater (Figure 5.2). Sediment load is computed based on the Universal Soil Loss Equation (USLE) and the sediment delivery ratio (SDR; Equations 5.10 and 5.11; USDA-NRCS, 1983). STEPL WEB has two sources of nutrient loads (N, P, and BOD). The first source is loads from landuses, computed from pollutant coefficients, annual direct runoff, and the shallow groundwater contribution. The second source is nutrient loads in sediment, computed from soil nutrient concentrations and sediment load (Tetra Tech, 2011). Therefore, CNs and soil infiltration fractions should be calibrated against annual direct runoff and annual shallow groundwater so that nutrient loads are computed correctly. Since sediment load is computed from the USLE and SDR, the SDR (Equations 5.10 and 5.11) can be calibrated against sediment load. Pollutant coefficients also need to be calibrated for nutrient loads.

Since CNs are defined by landuse and HSG, the relationships among CNs need to be maintained. For instance, CNs for urban areas are typically greater than those for forest, and for a given landuse, CNs for HSG A are smaller than those for HSG D. One approach to calibrating CNs is to multiply all default CNs by the same fraction (Frcn in Equation 5.8); in other words, all CNs are increased or decreased by the same percentage. Annual shallow groundwater is computed from the soil infiltration fractions for precipitation, which are also defined by landuse and HSG. These fractions can therefore be calibrated with the same approach used for CNs. The pollutant coefficients are defined by landuse, and the same calibration approach can be applied to them (Frnt,1 for N, Frnt,2 for P, and Frnt,3 for BOD in Equation 5.9). USLE factors are defined by landuse in STEPL WEB, based on those in the EPA STEPL database. Sediment loads are computed by multiplying soil erosion (from USLE factors) by the SDR. They can therefore be calibrated through the SDR rather than through individual USLE factors (Park et al., 2010); the fraction in Equations 5.10 and 5.11 is calibrated against annual sediment load. Because this fraction multiplies the SDR, calibrating it has the same effect as increasing or decreasing the USLE soil erosion by the same fraction.

Two modules were developed to calibrate the fractions in Equations 5.8-5.11: one uses a genetic algorithm (GA), and the other uses the bisection method. The GA is based on the principles of 'natural evolution' and 'survival of the fittest'. It is a stochastic search strategy that evolves a population of individuals (candidate solutions) for a given problem, imitating the evolution of natural organisms (Holland, 1975; Lim et al., 2010). Because it solves sophisticated problems effectively, the GA has been used in various areas such as business, engineering, and science (Toğan and Daloğlu, 2008). It is regarded as a powerful method for solving highly complex problems.
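A minimal genetic algorithm for a single calibration fraction can be sketched as follows. The operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), population size, number of generations, and linear stand-in model are all assumptions made for illustration. They are not the settings used in STEPL WEB.

```python
import random


def calibrate_ga(simulate, observed, lo=0.5, hi=1.5, pop_size=20,
                 generations=40, seed=1):
    """Search one calibration fraction in [lo, hi] with a minimal real-coded GA."""
    rng = random.Random(seed)

    def error(x):
        # Fitness: absolute difference between simulated and observed values
        return abs(simulate(x) - observed)

    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=error)
        offspring = population[:2]  # elitism: carry the two best forward
        while len(offspring) < pop_size:
            # Tournament selection: best of three random individuals, twice
            p1 = min(rng.sample(population, 3), key=error)
            p2 = min(rng.sample(population, 3), key=error)
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2            # arithmetic crossover
            child += rng.gauss(0.0, 0.02)              # Gaussian mutation
            offspring.append(min(max(child, lo), hi))  # clip to the search interval
        population = offspring
    return min(population, key=error)


# Hypothetical linear response: simulated runoff = 200 mm x fraction
best = calibrate_ga(lambda f: 200.0 * f, observed=230.0)
print(round(best, 3))
```

For a single monotonic parameter like this one, a GA is more machinery than needed. Its advantage appears when several fractions must be calibrated jointly and the error surface is irregular.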

The alternative module uses the bisection method to calibrate the fractions in Equations 5.8-5.11. The bisection method is a simple and straightforward numerical method that is applicable to continuous functions and has been used to solve simple problems (Ashkar and Mahdi, 2006; Hong et al., 2006; Neupauer and Borchers, 2001). At each iteration, the method evaluates the midpoint of the current interval and keeps the half-interval that still contains the solution, so the interval narrows progressively. An initial interval (e.g., 50% to 150% for CNs) must be set. Iterations then continue until the error (e.g., the difference between observed and estimated annual direct runoff) is zero or less than a specified tolerance. The module in STEPL WEB performs iterations until the fractions in Equations 5.8-5.11 are resolved to the thousandths place.
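The bisection calibration can be sketched in Python for the CN fraction. The sketch assumes that Equation 5.8 scales every CN by Frcn and that the observed annual runoff is bracketed by the initial interval. It uses hypothetical daily rainfall and a single base CN of 85. The interval is narrowed to 0.8-1.15 so that scaled CNs stay below 100. Function names are hypothetical.

```python
def scs_runoff(p_mm, cn, lam=0.2):
    """Daily SCS-CN direct runoff (mm) with Ia = lam * S."""
    s = 25400.0 / cn - 254.0
    ia = lam * s
    return 0.0 if p_mm <= ia else (p_mm - ia) ** 2 / (p_mm - ia + s)


def bisect_fraction(simulate, observed, lo=0.5, hi=1.5, tol=0.001, max_iter=100):
    """Find a fraction in [lo, hi] with simulate(fraction) close to observed.

    Requires simulate() to be continuous and the error to change sign on [lo, hi].
    """
    f_lo = simulate(lo) - observed
    if f_lo * (simulate(hi) - observed) > 0:
        raise ValueError("observed value is not bracketed by the initial interval")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = simulate(mid) - observed
        if f_mid == 0.0 or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                  # solution lies in the lower half
        else:
            lo, f_lo = mid, f_mid     # solution lies in the upper half
    return 0.5 * (lo + hi)


# Hypothetical daily rainfall (mm) and base CN; observed annual runoff of 30 mm
rain = [3.0, 7.0, 12.0, 18.0, 25.0, 40.0, 55.0]
base_cn = 85.0


def annual_runoff(fr_cn):
    # Every CN is scaled by the same fraction (Frcn), as described in the text
    return sum(scs_runoff(p, base_cn * fr_cn) for p in rain)


fr = bisect_fraction(annual_runoff, observed=30.0, lo=0.8, hi=1.15)
print(round(fr, 3), round(annual_runoff(fr), 2))
```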

Eq. 5.8

Eq. 5.9

Eq. 5.10

Eq. 5.11

Where, Frcn is the calibration fraction for CN, Frnt,i is the calibration fraction for pollutant coefficient i (i = 1 for N, 2 for P, and 3 for BOD), Frsdr is the calibration fraction for SDR, and A is watershed area in square kilometers.

Figure 5.2 Annual Direct Runoff, Groundwater, and Pollutant Load Computation in STEPL WEB

5.3.4 Optimization of Best Management Practices

The estimated annual cost of BMP implementation in a watershed is the product of BMP cost per unit area and the area to which the BMP is applied (AREABMP). Both BMP cost and AREABMP therefore need to be considered when identifying the most cost-effective BMP implementation. In other words, the BMP with the least cost per unit mass of reduction (i.e., dollars per ton of reduction) needs to be identified and applied first. AREABMP then needs to be minimized as long as the estimated reduction meets the required reduction. In addition, applying a BMP to 100% of a landuse area may not be possible. For instance, if a BMP is already applied on 30% of cropland, it may not be possible to apply it to 90% of cropland. In that circumstance, the BMP could be applied to a maximum of 70% of cropland area.

STEPL WEB estimates BMP implementation cost from establishment, maintenance, and opportunity costs using a cost function (equation 5.12; Arabi et al., 2006). The model computes the cost per unit of pollutant mass reduction for each BMP and establishes a priority list of BMPs based on that cost.

$$c_t = c_0 \left[ 1 + \sum_{k=1}^{t_d} \frac{r_m}{(1+s)^k} \right] \tag{5.12}$$

Where, ct is BMP implementation cost, c0 is establishment cost, rm is the ratio of annual maintenance cost to establishment cost, s is interest rate, td is BMP design life (years), and k is the year index.
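The cost function can be sketched in Python by assuming that equation 5.12 has the present-value form c_t = c_0 [1 + Σ_{k=1}^{t_d} r_m / (1 + s)^k]. In that form, the cost is the establishment cost plus annual maintenance (r_m × c_0) discounted over the design life. This form is an assumed reading of the Arabi et al. (2006) cost function and should be checked against that source. The function name is hypothetical.

```python
def bmp_cost(c0, rm, s, td):
    """Present-value BMP cost (assumed form of equation 5.12).

    c0: establishment cost; rm: annual maintenance cost as a fraction of c0;
    s: interest rate (e.g., 0.05); td: design life in years.
    """
    # Discount each year's maintenance cost back to the present
    maintenance = sum(rm * c0 / (1.0 + s) ** k for k in range(1, td + 1))
    return c0 + maintenance


# Hypothetical BMP: $1,000 to establish, 10% annual maintenance, 5% interest, 10 years
print(round(bmp_cost(1000.0, 0.1, 0.05, 10), 2))
```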

After the BMP priority list is established, STEPL WEB optimizes AREABMP by increasing AREABMP for the first BMP and estimating the annual pollutant reduction for that BMP. Iterative simulations are performed with AREABMP increasing up to the allowable maximum area (e.g., 100%, or 70% in the circumstance described above). The iterations stop when the estimated pollutant reduction exceeds the required reduction. If the first BMP at its maximum area does not meet the required reduction, the second most cost-effective BMP is added and its AREABMP is increased iteratively. In brief, BMP optimization is an iterative simulation that adds BMPs in turn and increases AREABMP for each BMP.
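The iterative procedure can be sketched as a greedy loop in Python. In STEPL WEB, each step is a model simulation. Here a hypothetical `simulate` function stands in for that run, using made-up removal efficiencies combined multiplicatively. BMP names, costs, efficiencies, and the 1% area step are illustrative assumptions, not values from the study.

```python
def optimize_bmps(bmps, landuse_area, required_reduction, simulate, step=0.01):
    """Greedy iterative BMP application following the procedure described above.

    bmps: dicts with 'name', 'cost_per_ton' (priority key) and 'max_fraction'
    (largest share of landuse_area the BMP may occupy).
    simulate(plan) -> estimated percent reduction for a plan {name: applied_area}.
    """
    plan = {}
    reduction = 0.0
    for bmp in sorted(bmps, key=lambda b: b["cost_per_ton"]):  # cheapest per ton first
        max_area = bmp["max_fraction"] * landuse_area
        area = 0.0
        while area < max_area:
            area = min(area + step * landuse_area, max_area)  # grow the applied area
            plan[bmp["name"]] = area
            reduction = simulate(plan)
            if reduction >= required_reduction:
                return plan, reduction
    return plan, reduction  # target not met even with every BMP at its maximum area


# Hypothetical inputs for illustration only (not values from the study)
efficiency = {"BMP A": 0.5, "BMP B": 0.4}  # share of load removed on treated area
bmps = [
    {"name": "BMP B", "cost_per_ton": 25.0, "max_fraction": 1.0},
    {"name": "BMP A", "cost_per_ton": 10.0, "max_fraction": 0.6},
]


def simulate(plan, total_area=100.0):
    """Toy stand-in for a model run: BMP effects combine multiplicatively."""
    remaining = 1.0
    for name, area in plan.items():
        remaining *= 1.0 - efficiency[name] * area / total_area
    return (1.0 - remaining) * 100.0


plan, reduction = optimize_bmps(bmps, 100.0, 40.0, simulate)
print(plan, round(reduction, 2))
```

With these toy inputs, BMP A reaches its 60% cap without meeting the 40% target, so BMP B is added and grown until the target is met. The loop stops at the first area that satisfies the requirement, which mirrors the goal of minimizing AREABMP.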

5.4 Results

5.4.1 Annual Direct Runoff Computations

To explore differences in runoff computation approaches, a CN of 85 was selected as a

test case, and this represents cropland with a C hydrologic soil group in the EPA STEPL

model. Average annual direct runoff (depth, mm) values were computed with the five

approaches (Table 5.2, Figure 5.3). The annual direct runoff for the first approach (i.e.

General Use of SCS-CN method with daily precipitation data, GU) ranged from 117.4

mm (USC00120676) to 191.4 mm (USC00127125). However, the annual direct runoff

estimated by the second approach (i.e. Original EPA STEPL approach, OS) ranged from

242.5 mm (USW00014848) to 339.9 mm (USC00127125). Although identical

precipitation data were used for the approaches, the difference between the OS and GU

approaches was a minimum of 77.6 % (overestimated, USC00127125) and maximum of

111.8 % (overestimated, USC00120676).

The third approach (i.e. corrected EPA STEPL with an initial abstraction coefficient of 0.2, CS) underestimated runoff compared to GU at all stations. Moreover, the approach produced no direct runoff when the average rainfall per event at a station was smaller than 0.2S calculated from CN (equation 5.3), because the average direct runoff depth (equation 5.6) then became 0.0 mm according to equation 5.2. Thus, CN needs to be greater than a critical value (TC in Table 5.2) for the CS approach to generate annual direct runoff greater than 0.0 mm; in other words, the CS approach produces no annual direct runoff in areas with CN values smaller than TC.
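The threshold behavior described above can be reproduced with the standard SI form of the SCS-CN relations, S = 25400/CN - 254 (mm) and Q = (P - λS)²/(P - λS + S) for P > λS (otherwise Q = 0). Setting the average rainfall per event equal to 0.2S and solving for CN gives the threshold TC = 25400/(5P + 254). This is a minimal sketch that assumes equations 5.1 to 5.3 take these standard forms; the 15 mm average rainfall per event used below is hypothetical and does not correspond to any station in Table 5.2.

```python
def retention_mm(cn: float) -> float:
    """Potential maximum retention S (mm) from a curve number (SI form)."""
    return 25400.0 / cn - 254.0

def scs_runoff_mm(p_mm: float, cn: float, lam: float = 0.2) -> float:
    """SCS-CN direct runoff Q (mm) for rainfall P (mm); lam is the initial abstraction coefficient."""
    s = retention_mm(cn)
    ia = lam * s
    if p_mm <= ia:
        return 0.0  # all rainfall is held as initial abstraction
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

def threshold_cn(p_avg_mm: float, lam: float = 0.2) -> float:
    """CN at which P_avg equals lam*S; any larger CN yields runoff > 0."""
    if lam <= 0.0:
        return 0.0  # with no initial abstraction, any positive rainfall produces runoff
    return 25400.0 / (p_avg_mm / lam + 254.0)

p_avg = 15.0  # hypothetical average rainfall per event (mm)
tc = threshold_cn(p_avg)
print(f"TC = {tc:.1f}")
print(f"Q at CN 70 = {scs_runoff_mm(p_avg, 70):.3f} mm, Q at CN 85 = {scs_runoff_mm(p_avg, 85):.3f} mm")
```

With λ = 0.2, a single average event either exceeds 0.2S or produces nothing, which is why the CS approach switches abruptly from zero to non-zero runoff at TC.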


Daily precipitation data generated by CLIGEN were used to compute daily direct runoff in the fourth approach (CL). The difference between the CL and GU approaches ranged from a minimum of 0.9% (overestimated, USC00121869) to a maximum of 20.5% (underestimated, USC00128999). In addition, annual direct runoff from the CL approach was closer to the GU values than annual direct runoff from the OS and CS approaches.

The EPA STEPL model was used in the fifth approach (ST), and the estimated annual direct runoff ranged from 190.7 mm (USW00014848) to 357.0 mm (USC00127125). ST overestimated runoff relative to GU by a minimum of 43.1% (USC00128999) and a maximum of 152.9% (USC00123547). Like the OS results, the ST results differed greatly from the GU results. The OS and ST approaches overestimated average annual direct runoff compared to the GU approach because they use an initial abstraction coefficient of 0.0 rather than 0.2 in equation 5.1: no initial abstraction is subtracted from rainfall, and the resulting increase in the numerator of equation 5.1 outweighs the increase in its denominator.
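To see why a coefficient of 0.0 raises runoff even though it enlarges the denominator, the sketch below evaluates the same equation with λ = 0.0 and λ = 0.2 for CN 85 and a few hypothetical daily rainfall depths. It assumes equation 5.1 has the standard form Q = (P - λS)²/(P - λS + S) with S = 25400/CN - 254 (mm).

```python
def scs_runoff_mm(p_mm: float, cn: float, lam: float) -> float:
    """Standard SCS-CN direct runoff (mm) with initial abstraction coefficient lam."""
    s = 25400.0 / cn - 254.0
    ia = lam * s
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

CN = 85
S = 25400.0 / CN - 254.0
for p in (10.0, 25.0, 50.0, 100.0):  # hypothetical daily rainfall depths (mm)
    q0 = scs_runoff_mm(p, CN, 0.0)
    q2 = scs_runoff_mm(p, CN, 0.2)
    # Denominators: P + S for lam = 0.0 versus P + 0.8S for lam = 0.2.
    # The lam = 0.0 denominator is larger, but its numerator P**2 is larger
    # still than (P - 0.2S)**2, so q0 exceeds q2 for every rainfall depth.
    print(f"P={p:6.1f} mm  denom0={p + S:6.1f}  denom2={p + 0.8 * S:6.1f}  "
          f"Q0={q0:6.2f} mm  Q0.2={q2:6.2f} mm")
```

Writing x = P - λS, the runoff x²/(x + S) increases with x, and x falls as λ rises, so for fixed P and CN the computed runoff always decreases as the initial abstraction coefficient increases.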

The approaches were also compared using CNs for other landuses and HSGs, and simulated runoff for these CN values displayed the same trends in average annual direct runoff: annual direct runoff from the OS and ST approaches was much greater, and annual direct runoff from the CS approach much smaller, than annual direct runoff from the GU approach. The results indicate that annual direct runoff needs to be computed as the aggregate of daily direct runoff based on daily precipitation data. In other words, computing annual direct runoff using the SCS-CN method with average rainfall per event (i.e. the OS and CS approaches) leads to annual direct runoff that differs greatly from values estimated when the CN runoff method is applied as it was intended. Comparing the OS approach to the CS approach, the OS approach estimated greater annual direct runoff at all stations, because the initial abstraction coefficient was 0.0 for OS and 0.2 for CS, so no rainfall was removed as initial abstraction in the OS approach. Even though the approach using average rainfall per event was corrected (CS), it underestimated annual direct runoff compared to the GU approach. In addition, the CS approach estimated annual direct runoff greater than 0 mm only when CN exceeded a critical value (TC), and therefore it produces no annual direct runoff in areas with small CNs (e.g. forest). The ST approach not only overestimated runoff but also differed greatly from the GU approach. Thus, it was concluded that the approach using average rainfall per event, currently used in the EPA STEPL model, is not appropriate for computing annual direct runoff.
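The contrast between aggregating daily runoff (GU) and applying the equation once to the average rainfall per event (the idea behind CS) can be illustrated with a short synthetic example. Because the SCS-CN runoff curve is convex in rainfall, runoff computed from the average event can never exceed the average of the event runoffs (Jensen's inequality), which offers one way to see why CS underestimates relative to GU. The rainfall series is hypothetical, and `annual_runoff_from_average_event` is a simplified stand-in, not the STEPL spreadsheet formula (equation 5.6).

```python
def scs_runoff_mm(p_mm, cn, lam=0.2):
    """Standard SCS-CN direct runoff (mm) for one rainfall depth (mm)."""
    s = 25400.0 / cn - 254.0
    ia = lam * s
    return 0.0 if p_mm <= ia else (p_mm - ia) ** 2 / (p_mm - ia + s)

def annual_runoff_daily(daily_p_mm, cn, lam=0.2):
    """GU-style: sum of daily runoff computed from each day's rainfall."""
    return sum(scs_runoff_mm(p, cn, lam) for p in daily_p_mm)

def annual_runoff_from_average_event(daily_p_mm, cn, lam=0.2):
    """Simplified average-event approach: runoff from the mean rainfall of rain days, times their number."""
    events = [p for p in daily_p_mm if p > 0.0]
    if not events:
        return 0.0
    p_avg = sum(events) / len(events)
    return len(events) * scs_runoff_mm(p_avg, cn, lam)

# Hypothetical rain-day depths (mm) for one year; dry days contribute nothing.
rain = [2.0, 5.0, 8.0, 12.0, 3.0, 40.0, 6.0, 1.0, 25.0, 70.0, 4.0, 9.0]
print(f"daily aggregation:  {annual_runoff_daily(rain, 85):.1f} mm")
print(f"average event:      {annual_runoff_from_average_event(rain, 85):.1f} mm")
```

The two results coincide only when every event has the same depth; the more variable the daily rainfall, the larger the gap.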

The SCS-CN method was developed from the relationship between rainfall and direct runoff (USDA, 1985; USDA, 1986). SCS-CN is an empirical model with two parameters: CN and initial abstraction. The initial abstraction was empirically determined to be 0.2S (Garen and Moore, 2005; USDA, 1986), and published CN tables in current use typically assume an initial abstraction coefficient of 0.2. The equations in the SCS-CN method need to be modified when other initial abstractions are used (Lim et al., 2006; Woodward et al., 2003). In addition, the SCS-CN method with an initial abstraction coefficient of 0.2 is typically used in hydrologic models to simulate daily direct runoff from daily rainfall (Arnold et al., 1998; Knisel, 1980; Lim et al., 2006; Williams and LaSeur, 1976; Williams et al., 2000).

Since the SCS-CN method is an empirical model, its assumptions must be maintained if it is to provide accurate runoff estimates. Therefore, the initial abstraction needs to be 0.2S for commonly used CN values; otherwise, CNs need to be adjusted for other initial abstraction values. The method must also be applied to event- or daily-based rainfall to estimate event- or daily-based direct runoff. In other words, annual direct runoff needs to be computed by aggregating daily direct runoff obtained from daily rainfall. If these assumptions are not maintained, results may differ greatly from those obtained by general use of the SCS-CN method. For instance, the OS approach used a different initial abstraction (i.e. 0.0S) without adjusting CNs, and thus it overestimated runoff compared to the GU approach. When the initial abstraction coefficient was corrected (CS), annual direct runoff was underestimated compared to the GU approach, because the CS approach used a single rainfall value (average rainfall per event) to estimate annual direct runoff.


Table 5.2 Annual Precipitation and Direct Runoff Computations

                 Precipitation (mm)   Annual Direct Runoff Depth (mm)
Station          PN(1)     PC(2)      GU(3)    CL(4)    CS(5)    OS(6)    ST(7)    TC(8)
USC00120676      963.8     940.7      117.4    107.4    41.6     248.6    216.0    78
USC00121747      1077.0    1027.1     164.9    147.9    68.0     309.4    327.8    76
USC00121869      1128.1    1146.3     187.7    189.3    76.1     334.4    331.8    76
USW00014848      970.7     950.5      121.1    114.5    40.2     242.5    190.7    78
USC00128999      1008.6    958.9      152.6    121.4    56.0     274.4    218.4    76
USC00129138      1025.7    925.5      143.8    120.8    54.5     280.4    215.2    77
USC00129430      996.0     935.1      145.3    125.1    55.6     274.8    215.2    77
USC00129678      931.3     951.8      124.8    122.6    45.8     250.8    216.2    77
USC00121229      1044.5    1027.8     141.1    130.3    55.6     284.4    249.1    77
USC00123547      994.8     1047.9     129.6    144.1    48.4     263.9    327.8    77
USC00127125      1080.4    1069.2     191.4    176.4    85.6     339.9    357.0    75
USC00127875      1097.9    1078.6     173.0    161.6    69.6     320.8    335.7    76

(1) PN: Annual precipitation from NCDC
(2) PC: Annual precipitation from CLIGEN
(3) GU: Annual direct runoff by daily direct runoff computation
(4) CL: Annual direct runoff with daily precipitation data generated by CLIGEN
(5) CS: EPA STEPL with corrected initial abstraction of 0.2S
(6) OS: Original EPA STEPL
(7) ST: Annual direct runoff by EPA STEPL model
(8) TC: Curve number threshold to generate direct runoff by CS
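The percentage differences quoted in Section 5.4.1 can be recomputed from Table 5.2 as 100 × (approach - GU)/GU. The sketch below does this for every runoff column. Because the table values are rounded to 0.1 mm, a recomputed figure can differ from the text in the last digit (for example, the CL difference at USC00128999 comes out at about -20.4% rather than the 20.5% reported).

```python
# Annual direct runoff depths (mm) transcribed from Table 5.2, in column order GU, CL, CS, OS, ST.
COLUMNS = ("GU", "CL", "CS", "OS", "ST")
TABLE_5_2 = {
    "USC00120676": (117.4, 107.4, 41.6, 248.6, 216.0),
    "USC00121747": (164.9, 147.9, 68.0, 309.4, 327.8),
    "USC00121869": (187.7, 189.3, 76.1, 334.4, 331.8),
    "USW00014848": (121.1, 114.5, 40.2, 242.5, 190.7),
    "USC00128999": (152.6, 121.4, 56.0, 274.4, 218.4),
    "USC00129138": (143.8, 120.8, 54.5, 280.4, 215.2),
    "USC00129430": (145.3, 125.1, 55.6, 274.8, 215.2),
    "USC00129678": (124.8, 122.6, 45.8, 250.8, 216.2),
    "USC00121229": (141.1, 130.3, 55.6, 284.4, 249.1),
    "USC00123547": (129.6, 144.1, 48.4, 263.9, 327.8),
    "USC00127125": (191.4, 176.4, 85.6, 339.9, 357.0),
    "USC00127875": (173.0, 161.6, 69.6, 320.8, 335.7),
}

def pct_diff_vs_gu(approach):
    """Signed percent difference of one approach relative to GU, per station."""
    i = COLUMNS.index(approach)
    return {st: 100.0 * (row[i] - row[0]) / row[0] for st, row in TABLE_5_2.items()}

for approach in ("CL", "CS", "OS", "ST"):
    diffs = pct_diff_vs_gu(approach)
    smallest = min(diffs, key=lambda s: abs(diffs[s]))
    largest = max(diffs, key=lambda s: abs(diffs[s]))
    print(f"{approach}: smallest {diffs[smallest]:+.1f}% ({smallest}), "
          f"largest {diffs[largest]:+.1f}% ({largest})")
```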


Figure 5.3 Comparison of Annual Direct Runoff by Different Approaches

5.4.2 Application of STEPL WEB

To demonstrate the BMP simulation ability of STEPL WEB, a watershed named

Tippecanoe River at North Webster in northeastern Indiana (Figure 5.4) was selected.

The spatial input datasets to delineate the watershed and to prepare STEPL WEB inputs

were the 30 meter resolution Digital Elevation Model (DEM) from the United States

Geological Survey (USGS) National Elevation Dataset, the National Land Cover Dataset

2006 (NLCD 2006) from USGS, and Soil Survey Geographic Database (SSURGO) from

United States Department of Agriculture (USDA). The watershed area is 129.1 km2, with

61.3 % of the watershed landuse being cropland (Figure 5.4, Table 5.3).


The Web-based LDC Tool was used to collect flow data from USGS station 03330241 (Tippecanoe River at North Webster, Indiana). Total phosphorus data were collected from the Indiana Department of Environmental Management (IDEM) at the same location. The period used to develop the load duration curve, 1998-03-25 to 2010-11-17, was chosen based on the period of available water quality data. The standard concentration for total phosphorus was set to 0.08 mg/l (IDEM, 2013). Annual direct runoff, annual baseflow, and annual sediment load were computed with the Web-based LDC Tool. Because nutrient loads in STEPL WEB are computed from loads carried by runoff as well as by sediment, annual sediment load is required to calibrate model parameters for annual nutrient loads. LOADEST (Runkel et al., 2004) was used to calculate annual sediment load. The model parameters in STEPL WEB were calibrated (Table 5.4, Table 5.5) with the auto-calibration module, which uses the bisection method. The Web-based LDC Tool identified that an 11% reduction in phosphorus was required to meet the standard load. STEPL WEB then established a list of cost-effective BMPs ranked by least cost per unit of pollutant reduction. The most cost-effective BMP was ‘filter strip’ for cropland, and ‘reduced tillage systems’ and ‘contour farming’ were the second and third most cost-effective BMPs. The fourth and fifth most cost-effective BMPs were ‘vegetative filter strips’ and ‘bioretention facility’ for urban areas.
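The bisection idea behind the auto-calibration module can be sketched for a single parameter whose effect on a simulated quantity is monotonic: the search interval is halved repeatedly until the simulated value matches the measured one. `simulated_p_load_kg` is a hypothetical stand-in (dissolved phosphorus in runoff plus a fixed sediment-bound load), not the STEPL WEB model, and its runoff volume, sediment-bound load, and target are illustrative numbers, not values from this study.

```python
from typing import Callable

def bisect_calibrate(simulate: Callable[[float], float], target: float,
                     lo: float, hi: float, tol: float = 1e-9, max_iter: int = 200) -> float:
    """Return a parameter value in [lo, hi] at which simulate(x) matches target.

    Assumes simulate is continuous and monotonic on [lo, hi] and that the
    target lies between simulate(lo) and simulate(hi)."""
    f_lo = simulate(lo) - target
    f_hi = simulate(hi) - target
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("target is not bracketed by simulate(lo) and simulate(hi)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = simulate(mid) - target
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0.0:   # sign change in the lower half
            hi = mid
        else:                    # sign change in the upper half
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def simulated_p_load_kg(coef_mg_per_l: float) -> float:
    """Hypothetical model: annual P load (kg) from a runoff pollutant coefficient."""
    runoff_m3_per_year = 10.0e6   # illustrative annual direct runoff volume
    sediment_bound_kg = 400.0     # illustrative sediment-bound P load
    # 1 m³ of water at 1 mg/l carries 1 g = 0.001 kg
    return runoff_m3_per_year * coef_mg_per_l * 1e-3 + sediment_bound_kg

coef = bisect_calibrate(simulated_p_load_kg, target=1400.0, lo=0.0, hi=1.0)
print(f"calibrated coefficient: {coef:.4f} mg/l")
```

Bisection needs only a bracket and monotonic behavior, not derivatives, which makes it robust for calibrating one parameter at a time against a measured annual total.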

Two scenarios for BMP application in the watershed were simulated. The first was the application of ‘filter strip’ on up to 79 km² (100% of the cropland area). The second allowed ‘filter strip’ on up to 10 km² of cropland, ‘reduced tillage systems’ on up to 10 km² of cropland, and ‘vegetative filter strips’ on up to 5 km² of urban area; ‘contour farming’ was considered not applicable. In the first scenario, ‘filter strip’ needed to be applied to 17 km² of cropland to reach the pollutant reduction goal, at an estimated annual cost of $12,870. In the second scenario, the optimal solution applied ‘filter strip’ to 10 km² of cropland, ‘reduced tillage systems’ to 10 km² of cropland, and ‘vegetative filter strips’ to 4 km² of urban area. The estimated annual cost was $17,400: $7,650 for ‘filter strip’ (147.5 kg/year phosphorus reduction in cropland), $7,710 for ‘reduced tillage systems’ (95.6 kg/year phosphorus reduction in cropland), and $2,040 for ‘vegetative filter strips’ (1.0 kg/year phosphorus reduction in urban area). Both scenarios met the required reduction. The second scenario had a higher estimated annual cost than the first, but the first scenario may not be feasible if ‘filter strip’ cannot be implemented on 17 km² or more of cropland.
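The scenario 2 figures can be checked with simple arithmetic using only values stated in the text. The three component costs sum to the reported $17,400, and the cost per kilogram of phosphorus removed (roughly $52, $81, and $2,040 per kg) follows the ranking order used by STEPL WEB: filter strip, then reduced tillage systems, then vegetative filter strips.

```python
# Scenario 2 components from the text: (annual cost in $, P reduction in kg/year)
scenario_2 = {
    "filter strip": (7650.0, 147.5),
    "reduced tillage systems": (7710.0, 95.6),
    "vegetative filter strips": (2040.0, 1.0),
}

total_cost = sum(cost for cost, _ in scenario_2.values())
total_reduction = sum(kg for _, kg in scenario_2.values())
cost_per_kg = {name: cost / kg for name, (cost, kg) in scenario_2.items()}

print(f"total cost ${total_cost:,.0f}, total reduction {total_reduction:.1f} kg/year")
for name in sorted(cost_per_kg, key=cost_per_kg.get):
    print(f"{name}: ${cost_per_kg[name]:.2f} per kg of phosphorus")
```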


Figure 5.4 Landuses of Tippecanoe River at North Webster Watershed

Table 5.3 Landuse Distribution for Study Watershed

Landuse      Area (km²)   Percentage (%)
Urban        10.4         8.1
Cropland     79.2         61.3
Pasture      2.7          2.1
Forest       21.1         16.3
Water        15.7         12.2
Total        129.1        100.0


Table 5.4 STEPL WEB Parameters (Default / Calibrated)

Parameter / Landuse         HSG A         HSG B         HSG C         HSG D
Curve Number
  Urban                     83 / 90       89 / 97       92 / 98       93 / 98
  Cropland                  67 / 73       78 / 85       85 / 92       89 / 97
  Pastureland               49 / 53       69 / 75       79 / 86       84 / 91
  Forest                    39 / 42       60 / 65       73 / 79       79 / 86
Soil Infiltration Fraction
  Urban                     0.36 / 0.31   0.24 / 0.21   0.12 / 0.01   0.06 / 0.05
  Cropland                  0.45 / 0.39   0.30 / 0.26   0.15 / 0.13   0.08 / 0.07
  Pastureland               0.45 / 0.39   0.30 / 0.26   0.15 / 0.13   0.08 / 0.07
  Forest                    0.45 / 0.39   0.30 / 0.26   0.15 / 0.13   0.08 / 0.07
Pollutant Coefficient, Phosphorus (mg/l)
  Urban                     0.30 / 0.18
  Cropland                  0.50 / 0.30
  Pastureland               0.30 / 0.18
  Forest                    0.10 / 0.06
Sediment Delivery Ratio     /

Table 5.5 Annual Direct Runoff, Baseflow, and Sediment Load

                 Measured               Predicted
Direct Runoff    16.7 × 10⁶ m³/year     16.7 × 10⁶ m³/year
Baseflow         25.9 × 10⁶ m³/year     25.6 × 10⁶ m³/year
Sediment         237 ton/year           242 ton/year
Phosphorus       2.3 ton/year           2.1 ton/year

5.5 Conclusions

Protection of water quality in streams and rivers is important, and one regulatory approach to protecting these resources is the use of TMDLs. Many models are used to develop TMDLs and to simulate the ability of BMPs to reduce pollutant loads to meet TMDLs. The EPA STEPL model is used for TMDL assessment and can estimate sediment load, nutrient loads, and BOD5. In addition, the model is used to estimate annual pollutant load reductions for various BMPs. The model employs the SCS-CN method to estimate annual direct runoff. However, the model uses an unconventional approach for annual direct runoff calculation. The initial abstraction was empirically found to be 0.2S in the original development of the CN runoff method, and thus the equations in the SCS-CN method need to be modified when other initial abstraction coefficients are used. In addition, CNs are typically used to compute event- or daily-based direct runoff. Annual direct runoff using the EPA STEPL approach differed greatly from annual direct runoff computed by general use of the SCS-CN method, whereas annual direct runoff computed from daily precipitation data generated by CLIGEN showed smaller differences than values computed with the EPA STEPL approaches.

A web-based model was developed based on the corrected EPA STEPL model; it employs the CLIGEN model to generate daily precipitation data and computes annual direct runoff from daily direct runoff. The web-based model provides HTML interfaces for watershed inputs and a map-based interface for CLIGEN stations. In addition, the model was integrated with the Web-based LDC Tool so that suggested BMP scenarios can be simulated. Since the BMPs suggested by the Web-based LDC Tool need to be optimized, STEPL WEB establishes a priority list of BMPs based on implementation cost per mass of pollutant reduction and then performs iterative simulations to identify the most cost-effective BMP implementation plans. The web-based model should be useful for conducting BMP simulations to meet TMDL standard loads.


5.6 References

Arnold, J. G., Srinivasan, R., Muttiah, R. S., and J. R. Williams. 1998. Large area

hydrologic modeling and assessment – part 1: model development. Journal of the

American Water Resources Association 34(1): 73-89.

Arabi, M., Govindaraju, R. S., and M. M. Hantush. 2006. Cost-effective allocation of

watershed management practices using a genetic algorithm. Water Resources

Research 42, W10429, DOI: 10.1029/2006WR004931.

Ashkar, F. and S. Mahdi. 2006. Fitting the log-logistic distribution by generalized

moments. Journal of Hydrology 328. 694-703.

Baltas, E. A., Dervos, N. A., and M. A. Mimikou. 2007. Technical Note: Determination

of the SCS initial abstraction ratio in an experimental watershed in Greece.

Hydrology and Earth System Sciences 11: 1825-1829.

Commonwealth Biomonitoring. 2009 January. Little River watershed diagnostic study.

Commonwealth Biomonitoring, Indianapolis, Indiana.

Elliot, W. J. and C. D. Arnold. 2001. Validation of the Weather Generator CLIGEN with

Precipitation Data from Uganda. Transactions of the ASAE. 44(1): 53-58.

FDEP (Florida Department of Environmental Protection). 2009 September. State of

Florida FY2010 section 319(h) grant work plan. 3900 Commonwealth Boulevard

M.S. 49 Tallahassee, Florida 32399.

Garai, G. and B. B. Chaudhuri. 2007. A distributed hierarchical genetic algorithm for

efficient optimization and pattern matching. Pattern Recognition 40: 212-228.


Garen, D. C. and D. S. Moore. 2005. Curve number hydrology in water quality modeling:

uses, abuses, and future directions. Journal of the American Water Resources

Association 41: 377-388.

Gosselin, L., Tye-Gingras, M., and F. Mathieu-Potvin. 2009. Review of utilization of

genetic algorithms in heat transfer problems. International Journal of Heat and Mass

Transfer 52, 2169–2188.

Holland, J. H., 1975. Adaptation in Natural and Artificial Systems. University of

Michigan Press, Ann Arbor, MI. 183.

Hong, Y., Yeh, N., and J. Chen. 2006. The simplified methods of evaluating detention

storage volume for small catchment. Ecological Engineering 26. 355-364.

Indiana Department of Environmental Management (IDEM), July 2007. Total maximum

daily load for Escherichia coli (E. coli) for the East Fork Whitewater River

Watershed, Wayne, Union, Fayette, and Franklin Counties.

Indiana Department of Environmental Management (IDEM), 2013. Water quality targets.

Available at < http://www.in.gov/idem/nps/3484.htm>. Accessed in October 2013.

Kang, M. S., Park, S. W., Lee, J. J., and K. H. Yoo. 2006. Applying SWAT for TMDL

Programs to a Small Watershed Containing Rice Paddy Fields. Agricultural Water

Management. 79, 72-92.

Keegstra, N., Parks, J., and L. V. Linden. 2012 May. Whiskey Creek final report. Calvin

College, 201 Burton SE Grand Rapids, MI 49546

Knisel, W. G. 1980. CREAMS, a field-scale model for chemicals, runoff, and erosion

from agricultural management systems. Conservation Report 26, USDA Agriculture

Research Service, Washington, DC


Lim, K. J. and B. A. Engel. 2003. Extension and Enhancement of National Agricultural

Pesticide Risk Analysis (NAPRA) WWW Decision Support System to Include

Nutrients. Computers and Electronics in Agriculture. 38: 227-236.

Lim, K. J., Engel, B. A., Tang, Z., Muthukrishnan, S., Choi, J., and K. Kim. 2006. Effects

of calibration on L-THIA GIS runoff and pollutant estimation. Journal of

Environmental Management. 78: 35-43.

Lim, K. J., Park, Y. S., Kim, J., Shin, Y. C., Kim, N. W., Kim, S. J., Jeon, J. H., and B. A.

Engel. 2010. Development of genetic algorithm-based optimization module in

WHAT system for hydrograph analysis and model application. Computers &

Geosciences. 36: 936-944.

Neupauer, R. M. and B. Borchers. 2001. A MATLAB implementation of the minimum

relative entropy method for linear inverse problems. Computers & Geosciences 27.

757-762.

Nicks, A. D. and L. J. Lane. 1989. Weather Generator. Chapter 2 in USDA-Water

Erosion Prediction Project: Hillslope Profile Version. L. J. Lane, and M. A. Nearing,

ed. NSERL Report No. 2. West Lafayette, Ind.: USDA–ARS National Soil Erosion

Research Laboratory.

Park, Y. S., Kim, J., Kim, N. W., Kim, S. J., Jeon, J. H., Engel, B. A., Jang, W., and K. J.

Lim. 2010. Development of new R, C, and SDR modules for the SATEEC GIS

system. Computers & Geosciences 36(6), 726-734.

Patil, A. and Z. Deng. 2011. Bayesian Approach to Estimating Margin of Safety for Total

Maximum Daily Load Development. Journal of Environmental Management. 92,

910-918.


Pease, L. M., Oduor, P., and G. Padmanabhan. 2010. Estimating Sediment, Nitrogen, and

Phosphorus Loads from the Pipestem Creek Watershed, North Dakota, using

AnnAGNPS. Computers & Geosciences. 36, 282-291.

Richards, C. E., Munster, C. L., Vietor, D. M., Arnold, J. G., and R. White. 2008.

Assessment of a Turfgrass Sod Best Management Practice on Water Quality in a

Suburban Watershed. Journal of Environmental Management. 86, 229-245.

Runkel, R. L., Crawford, C. G., and T. A. Cohn. 2004. Load estimator (LOADEST): A



Shi, Z., Chen, L., Fang, N., Qin, D., and C. Cai. 2009. Research on the SCS-CN initial

abstraction ratio using rainfall-runoff event analysis in the Three Gorges Area, China.

CATENA 77(1): 1-7.

Tedela, N. H., McCutcheon, S. C., Rasmussen, T. C., Hawkins, R. H., Swank, W. T.,

Campbell, J. L., Adams, M. B., Jackson, C. R., and E. W. Tollner. 2012. Runoff curve

numbers for 10 small forested watersheds in the mountains of the eastern United

States. Journal of Hydrologic Engineering 17: 1188-1198.

Tetra Tech, Inc. 2011. User’s Guide Spreadsheet tool for the estimation of pollutant load

(STEPL) version 4.1. Tetra Tech, Inc. 10306 Eaton Place, Suite 340 Fairfax, VA

22003.

Toğan, V. and T. A. Daloğlu. 2008. An improved genetic algorithm with initial population

strategy and self-adaptive member grouping. Computers and Structures 86, 1204-

1218.


USDA-NRCS (U.S. Department of Agriculture, Natural Resources Conservation Service).

1983. Sediment sources, yields, and delivery ratios. In National Engineering

Handbook, Chapter 6, Section 3, Sedimentation.


USDA-NRCS. 1985. National Engineering Handbook, Section 4, Hydrology.

USDA-NRCS. 1986. Urban hydrology for small watersheds. Washington, DC.

Williams, J. R. and V. LaSeur. 1976. Water yield model using SCS curve numbers.

Journal of the Hydraulics Division, ASCE 102(HY9): 1241-1253.

Williams, J. R., Arnold, J. G., and R. Srinivasan. 2000. The APEX model. BRC Report

No. 00–06 Blackland Research and Extension Center, Texas Agricultural

Experiment Station, Texas A & M University System, Temple, TX.

Woodward, D. E., Hawkins, R. H., Jiang, R., Hjelmfelt, A. T., and J. A. Van Mullem. 2003.

Runoff Curve Number method: examination of the initial abstraction ratio. American

Society of Civil Engineers. Conference Proceeding Paper. World Water &

Environmental Resources Congress 2003 and Related Symposia.

Zhang, X. C. and J. D. Garbrecht. 2003. Evaluation of CLIGEN Precipitation Parameters

and Their Implication on WEPP Runoff and Erosion Prediction. Transactions of the

ASAE. 46(2): 311-320.


CHAPTER 6. SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS

6.1 Summary

Flow and load duration curves (FDC and LDC) have been used in many states for total maximum daily load (TMDL) implementation. A web-based LDC Tool was developed to facilitate development of FDC and LDC. The Web-based LDC Tool provides several benefits for FDC and LDC development: it retrieves streamflow data from the USGS data server and integrates LOADEST and LOADIN to generate daily pollutant loads from intermittent water quality data. Therefore, the predictive ability of LOADEST and LOADIN for annual pollutant loads needed to be evaluated. In addition, the Web-based LDC Tool required users to prepare water quality data, which are essential inputs for LDC development. Moreover, a model capable of simulating BMPs was needed for the Web-based LDC Tool, since the tool itself cannot simulate BMPs to meet standard pollutant loads.

Therefore, this research was conducted to explore pollutant load regression model

behaviors in estimating annual pollutant loads and to develop web-based tools supporting

TMDL implementations. The specific objectives of the dissertation were to:

1. Evaluate water quality sampling frequency strategies and 10 regression models to estimate annual pollutant load. The regression models evaluated in this study were the nine LOADEST regression models (numbered 1 to 9) and the LOADIN regression model, which uses a genetic algorithm.

2. Identify the correlation between LOADEST model behavior and water quality datasets for various proportions of water quality data from storm events.

3. Improve the Web-based LDC Tool to allow collection of USGS and STORET/WQX water quality data via web access.

4. Enhance the Web-based LDC Tool to identify BMPs that reduce annual pollutant load to meet the required reduction.

5. Develop a web-based model to estimate annual pollutant loads and to simulate BMPs to meet the required reduction of annual pollutant load.

Under the first objective, ten pollutant load regression models from LOADEST and LOADIN were evaluated with six water quality sampling strategies for sediment, nitrogen, and phosphorus. A measured ‘true load’ was required to evaluate the pollutant load regression models, so daily water quality data were collected from the USGS Water-Quality Data for the Nation and the National Center for Water Quality Research of Heidelberg University. The collected water quality data were artificially degraded using six sampling strategies: the first three were fixed interval sampling strategies, and the other three were fixed interval sampling supplemented with storm event sampling. The terms affecting the behavior of the pollutant regression models were investigated, since the daily water quality parameters used in the study showed different relationships with streamflow. The pollutant load regression models were evaluated for their ability to estimate annual pollutant loads accurately and precisely.
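Degrading a daily record into a sparser sampling dataset can be sketched as follows: fixed interval sampling keeps every k-th day, and the storm-supplemented variant also keeps every day whose flow lies in the upper 10 percent of flows for the period (the high-flow definition used in this dissertation). The interval, the rule of sampling every high-flow day, and the synthetic flow series are assumptions for illustration only; the six strategies actually used in the study are defined in earlier chapters and are not reproduced here. The last helper reports the proportion of samples drawn from high flows.

```python
def high_flow_threshold(flows, upper_fraction=0.10):
    """Smallest flow within the upper `upper_fraction` of the record."""
    ranked = sorted(flows, reverse=True)
    k = max(1, int(len(ranked) * upper_fraction))
    return ranked[k - 1]

def fixed_interval_days(n_days, interval):
    """Day indices sampled every `interval` days, starting on day 0."""
    return set(range(0, n_days, interval))

def fixed_interval_plus_storm_days(flows, interval, upper_fraction=0.10):
    """Fixed interval sampling supplemented with every high-flow day (ties included)."""
    threshold = high_flow_threshold(flows, upper_fraction)
    storm_days = {d for d, q in enumerate(flows) if q >= threshold}
    return fixed_interval_days(len(flows), interval) | storm_days

def high_flow_proportion(sample_days, flows, upper_fraction=0.10):
    """Fraction of sampled days whose flow falls in the high-flow range."""
    threshold = high_flow_threshold(flows, upper_fraction)
    return sum(1 for d in sample_days if flows[d] >= threshold) / len(sample_days)

# Synthetic daily flow record (m³/s) with a 30-day cycle; illustrative only.
flows = [1.0 + (d % 30) for d in range(365)]
fixed = fixed_interval_days(len(flows), 30)
mixed = fixed_interval_plus_storm_days(flows, 30)
print(len(fixed), high_flow_proportion(fixed, flows))
print(len(mixed), high_flow_proportion(mixed, flows))
```

In this deliberately extreme synthetic case, monthly sampling never hits a high flow, while adding every high-flow day pushes the high-flow share to about 73%, far above the 20-30% range reported in the conclusions; a practical strategy would subsample storm days.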


The second objective was to identify the correlation between LOADEST model behavior and water quality datasets. LOADEST regression model 3, which showed the most accurate and precise annual sediment load estimates, was selected. The first step was to identify the correlation between annual sediment load estimates and statistics of the water quality datasets. Because the mean flow in the calibration data (MFC) was correlated with the error in annual sediment load estimates, a regression equation was developed to determine the required MFC. The second step was to evaluate this regression equation by applying it to improve the poorest annual sediment load estimates from the first objective. This work revealed several distinct features of annual sediment load estimation using LOADEST. First, high sampling frequency does not necessarily improve the accuracy and precision of estimated loads. Second, a water quality dataset needs to represent the distribution of the given data and should not be biased toward a specific flow regime.
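LOADEST regression model 3 relates log load to centered log flow and centered decimal time, ln L = a0 + a1·lnQ + a2·dtime (Runkel et al., 2004). The sketch below fits that functional form by ordinary least squares and computes the MFC of the calibration data. It is a simplification under stated assumptions: LOADEST itself estimates coefficients with AMLE/MLE, uses its own centering formulas, and applies a retransformation bias correction, none of which is reproduced here; centering by sample means and the synthetic data are illustrative choices.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_model3(flows, dec_times, loads):
    """OLS fit of ln(L) = a0 + a1*lnQ_c + a2*dtime_c, centered by sample means."""
    ln_q = [math.log(q) for q in flows]
    q_center = sum(ln_q) / len(ln_q)
    t_center = sum(dec_times) / len(dec_times)
    rows = [[1.0, lq - q_center, t - t_center] for lq, t in zip(ln_q, dec_times)]
    y = [math.log(load) for load in loads]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty), q_center, t_center

def mean_flow_calibration(flows):
    """MFC: mean streamflow of the days included in the calibration dataset."""
    return sum(flows) / len(flows)

# Synthetic calibration data generated from known coefficients (1.0, 1.5, 0.2).
flows = [5.0, 12.0, 30.0, 8.0, 60.0, 20.0, 3.0, 45.0]
times = [2000.1, 2000.4, 2000.9, 2001.2, 2001.5, 2002.0, 2002.3, 2002.8]
ln_q = [math.log(q) for q in flows]
q_c, t_c = sum(ln_q) / len(ln_q), sum(times) / len(times)
loads = [math.exp(1.0 + 1.5 * (lq - q_c) + 0.2 * (t - t_c)) for lq, t in zip(ln_q, times)]

coefs, _, _ = fit_model3(flows, times, loads)
print([round(c, 3) for c in coefs], mean_flow_calibration(flows))
```

Changing which days enter `flows` changes the MFC directly, which is the lever the MFC adjustment described in this chapter works on.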

The Web-based LDC Tool was enhanced to allow collection of water quality data via web access and to suggest best management practices (BMPs). The tool already retrieved USGS streamflow data via web access, but users had to supply water quality data. Along with flow data, water quality data are an essential input to load duration curve (LDC) development. Water quality data can be prepared by users or collected from USGS, the Environmental Protection Agency (EPA), or the Water Quality Portal (WQP). Therefore, the web tool was upgraded to automatically access water quality data from USGS and STORET/WQX for all states in the U.S. via web access. LDC is one approach used to develop total maximum daily loads (TMDLs) by identifying the specific flow regimes in which water quality standards are violated. Exceedances in different flow regimes imply potentially different pollutant sources, so a BMP scenario needs to be based on the flow regime in which pollutant loads are exceeded. Therefore, the web tool was improved to select BMPs able to reduce pollutant loads in the flow regime for which pollutant loads are exceeded.
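A load duration curve pairs each daily flow's exceedance percentage with the allowable load, flow × standard concentration, so that observed loads above the curve point to the flow regime in which the standard is violated. The sketch below uses the Weibull plotting position rank/(n + 1) and five regimes bounded at 10, 40, 60, and 90 percent exceedance; both are common conventions (the regime bounds follow EPA's LDC guidance) and are assumptions here, not details taken from the Web-based LDC Tool. The unit conversion is exact: 1 m³/s at 1 mg/l carries 1 g/s, or 86.4 kg/day.

```python
KG_PER_DAY_PER_M3S_MGL = 86.4  # 1 m³/s × 1 mg/l = 1 g/s = 86.4 kg/day

# Upper exceedance bound (%) and name of each flow regime (assumed convention).
REGIMES = [(10.0, "high flows"), (40.0, "moist conditions"), (60.0, "mid-range flows"),
           (90.0, "dry conditions"), (100.0, "low flows")]

def exceedance_percent(flows):
    """Percent of time each flow is equalled or exceeded (Weibull plotting position)."""
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i], reverse=True)
    pct = [0.0] * n
    for rank, i in enumerate(order, start=1):
        pct[i] = 100.0 * rank / (n + 1)
    return pct

def flow_regime(pct):
    """Name of the flow regime containing an exceedance percentage."""
    for upper, name in REGIMES:
        if pct <= upper:
            return name
    return REGIMES[-1][1]

def allowable_load_kg_per_day(flow_m3s, standard_mg_l):
    """Point on the load duration curve: flow × standard concentration."""
    return flow_m3s * standard_mg_l * KG_PER_DAY_PER_M3S_MGL

def exceeding_regimes(flows, samples, standard_mg_l):
    """Regimes with at least one sampled load above the curve.

    flows: full daily flow record (m³/s); samples: {day index: observed load (kg/day)}."""
    pct = exceedance_percent(flows)
    return {flow_regime(pct[d]) for d, load in samples.items()
            if load > allowable_load_kg_per_day(flows[d], standard_mg_l)}

# Usage with a tiny hypothetical record and the 0.08 mg/l phosphorus standard.
flows = [10.0, 1.0, 5.0, 3.0]
print(exceedance_percent(flows))
print(exceeding_regimes(flows, {0: 100.0, 1: 1.0}, standard_mg_l=0.08))
```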

Various models have been used to identify strategies to attain pollutant load limits by facilitating creation of plans that identify BMPs to reduce pollutant loads. The Spreadsheet Tool for the Estimation of Pollutant Load (STEPL) is a spreadsheet model that computes annual direct runoff, sediment load, nutrient loads, and 5-day biological oxygen demand (BOD5). The model computes direct runoff in a watershed based on the Soil Conservation Service Curve Number (SCS-CN) method, but the procedure it uses to calculate annual direct runoff is not scientifically based. Therefore, five approaches to estimate annual direct runoff with the SCS-CN method were explored, including the STEPL approach. The first approach obtained annual runoff by aggregating daily direct runoff computed by the SCS-CN method from measured daily precipitation data. The second and third approaches represented the STEPL approach with initial abstraction coefficients of 0.0 (original) and 0.2 (corrected). In the fourth approach, daily precipitation data were generated by the CLIGEN model, and the fifth approach used the EPA STEPL model itself. In addition, a web-based model was developed to simulate BMPs, and two modules were developed and integrated to calibrate model parameters and to optimize BMPs for cost-effective BMP implementation.


6.2 Conclusions

The research included two major parts. One was to explore the predictive ability of

pollutant load regression models, and the other was to enhance and to develop web-based

tools. The findings from analysis of pollutant load regression model behavior contribute

to improved accuracy of pollutant load regression models, and the web-based tools

enhanced and developed in the research contribute to the advancement of TMDL plan

development.

The results from this research showed that:

Use of extensive water quality data in regression models to estimate pollutant

loads did not necessarily lead to precise and accurate annual pollutant load

estimates. For instance, higher sampling frequency led to better precision in

sediment load estimates, but this did not occur in phosphorus load estimates.

Water quality data used to estimate annual pollutant loads need to include an appropriate proportion of data from storm events: datasets in which 20-30% of the water quality data came from high flows (i.e. the upper 10 percent of flows for a given analysis period) provided the estimated sediment and phosphorus loads closest to measured loads. Extrapolation needs to be avoided when regression models are used to estimate annual pollutant loads; fixed sampling frequencies supplemented with stratified sampling strategies led to more accurate and more precise pollutant load estimates than fixed interval sampling strategies alone. The water quality parameters showed different relationships with streamflow, and therefore a regression model needs to be selected based on the behavior of each water quality parameter.


The mean flow in the calibration data (MFC) used to estimate regression model coefficients was correlated with the error in annual sediment load estimates. Use of a water quality dataset with an appropriate MFC produced smaller errors than use of a large amount of data (e.g. daily water quality data). The load estimates differed significantly from the measured loads if MFC was too small or too large; in other words, use of water quality data biased toward low or high flows led to large errors. The research indicates that a water quality dataset needs to represent the distribution of the given data.

The Web-based LDC Tool was upgraded to allow use of water quality data from USGS and WQP, which provide water quality data for locations throughout the United States. LOADEST is used for annual pollutant load estimation in the Web-based LDC Tool. A module to adjust the MFC of a given water quality dataset was developed and integrated into the web-based tool based on the results from the first and second objectives. The web-based tool identifies the required pollutant reductions for five flow regimes and lists BMPs corresponding to the flow regime for which pollutant loads are exceeded.

The approach used to compute annual direct runoff in EPA STEPL was examined. Annual direct runoff using the EPA STEPL approach differed greatly from annual direct runoff computed by general use of the SCS-CN method, whereas annual direct runoff computed from daily precipitation data generated by CLIGEN showed smaller differences than values computed with the EPA STEPL approaches. Therefore, a web-based model to simulate BMPs, STEPL WEB (https://engineering.purdue.edu/~ldc/STEPL/), was developed that employs the CLIGEN model to generate daily precipitation data and computes annual direct runoff from daily direct runoff. For cost-effective BMP implementation, BMP selection needs to be optimized; therefore, STEPL WEB establishes a priority list of BMPs based on implementation cost per mass of pollutant reduction and then performs cumulative, iterative simulations to identify the most cost-effective BMP implementation plans.

6.3 Recommendations for Future Research

The research identified the behavior of pollutant load regression models for different water quality parameters and sampling strategies. Web-based tools were developed to identify required pollutant reductions, to simulate BMPs, and to optimize BMPs. During the study, the following opportunities for future research were identified.

LOADEST was integrated with the Web-based LDC Tool to estimate annual pollutant loads. LOADEST can be used when water quality data are insufficient to develop an LDC, or when a TMDL plan needs to be based on annual pollutant loads (e.g., ton/year). This study focused on the behavior of pollutant load regression models in estimating annual pollutant loads; the ability of LOADEST to predict daily pollutant loads still needs to be assessed. For instance, estimated daily loads could be evaluated against measured loads not only with goodness-of-fit criteria (e.g., Nash-Sutcliffe efficiency and coefficient of determination) but also by comparing the 90th percentile loads for the five flow regimes. Since TMDLs can be developed on a daily or an annual basis, LOADEST needs to be tested to determine whether its pollutant load estimates are appropriate for daily-based TMDLs.
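The proposed evaluation could be sketched as follows. This is a minimal illustration under stated assumptions: flow regimes are defined by flow exceedance probability using the common EPA load duration curve ranges, exceedance is computed with the Weibull plotting position, and percentiles are linearly interpolated. The function names are hypothetical.

```python
import statistics

# Flow regimes as ranges of flow exceedance probability (%); this follows the
# common EPA load duration curve convention and is an assumption here.
FLOW_REGIMES = [
    ("High flows", 0.0, 10.0),
    ("Moist conditions", 10.0, 40.0),
    ("Mid-range flows", 40.0, 60.0),
    ("Dry conditions", 60.0, 90.0),
    ("Low flows", 90.0, 100.0),
]


def nash_sutcliffe(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the mean."""
    mean_obs = statistics.mean(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def r_squared(observed, estimated):
    """Coefficient of determination as the squared Pearson correlation."""
    mo, me = statistics.mean(observed), statistics.mean(estimated)
    sxy = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((e - me) ** 2 for e in estimated)
    return sxy * sxy / (sxx * syy)


def exceedance_pct(flows):
    """Weibull plotting position, 100 * rank / (n + 1), rank 1 = largest flow."""
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i], reverse=True)
    pct = [0.0] * n
    for rank, i in enumerate(order, start=1):
        pct[i] = 100.0 * rank / (n + 1)
    return pct


def p90_by_regime(flows, loads):
    """90th percentile daily load within each flow regime (regimes with < 2 loads skipped)."""
    pct = exceedance_pct(flows)
    result = {}
    for name, lo, hi in FLOW_REGIMES:
        regime_loads = [ld for ld, p in zip(loads, pct) if lo < p <= hi]
        if len(regime_loads) >= 2:
            result[name] = statistics.quantiles(regime_loads, n=10, method="inclusive")[-1]
    return result
```

Calling `p90_by_regime` once with the measured daily loads and once with the LOADEST estimates, using the same daily flows, gives the regime-by-regime comparison alongside NSE and R².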

Three water quality parameters were considered in this study: nitrogen, phosphorus, and sediment. LOADEST model 3 produced the most accurate and precise annual sediment load estimates, and LOADIN was able to account for the seasonality of pollutant loads that nitrogen requires. Therefore, LOADEST model 3 is recommended for annual sediment load estimates and LOADIN for annual nitrogen load estimates. The collected daily phosphorus data showed both flow-proportional and seasonal behavior. Although both LOADEST and LOADIN gave reasonable annual load estimates when an appropriate portion of the water quality data came from high flows, characteristics of the phosphorus data other than flow-proportional and seasonal behavior that affect annual load estimates may need to be explored. Other pollutant load regression models could also be considered for estimating phosphorus loads.
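LOADIN's formulation is not given in this section, so the sketch below does not reproduce it. It only illustrates how a log-linear load regression can represent seasonality, using the sine and cosine terms of LOADEST model 4, ln L = a0 + a1 lnQ + a2 sin(2π dtime) + a3 cos(2π dtime). The fit uses ordinary least squares with a naive back-transformation, whereas LOADEST itself uses AMLE, MLE, or LAD estimation and corrects for retransformation bias. The function names are hypothetical.

```python
import numpy as np


def _design(lnq_c, t):
    """Columns: intercept, centered ln(flow), and annual sine/cosine terms."""
    return np.column_stack([np.ones_like(lnq_c), lnq_c,
                            np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])


def fit_seasonal_log_load(q, load, dec_time):
    """OLS fit of ln(L) = a0 + a1*lnQc + a2*sin(2*pi*t) + a3*cos(2*pi*t).

    lnQc is ln(Q) centered on its mean; t is decimal time in years.
    Returns the coefficients and the centering value of ln(Q).
    """
    lnq = np.log(np.asarray(q, dtype=float))
    t = np.asarray(dec_time, dtype=float)
    center = lnq.mean()
    X = _design(lnq - center, t)
    coef, *_ = np.linalg.lstsq(X, np.log(np.asarray(load, dtype=float)), rcond=None)
    return coef, center


def predict_load(coef, center, q, dec_time):
    """Naive back-transform exp(fitted ln L), with no bias correction."""
    lnq_c = np.log(np.asarray(q, dtype=float)) - center
    return np.exp(_design(lnq_c, np.asarray(dec_time, dtype=float)) @ coef)
```

Raw decimal time can stand in for LOADEST's centered dtime in the seasonal terms, because shifting time only changes the phase, which the sine and cosine coefficients absorb.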

A web-based model to simulate BMPs, STEPL WEB, is used with the Web-based LDC Tool. STEPL WEB is a lumped model that estimates annual direct runoff, shallow groundwater, and nutrient loads. One benefit of the model is its database of BMP efficiencies, which could also be used with other models such as L-THIA. The Web-based LDC Tool could therefore be integrated with other models that allow BMP simulations for reducing pollutant loads.


APPENDICES


Appendix A Statistics of USGS Stations

Table A 1 provides the statistics of the flow data collected from the USGS stations used in Chapters 2 and 3. Drainage areas were obtained from USGS. The dominant landuse for each station was determined based on the USGS Gap Analysis Program (http://dingo.gapanalysisprogram.com/landcoverv2/). ‘Min.’, ‘Average’, ‘Max.’, and ‘S. D.’ are the minimum, average, and maximum flow and the standard deviation of flow over the flow data period, in cubic meters per second (m³/s).
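The summary columns of the table can be computed from a daily discharge series roughly as follows. This is a minimal sketch that assumes the daily values were retrieved in ft³/s (the unit USGS uses for discharge) and that ‘S. D.’ is the sample standard deviation; the function name is hypothetical.

```python
import statistics

CFS_TO_CMS = 0.028316846592  # 1 ft^3/s expressed in m^3/s (exact conversion)


def flow_statistics(daily_flow_cfs):
    """Min., Average, Max., and S. D. of a daily flow series, in m^3/s.

    Missing days (None) are skipped; the S. D. is the sample standard
    deviation, which is an assumption about how the table was built.
    """
    cms = [q * CFS_TO_CMS for q in daily_flow_cfs if q is not None]
    if len(cms) < 2:
        raise ValueError("need at least two flow values")
    return {
        "min": min(cms),
        "average": statistics.mean(cms),
        "max": max(cms),
        "sd": statistics.stdev(cms),
    }
```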


Table A 1. Statistics of USGS Stations

USGS Station  State  Drainage Area (km²)  Dominant Landuse  Period  Min. (m³/s)  Average (m³/s)  Max. (m³/s)  S. D. (m³/s)

09474000 AZ 46,648 Semi-Desert 1973-1973 0.01 0.41 2.45 0.28

09380000 AZ 289,561 Nonvascular & Sparse Vascular Rock Vegetation 1963-1964 0.56 2.56 16.04 2.73

09402500 AZ 366,742 Forest & Woodland 1969-1971 2.15 10.23 18.36 3.04

10336645 CA 19 Semi-Desert 1982-1991 0.00 0.01 0.33 0.03

10336660 CA 29 Semi-Desert 1986-1986 0.00 0.05 0.74 0.07

11525600 CA 80 Forest & Woodland 1999-2001 0.01 0.04 0.44 0.05

11048500 CA 108 Developed & Other Human Use 1978-1978 0.00 0.02 0.69 0.07

10336610 CA 142 Semi-Desert 1990-1990 0.00 0.03 0.16 0.04


11151870 CA 293 Shrubland & Grassland 1969-1969 0.00 0.29 7.63 0.79




11481000 CA 1,256 Forest & Woodland 1970-1972 0.00 1.31 27.42 2.86


11042000 CA 1,443 Shrubland & Grassland 1975-1977 0.00 0.01 0.16 0.01

11179000 CA 1,639 Shrubland & Grassland 1967-1968 0.00 0.10 5.37 0.32



11467000 CA 3,465 Developed & Other Human Use 1976-1983 0.00 2.18 54.93 5.51







11407700 CA 10,293 Agricultural Vegetation 1966-1975 0.13 4.65 59.74 5.64




11447650 CA Agricultural Vegetation 1994-2003 4.91 21.28 90.61 16.13

09306242 CO 82 Forest & Woodland 1978-1979 0.00 0.00 0.01 0.00



09306200 CO 1,311 Forest & Woodland 1973-1982 0.00 0.02 0.13 0.01

09251000 CO 8,762 Forest & Woodland 1951-1957 0.03 1.18 12.35 2.00

01481500 DE 813 Developed & Other Human Use 1973-1973 0.12 0.52 3.92 0.47

02269160 FL 5,271 Developed & Other Human Use 2008-2010 0.09 0.77 4.46 0.74

02383500 GA 2,152 Forest & Woodland 1961-1962 0.26 1.40 19.49 1.82

16270900 HI 1 1990-1990 0.00 0.00 0.01 0.00

16265600 HI 3 1987-1996 0.00 0.00 0.14 0.01

16272200 HI 10 1991-1992 0.00 0.01 0.26 0.01

16213000 HI 117 1986-1986 0.00 0.02 0.24 0.03

05455000 IA 8 Agricultural Vegetation 1973-1982 0.00 0.00 0.07 0.00



06817000 IA 1,974 Agricultural Vegetation 1983-1984 0.02 0.51 7.19 0.80







05464500 IA 16,861 Developed & Other Human Use 1944-1953 0.17 3.03 43.06 4.20




13341000 ID 6,327 Forest & Woodland 1967-1967 0.24 4.53 21.89 4.61

12318500 ID 34,706 Forest & Woodland 1967-1971 1.76 13.37 76.74 16.04

05570370 IL 107 Agricultural Vegetation 1977-1986 0.00 0.03 0.81 0.05

05591200 IL 1,225 Agricultural Vegetation 1987-1996 0.00 0.36 7.77 0.65

05532500 IL 1,632 Developed & Other Human Use 2003-2009 0.08 0.64 7.14 0.67





07020500 IL 1,835,265 Agricultural Vegetation 1996-2004 45.87 169.87 586.96 98.47

07022000 IL 1,847,179 Agricultural Vegetation 2001-2010 50.12 190.00 663.93 116.31

05588720 IL 22 Agricultural Vegetation 2010-2010 12.99 106.30 352.01 69.03

03340800 IN 360 Agricultural Vegetation 1975-1976 0.00 0.11 2.76 0.25

04182000 IN 1,974 Agricultural Vegetation 1965-1965 0.01 0.32 3.39 0.65

03365500 IN 6,063 Agricultural Vegetation 1969-1971 0.22 1.76 28.06 2.75

07147800 KS 4,869 Shrubland & Grassland 1971-1972 0.00 0.25 5.96 0.53

06869500 KS 7,304 Agricultural Vegetation 1968-1969 0.00 0.04 1.04 0.10






03217000 KY 627 Forest & Woodland 1965-1965 0.00 0.23 3.45 0.43

03308500 KY 4,333 Agricultural Vegetation 1983-1992 0.13 2.16 50.36 3.15

03212500 KY 5,553 Forest & Woodland 1967-1972 0.08 2.16 30.63 3.28



02489500 LA 17,024 Forest & Woodland 1985-1987 1.33 7.26 56.53 9.48

01614500 MD 1,279 Agricultural Vegetation 1977-1979 0.07 0.59 8.98 0.84

01603000 MD 2,271 Forest & Woodland 1969-1978 0.10 1.14 12.67 1.47

01638500 MD 24,996 Agricultural Vegetation 1982-1982 1.24 7.29 61.26 8.48

04102700 MI 217 Agricultural Vegetation 1981-1981 0.02 0.08 0.87 0.09

04102420 MI 805 Agricultural Vegetation 1981-1981 0.18 0.33 1.38 0.16

04176500 MI 2,699 Agricultural Vegetation 1983-2010 0.01 0.02 0.05 0.00

04125350 MI Forest & Woodland 1969-1969 0.01 0.02 0.05 0.00

05293000 MN 1,189 Agricultural Vegetation 1978-1979 0.00 0.08 2.08 0.21




05506500 MO 922 Agricultural Vegetation 1992-1994 0.00 0.28 8.74 0.70

07010000 MO 1,805,222 Agricultural Vegetation 1995-2004 43.30 167.24 638.27 100.27

07273100 MS 91 Agricultural Vegetation 1992-1994 0.00 0.03 1.23 0.09

07287404 MS 161 Forest & Woodland 1990-1999 0.01 0.08 3.50 0.24



07277700 MS 313 Agricultural Vegetation 1992-1994 0.03 0.13 5.78 0.42



06088500 MT 813 Agricultural Vegetation 1976-1981 0.00 0.11 1.40 0.12

12324200 MT 2,577 Forest & Woodland 1993-2002 0.02 0.21 1.59 0.16


06018500 MT 9,373 Semi-Desert 1964-1973 0.02 0.37 1.38 0.20



06130500 MT 20,321 Shrubland & Grassland 1987-1987 0.00 0.07 2.26 0.14



06115200 MT 106,156 Agricultural Vegetation 1984-1990 2.41 6.30 19.49 2.48


06088300 MT 580 Semi-Desert 1972-1981 0.02 0.37 1.38 0.20

02119400 NC 13 Agricultural Vegetation 1959-1968 0.00 0.01 0.06 0.01

05099600 ND 8,676 Agricultural Vegetation 1971-1972 0.00 0.31 7.47 0.64

06486000 NE 814,810 Agricultural Vegetation 1992-1999 5.94 29.49 78.10 13.00

06610000 NE 836,048 Agricultural Vegetation 1993-2002 10.02 32.82 93.01 13.64

06807000 NE 1,061,895 Agricultural Vegetation 2001-2010 6.94 28.55 132.31 14.05

01463500 NJ 17,560 Developed & Other Human Use 1972-1981 1.52 10.79 87.40 10.04

08334000 NM 1,088 Forest & Woodland 2001-2010 0.00 0.01 0.81 0.05





08383000 NM 6,863 Shrubland & Grassland 1983-1984 0.00 0.09 0.91 0.23






08396500 NM 39,627 Shrubland & Grassland 1987-1989 0.00 0.18 0.86 0.22






08354800 NM Forest & Woodland 1975-1984 0.00 0.40 1.52 0.46

08358300 NM Forest & Woodland 1984-1993 0.00 0.31 1.43 0.23

01357500 NY 8,935 Forest & Woodland 2005-2010 0.24 5.93 71.93 6.71

04185440 OH 11 Agricultural Vegetation 2008-2011 0.00 0.01 0.26 0.01






04185000 OH 1,062 Agricultural Vegetation 2008-2011 0.00 0.01 0.26 0.01



04212100 OH 1,774 Introduced & Semi Natural Vegetation 1981-1990 0.00 0.88 12.27 1.36











03150000 OH 19,223 Forest & Woodland 1981-1990 0.45 6.81 35.36 6.32

14306810 OR 3 Forest & Woodland 1968-1968 0.00 0.01 0.05 0.01

01481000 PA 743 Developed & Other Human Use 1968-1969 0.08 0.25 2.81 0.26

01470500 PA 919 Agricultural Vegetation 1978-1980 0.05 0.63 11.23 0.84

01567000 PA 8,687 Agricultural Vegetation 1975-1984 0.36 3.78 69.84 4.63

01570500 PA 62,419 Agricultural Vegetation 1973-1978 3.86 32.88 412.15 34.00

50065500 PR 18 1993-2002 0.00 0.35 11.29 0.94

50048770 PR 19 1992-1992 0.00 0.35 11.29 0.94

50053025 PR 19 1995-1996 0.00 0.35 11.29 0.94

50058350 PR 20 1992-1992 0.00 0.35 11.29 0.94

50071000 PR 39 1999-1999 0.00 0.35 11.29 0.94

50136400 PR 47 2008-2008 0.00 0.35 11.29 0.94

50028000 PR 48 2002-2002 0.01 0.05 0.24 0.04

50055750 PR 58 1995-2004 0.00 0.35 11.29 0.94

50057000 PR 156 1994-1994 0.00 0.35 11.29 0.94

50055000 PR 233 1995-1996 0.00 0.35 11.29 0.94

50043800 PR 281 1995-1996 0.01 0.15 25.98 1.01

50059050 PR 541 1990-1999 0.00 0.35 11.29 0.94


05291000 SD 1,031 Agricultural Vegetation 1974-1980 0.00 0.04 2.93 0.15

06441500 SD 8,151 Shrubland & Grassland 1998-1998 0.01 0.19 2.47 0.32

06452000 SD 25,693 Shrubland & Grassland 2006-2007 0.00 0.27 3.53 0.40

03407876 TN 45 Forest & Woodland 1978-1979 0.00 0.04 0.68 0.08

07030137 TN 207 Agricultural Vegetation 1986-1986 0.00 0.04 2.00 0.14

03584500 TN 4,621 Agricultural Vegetation 1937-1937 0.22 2.75 32.07 4.39

08136500 TX 17,027 Shrubland & Grassland 1979-1980 0.00 0.08 19.08 0.78

08065000 TX 33,237 Agricultural Vegetation 1977-1979 0.31 2.85 29.11 4.67

08066500 TX 44,512 Forest & Woodland 1969-1970 0.27 5.79 38.49 8.56

08161000 TX 107,847 Agricultural Vegetation 1963-1972 0.09 1.88 37.37 2.56

09379500 UT 59,570 Forest & Woodland 1970-1979 0.06 1.71 27.82 1.82




01658500 VA 20 Developed & Other Human Use 2005-2005 0.00 0.01 0.21 0.02

01664000 VA 1,603 Agricultural Vegetation 1990-1991 0.01 0.52 7.27 0.65

02075500 VA 6,700 Agricultural Vegetation 1971-1980 0.36 2.66 50.84 3.13

02066000 VA 7,682 Forest & Woodland 1971-1980 0.23 2.91 56.53 3.81

13351000 WA 6,475 Agricultural Vegetation 1962-1962 0.01 0.32 4.78 0.53

054310157 WI 11 Agricultural Vegetation 1999-2008 0.00 0.00 0.08 0.01








03199000 WV 697 Forest & Woodland 1975-1977 0.01 0.31 7.04 0.47

03200500 WV 2,233 Forest & Woodland 1981-1983 0.04 0.82 11.23 1.10

06250000 WY 922 Semi-Desert 1950-1957 0.00 0.06 0.40 0.05

06253000 WY 1,083 Semi-Desert 1959-1968 0.01 0.13 0.70 0.09

06317000 WY 15,669 Semi-Desert 1975-1976 0.00 0.25 5.21 0.36

09217000 WY 36,260 Semi-Desert 1982-1991 0.16 1.44 11.47 1.70

06279500 WY 40,823 Semi-Desert 1954-1963 0.24 1.50 13.39 1.14


Appendix B 95% Confidence Intervals of Ratios

The six figures that follow (Figures B 1-6) show the 95% confidence intervals of the ratio of estimated to measured pollutant load for different sampling strategies and for the regression models LOADIN (LD) and LOADEST (LT). The number following LT indicates the LOADEST model number; for instance, LT(3) denotes LOADEST model 3. Each model in each graph has three 95% confidence intervals: the first for the monthly sampling strategy (M), the second for the biweekly sampling strategy (B), and the third for the weekly sampling strategy (W).
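The appendix does not state how the intervals were computed. The sketch below assumes that each interval summarizes the ratios from repeated subsampling runs of one model and one sampling strategy, and takes their empirical 2.5th and 97.5th percentiles; a t-based interval on the mean ratio would be an alternative. The function name and inputs are hypothetical.

```python
import statistics


def ratio_ci95(estimated_loads, measured_load):
    """Empirical 95% interval of estimated/measured load ratios.

    estimated_loads: annual loads estimated from repeated subsamples
    (hypothetical input); measured_load: the measured annual load.
    Returns the 2.5th and 97.5th percentiles of the ratios.
    """
    ratios = [e / measured_load for e in estimated_loads]
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(ratios, n=40, method="inclusive")
    return cuts[0], cuts[-1]
```

Computing this once per model and strategy, for example for keys such as `("LT(3)", "M")`, `("LT(3)", "B")`, and `("LT(3)", "W")`, gives the three intervals drawn for each model.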

Figure B 1 Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies


Figure B 2 Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies Supplemented with Stratified Sampling


Figure B 3 Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies


Figure B 4 Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies Supplemented with Stratified Sampling



Figure B 5 Ratio Comparison of Nitrogen Estimation by Fixed Sampling Frequencies


Figure B 6 Ratio Comparison of Nitrogen Estimation by Fixed Sampling Frequencies Supplemented with Stratified Sampling



Appendix C Relationships between Flow, Concentration, Logarithm Flow, Logarithm Load, and Squared Logarithm Flow

Figures C 1-7 illustrate how LOADEST model behavior changes with the percentage of calibration data from high flow (PCH; see Chapters 2 and 3). These supplementary figures for Chapters 2 and 3 indicate that an appropriate portion of the water quality data must come from the high flow regime and that the relationship between flow and concentration affects annual sediment load estimates.
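PCH is defined in Chapters 2 and 3, which are not part of this appendix. The sketch below assumes that the high flow regime covers flows exceeded at most 10% of the time in the daily record, and counts the calibration samples taken at or above that threshold. The function names are hypothetical.

```python
import statistics


def high_flow_threshold(daily_flows):
    """Flow exceeded 10% of the time, i.e. the 90th percentile of the daily record."""
    return statistics.quantiles(daily_flows, n=10, method="inclusive")[-1]


def pch(sample_flows, daily_flows):
    """Percentage of calibration samples collected in the high flow regime."""
    if not sample_flows:
        raise ValueError("no calibration samples")
    threshold = high_flow_threshold(daily_flows)
    n_high = sum(1 for q in sample_flows if q >= threshold)
    return 100.0 * n_high / len(sample_flows)
```

Under this definition, adding samples from high-flow days to a subsample raises its PCH, which is how the subsamples in Figures C 1 to C 7 move from 0% to 65%.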

The figures are scatter plots of (a) flow (cubic meters per second; cms) and sediment concentration (mg/L), (b) logarithm flow and logarithm sediment load (kg), and (c) squared logarithm flow and logarithm sediment load.

Daily flow and sediment concentration data were collected from USGS station 10336610, and the annual measured load (the sum of measured daily loads) was 491,387 kg/year. The PCH of the subsampled data was initially 0% (Figure C 1) and was increased up to 65% (Figure C 7) by adding water quality data collected in the high flow regime. The flow and sediment concentration data (Figure C 1-7 (a)) were used to run LOADEST model 1 (the simplest model in LOADEST) and model 9 (the most complex), and the estimated sediment loads increased as PCH increased. Error was computed with Eq. 3.6 in Chapter 3. Because LOADEST assumes that pollutant load is an exponential function of explanatory variables such as logarithm flow and squared logarithm flow, logarithm sediment load was plotted against logarithm flow (Figure C 1-7 (b)) and squared logarithm flow (Figure C 1-7 (c)).
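The two LOADEST predefined models compared in these figures can be written as follows, following the LOADEST documentation, where L is load, ln Q is the centered logarithm of flow, and dtime is centered decimal time. Eq. 3.6 is not reproduced here; the relative-error form E is an assumption, chosen because it reproduces the -26% reported for model 1 in Figure C 1 (and gives about 6% for the 520,523 kg/year model 9 estimate).

```latex
\begin{align*}
\text{Model 1:}\quad \ln L &= a_0 + a_1 \ln Q \\
\text{Model 9:}\quad \ln L &= a_0 + a_1 \ln Q + a_2 (\ln Q)^2 + a_3 \sin(2\pi\,dtime)
  + a_4 \cos(2\pi\,dtime) + a_5\,dtime + a_6\,dtime^2 \\
E &= \frac{L_{est} - L_{meas}}{L_{meas}} \times 100\%
   = \frac{364{,}360 - 491{,}387}{491{,}387} \times 100\% \approx -26\%
\end{align*}
```

Because the logarithm of load is linear in ln Q and (ln Q)², panels (b) and (c) show directly whether the data support the straight-line and quadratic flow terms of these models.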


Figure C 1 Scatter plot of flow, concentration, and load with PCH of 0%: (a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load, and (c) squared logarithm flow and logarithm sediment load

Number of water quality samples: 12

Estimated load by LOADEST model 1: 364,360 kg/year (Error: -26%)



Estimated load by LOADEST model 9: 520,523 kg/year (Error: 6%)


VITA


Youn Shik Park was born in Wonju-si, South Korea. He received his Bachelor of Engineering degree in Agricultural Engineering from Kangwon National University, South Korea, in 2007, and a Master of Engineering in Agricultural Engineering from the same university in 2009. He joined the Ph.D. program of the Department of Agricultural and Biological Engineering at Purdue University in August 2010.
