PURDUE UNIVERSITY GRADUATE SCHOOL
Thesis/Dissertation Acceptance
Thesis/Dissertation Agreement, Publication Delay, and Certification/Disclaimer (Graduate School Form 32) adheres to the provisions of
Department
Youn Shik Park
Development and Enhancement of Web-based Tools to Develop Total Maximum Daily Load
Doctor of Philosophy
Bernard A. Engel
Jonathan M. Harbor
Indrajeet Chaubey
Venkatesh M. Merwade
Bernard A. Engel
Bernard A. Engel 03/26/2014
DEVELOPMENT AND ENHANCEMENT OF WEB-BASED TOOLS
TO DEVELOP TOTAL MAXIMUM DAILY LOAD
A Dissertation
Submitted to the Faculty
of
Purdue University
by
Youn Shik Park
In Partial Fulfillment of the
Requirements for the Degree
of
Doctor of Philosophy
May 2014
Purdue University
West Lafayette, Indiana
To my parents, brother, and N. Kim
ACKNOWLEDGEMENTS
I was able to finish my dissertation and Ph.D. program thanks to the many people I met
along the way. I could not have had happier moments, because I was with people who
raised me up when I was frustrated.
I would like to express my deepest gratitude to my advisor, Dr. Bernie Engel, for his
excellent guidance, patience, and support, and for providing me with opportunities to do
what I wanted to try and suggestions to resolve the problems I had. I spent only several
years here, but I am convinced that those years with him are enough to change my next
thirty years. I am also grateful to Dr. Jon Harbor, Dr. Indrajeet Chaubey, and Dr.
Venkatesh Merwade for serving on my research committee and for their excellent comments
and suggestions.
I would like to thank Larry Theller (also known as Uncle Larry) for assistance in web
programming and GIS; it was delightful to travel to Chicago and Ann Arbor for projects.
I am grateful to Barbara Davies and Rebecca Peer for their kindness (and for the
five-dollar-per-month coffee as well).
I would also like to thank Dr. Kyoung Jae Lim, who introduced me to the Ph.D. program at
Purdue University and helped me gain programming skills for my research at Purdue
University.
Nayoung Kim, you are the person to whom I am most thankful. Thank you. I have been seizing the
days with you, and I will continue to do so, but I won’t let the days seize me.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1. INTRODUCTION
1.1 Problem Statement
1.2 Objectives
1.2.1 Proposed Objectives for Annual Load Estimation
1.2.2 Proposed Objectives for Enhancement and Development of TMDL Models
1.3 Dissertation Organization
1.4 References
CHAPTER 2. ANALYSIS FOR REGRESSION MODEL BEHAVIOR BY SAMPLING STRATEGY FOR ANNUAL LOAD ESTIMATION
2.1 Abstract
2.2 Introduction
2.3 Methodology
2.3.1 Water Quality Data
2.3.2 Subsampling Methods and Regression Models
2.4 Results and Discussions
2.4.1 Ratio Comparison of Sampling Strategies and Regression Models
2.4.2 Water Quality Data from High Flow
2.4.3 Improvement of Annual Load Estimation
2.5 Conclusions
2.6 References
CHAPTER 3. IDENTIFYING THE CORRELATION BETWEEN WATER QUALITY DATA AND LOADEST MODEL BEHAVIOR
3.1 Abstract
3.2 Introduction
3.3 Methodology
3.3.1 Water Quality Data Statistics for Annual Load Estimates
3.3.2 Water Quality Data Selection for LOADEST runs
3.4 Results and Discussions
3.4.1 Required Statistics for Annual Load Estimates
3.4.2 Mean Flow in Calibration Data and Annual Load Estimates
3.4.3 Improvement of the Poorest Annual Load Estimates
3.5 Conclusions
3.6 References
CHAPTER 4. A WEB TOOL FOR STORET/WQX WATER QUALITY DATA RETRIEVAL AND BEST MANAGEMENT PRACTICE SCENARIO IDENTIFICATION
4.1 Abstract
4.2 Introduction
4.3 Methodology
4.3.1 Module Development to Use Water Quality Data
4.3.2 Module Development to Suggest BMP Scenarios
4.4 Application of the Web Tool
4.5 Conclusions
4.6 References
CHAPTER 5. A WEB MODEL TO ESTIMATE THE IMPACT OF BEST MANAGEMENT PRACTICES
5.1 Abstract
5.2 Introduction
5.3 Methodology
5.3.1 Annual Direct Runoff Computations
5.3.2 Web Interfaces and CLIGEN Use
5.3.3 Auto-Calibration Modules
5.3.4 Optimization of Best Management Practices
5.4 Results
5.4.1 Annual Direct Runoff Computations
5.4.2 Application of STEPL WEB
5.5 Conclusions
5.6 References
CHAPTER 6. SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
6.1 Summary
6.2 Conclusions
6.3 Recommendations for Future Research
APPENDICES
Appendix A Statistics of USGS Stations
Appendix B 95% Confidence Intervals of Ratios
Appendix C Relationships between Flow, Concentration, Logarithm Flow, Logarithm Load, and Squared Logarithm Load
VITA
LIST OF TABLES
Table 2.1 Number of Stations and Total Years
Table 2.2 Improvement of Annual Load Estimates by Increasing PCH Data
Table 3.1 Various water quality data for LOADEST uses
Table 3.2 Statistics in Calibration and Estimation Data
Table 3.3 Daily Sediment Data from USGS Stations
Table 3.4 Comparison of Errors between Regression Equation and All Data
Table 3.5 Improvement of the Poorest Load Estimates by MFC Fitting
Table 4.1 BMP Categories for Each Flow Regime (USEPA, 2007)
Table 4.2 Default BMP Costs for Landuses
Table 4.3 Target Load, 90th Percentile Load, and Required Reduction Percentage
Table 5.1 Daily Precipitation Collection from NCDC
Table 5.2 Annual Precipitation and Direct Runoff Computations
Table 5.3 Landuse Distribution for Study Watershed
Table 5.4 STEPL WEB Parameters (Default / Calibrated)
Table 5.5 Annual Direct Runoff, Baseflow, and Sediment load
Appendix Table
Table A 1. Statistics of USGS Stations
LIST OF FIGURES
Figure 2.1 Locations of Water Quality Data Stations
Figure 2.2 Correlation Coefficients for Concentrations of Water Quality Parameters with Streamflow
Figure 2.3 Example of Subsampled Dataset
Figure 2.4 Examples of Pollutant Load Estimates (Sed: sediment, Phs: phosphorus, Mn: monthly, Bi: biweekly, Wk: weekly, Fx: fixed interval sampling, St: fixed interval sampling with storm events, 0 and 1: LOADEST model number 0 and 1)
Figure 2.5 Mean and Width of 95% Confidence Intervals by Percentage of Calibration Data from High Flow
Figure 2.6 Seasonal Variation in Blanchard River near Findlay, Ohio Nitrogen Data from the National Center for Water Quality Research of Heidelberg University
Figure 3.1 Correlation between Errors and Mean of Flow in Calibration Data
Figure 3.2 Correlation between Mean of Flows in Calibration and Estimation Data
Figure 3.3 Required MFC by Regression Equation
Figure 3.4 Comparison of Slopes from Linear Regression Formula (LRS in equation 3.7) and Calibrated LOADEST Model (a1 in equation 3.4)
Figure 3.5 Annual Sediment Load Estimates by MFCs
Figure 3.6 Load Estimate Improvement when Excluding Water Quality Samples (USGS Station Number 02119400, Monthly Fixed Sampling Strategy on 19th of Every Month)
Figure 4.1 Google Maps Interface to Retrieve USGS Water Quality Data
Figure 4.2 Schematic Depicting Web-based Tool Access of Water Quality Data from EPA STORET/WQX Location Database and Web Access to WQP
Figure 4.3 Flow Data Collection by USGS Flow Station Location Tool
Figure 4.4 Water Quality Data Collection by WQP Location Tool
Figure 4.5 Load Duration Curve for the Study Watershed
Figure 5.1 Location Map of NCDC and CLIGEN Stations within Indiana
Figure 5.2 Annual Direct Runoff, Groundwater, and Pollutant Load Computation in STEPL WEB
Figure 5.3 Comparison of Annual Direct Runoff by Different Approaches
Figure 5.4 Landuses of Tippecanoe River at North Webster Watershed
Appendix Figure
Figure B 1. Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies
Figure B 2. Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies Supplemented with Stratified Sampling
Figure B 3. Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies
Figure B 4. Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies Supplemented with Stratified Sampling
Figure B 5. Ratio Comparison of Nitrogen Estimation by Fixed Sampling Frequencies
Figure B 6. Ratio Comparison of Nitrogen Estimations by Fixed Sampling Frequencies Supplemented with Stratified Sampling
Figure C 1. Scatter plot of flow, concentration, and load with PCH of 0%
Figure C 2. Scatter plot of flow, concentration, and load with PCH of 14%
Figure C 3. Scatter plot of flow, concentration, and load with PCH of 25%
Figure C 4. Scatter plot of flow, concentration, and load with PCH of 37%
Figure C 5. Scatter plot of flow, concentration, and load with PCH of 46%
Figure C 6. Scatter plot of flow, concentration, and load with PCH of 56%
Figure C 7. Scatter plot of flow, concentration, and load with PCH of 65%
ABSTRACT
Park, Youn Shik. Ph.D., Purdue University, May 2014. Development and Enhancement
of Web-based Tools to Develop Total Maximum Daily Load. Major Professor: Bernard
A. Engel.
Flow and load duration curves (FDCs and LDCs) are commonly used to develop total
maximum daily loads (TMDLs). A web-based tool was previously developed to facilitate
development of FDC and LDC, allowing use of USGS streamflow data via web access.
In the research reported here, the tool has been upgraded to retrieve water quality data
from STORET/WQX and USGS, because significant effort is often required to obtain
water quality data, and additional tools were developed to assist in decision making for
best management practices (BMPs) selection.
The Web-based LDC Tool employs LOADEST and LOADIN to estimate daily pollutant
loads using intermittent water quality data; therefore, LOADEST and LOADIN were
evaluated for annual pollutant load estimations. Daily nitrogen, phosphorus, and sediment
concentration data were collected and subsampled using six sampling strategies. Since
the water quality parameters showed different relationships with streamflow for each
sampling strategy, it was concluded that pollutant regression models need to be selected
based on water quality parameters. In addition, water quality data used to estimate annual
pollutant loads need to include an appropriate proportion of water quality data from storm
events, with 20-30% of water quality data from high flow (i.e., the upper 10 percent of
flows for a given analysis period) providing estimated sediment and
phosphorus loads closest to measured loads.
After the Web-based LDC Tool identifies pollutant loads exceeding standards and
computes the required pollutant reduction to meet standard loads, a model capable of
simulating BMPs was required. The Spreadsheet Tool for the Estimation of Pollutant
Load (STEPL), a spreadsheet model to estimate annual pollutant loads, was evaluated as
the basis for the BMP model. STEPL computes annual direct runoff using the Soil
Conservation Service Curve Number (SCS-CN) method with average rainfall per event.
Annual direct runoff using the EPA STEPL approach showed large differences compared
to the annual direct runoff computed by general use of the SCS-CN method. However,
annual direct runoff computed from daily precipitation data generated by CLIGEN
showed smaller differences than values computed with the EPA STEPL approach.
Therefore, a web-based model to simulate BMPs, STEPL WEB, was developed that
computes annual direct runoff from daily precipitation data generated by
CLIGEN. STEPL WEB establishes a priority list of BMPs based on implementation cost
per mass of pollutant reduction, and then the model performs iterative simulations to
identify the most cost-effective BMP implementation plans.
CHAPTER 1. INTRODUCTION
1.1 Problem Statement
Total Maximum Daily Load (TMDL) is a water quality (WQ) standard designed to
preserve and regulate the quality of watersheds in the USA. Section 303(d) of the Clean
Water Act indicates that states and other defined authorities having contaminated water
need to establish priority rankings and develop TMDL plans to meet identified water
quality standards.
Watershed models are commonly used to develop TMDLs and to evaluate existing
pollutant loads. One of the greatest benefits of using watershed models is that they allow
consideration of the specific characteristics or conditions of a watershed, using temporal
and spatial data. However, this is also somewhat of a disadvantage, because such models
require not only a wide range of inputs but also expertise to prepare the inputs and to use
the models. Moreover, models differ in purpose, applicability, and uncertainty, and they
rely on different structures and assumptions to represent complex natural
systems. Models have various advantages and disadvantages, and it may
be necessary to combine two or more models when solving a particular problem (Babbar-
Sebens and Karthikeyan, 2009; Shen and Zhao, 2010; Cleland, 2003; USEPA, 2008).
Load duration curve (LDC) analysis, a relatively simple approach requiring less input and
effort than watershed models, identifies the existing and allowable pollutant loads, and
requires streamflow and water quality (WQ) datasets. LDC analysis is a statistical
approach using cumulative streamflow and pollutant loads associated with streamflow
(USEPA, 2007).
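As an illustration of the statistical approach described above, the following Python sketch ranks daily streamflow into a flow duration curve and converts it into an allowable-load curve. It is a minimal sketch under stated assumptions and not the implementation of the Web-based LDC Tool: the Weibull plotting position, the 10 mg/L concentration standard, the sample flows, and the function names are hypothetical choices made only for this example.

```python
def flow_duration_curve(flows):
    """Return (exceedance_percent, flows_sorted_high_to_low).

    Assumption: the Weibull plotting position rank / (n + 1) is used;
    other plotting positions are equally possible.
    """
    q = sorted((float(v) for v in flows), reverse=True)
    n = len(q)
    exceed = [100.0 * rank / (n + 1) for rank in range(1, n + 1)]
    return exceed, q


def allowable_loads(flows_cms, standard_mg_per_l):
    """Allowable daily load in kg/day for each flow in m^3/s.

    1 m^3/s * 1 mg/L = 1 g/s = 86.4 kg/day.
    """
    return [q * standard_mg_per_l * 86.4 for q in flows_cms]


# Hypothetical daily mean streamflow (m^3/s); not data from this study.
daily_flow = [12.0, 3.5, 0.8, 5.1, 20.4, 1.9, 7.3]

exceed, q_sorted = flow_duration_curve(daily_flow)
ldc = allowable_loads(q_sorted, standard_mg_per_l=10.0)  # hypothetical standard

for pct, q, load in zip(exceed, q_sorted, ldc):
    print(f"{pct:5.1f}% exceedance  flow={q:5.1f} m3/s  allowable={load:9.1f} kg/day")
```

Observed loads (measured concentration times flow on the sampling day) plotted at the same exceedance percentages can then be compared against this curve to see under which flow conditions the allowable load is exceeded.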
A web-based tool to develop flow duration curves (FDC) and LDC was developed by
Kim et al. (2012). The Web-based LDC Tool (https://engineering.purdue.edu/~ldc/) uses
not only the user’s streamflow data but also USGS streamflow data via web access. The
tool simplifies development of FDC and LDC with user-friendly tabular and graphical
interfaces. The tool required users to prepare WQ data, which is fundamental data
required to develop LDC. Significant effort is often required to collect the WQ data, and
thus, there was a need to enhance the tool to access water quality data from
STORET/WQX and USGS.
To estimate annual pollutant loads and required reduction of pollutant loads, WQ data
associated with streamflow needs to have an identical temporal resolution to the
streamflow data. However, WQ data is typically intermittent. The tool employs
LOADEST to estimate daily pollutant loads from intermittent WQ data; however, the
regression models of LOADEST need to be evaluated to identify which regression
models are appropriate for annual pollutant load estimation and which regression model
shows the ‘best fit’ to the ‘true load’. Therefore, the regression models of LOADEST and
another regression model used in the Web-based Load Interpolation Tool (LOADIN), a
web-based tool to interpolate pollutant loads, were evaluated in this study. Along with the
regression model used to estimate daily loads, sampling frequency is assumed to be one of
the influential factors in estimation of annual pollutant loads. To evaluate the regression
models, various sampling frequency strategies were evaluated to identify the required
number of WQ samples per year for reasonable annual pollutant load estimation. For the
evaluation of sampling frequency strategies and regression models, daily WQ data were
collected from 21 stations for total nitrogen, 69 stations for phosphorus, and 211 stations
for suspended sediment. The collected daily WQ data were artificially degraded to
evaluate the effect of different sampling frequencies (i.e., weekly, biweekly, and monthly
fixed sampling intervals) on LOADEST and LOADIN results.
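The degradation of a daily record into a fixed-interval sample, followed by a regression-based load estimate, can be sketched in Python as below. This is a simplified illustration and not LOADEST or LOADIN: it assumes an ordinary least-squares fit of ln(load) = a0 + a1 ln(flow) with no retransformation bias correction, uses a synthetic record, and compares estimated and measured totals as a ratio. All function names and data are hypothetical.

```python
import math


def subsample_fixed(records, interval_days, start=0):
    """Keep every `interval_days`-th daily record, beginning at index `start`."""
    return records[start::interval_days]


def fit_log_linear(samples):
    """Ordinary least-squares fit of ln(load) = a0 + a1 * ln(flow)."""
    xs = [math.log(q) for q, _ in samples]
    ys = [math.log(load) for _, load in samples]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    return a0, a1


def estimate_total_load(flows, a0, a1):
    """Sum of daily loads predicted by the fitted rating curve."""
    return sum(math.exp(a0 + a1 * math.log(q)) for q in flows)


# Synthetic 360-day record of (flow, load); load follows 2 * flow**1.5 exactly.
daily = [(q, 2.0 * q ** 1.5) for q in (1.0 + (d % 37) for d in range(360))]

weekly = subsample_fixed(daily, 7)       # weekly fixed-interval sampling
a0, a1 = fit_log_linear(weekly)

measured = sum(load for _, load in daily)
estimated = estimate_total_load([q for q, _ in daily], a0, a1)
ratio = estimated / measured             # 1.0 means a perfect estimate

print(f"samples={len(weekly)}  a0={a0:.3f}  a1={a1:.3f}  ratio={ratio:.3f}")
```

Passing 14 or 30 as `interval_days` gives the biweekly and monthly cases. With real data the relationship between load and flow is noisy, so the ratio departs from 1 and depends on which days the subsample happens to include.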
A simple LDC analysis allows a user to identify sources that contribute to pollutant loads
exceeding the TMDL and can help in suggesting Best Management Practices (BMPs) (i.e.
for nonpoint or point source pollution) to reduce pollutant loads in a watershed. However,
the original Web-based LDC tool was not designed to simulate BMPs to reduce pollutant
loads. To supplement the Web-based LDC Tool in terms of simulating BMP effects, the
Spreadsheet Tool for the Estimation of Pollutant Load (STEPL) was selected, which is a
spreadsheet-based model that estimates surface runoff, sediment load, nutrient loads, and
5-day biological oxygen demand (BOD5). The database incorporated with the model is
useful not only for annual runoff and pollutant load estimation but also for simulating the
implementation of various Best Management Practices (BMPs). Although STEPL
provides annual runoff and pollutant load estimation, the model has limitations
introduced by the way in which it calculates runoff. The model calculates runoff based on
the SCS-Curve Number (CN) method using a default initial abstraction and does not
adjust the CN and the SCS-CN equation to reflect changing initial abstraction.
Additionally, STEPL uses ‘average rainfall per event’ to reproduce annual runoff, rather
than actual rainfall data, and it is important to determine the value of using an alternate
approach. To correct the limitations of STEPL, a web-based model was developed to
integrate the revised STEPL with the Web-based LDC Tool.
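To show why the choice of rainfall input matters, the sketch below applies the standard SCS-CN equations (S = 1000/CN - 10 in inches, Ia = 0.2 S by default, Q = (P - Ia)^2 / (P - Ia + S) when P > Ia) to a short rainfall series in two ways: event by event, and with a single average rainfall depth applied to every event. It is an illustration only and does not reproduce the STEPL computation; the rainfall depths, the curve number of 80, and the function name are hypothetical.

```python
def scs_cn_runoff(p_in, cn, ia_ratio=0.2):
    """Direct runoff depth (inches) from rainfall depth p_in (inches).

    ia_ratio is the initial abstraction ratio (Ia = ia_ratio * S); 0.2 is
    the conventional default.
    """
    s = 1000.0 / cn - 10.0          # potential maximum retention (inches)
    ia = ia_ratio * s               # initial abstraction (inches)
    if p_in <= ia:
        return 0.0
    return (p_in - ia) ** 2 / (p_in - ia + s)


# Hypothetical rainfall depths (inches) for eight rain days.
daily_rain = [0.1, 0.3, 2.5, 0.2, 1.2, 0.05, 3.0, 0.4]
cn = 80

# Approach 1: apply the equation to each day's rainfall and sum the runoff.
runoff_daily = sum(scs_cn_runoff(p, cn) for p in daily_rain)

# Approach 2: apply the equation once to the average depth per event,
# then multiply by the number of events.
avg_event = sum(daily_rain) / len(daily_rain)
runoff_avg = len(daily_rain) * scs_cn_runoff(avg_event, cn)

print(f"event-by-event: {runoff_daily:.3f} in   average-event: {runoff_avg:.3f} in")
```

Because the SCS-CN relationship is nonlinear and produces no runoff below the initial abstraction, averaging the rainfall first suppresses the contribution of the large events, so the two totals differ for this series.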
1.2 Objectives
The overall goal of this study was to develop a framework for TMDL development using
web-based tools that include the Web-based LDC Tool with LOADEST and a Web-based
STEPL. First, water quality sampling frequency strategies and regression models to
interpolate intermittent WQ data and to estimate annual pollutant load were evaluated.
Second, the Web-based LDC Tool was improved to collect WQ data via web access, and
the Web-based STEPL was developed to estimate annual pollutant loads and to simulate
BMP effects.
1.2.1 Proposed Objectives for Annual Load Estimation
The LOADEST model has been used with intermittent water quality data in various
sampling frequencies such as monthly (Tamm et al. 2008; Qian et al. 2007; Das et al.
2011; Brigham et al. 2009), biweekly (Schuster et al. 2011; Dornblaser and Striegl, 2009),
weekly (Duan et al. 2012), or intervals of a few days (Horowitz, 2003; Raymond and Oh,
2007; Heimann et al. 2011). Evaluation of the predictive ability of pollutant load
regression models with various sampling strategies is needed to identify the correlation
between LOADEST model behavior and water quality datasets. These results can be
interpreted and reflected in the Web-based LDC Tool development.
The specific objectives of the study were to:
1. Evaluate water quality sampling frequency strategies and 10 regression models to
estimate annual pollutant loads. The regression models to be evaluated in this study
were nine LOADEST regression models (numbered 1 to 9) and one regression model
using a genetic algorithm.
2. Identify the correlation between LOADEST model behavior and water quality datasets
for various proportions of water quality data from storm events.
1.2.2 Proposed Objectives for Enhancement and Development of TMDL Models
The Web-based LDC Tool provides various benefits in development of FDC and LDC,
deriving streamflow data from the USGS data server, integrating with LOADEST, and
allowing additional analysis (e.g. seasonal variations and surface flow separation).
However, enhancement of the Web-based LDC Tool was needed to improve ease-of-use
and to allow further analysis such as BMP suggestion or simulation. Benefits in the use of
STEPL, the other model used in this study, are that most inputs for the model are
provided by a model database and it allows simulating impacts of various BMPs.
However, there was a need for the model to be modified to improve the annual load
estimates, and to be developed into a web-based model to interact with the Web-based
LDC Tool. The specific objectives of the study were to:
1. Improve the Web-based LDC Tool to allow collection of USGS and STORET/WQX
WQ data via web access.
2. Develop a module to suggest BMP scenarios to reduce annual pollutant load to meet
desired WQ goals.
3. Develop a web-based model to estimate annual WQ pollutant load and to simulate
BMPs to meet the required reduction of annual pollutant load.
1.3 Dissertation Organization
The dissertation contains six chapters. This chapter is an introduction and provides
general background on pollutant regression models, LDC, and STEPL for research needs
and objectives. Chapters 2-5 discuss in detail the methods and results related to the
objectives proposed in the previous section. These chapters were prepared in a format
suitable for journal submission, and thus each chapter contains an introduction, methods, results, and
conclusions. Chapter 2 evaluates the predictive ability of pollutant load regression models
with various sampling strategies for annual pollutant load estimations. Chapter 3 suggests
an approach to prepare water quality datasets for annual pollutant load estimation using
LOADEST. Chapter 4 covers the enhancement of the Web-based LDC Tool to allow
automated collection of water quality data via web access and to identify BMPs to reduce
pollutant load to meet the required pollutant load reduction. Chapter 5 examines the
annual direct runoff approach in EPA STEPL and covers the development of a web-based
model capable of simulating annual pollutant load reductions for BMPs. Chapter 6 is an
overall conclusion and summary that provides an overview of the research and
recommendations for future studies.
1.4 References
Babbar-Sebens, M., Karthikeyan, R., 2009. Consideration of sample size for estimating
contaminant load reductions using load duration curves. Journal of Hydrology 372,
118-123.
Brigham, M. E., Wentz, D. A., Aiken, G. R., and D. P. Krabbenhoft. 2009. Mercury
cycling in stream ecosystems. 1. Water column chemistry and transport.
Environmental Science and Technology. 43: 2720-2725.
Chen, D., Lu, J., Wang, H., Shen, Y., Gong, D., 2011. Combined inverse modeling
approach and load duration curve method for variable nitrogen total maximum daily
load development in an agricultural watershed. Environmental Science Pollution
Research 18, 1405-1413.
Das, S. K., Ng, A. W. M., and B. J. C. Perera. 2011. Assessment of nutrient and sediment
loads in the Yarra River catchment. 19th International Congress on Modeling and
Simulation, Perth, Australia. 3490-3496.
Dornblaser, M. M. and R. G. Striegl. 2007. Nutrient (N,P) loads and yields at multiple
scales and subbasin types in the Yukon River basin, Alaska. Journal of Geophysical
Research. 112:G04S57.
Duan, S., Kaushal, S. S., Groffman, P. M., Band, L. E., and K. T. Belt. 2012. Phosphorus
export across an urban to rural gradient in the Chesapeake Bay watershed. Journal of
Geophysical Research. 117: G01025.
Heimann, D. C., Sprague, L. A., and D. W. Blevins. 2011. Trends in suspended-sediment
loads and concentrations in the Mississippi River Basin, 1950-2009. U. S. Geological
Survey Scientific Investigations Report 2011-5200.
Horowitz, A. J. 2003. An evaluation of sediment rating curves for estimating suspended
sediment concentrations for subsequent flux calculations. Hydrological Processes. 17:
3387-3409.
Kim, J., Engel, B. A., Park, Y. S., Theller, L., Chaubey, I., Kong, D. S., and K. J. Lim.
2012. Development of Web-based Load Duration Curve System for analysis of total
maximum daily load and water quality characteristics in a waterbody. Journal of
Environmental Management. 97: 46-55.
Qian, Y., Migliaccio, K. W., Wan, Y., Li, Y. 2007. Trend analysis of nutrient
concentrations and loads in selected canals of the Southern Indian River lagoon,
Florida. Water Air Soil Pollut. 186: 195-208.
Raymond, P. A., McClelland, J. W., Holmes, R. M., Zhulidov, A. V., Mull, K., Peterson,
B. J., Striegl, R. G., Aiken, G. R., and T. Y. Gurtovaya. 2007. Flux and age of
dissolved organic carbon exported to the Arctic Ocean: A carbon isotopic study of the
five largest arctic rivers. Global Biogeochemical Cycles. 21: GB4011.
Schuster, P. F., Striegl, R. G., Aiken, G. R., Krabbenhoft, D. P., Dewild, J. F., Butler, K.,
Kanmark, B., and M. Dornblaser. 2011. Mercury export from the Yukon River basin
and potential response to a changing climate. Environmental Science and Technology.
45: 9262-9267.
Shen, J., Zhao, Y., 2010. Combined Bayesian statistics and load duration curve method
for bacteria nonpoint source loading estimation. Water Research 44, 77-84.
Tamm, T., Noges, T., Jarvet, A., and F. Bouraoui. 2008. Contributions of DOC from
surface and groundflow into Lake Vortsjarv (Estonia). Hydrobiologia. 599: 213-220.
USEPA, 2008. Handbook for developing watershed TMDLs. U. S. Environmental
Protection Agency. Washington, DC 20460.
CHAPTER 2. ANALYSIS FOR REGRESSION MODEL BEHAVIOR BY SAMPLING
STRATEGY FOR ANNUAL LOAD ESTIMATION
2.1 Abstract
Water quality data are typically collected less frequently than streamflow data due to
collection and analysis costs, and therefore water quality data may need to be estimated
for additional days. Regression models, which relate water quality data to streamflow
data, are extensively used as a basis for this interpolation and require relatively
small amounts of data. However, there is a need to evaluate how well regression models
represent pollutant loads from intermittent water quality datasets. Both the specific
regression model used and the water quality data frequency are important factors in
pollutant load estimation. In this study, nine regression models from LOADEST and one
regression model from LOADIN were evaluated with subsampled water quality datasets
from daily measured water quality datasets of nitrogen, phosphorus, and sediment. Each
water quality parameter had different correlations to streamflow, and the subsampled
water quality datasets had various proportions of storm samples. The behaviors of
regression models differed by not only water quality parameter but also by proportion of
storm samples. The regression models from LOADEST provided accurate and precise
annual sediment and phosphorus load estimates when the water quality data included 20-
40% storm samples. LOADIN provided more accurate and precise annual nitrogen load
estimates than LOADEST. In addition, the results indicate that the availability of water
quality data from storm events was crucial in annual pollutant load estimation using
pollutant regression models, and that accuracy increased if water quality data
extrapolation was avoided.
2.2 Introduction
The Total Maximum Daily Load (TMDL) is a standard measure used to regulate water
quality in watersheds in the USA. Section 303(d) of the Clean Water Act (U. S. Senate,
2002) indicates that states and other defined authorities with contaminated waters need
to establish priority rankings and to develop TMDL plans to meet identified water quality
standards. A TMDL plan indicates the allowable total load of a pollutant in a watershed
without violation of water quality standards. TMDL planning involves the identification
of pollutant sources, water body monitoring, and an effort to mitigate pollutant sources so
that loads do not exceed the standard (Babbar-Sebens and Karthikeyan, 2009; Henjum,
2010). An appropriate sampling strategy is important for water quality monitoring that is
the basis of TMDL planning.
Various sampling strategies to collect water quality data are commonly employed,
including time-based, streamflow-based, and time and streamflow composited (King and
Harmel, 2003). Burn (1990) described three sampling strategies, which are fixed
frequency sampling, stratified fixed frequency sampling, and real-time updated stratified
sampling. Fixed frequency sampling is time-based, and regular sampling (Kronvang and
Bruhn, 1996) represents sampling being conducted with equal time intervals. Stratified
sampling is conducted based on streamflow proportion (i.e. high or low streamflow).
Water quality samples collected to estimate annual pollutant loads are typically collected
less frequently than streamflow due to the cost of collection and analysis. If the water
quality data are insufficient to estimate annual loads, the data need to be estimated using
an appropriate method for days on which water quality samples are not available.
Regression model (rating curve) methods for estimating water quality parameters on days
for which samples are unavailable are based on a relationship between concentration (or
load) and streamflow. The methods began with simple linear forms of concentration and
streamflow, but have been modified based on logarithmic transformations, seasonal
variability, etc. (Cohn et al., 1992; Gilroy et al., 1990; Johnson, 1979; Robertson and
Richards, 2000). Regression models have come to be extensively used methods requiring
relatively small amounts of data, and are often applied with small datasets collected over
several years (Robertson and Richards, 2000). Even though the methods may cause large
errors in some cases, the methods provide unbiased load estimation with relatively low
variance (Cohn et al., 1992). Regression models with water quality data collected
biweekly or monthly with storm chasing often provided acceptable load estimates
(Horowitz, 2003; Robertson and Roerish, 1999; Robertson, 2003), while in other cases
regression models provided inaccurate and imprecise load estimates when the water
quality data were not normally distributed (Henjum et al., 2010) or when the number of
water quality samples was small (Horowitz et al., 2001; Johnes, 2007). In addition, regression
models showed different behaviors at the different sites to which they were applied
(Phillips et al., 1999).
LOAD ESTimator (LOADEST, Runkel et al., 2004) and Web-based Load Interpolation
Tool (LOADIN, Park et al., 2012) are used in the Web-based Load Duration Curve Tool
(Web-based LDC Tool, Kim et al., 2012) which helps to develop Total Maximum Daily
Loads (TMDLs) and to determine required pollutant load reductions. Obtaining water
quality data is difficult and costly, and therefore interpolation of water quality data
associated with streamflow data may be required. The Web-based LDC Tool provides
LOADEST and LOADIN runs as options to generate daily pollutant loads from
intermittent water quality data, so that it is possible to calculate the required reduction of
pollutant loads to meet water quality standards. Here, the required reduction of pollutant
loads for five streamflow regimes (high flow, moist conditions, mid-range flows, dry
conditions, and low flow) in a load duration curve (LDC) is calculated from the sum of the
target loads and the sum of the current loads. Therefore, there was a need to evaluate how
well the regression models represent annual pollutant loads from intermittent water
quality datasets.
Thus, the objectives of the study were: 1) to evaluate the predictive ability of pollutant
load regression models for annual pollutant load (i.e. sum of pollutant loads) estimates; 2)
to investigate the terms affecting the regression models’ behaviors; and 3) to explore
various sampling strategies with the regression models.
2.3 Methodology
LOADEST, one of the water quality data regression model methods, has been widely
used to interpolate or estimate daily pollutant loads for various water quality parameters
(Duan et al., 2013; Foster and Kenney, 2010; Spencer et al., 2009; Stenback et al., 2011;
Raymond et al., 2007; Eshleman et al., 2008). LOADEST estimates constituent loads in
streams given a time series of streamflow, additional data variables, and constituent
concentrations, and requires a minimum of 12 water quality samples (Dornblaser and
Striegl, 2007; Runkel et al., 2004). The model calibrates regression model coefficients
using three methods: Adjusted Maximum Likelihood Estimation (AMLE), Maximum
Likelihood Estimation (MLE), and Least Absolute Deviation (LAD) (Runkel et al., 2004).
The MLE method assumes the data have a linear relationship and follow a normal
distribution. The AMLE method uses a similar approach, but it is a nearly unbiased
estimator for the mean (Cohn et al., 1992). Both methods are based on the assumption that
the model residuals follow a normal distribution, and AMLE allows the use of
water quality datasets containing censored data. The alternative method, if it is assumed
that the residuals do not follow a normal distribution, is LAD, which assumes the errors are
independently and identically distributed random variables (Powell, 1984). LOADEST
has 11 pollutant regression models (Equations 2.1-2.11) to estimate daily pollutant loads;
two of these (model numbers 10 and 11) are for specific periods defined by the user
(Runkel et al., 2004) and thus were not appropriate for use in this study. One of the nine
regression models (model numbers 1 to 9) could be selected automatically by setting the
model number to 0 or manually by setting the model number to one of the model
numbers from 1 to 9.
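$\ln(L) = a_0 + a_1 \ln Q$    Eq. 2.1
$\ln(L) = a_0 + a_1 \ln Q + a_2 \ln Q^2$    Eq. 2.2
$\ln(L) = a_0 + a_1 \ln Q + a_2\, dtime$    Eq. 2.3
$\ln(L) = a_0 + a_1 \ln Q + a_2 \sin(2\pi\, dtime) + a_3 \cos(2\pi\, dtime)$    Eq. 2.4
$\ln(L) = a_0 + a_1 \ln Q + a_2 \ln Q^2 + a_3\, dtime$    Eq. 2.5
$\ln(L) = a_0 + a_1 \ln Q + a_2 \ln Q^2 + a_3 \sin(2\pi\, dtime) + a_4 \cos(2\pi\, dtime)$    Eq. 2.6
$\ln(L) = a_0 + a_1 \ln Q + a_2 \sin(2\pi\, dtime) + a_3 \cos(2\pi\, dtime) + a_4\, dtime$    Eq. 2.7
$\ln(L) = a_0 + a_1 \ln Q + a_2 \ln Q^2 + a_3 \sin(2\pi\, dtime) + a_4 \cos(2\pi\, dtime) + a_5\, dtime$    Eq. 2.8
$\ln(L) = a_0 + a_1 \ln Q + a_2 \ln Q^2 + a_3 \sin(2\pi\, dtime) + a_4 \cos(2\pi\, dtime) + a_5\, dtime + a_6\, dtime^2$    Eq. 2.9
$\ln(L) = a_0 + a_1\, per + a_2 \ln Q + a_3 \ln Q \cdot per$    Eq. 2.10
$\ln(L) = a_0 + a_1\, per + a_2 \ln Q + a_3 \ln Q \cdot per + a_4 \ln Q^2 + a_5 \ln Q^2 \cdot per$    Eq. 2.11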
where a0 to a6 are coefficients, Q is streamflow, dtime is decimal time, and per is the period
defined by the user.
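As a rough illustration of how the nine general-purpose model forms differ, the sketch below lists the explanatory variables for each model number and evaluates them for one day. The term lists assume the model forms documented for LOADEST by Runkel et al. (2004) and should be checked against that source; the names `MODEL_TERMS`, `loadest_terms`, and `SECOND_GROUP` are hypothetical, and the centering of ln Q and dtime that LOADEST applies is omitted for brevity.

```python
import math

# Explanatory variables for LOADEST model numbers 1-9 (the intercept a0 is implicit).
# Assumption: term lists follow the LOADEST documentation; verify before relying on them.
MODEL_TERMS = {
    1: ["lnQ"],
    2: ["lnQ", "lnQ2"],
    3: ["lnQ", "dtime"],
    4: ["lnQ", "sin", "cos"],
    5: ["lnQ", "lnQ2", "dtime"],
    6: ["lnQ", "lnQ2", "sin", "cos"],
    7: ["lnQ", "sin", "cos", "dtime"],
    8: ["lnQ", "lnQ2", "sin", "cos", "dtime"],
    9: ["lnQ", "lnQ2", "sin", "cos", "dtime", "dtime2"],
}


def loadest_terms(model, q, dtime):
    """Return the explanatory variable values of one model for a single day."""
    ln_q = math.log(q)
    values = {
        "lnQ": ln_q,
        "lnQ2": ln_q ** 2,
        "dtime": dtime,
        "dtime2": dtime ** 2,
        "sin": math.sin(2 * math.pi * dtime),
        "cos": math.cos(2 * math.pi * dtime),
    }
    return [values[name] for name in MODEL_TERMS[model]]


# Models that contain the squared logarithm of streamflow.
SECOND_GROUP = sorted(m for m, terms in MODEL_TERMS.items() if "lnQ2" in terms)
```

Under these assumed forms, the models containing the squared-logarithm term are exactly model numbers 2, 5, 6, 8, and 9.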
LOADIN estimates pollutant loads using streamflow and intermittent water quality data,
similar to LOADEST. LOADIN has a regression equation (Equation 2.12) composed of
three terms; the first term is for the pollutant loads correlated to streamflow, and the other
terms contain decimal time for the pollutant loads varying with time (e.g. season). The
coefficients of the regression model are calibrated by a genetic algorithm.
[ ] [ ]
Eq. 2.12
Where, Li is estimated load at time step i, Qi is streamflow at time step i, dectime is
decimal time at time step i, and a1-8 are coefficients.
In this study, LOADEST and LOADIN were evaluated with subsampled water quality
datasets from measured daily water quality data.
2.3.1 Water Quality Data
A measured ‘true load’ was required to evaluate how well pollutant load regression
models perform with intermittent water quality data and to examine what sampling
strategies are appropriate to estimate annual pollutant load. Daily water quality data for
sediment, phosphorus, and nitrogen were collected from the USGS Water-Quality Data
for the Nation (http://waterdata.usgs.gov/nwis/qw) and the National Center for Water
Quality Research of Heidelberg University
(http://www.heidelberg.edu/academiclife/distinctive/ncwqr) (Fig. 2.1). The daily water
quality data collected covered periods of 1 to 37 years, and most of
the stations had 1-10 year periods for sediment and 1-5 year periods for phosphorus and nitrogen
(Table 2.1).
Figure 2.1 Locations of Water Quality Data Stations
Table 2.1 Number of Stations and Total Years (total years in parentheses)
             1~5 years   6~10 years   11~20 years   21~37 years   Total
Sediment     97 (187)    104 (972)    5 (79)        5 (159)       211 (1397)
Phosphorus   52 (101)    7 (55)       5 (79)        5 (159)       69 (394)
Nitrogen     10 (21)     1 (7)        5 (79)        5 (159)       21 (266)
The daily water quality parameters used in the study showed different relationships with
streamflow, and correlation coefficients (Equation 2.13) were calculated to determine the
relationships between streamflow and concentration data (Fig. 2.2). Compared with
phosphorus and nitrogen concentrations, sediment concentrations were more related or
proportional to streamflow, while nitrogen concentrations showed a relatively poor
relationship with streamflow data.
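$r = \frac{\sum_i (C_i - \bar{C})(F_i - \bar{F})}{\sqrt{\sum_i (C_i - \bar{C})^2}\,\sqrt{\sum_i (F_i - \bar{F})^2}}$    Eq. 2.13
Where r is the correlation coefficient, $C_i$ is concentration at time step i, $\bar{C}$ is mean of concentration data, $F_i$ is streamflow
at time step i, and $\bar{F}$ is mean of streamflow data.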
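The correlation coefficient of Equation 2.13 can be sketched in a few lines. This is a minimal illustration assuming paired daily concentration and streamflow lists of equal length; the function name `correlation` is hypothetical.

```python
import math


def correlation(conc, flow):
    """Correlation coefficient between concentration and streamflow series."""
    n = len(conc)
    c_mean = sum(conc) / n
    f_mean = sum(flow) / n
    # Numerator: sum of products of deviations from the means.
    num = sum((c - c_mean) * (f - f_mean) for c, f in zip(conc, flow))
    # Denominator: product of the root sums of squared deviations.
    den = math.sqrt(sum((c - c_mean) ** 2 for c in conc)) * math.sqrt(
        sum((f - f_mean) ** 2 for f in flow)
    )
    return num / den
```

A value near 1 indicates concentration rising with streamflow, as reported for sediment, while a value near 0 indicates a weak relationship, as reported for nitrogen.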
Figure 2.2 Correlation Coefficients for Concentrations of Water Quality Parameters with
Streamflow
2.3.2 Subsampling Methods and Regression Models
To examine sampling strategies to estimate annual loads using regression models, all
daily water quality data collected were artificially degraded with six sampling strategies.
The first three sampling strategies were fixed interval sampling frequencies (weekly,
biweekly, and monthly); the other three sampling strategies were the same fixed
sampling frequencies supplemented with stratified sampling (fixed interval with storm
event sampling strategies). Storm samples in the study were defined as water quality data
collected at the peak flow of each hydrograph in the high-flow regime, which is the upper
10 percent of streamflow for a given analysis period (USEPA, 2007). Water quality data
were not designated to be storm samples for the peaks of hydrographs in the Moist-Conditions
(10-40%), Mid-Range Flow (40-60%), Dry-Conditions (60-90%), and Low-Flow (90-100%)
regimes.
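The storm-sample definition above can be sketched as follows. This is an illustration only: the rank-based exceedance formula and the simple three-point peak test are assumptions, since the source does not state how hydrograph peaks were detected, and the function names are hypothetical.

```python
def exceedance_percent(flows):
    """Percent of days on which each daily flow is equaled or exceeded (rank-based)."""
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i], reverse=True)
    pct = [0.0] * n
    for rank, i in enumerate(order, start=1):
        pct[i] = 100.0 * rank / n
    return pct


def storm_sample_days(flows, high_flow_limit=10.0):
    """Indices of hydrograph peaks that fall in the high-flow regime (upper 10%)."""
    pct = exceedance_percent(flows)
    days = []
    for i in range(1, len(flows) - 1):
        # A peak rises above the previous day and is not exceeded by the next day.
        is_peak = flows[i] > flows[i - 1] and flows[i] >= flows[i + 1]
        if is_peak and pct[i] <= high_flow_limit:
            days.append(i)
    return days


flows = [1, 2, 3, 50, 4, 3, 2, 1, 1, 2, 40, 3, 2, 1, 1, 1, 2, 1, 1, 1]
storm_days = storm_sample_days(flows)
```

In this made-up 20-day series, the small peak on day index 16 is ignored because it lies outside the high-flow regime.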
All water quality data were used in subsampling processes for sampling strategies with
different beginning dates. For instance, the first water quality dataset for the weekly
interval sampling strategy was comprised of the water quality data from every Monday,
while the second dataset had the water quality data from every Tuesday. Therefore, the
water quality dataset for weekly fixed interval sampling frequencies had seven water
quality datasets. Similar to the weekly interval sampling strategy, the biweekly interval
sampling strategy was subsampled based on days of the week with fourteen-day intervals.
The first water quality dataset for the sampling strategy comprised the water
quality data from every alternate Monday, and the first sample of the dataset
was from the first Monday of the entire water quality dataset. The eighth water quality
dataset for the sampling strategy was also from every alternate Monday; however, the
first sample of the dataset was from the second Monday of the entire water quality
dataset. Thus, fourteen water quality datasets were subsampled for fixed interval
biweekly sampling frequencies. For the monthly sampling strategies, the water quality
data on the 1st day of each month were subsampled for the first dataset, and the water quality
data on the 2nd day of each month were subsampled for the second dataset. Therefore, twenty-eight
water quality datasets, one for each day of the month from the 1st to the 28th, were
subsampled for fixed interval monthly sampling frequencies. The water quality data from
storm events were added to the fixed sampling frequencies to form the strategies
supplemented with stratified sampling. Ninety-eight water quality
datasets for each sampling station were created by subsampling the measured water
quality dataset.
Figure 2.3 Example of Subsampled Dataset
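The fixed-interval subsampling described above can be sketched as follows. This is a minimal illustration on calendar dates only, assuming a complete daily record; the function name `fixed_interval_subsamples` is hypothetical, and the biweekly datasets are generated as fourteen start offsets, which is assumed to be equivalent to the first-Monday and second-Monday description.

```python
from datetime import date, timedelta


def fixed_interval_subsamples(start, end):
    """Build the 7 weekly, 14 biweekly, and 28 monthly sampling-date sets."""
    days = [start + timedelta(d) for d in range((end - start).days + 1)]
    subsets = {}
    # Weekly: one dataset per day of the week (0 = Monday).
    for weekday in range(7):
        subsets[("weekly", weekday)] = [d for d in days if d.weekday() == weekday]
    # Biweekly: one dataset per start offset within a fourteen-day cycle.
    for offset in range(14):
        subsets[("biweekly", offset)] = days[offset::14]
    # Monthly: one dataset per day of the month from the 1st to the 28th.
    for day_of_month in range(1, 29):
        subsets[("monthly", day_of_month)] = [d for d in days if d.day == day_of_month]
    return subsets


subsets = fixed_interval_subsamples(date(2001, 1, 1), date(2001, 12, 31))
```

Adding the storm samples to each of these 49 date sets gives the second family of 49 datasets, for 98 in total per station.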
Ten regression models were evaluated with the subsampled water quality datasets. Nine
regression models were from LOADEST (model numbers 1 to 9), and one regression
model was from LOADIN. In addition, LOADEST was run with the subsampled water
quality datasets by setting model number to ‘0’ to explore accuracy and precision of
annual pollutant load estimates for the “best” regression model automatically selected by
LOADEST. Measured daily water quality datasets were used to calculate the true
measured loads by direct numeric integration (Equation 2.14).
$\text{Measured annual load} = \frac{\sum_{i} Q_i \times C_i}{\text{Num.Yr}}$    Eq. 2.14
Where, i is day, Qi is streamflow on day i, Ci is concentration on day i, and Num.Yr is the
number of years.
To evaluate estimated annual loads from regression models, a ratio was calculated for
individual annual pollutant load estimates. The ratio is the division of the estimated
annual pollutant load by the measured annual pollutant load (Equation 2.15). The ratio is
1.0 when the estimated annual load is the same as the measured annual load; it is smaller
than 1.0 when a regression model underestimated load; and it is greater than 1.0 when a
regression model overestimated load.
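$\text{Ratio} = \frac{\text{Estimated annual pollutant load}}{\text{Measured annual pollutant load}}$    Eq. 2.15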
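The measured load of Equation 2.14 and the ratio of Equation 2.15 can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, and `unit_factor` is a hypothetical placeholder for whatever unit conversion the daily flow and concentration units require.

```python
def measured_annual_load(flows, concs, num_years, unit_factor=1.0):
    """Mean annual load by direct summation of daily flow times concentration."""
    total = sum(q * c for q, c in zip(flows, concs)) * unit_factor
    return total / num_years


def load_ratio(estimated, measured):
    """Ratio above 1.0 means overestimation; below 1.0 means underestimation."""
    return estimated / measured


measured = measured_annual_load([1.0, 2.0, 3.0], [10.0, 10.0, 10.0], num_years=1)
ratio = load_ratio(90.0, measured)
```

With these made-up values the measured load is 60 and an estimate of 90 gives a ratio of 1.5, which would count as an overestimate.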
Numerous pollutant load estimates were performed. LOADEST was executed ten times
for each subsampled water quality dataset for model numbers 1 to 9 to evaluate the nine
regression models and for model number 0 to investigate how well LOADEST selects a
regression model. The number of pollutant load estimates for one measured water quality
dataset was 1,078 (i.e. (7 (weekly) + 14 (biweekly) + 28 (monthly)) × 2 (fixed interval
only or fixed interval with stratified) × 11 (LOADIN + 10 LOADEST model numbers)).
Therefore, the total number of load estimates was 227,458 for sediment, 74,382 for
phosphorus, and 22,638 for nitrogen.
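The counts above can be checked with a few lines of arithmetic, using the station totals from Table 2.1 (211 sediment, 69 phosphorus, and 21 nitrogen stations). The variable names are illustrative only.

```python
subsets_per_strategy = 7 + 14 + 28  # weekly + biweekly + monthly datasets
strategies = 2                      # fixed interval only, or with storm samples
model_runs = 11                     # LOADIN + LOADEST model numbers 0 to 9

per_dataset = subsets_per_strategy * strategies * model_runs

stations = {"sediment": 211, "phosphorus": 69, "nitrogen": 21}
totals = {name: per_dataset * count for name, count in stations.items()}
grand_total = sum(totals.values())
```

The grand total agrees with the 324,478 estimates reported in the results.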
Both accuracy and precision are important in pollutant load estimation, because accuracy
represents the degree of systematic error, and precision indicates the degree of dispersion
or range (Phillips et al., 1999; Preston et al., 1989). Pollutant load estimates need to be
accurate (low bias) and to be precise (low variance). Accuracy was evaluated with the
mean of ratios, and precision was evaluated with the 95% confidence interval of ratios.
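A minimal sketch of these two summary statistics follows. The source does not state how the 95% confidence interval was computed, so the normal-approximation interval for the mean used here is an assumption, and the function name `ratio_summary` is hypothetical.

```python
import math
import statistics


def ratio_summary(ratios, z=1.96):
    """Return the mean ratio, an approximate 95% CI for it, and the CI width."""
    mean = statistics.mean(ratios)
    # Assumed method: normal approximation using the sample standard deviation.
    half_width = z * statistics.stdev(ratios) / math.sqrt(len(ratios))
    return mean, (mean - half_width, mean + half_width), 2 * half_width


mean, interval, width = ratio_summary([0.9, 1.0, 1.1])
```

A mean close to 1.0 corresponds to high accuracy, and a narrow interval corresponds to high precision.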
2.4 Results and Discussions
2.4.1 Ratio Comparison of Sampling Strategies and Regression Models
The study computed 324,478 pollutant load estimates with regression models (Appendix
B), and three distinct features were observed. The first feature was that pollutant load
estimates for LOADEST model number ‘0’, which is supposed to estimate pollutant
loads using the best regression model for the water quality dataset (Runkel et al., 2004),
were not necessarily more precise or accurate than pollutant load estimation for manual
model selection. For instance, the sediment load estimates when selecting LOADEST
model number ‘0’ were less accurate and less precise than the sediment load estimates for
LOADEST model number 1 (Fig. 2.4; comparison of ‘Sed-Wk-Fx-0’ and ‘Sed-Wk-Fx-1’).
Not surprisingly, automatic model selection often did provide more accurate and precise
load estimates than regression models selected manually. LOADEST model number ‘0’
selects one of nine regression models based on the Akaike Information Criterion (AIC)
computed for each of the models based on regression model parameters and residuals
from AMLE methods (Runkel et al., 2004), and therefore the possibility for inaccurate or
imprecise estimates exists.
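To illustrate why selection by AIC need not give the best annual load, the sketch below ranks candidate models by a simplified AIC. It is not LOADEST's implementation: it uses the Gaussian least-squares form of AIC (up to an additive constant) rather than the AMLE-based value, and the function names and residual sums of squares are hypothetical.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC, up to an additive constant, for a Gaussian least-squares fit."""
    return n * math.log(rss / n) + 2 * k


def select_model(fits, n):
    """fits maps model number -> (residual sum of squares, number of coefficients)."""
    return min(fits, key=lambda m: aic_least_squares(fits[m][0], n, fits[m][1]))


# Made-up calibration results for 24 samples.
small_gain = {1: (12.0, 2), 2: (11.9, 3), 9: (11.5, 7)}
large_gain = {1: (12.0, 2), 2: (6.0, 3)}
```

AIC rewards fit to the calibration samples and penalizes extra coefficients, but it does not see the days without samples, so a model selected this way can still extrapolate poorly to unsampled high flows.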
The second feature observed in the estimated pollutant load results was that including
water quality data from storm events improved the accuracy and precision in pollutant
load estimates. This feature was readily found throughout all three water quality
parameters. For instance, the estimates ‘Sed-Wk-St-1’ displayed higher accuracy (i.e.
ratio mean close to 1.0) and precision (i.e. narrow 95% CI) than the estimates ‘Sed-Wk-Fx-1’.
In addition, this feature indicates that extrapolation needs to be avoided, since the
fixed sampling frequencies supplemented with stratified sampling strategies include the
water quality data for maximum streamflow.
The last feature observed in the estimated pollutant load results was that high sampling
frequency did not necessarily improve the accuracy and precision of estimated loads.
Monthly subsampled water quality datasets had fewer samples than weekly
subsampled water quality datasets, but nevertheless, monthly subsampled water quality
datasets sometimes led to more accurate and precise pollutant load estimates than weekly
subsampled water quality datasets. For instance, the load estimate ‘Phs-Bi-St-2’ displayed
higher accuracy and precision than the estimate ‘Phs-Wk-St-2’, which in turn displayed lower
accuracy and precision than the estimate ‘Phs-Mn-St-2’. The accuracy of ‘Sed-Wk-St-1’
was higher than that of ‘Sed-Mn-St-1’, but the precision of ‘Sed-Wk-St-1’ was lower than
that of ‘Sed-Mn-St-1’. Therefore, a more extensive water quality dataset did not
necessarily lead to accurate and/or precise pollutant load estimates.
Figure 2.4 Examples of Pollutant Load Estimates (Sed: sediment, Phs: phosphorus, Mn: monthly, Bi: biweekly, Wk: weekly, Fx:
fixed interval sampling, St: fixed interval sampling with storm events, 0 and 1: LOADEST model number 0 and 1)
2.4.2 Water Quality Data from High Flow
Through the results in the previous section, it was concluded that: (1) an extensive
sampling strategy does not necessarily lead to accurate and precise annual pollutant load
estimates using pollutant regression models, and (2) the use of water quality data from
storm events improves pollutant load estimates. Therefore, an analysis to explore the
influence of the portion of water quality data from storm events on load estimation was
performed.
The results showed that the water quality data for high flow conditions play an important
role in annual pollutant load estimation. Therefore, the results were categorized into
seven groups based on the percentage of calibration data from the high flow regimes
(PCH) (Fig. 2.5). Regression models displayed different behaviors by water quality
parameter and PCH.
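The PCH and its grouping can be sketched as below. This is an illustration only: the source does not give the bounds of the seven groups, so the 10%-wide bins with an open-ended last group are an assumption, and both function names are hypothetical.

```python
def pch(sample_days, high_flow_days):
    """Percentage of calibration samples taken on high-flow regime days."""
    high = set(high_flow_days)
    n_high = sum(1 for day in sample_days if day in high)
    return 100.0 * n_high / len(sample_days)


def pch_group(value, width=10.0, n_groups=7):
    """Assign a PCH value to one of seven groups (assumed 10%-wide bins)."""
    return min(int(value // width), n_groups - 1)


example_pch = pch([1, 2, 3, 4, 5], high_flow_days=[2, 4, 9])
```

Here two of the five calibration samples fall on high-flow days, giving a PCH of 40%.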
In sediment load estimation, the regression models displayed two different ratio trends
against the PCH (Fig. 2.5(a) and (b)). The four LOADEST
regression models numbered 1, 3, 4, and 7 (first group of regression models)
underestimated loads if the PCH was smaller than 20% and overestimated loads when the
PCH was greater than 20%. Use of water quality data with a PCH range of 20-30%
provided the closest estimated sediment loads to measured loads. The LOADEST
regression models numbered 2, 5, 6, 8, and 9 (second group of regression models)
typically overestimated sediment loads (Fig. 2.5(a)) and showed the narrowest 95% CI
when PCHs were 30-40%.
Similar to sediment load estimation, two ratio trends were identified in annual
phosphorus load estimation. The first group of regression models (LOADEST regression
models numbered 1, 3, 4, and 7) showed behavior similar to that for annual sediment load
estimation. The second group (LOADEST regression models numbered 2, 5, 6, 8, and 9)
typically overestimated loads, but the ratios of annual phosphorus loads were close to 1.0
when PCH was greater than 20% (Fig. 2.5(c)). Moreover, the 95% CIs for all LOADEST
regression model results were narrow when PCHs were greater than 20% (Fig. 2.5(d)).
LOADIN was more sensitive to storm samples than LOADEST and displayed low
precision in annual sediment and phosphorus load estimates (Fig. 2.5(a), (b), (c), and
(d)). The second group of regression models (model numbers 2, 5, 6, 8, and 9) showed
greater mean ratios than the first group of regression models in annual sediment and
phosphorus load estimation. The difference between the first group and the second group of
regression models is the inclusion of the term ‘squared logarithm streamflow’ in the
second group. The forms of regression models in LOADEST were determined based on
various functions of streamflow and time (Cohn et al., 1992; Crawford, 1991; Helsel and
Hirsch, 2002; Runkel et al., 2004). LOADEST model number 1 (the simplest model)
consists of streamflow and two coefficients, and LOADEST model 9 (the most
sophisticated model) consists of six variables (i.e. logarithm streamflow, squared
logarithm streamflow, time, etc.) and seven coefficients. The regression model composed
of streamflow and two coefficients demonstrated reasonable sediment load estimates
(Crawford, 1991), and the regression model composed of six variables and seven
coefficients demonstrated reasonable phosphorus load estimates (Cohn et al., 1992).
However, the second group of regression models with ‘squared logarithm streamflow’
(model numbers 2, 5, 6, 8, and 9) tended to overestimate loads and showed low accuracy
and precision. Therefore, it was concluded that equations with the term ‘squared
logarithm streamflow’ should be carefully evaluated prior to using them for estimating
annual pollutant loads.
Unlike for annual sediment and phosphorus load estimates, the LOADEST regression
models mostly overestimated nitrogen loads, and the annual nitrogen load estimates with
PCH of 50-60% were the closest to the measured loads (Fig. 2.5(e)). The 95% CIs for all
LOADEST regression models were wider than those for LOADIN, indicating that LOADIN has
higher precision in annual nitrogen load estimation than LOADEST. Inclusion of more
high flow water quality data from storm events led to more precise annual nitrogen load
estimates using LOADEST, because the 95% CI narrowed as PCH increased (Fig. 2.5(f)).
While the regression models in LOADEST overestimated and displayed low
precision in annual nitrogen load estimation, the LOADIN estimates were notably close
to the measured annual nitrogen loads, since the means of ratios were typically close to
1.0 (Fig. 2.5(e)) and the 95% CIs were typically narrower than those for LOADEST (Fig.
2.5(f)).
The nitrogen data collected in the study displayed seasonal variance or low relationships
to streamflow (Figs. 2.2 and 2.6). Similar to LOADEST, LOADIN identifies the
relationship between streamflow and pollutant loads. The regression model in LOADIN
is composed of two functions. One is a function of streamflow and model coefficients to
represent pollutant loads for streamflow variation, and the other is a function of
streamflow, decimal time, and model coefficients to represent pollutant loads based on
time (or seasonal) variation. Regression models calibrate the coefficients against pollutant
loads, not pollutant concentrations. Therefore, the term for the pollutant loads for time
variability needs to be the multiplication of ‘decimal time’, ‘streamflow’, and ‘coefficient
to calibrate’, not the multiplication of ‘decimal time’ and ‘coefficient to calibrate’.
LOADIN has a term in the form of multiplication of ‘decimal time’, ‘streamflow’, and
‘coefficient to calibrate’ for seasonality; however, LOADEST does not.
The regression models used in the study identify the correlation between
volumetric streamflow rate (i.e. cubic meters per second) and water quality parameter
mass (i.e. load, the multiplication of pollutant concentration (milligram per liter) and
streamflow). Thus, if streamflow and concentration have a proportional relationship, the
pollutant load for high concentration and high streamflow is much greater than the
pollutant load for low concentration and low streamflow. In other words, the relationship
between pollutant loads and streamflow would be closer to an exponential function; this
assumption corresponds to LOADEST regression models. However, the nitrogen data
typically did not have a proportional relationship to streamflow; thus, LOADEST led to
less accurate and less precise annual nitrogen load estimates.
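The argument can be made explicit with a short derivation. It is a sketch under an assumed power-law relationship between concentration and streamflow; the symbols k and b are introduced here for illustration and do not appear in the source.

```latex
% Assumption: concentration follows a power law in streamflow, C = k Q^{b}.
\begin{align}
  L &= C \, Q = k \, Q^{b+1} \\
  \ln L &= \ln k + (b + 1) \ln Q
\end{align}
% Proportional case (b = 1): L = k Q^{2}, so load grows much faster than streamflow.
% Flow-independent case (b \approx 0): L \approx k Q, so load simply scales with streamflow.
```

Under this assumption the logarithm of load is linear in the logarithm of streamflow, which has the same form as the simplest LOADEST model. When concentration varies mainly with season rather than with streamflow, as the nitrogen data did, most of the variation in concentration is left to the time terms.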
(a) Mean of Ratio for Sediment
(b) Width of 95% Confidence Intervals of Ratio for Sediment
(c) Mean of Ratio for Phosphorus
(d) Width of 95% Confidence Intervals of Ratio for Phosphorus
(e) Mean of Ratio for Nitrogen
(f) Width of 95% Confidence Intervals of Ratio for Nitrogen
Figure 2.5 Mean and Width of 95% Confidence Intervals by Percentage of Calibration
Data from High Flow
Figure 2.6 Seasonal Variation in Blanchard River near Findlay, Ohio Nitrogen Data from
the National Center for Water Quality Research of Heidelberg University
2.4.3 Improvement of Annual Load Estimation
It was concluded that: (1) the use of water quality data from storm events improves
annual pollutant load estimates, and (2) the PCH plays an important role in annual
pollutant load estimates. Therefore, an analysis to investigate the influence of water
quality data from high flow was performed with some of the poorest pollutant load
estimates for each water quality parameter.
Three water quality datasets for each water quality parameter were selected from the
‘poorest’ pollutant load estimates. The ‘Est. 0’s in Table 2.2 are the pollutant load
estimates using the intact water quality dataset from the subsampling process. The
sampling strategies for datasets 1 and 3 for sediment were the monthly fixed interval
sampling strategy, and that for dataset 2 was the biweekly fixed interval sampling
strategy. In the case of phosphorus load estimation, the sampling strategies were
biweekly (dataset 1), weekly (dataset 2), and monthly (dataset 3) fixed interval sampling
strategies. All datasets for nitrogen load estimation were for the monthly fixed interval
sampling strategy. The regression models for the datasets were drawn from model numbers 2, 5, 6, 8,
and 9 (second group of regression models) in LOADEST. The datasets had different
frequencies; however, they had common features in that they had fixed interval
sampling strategies, the models were the second group mentioned in the previous section,
and the PCH was smaller than 10%. High flow water quality data were added from ‘Est.
1’ to ‘Est. 3’; in other words, PCH data were intentionally increased to improve the
pollutant load estimates. The water quality data for maximum streamflow were added in
‘Est. 1’ to investigate model behaviors by avoiding extrapolation in the estimates.
The second group of regression models overestimated sediment load by more than a factor of ten
and overestimated phosphorus and nitrogen loads by factors of several hundred with the nine
subsampled water quality datasets (‘Est. 0’ in Table 2.2). However, inclusion of water
quality data for maximum streamflow (i.e. avoiding extrapolation) significantly improved
all pollutant estimates (comparisons of ‘Est. 0’s and ‘Est. 1’s in each dataset). Pollutant
load estimates were improved from ‘Est. 1’ to ‘Est. 3’ with PCH increases; however, use
of a large proportion of water quality data from storm events led to overestimated
pollutant loads (e.g. comparison of ‘Est. 2’ and ‘Est. 3’ of dataset 1 for phosphorus).
Compared to the first group of regression models, the second group of regression models
was inaccurate, imprecise, and overestimated sediment and phosphorus loads. However,
the nine pollutant load estimates by the regression models were improved by adding
water quality data for maximum streamflow. This suggests that extrapolation might lead
to inaccurate and imprecise pollutant load estimates by the second group of regression
models.
Table 2.2 Improvement of Annual Load Estimates by Increasing PCH Data
Water Quality
Parameter Dataset 1 Dataset 2 Dataset 3
Sediment
(2, 9, 5)
Est.* PCH Ratio PCH Ratio PCH Ratio
0 0.0 18.0 7.7 17.3 8.3 16.4
1 14.3 0.6 20 0.7 15.4 1.0
2 53.9 1.0 33.3 0.9 20.0 1.0
3 65.7 1.0 45.5 1.0
Phosphorus
(2, 6, 8)
Est.* PCH Ratio PCH Ratio PCH Ratio
0 6.7 291.1 7.7 658.9 5.8 422.3
1 10.2 0.7 12.7 0.9 13.1 0.8
2 21.1 1.0 18.6 1.1 27.1 1.0
3 27.6 1.3 37.2 1.0
Nitrogen
(6, 8, 9)
Est.* PCH Ratio PCH Ratio PCH Ratio
0 8.3 571.9 0.0 189.8 0.0 169.4
1 15.4 1.1 14.3 0.7 14.3 0.8
2 35.3 1.0 40 1.0 45.5 1.0
3 47.6 1.0 52 1.1 53.9 1.1
*Est.: Pollutant Load Estimation Number, the numbers in () are the regression models for datasets 1, 2, and
3 respectively.
2.5 Conclusions
Water quality samples are typically collected less frequently than streamflow due to the
cost of collection and analysis. Water quality data need to be interpolated using an
appropriate method, if the samples are insufficient. Regression models are based on a
relationship between concentration (or load) and streamflow; they are applicable to
interpolate or generate water quality data associated with streamflow data. The Web-
based LDC Tool employs LOADEST to develop TMDLs and to calculate the required
reduction of pollutant loads against standard pollutant loads. The regression models in
LOADEST needed to be evaluated with various water quality datasets. Therefore, the
regression models and one regression model from LOADIN were evaluated with six
sampling strategies for three water quality parameters. The water quality data collected in
the study were subsampled to investigate the influence of water quality sampling strategy
(i.e. weekly, biweekly, and monthly fixed interval sampling frequencies) and inclusion of
storm events (i.e. fixed sampling frequencies supplemented with
stratified sampling).
It was concluded that 1) use of extensive water quality data does not necessarily lead to
precise and accurate estimates of annual pollutant loads by regression models, 2) water
quality data used to estimate annual pollutant loads need to include an appropriate
proportion of water quality data from storm events, 3) extrapolation needs to be avoided
in use of pollutant concentrations within regression models for annual pollutant load
estimates, and 4) a regression model needs to be employed based on the behaviors of
water quality parameters.
Regression models were evaluated with large datasets from six sampling strategies, and
several regression models demonstrated better accuracy and precision than the others. In
addition, the appropriate proportion of water quality data from storm events for
accurately estimating pollutant loads using these regression models was identified.
However, further study is required to investigate the correlation between regression
model behavior and calibration datasets within various proportions of water quality data
from storm events, using the regression model which provided the most accurate and
precise pollutant load estimates.
2.6 References
Babbar-Sebens, M., Karthikeyan, R., 2009. Consideration of sample size for estimating
contaminant load reductions using load duration curves. Journal of Hydrology 372,
118-123.
Burn, D. H., 1990. Real-time sampling strategies for estimating nutrient loadings. Journal
of Water Resources Planning and Management 116(6), 727-741.
Cohn, T. A., Caulder, D. L., Gilroy, E. J., Zynjuk, L. D., Summers, R. M., 1992. The
validity of a simple statistical model for estimating fluvial constituent loads: an
empirical study involving nutrient loads entering Chesapeake Bay. Water Resources
Research 28(9), 2353-2363.
Crawford, C. G., 1991. Estimation of suspended-sediment rating curve and mean
suspended-sediment loads. Journal of Hydrology 129, 331-348.
Dornblaser, M. M., Striegl, R. G., 2007. Nutrient (N,P) loads and yields at multiple scales
and subbasin types in the Yukon River basin, Alaska. Journal of Geophysical
Research 112, G04S57.
Duan, W., Takara, K., He, B., Luo, P., Nover, D., Yamashiki, Y., 2013. Spatial and
temporal trends in estimates of nutrients and suspended sediment loads in the Ishikari
River, Japan, 1985 to 2010. Science of the Total Environment 461-462, 499-508.
Eshleman, K. N., Kline, K. M., Morgan, R. P., Castro, N. M., Legley, T. L., 2008.
Contemporary trends in the acid-base status of two acid-sensitive streams in Western
Maryland. Environmental Science and Technology 42, 56-61.
Foster, K., Kenney, T. A., 2010. Dissolved-solids load in Henrys Fork upstream from the
confluence with Antelope Wash, Wyoming, water years 1970-2009. U.S. Geological
Survey Scientific Investigations Report 2010-5048, Reston, Virginia.
Gilroy, E. J., Hirsch, R. M., Cohn, T. A., 1990. Mean square error of regression-based
constituent transport estimates. Water Resources Research 26(9), 2069-2077.
Haggard, B. E., Soerens, T. S., Green, W. R., Richards, R. P., 2003. Using regression
methods to estimate stream phosphorus loads at the Illinois River, Arkansas. Applied
Engineering in Agriculture 19(2), 187-194.
Helsel, D. R., Hirsch, R. M., 2002. Statistical methods in water resources. U.S.
Geological Survey Techniques and Methods, Book 4, Chap. A3, Reston, Virginia.
Henjum, M. B., Hozalski, R. M., Wennen, C. R., Novak, P. J., Arnold, W. A., 2010. A
comparison of total maximum daily load (TMDL) calculations in urban streams using
near real-time and periodic sampling data. Journal of Environmental Monitoring 12,
234-241.
Horowitz, A. J., 2001. Estimating suspended sediment and trace element fluxes in large
river basins: methodological considerations as applied to the NASQAN programme.
Hydrological Processes 15, 1107-1132.
Horowitz, A. J., 2003. An evaluation of sediment rating curves for estimating suspended
sediment concentrations for subsequent flux calculations. Hydrological Processes 17,
3387-3409.
Johnes, P. J., 2007. Uncertainties in annual riverine phosphorus load estimation: impact
of load estimation methodology, sampling frequency, baseflow index and catchment
population density. Journal of Hydrology 332, 241-258.
Johnson, A. H., 1979. Estimating solute transport in streams from grab samples. Water
Resources Research 15(5), 1224-1228.
Kim, J., Engel, B. A., Park, Y. S., Theller, L., Chaubey, I., Kong, D. S., Lim, K. J., 2012.
Development of web-based load duration curve system for analysis of total maximum
daily load and water quality characteristics in a waterbody. Journal of Environmental
Management 97, 46-55.
King, K. W., Harmel, R. D., 2003. Considerations in selecting a water quality sampling
strategy. Transactions of the ASAE 46(1), 63-73.
Kronvang, B., Bruhn, A. J., 1996. Choice of sampling strategy and estimation method for
calculating nitrogen and phosphorus transport in small lowland streams. Hydrological
Processes 10, 1483-1501.
Park, Y. S., Chaubey, I., Lim, K. J., Engel, B. A., 2012. Development of a web-based
pollutant load interpolation tool using an optimization algorithm. American Society of
Agricultural and Biological Engineers Annual International Meeting. Paper Number:
121337988.
Phillips, J. M., Webb, B. W., Walling, D. E., Leeks, L., 1999. Estimating the suspended
sediment loads of rivers in the LOIS study area using infrequent samples.
Hydrological Processes 13, 1035-1050.
Powell, J. L., 1984. Least absolute deviations estimation for the censored regression
model. Journal of Econometrics 25, 303-325.
Preston, S. D., Bierman, V. J., Silliman, S. E., 1989. An evaluation of methods for the
estimation of tributary mass loads. Water Resources Research 25(6), 1379-1389.
Raymond, P. A., McClelland, J. W., Holmes, R. M., Zhulidov, A. V., Mull, K., Peterson,
B. J., Striegl, R. G., Aiken, G. R., Gurtovaya, T. Y., 2007. Flux and age of dissolved
organic carbon exported to the Arctic Ocean: A carbon isotopic study of the five
largest arctic rivers. Global Biogeochemical Cycles 21, GB4011.
Robertson, D. M., 2003. Influence of different temporal sampling strategies on estimating
total phosphorus and suspended sediment concentration and transport in small streams.
Journal of the American Water Resources Association. 1281-1308.
Robertson, D. M., Roerish, E. E., 1999. Influence of various water quality sampling
strategies on load estimates for small streams. Water Resources Research 35(12),
3747-3759.
Robertson, D. M., Richards, K. D., 2000. Influence of different temporal sampling
strategies on estimating loads and maximum concentrations in small streams.
Proceedings of the National Water Quality Monitoring Council National Monitoring
Conference, Austin TX. 209-223.
Runkel, R. L., Crawford, C. G., Cohn, T. A., 2004. Load Estimator (LOADEST): A
Fortran program for estimating constituent loads in streams and rivers. U.S.
Geological Survey Techniques and Methods, Book 4, Chap. A5, Reston, Virginia.
Spencer, R. G. M., Aiken, G. R., Butler, K. D., Dornblaser, M. M., Striegl, R. G., Hernes,
P. J., 2009. Utilizing chromophoric dissolved organic matter measurements to derive
export and reactivity of dissolved organic carbon exported to the Arctic Ocean: A case
study of the Yukon River, Alaska. Geophysical Research Letters 36, L06401.
Stenback, G. A., Crumpton, W. G., Schilling, K. E., Helmers, M. J., 2011. Rating curve
estimation of nutrient loads in Iowa rivers. Journal of Hydrology 396, 158-169.
USEPA. 2007. An Approach for using load duration curves in the development of
TMDLs. Watershed Branch (4530T), Office of Wetlands, Ocean and Watersheds, U.S.
Environmental Protection Agency, 1200 Pennsylvania Ave., Northwest.
USGS. 2001. Effect of storm-sampling frequency on estimation of water-quality loads
and trends in two tributaries to Chesapeake Bay in Virginia. Water-Resources
Investigations Report 01-4136, Richmond, Virginia.
U. S. Senate. 2002. Federal water pollution control act. U. S. Senate, Dirksen Senate
Office Bldg. Washington, DC.
CHAPTER 3. IDENTIFYING THE CORRELATION BETWEEN WATER QUALITY
DATA AND LOADEST MODEL BEHAVIOR
3.1 Abstract
Water quality samples are typically collected less frequently than flow since water quality
sampling is costly. LOADEST is used to predict water quality concentration (or load) on
days when flow data are measured so that the water quality data are sufficient for annual
pollutant load estimation. However, there is a need to identify water quality data
requirements for accurate pollutant load estimation. Measured daily sediment data were
collected from 211 stream records. Estimated annual sediment loads from LOADEST and
subsampled data were compared to the measured annual sediment loads (true load). The
means of flow for calibration data were correlated to model behavior. A regression
equation was developed to compute the required mean of flow in calibration data to best
calibrate the LOADEST regression model coefficients. LOADEST runs were performed
to investigate the correlation between the mean flow in calibration data and model
behaviors as daily water quality data were subsampled. LOADEST calibration data that used
sediment concentration data for flows suggested by the regression equation yielded
small errors in annual sediment load estimates. Moreover, use of more extensive water
quality data only occasionally led to annual load estimates with small error.
3.2 Introduction
Water quality samples are collected less frequently than flow, because water quality
sampling requires significant labor, and the samples are costly to collect and analyze.
Therefore, water quality samples are collected by various sampling strategies which are
based on flow, time, or flow and time composited (Burn, 1990; King and Harmel, 2003).
Fixed frequency sampling strategies collect samples based on time and represent the
sampling being conducted with equal time intervals (e.g. 52, 26, and 12 per year in cases
of weekly, biweekly, and monthly, respectively), while stratified sampling strategies are
conducted based on flow proportion (e.g. 10 mm volumetric depth). Water quality data
samples may not be consecutive or cover the full range of flow data, and therefore a
straightforward annual load estimate (e.g. sum of daily loads) may not be possible. Thus,
water quality concentrations typically need to be estimated for days on which samples were not
collected (Robertson, 2003).
Regression models (rating curves) are used to predict water quality concentrations (or
loads) on days when flow data are measured. Regression models have been used
extensively for this purpose, and have been modified from simple linear forms to
logarithmic transformations and to consider seasonal variability (Cohn et al., 1992b;
Gilroy et al., 1990; Johnson, 1979; Robertson and Richards, 2000). Various ranges of
water quality data sampling frequencies were used to predict pollutant loads with
regression models (Coynel et al., 2004; Henjum et al., 2010; Horowitz, 2003; Johnes,
2007; Kronvang and Bruhn, 1996; Robertson, 2003; Robertson and Roerish, 1999) to
investigate what sampling frequencies are appropriate for regression model use.
Several approaches have been suggested to determine the number of water quality samples
required for estimating pollutant loads (Equations 3.1 to 3.3). The equations determine
the number of samples (n) needed to estimate the mean concentration within a margin of error (d)
and are composed of the Student’s t value and statistical factors. The equations require an
initial estimate of sample size (n_o) to compute the number of samples (n); iterations are
necessary until n corresponds to n_o. For instance, suppose the number of samples is
required for a “0.05 (α) level of significance with a 90 percent chance (β=0.1) of
detecting a mean significantly different within 0.04 mg/l (d)”, with an initial
estimate of 12 (n_o=12, thus v=11) and a sample standard
deviation (s’) equal to the population standard deviation (S = 0.05 mg/l).
The number of samples would then be 9 using Equation 3.1 (n=8.12), 9
using Equation 3.2 (n=8.31), and 19 using Equation 3.3 (n=18.39). However, the
equations determine the number of samples required to estimate the mean
concentration within a margin of error, not the number required for load estimation with
regression models. In addition, the equations require assumptions for degrees of freedom (v),
sample standard deviation (s’), and population standard deviation (S, i.e. standard deviation
of true water quality concentrations).
Eq. 3.1 (Cochran, 1963)
Eq. 3.2 (USEPA, 1997)
Eq. 3.3 (Zar, 1984)
Where, n is the number of samples, n_o is the initial estimate, N is the total number of possible
observations, t is the Student’s t value, s’ is the sample standard deviation, S is the population
standard deviation, d is the absolute margin of error, α is the probability of committing a Type
I error, β is the probability of committing a Type II error, and v is the degrees of freedom.
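The sample-size arithmetic can be checked with a short sketch. It assumes that Equation 3.3 has the commonly cited form n = (s²/d²)(t_α + t_β)², which is not confirmed by the text above but does reproduce the reported n = 18.39 when v = 18 (the value at which the iteration from n_o = 12 appears to settle). The function name is illustrative, and the t values are hardcoded from a standard Student's t table.

```python
import math


def sample_size_zar_form(s, d, t_alpha, t_beta):
    """Assumed form of Equation 3.3: n = (s^2 / d^2) * (t_alpha + t_beta)^2."""
    return (s ** 2 / d ** 2) * (t_alpha + t_beta) ** 2


# Student's t values for v = 18 degrees of freedom (standard t table):
# two-tailed alpha = 0.05 -> 2.101, one-tailed beta = 0.10 -> 1.330.
n = sample_size_zar_form(s=0.05, d=0.04, t_alpha=2.101, t_beta=1.330)

# The sample count is rounded up to the next whole sample.
samples_needed = math.ceil(n)
print(round(n, 2), samples_needed)
```

In practice the t values depend on v = n_o - 1, so the calculation is repeated with updated t values until n and n_o agree.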
Robertson (2003) indicated that the most extensive sampling strategies did not always
lead to accurate load estimates. The proportion of samples taken during storm events
also matters, although studies define storm events differently. Haggard et al. (2003)
defined storm events as times when the flow stage exceeded 1.5 m; the daily flow stage was
less than 1.5 m approximately 80 % of the time during their study period. USGS (2001) used the
program PART (USGS, 1998) to separate runoff and baseflow from streamflow, and
designated water quality data collected on days for which baseflow was less than 60 % of
streamflow as water quality data from storm events. Both studies found that taking
fifty percent of the water quality samples from storm events led to accurate and precise
load estimates (Haggard et al., 2003; USGS, 2001). In the previous chapter, it was found
that approximately twenty to thirty percent of storm samples were required for accurate
and precise annual pollutant load estimates.
LOAD ESTimator (LOADEST, Runkel et al., 2004) has 11 regression models to estimate
constituent loads in streams and rivers using streamflow, constituent concentration, and
regression model coefficients. The model calibrates regression model coefficients using
three statistical methods, which are Adjusted Maximum Likelihood Estimation (AMLE),
Maximum Likelihood Estimation (MLE), and Least Absolute Deviation (LAD) (Runkel
et al. 2004). The AMLE and MLE methods are appropriate when the calibration model
error (or residuals) follows a normal distribution (Cohn et al. 1992b; Helsel and Hirsch,
2002), and the LAD assumes the errors are independently and identically distributed
random variables (Powell, 1983). The model has been used to estimate daily pollutant
loads for various water quality parameters with various sample sizes (or sampling
strategies) (Table 3.1).
Table 3.1 Various water quality data for LOADEST uses
| Water Quality Parameter | Sample Size | Period | Num. of Sites | Reference |
|---|---|---|---|---|
| Mercury | 30-47 samples (monthly sampling) | 2002-2006 | 8 | Brigham et al. (2009) |
| Suspended sediment | ±30 samples (6-8 per year) | 2001-2005 | 5 | Dornblaser and Striegl (2009) |
| Chromophoric dissolved organic matter | 39 samples | 2004-2005 | 1 | Spencer et al. (2009) |
| NOx-N, NH3-N, Total Phosphorus | 88-155 samples (monthly sampling) | 1992-2006 | 18 | Carey et al. (2011) |
| Total Nitrogen, Total Phosphorus, Total Suspended Solids | Monthly sampling | 1970-2009 | 12 | Das et al. (2011) |
| Total Nitrogen | 54-152 samples | 12-22 years | 18 | Oh and Sankarasubramanian (2011) |
| Soluble reactive phosphorus, Total phosphorus | Weekly sampling | 1998-2007 | 8 | Duan et al. (2012) |
Chapter 2 showed that water quality data should include an appropriate portion of water
quality samples from storm events rather than a fixed number of water quality samples.
In other words, an appropriate water quality sampling strategy is required to accurately
estimate annual pollutant loads using approaches like LOADEST. Therefore, the
objectives of the study were to: 1) identify the correlation between LOADEST model
behavior and water quality datasets for various proportions of water quality data from
storm events, and 2) suggest an approach to prepare water quality datasets for annual
pollutant load estimation using LOADEST.
3.3 Methodology
3.3.1 Water Quality Data Statistics for Annual Load Estimates
In Chapter 2, daily sediment data were collected for 211 streams from the USGS
Water-Quality Data for the Nation (http://waterdata.usgs.gov/nwis/qw) and the National
Center for Water Quality Research of Heidelberg University
(http://www.heidelberg.edu/academiclife/distinctive/ncwqr). The daily data were
subsampled using six sampling strategies, nine regression models were run in
LOADEST, and the estimated annual sediment loads were compared to the measured
annual sediment loads. Regression model number 3 (Equation 3.4) provided the most
accurate and precise annual sediment load estimates, and therefore the model was
selected for use in this study.
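$$\log(L_i) = a_0 + a_1 \log(Q_i) + a_2\, dtime_i$$ Eq. 3.4 (Runkel et al. 2004)
$$\tilde{S} = \bar{S} + \frac{\sum_{i=1}^{n} (S_i - \bar{S})^3}{2 \sum_{i=1}^{n} (S_i - \bar{S})^2}$$ Eq. 3.5 (Cohn et al. 1992a)
Where, L is load, log(Q_i) is “S_i - \tilde{S}”, dtime is “decimal time - center of decimal time”,
a_x are coefficients to calibrate, S_i is log(streamflow_i), \bar{S} is the mean of log(streamflow), and
\tilde{S} is the center of log(streamflow).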
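As an illustration of the centering step, the following minimal sketch computes the center of log(streamflow) and evaluates the three-term model for given coefficients. It assumes natural logarithms and the centering formula attributed to Cohn et al. (1992a); the function names are hypothetical, and the coefficients are supplied by the caller rather than calibrated by AMLE, MLE, or LAD as LOADEST does.

```python
import math


def center_of_log_flow(flows):
    """Center of log(streamflow): mean plus a skewness-based correction (assumed Eq. 3.5)."""
    s = [math.log(q) for q in flows]
    mean_s = sum(s) / len(s)
    numerator = sum((x - mean_s) ** 3 for x in s)
    denominator = 2.0 * sum((x - mean_s) ** 2 for x in s)
    if denominator == 0.0:
        raise ValueError("all flows are identical; the center is undefined")
    return mean_s + numerator / denominator


def log_load_model3(flow, dtime, center_s, center_t, a0, a1, a2):
    """log(load) from centered log(flow) and centered decimal time (assumed Eq. 3.4)."""
    return a0 + a1 * (math.log(flow) - center_s) + a2 * (dtime - center_t)


# Log flows of 1, 2, 3 are symmetric about 2, so the correction term vanishes.
example_center = center_of_log_flow([math.exp(1.0), math.exp(2.0), math.exp(3.0)])
```

Centering reduces the collinearity between the explanatory variables, which is why the flow and time terms are both expressed as deviations from a center.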
The number of subsampled water quality datasets for each stream was 98 (i.e. (7 for
weekly + 14 for biweekly + 28 for monthly sampling strategies) × 2 (with or without
storm event)). Therefore, 20,678 (98 subsampled datasets × 211 streams) annual sediment
load estimates from Chapter 2 were used in the study.
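The dataset counts above can be reproduced with a short sketch. It assumes that each fixed-interval strategy contributes one dataset per possible start offset (7, 14, and 28), and it approximates monthly sampling with a 28-day interval, whereas the study appears to sample on a fixed calendar day of each month. The function and variable names are illustrative.

```python
def fixed_interval_subsample(daily_records, interval_days, offset):
    """Keep every interval_days-th daily record, starting at index offset."""
    return daily_records[offset::interval_days]


# Number of start offsets per strategy, as stated in the text.
offsets_per_strategy = {"weekly": 7, "biweekly": 14, "monthly": 28}

# Each dataset is used twice: with and without storm-event samples.
datasets_per_stream = sum(offsets_per_strategy.values()) * 2
total_estimates = datasets_per_stream * 211

# One year of daily record indices, sampled weekly from the first day.
weekly_from_day_0 = fixed_interval_subsample(list(range(365)), 7, 0)
print(datasets_per_stream, total_estimates, len(weekly_from_day_0))
```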
The annual sediment load estimates for the regression model were explored to identify
which factors (or statistics) of the model inputs affected annual load estimates. LOADEST
requires two inputs: one to calibrate the regression model coefficients (i.e. water
quality and streamflow datasets), and the other to estimate daily loads (i.e. streamflow
data). Thus, various statistics were derived from the subsampled input datasets (Table
3.2). The statistics for calibration and estimation data listed in Table 3.2 were treated as
candidates for correlation with annual sediment load estimates.
Table 3.2 Statistics in Calibration and Estimation Data
| Variable | From Calibration Data | From Estimation Data |
|---|---|---|
| Q [1] | Minimum, Maximum, Mean, Standard deviation | Minimum, Maximum, Mean, Standard deviation |
| C [2] | Minimum, Maximum, Mean, Standard deviation | Minimum, Maximum, Mean, Standard deviation |
Q, C, and L [3]:
- Correlation coefficient of: Q and C, log(Q) and C, (log(Q))^2 and C, Q and L, log(Q) and L, (log(Q))^2 and L
- Coefficient of determination of: Q and C, log(Q) and C, (log(Q))^2 and C, Q and L, log(Q) and L, (log(Q))^2 and L
- Percentage of Q with C data in high, moist, mid-range, dry, and low flow regimes [4]
- Minimum Q in calibration data / Minimum flow in estimation data
- Maximum Q in calibration data / Maximum flow in estimation data
- Mean Q in calibration data / Mean flow in estimation data
- Standard deviation Q in calibration data / Standard deviation Q in estimation data
[1] Q: streamflow data. [2] C: water quality data (i.e. concentration). [3] L: load (multiplication of
measured streamflow by water quality data). [4] Flow regimes: defined by flow
frequencies (USEPA, 2007).
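A few of the statistics in Table 3.2 can be sketched as follows. This is a minimal illustration using the Python standard library; the function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption because the text does not say which form was used.

```python
import statistics


def flow_statistics(flows):
    """Minimum, maximum, mean, and sample standard deviation of a flow series."""
    return {
        "min": min(flows),
        "max": max(flows),
        "mean": statistics.mean(flows),
        "std": statistics.stdev(flows),
    }


def calibration_to_estimation_ratios(calibration_flows, estimation_flows):
    """Ratio of each calibration statistic to the matching estimation statistic."""
    cal = flow_statistics(calibration_flows)
    est = flow_statistics(estimation_flows)
    return {name: cal[name] / est[name] for name in cal}


# Made-up flows: the calibration flows are exactly twice the estimation flows.
ratios = calibration_to_estimation_ratios([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A mean ratio above 1 indicates calibration data weighted toward higher flows than the estimation period as a whole.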
3.3.2 Water Quality Data Selection for LOADEST runs
The 20,678 annual sediment load estimates from LOADEST regression model number 3
were analyzed with the input data statistics (Table 3.2). Following this analysis, additional
LOADEST runs were needed to investigate how regression model behavior depends on
calibration data characteristics. Thus, 5 USGS stations were selected from the 211 streams. The USGS
stations selected had long-term, daily water quality data, and the drainage areas ranged
from 12.5 km2 to 814,810 km2 (Table 3.3). LOADEST requires at least 12 water quality
samples to calibrate the model coefficients, and it accepts a maximum of 2,440
(approximately 6 years of daily data) water quality samples (Runkel et al. 2004).
Since the subsampling methods in the study included using all data (i.e. calibration data
period and interval are the same as estimation data period and interval), each water
quality dataset collected in the study was split into two datasets. For instance, the daily
water quality dataset of 10 years was split into two water quality datasets of 5 years.
Therefore, 10 water quality datasets were prepared from 5 USGS stations.
Table 3.3 Daily Sediment Data from USGS Stations
| Station Number | Station Name | Data Period | Drainage Area (km2) |
|---|---|---|---|
| 02119400 | Third Creek near Stony Point, NC | 1959-1968 | 12.5 |
| 07287150 | Abiaca Creek near Seven Pines, MS | 1993-2002 | 246.6 |
| 03265000 | Stillwater River at Pleasant Hill, OH | 1967-1973 | 1302.8 |
| 12334550 | Clark Fork at Turah Bridge nr Bonner, MT | 1993-2002 | 9430.1 |
| 06486000 | Missouri River at Sioux City, IA | 1992-1999 | 814,810.3 |
3.4 Results and Discussions
3.4.1 Required Statistics for Annual Load Estimates
The means of flow in calibration data (MFC) were correlated to the errors in estimated
pollutant loads (Equation 3.6). There were no notable correlations between the errors and
the other statistics listed in Table 3.2. Moreover, the other statistics did not show
correlations to annual sediment load estimates categorized by streamflow flashiness,
drainage area, and geographical location (states).
Eq. 3.6
However, MFCs were correlated to annual sediment load estimates: LOADEST
underestimated loads with small MFCs and overestimated loads with large MFCs (Figure
3.1). This correlation (or trend) was identified through analysis of the load estimates in
the 211 streams. Chapter 2 showed that the portion of water quality data from storm
events used in creating the model was correlated to errors in annual sediment load
estimates. The correlation between load estimates and MFCs corresponded to these
results, since larger MFCs had more water quality data from storm events.
Figure 3.1 Correlation between Errors and Mean of Flow in Calibration Data: (a) USGS Station
Number 01357500, (b) USGS Station Number 01463500, (c) USGS Station Number 01470500,
(d) USGS Station Number 01481000
While the correlation was readily identified, the MFC at which the annual sediment load
estimate had 0 % error differed among the 211 streams (see Figure 3.1). For
instance, annual sediment load estimates were the same as measured loads when MFC
was approximately 650 cubic meters per second (cms) for USGS Station 01463500, but
when MFC was approximately 14 cms for USGS Station 01481000 (Figure
3.1). Therefore, annual sediment load estimates with errors from -10 % to +10 % were
taken to be acceptable load estimates, and these annual sediment load estimates were
extracted to investigate correlation between MFCs and characteristics of the 211 streams.
The MFCs were correlated to the mean flow in estimation data (MFE), and the MFCs were
slightly greater than the MFEs (Figure 3.2).
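The screening of acceptable load estimates can be sketched as below. The error definition is an assumption (relative difference between estimated and measured annual load, in percent), chosen because it is consistent with negative errors denoting underestimation; the exact form of Equation 3.6 is not reproduced here. The function names are illustrative.

```python
def percent_error(estimated_load, measured_load):
    """Assumed error definition: (estimated - measured) / measured * 100."""
    return (estimated_load - measured_load) / measured_load * 100.0


def is_acceptable(error_pct, tolerance_pct=10.0):
    """True when the error lies within -tolerance_pct to +tolerance_pct."""
    return -tolerance_pct <= error_pct <= tolerance_pct


# Made-up loads: an estimate of 87 against a measured load of 100.
example_error = percent_error(87.0, 100.0)
```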
Figure 3.2 Correlation between Mean of Flows in Calibration and Estimation Data
As a correlation between MFC and MFE was found, a linear regression to estimate a
required MFC was derived (Equation 3.8) using the formula for linear regression
(Equation 3.7).
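$$y = LRS \cdot x + b, \qquad LRS = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}, \qquad b = \frac{\sum y - LRS \sum x}{n}$$ Eq. 3.7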
Eq. 3.8
Where, y is a dependent variable, x is an independent variable, n is the number of
data pairs, LRS is the linear regression slope, and b is a constant.
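The least-squares slope and constant can be computed directly from the sums, as in this minimal sketch. It uses the standard closed-form expressions for simple linear regression; the function name and the example data are illustrative and are not values from the study.

```python
def linear_regression(x, y):
    """Return (slope, constant) of the least-squares line y = slope * x + constant."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    constant = (sum_y - slope * sum_x) / n
    return slope, constant


# Points lying exactly on y = 2x + 1.
lrs, b = linear_regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

With x taken as MFE and y as the MFC of acceptable load estimates, the same calculation would give a relationship of the kind described for Equation 3.8.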
The required MFCs were computed by the regression equation using the MFEs from 211
streams, and the coefficient of determination (R2) between the required MFCs estimated
by the regression equation and the MFCs from subsampled water quality data in 211
streams was 0.98 (Figure 3.3).
Figure 3.3 Required MFC by Regression Equation
3.4.2 Mean Flow in Calibration Data and Annual Load Estimates
LOADEST runs were performed to identify the correlation between MFCs and load
estimates. The 10 water quality datasets from 5 USGS stations were subsampled based on
flow magnitude. Water quality datasets were in sequence by date, so each dataset was
sorted by flow in two ways, ascending and descending, before
the data were subsampled. The first subsampled dataset from the ascending dataset was
composed of the smallest 12 flow data (i.e. the smallest 12 flow data with water quality
data for calibration data) from the original dataset, since LOADEST requires at least 12
water quality samples with flow data. The first subsampled dataset had the minimum
MFC from the original dataset. The second subsampled dataset from the ascending
dataset was composed of the smallest 42 flow data from the original dataset, that is,
30 flow data added to the first subsampled dataset. In the same manner, the third
subsampled dataset of the ascending dataset was composed of the smallest 72 flow data
from the original dataset. In other words, 30 flow data were added in each subsampling
until all data were included. This approach was used to explore how LOADEST
performed with data biased toward low flow. In this subsampling method, the water
quality data from the largest flows were added in the last subsampling; therefore, the model
extrapolated with all subsampled datasets except the last one, which used
all data.
For the method of subsampling the descending dataset, the first subsampled dataset was
composed of the greatest 12 flow data from the original dataset, and the second
subsampled dataset had the greatest 42 flow data. As with the other subsampling method,
30 data were added until the calibration data included all data. The first subsampled
dataset had the maximum MFC from the original dataset. The load estimates were not
extrapolated, but the data were biased toward high flow (or storm events). These
sampling methods are not practical in a water quality monitoring program, but they were
used for evaluation of model behaviors with MFCs.
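The two subsampling schemes can be sketched as one function. This is an illustrative sketch, not the code used in the study: records are assumed to be (flow, concentration) pairs, the function name is hypothetical, and the final subset is taken to be the full dataset even when fewer than 30 records remain to be added.

```python
def nested_flow_subsets(records, descending=False, first=12, step=30):
    """Build nested calibration subsets ordered by flow.

    Starts with the `first` smallest (or largest) flows and adds `step`
    records at a time until every record is included.
    """
    if len(records) < first:
        raise ValueError("at least %d records are required" % first)
    ordered = sorted(records, key=lambda r: r[0], reverse=descending)
    sizes = list(range(first, len(ordered), step))
    sizes.append(len(ordered))  # the last subset uses all data
    return [ordered[:k] for k in sizes]


# 100 made-up records with flows 1..100 and a constant concentration.
records = [(float(q), 1.0) for q in range(1, 101)]
ascending_subsets = nested_flow_subsets(records)
descending_subsets = nested_flow_subsets(records, descending=True)
subset_sizes = [len(s) for s in ascending_subsets]
```

The mean flow of each subset gives its MFC, so the ascending subsets start at the minimum possible MFC and the descending subsets start at the maximum.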
Regression models predict loads based on the correlation between flow and water quality
data. The slope, for instance a1 in equation 3.4, is a key factor in the correlation between
flow and concentration (or load), and so the slope for the measured data used to calibrate
LOADEST model coefficients was compared to the calibrated slope (a1) in LOADEST.
In other words, linear regression slopes (LRSs) between flow and concentration data in
calibration data were computed using equation 3.7, and the slope coefficients (a1) in the
LOADEST regression model (Equation 3.4) were derived from all load estimates.
Ten water quality datasets from five streams were subsampled to run LOADEST with
subsampled datasets. The five USGS stations selected in the study had different
geographical locations and drainage areas. However, the model behaviors in response to
changes in MFC were similar. Both LRSs (of calibration data) and a1 (from LOADEST) fluctuated
when MFCs were too small or too great (Figure 3.4). Consequently, the errors of load
estimates fluctuated when MFCs were too small or too great. In other words, the
model showed low precision with the data biased toward either low or high flows (Figure
3.5). This indicates that there is a limitation on reproduction of the true load with water
quality datasets biased toward low or high flow. Moreover, water quality datasets biased
toward low or high flow could lead to load estimates that differ significantly from true
loads. This is because the regression model requires only flow and water quality data, and
therefore load estimates close to true loads cannot be expected if the data are too biased
to reproduce true load.
It might be thought that use of more water quality sample data in estimating loads would
lead to load estimates that are closer to true loads. This premise was examined using all
data to calibrate regression model coefficients. Although load estimates using all data
were close to measured annual loads, use of all data did not necessarily lead to the
smallest error in annual sediment load estimates. For instance, the errors of load estimates
using all data were -13.0 % and -1.7 % with the data from USGS Station Numbers
02119400 (Figure 3.5 (a)) and 06486000 (Figure 3.5 (b)), respectively, while load estimates
with smaller errors were readily found among the subsampled datasets. Moreover, when
MFCs were close to the required MFCs based on the regression equation (Equation 3.8,
Table 3.4), the errors were smaller than the errors of the load estimates using all data
for seven of the ten datasets.
Therefore, a water quality dataset should consist of samples associated with appropriate
flows (e.g. to be the required MFC by Equation 3.8) rather than using data from extensive
sampling strategies (e.g. daily water quality data collection).
The percentage of calibration data from high flow (PCH) conditions was computed in
each sediment load estimate (Table 3.4). High flow was defined as the upper 10 % of
flows for a given analysis period (USEPA, 2007). PCHs by the regression equation
ranged from 16.7 % (Station 12334550 from 1998 to 2002) to 36.8 % (Station 02119400
from 1959 to 1963). Haggard et al. (2003) and USGS (2001) suggest that 50 % of water
quality samples used in estimating loads should come from storm events; however, the
definitions of storm events differ. Haggard et al. (2003) defined storm
events as flow stages exceeding 1.5 m, which occurred on approximately 20 % of days during
their study period, while the storm events (i.e. PCH) in this study were defined as the
upper 10 % of flows for a given period.
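The PCH calculation can be sketched as follows, assuming that the high-flow threshold is the smallest flow within the upper 10 % of the daily flows for the analysis period. The function names are hypothetical, and the handling of ties and rounding is a simplification.

```python
def high_flow_threshold(period_flows, upper_fraction=0.10):
    """Smallest flow that still belongs to the upper fraction of period flows."""
    ordered = sorted(period_flows, reverse=True)
    count = max(1, int(round(len(ordered) * upper_fraction)))
    return ordered[count - 1]


def percent_calibration_high_flow(calibration_flows, period_flows):
    """PCH: percentage of calibration flows at or above the high-flow threshold."""
    threshold = high_flow_threshold(period_flows)
    n_high = sum(1 for q in calibration_flows if q >= threshold)
    return 100.0 * n_high / len(calibration_flows)


# Made-up daily flows 1..100 for the analysis period.
period = [float(q) for q in range(1, 101)]
pch_all_data = percent_calibration_high_flow(period, period)
pch_biased = percent_calibration_high_flow([95.0, 50.0, 20.0, 99.0], period)
```

When all daily data are used for calibration, the PCH is about 10 % by construction, which matches the 'All Data' values in Table 3.4.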
The results presented here indicate that the regression equation approach using Equation
3.8 can be used, for water quality monitoring programs intended to estimate annual
sediment loads using approaches like LOADEST, in the following way:
1) Compute MFCo using the regression equation with a mean flow of historical data prior
to initiating a water quality monitoring program,
2) Collect a few water quality samples based on MFCo,
3) Compute MFCi using the regression equation with the mean flow from the beginning
of the water quality monitoring program,
4) Collect water quality samples from low flow if MFCi is greater than the required MFC
by regression equation, collect water quality samples from high flow (storm events) if
MFCi is smaller than the required MFC by regression equation,
5) Repeat steps 3) and 4) until the end of the water quality monitoring program.
However, collecting water quality samples only for the flow regime close to the required
MFC needs to be avoided, because it would be biased toward a certain regime of flow
data.
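One reading of steps 3) and 4) can be sketched as a simple decision rule. The sketch assumes that the MFC of the samples collected so far is compared with the required MFC computed from the mean flow observed so far; the `slope` and `intercept` arguments stand in for the coefficients of Equation 3.8, which are not reproduced here, and the function names are hypothetical.

```python
def required_mfc(mean_flow, slope, intercept):
    """Required MFC from a linear relationship with mean flow (placeholder coefficients)."""
    return slope * mean_flow + intercept


def next_sampling_target(sampled_flows, flows_to_date, slope, intercept):
    """Suggest which flow condition to sample next."""
    mfc_now = sum(sampled_flows) / len(sampled_flows)
    mean_flow_now = sum(flows_to_date) / len(flows_to_date)
    target = required_mfc(mean_flow_now, slope, intercept)
    if mfc_now > target:
        return "low flow"
    if mfc_now < target:
        return "high flow"
    return "either"


# Made-up flows and placeholder coefficients (slope=2.0, intercept=0.0).
suggestion_a = next_sampling_target([1.0, 2.0], [1.0, 2.0, 3.0], 2.0, 0.0)
suggestion_b = next_sampling_target([5.0, 6.0], [1.0, 2.0, 3.0], 2.0, 0.0)
```

The rule only nudges the sample set toward the required MFC; samples should still span the flow regimes, as noted above.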
Figure 3.4 Comparison of Slopes from Linear Regression Formula (LRS in Equation 3.7)
and Calibrated LOADEST Model (a1 in Equation 3.4): (a) USGS Station Number 02119400,
(b) USGS Station Number 06486000
Figure 3.5 Annual Sediment Load Estimates by MFCs: (a) USGS Station Number 02119400,
(b) USGS Station Number 06486000
Table 3.4 Comparison of Errors between Regression Equation and All Data
| USGS Station Number (Data Period) | Regression [1]: Error of Load Estimates, % (PCH, %) | All Data: Error of Load Estimates, % (PCH, %) |
|---|---|---|
| 02119400 (1959-1963) | 6.4 (36.8) | -13.0 (11.1) |
| 02119400 (1964-1968) | 2.5 (36.2) | -8.7 (10.3) |
| 07287150 (1993-1997) | 13.0 (16.8) | 22.4 (10.1) |
| 07287150 (1998-2002) | 8.1 (17.4) | -5.1 (10.1) |
| 03265000 (1967-1969) | 14.8 (20.3) | -29.5 (10.1) |
| 03265000 (1970-1973) | -10.8 (18.8) | -39.7 (10.2) |
| 12334550 (1993-1997) | 7.5 (17.1) | -16.6 (10.1) |
| 12334550 (1998-2002) | 0.7 (16.7) | -12.7 (10.3) |
| 06486000 (1992-1995) | -2.9 (18.4) | -1.7 (10.1) |
| 06486000 (1995-1999) | -5.8 (21.7) | -2.8 (10.1) |
[1] Regression: load estimates when MFCs were close to the required MFC computed by
the regression equation.
3.4.3 Improvement of the Poorest Annual Load Estimates
Using the regression equation approach to estimate a required MFC can be employed
from the beginning of water quality monitoring programs. However, there is also a need
to explore an approach to employ the regression equation for water quality datasets that
have already been collected. When the regression equation is employed from the
beginning of water quality monitoring programs, water quality samples will be added
based on MFCs. However, if a water quality monitoring program has already been
finished, or if a water quality dataset has been collected by others (e.g. EPA, USGS, etc.),
water quality data cannot be added to obtain the required MFC by the regression equation.
Another way to obtain the required MFC would be to exclude water quality data from the
original data.
To explore this concept, five load estimates from Chapter 2 were selected that had poor
accuracy and enough water quality data to allow samples to be excluded when estimating
loads. LOADEST runs were performed for the five water quality datasets with water
quality samples excluded. The datasets came from monthly or biweekly fixed-interval
sampling strategies and contained 84, 120, or 261 water quality samples (Table 3.5). All
MFCs of the original calibration data were smaller than the required MFCs based on the
regression equation. Therefore, the water quality samples associated with the smallest
flows were removed, and LOADEST runs were performed for the reduced water quality
datasets.
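The exclusion procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure's actual implementation: the function name, the (flow, concentration) tuple layout, the stopping rule (stop once the mean flow reaches the required MFC), and the toy numbers are all hypothetical.

```python
from statistics import mean


def trim_to_required_mfc(samples, required_mfc):
    """Drop the lowest-flow samples until the mean flow of the remaining
    calibration data (MFC) reaches the required MFC.

    samples: list of (flow_cms, concentration) tuples (hypothetical layout).
    Returns the retained samples, sorted by ascending flow.
    """
    kept = sorted(samples, key=lambda s: s[0])
    # Keep at least one sample so the mean stays defined.
    while len(kept) > 1 and mean(f for f, _ in kept) < required_mfc:
        kept.pop(0)  # remove the sample with the smallest flow
    return kept


# Toy data, not taken from the study.
samples = [(0.05, 12.0), (0.10, 15.0), (0.20, 30.0), (0.40, 55.0), (0.60, 80.0)]
kept = trim_to_required_mfc(samples, required_mfc=0.36)
```

With these toy values the two smallest-flow samples are dropped and three samples remain. The reduced set would then be passed to LOADEST as calibration data.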
Load estimates improved as water quality samples were removed. For instance, load
estimates using the original datasets showed large errors, but the errors became smaller
as MFC increased with the exclusion of water quality data associated with the smallest
flows (Figure 3.6). For the five water quality datasets, the errors ranged from 132.8 %
to 223.0 % with the original datasets, while the errors ranged from -27.0 % to 1.7 % when
the LOADEST calibration datasets were trimmed to the required MFCs from the regression
equation. The original datasets were biased toward low flow; in other words, MFCs were
smaller than the required MFCs. The results indicate that a water quality dataset needs
to contain an appropriate proportion of samples from low and high flows rather than
simply a large number of samples, because the datasets with fewer samples but an
appropriate MFC showed smaller errors than the datasets with more samples (Table 3.5).
The PCHs of the original datasets (‘Original’ in Table 3.5) were approximately 10 %,
whereas they need to be 20-30 % (Chapter 2). The PCHs of the water quality datasets based
on the regression equation were approximately 30 %, except for station 05291000, for
which the PCH was 13.6 %.
Figure 3.6 Load Estimate Improvement when Excluding Water Quality Samples (USGS
Station Number 02119400, Monthly Fixed Sampling Strategy on the 19th of Every Month)
Table 3.5 Improvement of the Poorest Load Estimates by MFC Fitting
USGS Station Number                               MFC3 (Error, %)                Num. Data4 (PCH, %)
(Sampling Strategy)           MFE1   R. MFC2   Original5       Regression6    Original5     Regression6
02119400 (Monthly on 18th)    0.18   0.36      0.19 (195.5)    0.35 (1.7)     120 (10.0)    45 (26.7)
02119400 (Monthly on 19th)    0.18   0.36      0.22 (223.0)    0.36 (-3.0)    120 (12.5)    57 (26.3)
02119400 (Monthly on 20th)    0.18   0.36      0.21 (132.8)    0.36 (-13.0)   120 (12.5)    52 (28.8)
02119400 (Biweekly on 12th)   0.18   0.36      0.17 (144.7)    0.36 (1.4)     261 (7.7)     65 (30.8)
05291000 (Monthly on 25th)    1.43   2.50      1.78 (204.0)    2.51 (-27.0)   84 (9.5)      59 (13.6)
1MFE: Mean Flow (cms) of Estimation Data, 2R. MFC: Required MFC (cms) by the Regression
Equation (Equation 3.8), 3MFC: Mean Flow (cms) of Calibration Data, 4Num. Data: Number
of data in calibration data, 5Original: Calibration Data from Chapter 2, 6Regression:
Calibration Data after the Exclusion of Minimum Flow Data
3.5 Conclusions
Regression models are used to estimate pollutant loads or concentrations for a given time
sequence and to estimate annual loads. LOADEST is used for various water quality
parameters and sample sizes. In this study, one of the regression models from LOADEST
was evaluated with various sample sizes. Several distinct features were found: 1) the
mean flow of the data used to calibrate the regression model coefficients (MFC) was
correlated with the mean flow of the data used to estimate pollutant loads (MFE), 2) the
load estimates differed significantly from the measured loads if MFC was too small or
too large, 3) using all data with the same interval and period as the estimation data
did not lead to the smallest error against the measured load, 4) a water quality dataset
with an appropriate MFC showed smaller errors than a larger water quality dataset biased
toward low or high flows, and 5) excluding water quality data to fit the required MFC
improved annual load estimates. The results imply that a water quality dataset needs to
represent the distribution of the given data; in other words, it must not be biased
toward a certain flow regime.
Calibration data need to be representative of the watershed characteristics or of the
period used to estimate pollutant loads. In other words, calibration data for pollutant
regression models need to include an appropriate proportion of storm samples and must
not be too biased toward specific flow conditions. This is why water quality datasets
collected less frequently but including an appropriate proportion of storm samples led
to smaller differences in the predicted annual pollutant loads than water quality
datasets collected more frequently but including inappropriate proportions of storm
samples.
A regression equation was developed to compute the required mean flow in calibration
data. The regression equation can be applied from the beginning of a water quality
monitoring program. Furthermore, it can be used to decide which water quality data to
exclude if a water quality dataset has already been collected. The regression equation
is expected to be useful when the purpose is to estimate annual sediment loads using
LOADEST.
3.6 References
Brigham, M. E., Wentz, D. A., Aiken, G. R., Krabbenhoft, D. P., 2009. Mercury cycling
in stream ecosystems. 1. water column chemistry and transport. Environmental
Science & Technology 43, 2720-2725.
Burn, D. H., 1990. Real-time sampling strategies for estimating nutrient loadings. Journal
of Water Resources Planning and Management 116(6), 727-741.
Cochran, W. G., 1963. Sampling Techniques (Second Edition). John Wiley and Sons,
New York, New York. Chapter 4. The Estimation of Sample Size. 75.
Cohn, T. A., Caulder, D. L., Gilroy, E. J., Zynjuk, L. D., Summers, R. M., 1992a. The
validity of a simple statistical model for estimating fluvial constituent loads: an
empirical study involving nutrient loads entering Chesapeake Bay. Water Resources
Research 28(9), 2353-2363.
Cohn, T. A., Gilroy, E. J., Baier, W. G., 1992b. Estimating fluvial transport of trace
constituents using a regression model with data subject to censoring. Paper presented
at the Joint Statistical Meetings, Am. Stat. Assoc., Boston.
Coynel, A., Schafer, J., Hurtrez, J., Dumas, J., Etcheber, H., Blanc, G., 2004. Sampling
frequency and accuracy of SPM flux estimates in two contrasted drainage basins.
Science of the Total Environment 330, 233-247.
Carey, R. O., Migliaccio, K. W., Brown, M. T., 2011. Nutrient discharges to Biscayne
Bay, Florida: Trends, loads, and a pollutant index. Science of the Total Environment
409, 530-539.
Das, S. K., Ng, A. W. M., Perera, B. J. C., 2011. Assessment of nutrient and sediment
loads in the Yarra river catchment. 19th International Congress on Modelling and
Simulation, Perth, Australia. 3490-3496.
Dornblaser, M. M., Striegl, R. G., 2009. Suspended sediment and carbonate transport in
the Yukon river basin, Alaska: Fluxes and potential future responses to climate
change. Water Resources Research 45, W06411, doi:10.1029/2008WR007546.
Duan, S., Kaushal, S. S., Groffman, P. M., Band, L. E., Belt, K. T., 2012. Phosphorus
export across an urban to rural gradient in the Chesapeake Bay watershed. Journal of
Geophysical Research 117, G01025, doi:10.1029/2011JG001782.
Gilroy, E. J., Hirsch, R. M., Cohn, T. A., 1990. Mean square error of regression-based
constituent transport estimates. Water Resources Research 26(9), 2069-2077.
Haggard, B. E., Soerens, T. S., Green, W. R., Richards, R. P., 2003. Using regression
methods to estimate stream phosphorus loads at the Illinois River, Arkansas. Applied
Engineering in Agriculture 19(2), 187–194.
Helsel, D. R., Hirsch, R. M., 2002. Statistical Methods in Water Resources. U. S.
Geological Survey Techniques of Water-Resources Investigations of the United
States Geological Survey Book 4, Hydrologic Analysis and Interpretation, 352.
Henjum, M. B., Hozalski, R. M., Wennen, C. R., Novak, P. J., Arnold, W. A., 2010. A
comparison of total maximum daily load (TMDL) calculations in urban streams
using near real-time and periodic sampling data. Journal of Environmental
Monitoring 12, 234-241.
Horowitz, A. J., 2003. An evaluation of sediment rating curves for estimating suspended
sediment concentrations for subsequent flux calculations. Hydrological Processes 17,
3387-3409.
Johnson, A. H., 1979. Estimating solute transport in streams from grab samples. Water
Resources Research 15(5), 1224-1228.
Johnes, P. J., 2007. Uncertainties in annual riverine phosphorus load estimation: Impact
of load estimation methodology, sampling frequency, baseflow index and catchment
population density. Journal of Hydrology 332, 241-258.
King, K. W., Harmel, R. D., 2003. Considerations in selecting a water quality sampling
strategy. Transactions of the ASAE 46(1), 63-73.
Kronvang, B., Bruhn, A. J., 1996. Choice of sampling strategy and estimation method for
calculating nitrogen and phosphorus transport in small lowland streams.
Hydrological Processes 10, 1483-1501.
Oh, J., Sankarasubramanian, A., 2011. Interannual hydroclimatic variability and its
influence on winter nutrients variability over the southeast United States. Hydrology
and Earth System Sciences Discussions 8, 10935-10971.
Powell, J. L., 1984. Least absolute deviations estimation for the censored regression
model. Journal of Econometrics 25, 303-325.
Robertson, D. M., 2003. Influence of different temporal sampling strategies on estimating
total phosphorus and suspended sediment concentration and transport in small
streams. Journal of the American Water Resources Association 39(5), 1281-1308.
Robertson, D. M., Roerish, E. D., 1999. Influence of various water quality sampling
strategies on load estimates for small streams. Water Resources Research 35(12),
3747-3759.
Robertson, D. M., Richards, K. D., 2000. Influence of different temporal sampling
strategies on estimating loads and maximum concentrations in small streams.
Proceedings of the National Water Quality Monitoring Council National Monitoring
Conference, Austin TX. 209-223.
Runkel, R. L., Crawford, C. G., Cohn, T. A., 2004. Load Estimator (LOADEST): A
Fortran program for estimating constituent loads in streams and rivers. U.S.
Geological Survey Techniques and Methods, Book 4, Chap. A5.
Spencer, R. G. M., Aiken, G. R., Butler, K. D., Dornblaser, M. M., Striegl, R. G., Hernes,
P. J., 2009. Utilizing chromophoric dissolved organic matter measurements to derive
export and reactivity of dissolved organic carbon exported to the Arctic Ocean: A
case study of the Yukon river, Alaska. Geophysical Research Letters 36, L06401,
doi:10.1029/2008GL036831.
USEPA (U.S. Environmental Protection Agency), 1997. Monitoring guidance for
determining the effectiveness of nonpoint source controls. EPA/841-B-96-004, U.S.
EPA Office of Water Nonpoint Source Control Branch. Washington D.C.
USEPA (U.S. Environmental Protection Agency), 2007. An approach for using load
duration curves in the development of TMDLs. Washington, DC 20460.
USGS (U. S. Geological Survey), 1998. Computer programs for describing the recession
of ground-water discharge and for estimating mean ground-water recharge and
discharge from streamflow records-update. Water-Resources Investigation Report
98-4148. Reston, Virginia.
USGS (U. S. Geological Survey), 2001. Effect of storm-sampling frequency on
estimation of water-quality loads and trends in two tributaries to Chesapeake Bay in
Virginia. Water-Resources Investigations Report 01-4136. Richmond, Virginia.
Zar, J. H., 1984. Biostatistical Analysis (Second Edition). Prentice Hall, New Jersey.
Chapter 8. One-Sample Hypotheses. 110.
CHAPTER 4. A WEB TOOL FOR STORET/WQX WATER QUALITY DATA
RETRIEVAL AND BEST MANAGEMENT PRACTICE SCENARIO
IDENTIFICATION
4.1 Abstract
The Total Maximum Daily Load (TMDL) is a water quality standard used to regulate and
protect the quality of water in streams, rivers and lakes. A wide range of approaches is
currently used to develop TMDLs for impaired streams and rivers. Flow and load duration
curves (FDC and LDC) have been used in many states to evaluate the relationship
between flow and pollutant loading along with other models and approaches. A web-
based LDC Tool has been developed to facilitate development of FDC and LDC as well
as to support other hydrologic analyses. In this study, the FDC and LDC tool was
enhanced to allow collection of water quality data via the web and to assist in
establishing cost-effective best management practice (BMP) implementations. The
enhanced web-based tool uses water quality data from the US Geological Survey and
from the US Water Quality Portal via web access. Moreover, the web-based tool
identifies required pollutant reductions to meet standard loads and suggests a BMP
scenario to meet this reduction, based on ability of BMPs to reduce pollutant loads and on
BMP establishment and maintenance costs. In this study, flow and water quality data
were collected via web access to develop LDC and to identify the required reductions in
load. BMP scenario suggestions are based on the EPA Spreadsheet Tool for the
Estimation of Pollutant Load model with the goal of achieving the required pollutant
reduction at the least cost.
Keywords: Flow Duration Curve, Load Duration Curve, STORET/WQX, Total
Maximum Daily Load
4.2 Introduction
The Total Maximum Daily Load (TMDL) is a measure of the maximum load of a
pollutant and is used as a water quality standard to regulate water quality of streams.
Section 303(d) of the Clean Water Act requires states and other defined authorities to
develop lists of impaired waters, indicating that jurisdictions having contaminated water
need to establish priority rankings and to develop TMDL plans (USEPA, 2008). A wide
range of approaches is currently used to develop TMDLs for impaired streams. Models
differ in purpose, applicability, and uncertainty, and they use various structures and
assumptions to represent the complex systems found in nature. In other words, models
have various advantages and disadvantages, and it may be necessary to combine two or
more models when solving a problem (Babbar-Sebens and
Karthikeyan, 2009; Shen and Zhao, 2010; Cleland, 2003; USEPA, 2008). Load duration
curves (LDCs) have been used in many states along with other models or approaches to
evaluate the relationship between flow and pollutant loading (Alaska Department of
Environmental Conservation, 2008; Babbar-Sebens and Karthikeyan, 2009; Chen et al.,
2011; IDEM, 2007; Minnesota Pollution Control Agency, 2010; New Hampshire
Department of Environmental Services, 2008; Tennessee Department of Environment
and Conservation, 2005; Texas Institute for Applied Environmental Research, 2010).
Flow duration curves (FDCs) and LDCs can be plotted either manually (Kim and Yoon,
2011; Babbar-Sebens and Karthikeyan, 2009; Shen and Zhao, 2010) or by computer
programs (Johnson et al., 2009; Kim et al., 2012). The steps to plot the curves are: 1)
collecting flow and water quality data, 2) converting the chronological flow record
into cumulative frequencies of flow, 3) calculating the standard pollutant loads by
multiplying the numerical target concentration by the flow data, 4) calculating the
existing pollutant loads by multiplying the observed water quality data by the flow
data, and 5) plotting the FDC and LDC against the cumulative frequency (Shen and Zhao,
2010). The method is a relatively simple approach for developing TMDLs, as little
expertise and time are required to prepare input data and to develop FDCs and LDCs.
However, historical records of streamflow and of water quality data associated with the
streamflow data are required. Both streamflow and water quality datasets need to be
imported into spreadsheets and manipulated before the remaining steps can be followed
to develop FDCs and LDCs.
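The plotting steps above can be sketched in Python as follows. This is a minimal illustration, not the code of the Web-based LDC Tool: the function names, the Weibull plotting position i/(n+1) for cumulative frequency, and the conversion to metric tonnes per day are assumptions made for the example, and the flows are toy values.

```python
def flow_duration_curve(flows_cms):
    """Steps 1-2: rank flows and convert ranks to exceedance percentages."""
    ranked = sorted(flows_cms, reverse=True)
    n = len(ranked)
    # Weibull plotting position i / (n + 1); an assumption, other formulas exist.
    return [(100.0 * (i + 1) / (n + 1), q) for i, q in enumerate(ranked)]


def load_tonnes_per_day(conc_mg_per_l, flow_cms):
    """Load = concentration x flow; 1 mg/L = 1 g/m^3, 86,400 s/day, 1e6 g/tonne."""
    return conc_mg_per_l * flow_cms * 86400.0 / 1e6


def standard_ldc(flows_cms, target_mg_per_l):
    """Step 3: standard (allowable) load at each exceedance percentage."""
    return [(p, load_tonnes_per_day(target_mg_per_l, q))
            for p, q in flow_duration_curve(flows_cms)]


def observed_points(samples, flows_cms):
    """Step 4: existing loads from (flow, concentration) samples."""
    n = len(flows_cms)
    points = []
    for q, c in samples:
        rank = sum(1 for f in flows_cms if f >= q)  # flows at or above the sample flow
        points.append((100.0 * rank / (n + 1), load_tonnes_per_day(c, q)))
    return points


flows = [1.0, 3.0, 2.0]             # toy daily flows (m^3/s)
fdc = flow_duration_curve(flows)    # [(25.0, 3.0), (50.0, 2.0), (75.0, 1.0)]
ldc = standard_ldc(flows, 46.0)     # target concentration in mg/L
obs = observed_points([(2.0, 60.0)], flows)
```

Step 5 then amounts to plotting `fdc`, `ldc`, and `obs` against the exceedance percentage; observed points lying above the standard curve indicate loads that exceed the target.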
A web-based LDC Tool (https://engineering.purdue.edu/~ldc/; Kim et al., 2012) has been
developed to automate the generation of FDCs and LDCs. The Web-based LDC Tool has
several benefits. Because it is web-based, it is easy to use and requires no
installation or additional software. It also provides a user-friendly interface for
identifying sampling stations using Google Maps. In addition to a user’s own streamflow
data, the fundamental data for FDC and LDC, the tool can use U.S. Geological Survey
(USGS) streamflow data retrieved from the USGS data server for stations selected via a
Google Maps interface.
In addition, the web-based tool employs the LOAD ESTimator (LOADEST; Runkel et al.,
2004). Water quality measurement is generally costly and involves many procedures, so
the amount of data may not be sufficient for TMDL analysis. LOADEST can be used to
generate daily pollutant loads from intermittent water quality data. The web-based tool
interacts with LOADEST, preparing its inputs and plotting the LDC with its results.
The web-based tool provides various benefits in development of FDC and LDC, deriving
streamflow data from the USGS data server, integrating with LOADEST, and allowing
additional analysis (e.g. seasonal variations and surface flow separation). However, users
were required to prepare water quality data, as water quality data are an essential input in
LDC development. Also, upgrading the web-based tool was required to improve ease-of-
use and to allow further analysis of Best Management Practice (BMP) recommendation.
Thus, the objectives of the study were to: 1) enhance the Web-based LDC Tool to allow
collecting water quality data via web access, and 2) develop a tool module to suggest
BMP scenarios to reduce annual pollutant load to meet required pollutant load reduction.
4.3 Methodology
4.3.1 Module Development to Use Water Quality Data
Two datasets are required to develop an LDC: flow data and water quality data. While
the web-based tool provided USGS streamflow data via a Google Maps interface, users had
to prepare water quality data manually. Water quality data can be prepared by the
users, or the data can be collected from the USGS, the US Environmental Protection
Agency (EPA), or the Water Quality Portal (WQP). Therefore, three modules were developed
to automate collecting water quality data and to facilitate use of the data in the
web-based tool.
The first module requests and receives USGS water quality data which are provided
through web access (http://waterdata.usgs.gov/nwis/qw). Similar to the use of the USGS
streamflow data, a module with a Google Maps interface displaying the locations of the
USGS water quality stations was developed (Fig. 4.1). As the user finds and selects the
USGS station of interest on the Google Maps interface, the module requests and receives
water quality data from the USGS server. Since the dataset from the USGS is for various
water quality parameters, one of the water quality parameters must be selected by the
user, making the data for the selected water quality parameters available for use in the
web-based tool without any manual processes (e.g. data formatting).
The second module uploads and uses water quality data from the EPA My WATERS
Mapper (http://watersgeo.epa.gov/mwm/) application. The My WATERS Mapper
provides STOrage and RETrieval (STORET) data, and the web site allows downloading
a water quality data file formatted in comma separated values (CSV), extensible markup
language (XML), or keyhole markup language (KML). The downloaded file from the
EPA My WATERS Mapper consists of various water quality parameters. Therefore, a
module has been developed to upload the data file, to extract the water quality parameter
of interest, and to allow use of the extracted data in the web-based tool without any
manual processes.
The third module requests and receives water quality data from the Water Quality Portal
(WQP; http://www.waterqualitydata.us/), which provides access to water quality data
from the USGS, USEPA, and the National Water Quality Monitoring Council. The WQP allows
water quality data not only to be downloaded but also to be requested and received via
the web using location information (i.e. “Water Monitoring Location” and “Organization
ID”). A monitoring location database was built from the EPA STORET/WQX locations
(http://www.epa.gov/storet/) and stored on a Purdue University server. A module was
developed to display the STORET/WQX locations, to request water quality data from the
WQP, to extract water quality parameters of interest, and to convert data formats
(Fig. 4.2).
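The extraction step shared by the second and third modules (pulling one parameter out of a file that holds many) can be sketched as follows. This is only an illustration, not the module's code: the function name is hypothetical, and the column names `CharacteristicName`, `ActivityStartDate`, and `ResultMeasureValue` are assumptions about the layout of a downloaded CSV file that should be checked against the actual file.

```python
import csv
import io


def extract_parameter(csv_text, parameter,
                      name_col="CharacteristicName",
                      date_col="ActivityStartDate",
                      value_col="ResultMeasureValue"):
    """Return sorted (date, value) pairs for one water quality parameter.

    The column names are assumed defaults; pass others if the file differs.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        if rec.get(name_col) != parameter:
            continue
        try:
            value = float(rec[value_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip blank or non-numeric results
        rows.append((rec.get(date_col, ""), value))
    return sorted(rows)


# Toy file content for illustration only.
sample = """ActivityStartDate,CharacteristicName,ResultMeasureValue
1999-06-01,Total suspended solids,100.0
1999-04-06,Phosphorus,0.12
1999-05-04,Total suspended solids,
1999-04-06,Total suspended solids,4.0
"""
tss = extract_parameter(sample, "Total suspended solids")
```

Rows for other parameters and rows without a numeric result are skipped, so `tss` holds only the dated values of the selected parameter, ready to be paired with flow data.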
In brief, the web-based tool has been upgraded: 1) to use the USGS water quality data for
the entire U.S. via web access, 2) to upload water quality data downloaded from the EPA
My WATERS Mapper, and 3) to use water quality data from STORET/WQX for the
entire U.S. via web access.
Figure 4.1 Google Maps Interface to Retrieve USGS Water Quality Data
Figure 4.2 Schematic Depicting Web-based Tool Access of Water Quality Data from
EPA STORET/WQX Location Database and Web Access to WQP
4.3.2 Module Development to Suggest BMP Scenarios
Flow data can be analyzed to evaluate hydrologic conditions and to interpret water
quality data. The FDC is divided into five regimes (or zones): high flows (0-10%), moist
conditions (10-40%), mid-range flows (40-60%), dry conditions (60-90%), and low flows
(90-100%). Streamflow patterns are related to watershed characteristics (Chen et al.,
2011; Sigua and Tweedale, 2003; May et al., 2001; Hsu et al., 2010; Grizzetti et al.,
2005). The five flow regimes in the FDC are helpful in understanding watershed
characteristics since the FDC shows the magnitude and frequency of flow. The FDC and
LDC are fundamental analyses used to develop TMDLs, identifying specific flow
regimes violating water quality standards.
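As a small illustration of the five regimes, the sketch below maps a flow exceedance percentage to its regime. The function name and the choice to assign a boundary value (e.g. exactly 10%) to the higher-flow regime are assumptions; the text does not specify how boundaries are handled.

```python
# Upper bound of each regime in percent of time flow is exceeded.
REGIMES = [
    (10.0, "high flows"),
    (40.0, "moist conditions"),
    (60.0, "mid-range flows"),
    (90.0, "dry conditions"),
    (100.0, "low flows"),
]


def flow_regime(exceedance_percent):
    """Return the flow regime for an exceedance percentage in [0, 100]."""
    if not 0.0 <= exceedance_percent <= 100.0:
        raise ValueError("exceedance percentage must be between 0 and 100")
    for upper, name in REGIMES:
        if exceedance_percent <= upper:  # boundary values go to the wetter regime
            return name


regime = flow_regime(25.0)
```

Grouping observed loads by the regime of their flow is what allows exceedances to be attributed to a specific regime.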
An LDC for a water quality criterion is developed by multiplying streamflow by the water
quality target (USEPA, 2007; Cleland, 2003). Pollutant loads exceeding standards in
different zones imply potentially different sources of pollutant loads, although the LDC
approach does not require any information about the sources of watershed pollutants such
as landuse types. For instance, if the pollutant loads exceed the standard pollutant
loads in ‘Dry Conditions’ or ‘Low Flows’, the sources can be identified as point sources
or, in the case of an agricultural watershed, potentially livestock. Pollutant delivery
related to runoff from riparian areas, from impervious areas in urban watersheds with
light rain, or from saturated soils may cause pollutant loads to exceed standards in
‘Moist Conditions’ or ‘Mid-range Flows’. Pollutant loads exceeding standards in ‘High
Flows’ typically result from stream bank erosion, channel processes, and non-point
source (NPS) pollutant loads (USEPA, 2007).
Since the sources and causes of the pollutant loads exceeding standards in the flow
regimes are different, a BMP scenario needs to be based on the flow regime in which
pollutant loads are exceeded (Table 4.1). For instance, if the pollutant loads exceed the
standards in the ‘High-Flow’ regime, the BMPs categorized as ‘Post Development
BMPs’, ‘Streambank Stabilization’, or ‘Erosion Control Program’ are selectable (e.g.
diversion, streambank stabilization and fencing, porous pavement). If the pollutant loads
exceed the standards in the ‘Dry Condition’ regime, the BMPs categorized as ‘Riparian
Buffer Protection’ or ‘Municipal Wastewater Treatment Plant’ are selectable (e.g. filter
strip, runoff management system, waste storage facility) (USEPA, 2007).
Table 4.1 BMP Categories for Each Flow Regime (USEPA, 2007)
Flow regimes (columns): High-Flow, Moist Conditions, Mid-range Flow, Dry Conditions, Low
Flow
Implementation opportunities (rows): Post Development BMPs, Streambank Stabilization,
Erosion Control Program, Riparian Buffer Protection, Municipal Wastewater Treatment Plant
The LDC approach allows simple analysis to determine whether pollutant loads are
exceeded in any of the five flow regimes and to identify potential BMPs to address the
problems found, since the sources and causes of pollutant loads differ among flow
regimes.
The EPA Spreadsheet Tool for the Estimation of Pollutant Load (EPA STEPL; Tetra
Tech, 2011) is a spreadsheet model to compute annual runoff, sediment load, nutrient
loads, and 5-day biological oxygen demand (BOD5). The spreadsheet tool allows
estimation of various BMP implementations and Low Impact Development (LID)
practices so that pollutant load reduction for BMPs or LIDs can be computed. The
spreadsheet tool has a BMP database with 54 BMP efficiencies for nitrogen, phosphorus,
BOD, and sediment load reductions. The BMPs were categorized into the five
implementation categories in Table 4.1.
The Web-based LDC Tool was enhanced to identify the required pollutant reduction
percentage for each flow regime, and then the web-based tool makes lists of the BMPs
corresponding to the flow regimes responsible for exceeding the standards. The BMP
lists are associated with each landuse because the BMPs in the EPA STEPL database
were categorized by landuse types (Cropland, Forest, Feedlots, and Urban).
After the BMP lists are established, the web-based tool computes BMP implementation
costs (ct) using a cost function (Arabi et al., 2006) that requires establishment cost (c0),
ratio of annual maintenance cost to establishment cost (rm), interest rate (s), and BMP
design life (td).
\[ c_t = c_0 (1+s)^{t_d} + c_0 \, r_m \left[ \sum_{i=1}^{t_d} (1+s)^{i-1} \right] \]   Eq. 4.1
BMP establishment cost and ratio of annual maintenance cost were collected from
various documents (Table 4.2), but actual cost might differ for a given watershed.
Therefore, the module displays BMP costs from the database as a default and allows the
user to update BMP costs before the module suggests BMP scenarios for each landuse.
After BMP implementation costs are estimated by the cost function (Eq. 4.1), the web-
based tool establishes BMP scenarios for each landuse based on least BMP
implementation cost per unit of BMP efficiency. In other words, the web-based tool
computes BMP implementation costs for a pollutant reduction of 1 percent.
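The cost ranking can be sketched as below. Two assumptions are made and should be read as such: the cost function is taken to have the form c_t = c_0 (1+s)^{t_d} + c_0 r_m Σ_{i=1}^{t_d} (1+s)^{i-1}, which is how we read the function attributed to Arabi et al. (2006), and the function names and data layout are hypothetical. The ranking inputs are the annual costs and sediment reduction efficiencies quoted for cropland in Section 4.4.

```python
def bmp_total_cost(c0, rm, s, td):
    """Total BMP cost over the design life (assumed form of Eq. 4.1).

    c0: establishment cost, rm: annual maintenance cost as a fraction of c0,
    s: interest rate as a fraction, td: design life in years.
    """
    establishment = c0 * (1.0 + s) ** td
    maintenance = c0 * rm * sum((1.0 + s) ** (i - 1) for i in range(1, td + 1))
    return establishment + maintenance


def rank_by_cost_per_percent(bmps):
    """Order BMP names by cost per 1 percent of pollutant reduction.

    bmps: dict of name -> (annual cost in $/ha, efficiency in percent).
    """
    return sorted(bmps, key=lambda name: bmps[name][0] / bmps[name][1])


cropland = {
    "filter strip": (7.0, 65.0),
    "reduced tillage systems": (7.0, 75.0),
}
ranking = rank_by_cost_per_percent(cropland)
```

With equal annual costs, the BMP with the higher efficiency has the lower cost per percent of reduction, so ‘reduced tillage systems’ ranks ahead of ‘filter strip’.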
Table 4.2 Default BMP Costs for Landuses
Landuse    BMP                                          Establishment   Maintenance Cost (% of   Reference
                                                        Cost ($/ha)     Establishment Cost)
Cropland   Contour Farming                              15              1                        Pertsova, 2007
Cropland   Filter strip1                                21              10                       Buckner, 2001
Cropland   Reduced tillage systems                      7               1                        Kieser & Associates, 2008
Forest     Site preparation/hydro mulch/seed/fertilizer 3,707           1                        USEPA, 2005
Forest     Site preparation/straw/crimp/net             35,481          1                        GLEC, 2008
Feedlots   Filter strip1                                21              10                       Buckner, 2001
Urban      Alum treatment                               1,112           0                        Wisconsin DNR, 2003
Urban      Grass swales                                 1,730           5                        USEPA, 1999
Urban      Infiltration Basin                           7,413           3                        USEPA, 1999
Urban      Infiltration Trench                          22,239          5                        USEPA, 1999
Urban      Porous Pavement                              592,015         1                        King and Hagan, 2011
Urban      Sand Filter                                  25,946          12                       USEPA, 1999
Urban      Vegetated Filter Strips                      2,224           4                        USEPA, 1999
Urban      Weekly Street Sweeping                       14,947          7                        King and Hagan, 2011
Urban      Wetland Detention                            6,178           2                        USEPA, 1999
1Filter strip: the ratio of contributing drainage area to filter strip area is assumed to be 40:1.
4.4 Application of the Web Tool
A watershed was selected to demonstrate use of the web-based tool to develop LDC, to
compute required pollutant reduction percentage, and to recommend a BMP scenario to
reduce pollutant loads to achieve TMDL pollutant levels. A 254 km² watershed in
northeast Indiana was selected. The landuses in the watershed are 33.7 % cropland (90
km²), 33.9 % pastureland (90.5 km²), 6.3 % urban (17.0 km²), and 7.7 % forest (20.6
km²). Flow data were collected from USGS station number 04177870 (Fish Creek near
Artic, Indiana; Fig. 4.3) and ranged from 0.1 m³/s to 38.5 m³/s. Total suspended solids
data were collected by the module from EPA STORET (Fig. 4.4), and the data ranged
from 4.0 mg/l to 100.0 mg/l. The flow data period was from 1998-04-08 to 2007-12-05,
and the water quality data period was from 1999-04-06 to 2007-11-28. Therefore, the
data from 1999-04-06 to 2007-11-28 were selected to develop the LDC, and the water
quality target for the total suspended solids was set to 46.0 mg/l (IDEM, 2013). An
interest rate of 4.5% was used for the cost function to estimate BMP implementation
costs.
The LDC was developed using the web-based tool (Fig. 4.5). The watershed had sediment
loads exceeding standards in the High-Flow and Moist-Condition flow regimes, and the
required pollutant reductions were 48.1% in the High-Flow and 31.9% in the
Moist-Condition flow regimes (Table 4.3). The web-based tool established BMP scenarios
to reduce sediment from both flow regimes. In the BMP scenario for cropland, the most
cost-effective BMP was ‘reduced tillage systems’ with an estimated annual cost of
$7/ha/year, followed by ‘filter strip’ ($7/ha/year) and ‘contour farming’ ($16/ha/year)
as the second and third most cost-effective BMPs, respectively. Although the estimated
annual cost of both ‘reduced tillage systems’ and ‘filter strip’ was $7/ha/year,
‘reduced tillage systems’ was the most cost-effective BMP because it is able to reduce
75% of sediment losses, while ‘filter strip’ is able to reduce 65% of sediment losses,
based on the EPA STEPL BMP database. In the BMP scenario for urban, the most
cost-effective BMP was ‘vegetated filter strips’ ($5/ha/year), and ‘grass swales’
($397/ha/year) was the second most cost-effective BMP.
The web-based tool identifies BMP scenarios for each landuse; however, the area to which
each BMP is applied (BMPapplied) still needs to be optimized to identify the most
cost-effective BMP implementation plan. The most cost-effective of the identified BMPs
is applied first. If applying the first BMP to the available area of the associated
landuse (i.e. the maximum level of BMP application) does not meet the required
pollutant load reduction, the second most cost-effective BMP needs to be applied.
Therefore, iterative simulations using another model to evaluate the impacts of BMPs
and BMPapplied are required.
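The iterative procedure can be sketched as a simple greedy loop. This is an illustration only: `estimate_reduction` is a hypothetical stand-in for a run of an external model such as EPA STEPL, the fixed area step is an assumption, and the toy linear model and its numbers are invented for the example rather than taken from the study.

```python
def allocate_bmps(ranked_bmps, max_area_km2, required_reduction,
                  estimate_reduction, step_km2=1.0):
    """Increase BMP areas in cost-effectiveness order until the target is met.

    ranked_bmps: BMP names, most cost-effective first.
    max_area_km2: dict of name -> maximum area the BMP may cover.
    estimate_reduction: callable taking {name: area} and returning the
        estimated reduction in percent (stand-in for a model run).
    Returns (applied areas, whether the required reduction was met).
    """
    applied = {}
    for name in ranked_bmps:
        applied[name] = 0.0
        while applied[name] < max_area_km2[name]:
            applied[name] = min(applied[name] + step_km2, max_area_km2[name])
            if estimate_reduction(applied) >= required_reduction:
                return applied, True
    return applied, False


# Toy linear model: percent reduction per km2 of each BMP (made-up rates).
rates = {"A": 0.5, "B": 0.25}


def toy_model(areas):
    return sum(rates[name] * area for name, area in areas.items())


plan, met = allocate_bmps(["A", "B"], {"A": 10.0, "B": 20.0}, 6.0, toy_model)
```

In this toy case the first BMP reaches its maximum area without meeting the target, so the second BMP is added until the target is met, which mirrors the sequence described above.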
To demonstrate the iterative simulations described above, the EPA STEPL model was
applied with the BMP scenario for cropland, since cropland is dominant in the
watershed. The BMPapplied for the first BMP (reduced tillage systems) was specified at
up to 54 km² (60 % of cropland); in other words, it was assumed that 40 % of cropland
had this BMP already or would not adopt this practice. During modeling, the BMPapplied
for ‘reduced tillage systems’ was increased iteratively until the estimated sediment
reduction was greater than the required reduction (48.1%) or until the BMPapplied
reached the maximum specified BMPapplied (54 km²). The estimated sediment reduction was
42.3 % when the BMPapplied of ‘reduced tillage systems’ was 54 km². Therefore, the
second most cost-effective BMP, ‘filter strip’, was applied in combination with
‘reduced tillage systems’. The estimated sediment reduction was 48.3 % when ‘reduced
tillage systems’ on 54 km² and ‘filter strip’ on 36 km² were applied for cropland. No
further simulation was required (e.g. increasing BMPapplied or applying more BMPs),
since the estimated sediment reduction met the required sediment reduction. The
estimated annual cost was $62,968, which resulted from $37,781 for ‘reduced tillage
systems’ applied to 54 km² and $25,187 for ‘filter strip’ applied to 36 km².
Figure 4.3 Flow Data Collection by USGS Flow Station Location Tool
Figure 4.4 Water Quality Data Collection by WQP Location Tool
Figure 4.5 Load Duration Curve for the Study Watershed
Table 4.3 Target Load, 90th Percentile Load, and Required Reduction Percentage
Flow Regime        Target Load    90th Percentile Load    Required Reduction
                   (tons/day)     (tons/day)              (%)
High-Flow          42.1           81.1                    48.1
Moist-Condition    11.6           17.0                    31.9
Mid-range Flow     4.6            3.5                     0.0
Dry-Condition      1.8            1.2                     0.0
Low-Flow           0.8            0.2                     0.0
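The values in Table 4.3 are consistent with a required reduction computed as the excess of the 90th percentile load over the target load, divided by the 90th percentile load. The sketch below assumes that formula (it is inferred from the table, not stated in the text) and uses a hypothetical function name.

```python
def required_reduction_percent(target_load, percentile_load):
    """Percent reduction needed to bring the observed load down to the target.

    Returns 0.0 when the observed load is already at or below the target.
    """
    if percentile_load <= target_load:
        return 0.0
    return 100.0 * (percentile_load - target_load) / percentile_load


high_flow = required_reduction_percent(42.1, 81.1)
moist = required_reduction_percent(11.6, 17.0)
mid_range = required_reduction_percent(4.6, 3.5)
```

For the High-Flow regime this gives about 48.1%, matching the table. For the Moist-Condition regime the rounded table loads give about 31.8%, slightly below the reported 31.9%, which is presumably due to rounding of the tabulated loads.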
4.5 Conclusions
Section 303(d) of the Clean Water Act requires states and other authorities to develop
lists of impaired waters, establish priority rankings, and develop TMDL plans. A wide
range of approaches is currently used with flow and load duration curves in the
implementation of TMDL plans. The curves can be employed to identify the sources of
pollutant loads, compute the required pollutant reduction percentage against standard
pollutant loads, and establish BMP scenarios. While the LDC approach is simple, LDC
development can be time-consuming and is open to human error.
A web-based tool was developed previously to simplify the LDC development process;
however, only flow data retrieval from USGS was automated in that tool, although
both flow and water quality data are essential in LDC development. Therefore, the
web-based tool was upgraded to allow use of water quality data from USGS and from WQP,
which provides STORET/WQX water quality data for any location in the United States.
Moreover, the web-based tool now provides Google Maps interfaces to display and select
water quality locations of interest. In TMDL implementations, required pollutant
reductions must be computed and also used to establish BMP scenarios to meet the
standard loads. Pollutant loads that exceed standards in different flow regimes imply
different sources of pollutant loads; therefore, different BMPs need to be applied to
address the specific pollutant sources. BMP implementation costs are required to
facilitate identification of cost-effective BMP implementation plans. Therefore, the
web-based tool identifies the required pollutant reduction for each flow regime, makes lists of
BMPs able to reduce pollutant loads corresponding to the flow regime for which
pollutant loads are exceeded, and identifies the BMP with the least cost for each landuse.
With these upgrades, the web-based tool is expected to be useful for collection
of water quality data, identification of pollutant sources, and computation of required
pollutant reduction. Moreover, the web-based tool can be used to identify BMP scenarios
for simulation by models in developing watershed management plans.
Currently, the web-based tool suggests BMP scenarios to meet the standard load for one
water quality parameter such as nitrogen, phosphorus, BOD, or sediment. Therefore, the
web-based tool will be upgraded in the future to suggest BMP scenarios for cases in
which two or more water quality parameters must be considered. The web-based tool
identifies BMPs with the least cost for each landuse; however, the BMPs and the area to
which these BMPs are applied need to be optimized for cost-effective BMP
implementation. Therefore, in the future the Web-based LDC Tool will be integrated with
a hydrologic/water quality model and optimization code to identify cost-effective BMP
implementations.
4.6 References
Alaska Department of Environmental Conservation, February 2008. Total maximum
daily load (TMDL) for fecal coliform in the waters of Pederson Hill Creek in Juneau,
Alaska.
Arabi, M., Govindaraju, R. S., Hantush, M. M., 2006. Cost-effective allocation of
watershed management practices using a genetic algorithm. Water Resources
Research 42, W10429, DOI: 10.1029/2006WR004931.
Babbar-Sebens, M., Karthikeyan, R., 2009. Consideration of sample size for estimating
contaminant load reductions using load duration curves. Journal of Hydrology 372,
118-123.
Buckner, E.R., 2001. An Evaluation of the Use of Vegetative Filter Strips on Agricultural
Lands in the Upper Wabash River Basin. Dissertation, Purdue University, West
Lafayette, Indiana, UMI Microform 3037543.
Chen, D., Lu, J., Wang, H., Shen, Y., Gong, D., 2011. Combined inverse modeling
approach and load duration curve method for variable nitrogen total maximum daily
load development in an agricultural watershed. Environmental Science Pollution
Research 18, 1405-1413.
Cleland, B., 2003. TMDL development from the “Bottom Up” - Part III: duration curves
and wet-weather assessments. National TMDL Science and Policy 2003 - WEF
Specialty Conference. Chicago, IL.
Great Lakes Environmental Center (GLEC), December 2008. National level assessment
of water quality impairments related to forest roads and their prevention by best
management practices.
Grizzetti, B., Bouraoui, F., Marsily, G. d., Bidoglio, G., 2005. A statistical method for
source apportionment of riverine nitrogen loads. Journal of Hydrology 304, 302-315.
Hsu, T., Kin, J., Lee, T., Zhang, H. X., Lu, S. L., 2010. A storm event-based approach to
TMDL development. Environmental Monitoring and Assessment 163, 81-94.
Indiana Department of Environmental Management (IDEM), July 2007. Total maximum
daily load for escherichia coli (E. coli) for the East Fork Whitewater River
Watershed, Wayne, Union, Fayette, and Franklin Counties.
Indiana Department of Environmental Management (IDEM), 2013. Water quality targets.
Available at <http://www.in.gov/idem/nps/3484.htm>. Accessed in October 2013.
Johnson, S. L., Whiteaker, T., Maidment, D. R., 2009. A tool for automated load duration
curve creation. Journal of the American Water Resources Association 45(3): 654-663.
Kieser & Associates, February 2007. Modeling of agricultural BMP scenarios in the Paw
Paw River Watershed using the Soil and Water Assessment Tool (SWAT).
Kim, G., Yoon, J., 2011. Development and application of total coliform load duration
curve for the Geum River, Korea. Korean Society of Civil Engineers, Journal of
Civil Engineering 15(2), 239-244.
Kim, J., Engel, B. A., Park, Y. S., Theller, L., Chaubey, I., Kong, D. S., Lim, K. J., 2012.
Development of Web-based Load Duration Curve System for analysis of total
maximum daily load and water quality characteristics in a waterbody. Journal of
Environmental Management 97, 46-55.
King, D., Hagan, P., October 2011. Costs of stormwater management practices in
Maryland Counties. University of Maryland Center for Environmental Science.
May, L., House, W. A., Bowes, M., McEvoy, J., 2001. Seasonal export of phosphorus
from a lowland catchment: upper River Cherwell in Oxfordshire, England. The
Science of the Total Environment 269, 117-130.
Minnesota Pollution Control Agency, June 2010. Rabbit River turbidity total maximum
daily load report.
New Hampshire Department of Environmental Services, April 2008. Total maximum
daily load (TMDL) study for waterbodies in the Vicinity of the I-93 Corridor from
Massachusetts to Manchester, NH: North Tributary to Canobie Lake in Windham,
NH.
Pertsova, C. C., 2007. Ecological Economics Research Trends. Chapter 3. A recent trend
in ecological economic research: Quantifying the benefits and costs of improving
ecosystem services, 57.
Runkel, R. L., Crawford, C. G., Cohn, T. A., 2004. Load estimator (LOADEST): A
Fortran program for estimating constituent loads in streams and rivers. U.S.
Geological Survey Techniques and Methods, Book 4, Chap. A5.
Sigua, G. C., Tweedale, W. A., 2003. Watershed scale assessment of nitrogen and
phosphorus loadings in the Indian River Lagoon basin, Florida. Journal of
Environmental Management 67, 363-372.
Shen, J., Zhao, Y., 2010. Combined Bayesian statistics and load duration curve method
for bacteria nonpoint source loading estimation. Water Research 44, 77-84.
Tennessee Department of Environment and Conservation, July 2005. Total maximum
daily load (TMDL) for low dissolved oxygen & nutrients in the Upper Duck River
Watershed (HUC 06040002) Bedford, Coffee, Marshall, & Maury Counties,
Tennessee.
Tetra Tech Inc., 2011. User’s guide spreadsheet tool for the estimation of pollutant load
(STEPL) version 4.1. Tetra Tech, Inc. 10306 Eaton Place, Suite 340 Fairfax, VA
22003.
Texas Institute for Applied Environmental Research, March 2010. Technical support
document for bacteria TMDLs Segment 0822A-Cottonwood Branch & Segment
0822B-Grapevine Creek.
USEPA, 1999. Preliminary data summary of urban storm water best management
practices, August 1999. United States Environmental Protection Agency, Office of
Water (4303) Washington.
USEPA, April 2005. National management measures to control nonpoint source pollution
from forestry. U. S. Environmental Protection Agency. Washington, DC 20460.
USEPA, 2007. An approach for using load duration curves in the development of
TMDLs. U. S. Environmental Protection Agency. Washington, DC 20460.
USEPA, 2008. Handbook for developing watershed TMDLs. U. S. Environmental
Protection Agency. Washington, DC 20460.
Wisconsin Department of Natural Resources (Wisconsin DNR), March 2003. Alum
treatments to control phosphorus in lakes.
CHAPTER 5. A WEB MODEL TO ESTIMATE THE IMPACT OF BEST
MANAGEMENT PRACTICES
5.1 Abstract
The Spreadsheet Tool for the Estimation of Pollutant Load (STEPL) can be used for
Total Maximum Daily Load (TMDL) processes, because the model is capable of
simulating impacts of various best management practices (BMPs) and low impact
development (LID) practices. The model computes annual direct runoff using the Soil
Conservation Service Curve Number (SCS-CN) method with average rainfall per event,
but this is not a typical use of the SCS-CN method. Five SCS-CN based approaches to
compute annual direct runoff were investigated to explore estimated differences in annual
direct runoff computations using daily precipitation data collected from the National
Climatic Data Center and generated by the CLIGEN model for twelve stations in Indiana.
Compared to the annual direct runoff computed for the conventional use of the SCS-CN
method, the approach used to estimate annual direct runoff within EPA STEPL showed
large differences. A web-based model (STEPL WEB) was developed with an updated
approach to estimate annual direct runoff. Moreover, the model was integrated with the
Web-based Load Duration Curve Tool which identifies least cost BMPs for each landuse
and optimizes BMP selection to identify the most cost-effective BMP implementations.
The integrated tools provide an easy-to-use approach to performing TMDL analysis and
identifying cost-effective approaches to controlling nonpoint source pollution.
5.2 Introduction
Section 303(d) of the Clean Water Act requires states and other defined authorities to
develop lists of impaired rivers and streams that have seriously contaminated water. They
need to establish priority rankings for waters on the lists and to develop Total Maximum
Daily Loads (TMDLs). Various models have been used not only to develop TMDLs but
also to perform analyses to identify strategies to attain pollutant load limits with plans
typically identifying Best Management Practices (BMPs) to reduce loads (Kang et al.,
2006; Patil and Deng, 2011; Pease et al., 2010; Richards et al., 2008).
One such model is the Spreadsheet Tool for the Estimation of Pollutant Load (STEPL)
that computes annual direct runoff, sediment load, nutrient loads, and 5-day biological
oxygen demand (BOD5) (Tetra Tech, 2011). The model is capable of estimating annual
non-point source (NPS) pollutant loads, and in addition, the model allows estimation of
impacts of various BMP implementations and Low Impact Development (LID) practices
so that pollutant load reduction for BMPs or LIDs can be computed (Commonwealth
Biomonitoring, 2009; FDEP, 2009; Keegstra et al., 2012; Tetra Tech, 2011). EPA STEPL
requires landuse and Hydrologic Soil Group (HSG) to define Curve Number (CN) as it
computes direct runoff in a watershed based on the Soil Conservation Service Curve
Number (SCS-CN) method (USDA 1985). Landuse categories in the model are urban,
cropland, pastureland, forest, and user defined.
However, the model uses an approach that is not the conventional approach used to
calculate annual direct runoff, and the annual direct runoff is a key parameter in
estimating annual NPS pollutant loads. Thus, there is a need to explore the reliability and
consistency of the runoff, pollutant loads, and BMP impacts predicted by EPA STEPL.
The first unconventional process in EPA STEPL is the use of precipitation to compute
direct runoff based on ‘average rainfall per event’ calculated by ‘Rainfall’, ‘Rain Days’,
‘Precipitation Correction Factor’, and ‘Number of Rain Days Correction Factor’ for each
county (Tetra Tech, 2011). However, the SCS-CN method is typically used to simulate
event- or daily-based direct runoff using specific daily rainfall in hydrologic models
(Arnold et al., 1998; Knisel, 1980; Lim et al., 2006; Williams and LaSeur, 1976;
Williams et al., 2000). EPA STEPL calculates direct runoff for a rainfall value (i.e.
‘average rainfall per event’) with the SCS-CN method, and then the model multiplies the
direct runoff from that rainfall by ‘Rain Days’ and ‘Number of Rain Days Correction
Factor’ to reproduce annual direct runoff. This approach using average rainfall values
may not accurately reproduce long-term annual direct runoff because the relationships
between CNs and rainfall are not linear (Tedela et al., 2012; USDA, 1986).
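The effect of this non-linearity can be illustrated with a short Python sketch. It assumes the standard SCS-CN relationship in millimeters (S = 25400/CN - 254, Ia = λS) and a made-up series of four rain events; neither the function name nor the rainfall values come from EPA STEPL. The sketch compares the sum of event runoff with the runoff of the mean event multiplied by the number of events.

```python
def scs_cn_runoff(p_mm, cn, lam=0.2):
    """Direct runoff depth (mm) for one rainfall depth using the SCS-CN method."""
    s = 25400.0 / cn - 254.0   # potential maximum retention (mm)
    ia = lam * s               # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Hypothetical event rainfall depths (mm), chosen only for illustration
events_mm = [5.0, 10.0, 20.0, 65.0]
cn = 85

# Conventional use: compute runoff per event, then aggregate
sum_of_events = sum(scs_cn_runoff(p, cn) for p in events_mm)

# Average-based use: compute runoff for the mean event, then scale by event count
mean_event = sum(events_mm) / len(events_mm)
runoff_from_mean = len(events_mm) * scs_cn_runoff(mean_event, cn)

print(f"sum of event runoff:        {sum_of_events:.1f} mm")
print(f"runoff from mean x events:  {runoff_from_mean:.1f} mm")
```

For this hypothetical series the two totals differ because the single large event contributes most of the runoff, and averaging removes it.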
The second unconventional process employed in STEPL is the selection of the default
initial abstraction coefficient (λ) for the SCS-CN method. The SCS-CN method equations
are as follows (USDA, 1986).
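$$Q = \frac{(P - I_a)^2}{(P - I_a) + S} \quad \text{for } P > I_a \tag{Eq. 5.1}$$
$$Q = 0 \quad \text{for } P \le I_a \tag{Eq. 5.2}$$
$$S = \frac{25400}{CN} - 254 \tag{Eq. 5.3}$$
$$I_a = \lambda S \tag{Eq. 5.4}$$
Where, Q is direct runoff (mm), P is precipitation (mm), Ia is initial abstraction (mm), S is
potential maximum retention (mm), CN is curve number, and λ is initial abstraction coefficient.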
The initial abstraction coefficient typically varies from 0.0 to 0.2 (Baltas et al., 2007; Shi
et al., 2009). The CN tables published and currently used typically assume the initial
abstraction coefficient is 0.2. If a different initial abstraction coefficient is used, the CN
values need to be adjusted (Lim et al., 2006; Woodward et al., 2003). The EPA STEPL
default for the initial abstraction coefficient is 0.0 while the denominator of equation 5.1
in EPA STEPL is fixed as “P + 0.8 S” which is inconsistent with a default initial
abstraction coefficient of 0.0. Further, STEPL does not adjust CNs when the initial
abstraction is updated by users. Although the EPA STEPL model allows changing the
initial abstraction coefficient, if the user leaves it at the default value, this will likely
result in overestimation of runoff and therefore overestimation of annual pollutant loads,
because the direct runoff calculated with a value of 0.0S for initial abstraction with a
fixed denominator is greater than the direct runoff calculated with a value of 0.2S for
initial abstraction. Most significantly, it is incorrect to use a modified initial abstraction
value with the CNs provided in standard references, as these CN values are for an initial
abstraction of 0.2S (USDA 1986). However, EPA STEPL estimates annual direct runoff
using an initial abstraction of 0.0S and the CNs provided in standard references. If an
adjustment of initial abstraction is made, CN and S both need to be adjusted based on the
change to initial abstraction. This is because the initial abstraction coefficient was
empirically determined to be 0.2 (USDA, 1986), and equations 5.1-4 need to be modified
when other initial abstraction coefficient values are used (Lim et al., 2006; Woodward et
al., 2003).
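As a worked illustration of this overestimation, consider a single rainfall of P = 50 mm on a surface with CN = 85. The numbers below assume the standard metric SCS-CN relationships (S = 25400/CN - 254 and Ia = λS); the rainfall depth is an arbitrary example value, and the last line uses the fixed denominator P + 0.8S described above for EPA STEPL.

```latex
\begin{align*}
S &= \frac{25400}{85} - 254 \approx 44.8\ \text{mm} \\
\lambda = 0.2:\quad I_a &= 0.2S \approx 9.0\ \text{mm}, \qquad
Q = \frac{(50 - 9.0)^2}{(50 - 9.0) + 44.8} \approx 19.6\ \text{mm} \\
\lambda = 0.0,\ \text{fixed denominator}:\quad
Q &= \frac{(50 - 0)^2}{50 + 0.8\,(44.8)} \approx 29.1\ \text{mm}
\end{align*}
```

Under these assumptions, the default setting yields roughly one and a half times the runoff of the conventional calculation for this event.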
EPA STEPL is a spreadsheet model that can be used for annual pollutant load estimation
and simulation of BMP impacts. Our interest in EPA STEPL was driven by our search for
a model capable of simulating BMPs for use with the Web-based Load Duration Curve
Tool (Web-based LDC Tool; https://engineering.purdue.edu/~ldc/). The Web-based LDC
Tool identifies pollutant loads exceeding standards and computes the required pollutant
reduction to meet the standard loads. Moreover, the Web-based LDC Tool identifies
BMPs with the least cost for each landuse, but the BMPs and area to which these
practices should be applied need to be optimized for cost-effective BMP implementation.
Therefore, the objectives of the study were: 1) to examine and correct the annual direct
runoff approach in EPA STEPL, and 2) to develop a web-based model capable of
simulating pollutant load reductions for BMPs.
5.3 Methodology
The study had two purposes. One was to examine the annual direct runoff approach in
EPA STEPL to identify the impacts of its assumptions on runoff computations and
corresponding pollutant loads. The other was to develop a web-based model that used
a load duration curve tool and the corrected STEPL for identifying appropriate BMPs and
simulating their effects on pollutant load reduction. Five approaches to estimate annual
direct runoff with the SCS-CN method were explored. Daily precipitation data from
twelve National Climatic Data Center (NCDC, www.ncdc.noaa.gov) stations were
collected for conducting the analyses. The first approach was to obtain annual runoff by
aggregating daily direct runoff computed by the SCS-CN method and NCDC daily
precipitation data. The second and third approaches represented current EPA STEPL
approaches using values of 0.0 and 0.2 for initial abstraction coefficients. In the fourth
approach, daily precipitation data were generated by the CLIGEN (Nicks and Lane, 1989)
model, and annual direct runoff was computed from daily direct runoff obtained using
daily precipitation data. The fifth approach used the EPA STEPL model with its own database.
For the second objective of the study, a web-based model was developed based on a
corrected EPA STEPL model. The web-based model provides web interfaces and
employs the CLIGEN model to generate daily precipitation data. Modules to calibrate
model parameters and to optimize BMPs were developed and integrated in the web-based
model.
5.3.1 Annual Direct Runoff Computations
Daily precipitation data were collected from the NCDC to explore the approaches to
compute annual direct runoff using measured daily precipitation data. Daily precipitation
data were generated with the CLIGEN model using the inputs collected from the United
States Department of Agriculture (www.ars.usda.gov/Research/docs.htm?docid=18094)
(Table 5.1, Figure 5.1). Twelve NCDC stations providing long-term daily data were
selected within Indiana, and twelve CLIGEN stations at the same locations as the NCDC
stations were selected.
Five approaches to compute annual direct runoff were established to investigate the
differences in annual direct runoff between methods: 1) using daily precipitation data
with equations 5.1~4, 2) using average rainfall per event with equations 5.5~7 (described
below) and an initial abstraction coefficient of 0.0 (i.e. original EPA STEPL method),
3) using average rainfall per event with equations 5.5~7 and an initial abstraction
coefficient of 0.2 (i.e. corrected EPA STEPL method), 4) using daily precipitation data
generated by the CLIGEN model with equations 5.1~4, and 5) using the EPA STEPL model.
Eq. 5.5
Eq. 5.6
Eq. 5.7
The first approach used equations 5.1~4 to compute daily direct runoff depth with daily
precipitation data from NCDC with an initial abstraction coefficient of 0.2. In other
words, the approach was to represent the general use of the SCS-CN method to compute
annual direct runoff depth.
The second and third approaches were to represent the annual direct runoff depth
computations in EPA STEPL. The EPA STEPL database related to precipitation data was
not used for the second and third approaches, thus preventing possible annual direct
runoff differences due to precipitation data differences, because the purpose of this step
was comparison of approaches. If the EPA STEPL database had been used, the
approaches would have been incomparable, because different precipitation data would
have led to different annual direct runoff estimates. Therefore, the required inputs (i.e.
annual rainfall, rain days, and correction factors for equation 5.5) in the second and third
approaches were computed based on daily precipitation data collected from NCDC for
each location. Rainfall correction factors are the percentage of annual precipitation
greater than 5 millimeters; the factors were computed by dividing the sum of daily
precipitation values greater than 5 millimeters by sum of daily precipitation values. Rain
day correction factors are the percentage of rain days that have precipitation greater than
5 millimeters (Tetra Tech, 2011). Average rainfall per event (ARE) was computed using
equation 5.5. An initial abstraction coefficient of 0.0 was used for equation 5.6 in the second
approach to compute annual direct runoff depth, while the initial abstraction coefficient
in the third approach was 0.2.
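The correction factors described above can be computed directly from a daily precipitation series. The Python sketch below is a minimal illustration: the 5 mm threshold and the factor definitions follow the text, while the form used for average rainfall per event (corrected annual rainfall divided by corrected rain days) is an assumption made for this example, as are the function name and the sample values.

```python
def rainfall_inputs(daily_precip_mm, n_years, threshold_mm=5.0):
    """Derive annual rainfall, rain days, and correction factors from daily data."""
    rain = [p for p in daily_precip_mm if p > 0.0]        # rain days only
    heavy = [p for p in rain if p > threshold_mm]         # days above threshold
    if not rain or not heavy:
        raise ValueError("series needs at least one rain day above the threshold")

    annual_rainfall = sum(rain) / n_years                 # mm per year
    rain_days = len(rain) / n_years                       # days per year
    rain_cf = sum(heavy) / sum(rain)                      # rainfall correction factor
    day_cf = len(heavy) / len(rain)                       # rain day correction factor

    # Assumed form of average rainfall per event (not taken from the source)
    are = (annual_rainfall * rain_cf) / (rain_days * day_cf)
    return {
        "annual_rainfall": annual_rainfall,
        "rain_days": rain_days,
        "rain_cf": rain_cf,
        "day_cf": day_cf,
        "are": are,
    }


# Hypothetical one-year series of daily precipitation (mm)
sample = [0.0, 2.0, 12.0, 0.0, 6.0, 0.0, 30.0, 1.0]
inputs = rainfall_inputs(sample, n_years=1)
print(inputs)
```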
The fourth approach to compute annual direct runoff used equations 5.1~4 with daily
precipitation data generated by CLIGEN. The fourth approach was identical to the first
approach, except that daily precipitation data generated by CLIGEN were used in the
fourth approach. The EPA STEPL model, the fifth approach, was used to compute annual
direct runoff with the model database providing annual rainfall, rainfall correction factor,
rain days, and rain days correction factor for the counties in which NCDC stations are
located.
Table 5.1 Daily Precipitation Collection from NCDC
Station Number   Station Name                            Period
USC00120676      Berne WWTP                              1949-1999
USC00121747      Columbus                                1921-2010
USC00121869      Crane NSA                               1943-1959
USW00014848      South Bend Michiana Regional Airport    1948-2012
USC00128999      Valparaiso Waterworks                   1985-2000
USC00129138      Wabash                                  1989-2004
USC00129430      West Lafayette 6 NW                     1989-2012
USC00129678      Winchester AAP 3                        1989-2012
USC00121229      Cambridge City 3 N                      1975-1992
USC00123547      Greensburg                              1933-1941
USC00127125      Princeton 1 W                           1898-1952
USC00127875      Scottsburg                              1897-2000
Figure 5.1 Location Map of NCDC and CLIGEN Stations within Indiana
5.3.2 Web Interfaces and CLIGEN Use
The EPA STEPL model includes a database to provide the input parameters describing
precipitation and soil erosion estimation, and the model also has a user-friendly interface
within Microsoft Excel. A web-based model (STEPL WEB;
https://engineering.purdue.edu/~ldc/STEPL/) was developed following a similar overall
program flow to that used in EPA STEPL, which is to define subwatershed characteristics
and then to identify watershed inputs and BMPs in turn. However, HTML interfaces
instead of Visual Basic and MS Excel interfaces were created, with the core engine
programmed in the FORTRAN programming language to perform the calculations
computed in MS Excel in EPA STEPL. A number of Python CGI scripts and JavaScript-based
HTML pages were programmed to handle inputs from databases.
EPA STEPL computes annual direct runoff using equations 5.5~7, but this was replaced
with two approaches in the web-based tool. One uses 0.2S for the initial abstraction,
because CNs commonly available and used are based on a value of 0.2S for the initial
abstraction. The other approach employs CLIGEN to generate long-term, daily
precipitation data. Zhang and Garbrecht (2003) reported that the CLIGEN model showed
reasonable means and standard deviations of daily precipitation amounts when 100 years
of precipitation data were generated. The model also reproduced annual amounts, monthly
amounts, and number of events well when 20 years of precipitation data were generated
(Elliot and Arnold, 2001). Lim and Engel (2003) employed the model to generate
climate data for a web-based model. Therefore, the web-based model employs CLIGEN
for daily precipitation data generation and provides 2,368 locations with CLIGEN inputs
collected from the United States Department of Agriculture
(http://www.ars.usda.gov/Research/docs.htm?docid=18094). The daily precipitation data
generated by CLIGEN were used to estimate annual direct runoff in STEPL WEB.
5.3.3 Auto-Calibration Modules
STEPL WEB computes annual direct runoff using the SCS-CN method, annual
contribution to shallow groundwater by soil infiltration fractions for precipitation, and
annual pollutant loads by pollutant coefficients multiplied by annual direct runoff and
groundwater (Figure 5.2). Sediment load is computed based on the Universal Soil Loss
Equation (USLE) and sediment delivery ratio (SDR, Equations 5.10 and 5.11;
USDA-NRCS, 1983). STEPL WEB has two sources of nutrient loads (N, P, and BOD). The first
source is the nutrient loads from landuses, which are computed by pollutant coefficients
and annual direct runoff and shallow groundwater contribution. The second source is
nutrient loads in sediment, which are computed by soil nutrient concentrations and
sediment load (Tetra Tech, 2011). Therefore, CNs and soil infiltration fractions should be
calibrated for annual direct runoff and annual shallow groundwater so that nutrient loads
are correctly computed. Since sediment load is computed by USLE and SDR, the SDR
(Equations 5.10 and 5.11) can be calibrated for sediment load. Pollutant coefficients also
need to be calibrated for nutrient loads.
Since CNs are defined by landuse and HSG, the relationships between CNs need to be
maintained. For instance, CNs for urban areas are typically greater than those for forest, and
CNs for HSG A are smaller than those for HSG D for a given landuse. One approach to calibrate CNs is to
multiply default CNs by an identical fraction (or percentage, Frcn in Equation 5.8); in
other words, CNs are increased or decreased by an identical percentage. Annual shallow
groundwater is computed based on the soil infiltration fractions for precipitation which
are defined by landuse and HSG. Therefore, the soil infiltration fractions for precipitation
can be calibrated by the approach used for calibrating CNs. The pollutant coefficients are
defined by landuse, and the same calibration approach can be applied (Frnt,1 for N, Frnt,2 for P,
and Frnt,3 for BOD in Equation 5.9). USLE factors are defined by landuses in STEPL
WEB, with factors based on those from the EPA STEPL database. Since sediment loads
are computed by multiplying soil erosion (USLE factors) and SDR, sediment loads can
be calibrated by SDR calibration rather than by individual USLE factors (Park et al.,
2010); the fraction in equations 5.10 and 5.11 is calibrated for annual sediment load. The
fraction is multiplied by SDR, which also implies that USLE factors for soil erosion are
increased or decreased by an identical fraction.
Two modules were developed to calibrate the fractions/coefficients in equations 5.8-11.
One module uses a genetic algorithm (GA), and the other uses the bisection method. The
GA is based on the principles of ‘natural evolution’ and ‘survival of the fittest’; it
sets up a population of candidate solutions for a given problem and applies a stochastic
search strategy that imitates the evolution of natural organisms (Holland, 1975; Lim et al., 2010).
Because it solves sophisticated problems effectively, the algorithm has been used in areas
such as business, engineering, and science (Toğan and Daloğlu, 2008) and is regarded
as a powerful method for solving highly complex problems.
The alternative module to calibrate the fractions in equations 5.8-11 uses the bisection
method. The bisection method is a simple and straightforward numerical method, is
applicable to continuous functions, and has been applied to solve simple problems
(Ashkar and Mahdi, 2006; Hong et al., 2006; Neupauer and Borchers, 2001). The method
sets intervals and selects the midpoint which shows the least error during iteration,
narrowing the intervals. Initial intervals (e.g. 50% to 150% for CNs) need to be set, and
iterative computations are required until the error (e.g. difference between observed
annual direct runoff and estimated annual direct runoff) is zero or less than a specified
tolerance. The module in STEPL WEB performs the iterations until the interval is
narrowed to the thousandths place for the fractions in equations 5.8-11.
Eq. 5.8
Eq. 5.9
Eq. 5.10
Eq. 5.11
Where, Frcn is calibration parameter or fraction for CN, Frnt,i is calibration parameter for
pollutant coefficients, Frsdr is calibration parameter for SDR, and A is watershed area in
square kilometers.
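The bisection procedure can be sketched in Python for the CN fraction alone. This is a simplified illustration rather than the STEPL WEB module: it assumes that annual runoff does not decrease as the fraction increases, scales a single CN by Frcn and caps it at 100, uses a hypothetical rainfall series and a synthetic "observed" runoff, and applies classical sign-based bisection on the runoff error until the interval is narrower than 0.001.

```python
def scs_cn_runoff(p_mm, cn, lam=0.2):
    """Direct runoff depth (mm) for one rainfall depth using the SCS-CN method."""
    s = 25400.0 / cn - 254.0
    ia = lam * s
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


def annual_runoff(daily_precip_mm, cn):
    """Aggregate daily direct runoff (mm) over the series."""
    return sum(scs_cn_runoff(p, cn) for p in daily_precip_mm)


def calibrate_cn_fraction(daily_precip_mm, default_cn, observed_mm,
                          lower=0.5, upper=1.5, width_tol=0.001):
    """Find the CN multiplier whose simulated runoff matches observed_mm."""
    def error(fraction):
        cn = min(default_cn * fraction, 100.0)   # assumed cap: CN cannot exceed 100
        return annual_runoff(daily_precip_mm, cn) - observed_mm

    if error(lower) * error(upper) > 0:
        raise ValueError("observed runoff is not bracketed by the initial interval")

    while upper - lower > width_tol:
        mid = 0.5 * (lower + upper)
        if error(lower) * error(mid) <= 0:
            upper = mid      # root lies in the lower half
        else:
            lower = mid      # root lies in the upper half
    return 0.5 * (lower + upper)


# Hypothetical rainfall series (mm) and a synthetic target built from a known fraction
rain = [5.0, 10.0, 20.0, 65.0, 30.0, 15.0]
target = annual_runoff(rain, 85 * 0.9)
fraction = calibrate_cn_fraction(rain, 85, target)
print(f"calibrated CN fraction: {fraction:.3f}")
```

Using a synthetic target generated from a known fraction makes it easy to check that the search recovers the value it should.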
Figure 5.2 Annual Direct Runoff, Groundwater, and Pollutant Load Computation
in STEPL WEB
5.3.4 Optimization of Best Management Practices
Since estimated annual cost of BMP implementation in a watershed is computed by BMP
cost per unit area and applied area of BMP (AREABMP), both BMP cost and AREABMP
need to be considered when identifying the most cost-effective BMP implementation. In
other words, the BMP with least cost per unit mass reduction (i.e. dollars per ton of
reduction) needs to be identified and applied, and then AREABMP needs to be minimized
as long as the estimated reduction meets the required reduction. In addition, use of a
BMP on 100% of landuse area may not be possible. For instance, it may not be possible
to apply a BMP to 90% of cropland, if the BMP is already applied on 30% of cropland. In
this circumstance, the BMP could only be applied to a maximum of 70% of cropland area.
STEPL WEB estimates BMP implementation cost based on establishment, maintenance,
and opportunity costs using a cost function (equation 5.12; Arabi et al., 2006). The model
computes the costs per unit of pollutant mass reduction for BMPs and establishes a
priority list of BMPs to apply based on the cost per unit mass of pollutant reduction.
$$c_t = c_0 (1 + s)^{t_d} + c_0 \, r_m \left[ \sum_{i=1}^{t_d} (1 + s)^{i-1} \right] \tag{Eq. 5.12}$$
Where, c_t is BMP implementation cost, c_0 is establishment cost, r_m is ratio of annual
maintenance cost to establishment cost, s is interest rate, and t_d is BMP design life.
After establishment of the BMP list, STEPL WEB optimizes AREABMP, increasing
AREABMP of the first BMP and estimating annual pollutant reductions for that BMP.
Iterative simulations are required with AREABMP increasing up to the allowable
maximum area (e.g. 100% or 70% for the circumstance stated above) until the estimated
pollutant reduction is greater than the required reduction. If estimated annual pollutant
reduction for the first BMP does not meet the required reduction, the second most
cost-effective BMP needs to be simulated with AREABMP increased iteratively for the second
BMP. In brief, the BMP optimization process is an iterative simulation, adding BMPs in
turn and increasing AREABMP for each BMP.
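The iterative procedure above can be outlined in Python. The sketch below is illustrative only: it assumes that BMPs are already ranked by cost per unit of pollutant reduction, that reduction grows linearly with applied area, and that reductions from different BMPs are additive, whereas STEPL WEB re-simulates the loads at each step. The BMP names, reduction rates, unit costs, maximum areas, and the 1 km2 area step are hypothetical values, not results from the study.

```python
def optimize_bmp_areas(bmps, required_reduction, area_step=1.0):
    """Add BMPs in priority order, growing each area until the target is met.

    bmps: list of dicts ordered from most to least cost-effective, each with
          'name', 'max_area' (km2), 'reduction_per_km2' (% points per km2),
          and 'cost_per_km2' ($ per km2).
    """
    plan = []
    total_reduction = 0.0
    total_cost = 0.0
    for bmp in bmps:
        area = 0.0
        # Increase the applied area until the target or the allowable maximum is reached
        while total_reduction < required_reduction and area < bmp["max_area"]:
            step = min(area_step, bmp["max_area"] - area)
            area += step
            total_reduction += step * bmp["reduction_per_km2"]
            total_cost += step * bmp["cost_per_km2"]
        if area > 0.0:
            plan.append((bmp["name"], area))
        if total_reduction >= required_reduction:
            break
    return plan, total_reduction, total_cost


# Hypothetical priority list (most cost-effective first)
bmp_list = [
    {"name": "reduced tillage", "max_area": 54.0,
     "reduction_per_km2": 0.8, "cost_per_km2": 700.0},
    {"name": "filter strip", "max_area": 90.0,
     "reduction_per_km2": 0.2, "cost_per_km2": 700.0},
]
plan, reduction, cost = optimize_bmp_areas(bmp_list, required_reduction=48.1)
print(plan, round(reduction, 1), cost)
```

In this sketch the first BMP reaches its allowable maximum area before the target is met, so the second BMP is added and its area is increased only as far as needed.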
5.4 Results
5.4.1 Annual Direct Runoff Computations
To explore differences in runoff computation approaches, a CN of 85 was selected as a
test case, and this represents cropland with a C hydrologic soil group in the EPA STEPL
model. Average annual direct runoff (depth, mm) values were computed with the five
approaches (Table 5.2, Figure 5.3). The annual direct runoff for the first approach (i.e.
General Use of SCS-CN method with daily precipitation data, GU) ranged from 117.4
mm (USC00120676) to 191.4 mm (USC00127125). However, the annual direct runoff
estimated by the second approach (i.e. Original EPA STEPL approach, OS) ranged from
242.5 mm (USW00014848) to 339.9 mm (USC00127125). Although identical
precipitation data were used for the approaches, the difference between the OS and GU
approaches was a minimum of 77.6 % (overestimated, USC00127125) and maximum of
111.8 % (overestimated, USC00120676).
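For example, the percent difference at station USC00127125 follows from the two values reported above, assuming the difference is expressed relative to the GU estimate:

```latex
\frac{339.9 - 191.4}{191.4} \times 100 \approx 77.6\ \%
```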
The third approach (i.e. Corrected EPA STEPL with initial abstraction coefficient of 0.2,
CS) resulted in underestimation compared to the GU for all stations. Moreover, the
approach showed no direct runoff when the average rainfall per event for the stations was
smaller than 0.2S calculated by CN (equation 5.3), because average direct runoff depth
(equation 5.6) became 0.0 mm based on equation 5.2. Thus, CN needs to be greater than
a critical value (TC in Table 5.2) to generate annual direct runoff greater than 0.0 mm. In
other words, there will be no annual direct runoff in the area with CN values smaller than
TC in Table 5.2.
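The critical value can be written explicitly under the assumption that S is given in millimeters by S = 25400/CN - 254 and that runoff is zero whenever the average rainfall per event (ARE) does not exceed 0.2S:

```latex
\begin{align*}
\mathrm{ARE} > 0.2S = 0.2\left(\frac{25400}{CN} - 254\right)
\quad\Longleftrightarrow\quad
CN > \frac{5080}{\mathrm{ARE} + 50.8}
\end{align*}
```

As a hypothetical illustration, an ARE of 10 mm would give a critical CN of about 83.6 under these assumptions.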
Daily precipitation data generated by CLIGEN were used to compute daily direct runoff
in the fourth approach (CL). The difference between the CL and GU approaches was a
minimum of 0.9 % (overestimated, USC00121869) and maximum of 20.5 %
(underestimated, USC00128999). In addition, the annual direct runoff from the CL
approach differed less from the GU results than the annual direct runoff from the OS and
CS approaches.
The EPA STEPL model was used in the fifth approach (ST), and the estimated annual
direct runoff ranged from 190.7 mm (USW00014848) to 357.0 mm (USC00127125). The
difference between the GU and ST approaches was a minimum of 43.1 % (overestimated,
USC00128999) and maximum of 152.9 % (overestimated, USC00123547). Similar to the
OS approach, the ST results showed large differences relative to GU results. The OS and
ST approaches overestimated average annual direct runoff compared to the GU approach,
since the denominators in equation 5.1 for the OS and ST approaches were greater with
the initial abstraction coefficient of 0.0 than the denominator with the initial abstraction
coefficient of 0.2 for the GU approach.
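The percentage differences quoted above can be reproduced from Table 5.2. The sketch below copies the runoff depths for three of the stations and expresses each approach relative to GU; the helper names are illustrative.

```python
# Annual direct runoff depths (mm) copied from Table 5.2 for three stations.
runoff_mm = {
    "USC00120676": {"GU": 117.4, "CL": 107.4, "CS": 41.6, "OS": 248.6, "ST": 216.0},
    "USC00127125": {"GU": 191.4, "CL": 176.4, "CS": 85.6, "OS": 339.9, "ST": 357.0},
    "USC00123547": {"GU": 129.6, "CL": 144.1, "CS": 48.4, "OS": 263.9, "ST": 327.8},
}


def percent_difference(value: float, reference: float) -> float:
    """Signed difference of value from reference, in percent of reference."""
    return (value - reference) / reference * 100.0


def differences_from_gu(station: str) -> dict:
    """Percent difference of each approach from GU at one station."""
    row = runoff_mm[station]
    return {
        approach: round(percent_difference(depth, row["GU"]), 1)
        for approach, depth in row.items()
        if approach != "GU"
    }


for station in runoff_mm:
    print(station, differences_from_gu(station))
```

Positive values indicate overestimation relative to GU and negative values indicate underestimation.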
The approaches were compared with other CNs for different landuses and HSGs.
Simulated runoff for other CN values displayed identical trends in average annual direct
runoff. In other words, annual direct runoff using the OS and ST approaches was much
greater than the annual direct runoff using the GU approach, while the CS approach gave
smaller values. The results
indicate that annual direct runoff needs to be the aggregate of daily direct runoff based on
daily precipitation data. In other words, the approach to compute annual direct runoff
using the SCS-CN method with average rainfall per event (i.e. the OS and CS approaches)
would lead to annual direct runoff that is much larger than values estimated when the CN
runoff method is applied as it was intended. Comparing the OS approach to the CS
approach, the OS approach overestimated annual direct runoff at all stations, because the
denominators in equation 6 for the OS approach were smaller than those for the CS
approach, since the initial abstraction coefficients were 0.0 for OS and 0.2 for CS. Even
though the approach using average rainfall per event was corrected (CS), the approach
resulted in underestimation compared to the GU approach. In addition, the CS approach
estimated annual direct runoff greater than 0 mm only when CN was greater than a
critical value (TC), and therefore there will be no annual direct runoff in areas with small
CNs (e.g. forest). The ST approach resulted in not only overestimation of runoff but also
a large difference compared to runoff from the GU approach. Thus, it was concluded that
the approach using average rainfall per event currently used in the EPA STEPL model is not
applicable for computing annual direct runoff.
The SCS-CN method was developed based on the relationship between rainfall and direct
runoff (USDA, 1985; USDA, 1986). SCS-CN is an empirical model composed of two
parameters which are CN and initial abstraction. The initial abstraction was empirically
determined to be 0.2S (Garen and Moore, 2005; USDA, 1986). The CN tables published
and currently used typically assume the initial abstraction coefficient is 0.2. The
equations in the SCS-CN method need to be modified when other initial abstractions are
used (Lim et al., 2006; Woodward et al., 2003). In addition, the SCS-CN method using an
initial abstraction coefficient of 0.2 is typically used to simulate daily-based direct runoff
using daily rainfall in hydrologic models (Arnold et al., 1998; Knisel, 1980; Lim et al.,
2006; Williams and LaSeur, 1976; Williams et al., 2000).
Since the SCS-CN method is an empirical model, its assumptions must be maintained if it
is to provide accurate runoff estimates. Therefore, the initial abstraction needs to be 0.2S
for the CN values commonly used, otherwise CNs need to be adjusted for other initial
abstraction values. The method must also be used for event- or daily-based direct runoff
with event- or daily-based rainfall. In other words, annual direct runoff needs to be
computed by aggregating daily direct runoff obtained using daily rainfall. If the
assumptions are not maintained when the SCS-CN method is used, results may differ
greatly from those obtained by general use of the SCS-CN method. For instance, the OS
approach used a different initial abstraction (i.e. 0.0S) and did not adjust CNs, and thus
the approach resulted in overestimation compared to the GU approach. When the initial
abstraction coefficient was corrected (CS), annual direct runoff resulted in
underestimation compared to the GU approach, because the CS approach used a single
rainfall value (average rainfall per event) for annual direct runoff estimation.
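The effect of replacing daily rainfall with a single average value can be sketched as follows. The rainfall depths are hypothetical, the runoff function assumes the standard metric SCS-CN form with Ia = 0.2S, and the average-based estimate is assumed here to be the runoff for the mean rainfall multiplied by the number of events; the actual EPA STEPL calculation may include further terms.

```python
def direct_runoff_mm(p_mm: float, cn: float, ia_coef: float = 0.2) -> float:
    """Direct runoff depth (mm) from the standard metric SCS-CN equation."""
    s = 25400.0 / cn - 254.0
    ia = ia_coef * s
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Hypothetical rainfall depths (mm) for the rainy days of one period.
daily_rain_mm = [2.0, 5.0, 8.0, 12.0, 20.0, 35.0, 60.0]
cn = 85  # default STEPL curve number for cropland on HSG C (Table 5.4)

# General use: compute runoff for each day, then aggregate.
aggregated = sum(direct_runoff_mm(p, cn) for p in daily_rain_mm)

# Average-rainfall approach (assumed form): one runoff value times event count.
mean_rain = sum(daily_rain_mm) / len(daily_rain_mm)
from_average = len(daily_rain_mm) * direct_runoff_mm(mean_rain, cn)

print(aggregated, from_average)
```

Because the runoff equation is nonlinear in rainfall, a few large events contribute most of the aggregated runoff, and the average-based value comes out smaller for these made-up depths. This is the same direction as the underestimation reported for the CS approach.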
Table 5.2 Annual Precipitation and Direct Runoff Computations
             Precipitation (mm)   Annual Direct Runoff Depth (mm)
Station      PN(1)    PC(2)    GU(3)   CL(4)   CS(5)   OS(6)   ST(7)   TC(8)
USC00120676   963.8    940.7   117.4   107.4    41.6   248.6   216.0   78
USC00121747  1077.0   1027.1   164.9   147.9    68.0   309.4   327.8   76
USC00121869  1128.1   1146.3   187.7   189.3    76.1   334.4   331.8   76
USW00014848   970.7    950.5   121.1   114.5    40.2   242.5   190.7   78
USC00128999  1008.6    958.9   152.6   121.4    56.0   274.4   218.4   76
USC00129138  1025.7    925.5   143.8   120.8    54.5   280.4   215.2   77
USC00129430   996.0    935.1   145.3   125.1    55.6   274.8   215.2   77
USC00129678   931.3    951.8   124.8   122.6    45.8   250.8   216.2   77
USC00121229  1044.5   1027.8   141.1   130.3    55.6   284.4   249.1   77
USC00123547   994.8   1047.9   129.6   144.1    48.4   263.9   327.8   77
USC00127125  1080.4   1069.2   191.4   176.4    85.6   339.9   357.0   75
USC00127875  1097.9   1078.6   173.0   161.6    69.6   320.8   335.7   76
(1) PN: Annual precipitation from NCDC
(2) PC: Annual precipitation from CLIGEN
(3) GU: Annual direct runoff by daily direct runoff computation
(4) CL: Annual direct runoff with daily precipitation data generated by CLIGEN
(5) CS: EPA STEPL with corrected initial abstraction of 0.2S
(6) OS: Original EPA STEPL
(7) ST: Annual direct runoff by EPA STEPL model
(8) TC: Curve number threshold to generate direct runoff by CS
Figure 5.3 Comparison of Annual Direct Runoff by Different Approaches
5.4.2 Application of STEPL WEB
To demonstrate the BMP simulation ability of STEPL WEB, a watershed named
Tippecanoe River at North Webster in northeastern Indiana (Figure 5.4) was selected.
The spatial input datasets to delineate the watershed and to prepare STEPL WEB inputs
were the 30 meter resolution Digital Elevation Model (DEM) from the United States
Geological Survey (USGS) National Elevation Dataset, the National Land Cover Dataset
2006 (NLCD 2006) from USGS, and Soil Survey Geographic Database (SSURGO) from
United States Department of Agriculture (USDA). The watershed area is 129.1 km2, with
61.3 % of the watershed landuse being cropland (Figure 5.4, Table 5.3).
The Web-based LDC Tool was used to collect flow data from the USGS station number
03330241 (Tippecanoe River at North Webster, Indiana). Total phosphorus data were
collected from the Indiana Department of Environmental Management (IDEM) at the
same location. The period selected to develop a load duration curve was from 1998-03-25
to 2010-11-17, based on the period of available water quality data. The standard concentration for total
phosphorus was set to 0.08 mg/l (IDEM, 2013). Annual direct runoff, annual baseflow,
and annual sediment load were computed with the Web-based LDC Tool. Nutrient loads
in STEPL WEB are computed based on loads from runoff as well as sediment, and
therefore annual sediment load is required to calibrate model parameters for annual
nutrient loads. LOADEST (Runkel et al., 2004) was used for annual sediment load
calculation. The model parameters in STEPL WEB were calibrated (Table 5.4, Table 5.5)
using the auto-calibration module, which applies the bisection method. The Web-based LDC Tool
identified the required phosphorus pollutant reduction percentage to be 11% to meet the
standard load. STEPL WEB then established cost-effective BMP lists based on least cost
per unit of pollutant reduction. The most cost-effective BMP was ‘filter strip’ for
cropland, and ‘reduced tillage systems’ and ‘contour farming’ were the second and third most
cost-effective BMPs. The fourth and fifth most cost-effective BMPs were ‘vegetative
filter strips’ and ‘bioretention facility’ for urban.
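The standard load that a load duration curve compares against can be sketched as flow multiplied by the standard concentration. The example below uses the 0.08 mg/l total phosphorus standard given above; the flow values are hypothetical, and the Weibull plotting position used for the exceedance percentage is one common choice, not necessarily the one used by the Web-based LDC Tool.

```python
SECONDS_PER_DAY = 86400.0


def standard_load_kg_per_day(flow_m3s: float, conc_mg_per_l: float = 0.08) -> float:
    """Allowable daily load (kg/day) for a flow at the standard concentration."""
    # 1 mg/l equals 1 g/m3, so flow * concentration is in g/s.
    grams_per_second = flow_m3s * conc_mg_per_l
    return grams_per_second * SECONDS_PER_DAY / 1000.0


def load_duration_curve(daily_flows_m3s, conc_mg_per_l: float = 0.08):
    """Return (exceedance %, flow, standard load) tuples, highest flow first."""
    ordered = sorted(daily_flows_m3s, reverse=True)
    n = len(ordered)
    return [
        (100.0 * rank / (n + 1), flow, standard_load_kg_per_day(flow, conc_mg_per_l))
        for rank, flow in enumerate(ordered, start=1)
    ]


# Hypothetical daily mean flows (m3/s), for illustration only.
for exceedance, flow, load in load_duration_curve([1.0, 3.0, 2.0]):
    print(exceedance, flow, load)
```

Measured loads plotted above this curve in a given flow regime indicate an exceedance, and the required reduction is the gap between the measured and standard loads.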
Two scenarios for BMP application in the watershed were simulated. The first allowed
‘filter strip’ on up to 79 km2 (100% of the cropland area). The second allowed
‘filter strip’ on up to 10 km2 of cropland, ‘reduced tillage systems’ on up to 10 km2 of
cropland, and ‘vegetative filter strips’ on up to 5 km2 of urban area, with ‘contour
farming’ considered not applicable. In the first scenario,
‘filter strip’ for cropland needed to be applied to 17 km2 of cropland to reach the pollutant
reduction goal, with an estimated annual cost of $12,870. In the second scenario, the
management practice ‘filter strip’ needed to be applied to 10 km2 of cropland, ‘reduced
tillage systems’ needed to be applied to 10 km2 of cropland, and ‘vegetative filter strips’
needed to be applied to 4 km2 of urban as the optimal solution. The estimated annual cost
was $17,400 which resulted from $7,650 for ‘filter strip’ which provided 147.5 kg/year
phosphorus reduction in cropland, $7,710 for ‘reduced tillage systems’ with 95.6 kg/year
phosphorus reduction in cropland, and $2,040 for ‘vegetative filter strips’ with 1.0
kg/year phosphorus reduction in urban. Both scenarios met the required reduction.
The estimated annual cost of the second scenario was higher than that of the first
scenario, but the first scenario may not be feasible if ‘filter strip’ cannot be
implemented on 17 km2 or more of cropland area.
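The cost-effectiveness ranking can be checked against the second-scenario figures reported above. The sketch below computes cost per kilogram of phosphorus removed for each BMP and the scenario totals; the variable names are illustrative.

```python
# (BMP, annual cost in USD, phosphorus reduction in kg/year), second scenario.
second_scenario = [
    ("filter strip (cropland)", 7650.0, 147.5),
    ("reduced tillage systems (cropland)", 7710.0, 95.6),
    ("vegetative filter strips (urban)", 2040.0, 1.0),
]


def cost_per_kg(cost_usd: float, reduction_kg: float) -> float:
    """Annual cost per kilogram of phosphorus removed."""
    return cost_usd / reduction_kg


# Lowest cost per kg first, i.e. most cost-effective first.
ranked = sorted(second_scenario, key=lambda bmp: cost_per_kg(bmp[1], bmp[2]))
total_cost = sum(cost for _, cost, _ in second_scenario)
total_reduction = sum(reduction for _, _, reduction in second_scenario)

for name, cost, reduction in ranked:
    print(name, round(cost_per_kg(cost, reduction), 1))
print(total_cost, round(total_reduction, 1))
```

The urban practice removes very little phosphorus for its cost, which is consistent with it ranking below the cropland practices in the priority list.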
Figure 5.4 Landuses of Tippecanoe River at North Webster Watershed
Table 5.3 Landuse Distribution for Study Watershed
Landuse Area (km2) Percentage (%)
Urban 10.4 8.1
Cropland 79.2 61.3
Pasture 2.7 2.1
Forest 21.1 16.3
Water 15.7 12.2
Total 129.1 100.0
Table 5.4 STEPL WEB Parameters (Default / Calibrated)
Parameter                   Landuse      HSG A        HSG B        HSG C        HSG D
Curve Number                Urban        83 / 90      89 / 97      92 / 98      93 / 98
                            Cropland     67 / 73      78 / 85      85 / 92      89 / 97
                            Pastureland  49 / 53      69 / 75      79 / 86      84 / 91
                            Forest       39 / 42      60 / 65      73 / 79      79 / 86
Soil Infiltration Fraction  Urban        0.36 / 0.31  0.24 / 0.21  0.12 / 0.01  0.06 / 0.05
                            Cropland     0.45 / 0.39  0.30 / 0.26  0.15 / 0.13  0.08 / 0.07
                            Pastureland  0.45 / 0.39  0.30 / 0.26  0.15 / 0.13  0.08 / 0.07
                            Forest       0.45 / 0.39  0.30 / 0.26  0.15 / 0.13  0.08 / 0.07
Parameter                   Landuse      Value
Pollutant Coefficient,      Urban        0.30 / 0.18
Phosphorus (mg/l)           Cropland     0.50 / 0.30
                            Pastureland  0.30 / 0.18
                            Forest       0.10 / 0.06
Sediment Delivery Ratio                  /
Table 5.5 Annual Direct Runoff, Baseflow, and Sediment load
               Measured              Predicted
Direct Runoff  16.7 × 10^6 m3/year   16.7 × 10^6 m3/year
Baseflow       25.9 × 10^6 m3/year   25.6 × 10^6 m3/year
Sediment       237 ton/year          242 ton/year
Phosphorus     2.3 ton/year          2.1 ton/year
5.5 Conclusions
Protection of water quality in streams and rivers is important and one regulatory approach
to protecting these resources is use of TMDLs. Many models are used to develop TMDLs
and to simulate the ability of BMPs to reduce pollutant loads to meet TMDLs. The EPA
STEPL model is used for TMDL assessment and is capable of estimating sediment load,
nutrient loads, and BOD5. In addition, the model is used to estimate annual pollutant load
reductions for various BMPs. The model employs SCS-CN methods to estimate annual
direct runoff. However, the model uses an unconventional approach for annual direct
runoff calculation. The initial abstraction was empirically found to be 0.2S in the original
development of the CN runoff method, and thus the equations in the SCS-CN method
need to be modified when other initial abstraction coefficients are used. In addition, CNs
are typically used to compute event- or daily-based direct runoff. Annual direct runoff
using the EPA STEPL approach showed large differences compared to the annual direct
runoff computed by general use of the SCS-CN method. Annual direct runoff computed
from generated daily precipitation data from CLIGEN showed smaller differences than
values computed from EPA STEPL approaches.
A web-based model was developed based on the corrected EPA STEPL model, which
employs the CLIGEN model to generate daily precipitation data and to compute annual
direct runoff from daily direct runoff. The web-based model provides HTML interfaces
for watershed inputs and a map-based interface for CLIGEN stations. In addition, the
model was integrated with the Web-based LDC Tool so that suggested BMP scenarios
can be simulated. Since the BMPs suggested by the Web-based LDC Tool need to be
optimized, STEPL WEB establishes a priority list of BMPs based on implementation cost
per mass of pollutant reduction, and then the model performs iterative simulations to
identify the most cost-effective BMP implementation plans. The web-based model will
be useful for conducting BMP simulations to meet TMDL standard loads.
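The priority-list idea described above can be sketched as a greedy allocation. This is not the STEPL WEB implementation: it assumes that pollutant reduction scales linearly with treated area, and the class, function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bmp:
    name: str
    cost_per_km2: float          # annual cost per treated km2 (hypothetical)
    reduction_kg_per_km2: float  # pollutant reduction per treated km2 (hypothetical)
    max_area_km2: float          # largest area on which the BMP may be applied

    @property
    def cost_per_kg(self) -> float:
        return self.cost_per_km2 / self.reduction_kg_per_km2


def plan_bmps(bmps, target_kg):
    """Apply BMPs in order of cost per kg until the target reduction is met.

    Returns the plan as (name, area, cost) tuples and any unmet reduction.
    """
    plan = []
    remaining = target_kg
    for bmp in sorted(bmps, key=lambda b: b.cost_per_kg):
        if remaining <= 1e-9:
            break
        # Treat only as much area as needed, capped by the allowed area.
        area = min(bmp.max_area_km2, remaining / bmp.reduction_kg_per_km2)
        plan.append((bmp.name, area, area * bmp.cost_per_km2))
        remaining -= area * bmp.reduction_kg_per_km2
    return plan, max(remaining, 0.0)


candidates = [
    Bmp("practice B", cost_per_km2=300.0, reduction_kg_per_km2=15.0, max_area_km2=10.0),
    Bmp("practice A", cost_per_km2=100.0, reduction_kg_per_km2=10.0, max_area_km2=5.0),
]
plan, unmet = plan_bmps(candidates, target_kg=80.0)
print(plan, unmet)
```

In this sketch the cheapest practice per kilogram is used up to its area limit before the next one is considered, which mirrors the role of area limits in the two scenarios of Section 5.4.2.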
5.6 References
Arnold, J. G., Srinivasan, R., Muttiah, R. S., and J. R. Williams, 1998. Large area
hydrologic modeling and assessment – part 1: model development. Journal of the
American Water Resources Association 34(1): 73-89.
Arabi, M., Govindaraju R. S., Hantush, M. M., 2006. Cost-effective allocation of
watershed management practices using a genetic algorithm. Water Resources
Research 42, W10429, DOI: 10.1029/2006WR004931.
Ashkar, F. and S. Mahdi. 2006. Fitting the log-logistic distribution by generalized
moments. Journal of Hydrology 328. 694-703.
Baltas, E. A., Dervos, N. A., and M. A. Mimikou. 2007. Technical Note: Determination
of the SCS initial abstraction ratio in an experimental watershed in Greece.
Hydrology and Earth System Sciences. 11: 1825-1829.
Commonwealth Biomonitoring. 2009 January. Little River watershed diagnostic study.
Commonwealth Biomonitoring, Indianapolis, Indiana.
Elliot, W. J. and C. D. Arnold. 2001. Validation of the Weather Generator CLIGEN with
Precipitation Data from Uganda. Transactions of the ASAE. 44(1): 53-58.
FDEP (Florida Department of Environmental Protection). 2009 September. State of
Florida FY2010 section 319(h) grant work plan. 3900 Commonwealth Boulevard
M.S. 49 Tallahassee, Florida 32399.
Garai, G. and B. B. Chaudhuri. 2007. A distributed hierarchical genetic algorithm for
efficient optimization and pattern matching. The Journal of the Pattern Recognition
40. 212-228.
Garen, D. C. and D. S. Moore. 2005. Curve number hydrology in water quality modeling:
uses, abuses, and future directions. Journal of the American Water Resources
Association. 41: 377-388.
Gosselin, L., Tye-Gingras, M., and F. Mathieu-Potvin. 2009. Review of utilization of
genetic algorithms in heat transfer problems. International Journal of Heat and Mass
Transfer 52, 2169–2188.
Holland, J. H., 1975. Adaptation in Natural and Artificial Systems. University of
Michigan Press, Ann Arbor, MI. 183.
Hong, Y., Yeh, N., and J. Chen. 2006. The simplified methods of evaluating detention
storage volume for small catchment. Ecological Engineering 26. 355-364.
Indiana Department of Environmental Management (IDEM), July 2007. Total maximum
daily load for Escherichia coli (E. coli) for the East Fork Whitewater River
Watershed, Wayne, Union, Fayette, and Franklin Counties.
Indiana Department of Environmental Management (IDEM), 2013. Water quality targets.
Available at < http://www.in.gov/idem/nps/3484.htm>. Accessed in October 2013.
Kang, M. S., Park, S. W., Lee, J. J., and K. H. Yoo. 2006. Applying SWAT for TMDL
Programs to a Small Watershed Containing Rice Paddy Fields. Agricultural Water
Management. 79, 72-92.
Keegstra, N., Parks, J., and L. V. Linden. 2012 May. Whiskey Creek final report. Calvin
College, 201 Burton SE Grand Rapids, MI 49546
Knisel, W. G. 1980. CREAMS, a field-scale model for chemicals, runoff, and erosion
from agricultural management systems. Conservation Report 26, USDA Agriculture
Research Service, Washington, DC
Lim, K. J. and B. A. Engel. 2003. Extension and Enhancement of National Agricultural
Pesticide Risk Analysis (NAPRA) WWW Decision Support System to Include
Nutrients. Computers and Electronics in Agriculture. 38: 227-236.
Lim, K. J., Engel, B. A., Tang, Z., Muthukrishnan, S., Choi, J., and K. Kim. 2006. Effects
of calibration on L-THIA GIS runoff and pollutant estimation. Journal of
Environmental Management. 78: 35-43.
Lim, K. J., Park, Y. S., Kim, J., Shin, Y. C., Kim, N. W., Kim, S. J., Jeon, J. H., and B. A.
Engel. 2010. Development of genetic algorithm-based optimization module in
WHAT system for hydrograph analysis and model application. Computers &
Geosciences. 36: 936-944.
Neupauer, R. M. and B. Borchers. 2001. A MATLAB implementation of the minimum
relative entropy method for linear inverse problems. Computers & Geosciences 27.
757-762.
Nicks, A. D. and L. J. Lane. 1989. Weather Generator. Chapter 2 in USDA-Water
Erosion Prediction Project: Hillslope Profile Version. L. J. Lane, and M. A. Nearing,
ed. NSERL Report No. 2. West Lafayette, Ind.: USDA–ARS National Soil Erosion
Research Laboratory.
Park, Y. S., Kim, J., Kim, N. W., Kim, S. J., Jeon, J. H., Engel, B. A., Jang, W., and K. J.
Lim. 2010. Development of new R, C, and SDR modules for the SATEEC GIS
system. Computers & Geosciences 36(6), 726-734.
Patil, A and Z. Deng. 2011. Bayesian Approach to Estimating Margin of Safety for Total
Maximum Daily Load Development. Journal of Environmental Management. 92,
910-918.
Pease, L. M., Oduor, P., and G. Padmanabhan. 2010. Estimating Sediment, Nitrogen, and
Phosphorus Loads from the Pipestem Creek Watershed, North Dakota, using
AnnAGNPS. Computers & Geosciences. 36, 282-291.
Richards, C. E., Munster, C. L., Vietor, D. M., Arnold, J. G., and R. White. 2008.
Assessment of a Turfgrass Sod Best Management Practice on Water Quality in a
Suburban Watershed. Journal of Environmental Management. 86, 229-245.
Runkel, R. L., Crawford, C. G., Cohn, T. A., 2004. Load estimator (LOADEST): A
Fortran program for estimating constituent loads in streams and rivers. U.S.
Geological Survey Techniques and Methods, Book 4, Chap. A5
Shi, Z., Chen, L., Fang, N., Qin, D., and C. Cai. 2009. Research on the SCS-CN initial
abstraction ratio using rainfall-runoff event analysis in the Three Gorges Area, China.
CATENA. 77(1-7).
Tedela, N. H., McCutcheon, S. C., Rasmussen, T. C., Hawkins, R. H., Swank, W. T.,
Campbell, J. L., Adams, M. B., Jackson C. R., E. W. Tollner. 2012. Runoff curve
numbers for 10 small forested watersheds in the mountains of the eastern United
States. Journal of Hydrologic Engineering. 17:1188-1198.
Tetra Tech, Inc. 2011. User’s Guide Spreadsheet tool for the estimation of pollutant load
(STEPL) version 4.1. Tetra Tech, Inc. 10306 Eaton Place, Suite 340 Fairfax, VA
22003.
Toğan, V., Daloğlu, T. A., 2008. An improved genetic algorithm with initial population
strategy and self-adaptive member grouping. Computers and Structures 86, 1204-1218.
USDA-NRCS (U.S. Department of Agriculture, Natural Resources Conservation Service).
1983. Sediment sources, yields, and delivery ratios. In National Engineering
Handbook, Chapter 6, Section 3, Sedimentation.
USDA-NRCS (U.S. Department of Agriculture, Natural Resources Conservation Service).
1985. National Engineering Handbook, Section 4 Hydrology.
USDA-NRCS (U.S. Department of Agriculture, Natural Resources Conservation Service).
1986. Urban hydrology for small watersheds. Washington, DC
Williams J. R. and V. LaSeur. 1976. Water yield model using SCS CN curve numbers.
Journal of Hydraulic Engineering. 102(HY9): 1241-1253.
Williams J. R., Arnold, J. G., and R. Srinivasan. 2000. The APEX model. BRC Report
No. 00–06 Blackland Research and Extension Center, Texas Agricultural
Experiment Station, Texas A & M University System, Temple, TX.
Woodward, D.E., Hawkins, R.H., Jiang, R., Hjelmfelt, A.T., and J.A. Van Mullem. 2003.
Runoff Curve Number method: examination of the initial abstraction ratio. American
Society of Civil Engineers. Conference Proceeding Paper. World Water &
Environmental Resources Congress 2003 and Related Symposia.
Zhang, X. C. and J. D. Garbrecht. 2003. Evaluation of CLIGEN Precipitation Parameters
and Their Implication on WEPP Runoff and Erosion Prediction. Transactions of the
ASAE. 46(2): 311-320.
CHAPTER 6. SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
6.1 Summary
Flow and load duration curves (FDC and LDC) have been used in many states for total
maximum daily load (TMDL) implementations. A web-based LDC Tool was developed
to facilitate development of FDC and LDC. The Web-based LDC Tool provided various
benefits in development of FDC and LDC, deriving streamflow data from the USGS data
server and integrating with LOADEST and LOADIN to generate daily pollutant loads
from intermittent water quality data. Therefore, there was a need to evaluate the
predictive ability of LOADEST and LOADIN for annual pollutant loads. The Web-based
LDC Tool requires that users prepare water quality data which are essential inputs in
LDC development. Moreover, a model capable of simulating BMPs was required for the
Web-based LDC Tool, since the Web-based LDC Tool is not capable of simulating
BMPs to meet standard pollutant loads.
Therefore, this research was conducted to explore pollutant load regression model
behaviors in estimating annual pollutant loads and to develop web-based tools supporting
TMDL implementations. The specific objectives of the dissertation were to:
1. Evaluate water quality sampling frequency strategies and 10 regression models to
estimate annual pollutant load. The regression models evaluated in this study were
nine regression models of LOADEST (numbered 1 to 9) and the LOADIN regression
model using a genetic algorithm.
2. Identify the correlation between LOADEST model behavior and water quality datasets
for various proportions of water quality data from storm events.
3. Improve the Web-based LDC Tool to allow collection of USGS and STORET/WQX
WQ data via web access.
4. Enhance the Web-based LDC Tool to identify BMPs to reduce annual pollutant load
against required reduction of pollutant load.
5. Develop a web-based model to estimate annual pollutant loads and to simulate BMPs
to meet the required reduction of annual pollutant load.
Ten pollutant load regression models from LOADEST and LOADIN were evaluated with
six water quality sampling strategies for sediment, nitrogen, and phosphorus, under the
first objective. A measured ‘true load’ was required to evaluate pollutant load regression
models. Daily water quality data were collected from the USGS Water-Quality Data for
the Nation and the National Center for Water Quality Research of Heidelberg University.
The collected water quality data were artificially degraded with six sampling strategies.
The first three sampling strategies were fixed interval sampling strategies, and the other
three sampling strategies were fixed interval with storm event sampling. The terms
affecting the pollutant regression models’ behaviors were investigated since the daily
water quality parameters used in the study showed different relationships with
streamflow. The predictive ability of pollutant load regression models was evaluated for
accurate and precise annual pollutant load estimation.
The second objective was to identify the correlation between LOADEST model behavior
and water quality datasets. Regression model 3 in LOADEST, which showed the most
accurate and precise annual sediment load estimates, was selected. The first
step was to identify the correlation between annual sediment load estimates and statistics
of water quality datasets. A regression equation was developed to determine the required
mean flow in the calibration data (MFC), since MFC was correlated to the error in annual
sediment load estimates. The second step was to evaluate the regression equation;
therefore the regression equation was applied to improve the poorest annual sediment
load estimates from the first objective. The objective demonstrated several distinct
features in annual sediment load estimation using LOADEST. One is that high sampling
frequency does not necessarily improve the accuracy and precision of estimated loads.
The other feature observed was that a water quality dataset needs to represent the
distribution of given data and should not be biased toward a specific flow regime.
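A simple check of whether a water quality dataset is biased toward a flow regime is sketched below. It compares the mean flow in the calibration data (MFC) with the mean of all flows and reports the share of samples taken at high flow, here taken as the upper 10 percent of flows. The function names and the flow values are hypothetical, and the regression equation developed for the required MFC is not reproduced.

```python
def mean(values):
    return sum(values) / len(values)


def high_flow_threshold(flows, upper_percent: int = 10):
    """Smallest flow that still belongs to the upper `upper_percent` of flows."""
    ordered = sorted(flows, reverse=True)
    count = max(1, len(ordered) * upper_percent // 100)
    return ordered[count - 1]


def sampling_summary(all_flows, sampled_flows):
    """Summarize how well the sampled days represent the full flow record."""
    threshold = high_flow_threshold(all_flows)
    n_high = sum(1 for q in sampled_flows if q >= threshold)
    return {
        "mfc": mean(sampled_flows),             # mean flow in calibration data
        "mean_flow_all": mean(all_flows),       # mean of the full flow record
        "high_flow_share": n_high / len(sampled_flows),
    }


# Hypothetical daily flows (m3/s) and the flows on days with water quality samples.
all_flows = [float(q) for q in range(1, 21)]
sampled_flows = [2.0, 5.0, 10.0, 19.0, 20.0]
print(sampling_summary(all_flows, sampled_flows))
```

An MFC far above or below the mean of the full record, or a high-flow share near 0 or 1, would suggest that the calibration data are weighted toward one flow regime.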
The Web-based LDC Tool was enhanced to allow collection of water quality data via
web access and to suggest best management practices (BMPs). The tool allowed use of
USGS streamflow via web access, but users had to supply water quality data. Water
quality data are an essential input in load duration curve (LDC) development with flow
data. Water quality data can be prepared by the users, or the data can be collected from
USGS, Environmental Protection Agency (EPA), or the Water Quality Portal (WQP).
Therefore, the web tool was upgraded to automatically access water quality data from the
USGS and STORET/WQX for all states in the U.S. via web access. LDC is one approach
used to develop total maximum daily loads (TMDLs) by identifying specific flow
regimes violating water quality standards. Exceeding pollutant loads in each flow regime
implies potentially different sources of pollutant loads. A BMP scenario needs to be
based on the flow regime in which pollutant loads are exceeded. Therefore, the web tool
was improved to select the BMPs able to reduce pollutant loads corresponding to the flow
regime for which pollutant loads are exceeded.
Various models have been used to identify strategies to attain pollutant load limits by
facilitating creation of plans identifying BMPs to reduce pollutant loads. Spreadsheet
Tool for the Estimation of Pollutant Load (STEPL) is a spreadsheet model to compute
annual direct runoff, sediment load, nutrient loads, and 5-day biological oxygen demand
(BOD5). The model computes direct runoff in a watershed based on the Soil Conservation
Service Curve Number (SCS-CN) method, but the model uses processes that are not
scientifically based in calculating annual direct runoff. Therefore, four approaches to
estimate annual direct runoff with the SCS-CN method were explored, including the
STEPL approach. The first approach was to obtain annual runoff by aggregating daily
direct runoff computed by the SCS-CN method and measured daily precipitation data.
The second and third approaches represented current STEPL approaches using values of
0.0 and 0.2 for initial abstraction coefficients. In the fourth approach, daily precipitation
data were generated by the CLIGEN model. In addition, a web-based model was
developed to simulate BMPs, and two modules were developed and integrated to
calibrate model parameters and to optimize BMPs for cost-effective BMP
implementations.
6.2 Conclusions
The research included two major parts. One was to explore the predictive ability of
pollutant load regression models, and the other was to enhance and to develop web-based
tools. The findings from analysis of pollutant load regression model behavior contribute
to improved accuracy of pollutant load regression models, and the web-based tools
enhanced and developed in the research contribute to the advancement of TMDL plan
development.
The results from this research showed that:
Use of extensive water quality data in regression models to estimate pollutant
loads did not necessarily lead to precise and accurate annual pollutant load
estimates. For instance, higher sampling frequency led to better precision in
sediment load estimates, but this did not occur in phosphorus load estimates.
Water quality data used to estimate annual pollutant loads need to include an
appropriate proportion of data from storm events; datasets with 20-30% of the water
quality data from high flows (i.e. the upper 10 percent of flows for a given
analysis period) provided the estimated sediment and phosphorus loads closest to
measured loads. Extrapolation needs to be avoided in use of pollutant
concentrations within regression models for annual pollutant load estimates,
since the fixed sampling frequencies supplemented with stratified sampling
strategies led to more accurate and more precise pollutant load estimates than the
fixed interval sampling strategies. The water quality parameters showed different
relationships with streamflow, and therefore a regression model needs to be
employed based on the behaviors of water quality parameters.
The mean flow in the calibration data (MFC) used to fit regression model coefficients was
correlated with the error in annual sediment load estimations. Use of the water
quality dataset with appropriate MFC showed smaller errors than use of a large
amount of data (e.g. daily water quality data). The load estimates differed
significantly from the measured loads if MFC was too small or too great; in other
words, use of water quality data biased toward low or high flows led to great
error. The research indicates that a water quality dataset needs to represent the
distribution of given data.
The Web-based LDC Tool was upgraded to allow use of water quality data from
USGS and WQP, which provide water quality data for all locations in the United States.
LOADEST is used for annual pollutant load estimation in the Web-based LDC
Tool. A module to adjust MFC of a given water quality dataset was developed
and integrated into the web-based tool based on the results from the first and
second objectives. The web-based tool identifies the required pollutant
reductions for five flow regimes and makes lists of BMPs corresponding to the
flow regime for which pollutant loads are exceeded.
The approach to compute annual direct runoff in EPA STEPL was examined.
Annual direct runoff using the EPA STEPL approach showed large differences
compared to the annual direct runoff computed by general use of SCS-CN
method. However, annual direct runoff computed from generated daily
precipitation data from CLIGEN showed smaller differences than values
computed from EPA STEPL approaches. Therefore, a web-based model to
simulate BMPs, or STEPL WEB (https://engineering.purdue.edu/~ldc/STEPL/),
was developed employing the CLIGEN model to generate daily precipitation
data and to compute annual direct runoff from daily direct runoff. For cost-
effective BMP implementation, selection of BMPs needs to be optimized.
Therefore, STEPL WEB establishes a priority list of BMPs based on
implementation cost per mass of pollutant reduction, and then the model
performs cumulative and iterative simulations to identify the most cost-effective
BMP implementation plans.
6.3 Recommendations for Future Research
The research identified pollutant load regression models’ behaviors by water quality
parameters and various sampling strategies. Web-based tools were developed to identify
required pollutant reductions, to simulate BMPs, and to optimize BMPs. During the study,
opportunities for future research were identified.
LOADEST was integrated with Web-based LDC Tools to estimate annual
pollutant loads. LOADEST can be used if water quality data are insufficient to
develop LDC. In addition, LOADEST can be used if a TMDL plan needs to be
based on annual pollutant loads (e.g. ton/year). The study was focused on
pollutant load regression models’ behaviors in annual pollutant load estimates.
Identification of the predictive ability of LOADEST for daily pollutant loads is
needed. For instance, the evaluation of the estimated loads to the measured loads
can be based on criteria (e.g. Nash-Sutcliffe Efficiency and Coefficient of
Determination) but also 90th percentile estimated loads for five flow regimes.
Since TMDL development can be daily-based or annual-based, LOADEST needs
to be tested to know if the pollutant load estimates by LOADEST are appropriate
for daily-based TMDL.
Three water quality parameters were selected in the study, which were nitrogen,
phosphorus, and sediment. LOADEST model 3 showed the most accurate and
precise annual sediment load estimates, and LOADIN was able to consider
seasonality of pollutant loads required for nitrogen. Therefore, it is suggested to
select or use LOADEST model 3 for annual sediment load estimates and
LOADIN for annual nitrogen load estimates. The collected daily phosphorus data
showed both flow-proportional and seasonal behaviors. Both
LOADEST and LOADIN showed reasonable annual load estimates with an
appropriate proportion of water quality data from high flows; however, it might be
necessary to explore phosphorus data characteristics affecting annual load
estimates beyond flow-proportional and seasonal behaviors. Also, selecting
and using other pollutant regression models might be considered in exploring
phosphorus load estimates.
A web-based model to simulate BMPs, STEPL WEB, is used with the Web-based
LDC Tool. STEPL WEB is a lumped model that estimates annual direct
runoff, shallow groundwater, and nutrient loads. One benefit of using the model
is its database of BMP efficiencies, which can also be used by other models
such as L-THIA. Therefore, the Web-based LDC Tool could be integrated with
other models that allow BMP simulations to reduce pollutant loads.
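The idea of sharing one BMP efficiency database among models can be sketched as follows. This is only an illustration under stated assumptions: the BMP names, efficiency values, and the `reduced_load` helper are hypothetical placeholders and are not taken from the STEPL WEB database.

```python
# Hypothetical BMP removal efficiencies (fraction of load removed).
# These values are placeholders, not entries from the STEPL WEB database.
BMP_EFFICIENCY = {
    "example_bmp_a": {"nitrogen": 0.30, "phosphorus": 0.40, "sediment": 0.50},
    "example_bmp_b": {"nitrogen": 0.10, "phosphorus": 0.20, "sediment": 0.60},
}


def reduced_load(load, bmp, pollutant, treated_fraction=1.0):
    """Load remaining after a BMP treats `treated_fraction` of the source area."""
    if not 0.0 <= treated_fraction <= 1.0:
        raise ValueError("treated_fraction must be between 0 and 1")
    efficiency = BMP_EFFICIENCY[bmp][pollutant]
    return load * (1.0 - efficiency * treated_fraction)


# Any model that produces an annual load could call the same lookup.
print(reduced_load(1000.0, "example_bmp_a", "sediment"))       # whole area treated
print(reduced_load(1000.0, "example_bmp_a", "sediment", 0.5))  # half of the area treated
```

Keeping the efficiencies in one table means that a different load model only needs to supply the annual load and the pollutant name to reuse the same BMP data.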
APPENDICES
Appendix A Statistics of USGS Stations
Table A 1 provides the statistics of flow data collected from USGS stations for Chapters
2 and 3. Drainage areas were obtained from USGS. The dominant landuse for each station
was determined based on the USGS Gap Analysis Program
(http://dingo.gapanalysisprogram.com/landcoverv2/).
‘Min.’, ‘Average’, ‘Max.’, and ‘S. D.’ are the minimum, average, and maximum flow and
the standard deviation of flow, in cubic meters per second (m3/s), over the flow data
period.
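As an illustration of how the four statistics in Table A 1 can be derived from a daily flow series, a minimal sketch follows. The sample flows are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which form was used.

```python
import statistics


def flow_summary(daily_flows_cms):
    """Return min, average, max, and sample standard deviation of daily flow (m3/s)."""
    return {
        "min": min(daily_flows_cms),
        "average": statistics.mean(daily_flows_cms),
        "max": max(daily_flows_cms),
        "sd": statistics.stdev(daily_flows_cms),  # assumption: sample (n - 1) form
    }


# Hypothetical daily flows in m3/s, for illustration only.
flows = [0.2, 0.4, 0.3, 1.5, 0.6]
summary = flow_summary(flows)
print({name: round(value, 2) for name, value in summary.items()})
```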
Table A 1. Statistics of USGS Stations
USGS Station  State  Drainage Area (km2)  Dominant Landuse  Period  Min. (m3/s)  Average (m3/s)  Max. (m3/s)  S. D. (m3/s)
09474000 AZ 46,648 Semi-Desert 1973-1973 0.01 0.41 2.45 0.28
09380000 AZ 289,561 Nonvascular & Sparse Vascular Rock Vegetation 1963-1964 0.56 2.56 16.04 2.73
09402500 AZ 366,742 Forest & Woodland 1969-1971 2.15 10.23 18.36 3.04
10336645 CA 19 Semi-Desert 1982-1991 0.00 0.01 0.33 0.03
10336660 CA 29 Semi-Desert 1986-1986 0.00 0.05 0.74 0.07
11525600 CA 80 Forest & Woodland 1999-2001 0.01 0.04 0.44 0.05
11048500 CA 108 Developed & Other Human Use 1978-1978 0.00 0.02 0.69 0.07
10336610 CA 142 Semi-Desert 1990-1990 0.00 0.03 0.16 0.04
11481500 CA 175 Forest & Woodland 1977-1978 0.00 0.12 1.68 0.21
11151870 CA 293 Shrubland & Grassland 1969-1969 0.00 0.29 7.63 0.79
11465200 CA 420 Shrubland & Grassland 1982-1983 0.00 0.55 6.41 0.94
11118500 CA 487 Shrubland & Grassland 1980-1980 0.00 0.15 6.69 0.58
11482500 CA 717 Forest & Woodland 1973-1974 0.01 0.97 13.15 1.56
11481000 CA 1,256 Forest & Woodland 1970-1972 0.00 1.31 27.42 2.86
11472150 CA 1,368 Forest & Woodland 1975-1975 0.00 0.86 19.24 2.26
11042000 CA 1,443 Shrubland & Grassland 1975-1977 0.00 0.01 0.16 0.01
11179000 CA 1,639 Shrubland & Grassland 1967-1968 0.00 0.10 5.37 0.32
11473900 CA 1,930 Forest & Woodland 1972-1972 0.01 1.11 14.99 1.95
11525655 CA 2,098 Forest & Woodland 1983-1983 0.28 1.79 6.43 1.73
11467000 CA 3,465 Developed & Other Human Use 1976-1983 0.00 2.18 54.93 5.51
11475000 CA 5,457 Forest & Woodland 1967-1967 0.06 3.46 58.21 7.12
11530000 CA 7,389 Forest & Woodland 1969-1978 0.20 4.13 94.62 6.98
11477000 CA 8,063 Forest & Woodland 1978-1978 0.10 6.96 125.89 15.57
11407000 CA 9,386 Forest & Woodland 1969-1978 0.18 0.72 42.74 2.98
11407150 CA 9,521 Forest & Woodland 1983-1992 0.61 3.23 117.07 5.77
11407700 CA 10,293 Agricultural Vegetation 1966-1975 0.13 4.65 59.74 5.64
11523000 CA 21,950 Forest & Woodland 1969-1978 0.83 7.85 192.44 10.91
11303500 CA 35,058 Agricultural Vegetation 2001-2010 0.41 2.52 27.74 3.31
11447500 CA 60,883 Agricultural Vegetation 1960-1965 5.34 16.42 79.22 12.27
11447650 CA Agricultural Vegetation 1994-2003 4.91 21.28 90.61 16.13
09306242 CO 82 Forest & Woodland 1978-1979 0.00 0.00 0.01 0.00
09306007 CO 458 Forest & Woodland 1979-1980 0.00 0.02 0.13 0.02
09306061 CO 800 Forest & Woodland 1975-1980 0.00 0.01 0.12 0.01
09306200 CO 1,311 Forest & Woodland 1973-1982 0.00 0.02 0.13 0.01
09251000 CO 8,762 Forest & Woodland 1951-1957 0.03 1.18 12.35 2.00
01481500 DE 813 Developed & Other Human Use 1973-1973 0.12 0.52 3.92 0.47
02269160 FL 5,271 Developed & Other Human Use 2008-2010 0.09 0.77 4.46 0.74
02383500 GA 2,152 Forest & Woodland 1961-1962 0.26 1.40 19.49 1.82
16270900 HI 1 1990-1990 0.00 0.00 0.01 0.00
16265600 HI 3 1987-1996 0.00 0.00 0.14 0.01
16272200 HI 10 1991-1992 0.00 0.01 0.26 0.01
16213000 HI 117 1986-1986 0.00 0.02 0.24 0.03
05455000 IA 8 Agricultural Vegetation 1973-1982 0.00 0.00 0.07 0.00
06809000 IA 67 Agricultural Vegetation 1962-1962 0.00 0.01 0.22 0.03
05389400 IA 88 Agricultural Vegetation 1999-2001 0.01 0.02 0.23 0.01
06817000 IA 1,974 Agricultural Vegetation 1983-1984 0.02 0.51 7.19 0.80
06809500 IA 2,315 Agricultural Vegetation 1967-1967 0.02 0.22 7.02 0.66
05418500 IA 4,022 Agricultural Vegetation 2001-2001 0.16 1.04 4.32 0.74
05454500 IA 8,472 Agricultural Vegetation 1977-1986 0.04 1.98 8.18 1.81
05474000 IA 11,168 Agricultural Vegetation 2001-2010 0.04 3.24 35.28 4.63
05481650 IA 15,128 Agricultural Vegetation 1994-2003 0.12 2.55 13.55 3.05
05464500 IA 16,861 Developed & Other Human Use 1944-1953 0.17 3.03 43.06 4.20
05465500 IA 32,375 Agricultural Vegetation 1996-2005 0.64 7.11 57.73 7.17
05389500 IA 174,824 Agricultural Vegetation 1994-2003 5.85 35.44 198.06 23.62
0660660 IA 6,475 Agricultural Vegetation 1960-1960 5.94 29.49 78.10 13.00
13341000 ID 6,327 Forest & Woodland 1967-1967 0.24 4.53 21.89 4.61
12318500 ID 34,706 Forest & Woodland 1967-1971 1.76 13.37 76.74 16.04
05570370 IL 107 Agricultural Vegetation 1977-1986 0.00 0.03 0.81 0.05
05591200 IL 1,225 Agricultural Vegetation 1987-1996 0.00 0.36 7.77 0.65
05532500 IL 1,632 Developed & Other Human Use 2003-2009 0.08 0.64 7.14 0.67
05570000 IL 4,237 Agricultural Vegetation 2004-2010 0.02 1.18 20.21 1.86
05594100 IL 11,378 Agricultural Vegetation 1987-1996 0.06 2.86 39.05 3.94
05586100 IL 69,264 Agricultural Vegetation 2001-2010 2.21 21.49 85.00 16.47
05587455 IL 443,665 Agricultural Vegetation 2001-2010 12.99 106.30 352.01 69.03
07020500 IL 1,835,265 Agricultural Vegetation 1996-2004 45.87 169.87 586.96 98.47
07022000 IL 1,847,179 Agricultural Vegetation 2001-2010 50.12 190.00 663.93 116.31
05588720 IL 22 Agricultural Vegetation 2010-2010 12.99 106.30 352.01 69.03
03340800 IN 360 Agricultural Vegetation 1975-1976 0.00 0.11 2.76 0.25
04182000 IN 1,974 Agricultural Vegetation 1965-1965 0.01 0.32 3.39 0.65
03365500 IN 6,063 Agricultural Vegetation 1969-1971 0.22 1.76 28.06 2.75
07147800 KS 4,869 Shrubland & Grassland 1971-1972 0.00 0.25 5.96 0.53
06869500 KS 7,304 Agricultural Vegetation 1968-1969 0.00 0.04 1.04 0.10
06877600 KS 49,883 Agricultural Vegetation 1965-1974 0.04 1.20 37.05 2.44
07140000 KS 85,641 Agricultural Vegetation 1967-1968 0.02 0.10 1.04 0.07
07146500 KS 113,216 Agricultural Vegetation 1965-1974 0.15 1.68 42.26 3.24
06887500 KS 143,175 Agricultural Vegetation 1965-1974 0.19 4.50 49.31 6.05
03217000 KY 627 Forest & Woodland 1965-1965 0.00 0.23 3.45 0.43
03308500 KY 4,333 Agricultural Vegetation 1983-1992 0.13 2.16 50.36 3.15
03212500 KY 5,553 Forest & Woodland 1967-1972 0.08 2.16 30.63 3.28
03251500 KY 6,024 Agricultural Vegetation 1967-1968 0.03 2.30 22.13 3.82
03287500 KY 14,014 Agricultural Vegetation 1968-1972 0.11 5.73 79.54 9.24
02489500 LA 17,024 Forest & Woodland 1985-1987 1.33 7.26 56.53 9.48
01614500 MD 1,279 Agricultural Vegetation 1977-1979 0.07 0.59 8.98 0.84
01603000 MD 2,271 Forest & Woodland 1969-1978 0.10 1.14 12.67 1.47
01638500 MD 24,996 Agricultural Vegetation 1982-1982 1.24 7.29 61.26 8.48
04102700 MI 217 Agricultural Vegetation 1981-1981 0.02 0.08 0.87 0.09
04102420 MI 805 Agricultural Vegetation 1981-1981 0.18 0.33 1.38 0.16
04176500 MI 2,699 Agricultural Vegetation 1983-2010 0.01 0.02 0.05 0.00
04125350 MI Forest & Woodland 1969-1969 0.01 0.02 0.05 0.00
05293000 MN 1,189 Agricultural Vegetation 1978-1979 0.00 0.08 2.08 0.21
05325000 MN 38,591 Agricultural Vegetation 2001-2010 0.15 4.65 66.87 7.01
05288500 MN 49,469 Agricultural Vegetation 1986-1995 0.68 7.36 40.17 5.82
05378500 MN 153,327 Agricultural Vegetation 1976-1984 1.88 26.01 109.85 18.19
05506500 MO 922 Agricultural Vegetation 1992-1994 0.00 0.28 8.74 0.70
07010000 MO 1,805,222 Agricultural Vegetation 1995-2004 43.30 167.24 638.27 100.27
07273100 MS 91 Agricultural Vegetation 1992-1994 0.00 0.03 1.23 0.09
07287404 MS 161 Forest & Woodland 1990-1999 0.01 0.08 3.50 0.24
07287150 MS 247 Forest & Woodland 1993-2002 0.01 0.10 3.58 0.23
07274252 MS 251 Forest & Woodland 1987-1989 0.01 0.11 2.84 0.25
07277700 MS 313 Agricultural Vegetation 1992-1994 0.03 0.13 5.78 0.42
07285400 MS 622 Forest & Woodland 1992-1993 0.02 0.17 6.63 0.40
06088500 MT 813 Agricultural Vegetation 1976-1981 0.00 0.11 1.40 0.12
12324200 MT 2,577 Forest & Woodland 1993-2002 0.02 0.21 1.59 0.16
12340000 MT 5,931 Forest & Woodland 1989-1994 0.16 1.11 7.92 1.23
06018500 MT 9,373 Semi-Desert 1964-1973 0.02 0.37 1.38 0.20
12334550 MT 9,430 Forest & Woodland 1993-2002 0.16 1.08 7.64 1.02
12340500 MT 15,537 Forest & Woodland 2004-2010 0.36 2.10 13.87 2.07
06130500 MT 20,321 Shrubland & Grassland 1987-1987 0.00 0.07 2.26 0.14
06324500 MT 20,943 Shrubland & Grassland 1976-1976 0.02 0.37 2.49 0.39
06294700 MT 59,272 Shrubland & Grassland 1962-1971 0.32 3.41 20.21 2.60
06115200 MT 106,156 Agricultural Vegetation 1984-1990 2.41 6.30 19.49 2.48
06329500 MT 178,924 Shrubland & Grassland 1972-1980 1.12 11.07 83.39 9.33
06088300 MT 580 Semi-Desert 1972-1981 0.02 0.37 1.38 0.20
02119400 NC 13 Agricultural Vegetation 1959-1968 0.00 0.01 0.06 0.01
05099600 ND 8,676 Agricultural Vegetation 1971-1972 0.00 0.31 7.47 0.64
06486000 NE 814,810 Agricultural Vegetation 1992-1999 5.94 29.49 78.10 13.00
06610000 NE 836,048 Agricultural Vegetation 1993-2002 10.02 32.82 93.01 13.64
06807000 NE 1,061,895 Agricultural Vegetation 2001-2010 6.94 28.55 132.31 14.05
01463500 NJ 17,560 Developed & Other Human Use 1972-1981 1.52 10.79 87.40 10.04
08334000 NM 1,088 Forest & Woodland 2001-2010 0.00 0.01 0.81 0.05
09364500 NM 3,522 Forest & Woodland 1983-1992 0.05 0.75 6.50 0.85
08340500 NM 3,600 Forest & Woodland 1983-1984 0.00 0.02 0.87 0.05
08286500 NM 4,144 Forest & Woodland 1964-1973 0.01 0.28 4.96 0.43
08287000 NM 5,561 Forest & Woodland 1965-1974 0.01 0.31 2.23 0.37
08383000 NM 6,863 Shrubland & Grassland 1983-1984 0.00 0.09 0.91 0.23
08290000 NM 8,143 Forest & Woodland 1969-1974 0.00 0.33 2.35 0.39
09368000 NM 33,411 Forest & Woodland 1976-1985 0.06 1.75 10.99 1.59
08313000 NM 37,037 Forest & Woodland 1994-2003 0.21 1.07 6.94 0.95
08317400 NM 38,591 Forest & Woodland 1975-1983 0.00 1.05 5.48 1.20
08396500 NM 39,627 Shrubland & Grassland 1987-1989 0.00 0.18 0.86 0.22
08329500 NM 44,807 Forest & Woodland 1957-1959 0.00 1.16 9.30 1.63
08330000 NM 45,169 Forest & Woodland 2001-2010 0.08 0.75 5.22 0.79
08332010 NM 49,805 Forest & Woodland 1983-1992 0.00 1.20 7.18 1.17
08354900 NM 69,334 Forest & Woodland 1985-1994 0.00 1.28 7.55 1.17
08358400 NM 71,743 Forest & Woodland 2001-2010 0.00 0.46 4.61 0.63
08354800 NM Forest & Woodland 1975-1984 0.00 0.40 1.52 0.46
08358300 NM Forest & Woodland 1984-1993 0.00 0.31 1.43 0.23
01357500 NY 8,935 Forest & Woodland 2005-2010 0.24 5.93 71.93 6.71
4185440 OH 11 Agricultural Vegetation 2008-2011 0.00 0.01 0.26 0.01
4285400 OH 42 Agricultural Vegetation 2009-2011 0.00 0.01 0.26 0.01
04197170 OH 90 Agricultural Vegetation 1983-2011 0.02 4.33 59.10 6.83
4197100 OH 386 Agricultural Vegetation 1977-2011 0.00 0.01 0.26 0.01
04199500 OH 679 Agricultural Vegetation 2001-2007 0.02 0.89 19.00 1.67
4189000 OH 896 Agricultural Vegetation 2008-2011 0.00 0.01 0.26 0.01
4185000 OH 1,062 Agricultural Vegetation 2008-2011 0.00 0.01 0.26 0.01
4195500 OH 1,109 Agricultural Vegetation 2011-2011 0.00 0.01 0.26 0.01
03265000 OH 1,303 Agricultural Vegetation 1967-1973 0.01 0.38 6.23 0.69
04212100 OH 1,774 Introduced & Semi Natural Vegetation 1981-1990 0.00 0.88 12.27 1.36
04208000 OH 1,831 Introduced & Semi Natural Vegetation 1992-2001 0.09 0.74 7.54 0.79
4208000 OH 1,831 Introduced & Semi Natural Vegetation 1982-2011 0.00 0.01 0.26 0.01
04198000 OH 3,240 Agricultural Vegetation 1992-2001 0.02 0.89 19.00 1.67
04198000 OH 3,240 Agricultural Vegetation 1975-2011 0.02 0.89 19.00 1.67
3271500 OH 7,021 Agricultural Vegetation 1997-2011 0.00 0.01 0.26 0.01
3231500 OH 9,969 Agricultural Vegetation 1997-2011 0.00 0.01 0.26 0.01
03234500 OH 13,289 Agricultural Vegetation 1956-1965 0.24 3.48 101.84 6.38
03144500 OH 15,522 Agricultural Vegetation 1955-1964 0.38 4.82 29.59 5.52
04193500 OH 16,395 Agricultural Vegetation 1993-2002 0.02 4.33 59.10 6.83
03150000 OH 19,223 Forest & Woodland 1981-1990 0.45 6.81 35.36 6.32
14306810 OR 3 Forest & Woodland 1968-1968 0.00 0.01 0.05 0.01
01481000 PA 743 Developed & Other Human Use 1968-1969 0.08 0.25 2.81 0.26
01470500 PA 919 Agricultural Vegetation 1978-1980 0.05 0.63 11.23 0.84
01567000 PA 8,687 Agricultural Vegetation 1975-1984 0.36 3.78 69.84 4.63
01570500 PA 62,419 Agricultural Vegetation 1973-1978 3.86 32.88 412.15 34.00
50065500 PR 18 1993-2002 0.00 0.35 11.29 0.94
50048770 PR 19 1992-1992 0.00 0.35 11.29 0.94
50053025 PR 19 1995-1996 0.00 0.35 11.29 0.94
50058350 PR 20 1992-1992 0.00 0.35 11.29 0.94
50071000 PR 39 1999-1999 0.00 0.35 11.29 0.94
50136400 PR 47 2008-2008 0.00 0.35 11.29 0.94
50028000 PR 48 2002-2002 0.01 0.05 0.24 0.04
50055750 PR 58 1995-2004 0.00 0.35 11.29 0.94
50057000 PR 156 1994-1994 0.00 0.35 11.29 0.94
50055000 PR 233 1995-1996 0.00 0.35 11.29 0.94
50043800 PR 281 1995-1996 0.01 0.15 25.98 1.01
50059050 PR 541 1990-1999 0.00 0.35 11.29 0.94
05291000 SD 1,031 Agricultural Vegetation 1974-1980 0.00 0.04 2.93 0.15
06441500 SD 8,151 Shrubland & Grassland 1998-1998 0.01 0.19 2.47 0.32
06452000 SD 25,693 Shrubland & Grassland 2006-2007 0.00 0.27 3.53 0.40
03407876 TN 45 Forest & Woodland 1978-1979 0.00 0.04 0.68 0.08
07030137 TN 207 Agricultural Vegetation 1986-1986 0.00 0.04 2.00 0.14
03584500 TN 4,621 Agricultural Vegetation 1937-1937 0.22 2.75 32.07 4.39
08136500 TX 17,027 Shrubland & Grassland 1979-1980 0.00 0.08 19.08 0.78
08065000 TX 33,237 Agricultural Vegetation 1977-1979 0.31 2.85 29.11 4.67
08066500 TX 44,512 Forest & Woodland 1969-1970 0.27 5.79 38.49 8.56
08161000 TX 107,847 Agricultural Vegetation 1963-1972 0.09 1.88 37.37 2.56
09379500 UT 59,570 Forest & Woodland 1970-1979 0.06 1.71 27.82 1.82
09180500 UT 62,419 Forest & Woodland 1974-1983 0.88 5.48 48.51 5.99
09261000 UT 76,819 Forest & Woodland 1965-1973 0.65 3.65 17.72 2.67
09315000 UT 116,161 Forest & Woodland 1958-1967 0.34 3.88 30.39 4.17
01658500 VA 20 Developed & Other Human Use 2005-2005 0.00 0.01 0.21 0.02
01664000 VA 1,603 Agricultural Vegetation 1990-1991 0.01 0.52 7.27 0.65
02075500 VA 6,700 Agricultural Vegetation 1971-1980 0.36 2.66 50.84 3.13
02066000 VA 7,682 Forest & Woodland 1971-1980 0.23 2.91 56.53 3.81
13351000 WA 6,475 Agricultural Vegetation 1962-1962 0.01 0.32 4.78 0.53
054310157 WI 11 Agricultural Vegetation 1999-2008 0.00 0.00 0.08 0.01
05431016 WI 44 Agricultural Vegetation 1999-2008 0.00 0.01 0.66 0.03
05427948 WI 47 Agricultural Vegetation 2000-2009 0.00 0.01 0.45 0.02
05406500 WI 118 Agricultural Vegetation 1956-1964 0.01 0.02 0.56 0.02
05427718 WI 191 Agricultural Vegetation 2000-2009 0.01 0.02 1.03 0.04
05413500 WI 697 Agricultural Vegetation 1996-1999 0.06 0.15 2.29 0.12
05427965 WI 9 Agricultural Vegetation 2001-2010 0.00 0.01 0.45 0.02
03199000 WV 697 Forest & Woodland 1975-1977 0.01 0.31 7.04 0.47
03200500 WV 2,233 Forest & Woodland 1981-1983 0.04 0.82 11.23 1.10
06250000 WY 922 Semi-Desert 1950-1957 0.00 0.06 0.40 0.05
06253000 WY 1,083 Semi-Desert 1959-1968 0.01 0.13 0.70 0.09
06317000 WY 15,669 Semi-Desert 1975-1976 0.00 0.25 5.21 0.36
09217000 WY 36,260 Semi-Desert 1982-1991 0.16 1.44 11.47 1.70
06279500 WY 40,823 Semi-Desert 1954-1963 0.24 1.50 13.39 1.14
Appendix B 95% Confidence Intervals of Ratios
The six figures that follow (Figures B 1-6) show variations of 95% confidence intervals
of the ratios (estimated pollutant load / measured pollutant load) by different sampling
strategies and regression models of LOADIN (LD) and LOADEST (LT). The numbers
with LT indicate the LOADEST model number; for instance, LT(3) indicates LOADEST
model number 3. Each model in each graph has three 95% confidence intervals; the first
95% confidence interval is for the monthly (M) sampling strategy, the second 95%
confidence interval is for the biweekly sampling strategy (B), and the third 95%
confidence interval is for the weekly sampling strategy (W).
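A 95% confidence interval of the ratios could be computed along the following lines. This is a sketch under an assumption: it uses a normal approximation for the mean ratio, which may differ from the procedure actually used to produce Figures B 1-6. The function name is hypothetical.

```python
import statistics


def ratio_confidence_interval(estimated, measured, confidence=0.95):
    """Normal-approximation confidence interval for the mean estimated/measured ratio."""
    ratios = [e / m for e, m in zip(estimated, measured)]
    mean_ratio = statistics.mean(ratios)
    std_error = statistics.stdev(ratios) / len(ratios) ** 0.5
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean_ratio - z * std_error, mean_ratio + z * std_error


# Hypothetical annual loads (kg/year) for three stations, for illustration only.
low, high = ratio_confidence_interval([9.0, 10.0, 11.0], [10.0, 10.0, 10.0])
print(round(low, 3), round(high, 3))
```

An interval that is narrow and centered near 1 indicates estimates that are both precise and accurate for the given model and sampling strategy.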
Figure B 1 Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies
Figure B 2 Ratio Comparison of Sediment Estimation by Fixed Sampling Frequencies
Supplemented with Stratified Sampling
Figure B 3 Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies
Figure B 4 Ratio Comparison of Phosphorus Estimation by Fixed Sampling Frequencies
Supplemented with Stratified Sampling
Figure B 5 Ratio Comparison of Nitrogen Estimation by Fixed Sampling Frequencies
Figure B 6 Ratio Comparison of Nitrogen Estimation by Fixed Sampling Frequencies
Supplemented with Stratified Sampling
Appendix C Relationships between Flow, Concentration, Logarithm Flow, Squared
Logarithm Flow, and Logarithm Load
Figures C 1-7 demonstrate LOADEST model behavior by percentage of calibration data
in high flow (PCH; see Chapters 2 and 3). They are supplementary figures for Chapters
2 and 3, which indicate that an appropriate portion of water quality data from the high
flow regime is required and that the relationship between flow and concentration data
affects annual sediment load estimation.
The figures are scatter plots of (a) flow (cubic meters per second; cms) and sediment
concentration (mg/L), (b) logarithm flow and logarithm sediment load (kg), and (c)
squared logarithm flow and logarithm sediment load.
Daily flow and sediment concentration data were collected from USGS station 10336610,
and the annual measured load (sum of measured daily loads) was 491,387 kg/year. PCH of
the subsampled data was initially 0% (Figure C 1) and was increased up to 65% (Figure
C 7) by adding water quality data collected from the high flow regime. The flow and
sediment concentration data (Figures C 1-7 (a)) were used to run LOADEST model 1 (the
simplest model in LOADEST) and model 9 (the most complex model in LOADEST), and
estimated sediment loads tended to increase as PCH increased. Error was computed with
Eq. 3.6 from Chapter 3. LOADEST assumes that pollutant load is an exponential function
of explanatory variables such as logarithm flow and squared logarithm flow. Therefore,
logarithm flow (Figures C 1-7 (b)) and squared logarithm flow (Figures C 1-7 (c)) were
plotted against logarithm sediment load.
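The log-linear form behind LOADEST model 1, ln(load) = a0 + a1 ln(flow), can be sketched as below. This is an illustration rather than the LOADEST procedure: ordinary least squares is substituted for the calibration methods used by LOADEST, flow is not centered, no retransformation bias correction is applied, and the sample data and function names are hypothetical.

```python
import math


def fit_log_linear(flows, loads):
    """Ordinary least squares fit of ln(load) = a0 + a1 * ln(flow)."""
    x = [math.log(q) for q in flows]
    y = [math.log(load) for load in loads]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


def predict_load(a0, a1, flow):
    """Back-transform to load; no retransformation bias correction is applied."""
    return math.exp(a0 + a1 * math.log(flow))


# Hypothetical samples: flow (m3/s) and load (kg/day) following load = 50 * flow^1.5.
flows = [0.1, 0.5, 1.0, 2.0, 4.0]
loads = [50.0 * q ** 1.5 for q in flows]
a0, a1 = fit_log_linear(flows, loads)
print(round(math.exp(a0), 3), round(a1, 3))
```

Because the fit is made in logarithmic space, samples from the high flow regime extend the range of ln(flow) over which the slope is constrained, which is one way to read the dependence of the estimates on PCH.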
Figure C 1 Scatter plot of flow, concentration, and load with PCH of 0%
(a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load,
and (c) squared logarithm flow and logarithm sediment load
Number of water quality data: 12
Estimated load by LOADEST model 1: 364,360 kg/year (Error: -26%)
Estimated load by LOADEST model 9: 456,922 kg/year (Error: -7%)
Figure C 2 Scatter plot of flow, concentration, and load with PCH of 14%
(a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load,
and (c) squared logarithm flow and logarithm sediment load
Number of water quality data: 14
Estimated load by LOADEST model 1: 465,446 kg/year (Error: -5%)
Estimated load by LOADEST model 9: 520,523 kg/year (Error: 6%)
Figure C 3 Scatter plot of flow, concentration, and load with PCH of 25%
(a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load,
and (c) squared logarithm flow and logarithm sediment load
Number of water quality data: 16
Estimated load by LOADEST model 1: 509,446 kg/year (Error: 4%)
Estimated load by LOADEST model 9: 518,596 kg/year (Error: 6%)
Figure C 4 Scatter plot of flow, concentration, and load with PCH of 37%
(a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load,
and (c) squared logarithm flow and logarithm sediment load
Number of water quality data: 19
Estimated load by LOADEST model 1: 522,808 kg/year (Error: 6%)
Estimated load by LOADEST model 9: 498,494 kg/year (Error: 1%)
Figure C 5 Scatter plot of flow, concentration, and load with PCH of 46%
(a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load,
and (c) squared logarithm flow and logarithm sediment load
Number of water quality data: 22
Estimated load by LOADEST model 1: 544,870 kg/year (Error: 11%)
Estimated load by LOADEST model 9: 509,304 kg/year (Error: 4%)
Figure C 6 Scatter plot of flow, concentration, and load with PCH of 56%
(a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load,
and (c) squared logarithm flow and logarithm sediment load
Number of water quality data: 27
Estimated load by LOADEST model 1: 573,499 kg/year (Error: 17%)
Estimated load by LOADEST model 9: 516,050 kg/year (Error: 5%)
Figure C 7 Scatter plot of flow, concentration, and load with PCH of 65%
(a) flow and sediment concentration data, (b) logarithm flow and logarithm sediment load,
and (c) squared logarithm flow and logarithm sediment load
Number of water quality data: 34
Estimated load by LOADEST model 1: 563,115 kg/year (Error: 15%)
Estimated load by LOADEST model 9: 500,106 kg/year (Error: 2%)
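The errors reported with Figures C 1-7 can be checked with a short script. The sketch assumes that Eq. 3.6 is the relative error of the estimated load with respect to the measured load, expressed in percent; the equation itself is not reproduced in this appendix, so this form is an assumption. The loads are those listed with the figures.

```python
MEASURED_LOAD = 491_387  # kg/year, annual measured load reported in Appendix C

# (PCH in %, model 1 estimate, model 9 estimate) in kg/year, from Figures C 1-7.
ESTIMATES = [
    (0, 364_360, 456_922),
    (14, 465_446, 520_523),
    (25, 509_446, 518_596),
    (37, 522_808, 498_494),
    (46, 544_870, 509_304),
    (56, 573_499, 516_050),
    (65, 563_115, 500_106),
]


def percent_error(estimated, measured):
    """Relative error of the estimate, in percent of the measured load (assumed Eq. 3.6)."""
    return (estimated - measured) / measured * 100.0


for pch, model1, model9 in ESTIMATES:
    e1 = percent_error(model1, MEASURED_LOAD)
    e9 = percent_error(model9, MEASURED_LOAD)
    print(f"PCH {pch:2d}%: model 1 {e1:+.0f}%, model 9 {e9:+.0f}%")
```

Under this assumed form, the rounded values agree with the errors listed with each figure.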
VITA
Youn Shik Park was born in Wonju-si, South Korea. He received his Bachelor of
Engineering degree in Agricultural Engineering from Kangwon National University,
South Korea, in 2007. He graduated with a Master of Engineering in Agricultural
Engineering from Kangwon National University, South Korea, in 2009. He joined the
Ph.D. program of the Agricultural and Biological Engineering Department at Purdue
University in August 2010.