ABSTRACT
Many of the processes that affect dissolved oxygen concentrations in the Tualatin River (solubility, sediment oxygen demand, photosynthesis, respiration, biochemical oxygen demand, and reaeration) are controlled to some extent by physical and meteorological factors such as streamflow, air temperature, and solar radiation. To test the extent of that control, an artificial neural network model was constructed to predict dissolved oxygen concentrations in the Tualatin River at the Oswego Dam using only air temperature, solar radiation, and streamflow as inputs. Dissolved oxygen concentrations have been measured hourly at the Oswego Dam since 1991; the available dataset spans more than 10 years.
Feedforward neural networks, the most widely used type of neural network, were applied to this dataset. Data were segregated into calibration, verification, and test subsets. Two neural network models were constructed in series: the first model simulated daily mean dissolved oxygen concentrations, while the second superimposed the daily periodic signals. The final calibrated neural network models predicted the dissolved oxygen concentration with acceptable accuracy, producing high correlations between measured and predicted values (r = 0.83, mean absolute error < 0.9 mg/L).
By some measures, neural network model performance was better than that of a calibrated, mechanistic model of dissolved oxygen in the Tualatin River. As expected, however, dissolved oxygen concentrations affected by factors other than the physical and meteorological factors used as model inputs, such as large point-source ammonia releases, were not predicted well by the neural network model. Nevertheless, the neural network model demonstrated potential for use as a river management and forecasting tool to predict the effects of flow augmentation and near-term weather conditions on Tualatin River dissolved oxygen concentrations.
FACTORS AFFECTING DISSOLVED OXYGEN
Dissolved oxygen concentrations in the Tualatin River (fig. 1) are affected by many physical factors and biological processes:
• Solubility
• Residence time
• Reaeration
• Algal respiration
• Photosynthesis
• Oxygen consuming reactions (BOD, SOD)
In addition, physical and meteorological factors such as temperature and residence time influence the effects of the biological processes.
Photosynthesis and respiration affect DO only when sufficient light energy is available and when streamflow is low enough (< 8.5 m³/s) to allow sufficient time for the phytoplankton to grow while in the backwater reach (fig. 2).
DEVELOPMENT OF A NEURAL NETWORK MODEL FOR DISSOLVED OXYGEN IN THE TUALATIN RIVER, OREGON
BY STEWART A. ROUNDS, U.S. GEOLOGICAL SURVEY, OREGON WATER SCIENCE CENTER, PORTLAND, OREGON
The Tualatin River at Oswego Dam, river mile (RM) 3.4.
[Figure 3 diagram: input layer (Inputs #1 to #4), hidden layer, and output layer (Output).]
Figure 1. Map of Tualatin River Basin.
CONCLUSIONS
Artificial neural network models were developed to simulate daily mean and hourly DO concentrations in the Tualatin River at the Oswego Dam. The DO at that site is affected by its solubility as well as by biological processes such as algal photosynthesis and respiration, sediment oxygen demand, biochemical oxygen demand, and ammonia nitrification. The effects of these biological processes on DO, however, are constrained by physical and meteorological factors such as streamflow, air temperature, and solar radiation. Neural network and regression models were built to predict DO based on these factors, using data from May-October of 1991-2000.
• Multiple linear regression models failed to capture the long-term patterns in the DO data, producing poor results.
• Neural network models were successful in predicting patterns in the DO data on daily, weekly, and seasonal time scales. Separate models were used to simulate the low- and high-frequency patterns in the data.
• ANN model performance was good, with mean absolute errors less than 0.9 mg/L. Approximately 70% of the variation in the DO data was captured by the final ANN model.
• ANN predictions often were better than those from a USGS process-based model of the Tualatin River (not shown). As applied to the Tualatin River, however, ANN and process-based models have different purposes. The process-based model is most useful for providing insight into how the river works, identifying important processes, and testing the effects of point sources and management strategies. The ANN model has strong potential as a forecasting tool, but yields less insight into the specifics of riverine processes.
Future work will focus on incorporating these and other ANN models into real-time water-quality forecasting tools. Such tools will provide important information to river managers, particularly as they make decisions regarding the proper level of flow augmentation.
[Figure 8 plot: dissolved oxygen (mg/L), measured and simulated, May to October 1995.]
[Figure 7 plot: daily mean dissolved oxygen concentration (mg/L), measured and simulated, May to October, one panel per year, 1991 to 2000.]
[Figure 5 plot: correlation coefficient (r) versus time lag (0 to 30 days) for solar radiation, streamflow at West Linn, air temperature, and rainfall.]
[Figure 4 plot: signal magnitude versus frequency (0 to 3.5 per day) for dissolved oxygen, streamflow, air temperature, solar radiation, and rainfall.]
The Tualatin River’s reservoir-like reach at Stafford, RM 5.5.
Lee Falls on the Tualatin River in the Coast Range Mountains.
Tualatin River at Farmington Bridge, RM 33.3.
OBJECTIVES AND APPROACH
The purpose of this study was to determine the extent to which the DO concentration in the Tualatin River at the Oswego Dam could be predicted solely from physical and meteorological measurements such as streamflow, air temperature, solar radiation, and rainfall, using multiple linear regression and artificial neural network modeling techniques. If successful, these models would be used to create a real-time DO forecasting tool.
ARTIFICIAL NEURAL NETWORKS
An artificial neural network (ANN) is a mathematical structure designed to mimic the information processing functions of a network of neurons in the brain. ANNs are highly parallel systems that process information through many interconnected units that respond to inputs through modifiable weights, thresholds, and mathematical transfer functions. Each unit processes the pattern of activity it receives from other units, then broadcasts its response to still other units.
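A single unit and a three-layer forward pass can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study. The logistic (sigmoid) transfer function for hidden units, the linear output unit, and the names `sigmoid`, `unit_response`, and `feedforward` are hypothetical choices. The weights in the usage lines are arbitrary placeholders.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic transfer function (an assumed, commonly used choice)."""
    return 1.0 / (1.0 + math.exp(-z))


def unit_response(inputs, weights, threshold, transfer=sigmoid):
    """One processing unit: weight the incoming activity, subtract the
    unit's threshold, and pass the result through a transfer function."""
    activation = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return transfer(activation)


def feedforward(inputs, hidden_layer, output_unit):
    """Three-layer feedforward pass.

    hidden_layer: list of (weights, threshold) pairs, one per hidden unit.
    output_unit: a single (weights, threshold) pair; a linear (identity)
    transfer function is assumed for the output unit here."""
    hidden = [unit_response(inputs, w, t) for w, t in hidden_layer]
    w_out, t_out = output_unit
    return unit_response(hidden, w_out, t_out, transfer=lambda z: z)


# Four inputs and five hidden units, as in the Figure 3 diagram; values are arbitrary.
x = [0.2, -1.0, 0.5, 1.3]
hidden = [([0.1, -0.2, 0.3, 0.0], 0.05)] * 5
out = ([0.4, 0.4, 0.4, 0.4, 0.4], 0.1)
print(feedforward(x, hidden, out))
```

Each hidden unit "broadcasts" its response only to the output unit in this layout. That is the feedforward restriction: information never flows backward during prediction.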
ANNs are particularly well suited for:
• large datasets
• complex, nonlinear relations
• pattern recognition
ANNs are able to find and identify complex patterns in datasets that may not be well described by a set of known processes or simple mathematical formulae. They are not constrained by any preconceived algorithms or relations among inputs.
Training an ANN is a mathematical exercise that optimizes all of the ANN’s weights and threshold values, using some fraction of the available data. Optimization routines can be used to determine the ideal number of units in the hidden layer and the nature of their transfer functions. ANNs “learn” by example; as long as the input dataset contains a wide range of the types of patterns that the ANN will be asked to predict, the model is likely to find those patterns and successfully use them in its predictions.
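The sketch below illustrates what "training" means. It adjusts every weight and threshold of a one-hidden-layer network by stochastic gradient descent (backpropagation) on a toy nonlinear dataset. It then checks the fit on points held out of training.

The poster does not name its optimization routine. Gradient descent, the learning rate, the number of passes (epochs), the toy target y = x², and the calibration/verification split are therefore all assumptions made for illustration. Thresholds appear here as biases; a bias is a negated threshold.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, W1, b1, w2, b2):
    """Forward pass: sigmoid hidden layer, linear output. Returns (output, hidden)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * hj for w, hj in zip(w2, h)) + b2, h


def mse(data, params):
    """Mean squared error of the network over (inputs, target) pairs."""
    return sum((predict(x, *params)[0] - y) ** 2 for x, y in data) / len(data)


def train(data, n_hidden=5, lr=0.05, epochs=200, seed=0):
    """Adjust all weights and biases by stochastic gradient descent on squared
    error (assumed optimizer; the study's routine is not stated)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            y_hat, h = predict(x, W1, b1, w2, b2)
            err = y_hat - y  # derivative of 0.5 * err**2 with respect to y_hat
            for j in range(n_hidden):
                # Error signal reaching hidden unit j (computed before w2[j] changes).
                delta = err * w2[j] * h[j] * (1.0 - h[j])
                w2[j] -= lr * err * h[j]
                for k in range(n_in):
                    W1[j][k] -= lr * delta * x[k]
                b1[j] -= lr * delta
            b2 -= lr * err
    return W1, b1, w2, b2


# Toy data: a nonlinear target on 41 evenly spaced points (hypothetical, not DO data).
xs = [i / 20 for i in range(-20, 21)]
data = [([x], x * x) for x in xs]
calibration = [d for i, d in enumerate(data) if i % 3 != 0]   # used to fit weights
verification = [d for i, d in enumerate(data) if i % 3 == 0]  # held out for checking

params = train(calibration)
print("calibration MSE:", round(mse(calibration, params), 4))
print("verification MSE:", round(mse(verification, params), 4))
```

The verification error is the check on "learning by example." If it stays close to the calibration error, the network has generalized the pattern rather than memorizing the calibration points.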
Figure 8. Measured and simulated hourly DO concentrations for the summer of 1995 in the Tualatin River at Oswego Dam. Simulated values were calculated by the final hourly ANN model (9 inputs, 1 hidden layer with 10 processing units, 1 output).
Figure 5. Correlations and time lags between low-pass filtered DO and other low-pass filtered inputs.
Figure 4. Typical power spectra for DO, streamflow, air temperature, solar radiation, and rainfall data.
Figure 7. Measured and simulated daily-mean (low-pass) DO concentrations for the Tualatin River at Oswego Dam. Simulated values were calculated by the low-frequency ANN model (8 inputs, 1 hidden layer with 7 processing units, 1 output).
Figure 3. A representation of a 3-layer feedforward artificial neural network with four inputs, five hidden nodes, and one output.
LOW-FREQUENCY (DAILY MEAN) MODELS
Long- and short-term patterns were simulated with separate models (fig. 6). After optimization, the low-frequency ANN model required only 8 inputs:
• low-pass (lp) filtered data: lp-Q, lp-S, lp-AT
• low-pass filtered data from 12 days ago: lp-Q-12, lp-AT-12
• low-pass filtered and lagged data: lp-S-lag (1.75 days)
• day-of-year, year
where Q, S, and AT stand for streamflow, solar radiation, and air temperature. Time-lagged inputs were calculated as differences.
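The sketch below shows one way the time-lagged inputs could be built from hourly, low-pass filtered series. The poster says only that time-lagged inputs "were calculated as differences." Reading that as the change between the current value and the value one lag earlier is an assumption. The helper name `lagged_difference`, the hourly time step, and the example series are also assumptions. With hourly data, the 12-day lag is 288 steps and the 1.75-day lag is 42 steps.

```python
HOURS_PER_DAY = 24


def lagged_difference(series, lag_days, steps_per_day=HOURS_PER_DAY):
    """Return series[t] - series[t - lag] for every t, where lag is lag_days
    converted to time steps. Entries without an earlier value are None.

    Assumption: "calculated as differences" is read as the change from the
    value one lag earlier; the poster does not spell this out."""
    lag = round(lag_days * steps_per_day)
    return [None if t < lag else series[t] - series[t - lag]
            for t in range(len(series))]


# Hypothetical hourly low-pass filtered series (30 days), for illustration only.
lp_q = [20.0 - 0.01 * t for t in range(30 * HOURS_PER_DAY)]
lp_s = [250.0 + 0.5 * t for t in range(30 * HOURS_PER_DAY)]

lp_q_12 = lagged_difference(lp_q, 12)      # 12-day lag = 288 hourly steps
lp_s_lag = lagged_difference(lp_s, 1.75)   # 1.75-day lag = 42 steps (same reading assumed)
print(lp_q_12[288], lp_s_lag[42])
```

Differencing removes much of the shared slow trend between a series and its lagged copy. This is one plausible reason for preferring differences over raw lagged values when inputs should be as independent as possible.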
MULTIPLE LINEAR REGRESSION
Multiple linear regression is a special case of an ANN model, one that uses linear transfer functions and no hidden layers. Patterns in the data, however, were highly nonlinear, and the regression did not perform well (table 1).
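The relation can be made concrete with a minimal sketch of multiple linear regression fitted by ordinary least squares. Its prediction step is exactly one output unit with a linear transfer function and no hidden layer. The function names are hypothetical. The small Gauss-Jordan solver is used only to keep the example dependency-free; a library routine such as NumPy's `numpy.linalg.lstsq` would normally be preferred.

```python
def fit_linear(X, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + ... + bp*xp.

    Solves the normal equations (A^T A) beta = A^T y by Gauss-Jordan
    elimination with partial pivoting; A is X with a column of ones prepended."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, p = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict_linear(beta, x):
    """Same computation as one output unit with a linear transfer function
    and no hidden layer: intercept plus weighted sum of the inputs."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


# A curved (quadratic) relation defeats the linear model: on a symmetric
# range the fitted slope is ~0, so the curvature is missed entirely.
xs = [i / 10 for i in range(-10, 11)]
quad = fit_linear([[x] for x in xs], [x * x for x in xs])
print("intercept, slope:", quad)
```

The quadratic example is a deliberately simple stand-in for the nonlinear DO patterns. No choice of linear weights can bend the fitted line, whereas a hidden layer of nonlinear units can.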
ARTIFICIAL NEURAL NETWORK
Optimization yielded an ANN with one hidden layer containing seven nodes. ANN predictions were markedly better than those of the linear model and in many cases better than those of a USGS process-based model, with a mean absolute error of only 0.83 mg/L and a correlation coefficient of 0.837 (fig. 7, table 1).
The ANN model captured the most important patterns in the data, producing remarkable fits to the measured DO considering that the predictions were based only on streamflow, air temperature, solar radiation, year, and day-of-year. The most important predictor variables, in decreasing order of importance, were lp-Q, day-of-year, lp-S, and lp-S-lag.
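The size of an optimized network can be expressed as its number of adjustable parameters. This sketch assumes full connectivity between adjacent layers and one threshold per hidden and output unit; the poster does not state this explicitly. Under that assumption, the 8:7:1 low-frequency model has 8·7 + 7 + 7·1 + 1 = 71 parameters, small relative to the 40,388 data points listed in Table 1. The function name is hypothetical.

```python
def count_parameters(layers):
    """Adjustable weights plus thresholds in a fully connected feedforward
    network, given layer sizes such as (inputs, hidden, outputs)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


low_frequency = count_parameters((8, 7, 1))    # 8*7 + 7 + 7*1 + 1
high_frequency = count_parameters((9, 10, 1))  # 9*10 + 10 + 10*1 + 1
print(low_frequency, high_frequency)           # → 71 111
```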
FINAL HOURLY MODEL
High-frequency signals in the data were separated from low-frequency signals by subtracting the low-pass filtered data from the original data. High-pass AT and S inputs were included at several time lags to capture their 12- and 24-hour signals; the streamflow data had no useful high-frequency signals (fig. 4).
Final training and optimization yielded a high-frequency ANN model with nine inputs:
• output from the low-frequency ANN,
• high-pass (hp) filtered AT and S (3 time lags each),
• day-of-year, and year.
The high-frequency ANN used one hidden layer with 10 nodes.
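The high-pass step and the assembly of the nine high-frequency inputs (1 + 3 + 3 + 2) can be sketched as follows. The specific lags (0, 6, and 12 hours here) are hypothetical, since the poster states only that several lags were used to capture the 12- and 24-hour signals. The function names are also illustrative.

```python
def high_pass(series, low_passed):
    """High-frequency component: original data minus its low-pass filtered version."""
    return [x - lp for x, lp in zip(series, low_passed)]


def high_frequency_inputs(t, lf_do, hp_at, hp_s, day_of_year, year, lags=(0, 6, 12)):
    """Assemble the nine inputs for hour t: the low-frequency ANN output,
    hp-AT and hp-S at three lags each, day-of-year, and year.
    The lags (in hours) are hypothetical placeholders."""
    if t < max(lags):
        raise ValueError("not enough history for the requested lags")
    x = [lf_do[t]]
    x += [hp_at[t - lag] for lag in lags]
    x += [hp_s[t - lag] for lag in lags]
    x += [day_of_year[t], year[t]]
    return x


air_temp = [18.0, 19.5, 21.0]       # hypothetical hourly values, deg C
air_temp_lp = [19.0, 19.0, 19.0]    # their hypothetical low-pass version
print(high_pass(air_temp, air_temp_lp))  # → [-1.0, 0.5, 2.0]
```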
The final model captured both the long-term and daily patterns in the measured DO data, producing a mean absolute error of 0.86 mg/L and a correlation coefficient of 0.831 (table 1).
Figure 8 illustrates the typical daily variations that the model produced in the final hourly DO. These predictions are accurate enough to be useful and can form the basis for a real-time DO forecasting tool.
[Figure 2 plot: solar radiation (W/m²), streamflow (m³/s), and chlorophyll-a (µg/L), May to October 1993.]
Figure 2. Favorable streamflow and light conditions are necessary before sizable algal blooms can occur in the Tualatin River. Shaded periods are unfavorable for algal growth due to high flow (red) or low light (blue) conditions.
DATA PREPARATION AND DECORRELATION
To maximize the signals in the input data that will help to predict the output, it is critical to examine the data for periodicity, cross-correlations, and important time lags.
PERIODICITY
Each parameter’s data were analyzed by Fourier transform to determine the presence of periodic signals. Solar radiation, air temperature, and DO all had strong periodic signals at daily time scales; periods of 24 and 12 hours characterized the most important signals. Streamflow appeared to have useful signals at time scales longer than a day or two, but only weak patterns at shorter time scales. Figure 4 illustrates typical power spectra from these data.
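The periodicity check can be illustrated with a direct discrete Fourier transform of a synthetic hourly series. The series has a 24-hour cycle and a weaker 12-hour cycle, so its power spectrum should peak at 1 and 2 cycles per day. The poster does not describe its Fourier transform implementation. This O(n²) transform is for clarity only; a fast Fourier transform (for example `numpy.fft.rfft`) would be used on a 10-year hourly record. The synthetic data are invented for illustration.

```python
import cmath
import math


def power_spectrum(series, samples_per_day=24):
    """Return (frequency in cycles/day, power) pairs from a direct DFT,
    skipping the zero-frequency (mean) term."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freq = k * samples_per_day / n  # cycles per day
        spectrum.append((freq, abs(coeff) ** 2 / n))
    return spectrum


# Synthetic hourly "air temperature" for 10 days: 24-hour plus 12-hour cycles.
hours = range(24 * 10)
temp = [15 + 5 * math.sin(2 * math.pi * h / 24) + 2 * math.sin(2 * math.pi * h / 12)
        for h in hours]
peaks = sorted(power_spectrum(temp), key=lambda fp: fp[1], reverse=True)[:2]
print([round(f, 2) for f, _ in peaks])  # → [1.0, 2.0]
```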
Strong signals at daily time scales can obscure important correlations and time lags in the data; therefore, the short and long time scale signals in the data were separated. A low-pass filter was used to remove the 24-hour and shorter periodic signals from each time series, preserving any periodic signals at time scales longer than one day.
Long-term patterns and short-term periodicity in the data were simulated with separate models.
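The poster does not specify which low-pass filter was used. As one simple stand-in, a 24-hour moving average removes the 24-hour and 12-hour periodic signals from hourly data exactly while preserving slower variations. This is an assumption, not necessarily the filter used in the study. Subtracting its output from the original series would give the corresponding high-frequency component.

```python
import math


def moving_average_24h(series, window=24):
    """Trailing moving average over `window` hourly values (default one day).

    Averaging over exactly 24 hours cancels sinusoids whose period divides
    24 hours (24 h, 12 h, 8 h, ...) and passes slower variations. Positions
    without a full window of history are returned as None."""
    out = [None] * len(series)
    if len(series) < window:
        return out
    running = sum(series[:window])
    out[window - 1] = running / window
    for t in range(window, len(series)):
        running += series[t] - series[t - window]  # slide the window one hour
        out[t] = running / window
    return out


# Hypothetical hourly DO: 8 mg/L baseline plus 24-hour and 12-hour cycles.
do_hourly = [8 + 2 * math.sin(2 * math.pi * h / 24) + 0.5 * math.sin(2 * math.pi * h / 12)
             for h in range(72)]
do_low_pass = moving_average_24h(do_hourly)
print(round(do_low_pass[40], 6))  # → 8.0
```

A trailing window delays slow signals by about 11.5 hours. A centered window, or a zero-phase digital filter, avoids that shift when the filtered series is used to find time lags.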
CROSS-CORRELATIONS AND TIME LAGS
Multiple linear regression and ANN techniques work best with independent inputs. To test for interdependence, the data were correlated against one another using linear regression techniques with an imposed time lag (fig. 5).
DO has its highest correlation with the solar radiation (insolation) rate that occurred about 2 days earlier. That time lag has a physical basis: the available solar energy affects the amount of DO produced by photosynthesis, and the effects of very sunny or very cloudy days on algal growth are not immediate.
Many of the DO cross-correlations are minimized at a time lag on the order of 12 days, which is the typical summer residence time in the backwater reach of the Tualatin River.
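A lagged cross-correlation of this kind can be sketched as follows. The driver series is shifted by each candidate lag and correlated (Pearson r) with the response, and the lag with the strongest correlation is reported. The function names and the synthetic two-day lag are hypothetical. With the study's data, the inputs would be the low-pass filtered series.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(driver, response, max_lag):
    """r between driver(t - lag) and response(t) for lag = 0 .. max_lag steps."""
    return {lag: pearson_r(driver[:len(driver) - lag], response[lag:])
            for lag in range(max_lag + 1)}


# Hypothetical daily series: the response repeats the driver two days later.
driver = [math.sin(0.3 * d) + 0.5 * math.cos(1.1 * d) for d in range(120)]
response = [0.0, 0.0] + driver[:-2]
corr = lagged_correlation(driver, response, max_lag=10)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 3))  # → 2 1.0
```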
Table 1. Goodness-of-fit statistics for models predicting DO at the Oswego Dam.

Model type | Time scale | Number of data points | Mean absolute error (mg/L) | Root mean square error (mg/L) | Correlation coefficient (r)
Multiple linear regression | low-frequency | 40,388 | 1.29 | 1.69 | 0.589
Artificial neural network (ANN) | low-frequency | 40,388 | 0.83 | 1.14 | 0.837
Artificial neural network (ANN) | final hourly | 40,372 | 0.86 | 1.21 | 0.831
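The three statistics in Table 1 can be computed as below. The standard definitions are used here: mean absolute error, root mean square error, and the Pearson correlation coefficient between measured and simulated values. That they match the table's exact conventions is an assumption. Squaring r = 0.831 gives about 0.69, which is consistent with the "approximately 70%" of variation reported in the Conclusions. The measured and simulated values in the usage lines are hypothetical.

```python
import math


def goodness_of_fit(measured, simulated):
    """Mean absolute error, root mean square error, and Pearson r."""
    n = len(measured)
    errors = [s - m for m, s in zip(measured, simulated)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mm, ms = sum(measured) / n, sum(simulated) / n
    cov = sum((m - mm) * (s - ms) for m, s in zip(measured, simulated))
    r = cov / math.sqrt(sum((m - mm) ** 2 for m in measured)
                        * sum((s - ms) ** 2 for s in simulated))
    return {"mae": mae, "rmse": rmse, "r": r}


measured = [7.9, 8.4, 9.1, 10.2]   # hypothetical values, mg/L
simulated = [8.1, 8.2, 9.6, 9.8]
print(goodness_of_fit(measured, simulated))
print(round(0.831 ** 2, 3))  # → 0.691, share of DO variance explained if r = 0.831
```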
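[Figure 6 flow chart; network sizes are given as inputs:hidden nodes:outputs. Step 1: Low-Frequency ANN (8:7:1) takes low-frequency filtered and lagged Q, AT, and S inputs, day-of-year, and year, and outputs Daily Mean DO. Step 2: High-Frequency ANN (9:10:1) takes the daily mean DO plus high-frequency AT and S signals, day-of-year, and year, and outputs Final Hourly DO.]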
[Figure 1 map: Tualatin River basin, Oregon, in the Coast Range, Tualatin Mountains, Parrett Mountain, and Chehalem Mountains. It shows the Tualatin River (river miles 0 to 70) and tributaries including Scoggin, Gales, Dairy, McKay, Rock, Beaverton, and Fanno Creeks, plus Henry Hagg Lake, the Oswego Canal, and communities from Forest Grove and Hillsboro to Tigard, Tualatin, Lake Oswego, West Linn, and Portland. Site: Tualatin River at Oswego Dam, river mile 3.4 (station 14207200). Base modified from U.S. Geological Survey 1:100,000 topographic quadrangles, 1978–84; designated urban growth area from Metro, 1998.]
Figure 6. The two-step ANN model flow chart. Low-frequency inputs carry information at time scales longer than one day; high-frequency inputs carry strong daily signals.