
Panel regression techniques for identifying impacts of anthropogenic landscape change on hydrologic response

Scott Steinschneider,1 Yi-Chen E. Yang,1 and Casey Brown1

Received 12 March 2013; revised 31 October 2013; accepted 1 November 2013; published 3 December 2013.

[1] Statistical models relating anthropogenic modifications of the watershed landscape to alterations in streamflow are often plagued by heterogeneity in background watershed conditions and coinciding trends in climate that can complicate the interpretation of clear statistical relationships. This study introduces the use of panel regression as a modeling approach that can better accommodate basin heterogeneity and identify more robust signals between anthropogenic impacts on the watershed landscape and hydrologic response. Panel regression techniques pool multidimensional data recorded across individuals (i.e., watersheds) and through time to better characterize within- and between-variability in the hydrologic data set. The separate attribution of variability to both time and space dimensions enables the method to better identify response characteristics that are generalizable across watersheds. This study introduces in detail two broad classes of panel regression models (the fixed effects and random effects models) that may be particularly useful in the context of identifying and quantifying the impact of human-induced landscape alterations on hydrologic response. The technique is presented in a case study relating watershed urbanization to changes in the annual runoff coefficient for 19 watersheds in the Northeast United States. Comparisons are made against more standard cross-sectional and time-based regression analyses. The results show that the estimated relationship between urbanization and the annual runoff coefficient in this region is highly dependent on the dimensions of data examined (space or time) and model structure selected, with the most sophisticated and appropriate model suggesting that no significant relationship can be detected.

Citation: Steinschneider, S., Y.-C. E. Yang, and C. Brown (2013), Panel regression techniques for identifying impacts of anthropogenic landscape change on hydrologic response, Water Resour. Res., 49, 7874–7886, doi:10.1002/2013WR013818.

1. Introduction

[2] There is a growing need to improve our understanding of and ability to predict the effects of human activities on the hydrologic cycle [Vorosmarty et al., 2000; Wagener et al., 2010; Vogel, 2011]. One of the most influential of these human activities is land use/land cover change at the watershed scale [DeFries and Eshleman, 2004]. Despite a wide array of past research showing various impacts of landscape change on hydrologic response, significant challenges still face a comprehensive understanding of these impacts. In particular, methods for identifying and predicting landscape change signals in hydrologic records are often complicated by significant heterogeneity among individual watersheds and climate variability that can mask signals over time. This study presents the technique of panel regression as one approach that may account for some of these statistical challenges. The approach pools multidimensional data (i.e., data recorded through both space and time) in a single regression framework that can distinguish "between-variability" (associated with the space dimension across basins) from "within-variability" (associated with the time dimension for each basin). This enables the regression to better identify robust relationships between hydrologic response and watershed characteristics as they evolve over time across multiple catchments.

[3] It is well recognized that watershed landscape changes can influence many characteristics of the hydrologic response of a basin. In some instances, the hydrologic response to landscape change is well understood, such as with peak flow responses to urbanization. Here, theory suggests that greater amounts of impervious cover, compacted soils, storm drain networks, and river channelization serve to accelerate storm water through the stream channel, increasing the magnitude of flood peaks and quickening flow recession afterward [Leopold, 1968]. This theory is well supported with empirical data observed across many watersheds [Rose and Peters, 2001; Cheng and Wang, 2002; Burns et al., 2005; Konrad and Booth, 2005; Chang, 2007], with few exceptions [Kjeldsen, 2010]. Other shifts in hydrologic response due to land cover change are more difficult to identify. Continuing with the example of urbanization, different studies in the literature have found contradictory influences of urban land cover on annual flows, base flow, and extreme low-flow statistics. While some work has suggested that increased urban land use decreases these flow responses [Leopold, 1968; Rose and Peters, 2001; Chang, 2007; Simmons and Reynolds, 2007], other studies have found the opposite signal [Appleyard et al., 1999; Burns et al., 2005; Meyer, 2005]. Still others have found no relationship between certain flow responses and urbanization and attribute this lack of signal to offsetting influences from various factors (e.g., decreased infiltration versus reduced evapotranspiration) or insufficient impacts [Ferguson and Suckling, 1990; Rose and Peters, 2001; Burns et al., 2005]. Price [2011] presents a comprehensive synthesis of these studies and others relating land use change impacts to base flow hydrology.

Additional supporting information may be found in the online version of this article.

1Department of Civil and Environmental Engineering, University of Massachusetts Amherst, Amherst, Massachusetts, USA.

Corresponding author: S. Steinschneider, Department of Civil and Environmental Engineering, University of Massachusetts Amherst, 130 Natural Resources Rd., Amherst, MA 01002, USA. (scottsteinschneider@gmail.com)

©2013. American Geophysical Union. All Rights Reserved. 0043-1397/13/10.1002/2013WR013818

WATER RESOURCES RESEARCH, VOL. 49, 7874–7886, doi:10.1002/2013WR013818, 2013

[4] Many of the conflicting results can be in part attributed to the difficulty of isolating a signal between land use change and streamflow from background hydrologic noise. This noise can stem from natural or human-induced structural differences between watersheds or trends in climate that coincide with trends in land use [Jacobson, 2011; Price, 2011; Price et al., 2011]. Streamflow responses between even nearby catchments can differ significantly due to unobservable (or poorly observed) factors such as subsurface geology, macroflow paths, highly variable soil composition, infrastructure systems, and water resource management strategies. Furthermore, trends in climate that coincide with trends in land use can make it difficult to determine which exogenous change is impacting hydrologic response. These two challenges plague the common statistical methods used in the literature to examine land use change effects on streamflow, of which there are two primary types.

[5] The first common statistical approach is to use cross-sectional regression analysis [Stedinger and Tasker, 1985; Burns et al., 2005; Price et al., 2011]. Here, streamflow statistics are regressed against land use characteristics across many watersheds for one segment of time over which the land use characteristics are relatively static. These studies can make use of a large number of watersheds with wide variations in land use characteristics to compare differences in streamflow behavior under different land cover regimes. Unfortunately, interpreting statistical land use-streamflow relationships across watersheds can be severely complicated by structural differences, such as those mentioned previously, that may exist but for which no data are available. With substantial heterogeneity across watersheds that is difficult to measure, cross-sectional regression analyses may be unable to identify significant signals between land use and hydrologic response, or alternatively they may mistakenly attribute differences in hydrologic response between watersheds to land use differences when other, unobservable factors are in fact driving the discrepancies.

[6] The second approach often used is to employ time series analyses on individual watersheds that are known to have undergone some level of land use change over the analysis period [Rose and Peters, 2001; Meyer, 2005; Chang, 2007]. The analyst tries to determine whether land use change trends in each watershed can be used to explain any trends in flow statistics for that watershed. This approach has the benefit of controlling for static watershed characteristics, because the analysis is done on a single watershed. However, climate variability, which often dominates streamflow, can mask the detection of any relationships between land use and flow [Chang, 2007]. By only examining one watershed at a time, it is difficult to separate out the effects of trending climate and land use variables on streamflow. This is the classic multicollinearity problem. In a regression context, multicollinearity does not reduce the reliability of the fitted model, but it does reduce the ability to isolate the effect of an independent variable (i.e., land use/land cover) on the response variable (i.e., streamflow).

[7] This study suggests the use of panel regression to circumvent some of the challenges faced by the two statistical approaches above. Panel regression is a statistical technique that pools multidimensional data recorded across individuals (i.e., watersheds) and through time to identify response characteristics unique to each individual and those common across individuals. The panel regression procedure essentially draws on the strengths of the two common approaches mentioned above and merges them into a single regression framework. There are three primary benefits to the panel regression approach for hydrologic studies, including (1) the provision of a regression framework in which all data across catchments and through time can be considered simultaneously, thus increasing the degrees of freedom in the model and improving the efficiency of parameter estimation, (2) a means to handle unobservable heterogeneity between watersheds, and (3) the ability to examine time series trends across several watersheds simultaneously, which will help distinguish between land use and climate variability signals. Panel regression is widely applied for these purposes in econometric analyses [Brown et al., 2011], but has not been previously applied in hydrologic assessments. The objectives of this study are to introduce the theory of panel regression in the context of hydrologic applications and demonstrate how the benefits of the method can help identify more robust and generalized relationships between landscape alterations and hydrologic response. The method is illustrated in a case study examining the influence of urbanization on the annual runoff coefficient for 19 watersheds in the Northeast U.S. over the period from 1977 to 2007.

2. Panel Regression in Hydrologic Applications

2.1. Model Overview

[8] A panel regression model linking hydrologic response to land use and other predictors is described here. Only balanced panels (equal-record lengths at all sites) are considered, but methods for unbalanced panels are available to handle omitted or missing data [Baltagi and Chang, 1994]. Assume that streamflow data $Q_{i,t}$ are available at N watershed sites and T time steps, with i = 1, 2, ..., N and t = 1, 2, ..., T. Also let $X_{i,t} = \{x^1_{i,t}, x^2_{i,t}, \ldots, x^K_{i,t}\}$ be a set of K land use, climate, or other predictors for the ith site and tth time step. The observed flow data $Q_{i,t}$ are assumed to be random variables, while the set of predictors X are assumed to be fixed in repeated sampling. Both Q and X are assumed to be observed without measurement error. A model is desired that predicts streamflow response using the predictors X. The basic panel regression model considered in this study is given by:

STEINSCHNEIDER ET AL.: PANEL REGRESSION TECHNIQUES IN HYDROLOGY ALTERATION STUDIES


$$Q_{i,t} = \beta_0 + \mu_i + \sum_{k=1}^{K} \beta_k \times x^k_{i,t} + \varepsilon_{i,t} \qquad (1)$$

where $\beta_k$ is an unknown response coefficient describing the influence of the kth predictor $x^k_{i,t}$ on flow $Q_{i,t}$ in the ith watershed at time t, $\beta_0$ is a mean intercept that applies to all watersheds, $\mu_i$ is a watershed-specific term describing the time-averaged differences in flow between watersheds (i.e., basin heterogeneity), and $\varepsilon_{i,t}$ is a stochastic disturbance term with an expected value of zero, $E[\varepsilon_{i,t}] = 0$, and constant variance, $E[\varepsilon^2_{i,t}] = \sigma^2_\varepsilon$.

[9] In this model framework, the parameters $\beta_k$ and predictors $x^k_{i,t}$ are used to account for variability in streamflow response within each watershed through time, i.e., the within-variability, as well as the variability between the time-averaged streamflow responses across watersheds, i.e., the between-variability. However, the between-variability may not be well described by available predictor variables because certain time-invariant characteristics of each watershed may be difficult to observe. In this situation, the parameter estimates of the model can become biased. In a panel model, however, the watershed-specific terms $\mu_i$ can be used to account for the differences in time-averaged streamflow responses across watersheds not accounted for by the available predictors, removing parameter estimation bias. This property is potentially very powerful if there are omitted variables from the model that have significant impacts on streamflow and lead to unobservable heterogeneity between individual watershed responses, a common situation with regional hydrologic regressions [Jacobson, 2011; Price et al., 2011]. One key assumption implicit in this model formulation is that the structural heterogeneity between watersheds, as quantified by $\mu_i$, does not change over time. Furthermore, we assume that each of the K predictors has the same effect on streamflow response for all watersheds. Finally, note that the inclusion of the watershed-specific terms $\mu_i$ is only possible because data in both the time and space domains are combined into a single data set; the estimation of these terms would be impossible in a standard cross-sectional regression because the degrees of freedom available for estimation ($N - (K + N)$) would be negative.
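The degrees-of-freedom argument can be made concrete with a short tally. The numbers below are an illustration only: they assume the dimensions of the case study described in this paper (N = 19 watersheds, T = 31 years, K = 3 predictors) and, as in the expression in the text, count K slope parameters plus N watershed-specific terms.

```latex
\begin{align*}
\text{cross-section } (T = 1):\quad & N - (K + N) = -K < 0, \\
\text{panel } (T = 31):\quad & NT - (K + N) = 19 \times 31 - (3 + 19) = 567 .
\end{align*}
```

With a single time step, the watershed-specific terms alone exhaust the N observations; pooling across time leaves ample degrees of freedom for estimating them.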

[10] A further assumption needs to be placed on the terms $\mu_i$ to define the panel model structure. In the trivial case, we can restrict the model so that all $\mu_i$ equal zero. The panel model then becomes a fully restricted pooled regression model with no heterogeneity between watersheds. Equation (1) can then be written in vector form as $Q = X\beta + \varepsilon$, with predictors stacked in the design matrix X across watersheds:

$$X = \begin{bmatrix}
x^1_{1,1} & \cdots & x^K_{1,1} \\
\vdots & & \vdots \\
x^1_{1,T} & \cdots & x^K_{1,T} \\
\vdots & & \vdots \\
x^1_{N,1} & \cdots & x^K_{N,1} \\
\vdots & & \vdots \\
x^1_{N,T} & \cdots & x^K_{N,T}
\end{bmatrix} \qquad (2)$$

[11] Here, X is a matrix of dimension $NT \times K$, $\beta$ is a column vector of length K, and Q and $\varepsilon$ are column vectors of length NT. The design matrix can include a column of ones to account for an intercept term. Standard ordinary least squares (OLS) regression can be used to estimate the model. We note that if only one time step of data is available, this model becomes a cross-sectional regression model. A Breusch-Pagan Lagrange multiplier test [Breusch and Pagan, 1980] can be used to test whether the residuals of this model are homogeneous and this model structure is appropriate.
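As a minimal sketch of the pooled model, the stacking in equation (2) and the OLS fit can be written with NumPy. The function names (`stack_panel`, `pooled_ols`), the array layout (N, T, K), and the synthetic noise-free data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def stack_panel(x):
    """Stack an (N, T, K) predictor array into the NT x K matrix of equation (2).

    Rows are ordered watershed by watershed, with time running fastest.
    """
    n_sites, n_steps, n_pred = x.shape
    return x.reshape(n_sites * n_steps, n_pred)

def pooled_ols(x, q):
    """Pooled OLS with a common intercept; x is (N, T, K), q is (N, T)."""
    X = stack_panel(x)
    X = np.column_stack([np.ones(X.shape[0]), X])  # column of ones for beta_0
    beta, *_ = np.linalg.lstsq(X, q.reshape(-1), rcond=None)
    return beta

# Synthetic, noise-free example (hypothetical coefficients 1.5, 0.8, -0.3).
rng = np.random.default_rng(0)
N, T, K = 4, 6, 2
x = rng.normal(size=(N, T, K))
q = 1.5 + x @ np.array([0.8, -0.3])
beta_hat = pooled_ols(x, q)
print(beta_hat)
```

Because the synthetic response contains no noise and no watershed heterogeneity, the fit recovers the generating coefficients up to floating-point error; real data would not behave this cleanly.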

[12] If heterogeneity is found to exist between watersheds, then an assumption of the characteristics of $\mu_i$ is needed. In one case, we assume that there are omitted variables from our regression that are correlated with the predictors we are including in the model. Under this assumption, the terms $\mu_i$ are treated as dummy predictor variables in the regression and have to be estimated for each watershed. This model is known as the fixed effects or covariance model.

[13] If the model in equation (1) is designated as a fixed effects model, then the design matrix X from equation (2) is augmented to include watershed-specific intercept terms as follows:

$$X_{FE} = \begin{bmatrix}
x^1_{1,1} & \cdots & x^K_{1,1} & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
x^1_{1,T} & \cdots & x^K_{1,T} & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
x^1_{N,1} & \cdots & x^K_{N,1} & 0 & \cdots & 1 \\
\vdots & & \vdots & \vdots & & \vdots \\
x^1_{N,T} & \cdots & x^K_{N,T} & 0 & \cdots & 1
\end{bmatrix} \qquad (3)$$

[14] Here, the original predictors in the model compose the first K columns of the design matrix, followed by N columns of zeros and ones that specify a specific intercept for each watershed. The vector of parameters $\beta_{FE}$ can then be augmented to include intercept terms for each individual. OLS regression can be used to estimate the model.
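The dummy-variable construction of equation (3) can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `fixed_effects_lsdv`, the (N, T, K) array layout, and the synthetic data (in which the unobserved watershed effect is deliberately correlated with the predictor) are all hypothetical.

```python
import numpy as np

def fixed_effects_lsdv(x, q):
    """Least-squares dummy-variable fit; x is (N, T, K), q is (N, T).

    Returns the K common slopes and the N watershed-specific intercepts.
    """
    n_sites, n_steps, n_pred = x.shape
    X = x.reshape(n_sites * n_steps, n_pred)
    # N dummy columns: column i is one on the T rows of watershed i, zero elsewhere.
    D = np.kron(np.eye(n_sites), np.ones((n_steps, 1)))
    X_fe = np.hstack([X, D])
    coef, *_ = np.linalg.lstsq(X_fe, q.reshape(-1), rcond=None)
    return coef[:n_pred], coef[n_pred:]

# Synthetic data: the unobserved effect mu also shifts the predictor.
rng = np.random.default_rng(1)
N, T = 10, 30
mu = rng.normal(size=N)
x = (mu[:, None] + rng.normal(size=(N, T)))[:, :, None]
q = 2.0 * x[:, :, 0] + 3.0 * mu[:, None] + 0.1 * rng.normal(size=(N, T))

slopes, intercepts = fixed_effects_lsdv(x, q)

# Pooled OLS that ignores mu, for comparison.
X_pool = np.column_stack([np.ones(N * T), x.reshape(-1, 1)])
pooled, *_ = np.linalg.lstsq(X_pool, q.reshape(-1), rcond=None)
print(slopes[0], pooled[1])
```

In this constructed example the dummy-variable fit should stay close to the generating slope of 2.0, while the pooled slope is pulled upward because the omitted effect is correlated with the predictor.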

[15] An alternative assumption to that of the fixed effects model is that there are omitted variables from the regression that are not correlated with our other predictors in the design matrix X and therefore can be incorporated into the model through the stochastic component of the response variable. In this case, the terms $\mu_i$ are considered random variables with mean $E[\mu_i] = 0$, variance $E[\mu_i^2] = \sigma^2_\mu$, and covariance $E[\mu_i \mu_j] = 0$ for $i \neq j$. These random variables, which are eventually manifested in the covariance matrix of the error term, are by assumption not correlated with the predictors $x^k_{i,t}$ or residuals $\varepsilon_{i,t}$ in the model. This formulation is known as the random effects or error components model. The only additional parameters to be estimated here are the variance associated with individual heterogeneity, $\sigma^2_\mu$, and the remaining variance associated with the model, $\sigma^2_\varepsilon$.

[16] If we assume the model in equation (1) exhibits random effects, then we can lump the stochastic components $\mu_i$ and $\varepsilon_{i,t}$ into one error term $\upsilon_{i,t}$. If we consider a vector of these errors $U_i$ for the ith watershed only, the covariance matrix (of dimension $T \times T$) for $U_i$ is given as:


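$$\Omega_i = \mathrm{var}(U_i) = \mathrm{var}(\mu_i j + \varepsilon_i) = E[(\mu_i j + \varepsilon_i)(\mu_i j + \varepsilon_i)'] = \sigma^2_\mu \, j j' + \sigma^2_\varepsilon I_T \qquad (4)$$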

[17] Here, j denotes a vector of ones of length T, $\varepsilon_i$ is a vector of disturbances for the ith watershed of length T, the prime symbol ($'$) denotes the transpose operation, and $I_T$ represents the identity matrix of dimension $T \times T$. The grand covariance matrix (of dimension $NT \times NT$) for the entire regression model across all N watersheds is then composed of a block diagonal matrix with the covariance matrix for each catchment along the diagonal:

$$\Omega = \begin{bmatrix}
\Omega_1 & & 0 \\
& \ddots & \\
0 & & \Omega_N
\end{bmatrix},
\qquad
\Omega_i = \begin{bmatrix}
\sigma^2_\mu + \sigma^2_\varepsilon & \sigma^2_\mu & \cdots & \sigma^2_\mu \\
\sigma^2_\mu & \sigma^2_\mu + \sigma^2_\varepsilon & \cdots & \sigma^2_\mu \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_\mu & \sigma^2_\mu & \cdots & \sigma^2_\mu + \sigma^2_\varepsilon
\end{bmatrix} \qquad (5)$$

[18] Under this model formulation, the error term is homoscedastic for each individual watershed. Furthermore, covariance terms for a given individual are identical between disturbances across time (i.e., all off-diagonal terms equal $\sigma^2_\mu$). This time covariance term for each catchment accounts for a persistent shift of that catchment's residuals through time above or below the expected value of zero, accounting for the heterogeneity between watersheds.
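The block diagonal structure described in equations (4) and (5) can be assembled with a Kronecker product. The sketch below is illustrative: the function name `random_effects_cov` and the numerical variance values are assumptions, not quantities from the paper.

```python
import numpy as np

def random_effects_cov(n_sites, n_steps, var_mu, var_eps):
    """Grand NT x NT error covariance of equation (5) for a balanced panel."""
    j = np.ones((n_steps, 1))
    # Equation (4): per-watershed T x T covariance block.
    omega_i = var_mu * (j @ j.T) + var_eps * np.eye(n_steps)
    # The same block is repeated along the diagonal for every watershed.
    return np.kron(np.eye(n_sites), omega_i)

# Hypothetical variance components for a toy panel of 3 watersheds, 4 time steps.
omega = random_effects_cov(n_sites=3, n_steps=4, var_mu=0.5, var_eps=0.2)
print(omega.shape)
```

Entries that link two different watersheds are zero, which encodes the assumption that the watershed effects are mutually uncorrelated.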

[19] If estimates for $\sigma^2_\varepsilon$ and $\sigma^2_\mu$ are available, then estimated generalized least squares (GLS) regression can be used to solve for all model parameters:

$$\hat{\beta}_{RE} = (X' \hat{\Omega}^{-1} X)^{-1} X' \hat{\Omega}^{-1} Y \qquad (6)$$

where $\hat{\Omega}$ is the estimated covariance matrix populated by the estimates $\hat{\sigma}^2_\varepsilon$ and $\hat{\sigma}^2_\mu$, X is a design matrix of the form presented in equation (2), and Y is the stacked vector of responses (the flows Q of equation (1)). The variance of the estimated parameters is given by:

$$\mathrm{var}(\hat{\beta}_{RE}) = (X' \hat{\Omega}^{-1} X)^{-1} \qquad (7)$$

[20] The elements of $\mathrm{var}(\hat{\beta}_{RE})$ can be used to derive standard errors for model parameters and perform significance testing. Methods for deriving the estimated parameters $\hat{\sigma}^2_\varepsilon$ and $\hat{\sigma}^2_\mu$ are given in the supporting information for this article [Swamy and Arora, 1972].
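Equations (6) and (7) translate directly into a few lines of linear algebra. The sketch below assumes the variance components are already known and simply passes them in; it does not implement the Swamy and Arora estimation step. The function name `gls_random_effects` and the synthetic data are hypothetical.

```python
import numpy as np

def gls_random_effects(X, y, n_sites, n_steps, var_mu, var_eps):
    """GLS estimate and its covariance for a balanced random effects panel.

    X is NT x K (rows ordered watershed by watershed), y has length NT.
    """
    omega_i = var_mu * np.ones((n_steps, n_steps)) + var_eps * np.eye(n_steps)
    # The inverse of a block diagonal matrix is block diagonal in the inverted blocks.
    omega_inv = np.kron(np.eye(n_sites), np.linalg.inv(omega_i))
    xt_oinv = X.T @ omega_inv
    cov_beta = np.linalg.inv(xt_oinv @ X)   # equation (7)
    beta = cov_beta @ (xt_oinv @ y)         # equation (6)
    return beta, cov_beta

# Synthetic panel with a random watershed effect (hypothetical values).
rng = np.random.default_rng(2)
N, T = 8, 20
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
mu = np.repeat(rng.normal(scale=0.7, size=N), T)   # one draw per watershed
y = X @ np.array([1.0, 0.5]) + mu + rng.normal(scale=0.3, size=N * T)

beta_re, cov_re = gls_random_effects(X, y, N, T, var_mu=0.49, var_eps=0.09)
std_err = np.sqrt(np.diag(cov_re))
print(beta_re, std_err)
```

Setting `var_mu` to zero reduces the covariance to a scaled identity matrix, in which case the estimate coincides with ordinary least squares; this is a convenient sanity check on any implementation.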

2.2. Model Selection

[21] A choice needs to be made regarding whether a fixed effects or random effects model is more appropriate for the regional land use-hydrologic regression problem. This choice will be data set-specific. In certain instances, the unobservable heterogeneity between watersheds will exhibit no correlation with the current predictors included in the model. This can happen when unobserved (or poorly observed) subsurface or anthropogenic factors are randomly distributed across the sample of watersheds considered. In this situation, a random effects model will likely be more appropriate and is desirable because this model requires fewer parameters for estimation. Alternatively, unobservable subsurface or anthropogenic factors may exhibit clustering across watersheds in similar patterns to the predictor variables included in the model. For example, all watersheds in a very urban area (where urbanization is a predictor variable) may be located along a coastline with a specific type of (unobservable) subsurface geology that is fundamentally different from nonurban watersheds located farther inland. In this case, the heterogeneity between watersheds will likely be correlated with the variables in the design matrix. If a random effects model is fitted under these circumstances, parameter estimates will be inconsistent because the error term will be correlated with components of the design matrix X. A fixed effects model, however, will still produce consistent estimators and therefore should be used here. A Hausman test [Hausman, 1978] can be used to test for correlations between the unobservable heterogeneity and current predictors in the model to determine which of these two model formulations is more appropriate for the data set under consideration.
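For readers unfamiliar with the Hausman test, its statistic in the commonly used form compares the fixed and random effects slope estimates, weighted by the difference of their covariance matrices, and is referred to a chi-square distribution with as many degrees of freedom as there are compared coefficients. The sketch below assumes that standard form; the function name and the input numbers are hypothetical and are not results from this study.

```python
import numpy as np

def hausman_statistic(beta_fe, cov_fe, beta_re, cov_re):
    """Hausman statistic d' (V_FE - V_RE)^{-1} d for the shared slope coefficients.

    Returns the statistic and its degrees of freedom. Only coefficients that
    appear in both models (time-varying predictors) should be passed in.
    """
    d = np.asarray(beta_fe, dtype=float) - np.asarray(beta_re, dtype=float)
    v = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    stat = float(d @ np.linalg.solve(v, d))
    return stat, d.size

# Hypothetical single-slope example.
stat, dof = hausman_statistic([0.40], [[0.010]], [0.25], [[0.004]])
print(stat, dof)  # statistic is 0.15**2 / 0.006 = 3.75 with 1 degree of freedom
```

In this made-up example the statistic (3.75) falls just below the 5% chi-square critical value for one degree of freedom (about 3.84), so the random effects assumption would not be rejected at that level. In finite samples the covariance difference may fail to be positive definite, which this sketch does not guard against.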

[22] We note that when applicable, a random effects model is preferable over fixed effects approaches. Besides requiring fewer parameters (which can therefore be estimated more precisely), a random effects model can also accommodate time-invariant predictors (e.g., basin geology, soil composition), an option not possible with fixed effects models. In the fixed effects formulation, the influence of time-invariant predictors on flow will be absorbed by the dummy variables. This issue is circumvented in the random effects model because heterogeneity across watersheds is embedded in the error covariance matrix rather than the design matrix. In addition, parameter estimates in the fixed effects model can be imprecise if there is insufficient variability in the predictors of the model through time, a problem not encountered by the random effects model. Despite the benefits of the random effects model, however, it is often useful to compare the results of both the fixed and random effects models to ensure they are consistent, especially if the results of the Hausman test do not strongly suggest one model over the other.

2.3. Accounting for Serially and Spatially Correlated Residuals

[23] The fixed and random effects models as presented above are not designed to account for serially or spatially correlated residuals. This does not affect the consistency of estimated regression parameters, but they will no longer be efficient. In the presence of positively correlated residuals, standard errors on model parameters will be underpredicted and inference on those parameter estimates will be invalid. Previous work has shown that models predicting hydrologic response to exogenous factors often exhibit residuals correlated across watersheds [Stedinger and Tasker, 1985; Tasker and Stedinger, 1989] or through time [Schoups and Vrugt, 2010].

[24] There are several statistical tests available to detect the presence of spatial or temporal dependence in panel model errors. Bhargava et al. [1982] extended the classic Durbin-Watson statistic [Durbin and Watson, 1950] to test for an AR(1) process in the residuals of a fixed effects model. Baltagi and Li [1995] proposed several tests to identify serially correlated errors with AR(1) or MA(1) structures in both fixed and random effects models. A Lagrange multiplier (LM) test [Breusch and Pagan, 1980] and a cross-sectional dependence (CD) test [Pesaran, 2004] were proposed to detect spatial correlations in the errors across individuals. The CD test can consider a spatial weights matrix that specifies the spatial connections between individuals in the panel. Recently, Baltagi et al. [2007] proposed a joint test for identifying serial and spatial correlations simultaneously in a random effects model. All of these tests can be used to determine whether the error structures of standard fixed and random effects models are sufficient in their original form.
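As an illustration of the first of these tests, the panel extension of the Durbin-Watson statistic is usually written as the ratio of summed squared first differences of the residuals (taken within each individual only) to the summed squared residuals. The sketch below assumes that form; the function name is hypothetical, and the critical values tabulated by Bhargava et al. [1982] are not reproduced here.

```python
import numpy as np

def panel_durbin_watson(resid):
    """Pooled Durbin-Watson-type ratio for an (N, T) array of residuals.

    Differences are taken along time within each watershed, never across
    watersheds. Values near 2 suggest little first-order serial correlation;
    values well below 2 suggest positive serial correlation.
    """
    resid = np.asarray(resid, dtype=float)
    diffs = np.diff(resid, axis=1)
    return float(np.sum(diffs ** 2) / np.sum(resid ** 2))

# Two extreme toy cases: perfectly persistent and perfectly alternating residuals.
persistent = panel_durbin_watson([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
alternating = panel_durbin_watson([[1.0, -1.0, 1.0, -1.0]])
print(persistent, alternating)
```

The residuals passed in would normally be those of the fitted fixed effects model, so that watershed means have already been removed.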

[25] If the models need to be adjusted to accommodate correlations in the error term, alternative estimation procedures for both fixed and random effects models have been proposed. For the case of serially correlated residuals, Lillard and Willis [1978] present an adjustment to the random effects model, while Bhargava et al. [1982] provide a corresponding treatment for the fixed effects model. In both of these studies, serial correlations are accounted for by augmenting the error covariance matrix with off-diagonal terms quantifying the effects of serial dependence and then estimating the model parameters using generalized least squares procedures. Usually, only a single serial correlation coefficient common to all individuals is considered. There are also procedures to adjust both fixed and random effects models for spatially correlated residuals [Anselin, 1988; Baltagi et al., 2003; Elhorst, 2003], but this issue is complicated by the spatial nature of the problem. That is, spatial regression models usually rely on an additional weights matrix that specifies the spatial dependence among the cross-section units. The estimation of the random effects model with a spatial error structure has also been found to be rather complicated [Elhorst, 2010]. Therefore, in the case study presented below, we test for spatial correlations in the residuals (as well as for serial correlations), but we do not adjust the estimation procedure if spatial correlations are identified.

3. Watershed Urbanization and the Annual Runoff Coefficient in the Northeast United States

3.1. Case Study Overview

[26] To demonstrate the utility of panel regression methods in estimating robust relationships between streamflow and anthropogenic land use change, a case study is presented examining the influence of urbanization on the annual runoff coefficient for 19 watersheds in the Northeast U.S. taken from the U.S. Geological Survey (USGS) GAGES II database [Falcone et al., 2010]. The annual runoff coefficient, defined as the ratio of cumulative streamflow to cumulative precipitation for a hydrologic year (1 October to 30 September), is affected by the amount of water lost to evapotranspiration from the watershed surface, which in turn is influenced by the availability of both water and energy throughout the year. The annual runoff coefficient can also be influenced by anthropogenic factors, such as urbanization. In this study, we hypothesize that increased urbanization will increase the runoff volumes from individual storms, thus increasing the annual runoff coefficient.

[27] To test this hypothesis, a simple panel regression model is considered that relates the annual runoff coefficient to annual precipitation, annual potential evapotranspiration (PET), and a measure of urbanized land cover:

$$
C_{i,t} = \frac{Q_{i,t}}{P_{i,t}} = \beta_0 + \mu_i + \beta_1 \times U_{i,t} + \beta_2 \times \mathrm{PET}_{i,t} + \beta_3 \times P_{i,t} + \varepsilon_{i,t} \tag{8}
$$

[28] Here, $C_{i,t}$ is the annual runoff coefficient occurring in the $i$th watershed and $t$th year, $Q_{i,t}$ is the cumulative annual streamflow (normalized to mm) for that watershed and year, $P_{i,t}$ is the cumulative precipitation, $\mathrm{PET}_{i,t}$ is the cumulative annual potential evapotranspiration, and $U_{i,t}$ is a measure of urban land cover. The two climate variables $P_{i,t}$ and $\mathrm{PET}_{i,t}$ are used to quantify how the availability of water and energy influences evapotranspiration and subsequently annual runoff. By including these two exogenous factors, the model can be used to test whether urbanization influences the runoff coefficient after climatic influences have been accounted for. In this model, 31 years ($t = 1977, \ldots, 2007$) of data across 19 watersheds ($i = 1, 2, \ldots, 19$) were used. Here, the year reported for each water year (WY) is associated with the last 9 months of that water year (i.e., WY 2000 spans October 1999 to September 2000).

[29] The model presented in equation (8) is relatively simple; we recognize that a large set of other predictors besides precipitation, PET, and urbanized area could likely be used to explain the annual runoff coefficient. However, the simplicity of the model is chosen by design to highlight how panel regressions can isolate heterogeneity between watersheds caused by omitted predictors. Also, in order to provide a clear comparison between the random and fixed effects models, we omitted any time-invariant predictors (e.g., permeability, soil composition) because the fixed effects model does not permit such predictors. However, we do compare the estimated fixed effects (the fitted $\mu_i$ from the fixed effects model) to some pertinent basin characteristics to determine if the heterogeneity between watersheds can be explained with additional predictors.

3.2. Data

[30] The final 19 watersheds used in this analysis (Figure 1 and Table 1) were screened among hundreds available in the GAGES II database to meet the following criteria: all watersheds (1) had continuous, daily streamflow data records spanning from 1 October 1976 through 30 September 2007, (2) were located in the southern New England portion of the Northeast United States, and (3) had measures of urbanized area (data described later) that monotonically increased across the period of record. The last criterion was imposed because certain watersheds showed decreasing urban land cover earlier in the record, a pattern that seemed highly suspect given their proximity to developing population centers.

[31] To calculate annual runoff coefficients for each of the 19 watersheds, daily streamflow data were gathered from USGS gages at the outlet of each watershed for the period of 1 October 1976 through 30 September 2007. Runoff coefficients were calculated using cumulative streamflow and cumulative precipitation for each hydrologic year. Cumulative precipitation and PET were calculated from daily data over the same time frame gathered from the gridded data set presented in Maurer et al. [2002]. The gridded data has a 1/8 degree resolution and is interpolated from station data using the SYMAP algorithm [Widmann and Bretherton, 2000] and scaled using the PRISM method [Daly et al., 1994]. Individual grid cells were assigned to each watershed if they overlapped with any portion of the watershed boundary. If multiple grid cells were assigned to a watershed, then their average was taken and used as the climate time series for that basin. PET was calculated using the Hargreaves method [Hargreaves and Samani, 1982]. Estimates of solar radiation used as input to this calculation were developed using the methods presented in Allen et al. [1998].
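The runoff coefficient calculation described above can be illustrated with a minimal Python sketch. The function names and the constant daily series are hypothetical and are not taken from the study; the sketch only assumes that daily streamflow and precipitation are both expressed in mm and that a water year runs from 1 October to 30 September and is named by its ending year.

```python
from collections import defaultdict
from datetime import date, timedelta


def water_year(d: date) -> int:
    """Return the water year of a date (1 October to 30 September, named by the ending year)."""
    return d.year + 1 if d.month >= 10 else d.year


def annual_runoff_coefficients(dates, streamflow_mm, precip_mm):
    """Return {water_year: cumulative streamflow / cumulative precipitation}."""
    q_sum = defaultdict(float)
    p_sum = defaultdict(float)
    for d, q, p in zip(dates, streamflow_mm, precip_mm):
        wy = water_year(d)
        q_sum[wy] += q
        p_sum[wy] += p
    # Skip any water year with no recorded precipitation to avoid dividing by zero.
    return {wy: q_sum[wy] / p_sum[wy] for wy in sorted(q_sum) if p_sum[wy] > 0}


# Hypothetical daily series covering WY 2000 (1 Oct 1999 to 30 Sep 2000, 366 days).
start = date(1999, 10, 1)
dates = [start + timedelta(days=k) for k in range(366)]
coeffs = annual_runoff_coefficients(dates, [1.5] * 366, [3.0] * 366)
print(coeffs)
```

With half as much streamflow as precipitation on every day, the single water year in this toy series has a runoff coefficient of 0.5.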

Figure 1. Map of USGS streamflow gages used in the analysis. The identification numbers correspond to those found in Table 1.

Table 1. Summary of USGS Gages Used in the Analysis^a

| ID | Gage Number | Latitude | Longitude | State | Drainage Area (km²) | Urban Cover in 1977 (%) | Urban Cover in 2006 (%) | Urban Cover Change (Δ%, 1977–2006) | Average Annual Precipitation (mm) | Average Annual Temperature (°C) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 01073500 | 43.103 | −70.953 | NH | 471.1 | 4.3 | 9.0 | 4.7 | 1161.2 | 8.1 |
| 2 | 01094500 | 42.502 | −71.723 | MA | 279.7 | 21.3 | 26.8 | 5.5 | 1231.1 | 8.3 |
| 3 | 01096000 | 42.634 | −71.658 | MA | 173.1 | 4.3 | 8.8 | 4.5 | 1211.2 | 8.4 |
| 4 | 01096500 | 42.668 | −71.575 | MA | 1125.7 | 12.6 | 18.6 | 6 | 1232.9 | 8.6 |
| 5 | 01097000 | 42.432 | −71.450 | MA | 299.3 | 15.7 | 33.4 | 17.7 | 1217.7 | 9.2 |
| 6 | 01097300 | 42.513 | −71.404 | MA | 30.9 | 7.0 | 31.7 | 24.7 | 1176.2 | 9.3 |
| 7 | 01102000 | 42.660 | −70.894 | MA | 316.4 | 22.2 | 37.7 | 15.5 | 1200.1 | 9.7 |
| 8 | 01105500 | 42.155 | −71.146 | MA | 60.7 | 41.0 | 55.6 | 14.6 | 1259.8 | 9.8 |
| 9 | 01105870 | 41.991 | −70.734 | MA | 55.1 | 10.5 | 22.5 | 12 | 1320.3 | 10.3 |
| 10 | 01109000 | 41.948 | −71.177 | MA | 112.7 | 15.4 | 30.7 | 15.3 | 1256.2 | 9.8 |
| 11 | 01109060 | 41.866 | −71.123 | MA | 220.1 | 18.1 | 34.8 | 16.7 | 1254.7 | 9.8 |
| 12 | 01111500 | 41.996 | −71.563 | RI | 237.2 | 5.5 | 11.8 | 6.3 | 1283.1 | 9.2 |
| 13 | 01162000 | 42.684 | −72.083 | MA | 212.6 | 3.4 | 8.0 | 4.6 | 1201.5 | 7.0 |
| 14 | 01163200 | 42.588 | −72.041 | MA | 88.4 | 18.5 | 23.4 | 4.9 | 1200.4 | 7.1 |
| 15 | 01166500 | 42.598 | −72.438 | MA | 965.5 | 5.9 | 9.7 | 3.8 | 1197.9 | 7.1 |
| 16 | 01172500 | 42.425 | −72.025 | MA | 142.7 | 1.6 | 6.4 | 4.8 | 1246.6 | 7.5 |
| 17 | 01173000 | 42.391 | −72.060 | MA | 249.0 | 1.1 | 6.0 | 4.9 | 1226.5 | 7.4 |
| 18 | 01189000 | 41.673 | −72.901 | CT | 116.4 | 28.3 | 38.9 | 10.6 | 1346.5 | 9.1 |
| 19 | 01196500 | 41.450 | −72.841 | CT | 285.7 | 36.2 | 56.0 | 19.8 | 1322.2 | 10.2 |

^a Annual precipitation and temperature averages were taken over water years between 1977 and 2007. Latitude and longitude are in the North American Datum of 1983.


[32] Measures of urbanized land cover were available for four different calendar years (1977, 1992, 2001, and 2006). These data were developed from two different USGS data sets. The 1977 land use data was collected from the Enhanced Historical Land-Use and Land-Cover Data Sets of the USGS. It was digitized from USGS 1:250,000 and 1:100,000 scale maps from previously published data between 1970 and 1980 [Price et al., 2007]. The percentage of watershed area covered by the land use type "Urban or Built-up Land" was used to quantify urbanization for this time period. The remaining years (1992, 2001, and 2006) of land use data were collected from the USGS National Land Cover Data (NLCD) set [Vogelmann et al., 2001; Homer et al., 2004; Fry et al., 2011]. The spatial resolution of these data is 30 m. The percentage of watershed area covered by the land use type "Developed" was used to measure urbanization. To ensure congruency between the two data sets, we inspected the subcategory classifications (e.g., residential, commercial, industrial, transportation) of both the "Urban or Built-up Land" and "Developed" categories of the two data sets and made sure they were consistent with one another. We also visually inspected the land use data for each watershed in a geographic information system (GIS) to confirm that the information between the two data sets was consistent.

[33] A data extension procedure was used to augment the urbanization data set to be of the same length as the climate and streamflow data. First, the four calendar years of urbanization data (1977, 1992, 2001, and 2006) were assigned a water year (1976–1977, 1992–1993, 2001–2002, and 2006–2007). Linear interpolation was used to estimate urbanization for each watershed for all years between the four water years of actual land use data available. This produced a continuous urbanization data set for each watershed between WY 1977 and 2007. Linear interpolation was deemed an acceptable method for estimating urbanization growth between available years of data because the data, by selection, was monotonically increasing, and the trend across the available years appears roughly linear.
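As an illustration of this data extension step, the following Python sketch linearly interpolates urban cover between the four anchor water years. It assumes the anchors correspond to WY 1977, 1993, 2002, and 2007 (the ending years of the water years listed above). The function name is hypothetical, the two end values are the Table 1 entries for gage 01109000, and the two intermediate values are made up for the example because they are not reported here.

```python
def interpolate_urban_cover(known, first_wy=1977, last_wy=2007):
    """Linearly interpolate percent urban cover for every water year in the range.

    known: dict mapping anchor water year -> percent urban cover in that year.
    """
    years = sorted(known)
    series = {}
    for wy in range(first_wy, last_wy + 1):
        # Find the pair of anchor years that brackets this water year.
        for y0, y1 in zip(years, years[1:]):
            if y0 <= wy <= y1:
                frac = (wy - y0) / (y1 - y0)
                series[wy] = known[y0] + frac * (known[y1] - known[y0])
                break
    return series


# End points from Table 1 (gage 01109000); the 1993 and 2002 values are hypothetical.
known = {1977: 15.4, 1993: 21.0, 2002: 27.0, 2007: 30.7}
urban = interpolate_urban_cover(known)
print(len(urban), round(urban[1985], 2))
```

Because the anchor values increase monotonically, the interpolated series is also monotonically increasing, which is the property the screening criterion in the data selection was meant to guarantee.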

[34] Finally, various physiographic basin characteristics from the GAGES II database were collected for each of the 19 watersheds to compare against the heterogeneity between watersheds estimated by the fixed effects model. We assume these characteristics have remained static throughout the 1977–2007 period. The catchment characteristics are listed in Table 3.

3.3. Modeling Experiments

[35] Five models are considered in this exercise to emphasize the benefits afforded by panel regression techniques. First, a cross-sectional regression for WY 2007 is conducted. Here, $t = 2007$ in equation (8) and no heterogeneity is considered (i.e., all $\mu_i$ are set to zero). In the second modeling experiment, a time series regression is considered for a single watershed chosen at random (Gage ID# 01109000 in Massachusetts). This model examines time trends in the dependent and independent variables for a particular catchment, but no variability is considered across the watersheds. These two models will demonstrate the potential pitfalls of statistical models that only focus on a single dimension of data (space or time).

[36] In the third model, a standard, fully restricted pooled regression is performed. Here, the panel model in equation (8) is used to consider data across space and time simultaneously, but all $\mu_i$ are set to zero. The design matrix is constructed according to equation (2), and OLS is used to estimate the model parameters. By comparing this model to the space- and time-only models, this experiment will show how regression results can differ (and improve) by simultaneously considering data in both dimensions. It will also act as a baseline against which to compare the fixed and random effects models, which account for heterogeneity between watersheds. The fixed effects model is estimated in the fourth experiment using the augmented design matrix presented in equation (3) and OLS regression. Here, the $\mu_i$ are estimated as basin-specific intercept terms designated by the columns of 0's and 1's in equation (3), enabling an estimate of the heterogeneity between catchments. Finally, a random effects model is fitted using the covariance matrix given in equation (5) and GLS estimation. This model will also account for basin heterogeneity, but does so by considering each $\mu_i$ as a normally distributed random variable with zero mean. Only a single parameter ($\hat{\sigma}_\mu^2$) must be estimated to quantify basin heterogeneity in this model. The fixed and random effects models will show how panel regression models can be used to isolate heterogeneity between watersheds poorly explained by other predictors and therefore estimate more robust relationships between the included predictors and the runoff coefficients.
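The difference between the pooled and fixed effects design matrices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy numbers are hypothetical, and the fixed effects version uses one 0/1 indicator column per basin with no common intercept (so each indicator coefficient plays the role of $\beta_0 + \mu_i$), which may differ in layout from equations (2) and (3).

```python
def pooled_design(X_by_basin):
    """Stack per-basin predictor rows and prepend a common intercept column."""
    return [[1.0] + list(row) for basin in X_by_basin for row in basin]


def fixed_effects_design(X_by_basin):
    """Stack predictor rows and prepend one 0/1 indicator column per basin."""
    n = len(X_by_basin)
    rows = []
    for i, basin in enumerate(X_by_basin):
        dummies = [1.0 if j == i else 0.0 for j in range(n)]
        rows.extend(dummies + list(row) for row in basin)
    return rows


# Hypothetical panel: 3 basins x 2 years, predictors ordered as (U, PET, P).
X_by_basin = [
    [(0.10, 950.0, 1200.0), (0.11, 960.0, 1100.0)],
    [(0.30, 980.0, 1250.0), (0.31, 970.0, 1300.0)],
    [(0.05, 940.0, 1150.0), (0.06, 955.0, 1180.0)],
]
X_pooled = pooled_design(X_by_basin)
X_fixed = fixed_effects_design(X_by_basin)
print(len(X_pooled), len(X_pooled[0]))  # 6 rows, 4 columns
print(len(X_fixed), len(X_fixed[0]))    # 6 rows, 6 columns
```

For the case study dimensions (19 basins, 31 years, 3 predictors), the same construction would give 589 rows in both matrices, with 4 columns for the pooled model and 22 for the fixed effects model.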

4. Results

4.1. Space- and Time-Only Models

[37] Cross-sectional and time-based regressions are considered first to demonstrate the types of conclusions that can be drawn by examining the data in a single dimension. Figures 2a–2c show the between-variability across all 19 watersheds in WY 2007 for the relationship between the annual runoff coefficient and associated measures of cumulative annual precipitation, cumulative annual potential evapotranspiration, and urbanization. The cross-sectional regression is fit to this data. Figures 2d–2f show the within-variability in these relationships for a single gage (ID# 01109000) across all years (1977–2007). This data is associated with the time-based regression. Table 2 shows the results of these regressions. Several interesting results emerge from Figure 2 and Table 2. When examined in a cross-section, the data suggest that there is no relationship between precipitation and the runoff coefficient, nor is there a significant relationship with PET. The p-values associated with these variables are 0.47 and 0.49, respectively. Conversely, a cross-sectional perspective suggests that there is a significant, positive relationship between the runoff coefficient and urbanization, with the regression returning a p-value less than 0.01. This result initially supports the original hypothesis that more urbanization leads to larger amounts of runoff per unit of precipitation.

[38] However, an alternative view is presented when considering a single watershed in the time domain. Here, a positive relationship between precipitation and the runoff coefficient becomes apparent and is statistically significant at the 0.01 significance level. When compared against the results of the cross-sectional analysis, this result suggests that the annual runoff coefficient does not systematically vary with the total precipitation across catchments, but rather responds to deviations from average precipitation in a particular catchment. A slight negative relationship between the runoff coefficient and PET also emerges, but this relationship is not statistically significant (p-value of 0.50). While this lack of significance is somewhat surprising, it likely stems from the limited amount of data available for a single watershed, as will be seen later. Finally, no significant relationship is seen between urbanization and the runoff coefficient, suggesting that, for this watershed, runoff as a proportion of precipitation does not change when urbanization increases over time. This result acts as evidence against our original hypothesis.

[39] The two perspectives of the data provided through a spatial and temporal lens suggest different conclusions about how precipitation, PET, and urbanization influence the annual runoff coefficient. When spatial and temporal variability are examined separately and then compared, it is difficult for the analyst to confidently draw conclusions about the relationships under investigation using these statistical analyses.

Figure 2. (a–c) Between- and (d–f) within-variation relationships between the annual runoff coefficient and annual cumulative precipitation, annual cumulative potential evapotranspiration, and urbanization. The between-variability relationships are shown across all 19 watersheds in WY 2007. The within-variation relationships are shown for a single watershed (ID# 01109000) across all years. Best fit lines are also shown for all plots.

Table 2. Regression Model Results for the Five Modeling Experiments^a

| Model | Independent Variable | Parameter Estimate | Standard Error | t-Value | p-Value |
|---|---|---|---|---|---|
| Cross-sectional regression (WY 2007) | **Urbanization** | 0.383 | 0.103 | 3.729 | <0.01 |
| | Potential evapotranspiration | −0.176 | 0.248 | −0.708 | 0.49 |
| | Precipitation | 0.333 | 0.454 | 0.734 | 0.47 |
| Time-regression (Gage ID# 01109000) | Urbanization | −0.144 | 0.198 | −0.727 | 0.47 |
| | Potential evapotranspiration | −0.123 | 0.180 | −0.682 | 0.50 |
| | **Precipitation** | 0.438 | 0.152 | 2.891 | <0.01 |
| Fully restricted pooled model | **Urbanization** | 0.146 | 0.038 | 3.812 | <0.01 |
| | **Potential evapotranspiration** | −1.392 | 0.040 | −3.487 | <0.01 |
| | **Precipitation** | 0.321 | 0.040 | 8.055 | <0.01 |
| Random effects | Urbanization | 0.046 | 0.074 | 0.627 | 0.531 |
| | **Potential evapotranspiration** | −0.152 | 0.040 | −3.850 | <0.01 |
| | **Precipitation** | 0.335 | 0.037 | 8.967 | <0.01 |
| Fixed effects | Urbanization | −0.065 | 0.101 | −0.639 | 0.52 |
| | **Potential evapotranspiration** | −0.158 | 0.040 | −3.955 | <0.01 |
| | **Precipitation** | 0.337 | 0.037 | 9.001 | <0.01 |

^a Significant predictors at the 0.05 significance level are shown in bold.


4.2. Fully Restricted Pooled Regression Model

[40] Some of this ambiguity is reduced when examining the variability of the data simultaneously in both the space and time dimensions. The variability across all 19 gages and 31 hydrologic years is shown in Figures 3a–3d. Note that while the annual runoff coefficient, precipitation, and PET all exhibit natural variability through time, measures of urban land cover show no variability besides that of a monotonic increase (an artifact of the linear interpolation procedure). The fully restricted pooled regression model can account for this variability in both the time and space dimensions by stacking the data for each watershed in a single design matrix. The regression results from this pooled model suggest that precipitation has a highly significant, positive impact on annual runoff coefficients (Table 2). Also, PET now exhibits a significant, negative relationship with the runoff coefficient, a result that makes physical sense but was not detected in either the cross-sectional or time-based regressions. Furthermore, the estimated relationships are more precise because there is more data available to the pooled regression model. Standard errors are up to an order of magnitude smaller than those obtained from the cross-sectional and time-based regressions. Finally, the fully restricted pooled model finds a statistically significant, positive relationship between the runoff coefficient and urbanization, with a p-value less than 0.01. Initially, it appears that when all of the available data is considered simultaneously, the original hypothesis of a positive relationship between urbanization and the annual runoff coefficient is confirmed.

Table 3. The Relationship Between Physiographic Watershed Characteristics and Heterogeneity Between Watersheds^a

| GAGES II Variable Name | Description of Variable from GAGES II Database | p-Value | R² Value |
|---|---|---|---|
| DRAIN_SQKM | Watershed drainage area, sq km | 0.21 | 0.089 |
| STREAMS_KM_SQ_KM | Stream density, km of stream per watershed sq km | 0.37 | 0.048 |
| BFI_AVE | Base flow index | 0.82 | 0.003 |
| ELEV_MEAN_M_BASIN | Mean watershed elevation (meters) | 0.79 | 0.004 |
| SLOPE_PCT | Mean watershed slope, percent | 0.72 | 0.008 |
| ASPECT_DEGREES | Mean watershed aspect | 0.46 | 0.032 |
| AWCAVE | Average range of available water capacity for the soil layer (in. of water/in. of depth) | 0.75 | 0.006 |
| PERMAVE | Average permeability (in./h) | 0.54 | 0.022 |

^a R² values and p-values are shown from linear regressions between the basin characteristics and the $\mu_i$ fitted in the fixed effects model.

Figure 3. Boxplots of the (a) annual runoff coefficient, (b) annual cumulative precipitation (mm), (c) annual cumulative potential evapotranspiration (mm), and (d) urbanization across all 19 watersheds for each hydrologic year. Lowess curves are also fitted through the median values of the variables through time.


[41] However, despite the improved (i.e., more accurate and precise) regression results of the pooled model over the space- and time-only models, there is marked heterogeneity in the pooled regression residuals that casts doubt on the model results. Figure 4 shows the distribution of residuals from the fully restricted, pooled regression model across all time periods for each of the watersheds. The figure shows that many of the residuals for individual watersheds are not centered about zero. This suggests that there is significant between-variability in the residuals that is not being accounted for by the predictors in the current regression model. The Breusch-Pagan Lagrange multiplier test [Breusch and Pagan, 1980] confirms this, reporting a p-value less than 0.01. Therefore, a fully restricted pooled regression panel model, which cannot account for unobserved heterogeneity, is inappropriate for this data.
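For readers unfamiliar with the test, the sketch below computes the Breusch-Pagan Lagrange multiplier statistic in its commonly stated form for a balanced panel, LM = NT / (2(T − 1)) × [Σ_i (Σ_t e_it)² / Σ_i Σ_t e_it² − 1]², which is compared against a chi-square distribution with one degree of freedom. The function name and the toy residuals are hypothetical, and it is an assumption that this textbook form matches the exact variant used in the case study.

```python
import math


def breusch_pagan_lm(residuals):
    """LM statistic and p-value for H0: no basin-level variance component.

    residuals: list of N lists, each holding the T pooled-OLS residuals of one basin.
    """
    n = len(residuals)
    t = len(residuals[0])
    sum_sq_of_sums = sum(sum(e_i) ** 2 for e_i in residuals)
    sum_of_squares = sum(e ** 2 for e_i in residuals for e in e_i)
    lm = n * t / (2.0 * (t - 1)) * (sum_sq_of_sums / sum_of_squares - 1.0) ** 2
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical residuals: two basins sit persistently above or below zero.
shifted = [[1.1, 0.9, 1.0, 1.0], [-1.0, -1.1, -0.9, -1.0], [0.1, -0.1, 0.0, 0.0]]
lm_shifted, p_shifted = breusch_pagan_lm(shifted)

# Hypothetical residuals that are centered on zero within every basin.
centered = [[1.0, -1.0, 1.0, -1.0]] * 3
lm_centered, p_centered = breusch_pagan_lm(centered)
print(p_shifted < 0.01, p_centered > 0.05)
```

Residuals that are persistently offset within basins, as in Figure 4, inflate the squared basin sums relative to the total sum of squares and drive the statistic upward.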

4.3. Fixed and Random Effects Regression Models

[42] The fixed and random effects approaches are implemented to separate out unobservable heterogeneity across the watersheds and then identify robust relationships between runoff coefficients and predictors common to all watersheds. The Hausman test suggests that a random effects model is appropriate (p-value equal to 0.27), although this p-value is not high enough to strongly suggest one model over the other. The fixed and random effects models separate out heterogeneity in the data set by distinguishing the within-variability relationships from the between-variability relationships. This is shown in Figure 5. The between-group relationships are shown in Figures 5a–5c as scatter plots of time-averaged runoff coefficients against time-averaged predictors for each of the 19 watersheds. To show within-group relationships (Figures 5d–5f), deviations of the runoff coefficient about its time-average for each watershed are plotted against deviations of each predictor about its time-average. The 31 data points for each of the 19 watersheds are pooled together in Figures 5d–5f.
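The two views plotted in Figure 5 follow directly from equation (8). As a worked illustration (the overbar notation for time averages over the $T = 31$ years is introduced here and is not part of the original notation), averaging equation (8) over time gives the between-group relationship, and subtracting that average from equation (8) gives the within-group relationship:

```latex
% Between-group relationship: time average of equation (8) for watershed i
\bar{C}_i = \beta_0 + \mu_i + \beta_1 \bar{U}_i + \beta_2 \overline{\mathrm{PET}}_i
          + \beta_3 \bar{P}_i + \bar{\varepsilon}_i

% Within-group relationship: equation (8) minus its time average
C_{i,t} - \bar{C}_i = \beta_1 \left( U_{i,t} - \bar{U}_i \right)
                    + \beta_2 \left( \mathrm{PET}_{i,t} - \overline{\mathrm{PET}}_i \right)
                    + \beta_3 \left( P_{i,t} - \bar{P}_i \right)
                    + \left( \varepsilon_{i,t} - \bar{\varepsilon}_i \right)
```

The basin-specific term $\mu_i$ and the common intercept $\beta_0$ cancel in the within-group form, which is why deviations about the time average are unaffected by time-invariant differences between watersheds.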

[43] Figures 5a and 5b suggest that there is little relationship between time-averaged runoff coefficients and time-averaged precipitation or PET, but Figures 5d and 5e indicate that some of the within-variability in runoff coefficients can be described by within-variations in these two climate variables. It is important to note that these figures show the raw relationships between the runoff coefficient and the explanatory variables, rather than the isolated effect of a predictor on the runoff coefficient after the influences of the other variables have been subtracted out. Visually, this can obscure the relationship with additional noise. However, the regression results of both the fixed and random effects models support the conclusion that the runoff coefficient is positively related to precipitation and negatively related to PET (Table 2). Both of these results are statistically significant at the 0.01 significance level and agree with the results of the fully restricted pooled regression model. Also, similar to the fully restricted pooled model, the fixed and random effects models have substantially smaller standard errors than the time- and space-only regressions, although these standard errors are marginally larger for the fixed effects model because of the additional basin-specific intercept terms that require estimation.

Figure 4. Boxplots of residuals from the fully restricted, pooled regression model across all time steps for the 19 gages. Each boxplot shows the T = 31 residuals associated with a particular watershed.

Figure 5. (a–c) Between- and (d–f) within-variation relationships between the annual runoff coefficient and annual cumulative precipitation, annual cumulative potential evapotranspiration, and urbanization. Between-group relationships are shown as scatter plots of time-averaged runoff coefficients against time-averaged predictors for each of the 19 watersheds. Within-group relationships are shown as scatter plots of deviations of the runoff coefficient about its time-average for each watershed against deviations of each predictor about its time-average. The data is pooled across all 19 watersheds. Best fit lines are also shown for all plots.

[44] While all of the panel models detect consistent signals between runoff coefficients, precipitation, and PET, the agreement between the fully restricted pooled model and the fixed and random effects models does not hold for the relationship between runoff coefficients and urbanization. Unlike the fully restricted pooled model, neither the fixed nor the random effects model finds a statistically significant relationship between these two variables. Figure 5c shows that catchments exhibiting high urbanization through all time steps tend to have higher runoff coefficients on average, but catchments that exhibit changes in their urbanized area through time do not tend to exhibit a consistent pattern of change in their runoff coefficient through time (Figure 5f). The fixed and random effects models detect the lack of within-variation signal and distinguish it from the between-variation signal. In fact, the slope parameters of a fixed effects model are equivalent to those that would result from a regression on the within-variability relationships (Figures 5d–5f), while the parameters of a random effects model are equal to a weighted average of regression coefficients fit to the within-variability relationships and those fit to the between-variability relationships (Figures 5a–5c) [Kmenta, 1986] (see supporting information for more detail). Because the data in the time domain strongly suggest that changes in urbanization do not influence the runoff coefficient, the fixed and random effects models determine that the between-variability in the runoff coefficient more likely stems from unobservable factor(s) (quantified by the $\mu_i$ terms) than from urbanization effects. The fully restricted pooled regression model, which does not include the basin-specific $\mu_i$ terms, cannot make this distinction and assigns a significant coefficient to urbanization in the regression. Therefore, when we consider all of the available data simultaneously and select a model structure that can account for unobservable heterogeneity between watersheds, we must reject our hypothesis that urbanization increases the annual runoff coefficient in Northeastern watersheds.
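The contrast between the pooled and within-group estimates can be reproduced with a small, fully synthetic example. In the Python sketch below, all numbers and function names are hypothetical and a single predictor is used for simplicity: more urbanized basins have higher runoff coefficients on average, but no basin's runoff coefficient responds to its own urban growth.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean(values):
    """Subtract the time average from one basin's series."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical panel: 3 basins x 4 years of percent urban cover and runoff coefficient.
urban = [[5, 6, 7, 8], [20, 21, 22, 23], [40, 41, 42, 43]]
runoff = [[0.41, 0.39, 0.39, 0.41], [0.51, 0.49, 0.49, 0.51], [0.61, 0.59, 0.59, 0.61]]

# Pooled slope: all basin-years stacked, no basin-specific terms.
pooled = ols_slope([u for b in urban for u in b], [c for b in runoff for c in b])

# Between slope: regression on the time averages of each basin.
between = ols_slope([sum(b) / len(b) for b in urban], [sum(b) / len(b) for b in runoff])

# Within slope: regression on deviations about each basin's time average.
within = ols_slope([u for b in urban for u in demean(b)],
                   [c for b in runoff for c in demean(b)])

print(f"pooled={pooled:.4f} between={between:.4f} within={within:.4f}")
```

In this toy panel the pooled slope is positive because it is dominated by differences between basins, while the within slope is essentially zero, mirroring the pattern in Figures 5c and 5f.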

5. Discussion

[45] There are a few issues regarding the fixed and random effects models that require discussion. First, we note that the urbanization changes (and the overall urbanization levels) of the catchments under examination are not extremely large (see Table 1). The response of streamflow characteristics to urbanization has been shown to follow threshold behavior [Beighley and Moglen, 2002], so it is possible that the lack of a statistically significant relationship between the annual runoff coefficient and urbanization may reverse if urbanization levels increase above some threshold.

[46] Second, we recognize that many static basin characteristics were left out of equation (8) and might be the source of heterogeneity in the regression models. To test this, the fitted $\mu_i$ values from the fixed effects model were regressed separately against eight different basin characteristics across the 19 watersheds to see if the heterogeneity could be explained with other predictors. If the heterogeneity did arise from one or more of these basin attributes, the $\mu_i$ terms should relate well to the relevant catchment descriptors. The R² and p-values for each of the eight regressions are shown in Table 3. None of these regressions were significant, with the largest R² value (<0.09) and smallest p-value (0.21) associated with the regression with drainage area. This suggests that the heterogeneity in the data set detected and extracted by the panel regression models is difficult to explain with popular and available basin characteristics, supporting the use of fixed and random effects models in this application.
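Each of these single-predictor checks reduces to a simple linear regression, for which R² is the squared correlation between the fitted effects and the basin characteristic. The Python sketch below is illustrative only: the function names and the five-point data set are hypothetical, and the second function uses the standard identity t² = R²(n − 2)/(1 − R²) to convert an R² value into the slope's t-statistic.

```python
import math


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def t_from_r_squared(r2, n):
    """Absolute t-statistic of the slope implied by R-squared with n observations."""
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))


# Hypothetical basin characteristic (x) and fitted basin effects (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(r_squared(x, y), 2))

# Drainage-area regression in Table 3: R-squared of 0.089 across 19 watersheds.
t_drain = t_from_r_squared(0.089, 19)
print(round(t_drain, 2))
```

A t-statistic of roughly 1.3 on 17 degrees of freedom is well below conventional critical values, in line with the non-significant result reported for drainage area.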

[47] Also, the fitted fixed and random effects models used in this application implicitly assumed no serial or spatial correlations in the model residuals. We tested for serial and spatial correlations using the tests proposed in Baltagi and Li [1995] and Breusch and Pagan [1980], respectively. Both tests were conducted on the random effects model since it was considered more appropriate using the Hausman test. While no serial correlations were detected in the model residuals (p-value of 0.99), positive spatial correlations were detected at the 0.01 significance level. This suggests that the standard errors on model parameters are overly precise in the fixed and random effects models. While this result does not change the primary conclusion that urbanization and the annual runoff coefficient do not exhibit a significant relationship, it does suggest that further work is needed extending panel models to include spatial correlative structures in hydrologic applications.
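A common textbook form of the Breusch and Pagan [1980] test for cross-sectional dependence sums the squared pairwise correlations of the residual series, LM = T × Σ_{i<j} ρ̂_ij², and compares the result against a chi-square distribution with N(N − 1)/2 degrees of freedom. The Python sketch below implements that form on hypothetical residuals; the function names are illustrative, and it is an assumption that this is the exact variant applied in the case study.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equal-length residual series."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cross_section_lm(residuals):
    """Return (LM statistic, degrees of freedom) for N residual series of length T."""
    n = len(residuals)
    t = len(residuals[0])
    lm = t * sum(pearson(residuals[i], residuals[j]) ** 2
                 for i, j in combinations(range(n), 2))
    return lm, n * (n - 1) // 2


# Hypothetical residuals: basins 1 and 2 move together, basin 3 is unrelated to both.
residuals = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]
lm, df = cross_section_lm(residuals)
print(lm, df)
```

With 19 watersheds the test would involve 171 pairwise correlations, so even modest positive correlations between neighboring basins can accumulate into a large statistic.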

[48] Finally, one issue not addressed in this paper is the potential utility of panel models to better accommodate multicollinearity in model predictors. In standard regression models, correlations between predictors make it difficult to identify how the predictand responds to unit changes in specific independent variables. In the context of hydrologic regressions, this situation may arise when human influences (e.g., urbanization) and climate influences (e.g., temperature, precipitation) both exhibit time trends. While panel regression models are not immune to multicollinearity, they do have advantages when trying to parse the effects of two correlated predictors. For instance, panel regression models have the ability to examine time series trends across several watersheds simultaneously, which helps them distinguish between land use and climate variability signals. If runoff and climate trends are seen in all watersheds but urbanization trends are only seen in some, then the panel model will likely be able to identify this pattern and assign a more significant response coefficient to the climate variables. The issue of multicollinearity did not arise in the data used in this study and therefore was not investigated further, but additional research is needed to test this benefit of panel models in hydrologic applications.

6. Conclusion

[49] Standard statistical methods for identifying relationships between human-induced alterations to the watershed landscape and hydrologic response are plagued by significant noise in hydrologic data that complicates the detection of clear statistical relationships. There is a pressing need to develop more sophisticated statistical models that can circumvent some of these challenges. A modeling approach that can help toward this end was presented in this paper. Panel regression techniques are designed to make fuller use of information embedded in multidimensional data sets that span both space and time and separate out the between-group and within-group variability in order to converge on more robust statistical relationships. The approach provides a mechanism to synthesize time series data available at many watersheds into a consistent regression framework that can be used for both signal identification and prediction.

[50] The objectives of this study were to introduce the theory of panel regression in the context of hydrologic applications and demonstrate how panel regression can lead to significantly different and often more robust results than those presented by standard regression techniques. These objectives were addressed in a case study relating the annual runoff coefficient to climate and urbanization predictors. The results showed that:

[51] 1. The fully restricted pooled regression model (thesimplest panel model considered) was able to leverage allavailable data and identify stronger relationships betweenthe annual runoff coefficient and both climate and urban-ization predictors than either the time- or space-onlyregression models.

[52] 2. Despite the improved performance of the fully restricted pooled model, the assumption of homogeneity implicit in that model structure was not justified, requiring the use of more sophisticated panel models.

[53] 3. The fixed and random effects models were better able to handle the heterogeneity in the data set by distinguishing the between-variability from the within-variability and formally testing whether the predictors or unobservable factors were more likely the cause of variations in the runoff coefficients.

[54] 4. In this case study, the oft-hypothesized relationship between urbanization and runoff was not supported by the most rigorous (i.e., fixed and random effects) models. This was in contrast to the standard pooled regression model and the cross-sectional regression, which both identified a significant relationship.

[55] 5. The heterogeneity identified by the more sophisticated panel models did not relate well to a set of popular and widely available basin characteristics, supporting the idea that the heterogeneity is difficult to observe and requires the use of fixed or random effects.

[56] 6. Despite the more robust results of fixed and random effects models, spatial correlations in the residuals reduce the accuracy of the standard errors associated with those models.
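The contrast drawn in findings 1 through 4 can be illustrated with a minimal sketch on synthetic data. The example below is not the study's data or code: the basin effects, urbanization values, noise levels, and all function and variable names are hypothetical and chosen only for illustration. It assumes an unobserved basin-specific effect that is correlated with urbanization while the true urbanization slope is zero, and compares a pooled least squares slope with a fixed effects slope obtained from the standard within (group-demeaning) transformation.

```python
import numpy as np

def pooled_ols_slope(y, x):
    """Slope from a pooled regression of y on x with a single intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

def within_slope(y, x, groups):
    """Fixed effects (within) slope: demean y and x by group, then regress."""
    y_d = np.asarray(y, dtype=float).copy()
    x_d = np.asarray(x, dtype=float).copy()
    for g in np.unique(groups):
        mask = groups == g
        y_d[mask] -= y_d[mask].mean()
        x_d[mask] -= x_d[mask].mean()
    return float(x_d @ y_d / (x_d @ x_d))

# Synthetic panel (hypothetical values): 19 basins observed for 30 years.
rng = np.random.default_rng(0)
n_basins, n_years = 19, 30
groups = np.repeat(np.arange(n_basins), n_years)

# Unobserved basin effect, assumed correlated with the urbanization level.
basin_effect = rng.normal(0.0, 0.1, n_basins)
urban_mean = 0.3 + 2.0 * basin_effect
urban = urban_mean[groups] + rng.normal(0.0, 0.05, groups.size)

# The true urbanization slope is zero; runoff varies only with the basin effect.
runoff = 0.5 + basin_effect[groups] + rng.normal(0.0, 0.03, groups.size)

pooled = pooled_ols_slope(runoff, urban)
within = within_slope(runoff, urban, groups)
print(f"pooled slope: {pooled:.3f}")
print(f"within slope: {within:.3f}")
```

Under these assumptions the pooled slope absorbs the between-basin heterogeneity and appears clearly positive, while the within slope stays near zero. This mirrors, in a stylized way, how a pooled or cross-sectional regression can identify a relationship that the fixed effects model does not support.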

[57] The availability of distributed hydrologic and land use data taken at regular increments continues to grow as earth monitoring systems and data collection efforts are made more available to the broader scientific community. Panel regression techniques present a consistent statistical framework that can make use of these data in a novel way to help analyze land use change impacts on hydrologic response and sift through the noise often present in environmental data. The approach is widely used in econometric studies as a baseline standard, yet is grossly underutilized in hydrologic applications. We believe that these statistical methods can be very useful as a complement to physically based studies examining anthropogenic impacts at the watershed scale. Beyond land use change analyses, the approach is likely useful in any hydrologic regression context where streamflow responses are being related to predictor variables in a setting of rich data sets but uncontrollable environmental conditions, a common condition in the hydrologic sciences.

[58] Acknowledgments. We thank three anonymous reviewers for their thoughtful criticisms and advice that helped to significantly improve this article. The work of the authors was partially supported by the National Science Foundation grant number CBET-1054762 and the Department of Defense Strategic Environmental Research and Development Program (SERDP) project RC-2204.

ReferencesAllen, R., L. Pereira, D. Raes, and M. Smith (1998), Crop evapotranspira-

tion guidelines for computing crop water requirements, FAO Irrig.Drain. Pap. 56, Food and Agric. Organ., Rome.

Anselin, L. (1988), Spatial Econometrics: Methods and Models, KluwerAcad., Dordrecht.

Appleyard, S., W. Davidson, and D. Commander (1999), The effects ofurban development on the utilisation of groundwater resources in Perth,Western Australia, Int. Contrib. Hydrogeol., 21, 97–104.

Baltagi, B. H., and Y. J. Chang (1994), Incomplete panels: A comparativestudy of alternative estimators for the unbalanced one-way error compo-nent regression model, J. Econometrics, 62(2), 67–89.

Baltagi, B. H., and Q. Li (1995), Testing AR(1) against MA(1) disturbancesin an error component model, J. Econometrics, 68, 133–151.

Baltagi, B. H., S. H. Song, and W. Koh (2003), Testing panel data regres-sion models with spatial error correlation, J. Econometrics, 117, 123–150.

Baltagi, B. H., S. H. Song, B. C. Jung, and W. Koh (2007), Testing for serialcorrelation, spatial autocorrelation and random effects using panel data,J. Econometrics, 140, 5–51.

Beighley, R. E., and G. E. Moglen (2002), Trend assessment in rainfall run-off behavior in urbanizing watersheds, J. Hydrol. Eng., 7(1), 27–34.

Bhargava, A., L. Franzini, and W. Narendranathan (1982), Serial correla-tion and the fixed effects model, Rev. Econ. Stud., 49(4), 533–549.

Breusch, T. S., and A. R. Pagan (1980), The Lagrange multiplier test and itsapplications to model specification in econometrics, Rev. Econ. Stud.,47(1), 239–253.

Brown, C., R. Meeks, K. Hunu, and W. Yu (2011), Hydroclimatic risk toeconomic growth in sub-Saharan Africa, Clim. Change, 106(4), 621–647.

Burns, D., T. Vitvar, J. McDonnell, J. Hassett, J. Duncan, and C. Kendall(2005), Effects of suburban development on runoff generation in the Cro-ton River basin, New York, USA, J. Hydrol., 311(1), 266–281.

Chang, H. (2007), Comparative streamflow characteristics in urbanizingbasins in the Portland Metropolitan Area, Oregon, USA, Hydrol. Proc-esses, 21(2), 211–222.

Cheng, S., and R. Wang (2002), An approach for evaluating the hydrologi-cal effects of urbanization and its application, Hydrol. Processes, 16(7),1403–1418.

STEINSCHNEIDER ET AL.: PANEL REGRESSION TECHNIQUES IN HYDROLOGY ALTERATION STUDIES

7885

Daly, C., R. P. Neilson, and D. L. Phillips (1994), A statistical–topographicmodel for mapping climatological precipitation over mountainous ter-rain, J. Appl. Meteorol., 33, 140–158.

DeFries, R., and K. Eshleman (2004), Land-use change and hydrologicprocesses: A major focus for the future, Hydrol. Processes, 18(11),2183–2186.

Durbin, J., and G. S. Watson (1950), Testing for serial correlation in leastsquares regression I, Biometrika, 37, 409–428.

Elhorst, J. (2003), Specification and estimation of spatial panel data models,Int. Reg. Sci. Rev., 26(3), 244–268.

Elhorst, J. (2010), Spatial panel data models, in Handbook of Applied Spa-tial Analysis, edited by M. Fischer and A. Getis, pp. 377–407, Springer,Berlin.

Falcone, J. A., D. M. Carlisle, D. M. Wolock, and M. R. Meador (2010),GAGES: A stream gage database for evaluating natural and altered flowconditions in the conterminous United States, Ecology, 91(2), 621.

Ferguson, B. K., and P. W. Suckling (1990), Changing rainfall runoff rela-tionships in the urbanizing Peachtree Creek watershed, Atlanta, Georgia,Water Resour. Bull., 26(2), 313–322.

Fry, J., G. Xian, S. Jin, J. Dewitz, C. Homer, L. Yang, C. Barnes, N. Herold,and J. Wickham (2011), Completion of the 2006 National Land CoverDatabase for the conterminous United States, PE&RS, 77(9), 858–864.

Hargreaves, G. H., and Z. A. Samani (1982), Estimating potential evapo-transpiration, J. Irrig. Drain. Eng., 108(3), 225–230.

Hausman, J. A. (1978), Specification tests in econometrics, Econometrica,46(6), 1251–1271.

Homer, C., C. Huang, L. Yang, B. Wylie, and M. Coan, (2004), Develop-ment of a 2001 National Landcover Database for the United States, Pho-togramm. Eng. Remote Sens., 70(7), 829–840.

Jacobson, C. R. (2011), Identification and quantification of the hydrologicalimpacts of imperviousness in urban catchments: A review, J. Environ.Manage., 92(6), 1438–1448.

Kjeldsen, T. R. (2010), Modelling the impact of urbanization on flood fre-quency relationships in the UK, Hydrol. Res., 41(5), 391–405.

Kmenta, J. (1986), Elements of Econometrics, 2nd ed., Macmillan, NewYork.

Konrad, C. P., and D. B. Booth (2005), Hydrologic changes in urbanstreams and their ecological significance, in Effects of Urbanization onStream Ecosystems, vol. 47, Am. Fish. Soc. Symp., edited by L. R. Brown,R. H. Gray, R. M. Hughes, and M. R. Meador, pp. 157–177, Am. Fish.Soc., Bethesda, Md.

Lillard, L. A., and R. J. Willis (1978), Dynamic aspects of earning mobility,Econometrica, 46(5), 985–1012.

Leopold, L. B. (1968), Hydrology for Urban Land Planning: A Guidebookon the Hydrologic Effects of Urban Land Use, U.S. Dep. of the Inter.,U.S. Geol. Surv., Reston, Va.

Maurer, E., A. Wood, J. Adam, D. Lettenmaier, and B. Nijssen (2002), Along-term hydrologically based dataset of land surface fluxes and statesfor the conterminous United States, J. Clim., 15(22), 3237–3251.

Meyer, S. C. (2005), Analysis of base flow trends in urban streams, north-eastern Illinois, USA, Hydrogeol. J., 13(5), 871–885.

Pesaran, M. H. (2004), General diagnostic tests for cross section depend-ence in panels, Working Pap. in Econ. 435, University of Cambridge,Cambridge, U. K.

Price, K. (2011), Effects of watershed topography, soils, land use, and cli-mate on baseflow hydrology in humid regions: A review, Prog. Phys.Geogr., 35(4), 465–492.

Price, C. V., N. Nakagaki, K. J. Hitt, and R. M. Clawges (2007), Enhancedhistorical land-use and land-cover data sets of the U.S. GeologicalSurvey, U.S. Geol. Surv. Digital Data Ser. 240. [Available at http://pub-s.usgs.gov/ds/2006/240/.]

Price, K., C. R. Jackson, A. J. Parker, T. Reitan, J. Dowd, and M. Cyterski(2011), Effects of watershed land use and geomorphology on stream lowflows during severe drought conditions in the southern Blue RidgeMountains, Georgia and North Carolina, United States, Water Resour.Res., 47, W02516, doi:10.1029/2010WR009340.

Rose, S., and N. E. Peters (2001), Effects of urbanization on streamflow inthe Atlanta area (Georgia, USA): A comparative hydrological approach,Hydrol. Processes, 15(8), 1441–1457.

Schoups, G., and J. A. Vrugt (2010), A formal likelihood function forparameter and predictive inference of hydrologic models with correlated,heteroscedastic, and non-Gaussian errors, Water Resour. Res., 46,W10531, doi:10.1029/2009WR008933.

Simmons, D. L., and R. J. Reynolds (2007), Effects of urbanization on baseflow of selected south shore streams, Long Island, New York, J. Am.Water Resour. Assoc., 18(5), 797–805.

Strong, et al., (2011), Urban carbon dioxide cycles within the Salt LakeValley: A multiple-box model validated by observations, J. Geophys.Res., 116, D15307, doi:10.1029/2011JD015693.

Stedinger, J. R., and G. D. Tasker (1985), Regional hydrologic analysis. 1:Ordinary, weighted, and generalized least squares compared, WaterResour. Res., 21(9), 1421–1432.

Swamy, P., and S. S. Arora (1972), The exact finite sample properties of theestimators of coefficients in the error components regression models,Econometrica, 40, 261–275.

Tasker, G. D., and J. R. Stedinger (1989), An operational GLS model forhydrologic regression, J. Hydrol., 111(1), 361–375.

Vogel, R. M. (2011), Hydromorphology, J. Water Resour. Plann. Manage.,137(2), 147–149.

Vogelmann, J. E., S. M. Howard, L. Yang, C. R. Larson, B. K. Wylie, andJ. N. Van Driel (2001), Completion of the 1990’s National Land CoverData Set for the conterminous United States, Photogramm. Eng. RemoteSens., 67, 650–662.

Vörösmarty, C. J., P. Green, J. Salisbury, and R. B. Lammers (2000),Global water resources: Vulnerability from climate change and popula-tion growth, Science, 289(5477), 284–288.

Wagener, T., M. Sivapalan, P. A. Troch, B. L. McGlynn, C. J. Harman, H.V. Gupta, P. Kumar, P. S. C. Rao, N. B. Basu, and J. S. Wilson (2010),The future of hydrology: An evolving science for a changing world,Water Resour. Res., 46, W05301, doi:10.1029/2009WR008906.

Widmann, M., and C. S. Bretherton (2000), Validation of mesoscale precip-itation in the NCEP reanalysis using a new gridcell dataset for the north-western United States, J. Clim., 13, 1936–1950.
