
WATER RESOURCES RESEARCH, VOL. 34, NO. 12, PAGES 3369-3381, DECEMBER 1998

Climatic and physical factors that influence the homogeneity of regional floods in southeastern Australia

Bryson C. Bates Cooperative Research Centre for Catchment Hydrology and CSIRO Land and Water, Wembley, Western Australia, Australia

Ataur Rahman, Russell G. Mein, and P. Erwin Weinmann Cooperative Research Centre for Catchment Hydrology, Monash University, Clayton, Victoria, Australia

Abstract. The identification of hydrologically "homogeneous" regions is a primary step in many regional flood frequency techniques. Although statistical procedures for identifying homogeneous regions abound in the literature, there has been relatively little effort directed toward the identification of the physical and climatic basin characteristics that cause similarity in standardized flood peak distributions. We use regional flood peak and basin characteristics data for 104 basins in southeastern Australia to investigate the physical and climatic factors associated with the statistical distributions of flood peaks; the degree of geographical coherence of homogeneous regions; and the extent to which information on flood peak statistics is encapsulated by the physical and climatic characteristics of basins that are commonly used in regionalization studies. Our results show that (1) membership of a particular homogeneous region is determined largely by a combination of climatic characteristics that is related to runoff generation; (2) supergroups consisting of sites within aggregated homogeneous regions that have reasonably similar flood responses show some degree of spatial coherence; and (3) the basin characteristics that influence regional homogeneity can be identified despite the use of indices that smooth spatial and temporal variability.

1. Introduction

Regional flood frequency analysis is an important method for estimating flood peaks with specified probabilities of exceedance at ungauged sites or for enhancing estimation at gauged sites where historical records are short. It is a means of transferring flood frequency information from gauged basins to another site on the basis of similarity in basin characteristics. Many procedures rely on the concept of a "homogeneous" region within which the regional flood peaks have a common probability distribution with similar parameter values. The importance of grouping sites into statistically (and presumed physically) homogeneous regions has been established by Kuczera [1983], Lettenmaier and Potter [1985], Wiltshire [1986a, b, c], Lettenmaier et al. [1987], and Lu and Stedinger [1992].

The questions of what constitutes homogeneity in terms of flood and basin characteristics and how it can be best defined are problematic and have been the subject of statistically based research [e.g., Mosley, 1981; Acreman and Sinclair, 1986; Wiltshire, 1986a, b, c; Burn, 1988, 1989, 1990a, b; Nathan and McMahon, 1990; Zrinji and Burn, 1994, 1996; Ribeiro-Correa et al., 1995]. Most methods for judging the homogeneity of candidate regions consider the between-basin variability of one or more statistical properties of annual maximum flood series. These include the conventional moment or L moment equivalents of the coefficient of variation [e.g., Wiltshire, 1986a, b; Hosking and Wallis, 1993, 1997; Fill and Stedinger, 1995] and other statistics [e.g., Bhaskar and O'Connor, 1989; Chowdhury et al., 1991] or of standardized flood quantiles [e.g., Dalrymple, 1960; Lu and Stedinger, 1992]. Many of these procedures are based on the assumption that annual maximum flood series follow a particular probability distribution.

Copyright 1998 by the American Geophysical Union.

Paper number 98WR02521. 0043-1397/98/98WR-02521$09.00

Although statistical procedures for assessing the degree of regional homogeneity abound in the literature, there has been little effort directed toward the identification of the physical and climatic characteristics of basins that lead to similarity in regional flood frequency [Potter, 1987; Gupta et al., 1994; Bobee and Rasmussen, 1995]. To address this need, we describe an empirical study of the relationship between flood statistics and basin characteristics in southeastern Australia. The study provides an opportunity to investigate flood frequency in a large geographical region that has climatic regimes ranging from subalpine to semiarid and terrain that varies from rugged mountain ridges to flat riverine plains.

2. Methods

2.1. Overview

We examined annual maximum flood series and basin characteristics data for 104 gauged basins in southeastern Australia. The locations of the gauged sites are illustrated in Figure 1. The basins are not subject to extensive urbanization or artificial storage, and there were no major land use changes over their periods of record. Drainage areas range from 3 to 956 km²: the first, second, and third quartiles are 128, 308, and 509 km², respectively. Thus the range of drainage areas for the majority of the basins is small.

We identified candidate homogeneous regions on the basis of similarity in the statistics of standardized annual flood series.


3370 BATES ET AL.: CLIMATIC AND PHYSICAL FACTORS

[Figure 1 map: latitude axis 34°S to 38°S; longitude axis 142°E to 150°E; scale bar 0 to 200 km; elevation classes 0-200 m, 200-500 m, 500-1000 m, and over 1000 m; labeled features include the Otway and Strzelecki Ranges.]

Figure 1. Location of 104 gauged sites in southeastern Australia. Numerals denote group membership. The letter T denotes gauged sites used for split-sample validation. (Schematic topographic map after Duncan [1982] and Wright [1989]. Copyright by Australian Bureau of Meteorology; reprinted with permission).

This involved an ad hoc partition of the sample L moment space. We used L moments because they are more robust than conventional moments to extremes in the data and yield parameter estimates that are more efficient than maximum likelihood estimates [Hosking, 1990]. We used principal component analysis and cluster analysis to reveal any "natural groupings" in the basin characteristics variables and to assess the degree of agreement between these groups and the candidate regions; canonical variates (discriminant) analysis to highlight the differences existing between the mean basin characteristics of the candidate regions and to reveal the combination of characteristics that makes the best discrimination; canonical correlation analysis to examine the degree of linear association between flood and basin characteristics and to identify the combinations of basin characteristics that contribute most to this association; and tree-based modeling to assess the extent to which membership of the candidate regions is determined by differences in linear combinations of basin characteristics.
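The L moment ratios referred to above can be illustrated with a short Python sketch. It computes the sample mean, L CV, L CS, and L kurtosis of one annual maximum series from unbiased probability-weighted moments, which is a standard estimator; the paper does not say which estimator was used, so this choice is an assumption. The function name and the flow values are hypothetical.

```python
from math import comb


def sample_l_moment_ratios(flows):
    """Return (mean, L-CV, L-CS, L-kurtosis) for one annual maximum series.

    Uses unbiased probability-weighted moments b_0..b_3 (an assumed,
    standard estimator; not necessarily the one used in the study).
    """
    x = sorted(flows)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    # b_r = (1/n) * sum_j [C(j, r) / C(n-1, r)] * x_(j), j = 0..n-1, ascending
    b = [
        sum(comb(j, r) * x[j] for j in range(n)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2 / l1, l3 / l2, l4 / l2


# Hypothetical annual maximum flows (m^3/s); not data from the study.
flows = [12.0, 45.0, 30.0, 8.0, 110.0, 22.0, 64.0, 17.0, 39.0, 51.0]
mean, l_cv, l_cs, l_kurt = sample_l_moment_ratios(flows)
print(f"mean={mean:.2f}  L-CV={l_cv:.3f}  L-CS={l_cs:.3f}  L-kurtosis={l_kurt:.3f}")
```

Because L CV, L CS, and L kurtosis are ratios, they are unchanged when a series is divided by an index flood, which is why they suit comparisons of standardized flood series.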

2.2. Description of Study Area

2.2.1. Topography. The study area comprises the southeastern corner of the Australian mainland, between latitudes 33° and 39° south and longitudes 141° and 150° east, and encompasses all of the summits in Australia that exceed 1650 m (Figure 1). The area is divided into two principal drainage divisions (Murray-Darling and South-East Coast) by a series of mountain ridges of considerably varying relief known as the Great Dividing Range. The Divide acts as an important climatic divide. Streams to the north and west of the Divide form part of the Murray-Darling Basin, and the southern streams flow southwards to the South Pacific and Southern Oceans. The Australian Alps include the Snowy Mountains and adjacent parts of the Divide to the southwest. Snow is usually confined to elevations above 600 m, but there is no permanent snowpack. Three isolated features, the Grampians on the far western edge of the Divide, the Otway Ranges, and the Strzelecki Ranges to the south, have a marked influence on local rainfall despite their moderate elevations (500-1200 m, 500-700 m, and 500-700 m, respectively). There are extensive plains in the west and northwest.

2.2.2. Climatology. The weather of the study region is driven largely by the west-to-east passage of extratropical high-pressure systems [Duncan, 1982]. The tracks of these systems cross the continent from latitudes 30° to 35° south during May to October. Strong cold fronts and associated depressions from the Southern Ocean penetrate the area during this time of year. These bring winter-spring rainfalls to the western parts and to those areas with a westerly aspect, and heavy rainfalls and snowfalls at higher elevations in the east. Flood events are usually associated with wet basins in late winter to early spring (P.M. Fleming, personal communication, 1996).

From November to April the tracks of the high-pressure systems cross from latitudes 35° to 40° south. These bring dry air from the interior, but tropical air masses, sometimes with significant precipitable water, can penetrate southwards to the rear of the high-pressure systems. This is a source of summer thunderstorms and rainfall to areas north of the Divide. On the east coast, moist subtropical air masses can be brought onshore by low-pressure systems in the Tasman Sea, generating orographic rainfall on the uplands to the north. Thus the rainfall in this area tends to be relatively uniform throughout the year.

Mean annual rainfall for the study basins varies from 440 to 1420 mm: the first and third quartiles are 690 and 1060 mm. Basins with mean annual rainfalls exceeding 1000 mm are located to the southwest of the Grampians, to the south and southwest of the Otway Ranges, to the southwest and southeast of the Strzelecki Ranges, around the part of the Australian Alps that lies south of the Murray River, to the north of the Snowy Mountains, and to the south of the Kybeyan Range (Figure 1). The mean number of rain days per year varies from 212 in the vicinity of the Otways to 48 between the Snowy Mountains and the Kybeyan Range. Basins with the highest number of mean annual rain days are concentrated largely around the coast to the south and west of the Otways and the hinterland, and areas to the immediate west and southeast of Melbourne. Mean annual Class A pan evaporation varies from 1000 mm in the Australian Alps to 1700 mm on the east coast. Basins associated with the Otway and Strzelecki Ranges stand out as isolated areas with relatively low evaporation (about 1200 mm).

2.2.3. Flood and basin characteristics data. The length of the annual flood series for the study basins varies between 24 and 59 years. Sixty basins have record lengths ranging from 24 to 33 years; 12 basins have record lengths between 43 and 59 years. We assessed the stability of the sample estimates of the L moment ratios analogous to the coefficients of variation (L CV) and of skewness (L CS) for the 10 sites with the longest records by computing the estimates at different record lengths ranging from 15 years to the length of record. In this case the maximum percentage deviation in the sample estimates was 15% or less for each site when the record length was 25 years or more. For the special case of the generalized extreme value distribution, estimates of the standard errors of L CV and L CS can be obtained from Chowdhury et al. [1991, Table 1]. If this distribution were applicable to the data, the estimated standard errors would be much greater than 15%.
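The stability check described above can be sketched in Python as follows. The text does not state the reference value for the percentage deviation, so this sketch assumes it is the full-record estimate and that shorter records are the first m years of the series; both are assumptions. Only L CV is shown, and L CS could be treated in the same way. The function names and the flow series are hypothetical.

```python
def l_cv(flows):
    """Sample L-CV (l2 / l1) from unbiased probability-weighted moments."""
    x = sorted(flows)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    return (2 * b1 - b0) / b0


def max_pct_deviation(series, min_len=25):
    """Largest absolute percentage deviation of the truncated-record L-CV
    from the full-record L-CV, over record lengths min_len..len(series).

    Assumption: truncated records are the first m years of the series.
    """
    full = l_cv(series)
    return max(
        abs(l_cv(series[:m]) - full) / abs(full) * 100.0
        for m in range(min_len, len(series) + 1)
    )


# Hypothetical 30-year annual maximum series (m^3/s); not data from the study.
flows = [
    31.0, 12.0, 77.0, 45.0, 9.0, 58.0, 23.0, 140.0, 36.0, 18.0,
    64.0, 27.0, 95.0, 14.0, 41.0, 52.0, 20.0, 83.0, 33.0, 11.0,
    69.0, 25.0, 48.0, 16.0, 102.0, 38.0, 29.0, 55.0, 21.0, 73.0,
]
print(f"max deviation for records of 25+ years: {max_pct_deviation(flows):.1f}%")
```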

We used p = 12 basin characteristics that are potentially relevant and readily obtainable: mean annual rainfall (RAIN, mm); mean annual rain days (RDAYS); mean annual Class A pan evaporation (EVAP, mm); rainfall intensity of 12-hour duration and 2-year average recurrence interval (I12:2, 3.3-8.7 mm h⁻¹); drainage area (AREA, km²); lemniscate shape (SHAPE, 0.19-1.69), a measure of the rotundity of a basin; slope of the central 75% of the mainstream (SLOPE, 0.8-55.3 m km⁻¹); river bed elevation at the gauging station (ELEV, 20-1075 m); maximum elevation difference in the basin (RELIEF, 83-1720 m); stream density (SDEN, 0.46-2.93 km km⁻²); fraction of basin covered by medium to dense forest (FOREST, 0-1.0); and fraction quaternary sediment area (QSA, 0-0.99). QSA is an indicator of the extent of alluvial deposits and has been found to be a useful indicator of floodplain extent in the study area. RAIN, RDAYS, EVAP, and I12:2 were defined at the basin centroid. We used I12:2 because it was found to be highly correlated with rainfall intensities with rarer exceedances (e.g., cor (I12:2, I12:50) = 0.8) and it is more reliably estimated.

2.3. Approach

2.3.1. Identification of candidate homogeneous regions. Hosking and Wallis [1993] note that a potential problem with distribution-based tests for homogeneity is that when the hypothesis of homogeneity is rejected, the question of whether the region is heterogeneous or whether it is homogeneous but has some other probability distribution remains doubtful. A similar problem exists with tests based on a particular flood quantile in that the at-site distributions may exhibit sizeable and systematic departures at lower annual exceedance probabilities [Benson, 1962].

We used the discordancy, heterogeneity, and goodness-of-fit measures of Hosking and Wallis [1993] to screen the standardized annual flood series data and judge the degree of homogeneity of candidate regions. This facilitated assessment of the assumption that the floods within a candidate region have a common parent distribution (or "growth curve"), or more specifically, whether the distributional parameters related to the three L moment ratios analogous to the coefficients of variation, skewness, and kurtosis (L CV, L CS, and L kurtosis) are constant within the region. The discordancy measure ($D_i$) is used to indicate those basins with L moment ratios that differ markedly from the group as a whole. The heterogeneity measure ($H$) is based on the spread of L CV values among the sites within a candidate region. Hosking and Wallis [1993] declare a basin to be "unusual" if $D_i \ge 3$ and a region to be "acceptably homogeneous" if $H < 1$, "possibly heterogeneous" if $1 \le H < 2$, and "definitely heterogeneous" if $H \ge 2$. They suggest that these limits should only be treated as useful guides. The $H$ statistic is based on samples generated from a four-parameter kappa distribution that is fit to the group average L moments. The kappa distribution includes as special cases the generalized logistic, generalized extreme value, and generalized Pareto distributions. Thus it is less restrictive than homogeneity tests based on two- or three-parameter distributions. Hosking and Wallis [1993] constructed two other heterogeneity measures, one based on L CV and L CS and the other on L CS and L kurtosis, but found that they had insufficient power to discriminate between homogeneous and heterogeneous regions. We used these subsidiary measures for confirmatory purposes only.
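The screening thresholds above can be illustrated with a Python sketch. It assumes one commonly stated form of the discordancy measure, $D_i = (N/3)(u_i - \bar{u})^T A^{-1} (u_i - \bar{u})$ with $A = \sum_i (u_i - \bar{u})(u_i - \bar{u})^T$, where $u_i$ holds the L CV, L CS, and L kurtosis of site $i$; the exact scaling used in the study is not given in the text. The $H$ statistic itself requires simulation from a fitted kappa distribution and is not reproduced here; only its interpretation is encoded. All function names and site values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a*y = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def discordancy(ratios):
    """ratios: one (L-CV, L-CS, L-kurtosis) tuple per site; returns D_i per site."""
    n = len(ratios)
    mean = [sum(u[k] for u in ratios) / n for k in range(3)]
    dev = [[u[k] - mean[k] for k in range(3)] for u in ratios]
    # A = sum of outer products of the deviations from the group mean
    a = [[sum(d[j] * d[k] for d in dev) for k in range(3)] for j in range(3)]
    return [n / 3.0 * sum(x * y for x, y in zip(d, solve(a, d))) for d in dev]


def classify_h(h):
    """Interpretation of H following the limits quoted in the text."""
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"


# Hypothetical L moment ratios for six sites; not data from the study.
sites = [(0.30, 0.20, 0.15), (0.35, 0.28, 0.18), (0.28, 0.15, 0.12),
         (0.40, 0.35, 0.25), (0.33, 0.22, 0.10), (0.50, 0.10, 0.30)]
for i, d in enumerate(discordancy(sites), start=1):
    print(f"site {i}: D = {d:.2f}{'  (unusual)' if d >= 3 else ''}")
print(classify_h(0.42))
```

With this scaling the $D_i$ values of a group average to 1, so large values can only occur in groups with enough sites.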

The Monte Carlo experiments of Hosking and Wallis [1988] suggest that there is little improvement in the accuracy of flood quantile estimates to be gained by using regions comprising more than about 20 sites. Thus we set a tentative upper limit on the number of candidate regions for the 104 basins at 5. Wiltshire [1986c], Burn [1989], and Bhaskar and O'Connor [1989] used flood statistics as clustering variables. However, we used a manual approach for subdividing the L moment space because of the lack of a natural distance metric and clustering method for L moment ratios.

2.3.2. Split-sample validation. To provide for an independent test of the inferences made herein, 10 basins (designated T1 to T10) were selected using a random number generator and put aside. The test basins are well distributed with respect to geographical location (Figure 1) but not with respect to candidate regions (section 3.3). Six of the remaining basins were ultimately discarded, leaving n = 88 basins for the initial analysis (section 3.1).


2.4. Multivariate Statistical Analysis

Inferences in multivariate statistics generally require the assumption of multivariate normality. In this study, variables that exhibited departures from normality were transformed prior to analysis. Thus the square root transformation was applied to AREA and RELIEF, the log transformation to EVAP and SLOPE, the generalized logistic transformation to FOREST and QSA, and simple power transformations to ELEV, I12:2, and SHAPE. Transformed variables are hereafter prefixed by "T." Although univariate normality does not guarantee multivariate normality, it is usually a reasonable assumption in practice [Jobson, 1992]. Bivariate scatter plots indicated that the transformations captured the few nonlinear relationships that were apparent among the variables. As the original and transformed variables are not commensurable in terms of magnitude or variability, we followed Morrison [1990, p. 386] and standardized the variables to zero mean and unit variance prior to analysis. Although this can dilute the differences between groups with respect to the variables that are the best discriminators, we could not claim any a priori knowledge of the relative contributions of the variables to regional homogeneity.
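A minimal Python sketch of the transform-then-standardize step follows. It applies the square root transformation to AREA and the log transformation to EVAP and then rescales each to zero mean and unit variance. The five AREA values are the minimum, quartiles, and maximum quoted in section 2.1 and the EVAP values are the three figures quoted in section 2.2.2, used here only as convenient inputs; the use of the sample (n - 1) standard deviation and of the natural logarithm are assumptions, as are the function and variable names.

```python
import math
import statistics


def standardize(values):
    """Rescale values to zero mean and unit variance (sample variance assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 divisor; the paper does not specify
    return [(v - mean) / sd for v in values]


area_km2 = [3.0, 128.0, 308.0, 509.0, 956.0]  # range and quartiles from the text
evap_mm = [1000.0, 1200.0, 1700.0]            # values quoted in the text

t_area = standardize([math.sqrt(a) for a in area_km2])  # square root transform
t_evap = standardize([math.log(e) for e in evap_mm])    # log transform (natural log assumed)

print([round(v, 3) for v in t_area])
print([round(v, 3) for v in t_evap])
```

Standardizing after transformation puts all twelve characteristics on a common scale, so that no variable dominates a distance or variance calculation merely because of its units.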

2.4.1. Principal component analysis. Principal compo- nent analysis transforms an original set of variables to a new set of mutually uncorrelated variables (principal components) that are arranged in decreasing order of importance. The first prin- cipal component is the linear combination of the original vari- ables that captures as much of the variation in the original data as possible. The second component captures the maximum variability that is uncorrelated with the first component, and so on. The values of the principal components for each observa- tion are called principal component scores. Score plots can be useful for identifying groups of observations.

2.4.2. Cluster analysis. The aim of cluster analysis is to allocate objects to a set of mutually exclusive and exhaustive groups such that group members are similar to each other and dissimilar to members of other groups. Almost all methods require a measure of similarity or dissimilarity between objects. We used three hierarchical agglomeration methods: Ward's method, average linkage between groups, and average linkage within groups [Norusis, 1989; Kaufman and Rousseeuw, 1990]. Let the ($n \times p$) matrix of standardized basin characteristics ($X$) be partitioned so that $X^T = (x_1, \ldots, x_n)$, where $x_i^T = (x_{i1}, \ldots, x_{ip})$ and superscript $T$ denotes the transpose of a vector or matrix. The dissimilarity measures used were the squared Euclidean distance from basin $r$ to basin $s$,

$$d_{rs} = (x_r - x_s)^T (x_r - x_s) \tag{1}$$

and the cosine metric defined by

$$\cos \theta_{rs} = \frac{x_r^T x_s}{\left[ (x_r^T x_r)(x_s^T x_s) \right]^{1/2}} \tag{2}$$

where $\theta_{rs}$ is the angle between $x_r$ and $x_s$. The 10 test basins were used to assess the veracity of the clusters found.
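The two measures in equations (1) and (2) can be written directly in Python. The sketch below takes each basin as a list of standardized characteristics; the function names and the example vectors are hypothetical.

```python
import math


def squared_euclidean(xr, xs):
    """Equation (1): squared Euclidean distance between basins r and s."""
    return sum((a - b) ** 2 for a, b in zip(xr, xs))


def cosine(xr, xs):
    """Equation (2): cosine of the angle between the two basin vectors."""
    dot = sum(a * b for a, b in zip(xr, xs))
    norm = math.sqrt(sum(a * a for a in xr) * sum(b * b for b in xs))
    return dot / norm


# Hypothetical standardized characteristics for two basins (p = 3 for brevity).
basin_r = [0.5, -1.2, 0.3]
basin_s = [0.1, -0.8, 1.1]
print(squared_euclidean(basin_r, basin_s), cosine(basin_r, basin_s))
```

The squared Euclidean distance responds to differences in magnitude, whereas the cosine depends only on the direction of the two vectors, so the two measures can rank pairs of basins differently.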

Several workers have advocated the weighting of basin attributes in distance metrics according to their perceived importance in defining similarity [e.g., Nathan and McMahon, 1990; Zrinji and Burn, 1994]. We chose not to estimate weights on the basis of subjective judgment, or the results from stepwise regression analyses involving possibly heterogeneous samples, or by iteratively refining weights to improve the similarity of the standardized flood peak statistics associated with each cluster.

Although the set of clustering techniques used herein is not exhaustive, little would be gained by trying all of the available methods and dissimilarity measures. Our choice incorporates the agglomeration methods and distance metrics that Nathan and McMahon [1990] found to be the most successful for regionalization in southeastern Australia, and the distance metric used by Acreman and Sinclair [1986] and Zrinji and Burn [1994, 1996]. Nevertheless, we recognize that (1) there is no completely satisfactory way of defining a cluster (i.e., there is no obvious distance metric or agglomeration method for the regional flood frequency problem); (2) there is no formal measure of an "optimal" number of clusters; (3) the application of different clustering methods to the same data set often produces structures that are substantially different; and (4) clustering methods implicitly impose a structure on the data even if it is not possible to classify objects in a useful way [Chatfield and Collins, 1980; Nathan and McMahon, 1990; Everitt, 1993].

2.4.3. Canonical variates analysis. Although principal component analysis and cluster analysis enable searches for any evidence of grouping among the $n$ basins, their starting point is equivalent to a null hypothesis of no structure in the basin characteristics data. Canonical variates analysis looks for linear combinations $y_i = a^T x_i$ that maximize the ratio of the between-groups variance to the within-groups variance. The maximum dimensionality for a canonical variate representation is $s = \min(p, g - 1)$, where $g$ denotes the number of presumed groups. The discriminant coefficients $a_1, a_2, \ldots, a_s$ give the directions in the $p$-dimensional data space along which the between-group variability relative to within-group variability is first, second, ..., $s$th greatest and are mutually orthogonal. The $y_j^T = (a_j^T x_1, a_j^T x_2, \ldots, a_j^T x_n)$, ($j = 1, \ldots, s$), are termed the first, second, ..., $s$th canonical variates (or canonical discriminant functions or linear discriminants). The significance of the canonical variates was tested using a likelihood ratio test, based on the assumption that the distribution of $X$ is multivariate normal [Krzanowski, 1988; Jobson, 1992].

2.4.4. Canonical correlation analysis. Canonical correla- tion analysis focuses on the correlations between linear com- binations of the q response (L moment) variables (Z) and linear combinations of thep explanatory (basin characteristics) variables (W). The pair of linear combinations having the high- est correlation (Z• and W•) are determined first. The next pair to be considered (Z2 and W2) has the largest correlation among all pairs that are uncorrelated with the first pair, and so on. The pairs of linear combinations are called canonical vari- ables, and their correlations are called canonical correlations. For the data at hand the number of canonical variables and

correlations is min (p, q) = q = 3. Thus the analysis attempts to condense a high-dimensional relationship between two sets of variables into a few pairs of canonical variables. The signif- icance of the canonical correlations was tested using a likeli- hood ratio test, on the basis of the assumption that the distri- butions of the response and explanatory variables are multivariate normal [Kshirsagar, 1972; Johnson and Wichern, 1988; Jobson, 1992].

Figure 2. L moment ratio diagram (L CV versus L CS) for 104 gauged basins. Numerals denote group membership for 94 basins. The letter T denotes 10 gauged sites used for split-sample validation. Broken lines denote boundaries of candidate "homogeneous" groups.

2.4.5. Tree-based modeling. Tree-based modeling is an exploratory technique for uncovering structure in data consisting of a set of classification or predictor variables and a single numeric or factor response variable, and for expressing knowledge and assisting decision making [Clark and Pregibon, 1992; Venables and Ripley, 1994]. We used tree-based models to uncover structure in canonical variate and canonical variable plots. The procedures use binary recursive partitioning successively to divide the data space, splitting it along the coordinate axes of the predictor variables to give increasingly homogeneous groups and hence the maximal separation of the response variable until it is infeasible to continue. For a continuous predictor variable $x_i$ the splits are of the form $x_i < t$ versus $x_i \ge t$. If the response variable is a factor (such as group number), the partitioning leads to a set of decision or classification rules in the form of a binary tree. The top node is called the "root," and a terminal node a "leaf." The measure of node heterogeneity is the "deviance." Let $n_i$ denote the number of cases assigned to the $i$th node of the tree and $n_{ik}$ the number of those cases that belong to class $k$. The deviance for the tree is defined as

$$D = \sum_i D_i \tag{3}$$

where

$$D_i = -2 \sum_k n_{ik} \log p_{ik} \tag{4}$$

in which the unknown probabilities are estimated from the node proportions ($p_{ik} = n_{ik}/n_i$). (A perfectly homogeneous node has zero deviance.) The tree construction process chooses the next split by taking the maximum reduction in deviance over all allowable splits of every leaf. It continues until the leaves of the tree are sufficiently homogeneous (i.e., their deviances are a small fraction of root node deviance) or contain too few observations. The stopping criterion and its threshold value can be set by the user. An important issue is the extent to which a fitted tree-based model can be "pruned" or "shrunken" without sacrificing goodness of fit. This involves a trade-off between a parsimonious model and accurate prediction. We determined an appropriate tree size through cross validation [Clark and Pregibon, 1992].
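The deviance in equations (3) and (4) can be computed with a few lines of Python. In this sketch each node is represented by its list of class counts; the root counts (20, 20, 20, and 28) are the sizes of groups 1 to 4 in Table 1, while the candidate split is hypothetical. The function names are illustrative only, and classes with zero count are skipped because they contribute nothing to the sum.

```python
import math


def node_deviance(class_counts):
    """Equation (4): -2 * sum_k n_ik * log(p_ik), with p_ik = n_ik / n_i."""
    n_i = sum(class_counts)
    return -2.0 * sum(n * math.log(n / n_i) for n in class_counts if n > 0)


def tree_deviance(leaves):
    """Equation (3): sum of the node deviances over the leaves of the tree."""
    return sum(node_deviance(counts) for counts in leaves)


root = [20, 20, 20, 28]  # group sizes from Table 1
# Hypothetical first split of the root into two leaves.
left, right = [18, 15, 3, 2], [2, 5, 17, 26]

reduction = node_deviance(root) - tree_deviance([left, right])
print(f"root deviance = {node_deviance(root):.2f}")
print(f"reduction from the candidate split = {reduction:.2f}")
```

A tree-growing procedure would evaluate this reduction for every allowable split and keep the largest one, as described in the text.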

3. Results

3.1. Identification of Candidate "Homogeneous" Regions

Figure 2 shows the L CV versus L CS plot for the 104 basins and reports the results obtained during the identification process. The H statistic indicates a very high degree of heterogeneity when the 94 sites to be used in the multivariate analyses are treated as a single region. Investigation of the six basins with the lowest L CV and L CS values (group 0 basins in Figure 2) revealed that they are unusual in that they are either subject to artificial flow diversions or have unusually large natural swamps relative to their drainage areas. Two of these basins had the highest values of the discordancy measure (3.17 and 3.50). On these grounds we removed the six basins from the sample and recalculated the discordancy and heterogeneity measures. This led to a marginal reduction in the H statistic (Table 1). Only one basin stood out as being mildly discordant, with $D_i = 3.74$. Nevertheless, the basin was retained because there is no evidence of gross errors in its flood data. We constructed separate maps of the sample L moment ratios for the study area. These maps did not reveal any notable spatial patterns or trends in the spatial distributions of L CV, L CS, and L kurtosis.

Following Hosking and Wallis [1997, section 9.2.3], we initially divided the set of 88 basins into four groups of approximately equal size according to their drainage areas. We named these groups A-D (Table 1). L moment ratio plots indicated that the groups were spread randomly across the ranges of L CV, L CS, and L kurtosis. Two sites stood out as being mildly discordant (D_i = 3.08 and 3.27), but the H statistics indicate that the regions are "definitely heterogeneous." Adjustments to the area class limits did not lead to a marked improvement in regional homogeneity.
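The discordancy values quoted above can be reproduced in principle with the sketch below. It assumes the Hosking and Wallis form D_i = (N/3) (u_i - u_bar)' A^-1 (u_i - u_bar), where u_i is the vector of L CV, L CS, and L kurtosis at site i and A is the sum of outer products of the deviations; `invert_3x3` and `discordancy` are hypothetical helper names.

```python
def invert_3x3(m):
    """Invert a 3 x 3 matrix by cofactors (adjugate divided by determinant)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular matrix: sites are coplanar in L moment space")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def discordancy(sites):
    """sites: list of (L CV, L CS, L kurtosis) tuples; returns one D_i per site."""
    n = len(sites)
    mean = [sum(u[k] for u in sites) / n for k in range(3)]
    dev = [[u[k] - mean[k] for k in range(3)] for u in sites]
    # A is the 3 x 3 sum of outer products of the deviations
    a = [[sum(d[r] * d[c] for d in dev) for c in range(3)] for r in range(3)]
    inv = invert_3x3(a)
    result = []
    for d in dev:
        quad = sum(d[r] * inv[r][c] * d[c] for r in range(3) for c in range(3))
        result.append(n * quad / 3.0)
    return result
```

A useful check on any implementation of this form is that the D_i values average to 1 over the sites in a region.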

3374 BATES ET AL.: CLIMATIC AND PHYSICAL FACTORS

Table 1. Details of Candidate Regions

Region               Area, km^2   Number of Sites   H Statistic*
All sites            3-956        94                11.79
Subset               3-956        88                 8.81
Group A              0-150        25                 6.89
Group B              151-300      18                 3.64
Group C              301-500      21                 3.85
Group D              501-956      24                 7.12
Group 1†             88-865       20                -1.76
Group 2†             3-622        20                -1.48
Group 3†             14-943       20                 0.99
Group 4†             15-956       28                 0.42
Supergroup 1 + 4†    15-956       48                 3.72
Supergroup 2 + 3†    3-943        40                 1.89

*Heterogeneity measure of Hosking and Wallis [1993].
†Groupings based on sample L moment ratios.

Experiments with various regions in L moment space led eventually to the partition illustrated in Figure 2. The four regions are "acceptably homogeneous," encompass all of the available sites, exhibit small between-basin variations in L CS as well as L CV, and each contain 20 basins or more. (Alternative partitions with two, three, or four groups failed to meet all of these criteria.) We named the four regions groups 1, 2, 3, and 4 (Table 1). The values of the subsidiary heterogeneity measures were less than 1 for all four groups and D_i < 3 for every basin. The best fit distributions (as determined by the goodness-of-fit statistic of Hosking and Wallis [1993]) for groups 1-4 were the generalized Pareto, generalized logistic, generalized normal, and generalized Pareto, respectively. The percent differences between the estimates of the three distributional parameters for groups 1 and 4 were 33%, 50%, and 108%, respectively. The regional flood quantiles obtained from the four best-fit distributions show marked differences for exceedance probabilities less than 0.10.

We make no claim that this partition of the L moment space is optimal in some sense or unique. For example, the group 3 site with the largest L CV and L CS could be moved to group 2, the group 1 site with the lowest L CV and L CS could be moved to group 2, and the group 2 site closest to the group 1 boundary could be moved to group 1. If these changes are made independently, the H statistics for the target groups change by -18%, 11%, and -4.5%, respectively, and D_i ≤ 3 for every basin.

We recognize that the partition is not consistent with the intended use of the H statistic. The use of this statistic assumes that the sites cluster with respect to basin characteristics rather than sample L moment ratios. Thus the H statistic intervals specified by Hosking and Wallis [1993] do not strictly apply. Rather, our partition is used as a device for assessing the extent to which differences in L CV and L CS are associated with differences in basin characteristics.

3.2. Statistical Analyses

3.2.1. Principal component analysis. The first four components give a reasonable summary of the transformed basin characteristics data as they account for 71% of its variance. Seven components are needed to explain 90% of the variation. Thus the analysis is only mildly successful in reducing the dimensionality of the data.

Table 2 lists the variable loadings for the first four components. For the first principal component, there is a moderate positive loading on TQSA, and moderate negative loadings on RAIN, TSLOPE, TRELIEF, and TFOREST. This component appears to measure a contrast between the extent of alluvial deposits (or by inference floodplain extent) and measures of terrain ruggedness and rainfall. The second component has a relatively high positive loading on RDAYS, a moderate positive loading on RAIN, and moderate negative loadings on TAREA and TRELIEF. This component appears to measure a contrast between basin rainfall regime and the three-dimensional nature of basin shape. Interpretation of the third component is not obvious. The fourth component has a moderate positive loading on SDEN and a relatively high negative loading on TSHAPE. This component appears to measure a contrast between stream density and basin shape.
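The interpretation of the first component as a contrast can be made concrete with a small sketch that forms a component score as a weighted sum of the Table 2 loadings. It assumes that each transformed variable has been standardized to zero mean and unit variance before scoring, which this section does not state explicitly; `PC1_LOADINGS` and `pc_score` are hypothetical names.

```python
# First-component loadings transcribed from Table 2
PC1_LOADINGS = {
    "RAIN": -0.349, "RDAYS": 0.000, "TEVAP": 0.217, "TI12:2": 0.041,
    "TAREA": 0.142, "TSHAPE": -0.118, "TSLOPE": -0.390, "TELEV": -0.236,
    "TRELIEF": -0.330, "SDEN": -0.198, "TFOREST": -0.394, "TQSA": 0.337,
}


def pc_score(standardized, loadings=PC1_LOADINGS):
    """Weighted sum of standardized variables; missing variables count as 0
    (that is, as equal to the sample mean)."""
    return sum(w * standardized.get(name, 0.0) for name, w in loadings.items())


# A hypothetical basin one standard deviation above the mean in alluvial
# extent and one below the mean in rainfall, slope, relief, and forest cover
flat_alluvial = {"TQSA": 1.0, "RAIN": -1.0, "TSLOPE": -1.0,
                 "TRELIEF": -1.0, "TFOREST": -1.0}
score = pc_score(flat_alluvial)
```

Under these assumptions a flat, dry, alluvial basin scores strongly positive and a steep, wet, forested basin scores strongly negative, which is the contrast described in the text.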

Pairwise scatter plots of the first to fourth principal component scores did not indicate any distinct groupings. Figures 3a and 3b show the distributions of the first and second principal component scores. Although the overall differences between the distributions are slight, the median scores for group 2 and group 3 basins appear to be similar to each other and dissimilar to those for groups 1 and 4. The median score for group 2 basins in Figure 3a is different to the median score for either the group 1 or group 4 basins at roughly the 0.05 significance level. Similarly, the median score for group 2 basins in Figure 3b is different to the median score for group 1 basins at roughly the 0.05 level. These results suggest the aggregation of groups 1 and 4 to form supergroup 1 + 4, and groups 2 and 3 to form supergroup 2 + 3. Figure 2 shows that supergroup 1 + 4 basins have higher L CV relative to L CS than supergroup 2 + 3 basins. The H statistics for supergroups 1 + 4 and 2 + 3 suggest that they are "definitely heterogeneous" and "possibly heterogeneous," respectively (Table 1). Thus supergroup 2 + 3 is not far from being homogeneous.
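The "roughly 0.05 level" comparisons of medians rest on whether box plot notches overlap. The sketch below assumes the common notch half-width of 1.57 × IQR / sqrt(n) about the median; the constant and the quartile convention are assumptions, since the plotting software used for Figure 3 is not described here, and `notch_interval` and `medians_differ` are hypothetical names.

```python
import math
import statistics


def notch_interval(values):
    """Approximate notch limits: median +/- 1.57 * IQR / sqrt(n)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width


def medians_differ(a, b):
    """True when the two notches do not overlap (rough 0.05 level test)."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return hi_a < lo_b or hi_b < lo_a
```

With group sizes of 20 to 28 basins the notches are wide, which is one reason the text describes several of the comparisons as marginal.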

Table 2. Principal Component Loadings

                        Component
Variables*      1        2        3        4
RAIN         -0.349    0.340    0.000    0.000
RDAYS         0.000    0.628   -0.274    0.000
TEVAP         0.217   -0.285    0.527    0.104
TI12:2        0.041    0.000   -0.123    0.000
TAREA         0.142   -0.394   -0.582   -0.281
TSHAPE       -0.118    0.000    0.160   -0.741
TSLOPE       -0.390    0.100    0.286    0.210
TELEV        -0.236   -0.239    0.159   -0.188
TRELIEF      -0.330   -0.340   -0.361    0.166
SDEN         -0.198   -0.101    0.000    0.424
TFOREST      -0.394   -0.115    0.000    0.000
TQSA          0.337    0.187    0.000    0.238

*TAREA = AREA^0.5; TELEV = ELEV^0.4; TEVAP = log (EVAP); TI12:2 = I12:2^-2; TFOREST = log [(FOREST - 0.005)/(1 - (FOREST - 0.005))]; TRELIEF = RELIEF^0.5; TQSA = log [(QSA - 0.005)^0.5/(1 - (QSA - 0.005)^0.5)]; TSHAPE = SHAPE^-0.[?] (final digit of the exponent illegible); TSLOPE = log (SLOPE).

Figure 3. Notched box plots illustrating distributions of (a) first principal component; (b) second principal component; (c) first canonical variate; and (d) first basin characteristics canonical variable, by candidate group. Edges of box mark upper and lower quartiles, and horizontal white bar depicts median. Distance between quartiles is the interquartile range (IQR). If notches about two medians do not overlap, display indicates that medians are different at a rough 0.05 level of significance. End points of whiskers denote either data extremes (no adjacent horizontal lines), or adjacent values defined by upper quartile plus 1.5 × IQR and lower quartile minus 1.5 × IQR. Inverted end point near upper or lower quartile denotes notch that lies above or below that quartile, respectively. Horizontal lines mark data that lie beyond adjacent values.

3.2.2. Cluster analysis. Table 3 summarizes the details of the four clusters obtained using the average linkage within groups method and the squared Euclidean distance metric. Cluster 1 is not a feasible region in that it contains too few basins. Nevertheless, 80% of the basins within clusters 1 and 2 (supercluster 1-2) are from groups 1 and 4 while 73% of the basins within cluster 4 are from groups 2 and 3. Cluster 3 consists of a mix of basins from all four groups. Within clusters 3 and 4 combined (supercluster 3-4), 66% of the basins are from groups 2 and 3.

The results from the three other clustering procedures differed in terms of the number of clusters suggested at each level of their dendrograms and cluster membership. The average misclassification rates for these procedures ranged from 27% to 50% at the supergroup level. However, in every case basins within supergroup 1 + 4 tend to be clustered separately from basins in supergroup 2 + 3. Thus the general results of the cluster analyses are consistent with those for principal component analysis.

Table 3. Results of Cluster Analysis: Average Linkage Within Groups Method With Squared Euclidean Distance Metric

                              Candidate Group Membership
Cluster   Number of Basins   P[G = 1]   P[G = 2]   P[G = 3]   P[G = 4]
1         11                 0.27       0.09       0.18       0.46
2         29                 0.35       0.10       0.07       0.48
3         22                 0.23       0.32       0.27       0.18
4         26                 0.08       0.35       0.38       0.19
1-2       40                 0.325      0.10       0.10       0.475
3-4       48                 0.15       0.33       0.33       0.19

P[G = 1], ..., P[G = 4] denote the estimated probabilities that basins within a given cluster or partition belong to groups 1-4, respectively.
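The supergroup percentages quoted for the superclusters follow directly from the membership probabilities in Table 3. A minimal sketch of that arithmetic, with the hypothetical names `TABLE_3_SUPERCLUSTERS` and `supergroup_share`:

```python
# Supercluster rows of Table 3: (number of basins, P[G = 1..4])
TABLE_3_SUPERCLUSTERS = {
    "1-2": (40, (0.325, 0.10, 0.10, 0.475)),
    "3-4": (48, (0.15, 0.33, 0.33, 0.19)),
}


def supergroup_share(probabilities, groups):
    """Fraction of a cluster's basins that belong to the listed groups."""
    return sum(probabilities[g - 1] for g in groups)


share_12 = supergroup_share(TABLE_3_SUPERCLUSTERS["1-2"][1], (1, 4))
share_34 = supergroup_share(TABLE_3_SUPERCLUSTERS["3-4"][1], (2, 3))
```

Here `share_12` is the proportion of supercluster 1-2 drawn from groups 1 and 4, and `share_34` is the proportion of supercluster 3-4 drawn from groups 2 and 3.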

3.2.3. Canonical variates analysis. Application of the likelihood ratio test revealed that the canonical variates are significant at the 0.002, 0.70, and 0.77 levels. Assuming that the data distribution is multivariate normal, this suggests that only Y1 is significant at the 0.05 level. This variate explains 77.1% of the between-group variance.

Pairwise scatter plots of the canonical variates revealed considerable overlap between the basins of different candidate groups. Nevertheless, Figure 3c shows that the distributions of Y1 for groups 1 and 4 basins are markedly different to those for groups 2 and 3. Comparison of the box plot notches for groups 1 and 4 gives marginal though unconvincing support for the null hypothesis that the group medians are equal at roughly the 0.05 level. Also, the group 2 median is less than the first quartile for group 3. Thus the analysis differentiates between supergroups 1 + 4 and 2 + 3, and to this extent it is consistent with the principal components and cluster analyses. It also has some success in differentiating between groups 1 and 4.

Table 4. Standardized Coefficients for First Two Canonical Variates, Y1, Y2, and Basin Characteristics Canonical Variables, W1, W2

Variables      Y1       Y2       W1       W2
RAIN         0.858    0.537    0.079    0.004
RDAYS       -0.130    0.192   -0.003   -0.040
SDEN        -0.156    0.181    0.004    0.034
TI12:2       0.633    0.117    0.051   -0.038
TSHAPE       0.027    0.160    0.001    0.034
TELEV       -0.256    0.392   -0.013   -0.018
TRELIEF     -0.096    0.258   -0.007    0.066
TEVAP       -0.624    0.306   -0.059   -0.027
TFOREST      0.080    0.395    0.005    0.021
TQSA         0.292    0.292    0.029    0.014
TSLOPE       0.447   -0.700    0.013   -0.139
TAREA       -0.063    0.502   -0.007   -0.081

Table 4 lists the standardized coefficients for the first two canonical variates. Y1 contains relatively large positive coefficients for RAIN and TI12:2, a relatively large negative coefficient for TEVAP, moderate positive coefficients for TSLOPE and TQSA, and a moderate negative coefficient for TELEV. Therefore Y1 is primarily a function of climate variables and contrasts the variables RAIN and TI12:2 with the variable TEVAP. This contrasts with the first principal component, which is dominated by physical basin characteristics.

Figure 4 shows notched box plots for the basin variables. Plots for RDAYS, SDEN, TSHAPE, TRELIEF, TFOREST, and TAREA are omitted because of their small standardized coefficients in Y1. The distributions of RAIN for the group 2 and group 3 basins are similar to each other in terms of location and dissimilar to the group 1 and group 4 distributions. The RAIN values for group 4 are similar to those for group 1 and dissimilar to those of groups 2 and 3. The median RAIN for group 1 basins is significantly different to the group 3 median at roughly the 0.05 level. Comparison of the notch for group 2 with the notch for group 1 or group 4 gives marginal though unconvincing support to the null hypothesis that the group medians are equal at roughly the 0.05 level. The TI12:2 distributions for groups 1, 3, and 4 are very similar to each other and marginally different to the distribution for group 2. The distributions of TEVAP for the group 2 and group 3 basins are similar to each other in terms of location and dissimilar to those for the group 1 and group 4 basins. In particular, the TEVAP distribution for group 2 basins is very dissimilar to the group 1 and group 4 distributions. Comparison of the notched boxes for the group 1 and group 3 basins gives marginal though unconvincing support to the null hypothesis that their medians are equal at roughly the 0.05 level. The remaining notched box plots suggest that the between-group differences in the TSLOPE, TQSA, and TELEV distributions are not large. Thus the climatic characteristics dominate the between-group variation.

3.2.4. Canonical correlation analysis. The estimated canonical correlations are 0.81, 0.39, and 0.28, respectively. Thus Z1 has a strong correlation with W1 while the correlations between Z2 and W2 and between Z3 and W3 are relatively weak. Application of the likelihood ratio test revealed that the correlations are significant at the 10^-7, 0.62, and 0.79 levels. Assuming that the data distribution is multivariate normal, this suggests that only the first canonical correlation is nonzero. However, our cross validation experiments with tree-based models suggested the retention of the second canonical variable (see below).
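The significance levels quoted above can be approximated from the three canonical correlations alone. The sketch below assumes Bartlett's chi-square approximation to the likelihood ratio test, a sample of n = 88 basins, p = 12 basin variables (the rows of Table 4), and q = 3 L moment ratios; whether the study used exactly this form is not stated, and `chi2_sf_even_df` and `bartlett_tests` are hypothetical names.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def bartlett_tests(correlations, n, p, q):
    """Sequential tests that the k-th and all later correlations are zero.
    Returns a list of (statistic, degrees of freedom, p value)."""
    factor = n - 1 - (p + q + 1) / 2.0
    results = []
    for k in range(len(correlations)):
        stat = -factor * sum(math.log(1.0 - r * r) for r in correlations[k:])
        df = (p - k) * (q - k)
        results.append((stat, df, chi2_sf_even_df(stat, df)))
    return results


tests = bartlett_tests([0.81, 0.39, 0.28], n=88, p=12, q=3)
```

Under these assumptions the degrees of freedom are 36, 22, and 10, and the resulting p values are of the same order as the levels reported in the text; small differences are to be expected because the correlations are rounded to two decimals.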

Figure 3d reveals that the W1 values for basins in groups 2 and 3 are markedly different to those for basins in groups 1 and 4. Thus the principal components, cluster, canonical variates, and canonical correlation analyses suggest sizeable between-supergroup differences in basin characteristics. The median for group 1 is significantly different to the group 4 median at roughly the 0.05 level. This is a stronger result than that obtained in the canonical variates analysis. The differences between the W2 medians for groups 1 to 4 are not significant at roughly the 0.05 level.

Table 4 reports the standardized coefficients for W1 and W2. W1 contains relatively large positive coefficients for RAIN and TI12:2, a relatively large negative coefficient for TEVAP, and a moderate positive coefficient for TQSA. The coefficients for TSLOPE and TELEV are relatively small. Thus the variables that dominate W1 are similar to those for Y1 despite the absence of a presumed group structure in the canonical correlation analysis. W2 contains a relatively large negative coefficient for TSLOPE, a moderate negative coefficient for TAREA, and a moderate positive coefficient for TRELIEF.

The L moment canonical variables are given by

Z1 = -0.131 L CV + 0.076 L CS - 0.016 L kurtosis

Z2 = 0.081 L CV - 0.175 L CS + 0.216 L kurtosis

Z1 measures a contrast between the variables L CS and L CV. Thus the higher the value of L CS relative to L CV, the greater the value of Z1. The strong positive correlation between Z1 and W1 suggests this contrast is higher when RAIN and TI12:2 are large relative to TEVAP. Thus there is a strong association between climatic characteristics and similarity in standardized flood peak distributions. Although Z2 measures a contrast between L kurtosis and L CS, the correlation between Z2 and W2 is relatively weak.
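The two canonical variables above are simple linear combinations and can be evaluated directly. The sketch below uses the coefficients exactly as printed; whether they apply to raw or standardized L moment ratios is not stated in this section, so the inputs here are an assumption, and the function name is hypothetical.

```python
def l_moment_canonical_variables(l_cv, l_cs, l_kurtosis):
    """Evaluate Z1 and Z2 from the printed coefficients."""
    z1 = -0.131 * l_cv + 0.076 * l_cs - 0.016 * l_kurtosis
    z2 = 0.081 * l_cv - 0.175 * l_cs + 0.216 * l_kurtosis
    return z1, z2


# Hypothetical site values, chosen only to show the direction of the contrast
z1_base, _ = l_moment_canonical_variables(0.30, 0.20, 0.15)
z1_more_skew, _ = l_moment_canonical_variables(0.30, 0.30, 0.15)
z1_more_cv, _ = l_moment_canonical_variables(0.40, 0.20, 0.15)
```

Raising L CS with L CV held fixed increases Z1, and raising L CV with L CS held fixed decreases it, which is the contrast described in the text.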

3.2.5. Tree-based modeling. Cross validation experiments indicated that trees with two to three "leaves" produced minimum deviance for the canonical variates. Thus a three-leaf tree provided a reasonable trade-off between misclassification rate and model parsimony. The basins are split at Y1 = 0.063, -0.894. About 87% of basins with Y1 < 0.063 belong to supergroup 1 + 4, and 79% of basins with Y1 ≥ 0.063 belong to supergroup 2 + 3. This indicates a high degree of structure in the canonical variate space and marked differences between linear combinations of the characteristics of supergroup 1 + 4 and supergroup 2 + 3 basins. Moreover, Table 5 shows that the second split is reasonably successful in separating basins in groups 1 and 4. Thus there are reasonable differences between the basin characteristics of the candidate "homogeneous" groups.

Cross validation experiments indicated that trees with three to five leaves produced minimum deviance for the W variables. Thus a four-leaf tree provided a reasonable trade-off between misclassification rate and model parsimony. The basins are split at W1 = 0.050, -0.091 and W2 = -0.0069. About 80% of basins with W1 < 0.050 belong to supergroup 1 + 4, and 88% of basins with W1 ≥ 0.050 belong to supergroup 2 + 3. This is an important result because the tree-based model has confirmed marked differences between basin characteristics of the supergroups despite the absence of presumed group structure in canonical correlation analysis. Table 5 reveals that the second split provides reasonable separation of basins in groups 1 and 4, but the attempt by the third split to separate basins in groups 2 and 3 is only partially successful. This is not surprising given the slight contrast between the W2 distributions for these groups. The failure of the analysis to discriminate adequately between groups 2 and 3 is not too serious given that supergroup 2 + 3 is possibly heterogeneous.

[Figure 4: notched box plots of basin variables by candidate group; legible axis labels: RAIN, TEVAP, TELEV, TQSA, TSLOPE.]

Table 5. Results of Tree-Based Modeling on First Canonical Variate and on First and Second Basin Characteristics Canonical Variables

                              Candidate Group Membership*
"Leaf"   Number of Basins   P[G = 1]      P[G = 2]      P[G = 3]      P[G = 4]

Y1 Tree
1        22                 0.64          0             0.04          0.32
2        43                 0.07          0.35          0.44          0.14
3        23                 0.13          0.22          0             0.65

W1-W2 Tree†
1        17                 0.71 (0.76)   0 (0.06)      0 (0)         0.29 (0.18)
2        18                 0 (0)         0.56 (0.59)   0.44 (0.41)   0 (0)
3        14                 0.07 (0.07)   0.14 (0.33)   0.58 (0.47)   0.21 (0.13)
4        39                 0.18 (0.31)   0.20 (0.13)   0.10 (0.20)   0.52 (0.36)

*P[G = 1], ..., P[G = 4] denote the estimated probabilities that basins within a given cluster or partition belong to groups 1-4, respectively.
†Numbers in parentheses denote estimated probabilities obtained from sensitivity analysis.

We assessed the sensitivity of the W1-W2 tree to uncertainty in group membership. Consider Figure 2: the three group 1 sites on the group 1 boundary that are closest to the group 2 boundary were reassigned to group 2; the four group 2 sites closest to the group 1 boundary were assigned to group 1; the two group 3 sites closest to the group 2 boundary were assigned to group 2; the three group 3 sites adjacent to the group 4 boundary were assigned to group 4; and seven sites along the group 4 boundary were assigned to group 3. These reassignments had no effect on the splits but did affect the discrimination of group 3 and group 4 sites (Table 5). This result is not surprising because the changes imposed on group membership have no effect on a site's location in canonical variable space, and a large number of changes in group membership were imposed on group 4 relative to other groups.
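The supergroup percentages quoted for the W1-W2 tree can be checked against the leaf memberships in Table 5. In the sketch below the first split (W1 = 0.050) is applied as a supergroup rule, and leaf probabilities are pooled by leaf size. The assignment of leaves 1 and 4 to the W1 < 0.050 branch is an inference from the reported percentages rather than something the table states, and all names are hypothetical.

```python
# W1-W2 tree leaves from Table 5: (number of basins, P[G = 1..4])
W_TREE_LEAVES = {
    1: (17, (0.71, 0.00, 0.00, 0.29)),
    2: (18, (0.00, 0.56, 0.44, 0.00)),
    3: (14, (0.07, 0.14, 0.58, 0.21)),
    4: (39, (0.18, 0.20, 0.10, 0.52)),
}


def supergroup_from_w1(w1, threshold=0.050):
    """First split of the tree, read as a supergroup classifier."""
    return "1+4" if w1 < threshold else "2+3"


def pooled_share(leaf_ids, groups, leaves=W_TREE_LEAVES):
    """Size-weighted fraction of basins in the given leaves that belong
    to the listed candidate groups."""
    total = sum(leaves[k][0] for k in leaf_ids)
    hits = sum(leaves[k][0] * sum(leaves[k][1][g - 1] for g in groups)
               for k in leaf_ids)
    return hits / total


low_w1_share = pooled_share((1, 4), (1, 4))    # assumed W1 < 0.050 branch
high_w1_share = pooled_share((2, 3), (2, 3))   # assumed W1 >= 0.050 branch
```

With this reading, the pooled shares come out close to the 80% and 88% reported in the text.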

3.3. Split-Sample Validation

The 10 basins that had been put aside were used to assess whether a hydrologic signal had been captured rather than a noise pattern. Their "true" group membership was determined by comparing the at-site standardized quantiles against the quantile functions for the four best-fit distributions. For each site the mean squared error of the theoretical quantiles was used to quantify the goodness of fit. Thus T6, T7, and T10 were assigned to group 1; T2 to group 2; T4 and T5 to group 3; and T1, T3, T8, and T9 to group 4. The allocation of the test basins to the candidate regions had a marginal impact on their D_i and H statistics.
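The allocation rule described above amounts to choosing the regional quantile function with the smallest mean squared error against the at-site standardized quantiles. A minimal sketch follows; the regional growth curves are passed in as callables, and the two linear curves in the usage example are placeholders only, since the fitted distribution parameters are not given in this section.

```python
def allocate_site(at_site_quantiles, regional_quantile_functions):
    """Pick the group whose quantile function best matches a test site.

    at_site_quantiles: dict mapping nonexceedance probability F to the
        site's standardized flood quantile.
    regional_quantile_functions: dict mapping group label to a callable
        that returns the regional standardized quantile for a given F.
    Returns (best group, dict of mean squared errors by group).
    """
    errors = {}
    for group, quantile_fn in regional_quantile_functions.items():
        squared = [(quantile_fn(f) - x) ** 2
                   for f, x in at_site_quantiles.items()]
        errors[group] = sum(squared) / len(squared)
    best = min(errors, key=errors.get)
    return best, errors


# Placeholder growth curves (hypothetical, not the fitted distributions)
curves = {"A": lambda f: 1.0 + 0.5 * f, "B": lambda f: 1.0 + 2.0 * f}
site = {0.5: 2.0, 0.9: 2.8, 0.98: 2.96}
best_group, mse = allocate_site(site, curves)
```

In practice each callable would be the quantile function of the best-fit distribution for one candidate group, evaluated at the plotting positions of the test site's annual flood series.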

Consider supergroups 1 + 4 and 2 + 3. Only one site (T4) is misclassified by the tree for the W1-W2 data. Thus the misclassification rate for supergroups is about 10%. This engenders confidence in the general inferences made above. Comparison of the at-site and best-fit quantile functions and perusal of Figure 2 suggest that T4 should be a group 3 site. However, T4 is classified as a group 4 site by the W1-W2 tree. The basin is located between two spurs of the Grampians (Victoria and Serra Ranges) that have a southwest-to-northeast orientation. This area experiences steep, local rainfall gradients, and there are no rain gauges in the upland of the basin. It is conceivable that mean annual rainfall at the centroid of the basin may underestimate the mean rainfall over the entire basin in this instance. Consequently, the value of W1 for T4 may be unduly low.

The misclassification rate obtained when the W1-W2 tree is used to assign test sites to the candidate regions is 50%. Although this misclassification rate is high, the misclassification rate obtained at the supergroup level when the test basins were assigned to the superclusters described in section 3.2.2 was 60%. This compares poorly with the corresponding rates of 10% for the W1-W2 tree and 50% for a random guess. (Recall that superclusters 1-2 and 3-4 are dominated by supergroups 1 + 4 and 2 + 3 basins, respectively, and that the candidate groups were not distinguishable at the individual cluster level (Table 3).) Thus the misclassification rate for the W1-W2 tree is far lower than that for cluster analysis at either the supergroup or individual group level.

3.4. Physical Interpretations

3.4.1. Role of runoff generation. Although the analyses indicate a relationship between membership of candidate homogeneous regions and a basin's climatic characteristics, they do not suggest its functional form. The data transformations used merely rescale the distributions of the original variables to approximate normality. Nevertheless, the contrast between the variables RAIN and TI12:2 and the variable TEVAP indicated by the canonical variates and canonical correlation analyses suggests a contrast between rainfall and evaporation regimes. This may be interpreted as a measure of basin water availability (i.e., soil moisture storage, deep seepage, and runoff). This finding supports recent theoretical formulations of the link between the physical processes that affect runoff generation in channel network-hillslope systems and regional flood statistics [Gupta and Waymire, 1998].

3.4.2. Geographical coherence of candidate homogeneous regions. Consider the spatial distribution of supergroup 2 + 3 basins (Figure 1). These basins are aligned along the northwest-to-southwest edges of the Great Dividing Range, the Grampians, Blue Mountain, and the Otway and Strzelecki Ranges. This suggests that there is an association between the supergroup membership and the degree to which exposed basins benefit from orographic uplift of frontal systems. Supergroup 2 + 3 basins are found also in areas exposed to moist prevailing winds around the southwest corner of the study area. The annual flood series for these basins exhibit low L CV relative to L CS (Figure 2) and have high RAIN and low TEVAP values relative to those for supergroup 1 + 4 (Figure 4). There are five adjacent supergroup 1 + 4 basins that may have been misclassified during the ad hoc partitioning of the L moment space (Figure 2): the group 1 basins immediately east of the source of the Murray River and to the southeast of the Strzelecki Ranges lie close to the ad hoc boundary for group 2; the group 4 basin immediately to the east of Melbourne lies close to the conjunction of the boundaries for groups 1 and 2; and the group 4 basins close to the Otways and adjacent to the supergroup 2 + 3 basins on the extreme southwest coast lie close to the boundary for group 3.

The supergroup 1 + 4 basins show a reasonable degree of spatial coherence around the Divide and in the inland western portion of the study area (Figure 1) and have low RAIN and high TEVAP values relative to those for supergroup 2 + 3 (Figure 4). The annual flood series for these basins exhibit high L CV relative to L CS (Figure 2). There are two adjacent supergroup 2 + 3 basins that may have been misclassified during the ad hoc partitioning of the L moment space: the westernmost group 2 basin (Figure 1) lies close to the group 1 boundary (Figure 2) and the group 3 basin to the immediate northeast of Blue Mountain (Figure 1) lies close to the boundary for group 4 (Figure 2).

Although the spatial coherence of either supergroup reflects a general level of similarity in climatic characteristics, it is readily apparent that the candidate homogeneous groups within a supergroup are not geographically contiguous (Figure 1). The canonical variates and canonical correlation analyses also suggest that the relatively slight differences in climatic characteristics between groups 1 and 4 are sufficient to cause discernible differences in the L moment ratios in annual flood series. Also, the spatial configuration of the basins within these groups suggests that differences in the seasonal pattern of rainfall between areas west and east of the Divide (winter-dominated versus uniform seasonal pattern, respectively) have little influence on the statistics of annual flood series.

4. Discussion

Hosking and Wallis [1993] state that the delineation of regions on the basis of sample L CV would reflect only the pattern of noise in the data and have no physical significance. They advocate the assignment of sites to regions on the basis of physical explanatory variables or geographical location. However, this is difficult when the key variables and their relationships with geographical location are not known a priori and clustering techniques fail to provide groupings that are meaningful in a hydrologic sense.

The purpose of our work was to identify the basin characteristics that cause similarity in standardized flood peak distributions in southeastern Australia. The groups identified herein are based on sample estimates of flood statistics, and highly accurate groupings are unlikely given the record lengths of the annual flood series involved. While we cannot claim that a misclassification rate of 50% during split-sample testing is acceptable, it is conceivable that the use of additional basin characteristics or better descriptors of the characteristics used may provide a more satisfactory result. Nevertheless, the H statistic for the 88 gauged sites (Table 1) indicates a very high degree of heterogeneity, and the spatial variability in the physiography and climate of the associated basins is also high. Maps of sample L CV, L CS, and L kurtosis did not reveal any notable spatial patterns or directional trends (e.g., from north to south or east to west) or contiguous areas with unusually low or high L moment ratios. If our candidate groups (or supergroups) reflected nothing more than sampling variability, we would expect the notched box plots in Figures 3c and 3d to exhibit levels of overlap similar to those in Figures 3a and 3b. In the canonical variates analysis (Figure 3c), our assumed group structure was necessarily imposed on the basin characteristics data prior to the analysis. In the canonical correlation analysis (Figure 3d), our assumed group structure was imposed on the first basin canonical variable a posteriori. Yet the between-group and between-supergroup differences depicted in Figure 3c bear a striking resemblance to those in Figure 3d, and perusal of the discriminant and standardized coefficients leads to essentially the same physical interpretation. Thus the evidence for two or more populations (regions) with overlapping distributions in the basin characteristics space is strong.

Alternatives to our approach are weighted least squares (WLS) [Tasker and Stedinger, 1986] and generalized least squares (GLS) [Tasker and Stedinger, 1989] regression. Tasker and Stedinger [1986] used WLS to regress unbiased estimates of the at-site skewness coefficient on basin characteristics data for 62 sites in Illinois. Two dummy variables were used to capture a trend in the spatial distribution of the skewness coefficient across three river basins that contained the 62 sites. The key basin variables were identified using an all-possible-regressions approach. The use of dummy variables is consistent with statistical practice in that different regression models were derived for different populations [Draper and Smith, 1981, section 5.4]. However, it is not clear how such a model could be effectively exploited in our study as there are no notable spatial patterns or trends in our sample statistics and there is no other a priori information about population membership. Our results provide some guidance on population membership for sites within southeastern Australia, and these preliminary groupings could be investigated further using WLS or GLS regression. Here a supermodel that bridged across the populations could be defined in which group membership probabilities could be used to weight the different regression functions for each population. Although automated variable selection (forward, backward, and stepwise) techniques are commonly available for WLS regression, this is not the case for GLS regressions involving more than one response variable (L moment ratios or flood quantiles). Thus the use of GLS regression for the identification of key basin variables will present some difficulty when more than one response variable is considered simultaneously.
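The "supermodel" idea mentioned above can be sketched as a probability-weighted mixture of per-population regression functions. This is purely illustrative: the paper proposes the idea without specifying a form, so the mixture rule, the function name `supermodel_prediction`, and the toy regressions below are all assumptions.

```python
def supermodel_prediction(x, membership_probs, regressions):
    """Weight each population's regression prediction by the probability
    that the site belongs to that population.

    membership_probs: dict mapping group label to probability (sums to 1).
    regressions: dict mapping group label to a callable of x.
    """
    if abs(sum(membership_probs.values()) - 1.0) > 1e-9:
        raise ValueError("membership probabilities must sum to 1")
    return sum(membership_probs[g] * regressions[g](x) for g in regressions)


# Hypothetical two-population example
toy_regressions = {"1+4": lambda x: 10.0, "2+3": lambda x: 2.0 * x}
toy_probs = {"1+4": 0.25, "2+3": 0.75}
prediction = supermodel_prediction(4.0, toy_probs, toy_regressions)
```

The membership probabilities could, for instance, be taken from the leaf of a fitted tree into which a site falls, such as the rows of Table 5.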

5. Conclusions and Recommendations

Our exploratory study of regional flood frequency in southeastern Australia was motivated by increasing recognition of the need to identify basin characteristics that lead to similarity in flood response [e.g., Potter, 1987; Gupta et al., 1994; Bobee and Rasmussen, 1995]. Despite the empirical nature of the study and the number of statistical techniques involved, we obtained a reasonably consistent view of the basin characteristics that control the statistical distributions of flood peaks. Our main findings are as follows:

1. For southeastern Australia, similarity in standardized flood peak distributions is determined primarily by similarity in climatic characteristics of basins. Conventional approaches to the index flood method directly attribute the effects of variations in basin characteristics to the index flood; the link between these variations and the form of the regional parent distribution has not been fully appreciated.

2. For southeastern Australia, basins that are relatively wet (high rainfall relative to evaporation) tend to have annual flood series that exhibit low L CV relative to L CS. Basins that are relatively dry (low rainfall relative to evaporation) exhibit high L CV relative to L CS.


3. For the (somewhat limited) range of drainage areas considered herein (3-956 km^2) we found no evidence of a link between drainage area and the form of the regional parent distribution. However, regressions of mean annual flood on the basin characteristics data for our candidate homogeneous regions indicate that drainage area is an important explanatory variable. Thus drainage area influences flood magnitude but has little impact on regional homogeneity within the study area.

4. Differences in the seasonal pattern of rainfall for southeastern Australia appear to have little influence on the statistical distributions of flood peaks.

5. Supergroups consisting of aggregated homogeneous regions with reasonably similar flood characteristics show some degree of spatial coherence. This appears to be related to the extent to which basins benefit from orographic uplift of frontal systems and moist coastal winds.

6. Much can be learned about the climatic and physical factors that influence regional homogeneity despite the use of basin characteristics that smooth spatial and temporal variability. For southeastern Australia the linear combination of climatic characteristics that is highly associated with similarity in standardized flood peak distributions can be interpreted as basin water availability. This interpretation supports recent theoretical formulations of the link between the physical processes that affect runoff generation in channel network-hillslope systems and regional flood statistics. Nevertheless, it is likely that capture of the spatial and temporal variability of within-basin characteristics (e.g., storm duration and interarrival time statistics, actual evaporation) could produce deeper insight into influential factors and better discrimination of homogeneous regions.

7. Our findings apply to one geographical region where snowmelt makes a minor contribution to flood peaks and flood-producing rainfall is caused typically by frontal systems in basins with drainage areas greater than a few square kilometers. Physiographic attributes may be more influential in regions where flood regimes are dominated by snowmelt processes. Thus the generality of our findings needs to be tested in other geographical regions with different climatic regimes.

8. Our approach may be useful in regionalization studies when the key basin variables and their relationships with geographical location are not known a priori and clustering techniques fail to provide groupings that are meaningful in a hydrologic sense. Thus it could be used to make a preliminary determination of group membership for the regression approach or some other regionalization technique.

Acknowledgments. The authors thank the Cooperative Research Centre for Catchment Hydrology for funding the research; the Bureau of Meteorology and the Rural Water Corporation for supplying the rainfall and streamflow data, respectively; Tam Hoang and Phillip Yap for their assistance in the collection of basin characteristics data; and Rory Nathan for his constructive suggestions and support throughout this work. The first author benefited from discussions with Vijay K. Gupta, University of Colorado, Boulder, and Edward P. Campbell, CSIRO Division of Mathematical and Information Sciences, Perth. Comments from Jery Stedinger, Stephen Burges, Donald Burn, and an anonymous reviewer were extremely helpful and are appreciated.

References

Acreman, M. C., and C. D. Sinclair, Classification of drainage basins according to their physical characteristics: An application for flood frequency analysis in Scotland, J. Hydrol., 84, 365-380, 1986.

Benson, M. A., Evolution of methods for evaluating the occurrence of floods, U.S. Geol. Surv. Water Supply Pap., 1580-A, 30 pp., 1962.

Bhaskar, N. R., and C. A. O'Connor, Comparison of method of residuals and cluster analysis for flood regionalization, J. Water Resour. Plann. Manage., 115(6), 793-808, 1989.

Bobee, B., and P. Rasmussen, Recent advances in flood frequency analysis, U.S. Natl. Rep. Int. Union Geod. Geophys. 1991-1994, Rev. Geophys., 33, 1111-1116, 1995.

Burn, D. H., Delineation of groups for regional flood frequency analysis, J. Hydrol., 104, 345-361, 1988.

Burn, D. H., Cluster analysis as applied to regional flood frequency, J. Water Resour. Plann. Manage., 115(5), 567-582, 1989.

Burn, D. H., An appraisal of the "region of influence" approach to flood frequency analysis, Hydrol. Sci. J., 35, 149-165, 1990a.

Burn, D. H., Evaluation of regional flood frequency analysis with a region of influence approach, Water Resour. Res., 26(10), 2257-2265, 1990b.

Chatfield, C., and A. J. Collins, Introduction to Multivariate Analysis, 246 pp., Chapman and Hall, New York, 1980.

Chowdhury, J. U., J. R. Stedinger, and L.-H. Lu, Goodness-of-fit tests for regional generalized extreme value flood distributions, Water Resour. Res., 27(7), 1765-1776, 1991.

Clark, L. A., and D. Pregibon, Tree-based models, in Statistical Models in S, edited by J. M. Chambers and T. J. Hastie, pp. 377-419, Wadsworth and Brooks, Pacific Grove, Calif., 1992.

Dalrymple, T., Flood frequency analysis, U.S. Geol. Surv. Water Supply Pap., 1543-A, 11-51, 1960.

Draper, N. R., and H. Smith, Applied Regression Analysis, 2nd ed., 709 pp., John Wiley, New York, 1981.

Duncan, J. S., Atlas of Victoria, 239 pp., Victorian Gov. Print. Off., Melbourne, Australia, 1982.

Everitt, B. S., Cluster Analysis, 3rd ed., 170 pp., Edward Arnold, London, 1993.

Fill, H. D., and J. R. Stedinger, Homogeneity tests based upon Gumbel distribution and a critical appraisal of Dalrymple's test, J. Hydrol., 166, 81-105, 1995.

Gupta, V. K., and E. Waymire, Scale invariance and regionalization of floods, in Scale Dependence and Scale Invariance in Hydrology, edited by G. Sposito, Cambridge Univ. Press, New York, 1998.

Gupta, V. K., O. J. Mesa, and D. R. Dawdy, Multiscaling theory of flood peaks: Regional quantile analysis, Water Resour. Res., 30(12), 3405-3421, 1994.

Hosking, J. R. M., L moments: Analysis and estimation of distributions using linear combinations of order statistics, J. R. Stat. Soc., Ser. B, 52(2), 105-124, 1990.

Hosking, J. R. M., and J. R. Wallis, The effect of interbasin dependence on regional flood frequency analysis, Water Resour. Res., 24(4), 588-600, 1988.

Hosking, J. R. M., and J. R. Wallis, Some statistics useful in regional frequency analysis, Water Resour. Res., 29(2), 271-281, 1993.

Hosking, J. R. M., and J. R. Wallis, Regional Flood Frequency Analysis: An Approach Based on L-Moments, 224 pp., Cambridge Univ. Press, New York, 1997.

Jobson, J. D., Applied Multivariate Data Analysis, 731 pp., Springer-Verlag, New York, 1992.

Johnson, R. A., and D. W. Wichern, Applied Multivariate Statistical Analysis, 607 pp., Prentice-Hall, Englewood Cliffs, N.J., 1988.

Kaufman, L., and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, 342 pp., John Wiley, New York, 1990.

Krzanowski, W. J., Principles of Multivariate Analysis: A User's Perspective, 563 pp., Clarendon Press, Oxford, 1988.

Kshirsagar, A. M., Multivariate Analysis, 534 pp., Marcel Dekker, New York, 1972.

Kuczera, G., Effect of sampling uncertainty and spatial correlation on an empirical Bayes procedure for combining site and regional information, J. Hydrol., 65, 373-398, 1983.

Lettenmaier, D. P., and K. W. Potter, Testing flood frequency estimation methods using a regional flood generation model, Water Resour. Res., 21(12), 1903-1914, 1985.

Lettenmaier, D. P., J. R. Wallis, and E. F. Wood, Effect of regional heterogeneity on flood frequency estimation, Water Resour. Res., 23(2), 313-323, 1987.

Lu, L.-H., and J. R. Stedinger, Sampling variance of normalized GEV/PWM quantile estimators and a regional homogeneity test, J. Hydrol., 138, 223-245, 1992.

Morrison, D. F., Multivariate Statistical Methods, 3rd ed., 560 pp., McGraw-Hill, New York, 1990.

Mosley, M. P., Delimitation of New Zealand hydrological regions, J. Hydrol., 49, 173-192, 1981.

Nathan, R. J., and T. A. McMahon, Identification of homogeneous regions for the purpose of regionalisation, J. Hydrol., 121, 217-238, 1990.

Norusis, M. J., SPSS: SPSS for Windows, Professional Statistics, Release 6.0, 375 pp., SPSS Inc., Chicago, Ill., 1989.

Potter, K. W., Research on flood frequency analysis: 1983-1986, U.S. Natl. Rep. Int. Union Geod. Geophys. 1983-1986, Rev. Geophys., 25, 113-118, 1987.

Ribeiro-Correa, J., G. S. Cavadias, B. Clement, and J. Rousselle, Identification of hydrological neighborhoods using canonical correlation analysis, J. Hydrol., 173, 71-89, 1995.

Tasker, G. D., and J. R. Stedinger, Regional skew with weighted LS regression, J. Water Resour. Plann. Manage., 112(2), 225-237, 1986.

Tasker, G. D., and J. R. Stedinger, An operational GLS model for hydrologic regression, J. Hydrol., 111, 361-375, 1989.

Venables, W. N., and B. D. Ripley, Modern Applied Statistics with S-Plus, 462 pp., Springer-Verlag, New York, 1994.

Wiltshire, S. E., Identification of homogeneous regions for flood frequency analysis, J. Hydrol., 84, 287-302, 1986a.

Wiltshire, S. E., Regional flood frequency analysis, I, Homogeneity statistics, Hydrol. Sci. J., 31(3), 321-333, 1986b.

Wiltshire, S. E., Regional flood frequency analysis, II, Multivariate classification of drainage basins in Britain, Hydrol. Sci. J., 31(3), 335-346, 1986c.

Wright, W. J., A synoptic climatological classification of winter precipitation in Victoria, Aust. Meteorol. Mag., 37(4), 217-229, 1989.

Zrinji, Z., and D. H. Burn, Flood frequency analysis for ungauged basins using a region of influence approach, J. Hydrol., 153, 1-21, 1994.

Zrinji, Z., and D. H. Burn, Regional flood frequency with hierarchical region of influence, J. Water Resour. Plann. Manage., 122(4), 245-252, 1996.

B. C. Bates, CSIRO Land and Water, Private Bag P. O., Wembley WA 6014, Australia. (email: [email protected]).

R. G. Mein, A. Rahman, and P. E. Weinmann, CRC for Catchment Hydrology, Department of Civil Engineering, Monash University, Clayton, VIC 3168, Australia. (email: [email protected]; [email protected]; [email protected]. edu.au).

(Received August 25, 1997; revised July 23, 1998; accepted July 29, 1998.)

