Options to Select Rain Gauging Stations when Modelling Streamflow for Water Resources Management

Conversion of point rainfall measurements to areal rainfall is one of the key factors that should be carefully considered in streamflow modelling. Research publications indicate that the distribution of gauging stations influences the representativeness of spatially averaged rainfall. Guidelines provide contrasting recommendations, and the available literature is therefore inadequate to provide specific guidance on the appropriate number of gauging stations, their density and the averaging method. Past research emphasises that high station densities do not always provide the best streamflow estimates, and points to threshold densities ranging between 2.6 and 2442 km²/station. Current modelling practice shows that the median station densities used for small, medium and large catchments are 19, 117 and 470 km²/station respectively. The most rational option is to select a suitable station density considering both time-tested guidelines and the median values appropriate to the watershed size. Commonly used spatial averaging methods are the Arithmetic Average, Thiessen, Kriging, Inverse Distance Weighted (IDW) and Spline methods. Past studies indicate that the best approach is to capture the most contributing stations by optimising station influence against modelled streamflow.


Introduction
Rainfall is the most important variable that defines water resources in a region. The spatial and temporal distributions of rainfall are key factors when estimating watershed runoff using mathematical models for water resources management [1]-[7]. The major factors influencing rainfall and its spatial variability are latitudinal location, orographic effects and wind fields [8], [9]. The spatial resolution of rainfall measurements depends on the location of gauges, while their temporal resolution ranges from individual events to daily, monthly and annual totals. Water resources planning based on water balances requires mean areal rainfall at monthly, seasonal and annual scales, whereas flood studies require the mean areal rainfall of individual events [10]. The representativeness of areal rainfall is very important in the development and application of hydrologic models for sustainable water resources management, because a poor representation of rainfall spatial variability complicates the planning of water storage and conveyance structures, which in turn threatens sectors dealing with water and food security [11]. Rainfall is measured with gauges placed at selected locations within the area of interest, so the observed pattern and magnitude of precipitation depend on the density of gauging stations and on the adopted procedure for analysis [12]. It has been observed that average annual rainfall increases with increasing ground elevation [13].
Typically, rainfall values from the densest available station network are considered the input closest to the actual areal rainfall on a watershed [14], [15]. In contrast, Anctil et al. [16] demonstrated the existence of an optimum gauging station density for forecasting watershed mean areal rainfall. Work by Wijesekera and Musiake [17]-[19] shows that identifying the influence of rainfall at each gauging station by optimising station weights provides an opportunity to arrive at the most representative streamflow estimates. The selected station network and density also influence the output of a given spatial interpolation technique [20]. Point rainfall measurements require conversion to areal rainfall for streamflow estimation using watershed models. Therefore, computed watershed rainfall depends not only on the location of gauging stations and the temporal resolution of rainfall, but also on the interpolation method used to estimate rainfall between gauge locations.
Consequently, the station network used for watershed model calibration also has a considerable impact on the derived streamflow [21]. Station location identification has to consider the spatial and temporal distribution of rainfall while, conversely, the identified spatial and temporal distribution of rainfall depends on the location of the selected gauging stations. Although rainfall measurements are recognised as playing an important role in streamflow modelling, hydrologic modellers do not find sufficiently conclusive guidance on the number of gauging stations, the station density or the method of data interpolation.
Hence, the prime objective of the present paper is to carry out a detailed review of available practices for selecting suitable rain gauging stations for the computation of areal average rainfall when modelling streamflow for water resources management.

Method
A literature survey of peer-reviewed research publications, using scientific search engines, was carried out to capture available guidance on selecting appropriate rainfall stations. Publications were selected using keywords such as station density, station distribution and data resolution, in relation to the estimation of watershed streamflow with mathematical models. Five guidelines on rainfall station selection, 36 publications specifically related to rainfall station selection and 94 watershed rainfall-runoff case studies were reviewed. Key factors affecting the areal rainfall input to a hydrological model were identified and discussed, and an evaluation criterion was then developed. The evaluation compared practices in relation to watershed size, the number and distribution of stations used, the areal rainfall estimation method, and whether stations were located within or outside the catchment. The criterion was then used to obtain a numerical indication of the availability and adequacy of methods for selecting rainfall stations to compute areal rainfall input for hydrologic models.

General
Although the available guidelines quantitatively recommend station densities and indicate that stations must be well distributed, the reviewed applications showed many deviations. Very few research works appear to recognise the importance of carefully selecting gauging stations and choosing an appropriate averaging method for meaningful streamflow modelling, even though research has shown that high rainfall station densities do not always provide the best streamflow estimates.
The literature also shows that, among the many methods available for computing areal average rainfall, Thiessen weighted averaging is the most widely used.

Guidelines and Inclination
A national guideline for rainfall station selection and spatial density determination is very important, especially when water infrastructure planning and design depend on streamflow estimates from mathematical models. Such a guideline is expected to help identify station locations that suitably capture the temporal and spatial variability of rainfall. Normally, the optimum density of rain gauges can only be obtained through sufficient sampling of rainfall within a region [22]. Many research works have attempted to identify suitable gauging station network densities and appropriate locations for capturing rainfall data that represent watershed hydrological processes [4], [7], [15], [16], [31], [32]. Rainbird [22] concluded that at least one representative catchment (1295 km² or less), with a network density exceeding the WMO recommended minimum for the region by a factor of at least 3 to 5, should be established in each principal climatic and/or physiographic region. Reviewing developments in hydrometric network design, Mishra and Coulibaly [5] support the prevailing general impression that finer temporal and spatial resolution of hydrometric data enables higher streamflow prediction accuracy. Masih et al. [33] also showed that a higher density of rainfall stations improves streamflow model performance, particularly in small watersheds with extents of 600-1600 km².
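As a rough illustration of Rainbird's criterion, the sketch below computes how many gauges a representative catchment would need for its density to exceed a regional minimum by a chosen factor. It is a minimal sketch that assumes the minimum density is expressed in km² per gauge; the 600 km²/gauge value in the usage lines is a hypothetical placeholder, not a WMO figure, and `gauges_for_representative_catchment` is an illustrative helper name.

```python
import math


def gauges_for_representative_catchment(area_km2, min_density_km2_per_gauge, factor):
    """Number of gauges needed so the network is `factor` times denser than a
    minimum density expressed as km^2 per gauge (hypothetical helper)."""
    if area_km2 <= 0 or min_density_km2_per_gauge <= 0 or factor <= 0:
        raise ValueError("all inputs must be positive")
    # A network `factor` times denser covers `factor` times less area per gauge.
    target_km2_per_gauge = min_density_km2_per_gauge / factor
    # Round up so the achieved density is at least the target density.
    return math.ceil(area_km2 / target_km2_per_gauge)


# Hypothetical regional minimum of 600 km^2/gauge (placeholder, not a WMO value).
for f in (3, 5):
    n = gauges_for_representative_catchment(1295, 600, f)
    print(f"factor {f}: {n} gauges, {1295 / n:.1f} km^2/gauge")
```

Rounding up ensures the achieved density meets or exceeds the target rather than falling just short of it.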

Recommendations
The common inclination is to capture and use the largest possible number of rainfall gauging stations within and around the study area, thus achieving the highest possible station density [14], [15]. Lebel et al. [14], using the largest number of available stations for their work, classified a very dense gauge network as one in which a single gauge represents approximately 16 km² of watershed area. On some occasions, areal rainfall has been captured by selecting a sufficiently dense gauging station network to represent different homogeneous zones within the project extent [34]. Climate models are now capable of generating rainfall variations at high spatial resolution, making dense rainfall inputs available for catchment model computations. The use of a Regional Climate Model (RCM) with a 12 km grid has been shown to provide a good representation of catchment hydrology in Morocco [35]. In a recent study for Seoul, South Korea, Yoon & Lee [36] demonstrated the need for high-density rainfall data, at 3 km²/station, for urban runoff analysis because of the high spatial variation that can occur even within small urban areas.

Threshold Station Density
Although a very closely spaced rainfall station network would obviously be the best option, resource constraints on the establishment and maintenance of gauging stations compel watershed managers to restrict the selection to an optimum number. Many studies indicate that, instead of attempting to achieve the highest possible station density, water managers should aim to capture the most representative rainfall spatial distribution, i.e. the one that contributes most to watershed streamflow estimation using mathematical models.
As station density increases, the areal average rainfall tends to stabilise, and a threshold density is eventually reached beyond which additional stations cause insignificant change. Lopez et al. [37] reported a threshold density of approximately 24 rain gauges per 1000 km² (about 42 km²/station), beyond which interpolation errors level off, with no or negligible benefit from further increases in station density. Otieno et al. [20], while showing that high station densities improve areal rainfall estimates, arrived at an approximate threshold density of 4.82 km²/station for a 135.2 km² catchment. In an 8 km² watershed with 5 rain gauges, the improvement in areal mean rainfall with increasing station density reached a threshold of 2.6 km²/station, beyond which improvements in peak flows and total runoff volumes were marginal [32]. Two studies in the Xiangjiang River basin, China, indicated suitable densities of 740-1018 km²/station, which provided the best model performance irrespective of network configuration [15], and of 1000-1667 km²/station for sub-basins with drainage areas greater than 50000 km² and 67-100 km²/station for sub-basins with drainage areas between 1000 and 50000 km² [7]. Another study in the Qingjiang River basin of China, using 26 stations, concluded that a threshold of 5 stations (corresponding to a density of 2442 km²/station) would be sufficient for satisfactory streamflow modelling [4]. Comparisons of streamflow estimation errors have also shown that high-density station networks do not always perform best, and that lower densities can perform well owing to topographical variations and orographic rainfall [15], [16].
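The threshold idea can be explored with a gauge-subsampling experiment: randomly drop gauges from a dense network, compare the areal mean of each subset with that of the full network, and find where the error stops improving meaningfully. The sketch below is a hedged illustration under simplifying assumptions: arithmetic-mean areal rainfall, a 5% relative-error tolerance, synthetic event totals for 12 hypothetical gauges in a hypothetical 500 km² catchment, and illustrative function names. It is not the procedure used in any of the cited studies, which assessed interpolation or streamflow errors rather than this simple statistic. The helper `per_1000_to_km2_per_station` also shows the unit conversion used for the Lopez et al. threshold (24 gauges per 1000 km² is about 42 km²/station).

```python
import random
import statistics


def per_1000_to_km2_per_station(gauges_per_1000_km2):
    """Convert a density in gauges per 1000 km^2 into km^2 per gauge."""
    return 1000.0 / gauges_per_1000_km2


def subsampling_error(totals, k, trials=200, seed=0):
    """Mean absolute relative error of the arithmetic-mean areal rainfall from
    k randomly chosen gauges, relative to the mean of the full network."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    reference = statistics.fmean(totals)
    errors = []
    for _ in range(trials):
        subset = rng.sample(totals, k)
        errors.append(abs(statistics.fmean(subset) - reference) / reference)
    return statistics.fmean(errors)


def threshold_gauge_count(totals, tolerance=0.05, **kwargs):
    """Smallest number of gauges whose mean subsampling error is within tolerance."""
    for k in range(1, len(totals) + 1):
        if subsampling_error(totals, k, **kwargs) <= tolerance:
            return k
    return len(totals)


# Synthetic event totals (mm) for 12 hypothetical gauges in a 500 km^2 catchment.
totals = [42.0, 55.5, 38.2, 61.0, 47.3, 50.1, 44.8, 58.6, 39.9, 52.4, 46.0, 49.7]
area_km2 = 500.0
k = threshold_gauge_count(totals)
print(f"{k} gauges -> {area_km2 / k:.1f} km^2/station")
print(f"24 gauges per 1000 km^2 = {per_1000_to_km2_per_station(24):.1f} km^2/station")
```

Because the subsampling error tends to fall as more gauges are retained and then flattens out, the returned count marks roughly where extra gauges stop paying off for this statistic; with real data the tolerance and the error measure should reflect the modelling objective.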

Effects of Spatial Distribution
Guidance materials on rainfall station selection recommend a well-distributed gauge network [23]-[27], [30], [36], but do not describe what constitutes a well-distributed network. There are cases demonstrating that better areal rainfall estimates can be achieved by considering the spatial distribution of gauging stations [32], [38]. Using error indices of precipitation estimates, Lopez et al. [37] confirmed that increasing rain gauge density considerably improved the performance of the sensor network, whereas low densities in the high-elevation upper catchment resulted in a decline in performance.
As might be expected, improvements in rainfall estimation and hydrological model performance were small or absent when gauges outside the catchment were used [21]. Similarly, Morrissey et al. [39] demonstrated that not only the density but also the spatial distribution of gauges should be accounted for. Adhikary et al. [38], proposing a method to identify redundant gauging stations for appropriate relocation, presented the case of a 4044 km² catchment in which the optimum station density achieved after varying the number of stations and relocating them was 212 km²/station.
In a study of the Sangamon River, Illinois, Chow [40] concluded that "the precipitation record at one station only is sufficient for the description of the precipitation influence on streamflow". Shaghaghian and Abedini [41] state that, in a single-gauge scenario, the gauge should be located at the centroid of the watershed. In the Soil & Water Assessment Tool (SWAT), climate data from the station nearest to the centroid of each subbasin are used [33], [42], even when other stations are available, on the assumption that gauges located close to the centroid contribute most. Addressing spatial variability, Cho and Olivera [43] also evaluated the performance of a single rain gauge based on its proximity to the sub-catchment centroid, and their results agree closely with the work of Chow and others [40]. However, MacKenzie et al. [32] showed that the "largest variations in runoff simulations occurred when only one rain gauge was used to represent the rainfall over the entire watershed". This is not a contradiction, because in the same study the least error was obtained with a single gauge located close to the centroid. Another noteworthy finding of that study was that, with two gauges, the preferred locations were in the upper and lower thirds of the watershed.
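The centroid-based rule described above can be sketched in a few lines: compute the area-weighted centroid of the watershed boundary with the shoelace formula and pick the gauge closest to it. This is a minimal planar sketch, assuming projected (x, y) coordinates in km and a simple (non-self-intersecting) boundary polygon; it mimics the nearest-to-centroid idea and is not SWAT's own implementation. Gauge names and coordinates in the usage lines are hypothetical.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon via the shoelace formula.
    `vertices` are (x, y) pairs in projected coordinates, e.g. km."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross  # twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3.0 * area2), cy / (3.0 * area2)


def nearest_to_centroid(vertices, gauges):
    """Name of the gauge closest (Euclidean distance) to the polygon centroid."""
    c = polygon_centroid(vertices)
    return min(gauges, key=lambda name: math.dist(c, gauges[name]))


# Hypothetical 20 km x 10 km watershed and three hypothetical gauges.
watershed = [(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0)]
gauges = {"G1": (2.0, 2.0), "G2": (11.0, 4.0), "G3": (18.0, 9.0)}
print(polygon_centroid(watershed), nearest_to_centroid(watershed, gauges))
```

For irregular or concave watersheds the area-weighted centroid can fall near or even outside the boundary, so in practice the selected gauge is worth checking against a map.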

Reality of Selection
The practice of selecting rainfall gauges to compute areal rainfall for a watershed shows very little concern for density or spatial distribution. There are instances in which the same catchment has been modelled by different researchers using different combinations of rainfall inputs, either with different mathematical models or to fulfil distinct water management objectives. The choices appear to follow the belief that whatever is available is acceptable, that personal discretion is rational, or that any combination is capable of delivering reasonable results. Different rainfall station settings with the same Xinanjiang model have been used satisfactorily for the Xiangjiang River basin, with an extent of 94660 km², by Xu et al. [15] and Zeng et al. [7]. In the Aller-Leine river basin of Germany, one sub-basin was modelled with different rainfall station densities using 53 rainfall stations, only one of which was within the sub-basin, with the other 52 located outside the watershed [31]. In the same basin, five sub-basins were modelled with a network of 344 stations in order to evaluate hydrologic modelling strategies. Schulz [50] used 1, 13, 7, 15, 6, 5, 5 and 5 rain gauges respectively, with different spatial averaging, for hydrologic model studies of the Kalu Ganga (river) in Sri Lanka. Several of the studies indicated adherence to WMO (No. 168) standards. However, a majority of the above studies used station densities between 200 and 400 km²/station.
ENGINEER 67 67 ENGINEER
The selection of rainfall gauging stations to achieve desired densities faces obstacles such as discontinued gauging stations, long periods of missing data and inconsistencies in the temporal resolution of available data. Mishra and Coulibaly [5] comment that it is difficult to find the right number of stations with data because the number of hydrometric stations has declined over the years. Although Wallner et al. [51] selected 244 precipitation stations with daily resolution, only 11 stations had an observation period of more than 10 years, and the study was therefore limited to 6 years owing to data unavailability. Dissanayake [48], with a station density of 79.3 km²/station, and Khandu [52], with 92.2 km²/station, used two different gauging station networks for the Gin Ganga river basin of Sri Lanka because daily data were not accessible for all stations for which monthly data were available. Even though rainfall station maps of Sri Lanka show that stations could be selected at a high density of 86.1 km²/station for an evaluation of the Kalu Ganga river basin, data availability for a monthly evaluation over a common 10-year period reduces the usable network to roughly one-third of those stations, i.e. a density of 298 km²/station.
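The effect of data gaps on achievable density, as in the Kalu Ganga example, can be checked by screening each station's record over the common analysis period before computing km²/station. The sketch below assumes monthly series already aligned to that common period, with `None` marking missing months, and a completeness cut-off of 95%; both the cut-off and the station records are hypothetical choices for illustration.

```python
def completeness(series):
    """Fraction of non-missing values (None marks a missing value)."""
    return sum(v is not None for v in series) / len(series)


def network_density(area_km2, records, min_completeness=0.95):
    """Return the usable station names and the resulting density in km^2/station.
    `records` maps station name -> list of values over the common period."""
    usable = [name for name, series in records.items()
              if series and completeness(series) >= min_completeness]
    if not usable:
        raise ValueError("no station meets the completeness criterion")
    return usable, area_km2 / len(usable)


# Hypothetical 12-month records for four stations in a 300 km^2 basin.
full = [100.0] * 12
records = {
    "A": full,
    "B": [100.0] * 6 + [None] * 6,   # half the months missing
    "C": [100.0] * 11 + [None],      # one month missing (about 92% complete)
    "D": full,
}
print(network_density(300.0, records))
print(network_density(300.0, records, min_completeness=0.9))
```

Relaxing the cut-off admits more stations and a denser network, but at the cost of infilling more missing values, which is the trade-off the examples above describe.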
Although a large number of publications target high rainfall network densities with several gauging stations per watershed, a few research publications indicate that one gauging station per watershed would be sufficient to determine the rainfall input for representative streamflow modelling. Following Chow's [40] observation that a single gauge can represent watershed rainfall, Beven & Hornberger [53] compared lumped Thiessen rainfall from 33 stations with a distributed input approach to investigate the effect of rainfall spatial variability in two rainfall recording experiments. They concluded that, in relatively homogeneous watersheds, the effect of spatial pattern on peak flow is small and the effect on streamflow volumes is relatively minor. The sufficiency of a single gauge for the entire catchment has been supported for small watersheds whose time of concentration is short relative to the computational time step [43]. A streamflow model using data from a single rainfall station produced excellent daily Nash-Sutcliffe efficiency values and good year-round mean monthly streamflow estimates, which were considered suitable for policy and management recommendations on climate change impacts on water resources [44]. Although this does not mean that areal rainfall was estimated accurately, it should be noted that the streamflow estimates were quite satisfactory with areal rainfall computed from a single rainfall station.
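For reference, the Nash-Sutcliffe efficiency (NSE) mentioned above is commonly defined as follows, where $Q_{\mathrm{obs},t}$ and $Q_{\mathrm{sim},t}$ are the observed and simulated streamflow at time step $t$, and $\bar{Q}_{\mathrm{obs}}$ is the mean observed streamflow over the $T$ time steps:

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left( Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t} \right)^{2}}
                        {\sum_{t=1}^{T} \left( Q_{\mathrm{obs},t} - \bar{Q}_{\mathrm{obs}} \right)^{2}}
```

A value of 1 indicates a perfect match, 0 means the simulation is no better than the observed mean, and negative values indicate poorer performance than the observed mean.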

Spatial Interpolation Method
The main reason for establishing rain gauging stations or selecting a gauged rainfall data set is to determine the watershed-averaged rainfall for water management. Spatially distributed rainfall data provide better streamflow estimates than point records [33]. There are many methods for determining areal average rainfall from measured point rainfall. Therefore, it is important not only to select the appropriate rain gauging stations but also to choose a suitable method for areal averaging. The arithmetic-mean method is the simplest and is satisfactory when the gauges are uniformly distributed. The Thiessen method assumes that the rainfall at any point in the watershed is the same as that at the nearest gauge, i.e. each gauge represents the area extending halfway to the neighbouring stations in every direction. The isohyetal method requires a dense network of gauging stations for accurate representation. Inverse Distance Weighted (IDW) and Spline methods are among the other surface interpolation methods used for areal rainfall computation.
All methods produce comparable results, especially over long time periods, but differ more from one another when applied to daily rainfall than when applied to annual data [54]. Similarly, a comparison of the Thiessen, IDW, Thin Plate Spline and Kriging interpolation methods by Otieno et al. [20] revealed that, at a station density of about 4.8 km²/station, monthly rainfall estimates from all methods differed by at most 7%.
Opinions on the methods for computing areal rainfall are mixed. The Spline method has been found more suitable for generating gently varying surfaces [55], Kriging is the most frequently used method in comparative studies [56], and the IDW method has been considered better than Spline and Kriging [20], [57]. The Thiessen method, when compared with IDW, Kriging and Multiquadric Equation methods, performed better in estimating annual rainfall in the semi-arid and arid regions of Brazil [58]. At a station density of 373 km²/station, the thin plate spline technique provided more accurate rainfall estimates than the isohyetal and Thiessen polygon techniques [13]. On the other hand, mean annual precipitation computed from radar rainfall data was 5-10% lower than Thiessen averages [59]. As a prerequisite for any application, Burrough [60] recommended a proper study of spatial averaging methods for the region concerned.
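To make the contrast between the arithmetic mean and Thiessen weighting concrete, the sketch below approximates Thiessen weights by assigning each grid-cell centre inside the watershed to its nearest gauge, then uses those weights for the areal average. It is a raster approximation of exact Voronoi polygons, assuming planar coordinates in km, a watershed supplied as a point-in-area test (here a hypothetical 10 km by 10 km square) and a 0.5 km cell size; the gauges and rainfall values are hypothetical.

```python
import math


def arithmetic_mean(rain):
    """Simple average of gauge values (dict: gauge name -> rainfall)."""
    return sum(rain.values()) / len(rain)


def thiessen_weights(gauges, inside, xmin, xmax, ymin, ymax, cell=0.5):
    """Approximate Thiessen weights as the share of watershed cells nearest each gauge."""
    counts = {name: 0 for name in gauges}
    total = 0
    nx = int(round((xmax - xmin) / cell))
    ny = int(round((ymax - ymin) / cell))
    for i in range(nx):
        for j in range(ny):
            p = (xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)  # cell centre
            if not inside(p):
                continue
            nearest = min(gauges, key=lambda g: math.dist(p, gauges[g]))
            counts[nearest] += 1
            total += 1
    return {g: c / total for g, c in counts.items()}


def weighted_areal_rainfall(weights, rain):
    """Areal rainfall as the weighted sum of gauge values."""
    return sum(weights[g] * rain[g] for g in weights)


def inside_square(p):
    """Hypothetical watershed: a 10 km x 10 km square."""
    return 0.0 <= p[0] <= 10.0 and 0.0 <= p[1] <= 10.0


gauges = {"A": (2.0, 5.0), "B": (10.0, 5.0)}  # B lies on the watershed boundary
weights = thiessen_weights(gauges, inside_square, 0.0, 10.0, 0.0, 10.0)
daily_rain = [{"A": 10.0, "B": 20.0}, {"A": 0.0, "B": 5.0}]
for day in daily_rain:
    print(arithmetic_mean(day), weighted_areal_rainfall(weights, day))
```

Here gauge B sits on the boundary and receives only 40% of the area, so for the first day the Thiessen average (14 mm) differs from the arithmetic mean (15 mm). The weights are computed once and reused for every day, which holds as long as the same set of gauges reports on each day.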
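For comparison with the methods discussed above, an IDW areal average can be sketched by interpolating rainfall at grid-cell centres inside the watershed and averaging those values. This minimal sketch assumes planar coordinates in km, a power parameter of 2 (a common default, not a value taken from the cited studies), a 0.5 km grid and a hypothetical square watershed with hypothetical gauges; it does not reproduce any cited comparison.

```python
import math


def idw_point(p, gauges, rain, power=2.0):
    """Inverse-distance-weighted rainfall estimate at point p."""
    num = den = 0.0
    for g, xy in gauges.items():
        d = math.dist(p, xy)
        if d == 0.0:
            return rain[g]  # point coincides with a gauge
        w = 1.0 / d ** power
        num += w * rain[g]
        den += w
    return num / den


def idw_areal_mean(gauges, rain, inside, xmin, xmax, ymin, ymax, cell=0.5, power=2.0):
    """Average of IDW estimates at cell centres lying inside the watershed."""
    values = []
    nx = int(round((xmax - xmin) / cell))
    ny = int(round((ymax - ymin) / cell))
    for i in range(nx):
        for j in range(ny):
            p = (xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)
            if inside(p):
                values.append(idw_point(p, gauges, rain, power))
    return sum(values) / len(values)


def inside_square(p):
    """Hypothetical watershed: a 10 km x 10 km square."""
    return 0.0 <= p[0] <= 10.0 and 0.0 <= p[1] <= 10.0


# Two hypothetical gauges placed symmetrically about the watershed centre.
gauges = {"A": (2.5, 5.0), "B": (7.5, 5.0)}
rain = {"A": 10.0, "B": 20.0}
print(round(idw_areal_mean(gauges, rain, inside_square, 0.0, 10.0, 0.0, 10.0), 3))
```

With symmetric gauges the IDW areal mean coincides with the arithmetic mean; with unevenly placed gauges the two diverge, and the power parameter controls how quickly a gauge's influence decays with distance.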
The Thiessen method has attracted many modellers owing to its simplicity [14], [58]. Unlike the other interpolation methods, which use the point rainfall values at each time step along with the station geometry, the Thiessen method depends only on the station geometry. This provides computational ease because the station weights do not vary with time. Although comparison of rainfall interpolation methods has cited that the best options are IDW

Optimum Station Influence
Across the available options, a watershed manager can identify two overarching concepts. The first is the determination of optimum gauging stations based on the characteristics of rainfall and the location of the gauging stations, assuming that the value measured at a location either remains unchanged or decays with distance. This is the most commonly used concept. The second is the identification of the rainfall stations, and their influence, that deliver the areal rainfall contributing most to the observed streamflow from a watershed. This is associated with optimising the influence of gauging stations to match the watershed response.
There is a strong need to select rainfall gauging stations with regard to the performance of streamflow estimation models, so as to ensure minimum modelling error [65].
Observing the errors in streamflow estimates based on areal average rainfall, Anctil et al. [16] showed that high-density networks do not always lead to well-performing streamflow estimates, owing to rainfall spatial variability. Stating that an ideal rain gauge network would neither be over-saturated with redundant rain gauges nor suffer from a lack of rain gauges, Shaghaghian and Abedini [41] showed the importance of prioritising rain gauge stations. Comparing a large number of combinations drawn from a total of 34 gauging stations covering a 25000 km² watershed, they concluded that a six-gauge combination was the most contributory option.
Optimisation of gauging station weights together with Sugawara's Tank model parameters using a single objective function showed very good agreement between observed and computed hydrographs [17]-[19]. In the work of Arsenault & Brissette [66], the optimisation algorithm clearly identified that combinations of two or three rain gauging stations can result in better hydrological performance than feeding a high-density network to the model. Clark & Slater [67] used locally weighted regression in which spatial attributes of station locations are used as explanatory variables to predict the spatial variability of precipitation.
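The combination-search idea can be sketched as an exhaustive scan over all k-gauge subsets, scoring each by the Nash-Sutcliffe efficiency of a rainfall-runoff model driven by that subset's areal rainfall. To keep the sketch self-contained, it assumes a hypothetical single linear reservoir as the model, arithmetic-mean areal rainfall for each subset and synthetic station data; it illustrates the search pattern rather than the method of Shaghaghian and Abedini [41]. The number of combinations grows quickly (choosing 6 of 34 stations gives over 1.3 million subsets), so practical studies may need heuristic search.

```python
from itertools import combinations


def linear_reservoir(rain, k=0.3):
    """Hypothetical single linear reservoir: release a fraction k of storage each step."""
    storage, flow = 0.0, []
    for p in rain:
        storage += p
        q = k * storage
        storage -= q
        flow.append(q)
    return flow


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed flow."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def subset_areal(station_series, subset):
    """Arithmetic-mean areal rainfall of the stations in `subset` at each step."""
    n_steps = len(station_series[0])
    return [sum(station_series[i][t] for i in subset) / len(subset) for t in range(n_steps)]


def best_subset(station_series, observed_flow, k, model=linear_reservoir):
    """Score every k-station subset and return the one with the highest NSE."""
    best, best_score = None, float("-inf")
    for subset in combinations(range(len(station_series)), k):
        score = nse(observed_flow, model(subset_areal(station_series, subset)))
        if score > best_score:
            best, best_score = subset, score
    return best, best_score


# Synthetic daily rainfall (mm) at four hypothetical stations.
stations = [
    [10, 0, 5, 20, 0, 0],
    [12, 2, 3, 18, 1, 0],
    [0, 0, 30, 0, 0, 10],
    [5, 5, 5, 5, 5, 5],
]
# Synthetic "observed" flow generated from stations 0 and 1, so the search should recover them.
observed = linear_reservoir(subset_areal(stations, (0, 1)))
print(best_subset(stations, observed, k=2))
```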
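Optimising station weights against streamflow, in the spirit of [17]-[19] and [66], can likewise be sketched with a brute-force search. The sketch below assumes three stations, non-negative weights that sum to one on a 0.05 grid, and a hypothetical linear-reservoir model with a fixed parameter; the cited work optimised weights jointly with Tank model parameters, which this simplification does not attempt. All data are synthetic.

```python
def linear_reservoir(rain, k=0.3):
    """Hypothetical single linear reservoir: release a fraction k of storage each step."""
    storage, flow = 0.0, []
    for p in rain:
        storage += p
        q = k * storage
        storage -= q
        flow.append(q)
    return flow


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed flow."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def weighted_areal(weights, station_series):
    """Areal rainfall as the weighted sum of station values at each step."""
    n_steps = len(station_series[0])
    return [sum(w * s[t] for w, s in zip(weights, station_series)) for t in range(n_steps)]


def optimise_three_weights(station_series, observed_flow, n=20, model=linear_reservoir):
    """Grid search over weights (i/n, j/n, (n-i-j)/n) that maximises NSE."""
    best, best_score = None, float("-inf")
    for i in range(n + 1):
        for j in range(n + 1 - i):
            w = (i / n, j / n, (n - i - j) / n)  # non-negative, sums to one
            score = nse(observed_flow, model(weighted_areal(w, station_series)))
            if score > best_score:
                best, best_score = w, score
    return best, best_score


# Synthetic daily rainfall (mm) at three hypothetical stations.
stations = [
    [10, 0, 5, 20, 0, 0],
    [12, 2, 3, 18, 1, 0],
    [0, 0, 30, 0, 0, 10],
]
true_weights = (10 / 20, 6 / 20, 4 / 20)
observed = linear_reservoir(weighted_areal(true_weights, stations))
print(optimise_three_weights(stations, observed))
```

A grid search keeps the example transparent; with more stations or jointly calibrated model parameters, a numerical optimiser would normally replace it.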

Discussion
In much of the reviewed work, the selection of gauging stations and of the areal averaging method can be classified as "without scientific reasoning". When reporting the rainfall input for streamflow modelling, many modellers and reviewers do not mention the technique used for areal averaging and/or the reason for selecting it [36], [44], [68]-[80].
There are different options for computing watershed rainfall from point rainfall measurements. Irrespective of the option, it is accepted that only a set of measurements with very fine spatial resolution would provide an areal average close enough to be considered representative of the actual rainfall field. It is also accepted that such fine resolutions are far from achievable in practice because of resource constraints and the variability of rainfall fields.
The factors influencing rainfall and its spatial variability include catchment characteristics, temporal variations of rainfall and wind directions, while station density, station distribution, temporal data resolution, catchment size and the method of computation are the major factors influencing the determination of spatially averaged rainfall. The few available guidelines quantitatively recommend station densities and indicate that stations must be well distributed, even though the desired distribution is not specified. Applications seldom appear to follow guideline recommendations. Figure 1 shows the reviewed common practice of station selection with respect to catchment extent, together with the available guidelines and recommendations for different environmental conditions. It illustrates that past research and practice commonly resort to higher spatial densities. It also shows that the station requirements of the Institute of Water Engineers guide, the WMO guidance for mountainous islands [24] and the UK Met Office [29], [30] have not been adhered to in many studies. It is also important to note that upper-medium and large catchments have settled for lower densities, while small and lower-medium catchments show the capability to fulfil the recommendations.

However, these results need to be re-evaluated by including more literature on modelling and guideline practices.
Threshold densities between 2.6 and 2442 km²/station have been quoted [4], [7], [31], [32], [73], [81], [82] as densities beyond which improvements in streamflow modelling would be marginal, irrespective of catchment size. The recommendations of prevailing guidelines and ongoing research thus reflect a wide variety of opinions regarding station densities. Therefore, the most rational option would be to select a suitable station density considering both time-tested guidelines and the median values of prevailing practice. According to Figure 2 and Table 1, the median density values used in prevailing watershed studies for small, medium and large catchments are approximately 19, 117 and 470 km²/station respectively.
In streamflow modelling, the importance of having a rain gauge close to the centroid of the watershed has been studied or discussed in several works [32], [41]-[43]. Of the surveyed literature, 13 studies specifically mention that evenly or well-distributed stations were used for reasonably accurate runoff estimation [4], [7], [15], [21], [32], [33], [53], [76], [81], [83]-[86]. Hence, following guideline recommendations to select a well-distributed network with a station close to the centroid of the watershed is the best available option with respect to station distribution.
The Arithmetic Average, Thiessen, Kriging, IDW and Spline methods are the most commonly used for spatial averaging of rainfall. A review of many recent research works identified the Thiessen method as the choice of the majority (45%), followed by Kriging (11%). It is noteworthy that 16% of the reviewed research did not mention the method used for spatial averaging (Figure 3). This indicates that the Thiessen method is the most widely accepted method for computing areal average rainfall in streamflow modelling studies, probably because other methods are more complex to apply for the daily areal rainfall computations required by mathematical models.
Very few research works appear to consider that careful gauging station selection and an appropriate averaging method are important for meaningful streamflow modelling. Errors in rainfall estimates are normally compensated for by model parameters representing the soil matrix during calibration. Although this can still yield acceptable streamflow estimates, the compensation distorts the physical meaning of those parameters and therefore hinders the advancement of scientific knowledge needed to incorporate known physics into hydrologic models. As such, selecting appropriate gauging stations and an appropriate averaging method is important not only for better streamflow estimates but also for advancing knowledge of catchment hydrology. On the other hand, major issues such as discontinued gauging stations, long periods of missing data and inconsistencies in the temporal resolution of available data hinder rational station selection. Thus, selecting appropriate gauging stations by optimising both the stations and their influence weights appears to be the most rational concept for estimating catchment streamflow using mathematical models.
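The median-based option can be sketched as grouping studies by catchment size class and taking the median of catchment area divided by the number of stations. The class boundaries below (1000 and 10000 km²) are hypothetical placeholders, since this section does not state the boundaries behind Table 1, and the study list is invented purely for illustration.

```python
import statistics
from collections import defaultdict


def size_class(area_km2, small_max=1000.0, large_min=10000.0):
    """Hypothetical size classes; the boundaries are placeholders, not the review's."""
    if area_km2 < small_max:
        return "small"
    if area_km2 >= large_min:
        return "large"
    return "medium"


def median_density_by_class(studies):
    """studies: iterable of (catchment area in km^2, number of stations).
    Returns the median km^2/station for each size class."""
    groups = defaultdict(list)
    for area_km2, n_stations in studies:
        groups[size_class(area_km2)].append(area_km2 / n_stations)
    return {cls: statistics.median(vals) for cls, vals in groups.items()}


# Invented study list for illustration only.
studies = [(100, 5), (400, 10), (250, 25), (3000, 20), (5000, 50),
           (2000, 10), (20000, 40), (50000, 100)]
print(median_density_by_class(studies))
```

The median is preferred over the mean here because a few studies with very sparse or very dense networks would otherwise dominate the class value.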

Conclusions
1. The available guidelines do not provide specific recommendations on gauging station density and station distribution.
2. In the absence of such recommendations, it is most rational for a watershed manager to select stations using either a time-tested guideline or a station density such as the median of prevailing studies.
3. With respect to station distribution, there must be at least one station close to the centroid of the watershed within a well-distributed network.
4. The Thiessen method can be adopted as the rational option for computing areal average rainfall in runoff estimation, considering its computational ease, low resource requirements and acceptance by practitioners.
5. There is a clear need for structured comparative research to conclusively determine the influence of rain gauging stations and their distribution on streamflow model applications.