Data Duration Options of a Practicing Water Manager when Developing a Model for Sustainable Water Resources Development

Water scarcity emphasizes the importance of water resources planning and management. Monthly hydrologic models are commonly used because, in practice, the monthly time step is the choice for planning activities. Reliability of the model simulations depends upon the nature of the data. Data quality is important, and data should be representative of the watershed processes. The information contained in the data, and its careful extraction, are also important. A detailed review of available guidelines and research findings was carried out to identify the best data duration option available for water managers to calibrate and verify a monthly streamflow estimation model for sustainable water resources management. The data length selected for monthly water resources models varies between 12 and 780 months, and the reason for selection is mostly individual preference. This study presents a qualitative evaluation using five key criteria. Data length use in research and guidelines indicates that 20 to 40 year long data sets are the most favoured for monthly model applications. A total data length in the range of 20-30 years or greater should be utilized for monthly water resources model development. Based on the analysis, it may be prudent to select at least a 20-year data period for model calibration.


1.1. Modelling Water Resources
Water has become a very scarce resource because of the effects of population growth, industrialization, urbanization and climate change. Hence the gap between the demand and supply of water keeps widening [1]-[3]. The increasing demand for the scarce water resource requires systematic, meaningful and sustainable water resources management and planning. A hydrologic model is the common tool available for water managers to execute this task in a meaningful manner. Hydrologic models vary from simple regression type to detailed physics-based process representations. Irrespective of the type, these models require sufficiently long time series input data such as rainfall and evaporation. Many recommendations in the literature and in textbooks on determining model parameters by means of model calibration and verification clearly mention that the characteristics and length of data must be evaluated for parameter reliability [4]-[11].
Monthly time scale is the most commonly utilized temporal resolution for practical water resources planning and management [12], [13].
Therefore, most hydrologic mathematical models attempting to fulfil this purpose apply monthly input data to calculate monthly outputs. These monthly models vary between regression and physics-based process type representations. Among the popular types are the monthly water balance models, which rationalize their model computations by using the principle of mass conservation and other relationships which are closer to known physics [12]-[18]. Monthly water balance models are popular for water resources management, reservoir simulation, drought assessment, long-term drought forecasting, climate change impact evaluation etc. [12]-[14], [16].

1.2. Data Length and Modelling
Recommendations on data length for monthly water balance model development become the first priority of a watershed manager who has completed the model selection. Data length comprises two sections. One is the calibration data length and the other is the verification data length. Therefore, the total data length is the sum of the calibration and verification data lengths.
Research has been conducted on the methods of splitting the dataset for model calibration and verification [19]. Zheng et al. [20], in a related study, split the total dataset as 80% for calibration and 20% for verification. Guidelines on mathematical modelling indicate that a comprehensive data set would include long-term streamflow measurements and rainfall and PET data collected at one or more locations within the catchment, along with land use coverage, vegetation cover and impervious area information, including changes over time [21]. The WMO guideline [22] states that daily or even monthly river flow data may be sufficient for assessing regional water security; multidecadal records of continuous rainfall are necessary for reliable frequency analysis; and a few years of data may be sufficient to calibrate a groundwater model. Recommendations on Hydrologic Practices [23], while indicating the practice of the British Meteorological Office of using a 30 year minimum duration for the computation of average annual rainfall, state that flow records of 20-30 years are required to provide the streamflow pattern representative of the flow regime. For continuous modelling, an observed input containing sufficient events to calibrate all parameters has been recommended [8]. Literature on the selection of data duration for monthly models does not indicate firm recommendations [24]-[26]. Using a daily mathematical model and a variety of data lengths, Gupta and Sorooshian [27] indicated that a data length of 500 to 1000 points would achieve acceptable precision of optimized parameters and hence 3-4 years of daily data would be sufficient for model calibration. Interpreted in terms of the number of data points, this result would mean that for monthly modelling a dataset of over 40 years would be required. Subsequently, a study by Yapo et al. [28] recommended an 8-year dataset for calibration and stated that the wettest period of the dataset would identify improved parameters. Loucks and van Beek [29], discussing sensitivity and uncertainty of model parameters, indicate that longer datasets may not lead to better results. This raises the question of whether the number of data points or the hydrological characteristics of a watershed are important when selecting a dataset for modelling. On the other hand, there are reports that different data periods are suitable for identifying different parameters [30]. Data availability and interpolation without sufficient data also create a significant impact on parameter estimation [31].
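The step from a target number of data points to a monthly record length is simple arithmetic. A minimal sketch follows; the function name `years_needed` and the figure of 12 points per year for a monthly series are illustrative assumptions, not part of the cited studies.

```python
def years_needed(points, steps_per_year):
    """Record length (years) that supplies `points` observations."""
    return points / steps_per_year

# A monthly series contributes 12 data points per year (assumption:
# one value per calendar month, no gaps).
for points in (500, 1000):
    print(f"{points} points -> {years_needed(points, 12):.1f} years of monthly data")
```

Under these assumptions, the lower bound of 500 points corresponds to roughly 42 years of monthly data, in line with the "over 40 years" interpretation.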

Problem in Practice
Though there is a desire to use long data lengths, non-availability of data often compels modellers to compromise on the data length in order to meet the number of gauging stations required for the model to be realistic. Boughton [17], in his work, cites many other data issues influencing the decision on the data length that is used for modelling. It appears that in most modelling work, all available data are used without regard to resource requirements [7]. The present practice of monthly water balance modelling also does not demonstrate the use of significantly long datasets for model calibration. There is literature indicating the use of 2-5 years of monthly data for model calibration [32]-[36]. A comparative study of 203 watersheds by Chang & Chen [37] used a 10-year dataset for calibration and verification of monthly watershed models.
Against this backdrop, the selection of data length for monthly water resources modelling gives rise to many difficulties. The wide variety of opinions in the available guidance material, constraints due to missing data in datasets, poor quality of data, and constraints due to data accessibility are some of the critical issues faced by practicing watershed managers. Hence, there is a clear necessity to comparatively evaluate available material to identify the best data length option when attempting to use an off-the-shelf model for field-level operation. Accordingly, the main objective of the present work is to evaluate available guidelines and research findings to identify the best data duration option available for water managers to calibrate and verify a monthly streamflow estimation model for sustainable water resources management.

2.1. General
A search of reviewed literature was conducted to identify the publications available for a practicing water manager to select the appropriate data length for a water resources modelling application. Guidelines and research publications with direct and indirect guidance on data lengths were summarized for a comprehensive review. Direct publications are those which clearly recommend a data length for a particular water resource management purpose. Indirect publications are reviewed publications which report the use of a data length for a specific water resources application.
The investigation included 63 publications, which contained 275 monthly and 138 daily water management applications providing assistance in selecting a data length for hydrologic modelling.

2.2. Water Resources Models
The length of a dataset for rational water resources modelling is dependent upon the temporal resolution of the data used for model computations. Therefore, recommendations on data length depend on the type of model, parameter characteristics and the number of parameters. There are several key factors about models and model parameters that a practitioner needs to be aware of when investigating the appropriate model for a particular application.
Water resources modelling is mostly for planning and management of water as a resource.
It is common to carry out these activities using results presented in monthly time scale. This is done in two ways. One is to perform computations and modelling in the daily time scale and then aggregate the results to monthly resolution. The other is to use monthly scale conceptualizations to model and generate monthly and coarser resolution results. This was evident in the volume of literature available for review. Daily and monthly models have their own advantages and disadvantages. In this work the length of data used for monthly and daily applications was evaluated to compare the number of data points used for calibration and verification of each model type. Also, daily models provide guidance on parameters to be looked at during applications. The present work concentrated on the assessments to capture the data length selection requirements for monthly water resources modelling applications.
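The first of the two approaches, aggregating daily results to monthly resolution, can be sketched as follows. The function name `aggregate_to_monthly` and the sample series are hypothetical, and summation is assumed as the aggregation rule (appropriate for depths or volumes; a mean would be used for rates).

```python
from collections import defaultdict
from datetime import date, timedelta

def aggregate_to_monthly(daily):
    """Sum daily values (date -> value) into (year, month) totals."""
    monthly = defaultdict(float)
    for day, value in daily.items():
        monthly[(day.year, day.month)] += value
    return dict(sorted(monthly.items()))

# Hypothetical series: 1 mm/day for January and February 2021 (59 days)
start = date(2021, 1, 1)
daily_flow = {start + timedelta(days=i): 1.0 for i in range(59)}
monthly_flow = aggregate_to_monthly(daily_flow)
print(monthly_flow)
```

The 59 daily values collapse into two monthly data points, which illustrates why a monthly record must span many more years than a daily record to supply the same number of points.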
The adequacy of monthly streamflow estimations for planning and management of water resources is common knowledge [16], [23], [38], [39]. Monthly continuous streamflow models play a major role in watershed management for many reasons, such as: 1) water resources planning and management practice is on monthly or longer time scales; 2) climate change effects on vegetation can be more easily specified on a monthly or seasonal scale than on a daily or hourly scale; 3) large spatial scales requiring only averages of watershed characteristics can be selected; and 4) monthly hydroclimatological data are most readily available [40]. Apart from the above, monthly water balance models have specific advantages, such as simple mathematical representations which are closer to known physics, sufficiency of either precipitation only or precipitation, temperature and evaporation as inputs, and a smaller number of parameters compared to finer time resolution models [41]-[44].
Monthly water balance models are used to forecast the monthly streamflow or seasonal yield of a catchment. A monthly hydrograph smoothens out the streamflow fluctuations visible at hourly or daily temporal resolutions. The variations in finer resolution measurements are due to variations of watershed resistance to runoff and storage characteristics. Monthly models provide cumulative effects of actual phenomena aggregated over calendar months. Due to the aggregation of finer variations the monthly models enable simplifications when handling flow and storage. Therefore, monthly models are comparatively simpler and have a smaller number of parameters when compared with daily and hourly models [12].
Monthly water balance models have a long history dating back to the 1940s. Thornthwaite and Mather [51] and Alley [14] developed a two-parameter monthly water balance model (T Model). Palmer [52] developed a water balance model (P Model) similar to that of Thornthwaite and Mather [51], dividing soil moisture into two layers. A four-parameter ABCD water balance model combined with soil moisture storage was developed by Thomas [53]. Mimikou et al. [38] and Vandewiele et al. [16] describe several other monthly water balance models developed for water resources planning and management.
This makes it clear that the available models, their merits and demerits must be carefully evaluated prior to selecting a suitable monthly water resources model and an appropriate data length.

2.3. Parameter Estimation
Parameter estimation is the selection of values for the parameters, so that the model will match the watershed system as closely as possible [42]. Model calibration is the process which determines the model parameters based on the available data and prior knowledge [54]. Among many types of models ranging from black box to physics based, there are model parameters which are either physically interpretable or non-interpretable [43].
A desired characteristic of a particular hydrologic model is the capability to simulate its target hydrologic processes under the hydrologic conditions expected to be experienced by a watershed.
Therefore, model parameters estimated should be unique, realistic and independent of the data being used for calibration. Needs and characteristics of model parameters have been discussed by many. It is known that if the models are based on basic principles of physics (mass and energy conservation), then estimation of model parameters is straightforward [9]. However, watershed heterogeneity, such as nonuniformity of vegetation, slopes, existence of macro pores, together with unresolved spatial and temporal variability of meteorological variables, limits the applicability of "known physics"-based models either to laboratories or to well observed small experimental catchments but with limited success.
Therefore, calibration of models with a higher number of parameters requires the selection of a suitable data length in order to achieve the desired modelling accuracy and representativeness [9]. In practice, calibration is done with a representative dataset, and the errors between observed and simulated data are minimized. This is then followed by model verification with an independent dataset in order to ensure the representativeness of the model.
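The calibrate-then-verify sequence described above can be illustrated with a deliberately small sketch. The toy one-parameter runoff-coefficient model (flow = c × rainfall), the sum of squared errors as the objective function, the grid search, and all data values are assumptions chosen for illustration; none of them is a model or method taken from the reviewed literature.

```python
def simulate(rain, c):
    """Toy model: monthly flow as a fixed fraction c of monthly rainfall."""
    return [c * p for p in rain]

def sse(observed, simulated):
    """Sum of squared errors between observed and simulated flows."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))

def calibrate(rain, flow, candidates):
    """Pick the candidate parameter that minimizes the calibration error."""
    return min(candidates, key=lambda c: sse(flow, simulate(rain, c)))

# Hypothetical monthly rainfall and flow (mm); flow is exactly 0.4 * rainfall
rain = [100.0, 50.0, 200.0, 80.0, 120.0, 60.0]
flow = [40.0, 20.0, 80.0, 32.0, 48.0, 24.0]

candidates = [i / 100 for i in range(101)]          # c = 0.00, 0.01, ..., 1.00
c_best = calibrate(rain[:4], flow[:4], candidates)  # calibration period
verification_error = sse(flow[4:], simulate(rain[4:], c_best))  # independent period
print(c_best, verification_error)
```

With real data the verification error would not vanish; its size relative to the calibration error is what indicates whether the calibrated parameter is representative.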
As a result of practical constraints with data, process conceptualization and objective functions, model parameters are inevitably data dependent [5]. This hints that a search for the data lengths required for model applications requires keeping the parameter influence at a low level by selecting a model with few parameters. In data limited situations, a model with few parameters is usually preferred by hydrologists. This is because a higher number of parameters increases the complexity of models, and this in turn tends to decrease the model performance [8]. Hence the determination of data length must be linked with the number of parameters associated with the model.

2.4. Selection of Data Length
A practitioner searching for suitable data lengths has two options: either to obtain assistance from available guidelines or to evaluate reviewed research publications for acceptable recommendations. As noted earlier, the WMO guideline [22], under water data temporal frequency, states the requirement of continuous high frequency rainfall and streamflow data for flood forecasting, and the sufficiency of monthly river flow data to assess regional water security. Also, under water data longevity of measurements, it mentions that multi-decadal records of continuous rainfall are necessary for reliable rainfall Intensity Duration Frequency (IDF) analysis.

Guideline Suggestions
Australian Rainfall and Runoff (ARR) "A Guide to Flood Estimation", a guideline updated in 2016 [58], states that the calibration data length should cover a range of different flood conditions in order to confidently use calibration results in data scarce situations. This guideline further states that the parameter uncertainty decreases as the length of the data increases, and increases when the length of data decreases.
More than a 10-year period is indicated as suitable for flood modelling. In cases where less than 10 years of data is available, regional approaches are recommended as more appropriate for computing the base-flow contribution to design flood estimates. ARR goes on to mention that, in an ideal situation, more than 10 years of continuous streamflow data is required to perform detailed site-specific analysis. However, the ARR guideline before the update required at least 15 years of flood data for flood analysis.
The guideline issued by the Department of Water Resources in Rajasthan, India [59] states that a minimum of 25 years of monthly rainfall data, and runoff data of the maximum available period, is required for yield studies. The Guide to Hydrological Practices (WMO-No. 168) (1994) [60] states that at least 30 years of data is required to obtain a representative relationship between rainfall and runoff.
WMO/TD-No. 554 (WMO 1993), Hydrological Design Data Estimation Techniques, issued by the Czech Hydrometeorological Institute, describes the entry data sets of the example studies carried out and mentions the use of 50 years of monthly data.
Daggupati et al. [61], mentioning expensive resource requirements, indicate the need to use shorter datasets instead of lengthier datasets. Discussing the needs for model development, Razavi and Tolson [62] recommend the use of one third of a long dataset for calibration while using a longer dataset for model verification. Zheng et al. [20] split the data set as 80% for model calibration and 20% for model verification based on common practice, e.g. May et al. [63]; Wu et al. [19]. A modellers' guide for rainfall runoff modelling by Kherde [64] recommends a one-year dataset for the warm-up period, a 10-year dataset for calibration and a minimum of two years of data for validation.
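The splitting rules quoted above can be compared on a common record. The sketch below applies them to a hypothetical 30-year monthly dataset; the function name `split_by_fraction`, the record length, and the reading of the Razavi and Tolson recommendation as "one third for calibration, remainder for verification" are assumptions made for illustration.

```python
def split_by_fraction(total_months, calibration_fraction):
    """Return (calibration, verification) lengths in months."""
    calibration = round(total_months * calibration_fraction)
    return calibration, total_months - calibration

total_months = 30 * 12  # hypothetical 30-year monthly record

# 80% calibration / 20% verification, as in Zheng et al. [20]
split_80_20 = split_by_fraction(total_months, 0.80)

# One third for calibration, remainder assumed for verification [62]
split_one_third = split_by_fraction(total_months, 1 / 3)

# Kherde [64]: 1 year warm-up + 10 years calibration + at least 2 years validation
kherde_minimum_years = 1 + 10 + 2

print(split_80_20, split_one_third, kherde_minimum_years)
```

On the same record the two fractional rules give calibration periods of 24 and 10 years, which shows how strongly the choice of rule, rather than the data, can drive the calibration length.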
The variety of guideline statements indicated above also necessitates a careful review of ongoing research to determine an appropriate data length for water resources model applications.

Research Recommendations
A comparison of reviewed research publications shows recommended data lengths of between one year and 30 years for model calibration. A monthly dataset of 60 years, enabling 30 years each for model calibration and verification, has been recommended as the best modelling option for the Nile River [65]. A comparative monthly water balance modelling effort, in which 5 non-overlapping 10-year data periods were evaluated on 5 models and on 10 sites, reported verification with 40 years of data [14]. Xu and Vandewiele [45], working on 91 catchments in Belgium and China having extents between 16 km² and 3626 km², and using 2, 5, 10, 15 and 20 year long monthly data from a 158 year data series, concluded that a 10 year data period is necessary for an adequate model calibration. Selecting six events for calibration and 24 events for validation of simple and complex runoff models in semi-arid watersheds, Michaud and Sorooshian [66] concluded that a minimum data length of 15 years is required for calibration. Görgens [46], in research on two hydrologic models for a semi-arid 73.1 km² catchment with monthly datasets of 3, 6, 10, 15 and 20 years, also concluded that 15 years of monthly data is required for a reliable model optimization. Thirty years of data was identified as the minimum requirement for realistic streamflow estimation to monitor droughts at two Indian catchments [67]. After a modelling study of 7 catchments in the USA, Haan [32] stated that at least one and preferably two or three years of observed monthly flows are required for acceptable parameter estimation.
Burn and Elnur [68] recommended a minimum of 20 years of data for a reference hydrometric basin network. Kahya and Kalayci [69] in their work indicated that a 30 year period is long enough to compute a valid mean statistic. Nyunt et al. [70] carried out a climate change study and utilized 20 years of data for modelling. Similarly, for water management purposes, data lengths that had been used in research vary from 1 year to 65 years [7], [10], [73], [12], [14], [25], [32], [42], [45], [71], [72]. Sorooshian and Gupta [71] suggest that for any calibration procedure to be successful, the data should be representative of the various natural phenomena experienced by the watershed during a complete seasonal hydrological cycle. They also suggest choosing a wet year for calibration to activate all the model parameters. James [8] recommends collecting all rainfall and dry events in a dataset to calibrate the parameters, thereby ascertaining that the model is sensitive to all the events in the watershed.
After carrying out research in the Great Usuthu Catchment, Gan, Dlamini and Biftu [5] suggested that wet years are preferred over dry years as calibration data, because dry years would not contain sufficiently high flows to excite the models.
There are many recommendations in the research carried out for daily water management applications.
Perrin et al. [10], working with 12 catchments varying between 1021 km² and 4421 km² and using 39 years of daily data, stated that 350 days chosen randomly are sufficient to obtain robust estimates for the model. A sample size greater than 500 days has been reported as a threshold to achieve acceptable precision of model parameters [27]. A comparison of data lengths on a 407 km² watershed has highlighted the importance of daily data sets greater than 2-5 years for long term runoff estimation [17]. Anctil et al. [7], in their work for the Serein River Basin in France (1120 km²), concluded that best model performance was with 3 and 5 year calibration data sets. Sufficiency of at least 8 years of daily data for model calibration has been reported in an application using 55 watersheds having extents between 51 and 1891 km² by Li et al. [74] and by Gupta et al. [25] in their work for the Leaf River basin (1944 km²). It is noteworthy that Ye et al. [26] had discussed the inadequacy of a 5 year dataset for the modelling of low yielding watersheds in Australia. The need for at least one hydrologically appropriate year for the activation of model parameters has been highlighted by several researchers [5], [24], [75]. Choosing a suitable data length for modelling is a matter of balancing the length of the dataset and the quality of data [24]. Data length options between shorter periods with high quality data and longer datasets with lesser quality must be appropriately chosen to improve the precision of parameters [27].

3.1. Data Length and Quality
The success of the model calibration process depends upon the nature of the data used. Erroneous input data causes a significant adverse impact on model calibration [76]. The dataset used should be representative of the watershed processes. Even though most researchers had tried to achieve the representativeness of data by using the longest available data lengths, the information contained in the data and the way of extracting this information are considered more important than the length of the data series [24]. Increased data lengths increase the computational burden to obtain the best parameter set and the cost of data acquisition. Hence the requisite parameter precision, computational constraints and data costs must be optimized to manage the desired quality of data [27]. Poor quality of data leading to unsatisfactory hydrologic modelling is well reflected in the study of 75 Belgian catchments by Vandewiele and Elias [72]. Rainfall data must not contain significant errors, and even random errors in a rainfall series significantly affect model performance and parameter values [77].
Xu and Vandewiele [45], having performed a sensitivity analysis for input data errors in 91 catchments of Belgium and China, concluded that random errors in rainfall data negatively influence model performance and that systematic errors are less important for the estimation of streamflow.
However, the systematic errors do have a significant influence on model parameter values and consequently on the estimation of other components of the water balance. On the other hand, Das and Bárdossy [11], quoting Chaplot et al. [78], state that models have the capacity to achieve performance criteria by adjusting their parameter values to compensate for input errors that are within a reasonable range.

3.2. Missing Data Issues
Much research and project work uses data without missing data periods; the work of Anctil et al. [7], which focused only on periods without missing data, is an example. They go on to mention that in most research the length of available observation series is rarely addressed. Missing data is often a major issue when attempting to acquire long datasets. This is especially because of the need to have a common dataset for all gauging stations representing the spatial variability of rainfall within a project area. To overcome the issue of non-continuous data in poorly gauged catchments, Perrin et al. [10] suggested investigations on model behavior with non-continuous data. Adhering to an appropriate data filling technique has been suggested as a measure to overcome the poor performance of models due to the lack of long datasets adequately representing the variations in ground elevation and geography of a project area [11]. Though data length requirements are not specifically mentioned, many text books mention methods to fill the missing data in order to ensure common datasets for hydrological computations. Commonly used data filling methods are described in text books [85].
Filling with the data from the nearest available station has been used in an 8-year long daily dataset for a SWAT model application at six watersheds in North Ohio [86]. In a case study of 144 sub watersheds in 10 separate watersheds, Liew et al. [87] used the inverse distance weighting method to fill the missing daily data of 31 gauges with the help of the four closest gauging stations, constructing a 23 year and an 8-year long common dataset. Khandu [88] used a linear interpolation technique to fill missing rainfall data in a 48 year data set for a two parameter monthly water balance model development. Dissanayake [89], Artan et al. [90] and Gutierrez-Magness and McCuen [91] filled the missing data to carry out hydrologic modelling, water balance modelling, and rainfall runoff modelling by using 8 and 15, 13, and 15 year long datasets, respectively.
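Of the filling methods mentioned above, inverse distance weighting lends itself to a short sketch. The function name `idw_estimate`, the exponent of 2, and the gauge distances and rainfall values below are all hypothetical; the cited studies do not report these details here.

```python
def idw_estimate(neighbours, power=2.0):
    """Estimate a missing value from (distance, value) pairs of nearby gauges.

    Distances must be positive; power=2 is an assumed, commonly used exponent.
    """
    weights = [1.0 / distance ** power for distance, _ in neighbours]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, neighbours))
    return weighted_sum / sum(weights)

# Hypothetical rainfall (mm) at the four closest gauges and their distances (km)
neighbours = [(5.0, 12.0), (10.0, 8.0), (10.0, 10.0), (20.0, 4.0)]
filled = idw_estimate(neighbours)
print(round(filled, 2))
```

The estimate always lies between the smallest and largest neighbouring values, and the nearest gauge dominates as the exponent grows, so the result approaches nearest-station filling for large exponents.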

Common Data Periods
The data usage for hydrologic modelling varies with the purpose, type of model and model resolution. Occasions where similar studies on the same watershed had utilized different sets of data are shown in Table 1. In this summary, the study by Bastiaanssen and Chandrapala [92] is an exception because the reported work is based on a remote sensing satellite based evaluation. The choice of gauging stations for the same watershed had varied even with the use of the same data resolution, and the most likely reason for such differences could be difficulties in accessing data. As indicated previously, these reasons could be location preferences, financial limitations, data release constraints, time available to access data, period of interest etc. Figure 1 shows that, in a particular study area, the gauging stations and data duration have differed, thus hinting that the selection of either one of these parameters could influence the other. It can also be noted that the monthly studies have used greater data durations. This could be due to the lower cost of monthly data and the relatively easy accessibility of monthly data.
Figure 2, Figure 3 and Figure 4, respectively. The summary table with key parameters and corresponding references is in Table 2. The mixture of applications and studies listed in Table 2 indicates that a wide variety was subjected to the evaluation.

4.2. Monthly Water Resources Modelling
The present review identified 48 monthly water resources models of varying complexities. Of these models, two had been the center of attraction (Figure 5). They are the PE Model [45] and the 2P Model [12], which indicated application percentages of 25.7 and 22.2, respectively. A summary of the 10 most used models for monthly water resources modelling is given in Table 2 and in Figure 5.

Models and Number of Parameters
Parameter use in daily and monthly water resource management models is shown in Figure 6. Daily models due to model complexities require a greater number of parameters than monthly models and this is clearly shown by the summary of applications. In general, the monthly model parameter number varies between 2 and 5. The same for daily models is approximately between 3 and 10.
In most models commonly used for monthly water resources management applications, the number of parameters varies between 2 and 4. A summary of parameters in the 10 most used models is shown in Figure 7 and in Table 3.

General
Data length used for model development is important to excite a model during parameter estimation and then to verify the model for its representativeness over hydrologic variability. Though this is amply discussed in literature, much of the reviewed literature neglects to explicitly indicate either the rationale for selection of data length or the data period used for a particular application. Figure 8 and Figure 9 show the use of data length against the time of exceedance of daily and monthly scale watershed applications. The verification data length curve reaches zero data length at an approximate time of exceedance value of 58% and 41% for calibration and verification, respectively. This shows that less prominence is given either to performing model verification or to mentioning the verification period. The summary could also be interpreted to mean that importance is given to calibration and hence verification had not been carried out. This supports the notion that the calibration period is of greater importance when models are developed [20].

Calibration Data Length
Individual exceedance curves for calibration and verification cases are in Figures 10 to 14. Shape of the exceedance curve for calibration data length enables the identification of less frequent data usage. The less frequently used long data lengths, shown by highly varying slope of small time of exceedance, are approximately between 0% and 10% time of exceedance for calibration data periods. This segment with highly variable slope over a short % time of exceedance, is followed by a data length range with a shorter bandwidth having milder slopes and spanning over a longer % time of exceedance. This represents the most frequently used data lengths and provides an indication of the percentage usage. Subsequent to this section with a stable slope is another rapidly varying slope which corresponds to less frequent short data lengths. The data length ranges that had been used in applications are summarized in Table 4.
Daily models showed that calibration data duration has respective values of 21, 19 and 19 years as the maximum, minimum and median of the watersheds subjected to the review between 0 and 10% time of exceedance. Similarly, the maximum, minimum and median of the reviewed watersheds are 19, 0.96 and 5 years, respectively, for % time of exceedance between 10 and 85%. The maximum, minimum and median of the reviewed watersheds at greater than 85% exceedance are 0.96, 0.96 and 0.96 years, respectively.
Monthly models showed that calibration data duration has respective values of 65, 20 and 24.5 years as the maximum, minimum and median of the watersheds subjected to the review between 0 and 10% time of exceedance. Similarly, the maximum, minimum and median of the watersheds reviewed are 20, 10 and 10 years, respectively, for % time of exceedance between 10 and 85%. The maximum, minimum and median of the reviewed watersheds at greater than 85% exceedance are 10, 1 and 6 years, respectively.
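The construction of an exceedance curve and the band statistics quoted above can be sketched as follows. The plotting position 100 × rank / (n + 1) (Weibull) is an assumption, since the formula used for the figures is not stated; the function names and the nine data lengths are hypothetical and are not the reviewed applications.

```python
from statistics import median

def exceedance_curve(lengths_years):
    """Return (percent time of exceedance, data length) pairs, longest first.

    Uses the Weibull plotting position 100 * rank / (n + 1), an assumption.
    """
    ordered = sorted(lengths_years, reverse=True)
    n = len(ordered)
    return [(100 * rank / (n + 1), length)
            for rank, length in enumerate(ordered, start=1)]

def band_summary(curve, lower_pct, upper_pct):
    """Maximum, minimum and median data length with lower < % <= upper."""
    band = [length for pct, length in curve if lower_pct < pct <= upper_pct]
    return max(band), min(band), median(band)

# Hypothetical calibration data lengths (years) from nine applications
lengths = [65, 30, 20, 15, 10, 10, 10, 6, 1]
curve = exceedance_curve(lengths)
frequent_band = band_summary(curve, 10, 85)
print(frequent_band)
```

The flat middle portion of such a curve, here the repeated 10-year values, is what identifies the most frequently used data length.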

Verification Data Length
The exceedance curve for verification data lengths shows the long and short, less frequent data length usage at either end. Threshold values interpreted from the curve are shown in Table 5.

Total Data Length
The exceedance curve for total data lengths shows the long and short, less frequent data length usage at either end. Threshold values interpreted from the curve are shown in Table 6.

4.6. Catchment Size
All case applications reviewed in the present work were plotted to identify the relationship between data duration and catchment size. Semi-logarithmic plots of data length against catchment size indicated a clustering of catchments, as shown in Figure 15, Figure 16 and Table 7. The number of applications against the data length is also indicated under each category.
A majority of applications in most of the considered watershed ranges indicated that the median value is between 10 and 19 years. It appears reasonable to conclude that the maximum data length considered by many for monthly applications is in the order of 30 years and that the most common data length could be approximated as 15 years.

Evaluation
The present review revealed a clear deficiency of supporting material for the selection of an appropriate data length for monthly water resources model development.
Available documentation pointed to the criteria that lead to a rational decision. The following discussion presents the reasoning behind the selection of criteria and facilitates the determination of available data length alternatives.

5.1. Model Excitation
Hydrological models require calibration datasets that representatively include both wet and dry periods in order to achieve precise model parameters. In the case of daily models, even a single representative year has been found to be adequate. Models require data not only for model excitation but also for fitting the mathematical equations that represent the watershed hydrology. The reviewed literature did not point to a minimum data length for the excitation of monthly models. One year of daily data contains 365 data points. Therefore, the minimum number of data points needed to calibrate a monthly model can be considered similar to that of a daily model. On the other hand, a single water year at daily scale may already contain sufficient hydrological variation. In the case of monthly data, 360 points cover 30 years, and it is felt that this would be more than sufficient to represent the hydrologic variation at monthly scale. Therefore, it is felt that even a 20-30 year period, containing 240 to 360 data points, could be safely considered a highly favourable data length with respect to model excitation requirements. Using a similar rationalization, a data length between 10 and 20 years was taken as moderately favourable, while data lengths of less than 10 years were considered least favourable.
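The arithmetic behind this reasoning can be checked with a short sketch. The function names are hypothetical, and two details are assumptions not stated in the text: a data length falling exactly on a class boundary (10 or 20 years) is placed in the higher class, and data lengths beyond 30 years are treated as highly favourable for excitation.

```python
MONTHS_PER_YEAR = 12


def monthly_points(years):
    """Number of data points in a monthly series spanning the given years."""
    return years * MONTHS_PER_YEAR


def excitation_class(years):
    """Classify a data length (years) by the model excitation criterion.

    Assumptions: boundary values go to the higher class, and lengths above
    30 years are also treated as highly favourable.
    """
    if years >= 20:
        return "highly favourable"
    if years >= 10:
        return "moderately favourable"
    return "least favourable"


for years in (5, 15, 20, 30):
    print(years, "years:", monthly_points(years), "points,", excitation_class(years))
```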

5.2. Data Quality
The literature recognizes the need for good quality data for successful model development. In this context, quality represents both the quality of measurements and the quality of hydrological representativeness. The latter is important for model excitation, while the quality of measurements is important for the representation of reality. Missing data is a major challenge when attempting to preserve these characteristics. Modellers who seek a common data period across several rainfall, streamflow, evaporation and other climate datasets are often faced with the obstacle of missing data. It is much easier to capture shorter common data periods without missing data. Therefore, the shorter the desired data length, the higher the chance of successfully developing a watershed model.
Hence, in the present study, the identification of data length selection options was based on the allocation of a lower preference rank to longer datasets. In other words, the quality of input data is considered to decrease as the data length increases.
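The search for a common data period without missing values can be sketched as follows. This is only an illustration under stated assumptions: all series are aligned to the same starting month, missing values are marked as `None`, and the function name and sample values are hypothetical.

```python
def longest_complete_period(series_by_name):
    """Return (start_index, length) of the longest run of consecutive time
    steps in which every series has a value (no None entries)."""
    n = min(len(series) for series in series_by_name.values())
    best_start, best_length = 0, 0
    run_start = None
    for i in range(n):
        if all(series[i] is not None for series in series_by_name.values()):
            if run_start is None:
                run_start = i  # a new gap-free run begins here
            run_length = i - run_start + 1
            if run_length > best_length:
                best_start, best_length = run_start, run_length
        else:
            run_start = None  # a gap in any series ends the current run
    return best_start, best_length


# Hypothetical monthly values; None marks a missing observation.
records = {
    "rainfall": [110, 95, None, 60, 42, 30, 75],
    "streamflow": [12, None, 8, 5, 4, 3, None],
}
start, length = longest_complete_period(records)
print("longest common gap-free period starts at index", start, "and spans", length, "months")
```

Adding further series can only shorten or preserve the longest common period, which is consistent with the difficulty described above.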

5.3. Precision of Parameters
Precision of parameters depends on the model calibration and verification.
The review identified that many research works around the globe recommend a lengthy dataset for calibration. On certain occasions there have been recommendations to split the entire dataset so that 80% is allocated for calibration and 20% for verification. Therefore, in the case of calibration, longer data length classes not only lead to better model development but also carry a higher weight in achieving it. Even though its priority is low relative to calibration, longer model verification data also leads to better model development. The evaluation of present practice indicated that for shorter data lengths, verification had a much lower weightage than calibration. In the selection of mid-range data lengths, calibration had received greater priority than verification. In the longer datasets, it was noted that equal status had been granted to both calibration and verification.
Therefore, in the present evaluation of data length options for model development, the calibration data length options were assigned a one-step higher weightage than the verification data length options.
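The 80% and 20% split mentioned above may be illustrated with a minimal sketch. It assumes a chronological split in which the earlier part of the record is used for calibration; the reviewed recommendations do not specify this, and the function name and the 30-year example are hypothetical.

```python
def split_record(series, calibration_fraction=0.8):
    """Split a time series chronologically into calibration and verification parts.

    Assumption: the earlier portion of the record is used for calibration.
    """
    if not 0.0 < calibration_fraction < 1.0:
        raise ValueError("calibration_fraction must lie strictly between 0 and 1")
    cut = round(len(series) * calibration_fraction)
    return series[:cut], series[cut:]


# A hypothetical 30-year monthly record represented by its month indices.
record = list(range(30 * 12))
calibration, verification = split_record(record)
print(len(calibration) // 12, "years for calibration,",
      len(verification) // 12, "years for verification")
```

For a 30-year record, this split leaves 24 years for calibration, which stays above the 20-year calibration period suggested by the present review.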

5.4. Model Complexity
The complexity of a hydrologic model increases with advances in the mathematical description of hydrological processes. Improved mathematical representations of the numerous physical processes observed in the real world inevitably increase the number of model parameters. Optimization of an increased number of model parameters requires lengthier datasets, which have a better chance of reflecting the variety of process characteristics embedded in the model. The relationship between the number of parameters and data length in monthly water resources model applications reveals an increasing trend in data length as the number of parameters increases.

5.5. Practice amidst Constraints
The practice of selecting a data length in monthly water resources applications was captured during the review. In most of the applications, the selection of data length was not justified. The pattern of data length selection in peer reviewed publications is considered a reflection of model development and application carried out amidst practical constraints.
Peer reviewed publications, by nature, confirm the trustworthiness of their contents through scrutiny and acceptance by experts in the same area of work. Therefore, the current selection trend was qualitatively ranked to identify a suitable data length option.

5.6. Assessment of Options
The present study exhibits the possibility of executing a qualitative evaluation using five key criteria. At the outset it is important to identify the available decision alternatives. Data length use in research and guidelines indicates that 20 to 40 year long datasets are the most favoured for monthly model applications. In order to meaningfully determine the best data length, it is necessary to evaluate the merits and demerits of various data lengths by dividing the options into shorter ranges. Therefore, the data length selection options were divided into five smaller ranges, and each was rated on a five-point Likert scale with the responses "most preferred", "preferred", "acceptable", "low preference" and "very low preference". The five data range options, which in turn form the decision alternatives, are <10, 10-20, 20-30, 30-40 and >40 years. Decision criteria, alternatives, responses and a summary of the rationalization used to assign responses are given in Table 8.
Responses for each criterion were coded numerically by assigning values to measure the preferences: 9 = "most preferred", 7 = "preferred", 5 = "acceptable", 3 = "low preference" and 1 = "very low preference". Table 9 presents the numerical values and the normalized indicator for each decision alternative. Data ranges greater than 10-20 years were found to be acceptable, and the most preferred options were data ranges greater than 30-40 years. The normalised ranks show that a water resources modeller deciding on the data length for a water resources management modelling application should select a data length of at least 20-30 years.
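The numerical coding of responses can be sketched as follows. The coding values are those stated above, but the normalization shown (total score divided by the maximum attainable score) is an assumption, since the method behind the normalized indicator in Table 9 is not described here. The alternatives "A" and "B" and their responses are hypothetical and do not reproduce Table 8.

```python
LIKERT_SCORES = {
    "most preferred": 9,
    "preferred": 7,
    "acceptable": 5,
    "low preference": 3,
    "very low preference": 1,
}


def normalised_indicator(responses):
    """Sum the coded responses over all criteria and scale to the range 0-1.

    Assumption: normalization divides by the maximum attainable total,
    that is, 9 multiplied by the number of criteria.
    """
    total = sum(LIKERT_SCORES[response] for response in responses)
    return total / (max(LIKERT_SCORES.values()) * len(responses))


# Hypothetical responses for two alternatives over five criteria.
alternatives = {
    "A": ["most preferred", "preferred", "acceptable", "preferred", "most preferred"],
    "B": ["low preference", "acceptable", "very low preference", "acceptable", "low preference"],
}
ranked = sorted(alternatives, key=lambda name: normalised_indicator(alternatives[name]), reverse=True)
for name in ranked:
    print(name, round(normalised_indicator(alternatives[name]), 3))
```

A weighting step, such as the one-step higher weightage given to calibration over verification, could be added by scaling individual criterion scores before summation.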

Conclusions
• The qualitative analysis of the existing literature indicates that a total data length in the range of 20-30 years or greater should be utilized for monthly water resources model development.
• The analysis of the reviewed literature also highlights the need to use the majority of the data for model calibration. Therefore, it may be prudent to select at least a 20-year data period for model calibration.
• The selection of decision alternatives discussed above was based on the observations made during the review. Well designed research is necessary to determine alternatives at finer data ranges.
• Data length selection for model development should consider model excitation, data quality, precision of parameters, model complexity and current practice as decision criteria.