Discharge simulations are affected by several sources of uncertainty (e.g. errors in meteorological forcings, sub-optimal parameter estimates).
Data assimilation (DA) allows to combine measurements and model simulations under the assumption that both supply useful information to obtain the most likely estimate of the system state. Among the sequential methods, the ensemble-based techniques allow to explicitly handle different sources of uncertainty and to quantify the unknown errors of both model states and observations.
After the study of Piazzi et al. (accepted), airGRdatassim package (Piazzi and Delaigue 2021) provides functions allowing to perform DA-based ensemble discharge simulations. airGRdatassim is a package based on the airGR hydrological modeling package (Coron et al. 2017, 2021). This package is designed to flexibly use the Ensemble Kalman filter (EnKF) or Particle filter (PF) technique to assimilate daily discharge observations into GR4J, GR5J, and GR6J models (use the command
?airGR to get the references of the GR models).
The uncertainty in both the meteorological forcings (i.e. precipitation and potential evapotranspiration) and model states (i.e. production store level, routing store level and unit hydrographs) can be easily taken into account by enabling specific perturbation procedures.
In this user guide we go through the assimilation procedure with the aim of explaining how to perform DA-based discharge simulations. Along with the description of the modelling system, key recommendations are provided to support the definition of the operational configuration.
airGRdatassim is a package based on the airGR hydrological modeling package (Coron et al. 2017, 2021).
All the examples rely on the following data set, which is available in airGR package:
## Loading required package: airGR
data(L0123001, package = "airGR") head(BasinObs)
## DatesR P T E Qls Qmm ## 1 1984-01-01 4.1 0.5 0.2 2640 0.6336 ## 2 1984-01-02 15.9 0.2 0.2 3440 0.8256 ## 3 1984-01-03 0.8 0.9 0.3 12200 2.9280 ## 4 1984-01-04 0.0 0.5 0.3 7600 1.8240 ## 5 1984-01-05 0.0 -1.6 0.1 6250 1.5000 ## 6 1984-01-06 0.0 0.9 0.3 5650 1.3560
This section introduces and explains the main settings of the DA scheme that the user needs to define.
The ensemble size can be defined by assigning
NbMbr with the number of ensemble members.
It is noteworthy that the PF technique is more sensitive to the ensemble size than the EnKF scheme. Indeed, compared to the EnKF scheme, the PF generally needs a larger ensemble size to efficiently perform the weighting and resampling of particles. It is recommended to generate and use an ensemble of at least 20 or 30 members, when using the EnKF or the PF scheme, respectively.
The DA methodology can be chosen between EnKF and PF by defining
DaMethod. To perform the open-loop simulation (i.e., without DA), the assimilation of discharge observations can be disabled.
DaMethod = "PF", the PF performs the joint update of all the state variables.
DaMethod = "EnKF", the EnKF performs the update of the state variables specified in
DaMethod = "none", discharge assimilation is not performed (i.e., open-loop simulation).
Among the model states, the update of the routing store level ensures the most benefit from the assimilation of observed discharges. Conversely, the update of the unit hydrograph only has a slight impact on the accuracy of the DA-based discharge simulations (Piazzi et al. accepted).
Both the uncetainties in meteorological forcings and state variables can be taken into account in the assimilation scheme.
For both DA techniques, the state variables defined in
StatePert are perturbed.
The perturbation of model states is performed through a normally-distributed null-mean noise. The noise variance is assumed equal to the variance of the state variables resulting from the analysis procedure and it is restricted between upper and lower limits (Salamon and Feyen 2009).
When considering the uncertainty in the forcings, the
CreateInputsPert() function generates an ensemble for precipitation or/and potential evapotranspiration, provided as function argument/s. It is noteworthy that a significant reduction in the ensemble spread may occur especially in no-rain periods, when accounting only for meteorological uncertainty.
Even though a more comprehensive representation of the state uncertainty can prevent the ensemble shrinkage, it has a different impact on the performance of the PF and the EnKF schemes (Piazzi et al. accepted). Indeed, while a higher model error covariance can lead to an overweighting of observations in the EnKF-based analysis procedure, a larger spread of the ensemble simulations allows for a more straightforward weighting and thus a more efficient resampling of particles in the PF scheme.
The example below considers EnKF-based discharge simulations relying on a 100-member ensemble, when accounting for the uncertainty in model state variables (see
StatePert). More specifically, only the levels of the production (
"Prod") and the routing (
"Rout") stores are updated and perturbed (see
It is of key importance to consider that
StateEnKF allows to select the state variables to be updated and
StatePert identifies the state variables to be perturbed in the EnKF scheme. Conversely, in the PF scheme
StatePert identifies the state variables of interest only within the perturbation procedure, as this DA technique jointly updates all the model states.
# ensemble size <- 100L NbMbr # "EnKF" ; "PF" ; "none" <- "EnKF" DaMethod # "Prod": production store level # "Rout": routing store level; # "UH1": unit hydrograph 1 levels (not defined in GR5J model) # "UH2": unit hydrograph 2 levels <- c("Prod", "Rout") StateEnKF <- c("Prod", "Rout") StatePert
When accounting for the uncertainty in meteorological forcings, the
CreateInputsPert() function generates an ensemble for precipitation or/and potential evapotranspiration, provided as function argument/s.
Probabilistic model inputs result from the perturbation of the meteorological data through a multiplicative stochastic noise, according to the methodology proposed by Clark et al. (2008) The random perturbations are provided through a first-order autoregressive model in order to guarantee a physical consistency and a temporal correlation of the time-variant forcings. For the generation of perturbations of both the meteorological variables, the fractional error parameter is set equal to 0.65 and the temporal decorrelation length is defined as 1 day for rainfall and 2 days for potential evapotranspiration, by default.
Deterministic model inputs are still needed for the warm-up period and they are generated through the
CreateInputsModel() function of the airGR package.
In this example, discharge simulations are provided by GR5J model throughout the period from 2006-09-01 to 2006-09-31.
## simulation period <- seq(which(format(BasinObs$DatesR, format = "%Y-%m-%d") == "2006-09-01"), IndRun which(format(BasinObs$DatesR, format = "%Y-%m-%d") == "2006-10-31")) ## preparation of InputsModel object <- CreateInputsModel(FUN_MOD = RunModel_GR5J, DatesR = BasinObs$DatesR, InputsModel Precip = BasinObs$P, PotEvap = BasinObs$E) ## generation of probabilistic model inputs <- CreateInputsPert(FUN_MOD = RunModel_GR5J, InputsPert DatesR = BasinObs$DatesR, Precip = BasinObs$P, PotEvap = BasinObs$E, NbMbr = NbMbr)
Please note that if the uncertainty in meteorological forcings is not taken into account, the same deterministic
InputsModel object is used to force each ensemble member.
The estimate of the observation error is of critical importance, as it defines how the filter trusts the observations and thus to what extent they are assimilated into the model.
By default, the measurement noise is generated from a normal distribution with a zero-valued mean and a variance parameterized as a function of the observed discharge rate (Weerts and El Serafy 2006; Clark et al. 2008). With the aim of preventing underestimated error variances in case of low discharge, the minimum threshold of the observation error variance is evaluated as the quantile 10 of discharge observations, by default (Thirel et al. 2010).
## discharge observations to be assimilated <- BasinObs$Qmm[IndRun]Qobs
We assume that the GR5J model has been previously calibrated by using the
Calibration() function of airGR package. Therefore, we refer to the calibrated values of model parameters as
RunModel_DA() function supplies the ensemble discharge simulations resulting from the sequential assimilation of daily discharge observations via EnKF or PF. Because of the recursive approach, the analysis states resulting from the assimilation procedure at time step
t are used as initial states at the following prediction time step
t + 1.
If the state uncertainty is taken into account (
StatePert), the initial state at the prediction time step
t + 1 is the perturbed analysis state resulting from the assimilation and perturbation procedures at time step
The function provides three outputs:
the ensemble discharge simulations (
the ensemble values of background model states, namely before the filter update (
the ensemble values of analysis model states, namely after the filter update (
The following example supplies probabilistic discharge simulations relying on the EnKF-based assimilation of daily discharge observations.
<- c(X1 = 194.243, X2 = -0.088, X3 = 117.740, X4 = 1.680, X5 = 0.000) Param <- RunModel_DA(InputsModel = InputsModel, ResEnKF InputsPert = InputsPert, Qobs = BasinObs$Qmm, IndRun = IndRun, FUN_MOD = RunModel_GR5J, Param = Param, DaMethod = "EnKF", NbMbr = NbMbr, StateEnKF = StateEnKF, StatePert = StatePert)
As defined in
StateEnKF, in this example the EnKF scheme updates only the levels of the production (
"Prod") and the routing (
"Rout") stores through the assimilation of discharge observations. Along with the uncertainty in meterological forcings (
InputsPert is supplied), also the uncertainty in model states (
StatePert) is taken into account. Therefore, the selected state variables (i.e. the store levels) are consistently perturbed after their update.
The EnKF scheme relies on mass-constraints to guarantee the non-negativity of state variables by ensuring minimum state values. It is noteworthy that the level of production store cannot drop below the 5 percent of its capacity. A further constraint prevents the levels of both the production and the routing stores from exceeding their maximum capacities.
The example below provides ensemble discharge simulations resulting from the daily PF-based assimilation of discharge observations.
The PF scheme relies on the Sequential Importance Resampling (SIR) technique (Gordon, Salmond, and Smith 1993). According to the SIR approach, the importance weights associated to the particles are updated according to the likelihood of particle states with respect to the observed one. A resampling procedure allows to discard particles having a low probability and to replicate those having a high importance weight. If the ensemble is too squeezed to properly evaluate the importance weights, all particles are assigned equal weights.
<- RunModel_DA(InputsModel = InputsModel, ResPF InputsPert = InputsPert, Qobs = BasinObs$Qmm, IndRun = IndRun, FUN_MOD = RunModel_GR5J, Param = Param, DaMethod = "PF", NbMbr = NbMbr, StatePert = "Rout")
Unlike the previous example, here only the uncertainty in the level of the routing store (see
StatePert) is taken into account. Therefore, the routing store level is the only state variable to be perturbed after the updating procedure.
After the state perturbation, a mass-constraint prevents the levels of both the production and the routing stores from exceeding their maximum capacities.
Here the performances of the EnKF- and PF-based modelling systems are assessed against the open-loop run, which provides discharge simulations without the assimilation of discharge observations.
<- CreateRunOptions(FUN_MOD = RunModel_GR5J, RunOptions InputsModel = InputsModel, IndPeriod_Run = IndRun) <- CreateInputsCrit(FUN_CRIT = list(ErrorCrit_KGE, ErrorCrit_NSE), InputsCritMulti InputsModel = InputsModel, RunOptions = RunOptions, Obs = list(Qobs, Qobs), VarObs = list("Q", "Q"), transfo = list("", "sqrt"), Weights = NULL)
# open-loop run (without DA) <- RunModel_GR5J(InputsModel = InputsModel, RunOptions = RunOptions, OutputsModel_OL Param = Param) <- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_OL)CritOL
# EnKF run <- OutputsModel_OL OutputsModel_EnKF $Qsim <- rowMeans(ResEnKF$QsimEns) OutputsModel_EnKF<- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_EnKF)CritEnKF
# PF run <- OutputsModel_OL OutputsModel_PF $Qsim <- rowMeans(ResPF$QsimEns) OutputsModel_PF<- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_PF)CritPF
When comparing the performance of these 3 modelling configurations, it is noteworthy that the assimilation of discharge observations generally succeeds in enhancing the accuracy of discharge simulations, though they rely on different DA settings (i.e. different DA techniques, system uncertainties and updated states).
## Method KGE[Q] NSE[sqrt(Q)] ## 1 OL 0.747 0.771 ## 2 EnKF 0.749 0.677 ## 3 PL 0.864 0.787