The package **uGMAR** contains tools to estimate and analyze univariate Gaussian mixture autoregressive (GMAR), Student’s t mixture autoregressive (StMAR) and Gaussian and Student’s t mixture autoregressive (G-StMAR) models. We refer to these three models as the GSMAR models. This vignette does not explain details about the models and it’s assumed that the reader is familiar with the cited articles introducing the models. There are currently no published references for the G-StMAR model, so it will be discussed briefly in the section “The G-StMAR model”.

The models in **uGMAR** are defined as class `gsmar` S3 objects, which can be created with the estimation function `fitGSMAR` or with the constructor function `GSMAR`. The created `gsmar` objects can then be conveniently used as main arguments in several other functions, allowing one, for example, to perform quantile residual based model diagnostics, to simulate from the processes, and to forecast. It’s thus easy to carry out further analyses after the model has been estimated. Some tasks, however, such as setting up an initial population for the genetic algorithm, applying linear constraints, or building `gsmar` models with pre-specified parameter values, require knowledge of the model details such as the form of the parameter vector. These details are therefore explained in this vignette.

The rest of this vignette is organized as follows. In the first section it’s explained what the G-StMAR model is and when and why one should use it. In the second section notations for the parameter vector are described in detail, and it’s also shown how to apply constraints to autoregressive parameters of the models. Finally, in the third section some useful functions provided by **uGMAR** are briefly described.

To motivate the introduction of the G-StMAR model, observe that the conditional variances of the component processes of the StMAR model depend on the past observations through the same parameters as the conditional means. This specific formulation is required to obtain the stationary Student’s t autoregressions, and it also enables the model to capture stronger forms of conditional heteroscedasticity than the Gaussian case. However, this relation between the component specific conditional variance and mean might be restrictive if some regimes exhibit strong conditional mean but weak conditional variance (or vice versa). If such a case occurs in a series modeled with the StMAR model, the degrees of freedom parameters of the corresponding regimes will get large estimates, because this makes the component process’s conditional variance approach a constant variance parameter (which equals the conditional variance of the linear Gaussian autoregressions that the GMAR model considers as its component processes). Moreover, since a large degrees of freedom parameter value makes the shape of the innovations’ distribution resemble the Gaussian distribution, it seems natural to allow these regimes to be Gaussian, resulting in the G-StMAR model.

The G-StMAR model can be described as a combination of the GMAR and the StMAR model. Its first `M1` component processes are taken to be linear Gaussian autoregressions as in the GMAR model, and the remaining `M2` component processes are taken to be conditionally heteroscedastic linear Student’s t autoregressions as in the StMAR model, yielding a total of `M1+M2=M` mixture components. Theoretical and practical properties of the G-StMAR model are similar to those of the GMAR model and the StMAR model. The conditional distribution and the stationary distribution of a G-StMAR process are both mixtures of Gaussian distributions and Student’s t distributions.

Besides removing redundant parameters from the model, using the G-StMAR model instead of the StMAR model with large degrees of freedom parameters also comes with further advantages. One problem with very large degrees of freedom parameters is that their profile log-likelihoods are very flat near the estimates, which implies very low information and indicates a near-identification problem. Moreover, very large degrees of freedom parameter values cause numerical inaccuracies in the calculation of the log-likelihood function values and thus make numerical approximations of its derivatives biased. Approximate standard errors (obtained from the numerically approximated observed information matrix) therefore become spurious, and the first and second order conditions cannot be used to check whether an estimate denotes a maximum point. The problems disappear when one removes the problematic degrees of freedom parameters by switching to a G-StMAR model. Furthermore, **uGMAR**’s implementation of the quantile residual tests proposed by Kalliovirta (2012) is not applicable for models with such a near-identification problem.

Building a GSMAR model requires the user to specify the autoregressive (AR) order of the model `p` and the number of mixture components `M`. For the G-StMAR model one has to define the number of GMAR type components `M1` and the number of StMAR type components `M2`. If one wishes to build a model with pre-specified parameter values rather than estimating them, knowledge of the exact form of the parameter vector is necessary. In **uGMAR**, the form of the parameter vector depends on the specifics of the model: whether a GMAR, StMAR or G-StMAR model is considered, whether the AR coefficients are restricted to be the same for all regimes, and whether linear constraints are applied to the AR parameters. It’s vital to use the correct type of parameter vector accordingly.

In the following, the intercept parametrization with intercept parameters \(\phi_{m,0}\) is considered. One may alternatively use the mean parametrization; in that case, one simply needs to replace each intercept parameter with the corresponding mean parameter \(\mu_m=\phi_{m,0}/(1-\sum_{i=1}^p\phi_{m,i}),\enspace m=1,...,M.\)
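The relation between the two parametrizations can be checked numerically. The following is a minimal base R sketch; the helper names `intercept_to_mean` and `mean_to_intercept` are hypothetical and are not functions of **uGMAR**.

```r
# Hypothetical helpers (not part of uGMAR): convert between the intercept
# phi_{m,0} and the mean mu_m of a single regime with AR coefficients phi.
intercept_to_mean <- function(phi0, phi) {
  stopifnot(is.numeric(phi0), length(phi0) == 1, is.numeric(phi))
  phi0 / (1 - sum(phi))
}
mean_to_intercept <- function(mu, phi) {
  mu * (1 - sum(phi))
}

phi0 <- 1               # intercept of the regime
phi  <- c(0.5, 0.25)    # AR coefficients phi_{m,1}, phi_{m,2}
mu   <- intercept_to_mean(phi0, phi)   # 1 / (1 - 0.75) = 4
phi0_back <- mean_to_intercept(mu, phi) # recovers the intercept
```

Converting to the mean and back returns the original intercept, so either parametrization carries the same information.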

The parameter vector for the unconstrained GMAR model is a size *(M(p+3)-1)x1* vector of the form \[\boldsymbol{\theta}=(\boldsymbol{\upsilon_{1}},...,\boldsymbol{\upsilon_{M}}, \alpha_{1},...,\alpha_{M-1}),\quad where\] \[\boldsymbol{\upsilon_{m}}=(\phi_{m,0},\boldsymbol{\phi_{m}}, \sigma_{m}^2) \enspace and \enspace \boldsymbol{\phi_{m}}=(\phi_{m,1},...,\phi_{m,p}) ,\quad m=1,...,M.\] Here \(\phi_{m,0}\) is an intercept, \(\phi_{m,i}\), \(i=1,...,p\), are AR coefficients, \(\sigma_m^2\) is a variance parameter and \(\alpha_m\) is a mixing weight parameter.

For the StMAR model, the parameter vector has to be expanded to include the degrees of freedom parameters. The parameter vector for the unconstrained StMAR model is thus a size *(M(p+4)-1)x1* vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M})\] contains the degrees of freedom parameters and the parameter \(\boldsymbol{\theta}\) is as in the case of the GMAR model. To ensure the existence of finite second moments, the degrees of freedom parameters \(\nu_{m}\) are assumed to be larger than \(2\).

In the G-StMAR model the first `M1` components are GMAR type and the remaining `M2` components are StMAR type. The parameter vector of the G-StMAR model is similar to the one of the StMAR model but with `M2` degrees of freedom parameters for the StMAR components. That is, it is a size `(M(p+3)+M2-1)x1` vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{M1+1},...,\nu_{M})\] contains the degrees of freedom parameters and the parameter \(\boldsymbol{\theta}\) is as in the case of the GMAR model. As in the StMAR case, the degrees of freedom parameters are assumed to be larger than two.
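As a quick sanity check of the three dimensions above, the sketch below counts the parameters and writes out example vectors for `p=2` and `M=2`. The helper `n_params` is hypothetical (not a **uGMAR** function), the numeric values are arbitrary, and passing the G-StMAR component counts as a pair `c(M1, M2)` is an assumption made for this sketch.

```r
# Hypothetical helper (not part of uGMAR): length of the unconstrained,
# non-restricted parameter vector.
n_params <- function(p, M, model = c("GMAR", "StMAR", "G-StMAR")) {
  model <- match.arg(model)
  if (model == "G-StMAR") {
    stopifnot(length(M) == 2)   # assumption of this sketch: M = c(M1, M2)
    M2 <- M[2]
    M  <- sum(M)
  } else {
    stopifnot(length(M) == 1)
    M2 <- if (model == "StMAR") M else 0
  }
  M * (p + 3) - 1 + M2          # theta plus the degrees of freedom parameters
}

# Example vectors for p = 2, M = 2 (arbitrary illustrative values)
theta_gmar <- c(0.5, 0.4,  0.2, 1.0,   # regime 1: phi_{1,0}, phi_{1,1}, phi_{1,2}, sigma_1^2
                1.5, 0.3, -0.1, 2.0,   # regime 2: phi_{2,0}, phi_{2,1}, phi_{2,2}, sigma_2^2
                0.6)                   # alpha_1
theta_stmar  <- c(theta_gmar, 4, 12)   # append nu_1, nu_2
theta_gstmar <- c(theta_gmar, 12)      # M1 = 1, M2 = 1: append nu_2 only

n_params(2, 2, "GMAR")           # 9
n_params(2, 2, "StMAR")          # 11
n_params(2, c(1, 1), "G-StMAR")  # 10
```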

In addition to unconstrained GSMAR models, `uGMAR` gives an option to analyze restricted models whose AR coefficients \(\phi_{m,1},...,\phi_{m,p}\) are restricted to be the same for all regimes \(m=1,...,M\). The structure of the parameter vector is different for restricted and non-restricted models.

The parameter vector of the restricted GMAR model is a size `(3M+p-1)x1` vector of the form \[\boldsymbol{\theta}=(\phi_{1,0},...,\phi_{M,0},\boldsymbol{\phi},\sigma_{1}^2,...,\sigma_{M}^2,\alpha_{1},...,\alpha_{M-1}), \quad where \quad \boldsymbol{\phi}=(\phi_{1},...,\phi_{p}).\] That is, it contains \(M\) intercepts, \(p\) shared AR coefficients, \(M\) variance parameters and \(M-1\) mixing weight parameters.

The parameter vector of the restricted StMAR model is then defined by adding the \(M\) degrees of freedom parameters, yielding a size `(4M+p-1)x1` vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M})\] again contains the degrees of freedom parameters and the parameter \(\boldsymbol{\theta}\) is as in the case of the restricted GMAR model.

The parameter vector of the restricted G-StMAR model is similar to that of the restricted StMAR model but with `M2` degrees of freedom parameters for the StMAR type components.

One therefore has to work with different kinds of parameter vectors depending on whether the model is restricted or non-restricted. In order to restrict the AR parameters, or to indicate that the parameter vector is of the restricted form, one needs to supply the considered function with the argument `restricted=TRUE`.
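To make the ordering of the restricted parameter vectors concrete, here is a small base R sketch for `p=2` and `M=2`. The numeric values are arbitrary illustrations, and the variable names are not part of **uGMAR**.

```r
# Restricted models: the AR coefficients are shared by all regimes.
intercepts <- c(0.5, 1.5)   # phi_{1,0}, phi_{2,0}
ar_shared  <- c(0.4, 0.2)   # phi_1, phi_2 (common to both regimes)
variances  <- c(1.0, 2.0)   # sigma_1^2, sigma_2^2
alphas     <- 0.6           # alpha_1 (the last mixing weight is omitted)

# Restricted GMAR: intercepts, shared AR coefficients, variances, mixing weights
theta_r_gmar <- c(intercepts, ar_shared, variances, alphas)

# Restricted StMAR: append one degrees of freedom parameter per regime
theta_r_stmar <- c(theta_r_gmar, 4, 12)

# Restricted G-StMAR with M1 = 1, M2 = 1: append nu_2 only
theta_r_gstmar <- c(theta_r_gmar, 12)
```

Such a vector would be passed together with `restricted=TRUE`, so that it is interpreted in the restricted form.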

**uGMAR** makes it easy to apply linear constraints to the autoregressive parameters of GSMAR models. Considering *non-restricted* models, each mixture component has its own constraint matrix. **uGMAR** considers constraints of the form \[\boldsymbol{\phi_{m}}=\boldsymbol{C_{m}\psi_{m}}, \enspace m=1,...,M,\] where \(\boldsymbol{C_{m}}\) is a known size \((p\times q_{m})\) constraint matrix of full column rank and \(\boldsymbol{\psi_{m}}\) is a size \((q_{m}\times 1)\) parameter vector. Observe that this particular specification for linear constraints is not the most general one. However, it keeps the constraint matrices small and simple, and it’s convenient for applying the most typical constraints such as constraining some of the AR coefficients to be zero. For instance, in order to constrain the second AR coefficient of the second regime to zero in a model with `p=2` and `M=2`, the constraint matrix for the first regime is `diag(2)`, implying no constraints, while the constraint matrix for the second regime is simply `matrix(1:0)`.
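The `p=2`, `M=2` example can be verified in base R. The sketch below builds the two constraint matrices and maps illustrative \(\boldsymbol{\psi_{m}}\) vectors (arbitrary values) to the implied AR coefficients \(\boldsymbol{\phi_{m}}=\boldsymbol{C_{m}\psi_{m}}\).

```r
p <- 2

# One constraint matrix per regime
constraints <- list(
  diag(p),       # regime 1: identity matrix, i.e. no constraints (q_1 = 2)
  matrix(1:0)    # regime 2: 2 x 1 matrix with column (1, 0)' (q_2 = 1)
)

# Illustrative psi vectors; their lengths equal the numbers of columns q_m
psi <- list(c(0.4, 0.2), 0.3)

# Implied AR coefficients phi_m = C_m psi_m
phi <- Map(function(C, psi_m) drop(C %*% psi_m), constraints, psi)
phi[[2]]   # second AR coefficient of regime 2 is zero

# Each constraint matrix should have full column rank
full_rank <- vapply(constraints, function(C) qr(C)$rank == ncol(C), logical(1))
```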

For further illustration, consider the following special case of linear constraints. We obtain a mixture version of the Heterogeneous Autoregressive (HAR) model (see Corsi 2009 for the original version) by applying the constraints \[\boldsymbol{C_{m}}=\left[{\begin{array}{ccc}
\boldsymbol{\iota}_{5} & \frac{1}{5}\boldsymbol{1}_{5} & \frac{1}{22}\boldsymbol{1}_{5} \\
0_{17} & 0_{17} & \frac{1}{22}\boldsymbol{1}_{17} \\
\end{array}}\right]\] for all regimes \(m=1,...,M\) to the `GMAR(22,M)` model. Here \(\boldsymbol{\iota}_{5}=[1,0,0,0,0]'\), \(\boldsymbol{1}_{n}\) denotes a vector of ones of length \(n\), and \(0_{n}\) denotes a vector of zeros of length \(n\).
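The HAR constraint matrix can be assembled block by block in base R. The sketch below is one way to do it; the names `C_har` and `psi` and the numeric values are illustrative only.

```r
p <- 22
iota5 <- c(1, rep(0, 4))   # picks out the first lag

# Upper block (lags 1-5) and lower block (lags 6-22) of the 22 x 3 matrix
C_har <- rbind(
  cbind(iota5,      rep(1/5, 5), rep(1/22, 5)),
  cbind(rep(0, 17), rep(0, 17),  rep(1/22, 17))
)
dimnames(C_har) <- NULL

# Illustrative psi: coefficients of the daily, weekly and monthly terms
psi <- c(0.3, 0.4, 0.2)
phi <- drop(C_har %*% psi)   # implied 22 AR coefficients

# The same matrix is used for every regime, e.g. for M = 2:
constraints <- replicate(2, C_har, simplify = FALSE)
```

Because each column of the matrix sums to one, the implied AR coefficients sum to the sum of the three \(\psi\) parameters.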

In order to apply the linear constraints in **uGMAR**, one simply needs to parametrize the model with vectors \(\boldsymbol{\psi_{m}}\) instead of \(\boldsymbol{\phi_{m}}\) and provide the constraint matrices \(\boldsymbol{C_{m}}\) in the argument `constraints` (or if one estimates the parameters, only the constraint matrices need to be provided). Note that regardless of the lengths of \(\boldsymbol{\psi_{m}}\), the nominal AR order is always \(p\) for all regimes.

Similarly to the case of the unconstrained GMAR model, the parameter vector for the constrained GMAR model is of the form \[\boldsymbol{\theta}=(\boldsymbol{\upsilon_{1}},...,\boldsymbol{\upsilon_{M}}, \alpha_{1},...,\alpha_{M-1}),\] but now the vectors \(\boldsymbol{\upsilon_{m}}\) are defined by using the vectors \(\boldsymbol{\psi_{m}}\), that is, \[\boldsymbol{\upsilon_{m}}=(\phi_{m,0},\boldsymbol{\psi_{m}}, \sigma_{m}^2) \enspace and \enspace \boldsymbol{\psi_{m}}=(\psi_{m,1},...,\psi_{m,q_{m}}), \enspace m=1,...,M.\] The user also has to provide a list of constraint matrices \(\boldsymbol{C_{m}}\) that satisfy \(\boldsymbol{\phi_{m}}=\boldsymbol{C_{m}\psi_{m}}\) for all \(m=1,...,M.\)

The parameter vector for the constrained StMAR model is defined by simply adding the degrees of freedom parameters to the constrained GMAR model’s parameter vector, that is, \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M}),\] and \(\boldsymbol{\theta}\) is as in the case of the constrained GMAR model.

Parameter vector for the constrained G-StMAR model is similar to the one of the constrained StMAR model, but with degrees of freedom parameters for the StMAR components only.

Analogously to non-restricted models, the parameter vectors for constrained versions of restricted GSMAR models are defined by simply replacing the vector \(\boldsymbol{\phi}\) with the vector \(\boldsymbol{\psi}\). Hence the parameter vector for the restricted and constrained GMAR model has the form \[\boldsymbol{\theta}=(\phi_{1,0},...,\phi_{M,0},\boldsymbol{\psi},\sigma_{1}^2,...,\sigma_{M}^2,\alpha_{1},...,\alpha_{M-1}), \quad where \quad \boldsymbol{\psi}=(\psi_{1},...,\psi_{q}).\] The size \((p\times q)\) constraint matrix \(\boldsymbol{C}\) needs to be provided and it’s assumed to satisfy \(\boldsymbol{\phi}=\boldsymbol{C\psi}.\)

The parameter vector for the restricted and constrained StMAR model is then again defined by adding the degrees of freedom parameters, that is, \((\boldsymbol{\theta}, \boldsymbol{\nu})\) where \(\boldsymbol{\nu}=(\nu_{1},...,\nu_{M}).\) For the restricted and constrained G-StMAR model, the parameter vector is similar to that of the restricted and constrained StMAR model but with degrees of freedom parameters for the StMAR components only.

The function used to estimate models in `uGMAR` is `fitGSMAR`. It estimates the model parameters using the method of maximum likelihood and employs a hybrid estimation scheme that is performed in two phases. In the first phase `fitGSMAR` uses a genetic algorithm to find starting values for the gradient based variable metric algorithm, which it then uses in the second phase to finalize the estimation. It’s important to note that the numerical estimation algorithms are not guaranteed to end up in the global maximum point rather than a local one or a saddle point. Because of the multimodality and challenging surface of the log-likelihood function, it’s actually expected that many of the estimation rounds won’t find the global maximum point. For this reason one should always perform multiple estimation rounds, since more estimation rounds yield more reliable results. The number of estimation rounds can be chosen with the argument `ncalls`, but multiple estimation rounds are also performed by default. To shorten the estimation time, **uGMAR** uses parallel computing to run multiple estimation rounds in parallel. The number of cores used can be set with the argument `ncores`.

There is also an option to perform some quantile residual tests for the estimated model to get a quick sense of how well the model fits the data.

If the model is estimated poorly, it’s often because the number of mixture components is chosen too large. One may also adjust the settings of the genetic algorithm employed, or set up an initial population with guesses for the estimates. This can be done by passing arguments in `fitGSMAR` to the (non-exported) function `GAfit`, which implements the genetic algorithm. To check the available settings, read the documentation `?GAfit`. If the iteration limit is reached when estimating the model, the function `iterate_more` can be used to finish the estimation.

The parameters of the estimated model are printed in an illustrative and easy to read form. In order to easily compare the approximate standard errors to the corresponding estimates, it’s advisable to use the `summary` method, which prints the standard errors in brackets next to the estimates. Numerical approximations of the gradient and Hessian matrix of the log-likelihood function at the estimates can be obtained conveniently with the functions `get_gradient` and `get_hessian`. The estimated objects also have their own plot method.

Use the function `stmar_to_gstmar` in order to conveniently switch from a StMAR model with large degrees of freedom estimates to the corresponding G-StMAR model.

**uGMAR** considers model diagnostics based on quantile residuals (see Kalliovirta 2012). Quantile residuals asymptotically follow the standard normal distribution if the model is correctly specified, and they can hence be used for graphical diagnostics and testing.
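For orientation, the general definition of a quantile residual can be written as follows. This is a sketch of the standard definition rather than a quotation of the package's implementation, and the symbols \(R_t\), \(\Phi^{-1}\), \(F\) and \(\mathcal{F}_{t-1}\) are notation introduced here, not used elsewhere in this vignette: \[R_{t}=\Phi^{-1}\big(F(y_{t}\mid\mathcal{F}_{t-1})\big),\] where \(F(\cdot\mid\mathcal{F}_{t-1})\) is the conditional cumulative distribution function of the model given the past observations and \(\Phi^{-1}\) is the standard normal quantile function. For a mixture model, \(F\) is the correspondingly weighted sum of the component processes' conditional distribution functions.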

The function `quantileResidualTests` performs the quantile residual tests introduced by *Kalliovirta (2012)*, testing for normality, autocorrelation and conditional heteroscedasticity. For graphical diagnostics, one may use the functions `diagnosticPlot` and `quantileResidualPlot`.

Consider installing the suggested package `gsl` for much faster evaluation of quantile residuals in the cases of StMAR and G-StMAR models. If the model and data are both large, performing the quantile residual tests may take a long time for StMAR and G-StMAR models without the package `gsl`, because numerical integration is used. The package is not imported because, in our experience, it might not install directly on some platforms when installing **uGMAR**.

One may wish to construct an arbitrary model without estimating the parameters, for example in order to simulate from the particular process of interest. An arbitrary model can be created with the function `GSMAR`. If one wants to add or update data to the model afterwards, it’s advisable to use the function `add_data`.

To simulate from a GSMAR process, use the function `simulateGSMAR`. As the main argument it uses a `gsmar` object created with `fitGSMAR` or `GSMAR`.
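The following sketch shows how building a model with pre-specified parameters and simulating from it might look. It is written under stated assumptions: the argument names `p`, `M`, `params` and `model` of `GSMAR`, and the number of simulated observations being the second positional argument of `simulateGSMAR`, reflect our understanding of the package and may differ between versions, so check `?GSMAR` and `?simulateGSMAR` first. The parameter values are arbitrary. The package calls are skipped if **uGMAR** is not installed.

```r
# Unconstrained GMAR parameter vector for p = 1, M = 2 (arbitrary values)
params <- c(0.5, 0.6, 1.0,   # regime 1: phi_{1,0}, phi_{1,1}, sigma_1^2
            2.0, 0.3, 0.5,   # regime 2: phi_{2,0}, phi_{2,1}, sigma_2^2
            0.7)             # alpha_1

if (requireNamespace("uGMAR", quietly = TRUE)) {
  # Assumed argument names; see ?GSMAR for the installed version
  mod <- uGMAR::GSMAR(p = 1, M = 2, params = params, model = "GMAR")

  set.seed(1)
  # Second argument: number of observations to simulate (assumed position)
  sim <- uGMAR::simulateGSMAR(mod, 200)
  str(sim)   # inspect what the simulation returns
}
```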

We advise using the function `simulateGSMAR` directly for quantile based forecasting. However, **uGMAR** contains the predict method `predict.gsmar` for forecasting GSMAR processes. For one-step predictions, using the exact formula for the conditional mean is supported, but forecasts further ahead are based on independent simulations. The predictions are either sample means or medians, and the confidence intervals are based on sample quantiles. The objects generated by `predict.gsmar` have their own plot method.
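The idea of simulation based forecasting can be illustrated in base R. The sketch below is not **uGMAR**'s implementation: it uses a toy Gaussian AR(1) process (arbitrary parameter values) in place of a GSMAR model, simulates independent future paths, and summarizes them with sample means, medians and quantiles.

```r
set.seed(42)

# Toy Gaussian AR(1) process standing in for a GSMAR model
phi0 <- 0.5; phi1 <- 0.6; sigma <- 1
y_last  <- 2.0    # last observed value
n_ahead <- 5      # forecast horizon
n_paths <- 500    # number of independent simulations

# Simulate n_paths future sample paths, one per row
paths <- matrix(NA_real_, nrow = n_paths, ncol = n_ahead)
for (i in seq_len(n_paths)) {
  y_prev <- y_last
  for (h in seq_len(n_ahead)) {
    y_prev <- phi0 + phi1 * y_prev + rnorm(1, sd = sigma)
    paths[i, h] <- y_prev
  }
}

# Point forecasts: sample means or medians at each horizon
point_mean   <- colMeans(paths)
point_median <- apply(paths, 2, median)

# Interval bounds from sample quantiles (rows: 2.5% and 97.5%)
intervals <- apply(paths, 2, quantile, probs = c(0.025, 0.975))

# One step ahead, the conditional mean is also available in closed form
exact_one_step <- phi0 + phi1 * y_last
```

The one-step sample mean should be close to `exact_one_step`, which is why an exact formula is preferable whenever it is available.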

For analysing multivariate versions of the model, you are welcome to try the package `gmvarkit`. It currently supports the GMVAR model, which is the multivariate extension of the GMAR model.

- Corsi F. 2009. A Simple Approximate Long-Memory Model of Realized Volatility. *Journal of Financial Econometrics*, **7**, 174-196.
- Kalliovirta L., Meitz M. and Saikkonen P. 2015. Gaussian Mixture Autoregressive model for univariate time series. *Journal of Time Series Analysis*, **36**, 247-266.
- Kalliovirta L. 2012. Misspecification tests based on quantile residuals. *The Econometrics Journal*, **15**, 358-393.
- Meitz M., Preve D., Saikkonen P. 2018. A mixture autoregressive model based on Student’s t-distribution. arXiv:1805.04010 **[econ.EM]**.
- There are currently no published references for the G-StMAR model, but it’s a straightforward generalization with theoretical properties similar to the GMAR and StMAR models.