Infinite Mixtures of Infinite Factor Analysers
IMIFA v2.0.0 - (6th release [major update]: 2018-05-01)
- Simplified mcmc_IMIFA by consolidating arguments using new helper functions (with defaults):
- Args. common to all factor-analytic mixture methods & MCMC settings supplied via
- MGP & AGS args. supplied via mgpControl for infinite factor models.
- Dirichlet/Pitman-Yor Process args. supplied via bnpControl for infinite mixture models.
- Storage switch args. supplied via
- New functions also inherit the old documentation for their arguments.
- Posterior predictive checking overhauled: now MSE, RMSE etc. between empirical & estimated covariance
matrices are computed for every retained iteration so uncertainty in these estimates can be quantified:
- Can be switched on/off via the error.metrics argument to
- Can be visualised by supplying
- For methods which achieve clustering, the ‘overall’ covariance matrix
is now properly computed from the cluster-specific covariance matrices.
- Same metrics also evaluated at posterior mean parameter estimates & for final sample where possible.
- mixfaControl gains the arg. prec.mu to control the degree of flatness of the prior for the means.
- Posterior confusion matrix now returned (get_IMIFA_results) & visualisable (plot.meth="zlabels"), via new function post_conf_mat, to further assess clustering uncertainty.
- Added new type of clustering uncertainty profile plot in
- For convenience, get_IMIFA_results now also returns the last valid samples for parameters of interest, after conditioning on the modal G & Q values and accounting for label switching and Procrustes rotation.
- plot.Results_IMIFA gains new arg. show.last that replaces any instance of showing the posterior mean with the last valid sample instead (i.e. when
- Added ability to constrain mixing proportions across clusters using equal.pro argument for M(I)FA models:
PGMM_dfree accordingly and forced non-storage of mixing proportions when
- All methods now also work for univariate data (with appropriate edits to plots & uniqueness defaults etc.).
sim_IMIFA_data also extended to work for univariate data, as well as sped-up.
- Retired args.
mgpControl, replaced by ability to specify more general gamma prior,
phi.hyper arg. specifying shape and rate - mgp_check has also been modified accordingly.
Zsimilarity sped-up via the
cltoSim functions s.t. when # observations < 1000, z.avgsim=TRUE now by default in get_IMIFA_results (when suggested mcclust package is loaded).
- Matrix of posterior cluster membership probabilities now returned by
- Modified AGS to better account for when the number of group-specific latent factors shrinks to zero.
- psi.alpha no longer needs to be strictly greater than 1, unless the default psi.beta is invoked; thus flatter inverse gamma priors can now be specified for the uniquenesses via
- Added “hc” option to z.init to initialise allocations via hierarchical clustering (using
- Allowed optional args. for functions used to initialise allocations via
mu argument to sim_IMIFA_data to allow supplying true mean parameter values directly.
- Standard deviation of
bicm model selection criteria now computed and returned.
- Speed-ups due to new Rfast utility functions:
- Speed-ups due to utility functions from matrixStats, on which IMIFA already depends.
- Slight improvements when adapt=FALSE for infinite factor models with fixed high truncation level.
- Misclassified observations now highlighted in 1st type of uncertainty plot in
plot.meth="zlabels" and the true zlabels are supplied.
- mixfaControl gains arg. drop0sd to control removal of zero-variance features (defaults to
cex.lab argument to control magnification of legend text.
mat2cols gains the
PGMM_dfree to include the 4 extra models from the EPGMM family.
get_IMIFA_results will now match the cluster labels and parameters to the true labels even if there is a mismatch between the number of clusters in both.
- Similarly, supplying
plot.meth="zlabels" no longer does any matching when printing performance metrics to the screen - previously this caused confusion as associated parameters were not also permuted as they are within
plot(get_IMIFA_results(sim), plot.meth="zlabels", zlabels=z) gives different results from plot(get_IMIFA_results(sim, zlabels=z), plot.meth="zlabels") as only the latter will permute.
- Accounted for errors in covariance matrix when deriving default
- Accounted for missing empirical covariance entries within
- Fixed model selection in get_IMIFA_results for IMFA/OMFA models when range.Q is a range.
- Fixed calculation of
dic criteria: all results remain the same.
- Fixed support of Ga(a, b) prior on
discount is being learned.
- Fixed bug preventing
- Fixed treatment of exact zeros when plotting average clustering similarity matrix.
- Fixed tiny bug when neither centering nor scaling (of any kind) are applied to data within
- Fixed plotting of posterior mean scores when one or more clusters are empty.
- Fixed storage switches to account for
- Fixed bug with default plotting palette for data sets with >1024 variables.
- Fixed bug with label switching permutations in get_IMIFA_results when there are empty clusters.
- Fixed bug when plotting posterior mean loadings heatmap when one or more clusters have zero factors.
summary functions for objects of class
- Fixed calculating posterior mean zeta when adaptively targeting alpha’s optimal MH acceptance rate.
alpha be tiny for (O)M(I)FA models (provided z.init != "priors" for overfitted models).
- Normalised mixing proportions in get_IMIFA_results when conditioning on G for IM(I)FA/OM(I)FA models.
- New controls/warnings for excessively small Gamma hyperparameters for uniqueness/local shrinkage priors.
- Clarified recommendation in
alpha.d2 be moderately large relative to
sigma.mu hyperparameter arg. is always coerced to diagonal entries of a covariance matrix.
- Transparency default in plot.Results_IMIFA now depends on device’s support of semi-transparency.
- Replaced certain instances of
inherits(x, "list") for stricter checking.
check.margin=FALSE to calls to
PGMM_dfree are now properly vectorised.
USPSdigits data set (training and test),
with associated utility functions
- Optimised compression of
coffee and vignette data and used
stop() messages and
immediate.=TRUE to certain
- Removed dependency on
- Reduced dependency on Rfast w/ own versions of
- Added utility function IMIFA_news for accessing this
- Extensively improved package documentation:
Collate: field to
- Added line-breaks to usage sections of multi-argument functions.
- Consolidated help files for
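
The v2.0.0 notes above describe computing error metrics such as MSE and RMSE between the empirical and estimated covariance matrices at every retained iteration. The following is a minimal, language-agnostic sketch of that idea in Python, not the package's R code. The function names (covariance_error_metrics, metrics_per_iteration), the choice to average over all matrix entries, and the inclusion of MAE are assumptions made for illustration only.

```python
import math


def covariance_error_metrics(empirical, estimated):
    """Element-wise error metrics between two matrices given as lists of rows.

    Assumption: errors are averaged over every entry of the matrix; the
    package may treat the entries differently.
    """
    if len(empirical) != len(estimated) or any(
        len(r_emp) != len(r_est) for r_emp, r_est in zip(empirical, estimated)
    ):
        raise ValueError("matrices must have identical dimensions")
    diffs = [
        e - s
        for r_emp, r_est in zip(empirical, estimated)
        for e, s in zip(r_emp, r_est)
    ]
    if not diffs:
        raise ValueError("matrices must be non-empty")
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


def metrics_per_iteration(empirical, estimated_draws):
    """Apply the metrics to each retained draw of the estimated covariance."""
    return [covariance_error_metrics(empirical, est) for est in estimated_draws]
```

Keeping one set of metrics per retained draw, rather than a single value at the posterior mean, is what allows the spread of the errors to be summarised afterwards, for example with quantiles of the per-draw RMSE values.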
IMIFA v1.3.1 - (5th release [patch update]: 2017-07-07)
- Fixed bug preventing M(I)FA models from being treated as (I)FA models when range.G contains 1.
- Fixed bug preventing get_IMIFA_results from working properly when true labels are NOT supplied.
IMIFA v1.3.0 - (4th release [minor update]: 2017-06-22)
- Added options
as well as being either diagonal or isotropic (UUU / UUC), uniquenesses can now further be
constrained across clusters (UCU / UCC), with appropriate warnings, defaults, checks,
initialisations, computation of model choice penalties, and plotting behaviour in all 4 cases.
- mcmc_IMIFA gains the tune.zeta argument, a list of
target parameters, to invoke diminishing adaptation for tuning the uniform proposal to achieve a target acceptance rate when
is learned via Metropolis-Hastings when the Pitman-Yor Process prior is employed for the IM(I)FA models.
- (I)FA models sped up by considering uniquenesses under 1-cluster models as
rather than previously
"isotropic", utilising pre-computation and empty assignment.
- Previously hidden functions improved, exported and documented with examples:
make argument, merging it with previously hidden function
Thus the ‘nearest’ positive-(semi)definite matrix and the usual check can be returned in a single call.
- Sped-up sampling IM(I)FA labels, esp. when ‘active’ G falls to 1, or the dependent slice-sampler is used:
- log.like arg. removed from gumbel_max; function stands alone, now only stored log-likelihoods computed.
- psi argument added to sim_IMIFA_data to allow supplying true uniqueness parameter values directly.
density is invoked for plotting (
bw="nrd0" is invoked if this fails).
- Fixed initialisation of uniquenesses for isotropic (I)FA models.
- Fixed parallel coordinates plot axes and labels for all isotropic uniquenesses plots.
- Fixed adaptation for MIFA/OMIFA/IMIFA models when all clusters simultaneously have zero factors.
- Fixed storage bug in IM(I)FA models when
- Fixed density plot for discount when mutation rate is too low (i.e. too many zeros).
- Fixed simulation of loadings matrices for empty MIFA/OMIFA/IMIFA clusters using
Loop to simulate loadings matrices now generally faster also for all models.
- Fixed silly error re: way in which (I)FA models are treated as 1-cluster models to ensure they run:
Related bug fixed for OM(I)FA/IM(I)FA models when starting number of clusters is actually supplied.
IMIFA v1.2.1 - (3rd release [patch update]: 2017-05-29)
- Posterior mean scores can now also be plotted in the form of a heat map (previously loadings only).
load.meth argument replaced by logical
compare argument to yield common palettes/breaks for heat maps of multiple matrices:
plot_cols function also fixed, and now unhidden.
- Removed certain dependencies with faster personal code: e.g. Procrustes rotation now quicker:
IMIFA no longer depends on the
par()$bg (i.e. default "white") for plotting zero-valued entries of similarity matrix.
- Range of data for labelling in heat_legend calculated correctly.
- verbose argument now governs printing of cat calls, but not
- Fixed storage and plotting of loadings, particularly when some but not all clusters have zero factors.
NEWS.md to build.
IMIFA v1.2.0 - (2nd release [minor update]: 2017-05-09)
- Learning the Pitman-Yor
alpha parameters via Metropolis-Hastings now implemented.
param argument gains the option discount for posterior inference.
- Sped up simulating cluster labels from unnormalised log probabilities using the Gumbel-Max trick (Yellott, 1977):
gumbel_max replaces earlier function to sample cluster labels and is now unhidden/exported/documented.
- Added new plot when plot.meth="GQ" for OM(I)FA/IM(I)FA models depicting trace of #s of active/non-empty clusters.
- Added function Zsimilarity to summarise posterior clustering by the sampled labels with minimum squared distance to a sparse similarity matrix constructed by averaging the adjacency matrices:
when optionally called inside get_IMIFA_results, the similarity matrix can be plotted via
- Metropolis-Hastings updates implemented for
discount is non-zero, rather than usual Gibbs.
Mutation rate monitored rather than acceptance rate for Metropolis-Hastings updates of
- Fixed calculation of # ‘free’ parameters for
bic.mcmc criteria when uniquenesses are isotropic:
PGMM_dfree, which calculates # ‘free’ parameters for finite factor analytic mixture models is exported/documented.
This function is also used to add checks on the Dirichlet hyperparameter for OM(I)FA methods.
- DIC model selection criterion now also available for infinite factor models (previously finite only).
G_priorDensity now better reflects discrete nature of the density, and plots for non-zero PY discount values.
- Posterior mean loadings heatmaps now also display a colour key legend via new function
- Avoided redundant simulation of stick-breaking/mixing proportions under both types of IM(I)FA slice sampler.
- Simulated (finite) mixing proportions w/ Gamma(alpha, 1) trick (Devroye 1986, p.594) instead of
rDirichlet replaces earlier function to sample mixing proportions and is now unhidden/exported/documented.
- Deferred setting dimnames attributes in get_IMIFA_results: lower memory burden/faster simulations.
- Jettisoned superfluous duplicate material in object outputted from get_IMIFA_results to reduce size/simplify access.
trunc.G arg, the max allowable # active clusters, defaults to range.G and # active clusters now stored.
- Code sped up when active G=1 by not simulating labels for IM(I)FA models.
- Reduced chance of crash by exceeding memory capacity; score.switch defaults to FALSE if # models run is large.
- 2nd IM(I)FA label switching move sped up/properly weighted to ensure uniform sampling of neighbouring cluster pairs.
- Offline label switching square assignment correction now permutes properly.
- Fixed factor score trace plots by extracting indices of stored samples using Rfast::sort_unique and rotating properly.
- Fixed adding of rnorm columns to scores matrix during adaptation, esp. when widest loadings matrix grows/shrinks.
- Fixed initialisation (and upper limit) of number of clusters for OM(I)FA/IM(I)FA, esp. when N < P.
- Updates of DP/PY alpha parameter now correctly depend on current # non-empty rather than active clusters.
- Fixed density plots for parameters with bounded support, accounting for spike at zero for
- Slightly rearranged the order in which Gibbs updates take place, esp. to ensure means enter simulation of uniquenesses properly.
- Edited/robustified subsetting of large objects when storing
- Tightened controls for when certain parameters are not stored for posterior inference.
- Edited Ledermann upper bound stop(...) for finite factor models to
- Geometric rather than arithmetic mean used to derive single rate hyperparameter for PPCA’s isotropic uniquenesses.
- Uniquenesses now stored correctly for all clustering methods.
- Indices of uncertain obs. returned (plot.Results_IMIFA) even when zlabels not supplied.
- Fixed behaviour of progress bar when
- Fixed typos and expanded/clarified help documentation/vignette.
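
Two sampling devices named in the v1.2.0 notes above, the Gumbel-Max trick for drawing cluster labels from unnormalised log probabilities and the Gamma(alpha, 1) trick for drawing mixing proportions, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the package's R implementation of gumbel_max or rDirichlet. The names gumbel_max_sample and dirichlet_via_gamma and their signatures are hypothetical.

```python
import math
import random


def gumbel_max_sample(log_probs, rng=random):
    """Draw one category index from unnormalised log probabilities.

    Adds independent standard Gumbel noise to each log probability and
    returns the index of the maximum, so no normalisation is required.
    """
    best_index, best_value = 0, -math.inf
    for k, lp in enumerate(log_probs):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() lies in [0, 1)
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))
        if lp + gumbel_noise > best_value:
            best_index, best_value = k, lp + gumbel_noise
    return best_index


def dirichlet_via_gamma(alpha, rng=random):
    """Draw mixing proportions from Dirichlet(alpha).

    Normalises independent Gamma(alpha_k, 1) draws; every alpha_k must be
    strictly positive.
    """
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]
```

With unnormalised weights 1 and 3, passing their logs, [math.log(1.0), math.log(3.0)], to gumbel_max_sample should return index 1 in roughly three quarters of draws. The proportions returned by dirichlet_via_gamma sum to one up to floating-point error.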
IMIFA v1.1.0 - (1st release: 2017-02-02)