`latentcor` is an `R` package for estimation of
latent correlations with mixed data types (continuous, binary,
truncated, and ternary) under the latent Gaussian copula model. For
references on the estimation framework, see

- Fan, J., Liu, H., Ning, Y., and Zou, H. (2017). “High Dimensional Semiparametric Latent Graphical Model for Mixed Data.” *JRSS B*. **Continuous/binary** types.
- Quan X., Booth J.G. and Wells M.T. “Rank-based approach for estimating correlations in mixed ordinal data.” *arXiv*. **Ternary** type.
- Yoon G., Carroll R.J. and Gaynanova I. (2020). “Sparse semiparametric canonical correlation analysis for data of mixed types.” *Biometrika*. **Truncated** type for zero-inflated data.
- Yoon G., Müller C.L. and Gaynanova I. (2021). “Fast computation of latent correlations.” *JCGS*. **Approximation method of computation**, see vignette for details.
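To give a feel for the framework in the simplest (continuous/continuous) case, the sketch below uses base `R` only and does not call `latentcor`. It relies on the standard Gaussian copula identity that the latent correlation equals `sin(pi / 2 * tau)`, where `tau` is Kendall's tau. The simulated data, the variable names, and the chosen monotone transformations are assumptions made for illustration; the other data types need more involved bridge functions, described in the references above.

```r
# Illustrative sketch (base R only, not the latentcor implementation).
set.seed(1)
n <- 2000
rho <- 0.6  # true latent correlation (assumed for this simulation)

# Latent bivariate normal variables with correlation rho
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Observed variables are monotone transformations of the latent ones,
# so their ranks (and hence Kendall's tau) are unchanged.
x1 <- exp(z1)
x2 <- z2^3

# Rank-based estimate of the latent correlation
tau <- cor(x1, x2, method = "kendall")
r_hat <- sin(pi / 2 * tau)

# Pearson correlation on the observed scale, for comparison
r_pearson <- cor(x1, x2)

print(c(latent_estimate = r_hat, pearson = r_pearson))
```

The rank-based estimate targets the correlation of the latent normal variables, whereas the Pearson correlation of the transformed data generally does not.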

No R software package is currently available that allows accurate and
fast correlation estimation from mixed variable data in a unifying
manner. The R package `latentcor`, introduced here,
thus represents the first stand-alone R package for
computation of latent correlation that takes into account all variable
types (continuous/binary/ordinal/zero-inflated), comes with an optimized
memory footprint, and is computationally efficient, essentially making
latent correlation estimation almost as fast as rank-based correlation
estimation.

**Multi-linear interpolation:** Earlier versions of `latentcor`
used multi-linear interpolation based on the functionality of the R
package `chebpol` written by Simen Gaure. This functionality is needed
for faster computation of latent correlations with the approximation
method. However, `chebpol` was removed from CRAN on 2022-02-07. The
current version of `latentcor` reuses the multi-linear interpolation
part of `chebpol` (provided under the Artistic-2 license), integrated
directly within `latentcor`. To cite multi-linear interpolation only,
please cite the original `chebpol`.
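As a rough illustration of the idea, the sketch below performs bilinear (two-dimensional multi-linear) interpolation on a pre-computed regular grid in base `R`. It is not the `chebpol` code and not what `latentcor` runs internally; the function name `bilinear_interp`, the grid, and the test function `f` are hypothetical and chosen only for the example.

```r
# Illustrative sketch (not the chebpol or latentcor implementation).
# Bilinear interpolation of values stored on a regular 2-D grid.
# gx, gy: increasing grid coordinates; gz[i, j] holds f(gx[i], gy[j]).
bilinear_interp <- function(gx, gy, gz, x, y) {
  # Locate the grid cell containing (x, y); clamp to the outermost cells.
  i <- min(max(findInterval(x, gx), 1), length(gx) - 1)
  j <- min(max(findInterval(y, gy), 1), length(gy) - 1)
  # Relative position of the point inside the cell, in [0, 1].
  tx <- (x - gx[i]) / (gx[i + 1] - gx[i])
  ty <- (y - gy[j]) / (gy[j + 1] - gy[j])
  # Weighted average of the four corner values.
  (1 - tx) * (1 - ty) * gz[i, j] +
    tx * (1 - ty) * gz[i + 1, j] +
    (1 - tx) * ty * gz[i, j + 1] +
    tx * ty * gz[i + 1, j + 1]
}

# Pre-store an (assumed) expensive function on a coarse grid once ...
f <- function(x, y) sin(pi * x / 2) * (1 + y)
gx <- seq(0, 1, length.out = 11)
gy <- seq(0, 1, length.out = 11)
gz <- outer(gx, gy, f)

# ... then answer later queries by cheap interpolation.
approx_val <- bilinear_interp(gx, gy, gz, 0.33, 0.47)
exact_val <- f(0.33, 0.47)
print(c(approx = approx_val, exact = exact_val))
```

The trade-off is the usual one: a finer grid gives a closer approximation but needs more storage.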

**Accuracy:** The approximation method for the
ternary/ternary, truncated (zero-inflated)/ternary, and ternary/binary
cases is less accurate close to the boundary (zero proportions), because
CRAN package size limits restrict the pre-stored grid. If higher accuracy
is desired and the original method is computationally prohibitive,
`latentcor` is also available as a Python package, with a development
version on GitHub.

To use `latentcor`, you need to install `R`. To enhance your
user experience, you may use an IDE for it (e.g. `RStudio`).

The development version of `latentcor` is available on GitHub. You can
install it with the help of the `devtools` package in `R` as follows:

```
install.packages("devtools")
devtools::install_github("https://github.com/mingzehuang/latentcor", build_vignettes = TRUE)
```

The stable release version of `latentcor` is available on CRAN. You can
install it in `R` as follows:

`install.packages("latentcor")`

A simple example estimating latent correlation is shown below.

```
library(latentcor)
# Generate two variables of sample size 1000
# The first variable is ternary (pi0 = 0.3, pi1 = 0.5, pi2 = 1-0.3-0.5 = 0.2)
# The second variable is continuous.
# No copula transformation is applied.
X = gen_data(n = 1000, types = c("ter", "con"), XP = list(c(0.3, .5), NA))$X
# Estimate latent correlation matrix with the original method
latentcor(X = X, types = c("ter", "con"), method = "original")$R
# Estimate latent correlation matrix with the approximation method
latentcor(X = X, types = c("ter", "con"))$R
# Speed improvement by approximation method compared with original method
library(microbenchmark)
microbenchmark(latentcor(X, types = c("ter", "con"), method = "original"),
latentcor(X, types = c("ter", "con")))
# Unit: milliseconds
# min lq mean median uq max neval
# 5.3444 5.8301 7.033555 6.06740 6.74975 20.8878 100
# 1.5049 1.6245 2.009371 1.73805 1.99820 5.0027 100
# This is run on Windows 10 with Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz 3.20 GHz
# Heatmap for latent correlation matrix.
latentcor(X = X, types = c("ter", "con"), showplot = TRUE)$plotR
```

Another example with the `mtcars`

dataset.

```
library(latentcor)
# Use built-in dataset mtcars
X = mtcars
# Check variable types for manual determination
apply(mtcars, 2, table)
# Or use built-in get_types function to get types suggestions
get_types(mtcars)
# Estimate latent correlation matrix with original method
latentcor(mtcars, types = c("con", "ter", "con", "con", "con", "con", "con", "bin",
"bin", "ter", "con"), method = "original")$R
# Estimate latent correlation matrix with approximation method
latentcor(mtcars, types = c("con", "ter", "con", "con", "con", "con", "con", "bin",
"bin", "ter", "con"))$R
# Speed improvement by approximation method compared with original method
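library(microbenchmark)
# Store the type vector once so it can be reused in both calls
types = c("con", "ter", "con", "con", "con", "con", "con", "bin",
"bin", "ter", "con")
microbenchmark(latentcor(mtcars, types = types, method = "original"),
latentcor(mtcars, types = types, method = "approx"))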
# Unit: milliseconds
# min lq mean median uq max neval
# 201.9872 215.6438 225.30385 221.5364 226.58330 411.4940 100
# 71.8457 75.1681 82.42531 80.1688 84.77845 238.3793 100
# This is run on Windows 10 with Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz 3.20 GHz
# Heatmap for latent correlation matrix with approximation method.
latentcor(mtcars, types = c("con", "ter", "con", "con", "con", "con", "con", "bin",
"bin", "ter", "con"), showplot = TRUE)$plotR
```

For an interactive version, see: interactive heatmap of latent correlations (approx) for mtcars.
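The type vector used for `mtcars` above can be cross-checked by counting distinct values per column. The sketch below is a hypothetical base `R` helper (`guess_types` is not part of `latentcor`, and it is not how `get_types` works internally); it assumes that two distinct values mean binary, three mean ternary, and anything else is treated as continuous. It cannot detect truncated (zero-inflated) variables, so treat its output as a starting point only.

```r
# Illustrative sketch: a hypothetical helper, not part of latentcor.
guess_types <- function(data) {
  # Number of distinct observed values in each column
  n_distinct <- sapply(data, function(v) length(unique(v)))
  # Map the counts to type labels under the stated assumption
  types <- ifelse(n_distinct == 2, "bin",
                  ifelse(n_distinct == 3, "ter", "con"))
  unname(types)
}

types_guess <- guess_types(mtcars)
# Show the guessed type next to each column name
print(setNames(types_guess, names(mtcars)))
```

For `mtcars`, this heuristic reproduces the type vector passed to `latentcor` in the example above.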

- Contributions and suggestions to the software are always welcome. Please consult our contribution guidelines prior to submitting a pull request.
- Report issues or problems with the software using GitHub’s issue tracker.
- Contributors must adhere to the Code of Conduct.

We thank Dr. Grace Yoon for providing implementation details of the
`mixedCCA` R package.