This **R** package performs association tests between
the observed data and their systematic patterns of variation. Systematic
variation can be modeled by latent variables, which likely arise
from biological processes, experimental conditions, and environmental
factors. We are often interested in estimating these patterns using
principal component analysis (PCA), factor analysis (FA), K-means
clustering, partitioning around medoids (PAM), and related methods. The
jackstraw methods account for the over-fitting inherent in
unsupervised learning, where the same observed data are used first to
estimate the systematic patterns and then to test for association with them.

Using a variety of unsupervised learning techniques, the jackstraw provides a resampling strategy and testing scheme to estimate the statistical significance of association between the observed data and their systematic patterns of variation. For example, the cell cycle in microarray data may be estimated by principal components (PCs); we can then use the jackstraw for PCA to identify genes that are significantly associated with these PCs. Similarly, cell identities in single-cell RNA-seq data are often determined by K-means clustering; the jackstraw for clustering can then evaluate the reliability of these computationally determined cell identities.
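
The resampling idea can be illustrated with a minimal base-R sketch. This is a simplified illustration under stated assumptions, not the package's implementation: it uses simulated data, a single PC, squared correlation as the association statistic (the published method uses F-statistics), and a plain empirical p-value. The helper names `first_pc` and `assoc_stat` are hypothetical.

```r
# Simplified illustration of the jackstraw idea for the first PC (base R only).
set.seed(1)
m <- 200   # number of variables (rows)
n <- 20    # number of observations (columns)
latent <- rnorm(n)                         # hypothetical latent variable
dat <- matrix(rnorm(m * n), nrow = m, ncol = n)
dat[1:20, ] <- dat[1:20, ] + outer(rep(1.5, 20), latent)  # rows 1-20 carry signal

# Estimate the first PC from row-centered data.
first_pc <- function(x) {
  centered <- x - rowMeans(x)
  svd(centered, nu = 0, nv = 1)$v[, 1]
}

# Association statistic: squared correlation between each row and the PC.
assoc_stat <- function(x, pc) as.vector(cor(t(x), pc))^2

obs_stat <- assoc_stat(dat, first_pc(dat))

B <- 100   # number of resampling iterations
s <- 10    # number of rows permuted per iteration
null_stat <- numeric(0)
for (b in seq_len(B)) {
  idx <- sample(m, s)
  jack <- dat
  # Permute each selected row independently, breaking its link to the PC.
  jack[idx, ] <- t(apply(dat[idx, , drop = FALSE], 1, sample))
  # Re-estimate the PC with the permuted rows included, so the null
  # statistics carry the same over-fitting as the observed ones.
  null_stat <- c(null_stat,
                 assoc_stat(jack[idx, , drop = FALSE], first_pc(jack)))
}

# Empirical p-value: fraction of null statistics at least as large.
p_value <- sapply(obs_stat, function(o) mean(null_stat >= o))
```

Because only a small number of rows (`s`) are permuted in each iteration, the re-estimated PC stays close to the original one, while the permuted rows provide null statistics that include the over-fitting effect.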

The jackstraw tests enable us to identify the variables (or
observations) that are driving systematic variation, in an unsupervised
manner. Using **jackstraw_pca**, we can find statistically
significant variables with regard to the top r principal components.
Alternatively, **jackstraw_kmeans** can identify the
variables that are statistically significant members of clusters. The
package also provides functions that support statistical inference for
unsupervised learning, such as choosing the number of PCs or clusters and
estimating posterior probabilities from jackstraw p-values. Furthermore, this
package includes more general and experimental algorithms such as
**jackstraw_subspace** for the dimension reduction
techniques and **jackstraw_cluster** for the clustering
algorithms.
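
A typical call might look like the following sketch. The simulated data and the values chosen for `r`, `s`, and `B` are illustrative assumptions, and the argument and return-value names reflect the package version known at the time of writing; consult `?jackstraw_pca` and `?jackstraw_kmeans` for the authoritative interface.

```r
library(jackstraw)

set.seed(1)
m <- 500   # variables (rows)
n <- 20    # observations (columns)
latent <- rnorm(n)                        # hypothetical latent variable
dat <- matrix(rnorm(m * n), nrow = m, ncol = n)
dat[1:50, ] <- dat[1:50, ] + outer(rep(1, 50), latent)  # rows 1-50 carry signal

# Test the association of each row with the top r = 1 principal component.
# s = rows permuted per iteration, B = number of iterations (assumed values).
js_pca <- jackstraw_pca(dat, r = 1, s = 50, B = 100, verbose = FALSE)
sum(js_pca$p.value < 0.05)   # number of rows with small p-values

# Test cluster membership of each row after K-means clustering of the rows.
km <- kmeans(dat, centers = 2)
js_km <- jackstraw_kmeans(dat, kmeans.dat = km, s = 50, B = 100)
head(js_km$p.F)
```

Larger values of `B` give finer-grained p-values at the cost of longer run time.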

*Chung, N.C.* (2020) Statistical significance of cluster
membership for unsupervised evaluation of cell identities.
Bioinformatics, 36(10): 3107–3114
https://academic.oup.com/bioinformatics/article/36/10/3107/5788523

*Chung, N.C.* and *Storey, J.D.* (2015) Statistical
significance of variables driving systematic variation in
high-dimensional data. Bioinformatics, 31(4): 545–554
https://academic.oup.com/bioinformatics/article/31/4/545/2748186

To use a stable version from CRAN:

`install.packages("jackstraw")`

Bioconductor dependencies (e.g., lfa, gcatest, qvalue) may fail to install automatically.

This results in an error when loading the package:

```
Error: package or namespace load failed for ‘jackstraw’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 there is no package called ‘lfa’
```

To solve this problem, please install them manually.

```
# Install BiocManager if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install the Bioconductor dependencies
BiocManager::install(c('lfa','gcatest','qvalue'))
```

This package is in active development.

To install the development version of jackstraw from GitHub:

```
install.packages("devtools")
library("devtools")
install_github("ncchung/jackstraw")
```