In this vignette we’ll look at how to use rrscale to re-scale data and help discover latent effects.

First we are going to generate data that is the concatenation of two log-normal groups. Each group is a rank-one 10 × 10 matrix formed by taking the outer product of two i.i.d. log-normal vectors. We create group 1 as:

```
set.seed(919)
u1 = rlnorm(10)
v1 = rlnorm(10)
Y1 = u1%*%t(v1)
```
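Because `u1 %*% t(v1)` is an outer product, every row of group 1 is a multiple of `v1`, so the group is a rank-one matrix. A quick check in base R, where `outer()` builds the same matrix and a pivoted QR decomposition reports the numerical rank:

```r
# Rebuild group 1 exactly as in the vignette
set.seed(919)
u1 <- rlnorm(10)
v1 <- rlnorm(10)
Y1 <- u1 %*% t(v1)

# outer() (equivalently u1 %o% v1) gives the same 10 x 10 matrix
stopifnot(isTRUE(all.equal(Y1, outer(u1, v1))))

# Numerical rank from a pivoted QR decomposition: a single latent factor
qr(Y1)$rank
```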

Group 2 is built the same way, but with every entry shifted up by 0.5:

```
u2 = rlnorm(10)
v2 = rlnorm(10)
Y2 = 0.5 + u2 %*% t(v2)  # same rank-one construction, shifted up by 0.5
```

We then stack the two groups row-wise into a single 20 × 10 data matrix and add some log-normal noise:

```
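Y_nn = rbind(Y1,Y2)  # rows 1-10 are group 1, rows 11-20 are group 2
# add element-wise log-normal noise (meanlog = 0, sdlog = 0.05)
Y = Y_nn + array(rlnorm(prod(dim(Y_nn)),0,.05),dim(Y_nn))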
```
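Note that this noise is not centred at zero: a log-normal variable with `meanlog = 0` and `sdlog = 0.05` has mean exp(0.05²/2) ≈ 1.0013, so every entry is shifted up by roughly 1, plus a small jitter. A quick simulation check (the seed and the sample size are arbitrary choices for this illustration):

```r
# Mean of a log-normal variable is exp(meanlog + sdlog^2 / 2); here meanlog = 0
sdlog <- 0.05
theoretical_mean <- exp(sdlog^2 / 2)

set.seed(1)  # arbitrary seed for this illustration
empirical_mean <- mean(rlnorm(1e5, meanlog = 0, sdlog = sdlog))

# Both values sit just above 1
round(c(theoretical = theoretical_mean, empirical = empirical_mean), 4)
```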

Notice that it's difficult to tell the groups apart:

```
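library('reshape2')
library('ggplot2')
# label the first 10 rows as group1 and the last 10 as group2
group = factor(rep(c(1,2),each=nrow(Y)/2))
levels(group) = c("group1","group2")
# long format: one row per matrix entry, tagged with its group
mY = melt(data.frame(Y,group),id.vars="group")
# histograms coloured by group, with a vertical line at each group mean
# (linewidth replaces the deprecated line `size` in ggplot2 >= 3.4.0)
ggplot(data=mY,mapping=aes(x=value,color=group)) +
  geom_histogram(bins=100) +
  geom_vline(data=aggregate(value~group,data=mY,mean),
             mapping=aes(xintercept=value,linetype=group),linewidth=1.5)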
```

Indeed, a t-test comparing the row means of the two groups finds no significant difference:

`t.test(rowMeans(Y)[group=="group1"],rowMeans(Y)[group=="group2"])`

```
##
## Welch Two Sample t-test
##
## data: rowMeans(Y)[group == "group1"] and rowMeans(Y)[group == "group2"]
## t = -1.6241, df = 13.639, p-value = 0.1272
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.9623864 0.6916029
## sample estimates:
## mean of x mean of y
## 2.930059 5.065451
```
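R's `t.test()` defaults to Welch's unequal-variance test. To make the output above less of a black box, here is a small sketch that computes the Welch t statistic, the Welch–Satterthwaite degrees of freedom and the two-sided p-value by hand and compares them with `t.test()`. The data `x`, `y` and the helper `welch()` are illustrative and not part of rrscale:

```r
# Illustrative data: two small log-normal samples
set.seed(2)
x <- rlnorm(10)
y <- rlnorm(10, meanlog = 0.5)

welch <- function(x, y) {
  vx <- var(x) / length(x)                 # squared standard error of mean(x)
  vy <- var(y) / length(y)                 # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 /                      # Welch-Satterthwaite approximation
    (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), df)            # two-sided p-value
  c(t = t_stat, df = df, p = p)
}

by_hand <- welch(x, y)
builtin <- t.test(x, y)                    # var.equal = FALSE is the default
rbind(by_hand, builtin = c(builtin$statistic, builtin$parameter, builtin$p.value))
```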

Let’s try this after transforming the data with rrscale:

```
library('rrscale')
scl = rrscale(Y,run_parallel=FALSE)
```

After running this we get an estimated transformation intended to help recover the latent group effect. The element “T_name” tells us that the selected transformation is a Box-Cox-like transformation:

`scl$T_name`

`## [1] "box_cox_negative"`

The element “par_hat” gives the estimated optimal value of this transformation's parameter:

`scl$par_hat`

`## [1] -0.5206409`
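For reference, the classic Box-Cox transformation of positive data with parameter λ is (y^λ - 1)/λ for λ ≠ 0 and log(y) for λ = 0. The sketch below applies it with the estimated λ ≈ -0.52. We are assuming rrscale's "box_cox_negative" belongs to this family; its exact formula (for example how it treats signs or shifts) may differ, so treat `box_cox()` as a hypothetical illustration rather than the package's transformation:

```r
# Classic Box-Cox for positive data (illustrative; rrscale's
# "box_cox_negative" variant may differ in details)
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0, na.rm = TRUE))
  if (abs(lambda) < 1e-12) log(y) else (y^lambda - 1) / lambda
}

lambda_hat <- -0.5206409            # value reported by scl$par_hat above
y <- c(0.5, 1, 2, 10, 100)          # illustrative right-skewed values

# With lambda < 0 the map is increasing but bounded above by -1/lambda,
# so large values are compressed much more strongly than small ones
round(box_cox(y, lambda_hat), 3)
```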

We can take the pre-computed RR transformation of `Y` directly from the output of rrscale:

`trans_Y = scl$RR`

or we can apply the returned “rr_fn” function to calculate the same transformation; the two results are identical:

```
trans_Y2 = scl$rr_fn(Y)
all(trans_Y2==trans_Y,na.rm=TRUE)
```

`## [1] TRUE`
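As we understand it, the RR transformation chains three steps: a Gaussianizing transformation, a Z-score transformation and an outlier-removal transformation. The sketch below is our own hypothetical approximation of that pipeline: the helper `rr_sketch()`, the global median/MAD standardization and the cutoff of 4 are assumptions, not rrscale's documented internals. It shows how outlying entries end up as `NA`, which is presumably why the comparison above uses `na.rm=TRUE`:

```r
# HYPOTHETICAL sketch of an RR-style re-scaling (not rrscale's code):
# (1) Box-Cox Gaussianization, (2) robust z-score, (3) outliers set to NA
rr_sketch <- function(Y, lambda, z_cut = 4) {
  G <- if (abs(lambda) < 1e-12) log(Y) else (Y^lambda - 1) / lambda  # step 1
  Z <- (G - median(G, na.rm = TRUE)) / mad(G, na.rm = TRUE)          # step 2
  Z[abs(Z) > z_cut] <- NA                                            # step 3
  Z
}

set.seed(919)
Y_demo <- matrix(rlnorm(200), nrow = 20)  # illustrative positive data
Y_demo[1, 1] <- 1e6                       # plant one extreme value

Z <- rr_sketch(Y_demo, lambda = 0)        # lambda = 0 is the log transform
sum(is.na(Z))                             # number of entries removed as outliers
```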

If we plot the transformed Y, the group difference is easier to see:

```
tmY = melt(data.frame(trans_Y,group),id.vars="group")
ggplot(data=tmY,mapping=aes(x=value,color=group)) +
  geom_histogram(bins=100) +
  geom_vline(data=aggregate(value~group,data=tmY,mean),
             mapping=aes(xintercept=value,linetype=group),linewidth=1.5)
```

Indeed, the t-test is now significant at the 5% level:

`t.test(rowMeans(trans_Y)[group=="group1"],rowMeans(trans_Y)[group=="group2"])`

```
##
## Welch Two Sample t-test
##
## data: rowMeans(trans_Y)[group == "group1"] and rowMeans(trans_Y)[group == "group2"]
## t = -2.7412, df = 16.971, p-value = 0.01394
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.432490 -0.186342
## sample estimates:
## mean of x mean of y
## -0.4046867 0.4047291
```

If we plot the first two PCs of the untransformed and transformed data, we can see the group difference much more clearly after transformation:

`plot(svdc(Y)$u[,1:2],col=group)`

`plot(svdc(trans_Y)$u[,1:2],col=group)`
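Base R's `svd()` stops with an error when its input contains `NA` values, and the outlier-removal step of the RR transformation can introduce them. One generic way to obtain singular vectors from an incomplete matrix is iterative low-rank imputation: fill the gaps, fit a rank-k SVD, refill only the missing cells from that fit and repeat. The helper `svd_complete_sketch()` below is our own name and illustrates this idea on a synthetic rank-2 matrix; it is not necessarily the algorithm `svdc` uses:

```r
# Generic iterative SVD imputation (illustrative, not rrscale's svdc)
svd_complete_sketch <- function(X, k = 2, n_iter = 100, tol = 1e-8) {
  miss <- is.na(X)
  X_fill <- X
  col_means <- colMeans(X, na.rm = TRUE)
  X_fill[miss] <- col_means[col(X)[miss]]           # initial guess: column means
  for (i in seq_len(n_iter)) {
    s <- svd(X_fill, nu = k, nv = k)
    approx <- s$u %*% diag(s$d[1:k], k) %*% t(s$v)  # rank-k reconstruction
    delta <- max(abs(X_fill[miss] - approx[miss]), 0)
    X_fill[miss] <- approx[miss]                    # update missing cells only
    if (delta < tol) break
  }
  svd(X_fill, nu = k, nv = k)
}

set.seed(3)
X <- outer(rnorm(20), rnorm(10)) + outer(rnorm(20), rnorm(10))  # rank-2 matrix
X_obs <- X
X_obs[sample(length(X), 10)] <- NA                 # hide 10 entries
fit <- svd_complete_sketch(X_obs, k = 2)
dim(fit$u)                                         # 20 x 2 left singular vectors
```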

These plots use the “svdc” function from the rrscale package, which calculates “completed” left and right singular vectors in the presence of missing values. We can also look at the canonical correlation between the group and the first two PCs for the untransformed and transformed data:

`cancor(model.matrix(~1+group),svdc(Y)$u[,1:2])`

```
## $cor
## [1] 0.7074969
##
## $xcoef
## [,1]
## groupgroup2 0.4472136
##
## $ycoef
## [,1] [,2]
## [1,] -0.3962615 1.6031726
## [2,] 0.9192846 0.7643497
##
## $xcenter
## (Intercept) groupgroup2
## 1.0 0.5
##
## $ycenter
## [1] -0.16540613 -0.08246504
```

`cancor(model.matrix(~1+group),svdc(trans_Y)$u[,1:2])`

```
## $cor
## [1] 0.9776598
##
## $xcoef
## [,1]
## groupgroup2 0.4472136
##
## $ycoef
## [,1] [,2]
## [1,] 0.731373 -0.6820391
## [2,] 0.856082 0.9079851
##
## $xcenter
## (Intercept) groupgroup2
## 1.0 0.5
##
## $ycenter
## [1] -0.001640214 -0.133762711
```

The canonical correlation is much higher for the transformed data (0.978 versus 0.707), signifying that its leading principal components capture the latent group structure better.
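Since only the `groupgroup2` indicator contributes on the group side (see `$xcoef` in the outputs above), this canonical correlation is a special case: when one side has a single variable, the first canonical correlation equals the multiple correlation R from regressing that variable on the other set. A small check on illustrative data, where `U` is a stand-in for the first two singular vectors:

```r
# Illustrative 0/1 group indicator and two "PC-like" columns
set.seed(4)
g <- rep(c(0, 1), each = 10)
U <- cbind(rnorm(20) + g, rnorm(20))

cc <- cancor(cbind(g), U)$cor[1]                  # first canonical correlation
r_mult <- sqrt(summary(lm(g ~ U))$r.squared)      # multiple correlation R
c(cancor = cc, multiple_R = r_mult)               # the two agree
```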