# Which quantity to measure functional dependency?

#### 2018-02-20

Given an input contingency table, fun.chisq.test() offers three quantities to evaluate non-parametric functional dependency of the column variable $$Y$$ on the row variable $$X$$. They include test statistic (functional chi-square $$\chi^2_f$$), statistical significance ($$p$$-value), and effect size (function index $$\xi_f$$).

We explain their differences in analogy to those statistics returned from cor.test(), the R function for the test of correlation, and the $$t$$-test. We chose both tests because they are widely used and well understood. Another choice could be the Pearson’s chi-square test plus a statistic called Cramer’s V, analogous to correlation coefficient, but not as popularly used. The table below summarizes the differences among the quantities and their analogous counterparts in correlation and $$t$$ tests.

Comparison of the three quantities returned from fun.chisq.test().
Quantity Measure functional dependency? Affected by sample size? Affected by table size? Measure statistical significance? Counterpart in correlation test Counterpart in two-sample $$t$$-test
$$\chi^2_f$$ Yes Yes Yes No $$t$$-statistic $$t$$-statistic
$$p$$-value Yes Yes Yes Yes $$p$$-value $$p$$-value
$$\xi_f$$ Yes No No No correlation coefficient mean difference

The test statistic $$\chi^2_f$$ measures deviation of $$Y$$ from a uniform distribution contributed by $$X$$. It is maximized when there is a functional relationship from $$X$$ to $$Y$$. This statistic is also affected by sample size and the size of the contingency table. It summarizes the strength of both functional dependency and support from the sample. A strong function supported by few samples may have equal $$\chi^2_f$$ to a weak function supported by many samples. It is analogous to the test statistic (not to be confused with correlation coefficient) in cor.test(), or the $$t$$ statistic from the $$t$$-test.

The $$p$$-value of $$\chi^2_f$$ overcomes the table size factor and making tables of different sizes or sample sizes comparable. However, its null distribution (chi-square or normalized) is only asymptotically true. It is analogous to the role of the $$p$$-value of cor.test().

The function index $$\xi_f$$ measures only the strength of functional dependency normalized by sample and table sizes without considering statistical significance. When the sample size is small, the index can be unreliable; when the sample size is large, it is a direct measure of functional dependency and is comparable across tables. It is analogous to the role of correlation coefficient in cor.test(), or fold change in $$t$$-test for differential gene expression analysis.

## Examples

We provide four examples to illustrate the differences among the statistics. x1 and x4 represent the same non-monotonic function pattern in different sample sizes; x2 is the transpose of x1, no longer functional; and x3 is another non-functional pattern. Among the first three examples, x3 is the most statistically significant, but x1 has the highest function index $$\xi_f$$. This can be explained by a larger sample size but a smaller effect in x3 than x1. However, when x1 is linearly scaled to x4 to have exactly the same sample size with x3, both the $$p$$-value and the function index $$\xi_f$$ favor x4 over x3 for representing a stronger function.

require(FunChisq)
x1=matrix(c(5,1,5,1,5,1,1,0,1), nrow=3)
x1
##      [,1] [,2] [,3]
## [1,]    5    1    1
## [2,]    1    5    0
## [3,]    5    1    1
fun.chisq.test(x1)
##
##  Functional chi-square test
##
## data:  x1
## statistic = 10.043, parameter = 4, p-value = 0.03971
## sample estimates:
## non-constant function index xi.f
##                        0.5010703
x2=matrix(c(5,1,1,1,5,0,5,1,1), nrow=3)
x2
##      [,1] [,2] [,3]
## [1,]    5    1    5
## [2,]    1    5    1
## [3,]    1    0    1
fun.chisq.test(x2)
##
##  Functional chi-square test
##
## data:  x2
## statistic = 8.3805, parameter = 4, p-value = 0.07859
## sample estimates:
## non-constant function index xi.f
##                        0.4577259
x3=matrix(c(5,1,1,1,5,0,9,1,1), nrow=3)
x3
##      [,1] [,2] [,3]
## [1,]    5    1    9
## [2,]    1    5    1
## [3,]    1    0    1
fun.chisq.test(x3)
##
##  Functional chi-square test
##
## data:  x3
## statistic = 10.221, parameter = 4, p-value = 0.03686
## sample estimates:
## non-constant function index xi.f
##                        0.4614612
x4=x1*sum(x3)/sum(x1)
x4
##      [,1] [,2] [,3]
## [1,]  6.0  1.2  1.2
## [2,]  1.2  6.0  0.0
## [3,]  6.0  1.2  1.2
fun.chisq.test(x4)
##
##  Functional chi-square test
##
## data:  x4
## statistic = 12.051, parameter = 4, p-value = 0.01697
## sample estimates:
## non-constant function index xi.f
##                        0.5010703