Getting Started with NNS: Clustering and Regression

Fred Viole

Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for more thorough descriptions and definitions, please see the References.

NNS Partitioning: NNS.part

NNS.part is both a partitional and a hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants, and then assigns a quadrant identification at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.

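# Load NNS and build a cubic test function on [-5, 5]
library(NNS)
x = seq(-5, 5, .05); y = x^3

# Partition the (x, y) joint distribution at orders 1 through 4, plotting each
for (i in 1:4) {NNS.part(x, y, order = i, Voronoi = TRUE)}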
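To make the description above concrete, here is a simplified base-R sketch of recursive partial-moment quadrant partitioning, followed by the quadrant means that serve as regression points. It is not the NNS implementation: the helper `quadrant_partition` is hypothetical, it splits each cell at the means of \(x\) and \(y\), and a fixed `order` stands in for the NNS limit condition. Quadrant 1 (both above the means) and quadrant 4 (both at or below) mirror the q1111 / q4444 IDs in the NNS.reg output further down; the placement of labels 2 and 3 is an assumption of this sketch.

```r
# Hypothetical helper, NOT part of NNS: a simplified sketch of recursive
# quadrant partitioning around the (conditional) means of x and y.
quadrant_partition <- function(x, y, order = 2) {
  ids <- rep("q", length(x))
  for (level in seq_len(order)) {
    new_ids <- ids
    for (id in unique(ids)) {
      idx <- which(ids == id)
      mx <- mean(x[idx])
      my <- mean(y[idx])
      # "1": both above the cell means, "4": both at or below (2/3 assumed)
      quad <- ifelse(x[idx] > mx & y[idx] > my, "1",
              ifelse(x[idx] <= mx & y[idx] > my, "2",
              ifelse(x[idx] > mx & y[idx] <= my, "3", "4")))
      new_ids[idx] <- paste0(id, quad)
    }
    ids <- new_ids
  }
  ids
}

x <- seq(-5, 5, 0.05); y <- x^3
ids <- quadrant_partition(x, y, order = 2)

# Regression points taken as the (x, y) means of each final quadrant
reg_points <- aggregate(cbind(x, y) ~ ids, FUN = mean)
reg_points[order(reg_points$x), ]
```

Splitting each cell at its own means is what makes the scheme hierarchical: every label extends its parent's label by one digit.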

X-only Partitioning

NNS.part offers a partitioning based on \(x\) values only, using the entire bandwidth in its regression point derivation, and shares the same limit condition as partitioning via both \(x\) and \(y\) values.

# Partition on x values only, at orders 1 through 4
for (i in 1:4) {NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}
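A correspondingly simplified sketch of partitioning on \(x\) only: each cell is split at its own mean \(x\) value, regardless of \(y\). The helper `xonly_partition` and its labels ("1" above the mean, "2" at or below) are hypothetical rather than the NNS labels, and the fixed `order` again stands in for the NNS limit condition.

```r
# Hypothetical helper, NOT part of NNS: split each cell at its mean x only.
xonly_partition <- function(x, order = 2) {
  ids <- rep("q", length(x))
  for (level in seq_len(order)) {
    new_ids <- ids
    for (id in unique(ids)) {
      idx <- which(ids == id)
      # Assumed labels: "1" above the cell's mean x, "2" at or below it
      new_ids[idx] <- paste0(id, ifelse(x[idx] > mean(x[idx]), "1", "2"))
    }
    ids <- new_ids
  }
  ids
}

x <- seq(-5, 5, 0.05); y <- x^3
ids <- xonly_partition(x, order = 2)
table(ids)                                # observations per x-only cell
aggregate(cbind(x, y) ~ ids, FUN = mean)  # cell means of x and y
```

Because \(y\) never enters the split, each level simply doubles the number of cells along the \(x\) axis.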

Clusters Used in Regression

The right column of plots shows the corresponding regression for each order of NNS partitioning.

# Plot each partition order next to the NNS.reg fit of the same order
for (i in 1:3) {NNS.part(x, y, order = i, Voronoi = TRUE); NNS.reg(x, y, order = i)}

NNS Regression: NNS.reg

NNS.reg can fit any \(f(x)\), in both the univariate and multivariate cases. NNS.reg returns a list of named values, shown below.

Univariate:

NNS.reg(x,y,order=4,noise.reduction = 'off')

## $R2
## [1] 0.9998899
## 
## $SE
## [1] 0.7461974
## 
## $Prediction.Accuracy
## NULL
## 
## $equation
## NULL
## 
## $x.star
## NULL
## 
## $derivative
##     Coefficient X.Lower.Range X.Upper.Range
##  1:    67.09000        -5.000        -4.600
##  2:    58.87750        -4.600        -4.125
##  3:    43.66125        -4.125        -3.625
##  4:    34.04250        -3.625        -3.000
##  5:    24.00250        -3.000        -2.650
##  6:    15.96250        -2.650        -2.025
##  7:     9.48250        -2.025        -1.400
##  8:     2.92000        -1.400        -0.600
##  9:     0.78250        -0.600         0.650
## 10:     3.09250         0.650         1.425
## 11:     9.84250         1.425         2.050
## 12:    16.44250         2.050         2.700
## 13:    24.56250         2.700         3.025
## 14:    34.72250         3.025         3.650
## 15:    44.05000         3.650         4.150
## 16:    59.31250         4.150         4.600
## 17:    67.09000         4.600         5.000
## 
## $Point
## NULL
## 
## $Point.est
## NULL
## 
## $regression.points
##          x           y
##  1: -5.000 -125.000000
##  2: -4.600  -98.164000
##  3: -4.125  -70.197187
##  4: -3.625  -48.366563
##  5: -3.000  -27.090000
##  6: -2.650  -18.689125
##  7: -2.025   -8.712562
##  8: -1.400   -2.786000
##  9: -0.600   -0.450000
## 10:  0.650    0.528125
## 11:  1.425    2.924813
## 12:  2.050    9.076375
## 13:  2.700   19.764000
## 14:  3.025   27.746813
## 15:  3.650   49.448375
## 16:  4.150   71.473375
## 17:  4.600   98.164000
## 18:  5.000  125.000000
## 
## $Fitted
##          y.hat
##   1: -125.0000
##   2: -121.6455
##   3: -118.2910
##   4: -114.9365
##   5: -111.5820
##  ---          
## 197:  111.5820
## 198:  114.9365
## 199:  118.2910
## 200:  121.6455
## 201:  125.0000
## 
## $Fitted.xy
##          x         y     y.hat NNS.ID
##   1: -5.00 -125.0000 -125.0000  q4444
##   2: -4.95 -121.2874 -121.6455  q4444
##   3: -4.90 -117.6490 -118.2910  q4444
##   4: -4.85 -114.0841 -114.9365  q4444
##   5: -4.80 -110.5920 -111.5820  q4444
##  ---                                 
## 197:  4.80  110.5920  111.5820  q1111
## 198:  4.85  114.0841  114.9365  q1111
## 199:  4.90  117.6490  118.2910  q1111
## 200:  4.95  121.2874  121.6455  q1111
## 201:  5.00  125.0000  125.0000  q1111
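Reading the output above, the $Fitted values appear to lie on straight line segments between consecutive $regression.points, and the $derivative coefficients match the slopes of those segments. The base-R snippet below checks this reading against the first three printed regression points; it only reproduces the printed numbers and is not taken from the NNS source.

```r
# First three regression points, copied from $regression.points above
rp_x <- c(-5.000, -4.600, -4.125)
rp_y <- c(-125.000000, -98.164000, -70.197187)

# Slopes between consecutive regression points (compare $derivative rows 1-2)
slopes <- diff(rp_y) / diff(rp_x)
print(slopes)

# Linear interpolation at the next x values (compare $Fitted rows 2-5)
y_hat <- approx(rp_x, rp_y, xout = c(-4.95, -4.90, -4.85, -4.80))$y
print(y_hat)
```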

Multivariate:

Multivariate regressions return a plot of \(y\) and \(\hat{y}\).

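# A two-variable test surface, evaluated over every (x, y) pair on the grid
f = function(x, y) x^3 + 3*y - y^3 - 3*x
y = x; z = expand.grid(x, y)
g = f(z[, 1], z[, 2])
# Fit the surface with the maximum partitioning order
NNS.reg(z, g, order = 'max')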

Inter/Extrapolation

NNS.reg can inter- or extrapolate any point of interest. The point.est parameter, NNS.reg(x,y,point.est=...), accepts data of any size with the same dimensions as \(x\) (i.e., the same number of regressors), and the estimates are returned in $Point.est.
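A minimal base-R sketch of inter- and extrapolation from a set of regression points, assuming linear interpolation inside their range and extension of the first or last segment outside it. The helper `predict_segments` and the toy points are hypothetical, and the extrapolation rule is an assumption of this sketch; NNS.reg's own rule may differ.

```r
# Hypothetical helper, NOT the NNS.reg implementation
predict_segments <- function(rp_x, rp_y, new_x) {
  ord <- order(rp_x)
  rp_x <- rp_x[ord]; rp_y <- rp_y[ord]
  n <- length(rp_x)
  # Interpolate inside the range (approx returns NA outside it)
  out <- approx(rp_x, rp_y, xout = new_x)$y
  # Extrapolate by extending the first and last segments
  lo_slope <- (rp_y[2] - rp_y[1]) / (rp_x[2] - rp_x[1])
  hi_slope <- (rp_y[n] - rp_y[n - 1]) / (rp_x[n] - rp_x[n - 1])
  below <- new_x < rp_x[1]
  above <- new_x > rp_x[n]
  out[below] <- rp_y[1] + lo_slope * (new_x[below] - rp_x[1])
  out[above] <- rp_y[n] + hi_slope * (new_x[above] - rp_x[n])
  out
}

# Toy regression points (hypothetical, not NNS output)
predict_segments(c(0, 1, 2), c(0, 1, 4), new_x = c(-1, 0.5, 1.5, 3))
# returns c(-1, 0.5, 2.5, 7)
```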

Classification

For a classification problem, we simply set NNS.reg(x,y,type="CLASS",...):

NNS.reg(iris[,1:4],iris[,5],point.est=iris[1:10,1:4],type="CLASS")$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1
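The predictions are numeric class codes. Assuming they index the factor levels of iris$Species (the output is consistent with this, since rows 1 to 10 are all setosa, level 1), they can be mapped back to labels and compared with the true classes in base R:

```r
# Predicted class codes, as printed above
pred <- rep(1, 10)

# Assumption: the codes index the factor levels of the response
pred_labels <- levels(iris$Species)[pred]
print(pred_labels)

# Share of the ten rows classified correctly
mean(pred_labels == as.character(iris$Species[1:10]))
```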

NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression via the parameter dim.red.method, as in NNS.reg(x,y,dim.red.method="cor",...). All regressors are reduced to a single dimension using the returned equation, $equation.

NNS.reg(iris[,1:4],iris[,5],dim.red.method="cor")$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width  -0.4266576
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
## 5:  DENOMINATOR   4.0000000

Thus, our model for this regression would be: \[\text{Species} = \frac{0.7825612\,\text{Sepal.Length} - 0.4266576\,\text{Sepal.Width} + 0.9490347\,\text{Petal.Length} + 0.9565473\,\text{Petal.Width}}{4}\]
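The printed coefficients appear to be Pearson correlations between each regressor and Species coded as 1, 2, 3; this is our reading, and the snippet below checks it against the table. It then applies the returned equation in base R to produce the single reduced regressor. Variable names such as `reduced` are illustrative.

```r
# Coefficients printed by NNS.reg(..., dim.red.method = "cor")$equation above
coefs <- c(Sepal.Length = 0.7825612, Sepal.Width = -0.4266576,
           Petal.Length = 0.9490347, Petal.Width = 0.9565473)

# Check: Pearson correlation of each regressor with Species coded 1, 2, 3
species_num <- as.numeric(iris$Species)
pearson <- sapply(iris[, 1:4], function(col) cor(col, species_num))
print(round(pearson, 7))

# Apply the equation: weighted sum of regressors divided by DENOMINATOR (4)
reduced <- as.vector(as.matrix(iris[, 1:4]) %*% coefs) / 4
head(reduced)
```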

Threshold

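The threshold parameter, NNS.reg(x,y,dim.red.method="cor",threshold=...), reduces the regressors further by setting the minimum absolute correlation a regressor must have to be retained; regressors below the threshold receive a coefficient of zero, as Sepal.Width does below.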

NNS.reg(iris[,1:4],iris[,5],dim.red.method="cor",threshold=.75)$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width   0.0000000
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
## 5:  DENOMINATOR   3.0000000

Thus, our model for this further reduced dimension regression would be: \[\text{Species} = \frac{0.7825612\,\text{Sepal.Length} + 0\,\text{Sepal.Width} + 0.9490347\,\text{Petal.Length} + 0.9565473\,\text{Petal.Width}}{3}\]

The point.est=(...) parameter operates in the same manner as in the full regression above, with results again returned in $Point.est.

NNS.reg(iris[,1:4],iris[,5],dim.red.method="cor",threshold=.75,point.est=iris[1:10,1:4])$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1
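The threshold rule described above can be sketched in base R by zeroing every coefficient whose absolute value falls below the threshold and dividing by the number of regressors that remain. Whether NNS treats a correlation exactly equal to the threshold as retained is not stated here, so the `>=` comparison is an assumption; the result reproduces the printed table with a DENOMINATOR of 3.

```r
# Coefficients printed by the full dim.red.method = "cor" regression above
coefs <- c(Sepal.Length = 0.7825612, Sepal.Width = -0.4266576,
           Petal.Length = 0.9490347, Petal.Width = 0.9565473)
threshold <- 0.75

# Assumed rule: keep a regressor only if |correlation| >= threshold
kept <- ifelse(abs(coefs) >= threshold, coefs, 0)
denominator <- sum(kept != 0)
print(kept)
print(denominator)

# Reduced single-dimension variable for the first ten rows
reduced <- as.vector(as.matrix(iris[1:10, 1:4]) %*% kept) / denominator
print(reduced)
```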

References

For readers so motivated, detailed arguments and further examples are provided in the following:

* Nonlinear Nonparametric Statistics: Using Partial Moments
* Deriving Nonlinear Correlation Coefficients from Partial Moments
* New Nonparametric Curve-Fitting Using Partitioning, Regression and Partial Derivative Estimation
* Clustering and Curve Fitting by Line Segments
* Classification Using NNS Clustering Analysis