netCoin package

Modesto Escobar, David Barrios, Carlos Prieto and Luis Martinez-Uribe (University of Salamanca)

2018-01-29

netCoin creates interactive networked graphs of coincidences within the data. It brings together the data analysis capabilities of R with powerful interactive visualization JavaScript libraries to provide a package to study coincidences.

This vignette briefly describes the statistical methods and provides a few examples of how to use the package. The sections are structured as follows:

  1. Definitions
  2. Introduction and examples

Definitions

Coincidence analysis is a set of techniques to detect events, characters, objects, attributes, or characteristics that tend to occur together within certain delimited spaces.

These spaces are called scenarios (\(S\)) and are considered to be the units of analysis; as such, they have to be placed in the rows of a matrix or data.frame.

In each scenario \(i\), each of a series of \(J\) events \(X_j\) may occur (1) or may not occur (0). The events are represented as dichotomous variables \(X_{j}\) in the columns. Scenarios and events together constitute an incidence matrix \(\mathbf{I}\). \[\mathbf{I}= \begin{pmatrix} 0&1&0&...&1 \\ 1&0&1&...&0 \\ ...&...&...&...&... \\ 1&1&0&...&0 \end{pmatrix}\]

From this incidence matrix, a symmetric coincidence matrix (\(\mathbf{C}\)) can be obtained with the coin function. In this matrix the main diagonal holds the frequencies of each \(X_j\), while the other elements are the numbers of coincidences between two events.

\[\mathbf{C}= \begin{pmatrix} 2&1&1&...&1 \\ 1&2&0&...&2 \\ 1&0&1&...&0 \\ ...&...&...&...&... \\ 1&2&0&...&2 \end{pmatrix}\]
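
As a minimal sketch of this relationship, written in base R and not taken from the package's own implementation, the coincidence matrix of a small incidence matrix can be computed as the cross-product \(\mathbf{I}^\top\mathbf{I}\). The toy matrix, its scenario labels s1 to s4 and its event names A, B and C are hypothetical.

```r
# Toy incidence matrix: 4 scenarios (rows) by 3 hypothetical events (columns).
inc <- matrix(c(0, 1, 0,
                1, 0, 1,
                1, 1, 0,
                1, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("s", 1:4), c("A", "B", "C")))

# Cross-product t(inc) %*% inc: the diagonal holds the event frequencies,
# the off-diagonal cells count the scenarios in which both events occur.
C_toy <- crossprod(inc)
C_toy
```

Here A and B each occur in three scenarios and coincide in two of them, so the matrix has 3 on those diagonal cells and 2 in the A/B cell.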

Once there is a coin object, a similarity matrix can be obtained. Similarity matrices available in netCoin are:

In addition to the previous, other measures that can be obtained from coin are:

The function sim returns a list of similarity and other measurement matrices. For example, the Haberman matrix for four events of the dice data is:
Haberman         odd       even      small      large
odd        10.000000 -10.000000   4.766506  -4.766506
even      -10.000000  10.000000  -4.766506   4.766506
small       4.766506  -4.766506  10.000000 -10.000000
large      -4.766506   4.766506 -10.000000  10.000000
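
The values in this table are consistent with Haberman's adjusted residual for a 2x2 table, computed from the two event frequencies and their coincidence count. The following base R sketch reproduces the odd/small entry from the dice counts reported in the coincidence matrix later in this vignette (n = 100, 54 odd, 54 small, 41 both). The helper name haberman is hypothetical and this is not the package's internal code.

```r
# Hypothetical helper: Haberman's adjusted residual for a pair of events.
# n_ij = coincidences, n_i and n_j = event frequencies, n = number of scenarios.
haberman <- function(n_ij, n_i, n_j, n) {
  expected <- n_i * n_j / n                      # coincidences expected under independence
  (n_ij - expected) / sqrt(expected * (1 - n_i / n) * (1 - n_j / n))
}

h_odd_small <- haberman(n_ij = 41, n_i = 54, n_j = 54, n = 100)
round(h_odd_small, 6)   # 4.766506
```

The same helper gives -10 for odd/even (0 coincidences, frequencies 54 and 46), matching the corresponding cell of the table.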

The function edgeList generates a collection of edges, each with a list of similarity measures, for every pair of events that meets a criterion (generally p(Z)<.50).

Source  Target  Haberman  P(z)
odd     small   4.766506  3.18645e-06
even    large   4.766506  3.18645e-06
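
A rough base R sketch of this filtering step, which is not the edgeList implementation: starting from the Haberman matrix of the four dice events, visit each pair of events once and keep the pairs with a positive residual. It assumes that a one-sided p(Z)<.50 is equivalent to a residual above zero, which holds when the reference distribution is symmetric around zero. The object names are hypothetical and the p-value column is not reproduced.

```r
events <- c("odd", "even", "small", "large")
hab <- matrix(c( 10.000000, -10.000000,   4.766506,  -4.766506,
                -10.000000,  10.000000,  -4.766506,   4.766506,
                  4.766506,  -4.766506,  10.000000, -10.000000,
                 -4.766506,   4.766506, -10.000000,  10.000000),
              nrow = 4, byrow = TRUE, dimnames = list(events, events))

# Visit each unordered pair once (upper triangle, diagonal excluded).
idx <- which(upper.tri(hab), arr.ind = TRUE)
edges <- data.frame(Source   = rownames(hab)[idx[, 1]],
                    Target   = colnames(hab)[idx[, 2]],
                    Haberman = hab[idx],
                    stringsAsFactors = FALSE)

# Assumed criterion: residual > 0, i.e. more coincidences than expected.
edges <- edges[edges$Haberman > 0, ]
edges
```

Only the odd/small and even/large pairs survive the filter, which agrees with the two edges listed above.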

In order to make a graph, two data frames are needed: a node data frame with names and other node attributes (see asNodes) and an edge data frame (see edgeList). For more information, see netCoin.

Introduction and examples

Package installation and loading

To install and load the updated version of the netCoin package simply run the following commands:

install.packages("netCoin")
library(netCoin)

Basic coincidence analysis with dice roll data

Once the netCoin package has been installed and loaded, let's load the dice data and have a look at it:

data(dice)
head(dice)
##    dice 1 2 3 4 5 6 odd even small large
## V1    1 1 0 0 0 0 0   1    0     1     0
## V2    2 0 1 0 0 0 0   0    1     1     0
## V3    5 0 0 0 0 1 0   1    0     0     1
## V4    4 0 0 0 1 0 0   0    1     0     1
## V5    2 0 1 0 0 0 0   0    1     1     0
## V6    5 0 0 0 0 1 0   1    0     0     1

It contains the results of rolling a die 100 times. Each roll is a scenario. The events are the possible results, i.e. each of the numbers from 1 to 6, as well as odd or even and small (<4) or large (>3). Thus the first column contains the numeric result, and the following six columns represent each possible outcome of the roll with 1's and 0's. Finally, the last four columns also contain 0's and 1's indicating whether the result is odd or even, small or large.

Columns 2 to 11 can be considered the incidence matrix \(\mathbf{I}\):

head(dice[,-1])
##    1 2 3 4 5 6 odd even small large
## V1 1 0 0 0 0 0   1    0     1     0
## V2 0 1 0 0 0 0   0    1     1     0
## V3 0 0 0 0 1 0   1    0     0     1
## V4 0 0 0 1 0 0   0    1     0     1
## V5 0 1 0 0 0 0   0    1     1     0
## V6 0 0 0 0 1 0   1    0     0     1
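
To make this layout concrete, here is a base R sketch that builds a data frame with the same columns from simulated rolls. It is an illustration only: the object names and the seed are hypothetical, and the simulated values will differ from the dice data set shipped with the package.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
rolls <- sample(1:6, 100, replace = TRUE)    # 100 simulated rolls (scenarios)

# One indicator column per face: 1 if that face came up, 0 otherwise.
faces <- sapply(1:6, function(k) as.integer(rolls == k))
colnames(faces) <- 1:6

sim_dice <- data.frame(dice = rolls, faces, check.names = FALSE)
sim_dice$odd   <- as.integer(rolls %% 2 == 1)
sim_dice$even  <- as.integer(rolls %% 2 == 0)
sim_dice$small <- as.integer(rolls < 4)
sim_dice$large <- as.integer(rolls > 3)

head(sim_dice)
```

Every row has exactly one 1 among the six face columns, one among odd and even, and one among small and large, which is the structure the incidence matrix relies on.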

Using the coin function the coincidence matrix \(\mathbf{C}\) can be obtained:

C <- coin(dice[,-1]) # coincidence matrix
C
## n= 100
##        1  2  3  4  5  6 odd even small large
## 1     15                                    
## 2      0 13                                 
## 3      0  0 26                              
## 4      0  0  0 18                           
## 5      0  0  0  0 13                        
## 6      0  0  0  0  0 15                     
## odd   15  0 26  0 13  0  54                 
## even   0 13  0 18  0 15   0   46            
## small 15 13 26  0  0  0  41   13    54      
## large  0  0  0 18 13 15  13   33     0    46

The nodes and edges can be calculated from the coincidence matrix \(\mathbf{C}\), and the network object can then be generated from them:

N <- asNodes(C)      # node data frame
E <- edgeList(C)     # edge data frame
Net <- netCoin(N,E)  # network object

The network to be visualised is created with the following command, which generates a folder (here named dice) containing an index.html file. Opening that file in a browser displays the interface shown below:

Net <- netCoin(N,E,dir="dice")

Multigraph coincidence analysis with data of families of Renaissance Italy

The following example uses data about families of Renaissance Italy from Padgett & Ansell (1983). It consists of a data frame (families) with information about Italian families of the Renaissance, and another data frame (links) with the marriage and business links between families.

data("families")
data("links")

The coin, edgeList, asNodes and netCoin functions used above can be run in a single step with the allNet function, in which several parameters can be specified.

The following commands generate two networks that represent the marriage and business links between the families.

# Marriage links only; column 17 is dropped from the incidence data
G <- allNet(incidence=links[links$link=="Marriage",-17],
     nodes=families, layout="md",
     criteria="f",minL=1, size="frequency",color="seat",
     main="Marriage Links between Italian families",
     note="Data source: Padgett & Ansell (1983)")
# Business links only, built with the same settings
H <- allNet(incidence=links[links$link=="Business",-17],
     nodes=families, layout="md",
     criteria="f",minL=1, size="frequency",color="seat",
     main="Business Links between Italian families",
     note="Data source: Padgett & Ansell (1983)")

Once the two networks are ready, the function multigraphCreate writes both graphs to the specified directory.

multigraphCreate(Marriage=G,Business=H,dir="italian")

netCoin applied to Sanderson's analysis of species co-occurrences

This section uses one of the best-known data sets in ecology. Charles Darwin compiled data about 13 species of finches and where they could be found on 17 of the Galapagos Islands. Sanderson ….

Here we add a few extra features to our graph:

data("Galapagos")
data("finches")
# copy path to the species field
finches$species <- paste(system.file("doc/sanderson", package="netCoin"),
        "/images/", finches$species, sep="")
Net <- allNet(Galapagos, nodes=finches, criteria="hyp", maxL=.05,
        lwidth="Haberman", lweight="Haberman",
        size="frequency", image="species", layout="mds",
        main="Species coincidences in Galapagos Islands",
        note="Data source: Sanderson (2000)")
plot(Net)