# GENEAclassify

## Overview

GENEActiv is the original wrist-worn, raw data accelerometer for objective behavioural measurement. It is the perfect tool for analysing free-living human behaviour, studying the impact of physical activity on health and understanding lifestyle. The device is an ergonomic body worn instrument:

• waterproof,
• robust to moderate impacts,
• contains a precision real-time clock,
• runs from a long-lasting, rechargeable battery,
• storage for 500 MB of binary data.

The package GENEAread provides data import functionality, giving researchers access to cutting edge analytical tools from the R environment. Imported data can be summarised by a segmentation process which cuts the dataset into time periods of characteristically similar behaviour and calculates a wide range of features for each event. The activities in each segment can be evaluated by an rpart GENEA classification tree. A sample rpart GENEA classification tree, trainingFit, is provided with GENEAclassify. This package provides classification tools, allowing researchers to segment training data and create custom classification trees. For best results, you will need to collect some training data for the activities that you expect your users to perform, label the appropriate segments, and create a new classification tree. Training data is data captured by the GENEActiv accelerometer during expected behaviours of your study participants, such as sleeping, sitting or running. To train the classification tree, ask a sample of your participants to wear the accelerometer and perform specific activities. These can be used to classify field data into behaviours of interest, to automatically process raw output into complete diary histories.

## Summary

There are multiple ways in which GENEAclassify can be used to understand your GENEActiv data. The analysis flow is typically:

• import GENEActiv bin file training data,
• segment and summarize training data,
• manually classify training data segments,
• creating an rpart GENEA fit from training data,
• import GENEActiv bin file test data,
• segment and summarize test data,
• apply rpart GENEA fit to segmented test data.

# Contents

1. Introduction and Installation.
1.    Preface
2.   Installing R
3.  Using GENEAclassifiyDemonstration.R 
4.   Installing and loading required libraries
5.    Installing GENEAclassify
6.   Development of GENEAclassify on GitHub
2. Segmentation
1.    Introduction
2.   Loading Data
3.  Segmenting Data
4.   Segmentation Variables, Functions and Features
5.    Varying Step Counting Algorithms
3. Applying a Classification Model
1.    Introduction
2.   Creating a classification model form Training Data
3.  Classifying a file
4.   Classifying a directory
4. Creating a Classification Model
1.    Introduction
2.   Manually Classifying files
3.  Creating a Training Data set

# 1. Introduction and Installation.

## i. Preface

This pdf file will give an introduction to using the programming language R with the package GENEAclassify which has been provided in a zip folder. The following steps will provide the user with the tools to use the package before running through the script. Please ensure that the folder has been decompressed. The folder found from the Dropbox link should contain the following:

• GENEAclassify_1.5.1.tar.gz
• GENEAclassifyDemonstration.R
• TrainingData (folder containing sample training data)
• TrainingData.csv (A larger training data set)
• RunWalk.bin (A sample .bin file)

## ii. Installing R.

To begin with install R from https://www.r-project.org. There is an introduction to the R environment here https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf that would familiarize a user. We would also recommend downloading the IDE (integrated development environment) RStudio from https://www.rstudio.com/products/rstudio/ after you have installed R. RStudio provides the user with more than the console to work with and gives the option of having a script, console, view of the R environment and file locations in one window. There is a list of tips here on using RStudio here https://www.rstudio.com/resources/cheatsheets/.

Ctrl-R or Cmd-Ent runs the line that the cursor is on or you can simply copy and paste the line of code into the console

Note: (You will also need to install x11 forward https://www.xquartz.org/ to run on OS.)

## iii. Using GENEAclassifiyDemonstration.R

Throughout this tutorial, commands are shown and briefly explained which are to be entered into the console. If you open the script GENEAclassifyDemostration.R (which is in the zip folder) you will find a detailed and commented script that you can work through, running each line at a time and making appropriate changes to get the results desired. This pdf runs through that script giving further explanation. Please remember that R is a case sensitive language.

##### The script provided will run through these steps:
2. Installing GENEAclassify
3. Loading in a data file/directory to segment
5. Creating the classification model from the Training Data
6. Classifying a file
7. Classifying a directory
8. Setting up the step counting algorithm
9. Varying Step Counting algorithms
10. Manually Classifying files
11. Creating a Training data set

The code shown in this PDF can also be copied and pasted into the console.

## iv. Installing and loading required libraries

install.packages("GENEAread", repos = "http://cran.us.r-project.org")
install.packages("changepoint", repos = "http://cran.us.r-project.org")
install.packages("signal", repos = "http://cran.us.r-project.org")
install.packages("mmap", repos = "http://cran.us.r-project.org")

# Load in the libraries
library(changepoint)
library(signal)
library(mmap)

## v. Installing GENEAclassify

Whilst GENEAclassify is in development, the easiest way to install the package is to use the Tar.gz file inside the zip folder. By running the code below GENEAclassify can be installed:

# You will need to change the folder location inside setwd("") to the directory where you saved the tar.gz file
# Note that R only uses / not \ when refering to a file/directory location
setwd("/Users/owner/Documents/GENEActiv")
install.packages("GENEAclassify_1.5.1.tar.gz", repos=NULL, type="source")

Once the package has been installed load in the library:

library(GENEAclassify)

## vi. Development of GENEAclassify on GitHub.

If you intend on working with the development of the package then we suggest setting up an account on GitHub here https://github.com/. RStudio can directly link to the repository for the development of the package by selecting to set-up a new project from the top right hand corner, selecting version control and cloning the GitHub repository.

This guide on using RStudio with GitHub is helpful https://www.r-bloggers.com/2015/07/rstudio-and-github/.

Once GitHub has been set-up we would recommend creating a personal branch for contributions which can be assessed and discussed by Activinsights before adding any changes to the master repository.

To use GitHub for development on windows, R tools will have to be downloaded from this link:

and a latex compiler found here:

For OS, xcode developer tools will have to be downloaded from this link:

and a latex compiler found here:

The package can also be installed using a GitHub authentication key which will go in the “” of auth_token. The key will be provided on request. The package devtools is also required to install from GitHub:

install.packages("devtools",repos = "http://cran.us.r-project.org")
library(devtools)

install_github("https://github.com/Langford/GENEAclassify_1.41.git",
auth_token = "7f0051aaca453eaabf0e60d49bcf752c0fea0668")

Again loading in the package to the work space:

library(GENEAclassify)

This vignette can be viewed from inside R by running the following code:

vignette("GENEAclassifyDemo", package = NULL, lib.loc = NULL, all = TRUE)

The pdf will appear on the right of RStudio or as a pop up if called from R.

# 2. Segmentation

## i. Introduction

The segmentation process outputs event based data from a change point analysis. The function determines when the statistical properties of the data have changed and hence the observed behaviour has also changed. This following section gives demonstrations on how this works given the GENEActiv .bin input data.

Now that we have the libraries required to segment and classify files/directories the data needs to be imported. Beginning with a file to import run the following lines of code:

 # Name of the file to analyse
DataFile = "DataDirectory/jl_left wrist_010094_2012-01-30 20-39-54.bin"
ImportedData = dataImport(DataFile, downsample = 100, start = 0, end = 0.1)
head(ImportData)

The start and end times can be set using values between 0 and 1 or using a 24 hour character string (time inside ““). The former divides the file into sections specified. For example if you have 10 days of data this might be useful. A 24 hour character string e.g start =”1 3:00”,end = “2 3:00”.The 1 represents the day and the time uses a 24 hour format. Ensure you leave a space between the days and the time.

The parameter ‘Use.Timestamps’ can be set to TRUE to use timestamps as the start and end times within dataImport. This parameter can be used in getGENEAsegments and classifyGENEA.

The output from the command head(ImportData) shows the variables calculated from importing the data.

The variable Downsample gives the user the option to compress the data to make the process less computationally heavy. This has a default value of 100 but can be made smaller to allow a higher resolution, although this will take longer to run.

## iii. Segmenting Data

The segmentation function in GENEAclassify works by finding changepoints in one or two selected streams on data by identifying differences in mean or variances across varying segment durations.

After loading this data, the segmentation can be applied. There are a number of methods for change point analysis within the package and the variable changepoint controls which analysis to perform. Some of these methods combine two methods and the analysis uses the function cpt.mean, cpt.var and cpt.meanvar from the package changepoint on both datasets before merging the two.

• “UpDownDegrees” will perform a change point analysis based on the variance of arm elevation and wrist rotation. This is the default analysis and is best for detecting posture change.
• “TempFreq” uses the variance of Temperature and Frequency. This analysis is better for determining changes during sleep but is computationally expensive ude to the use of frequency features.
• “UpDownFreq” will perform a change point analysis based on the variance of arm elevation and variance of frequency of the magnitude.
• “UpDownMean” will perform a single change point analysis of the mean position of the arm elevation.
• “UpDownVar” will perform a single change point analysis of the variance position of the arm elevation.
• “UpDownMeanVar” will perform a single change point analysis of the mean and variance of the arm elevation.
• “DegreesMean” will perform a single change point analysis of the mean position of the wrist rotation.
• “DegreesVar” will perform a single change point analysis of the variance position of the wrist rotation.
• “DegreesMeanVar” will perform a single change point analysis of the mean and variance of the wrist rotation.
• “UpDownMeanVarDegreesMeanVar” combines the meanvar change points of UpDown and Degrees. This is the default analysis and is best for detecting posture change.
• “UpDownMeanVarMagMeanVar” combines the meanvar change points of UpDown and magnitude.

## iv. Segmentation Variables, Functions and Features

Once a segment has been identified, the variables can be summarised using different functions to create features. For example, the mean of UpDown variable can be found and reported as a single numeric to become a feature of the segment. Within segmentation, the dataCols variable is a character vector that specifies what summary features are to be output for each segment. The format of each individual element of this vector has to be the variable name followed by the function applied to the variable. For example “UpDown.mean” will output the mean of the UpDown variable for each segment.

Variables that can be assessed with functions include: - UpDown (arm elevation) - Degrees (wrist rotation) - Magnitude (vector magnitude of acceleration) - Principal.Frequency (frequency domain analysis of acceleration) - Light (light meter) - Temp (temperature sensor) - Step (step internal counter) - Radians (If Radians = TRUE is selected)

These variables can be assessed with a range of standard R functions, and typical examples include: - mean - var - sd - max

However, any function that is loaded into the environment of R when using GENEAclassify can be used if it accepts a vector input and returns a single numeric. Functions returning other objects will cause an error.

Inside GENEAclassify there is also a range of custom functions: - GENEAratio (calculates the ratio of signal energies around a defined frequency) - GENEAskew (skewness, a measure of centredness) - sumdiff (finds the sum of the differences between samples) - meandiff (finds the mean of the differences between samples) - abssumdiff (finds the absolute sum of the differences between samples) - sddiff (finds the standard deviation of the differences between samples) - MeanDir (circular mean direction for radians) - CirVar (circular variance for radians) - CirSD (circular sd for radians) - CirDisp (circular dispersion for radians) - CirSkew (circular skewness for radians) - CirKurt (circular kurtosis for radians) - impact (calculates the proportion of samples above a defined absolute magnitude)

To find more information on these functions use the ? before the function in question. For example ?GENEAskew will provide details on that function in the help window of RStudio or as a pop-up.

The output of the function is created by taking raw data and returning calculated variables. These variables can be viewed using the function head:

# These are some of the output variables from segmentation and getGENEAsegments
dataCols <- c("UpDown.mean",
"UpDown.var",
"UpDown.sd",
"Degrees.mean",
"Degrees.var",
"Degrees.sd",
"Magnitude.mean",
# Frequency Variables
"Principal.Frequency.median",
"Principal.Frequency.GENEAratio",
"Principal.Frequency.sumdiff",
"Principal.Frequency.meandiff",
"Principal.Frequency.abssumdiff",
"Principal.Frequency.sddiff",
# Light Variables
"Light.mean",
"Light.max",
# Temperature Variables
"Temp.mean",
"Temp.sumdiff",
"Temp.meandiff",
"Temp.abssumdiff",
"Temp.sddiff",
# Step Variables
"Step.GENEAcount",
"Step.sd",
"Step.mean")

# Performing the segmentation now given the dataCols we want to find.

SegDataFile = segmentation(ImportedData, dataCols)
# View the data from the segmentation
head(SegDataFile)

getGENEAsegments combines the functions dataImport and segmentation:

 # Name of the file to analyse
DataFile = "DataDirectory/jl_left wrist_010094_2012-01-30 20-39-54.bin"
SegDataFile = getGENEAsegments(DataFile, dataCols, start = 0, end = 0.1)

## v. Varying Step Counting Algorithms

The segmentation function also applies a default step counting algorithm when no arguments are passed through the function. The step counting algorithm works by taking the y axis, filtering the signal with a chebyshev filter, applying a hysteresis threshold where the zero crossing are counted over a given window.

There are then 4 separate variables, shown with their defaults that can be changed in the Step Counter function:

• filterorder = 2
• boundaries = c(0.5, 5)
• Rp = 3
• hysteresis = 0.05

The filter order, boundaries and Rp are found in the cheby1 function from the package signal and are applied to the acceleration signal before a hysteresis is a applied.

To view all of the arguments that can be passed to the function stepCounter inside getGENEAsegments run the line ?stepCounter.

The following commands give examples from the training data provided:

WalkingData = "TrainingData/Walking/walking_jl_right wrist_024603_2015-12-12 15-36-47.bin"

# Starting with default filter
W1 = getGENEAsegments(WalkingData, plot.it = TRUE)

# plot.it Shows the crossing points. Turn this on for all plots to see how each filter works
# List the step outputs here.
W1$Step.GENEAcount; W1$Step.sd; W1$Step.mean W2 = getGENEAsegments(WalkingData, filteroder = 4) # Changing the filterorder changes the order of the chebyshev filter applied. W2$Step.GENEAcount; W2$Step.sd; W2$Step.mean

W3 = getGENEAsegments(WalkingData, boundaries = c(0.15, 1))
# List the step outputs here.
W3$Step.GENEAcount; W3$Step.sd; W3$Step.mean # Changing the deicbel paramter W4 = getGENEAsegments(WalkingData, Rp = 3) W4$Step.GENEAcount; W4$Step.sd; W4$Step.mean

# Increasing the hystersis
W5 = getGENEAsegments(WalkingData, hysteresis = 0.1)
SegData$Activity[3] = "Walking" ## iii. Creating a Training Data set A Training Data set that has been manually classified, can be used to create a Training model which can automatically classify files. To do this, the activities that are going to be identified must feature in the training model. Below is a demonstration of how to create a classification model by using the sample training data provided in the zip file. Running the following lines of code segments each of the .bin files in the sample training data. The second line manually classifies each of the activities which can be used to create the training model. The sample training data has been organised so that the .bin files in each sub folder only contain the activity named: Cycling = getGENEAsegments("TrainingData/Cycling") Cycling$Activity = "Cycling"

NonWear = getGENEAsegments("TrainingData/NonWear")
NonWear$Activity = "NonWear" onthego = getGENEAsegments("TrainingData/onthego") onthego$Activity = "onthego"

Running = getGENEAsegments("TrainingData/Running")
Running$Activity = "Running" Sitting = getGENEAsegments("TrainingData/Sitting") Sitting$Activity = "Sitting"

Sleep = getGENEAsegments("TrainingData/Sleep")
Sleep$Activity = "Sleep" Standing = getGENEAsegments("TrainingData/Standing") Standing$Activity = "Standing"

Swimming = getGENEAsegments("TrainingData/Swimming")
Swimming$Activity = "Swimming" Transport = getGENEAsegments("TrainingData/Transport") Transport$Activity = "Transport"

Walking = getGENEAsegments("TrainingData/Walking")
Walking$Activity = "Walking" Workingout = getGENEAsegments("TrainingData/Workingout") Workingout$Activity = "Workingout"

This provides the data required for the classification model. Combining all of these files together using the function rbind to form the training data:

TrainingData = rbind(Cycling,
NonWear,
onthego,
Running,
Sitting,
Sleep,
Standing,
Swimming,
Transport,
Walking,
Workingout)

Creating the classification model from this data using the commands from 3ii:

ClassificationModel = createGENEAmodel(TrainingData,
features = c("UpDown.mean",
"UpDown.sd","Degrees.mean",
"Degrees.sd","Magnitude.mean",
"Step.sd","Step.mean",
"Principal.Frequency.median",
"Principal.Frequency.mad"))