Use Case 09: Preparation of datasets for CRTspat

The starting point for the analyses is the co-ordinates for the spatial units that are to be randomized. These can be provided as lat-long coordinates, or as Cartesian point co-ordinates. Geolocated data from completed field studies or trials in progress can be read into CRTspat in the form of data frames, preferably with one record for each location in the trial. Pre-existing data from the field should be coded as follows:

Description of variable Variable name Type and values
x-coordinates of locations x Numeric
x-coordinates of locations y Numeric
cluster assignment cluster factor
arm assignment arm factor with levels “Control” and “Intervention”
buffer assignment buffer logical (TRUE for locations in the buffer)

Users might want to include an ID variable in the data frame if they intend to link the outputs back to other datasets. Other variables, including baseline data or trial outcomes may also be included.

Description of variable Default Variable name Type
Baseline numerator base_num Numeric
Baseline denominator base_denom Numeric
Numerator num Numeric
Denominator denom Numeric

Simulated co-ordinate sets can also be generated de novo (function CRTsp()) for use in methods development and testing.

To make it possible to share data files without compromising confidentiality the anonymize_site() function is provided. This removes absolute geolocations and applies a transformation to the coordinates to conserve distances between them but modifying the orientation.

The latlong_as_xy function is available to convert co-ordinates provided as decimal degrees into Cartesian co-ordinates with units of km with centroid (0,0). If the input co-ordinates are provided using a different projection then they must be converted externally to the package.