Input parameters:

**Y** A vector of the observed outcome variable.

`w`

`c`

`ci_appr`

- “matching”: Matching by GPS

- “weighting”: Weighting by GPS

`gps_density`

`use_cov_transform`

`transformers`

Available transformers:

- pow2: to the power of 2

- pow3: to the power of 3

`bin_seq`

`seq(min(w)+delta_n/2,max(w), by=delta_n)`

`exposure_trim_qtls`

`gps_trim_qtls`

`params`

`sl_lib`

`nthread`

`...`

Additional parameters, depending on `ci_appr`:

- if ci_appr = ‘matching’:
*dist_measure*: Distance measuring function. Available options:

- l1: Manhattan distance matching

*delta_n*: Caliper parameter.

*scale*: A scale parameter that controls the relative weight given to the distance measure of the exposure versus that of the GPS.

*covar_bl_method*: Covariate balance method. Available options:

- ‘absolute’

*covar_bl_trs*: Covariate balance threshold.

*covar_bl_trs_type*: Covariate balance type (mean, median, maximal).

*max_attempt*: Maximum number of attempts to satisfy covariate balance.

See create_matching() for more details about the parameters and default values.

- if ci_appr = ‘weighting’:
*covar_bl_method*: Covariate balance method.

*covar_bl_trs*: Covariate balance threshold.

*max_attempt*: Maximum number of attempts to satisfy covariate balance.
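To make the matching parameters more concrete, here is a minimal base-R sketch of how `delta_n`, `scale`, and an l1 distance could interact for a single exposure level. It is not the package's implementation (see create_matching() for that). The helper `match_one_level`, the toy data, the placeholder GPS values, the choice to apply `scale` to the GPS term and `1 - scale` to the exposure term, and the caliper rule `abs(w - w_level) <= delta_n` are all assumptions made for illustration.

```r
# Toy data: observed exposure and a stand-in GPS value per unit.
set.seed(1)
w   <- runif(50, min = 0, max = 10)
gps <- dnorm(w, mean = 5, sd = 2)   # placeholder, not an estimated GPS

delta_n <- 1     # caliper
scale   <- 0.5   # relative weight of the two distance terms

# Exposure grid built with the default bin_seq expression.
bin_seq <- seq(min(w) + delta_n / 2, max(w), by = delta_n)

# Hypothetical helper: for one exposure level and one target GPS value,
# return the index of the closest unit inside the caliper (NA if none).
match_one_level <- function(w_level, gps_target, w, gps, delta_n, scale) {
  in_caliper <- which(abs(w - w_level) <= delta_n)   # assumed caliper rule
  if (length(in_caliper) == 0) return(NA_integer_)
  # l1 (Manhattan) distance: weighted sum of absolute differences.
  d <- scale * abs(gps[in_caliper] - gps_target) +
    (1 - scale) * abs(w[in_caliper] - w_level)
  in_caliper[which.min(d)]
}

# One matched unit index per exposure level in bin_seq.
matched <- vapply(
  bin_seq,
  function(b) match_one_level(b, dnorm(b, mean = 5, sd = 2),
                              w, gps, delta_n, scale),
  integer(1)
)
```

With `scale = 1` only the GPS difference would matter in this sketch, and with `scale = 0` only the exposure difference.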

- Generating Pseudo Population

```
library(CausalGPS)

set.seed(422)
n <- 10000
mydata <- generate_syn_data(sample_size = n)

# Add two categorical covariates and treat cf5 as a factor.
year <- sample(x = c("2001", "2002", "2003", "2004", "2005"), size = n,
               replace = TRUE)
region <- sample(x = c("North", "South", "East", "West"), size = n,
                 replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)

# Build the pseudo population by matching on the GPS.
pseudo_pop <- generate_pseudo_pop(
  mydata[, c("id", "Y")],
  mydata[, c("id", "w")],
  mydata[, c("id", "cf1", "cf2", "cf3", "cf4",
             "cf5", "cf6", "year", "region")],
  ci_appr = "matching",
  gps_density = "kernel",
  use_cov_transform = TRUE,
  transformers = list("pow2", "pow3", "abs",
                      "scale"),
  exposure_trim_qtls = c(0.01, 0.99),
  sl_lib = c("m_xgboost"),
  covar_bl_method = "absolute",
  covar_bl_trs = 0.1,
  covar_bl_trs_type = "mean",
  max_attempt = 4,
  dist_measure = "l1",
  delta_n = 1,
  scale = 0.5,
  nthread = 1)

plot(pseudo_pop)
```
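The covariate balance settings in the call above (`covar_bl_method = "absolute"`, `covar_bl_trs = 0.1`, `covar_bl_trs_type = "mean"`) can be read as a threshold on a summary of absolute correlations between the exposure and each covariate. The following base-R sketch illustrates that idea on toy data. It is only an assumption-laden illustration: it uses plain unweighted correlations on numeric covariates, whereas the package evaluates balance on the pseudo population, and the variable names `abs_cor`, `balance_score`, and `pass` are made up here.

```r
set.seed(2)
n  <- 500
cf <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n), cf3 = rnorm(n))
w  <- 0.8 * cf$cf1 + rnorm(n)   # exposure depends on cf1 only

covar_bl_trs      <- 0.1
covar_bl_trs_type <- "mean"

# Absolute correlation between the exposure and each covariate.
abs_cor <- vapply(cf, function(x) abs(cor(w, x)), numeric(1))

# Summarize across covariates according to the balance type.
summary_fn <- switch(covar_bl_trs_type,
                     mean    = mean,
                     median  = median,
                     maximal = max)
balance_score <- summary_fn(abs_cor)

# The balance requirement is met when the summary is below the threshold.
# Here cf1 is correlated with w by construction, so the unadjusted data
# are expected to fail the check.
pass <- balance_score < covar_bl_trs
```

In this reading, `max_attempt` would bound how many times the procedure is repeated while `pass` remains `FALSE`.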

**matching_fn** is the Manhattan distance matching approach. For the prediction model, we use the SuperLearner package, which supports different machine learning methods and packages.

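`params` is a list of hyperparameters passed to the libraries named in `sl_lib`. Each entry in `params` carries the prefix of the library it belongs to: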

| Package name | `sl_lib` name | prefix | available hyperparameters |
|---|---|---|---|
| XGBoost | `m_xgboost` | `xgb_` | nrounds, eta, max_depth, min_child_weight |
| ranger | `m_ranger` | `rgr_` | num.trees, write.forest, replace, verbose, family |
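As a small illustration of the prefix convention in the table, the base-R sketch below splits a `params` list by prefix and strips the prefix from each name. The helper `pick_params` and the example values are hypothetical, and this is not a description of how the package handles `params` internally.

```r
params <- list(xgb_max_depth = c(3, 4, 5),
               xgb_nrounds   = c(10, 20, 30, 40),
               rgr_num.trees = c(100, 200))

# Hypothetical helper: keep entries with a given prefix and strip it.
pick_params <- function(params, prefix) {
  keep <- startsWith(names(params), prefix)
  out  <- params[keep]
  names(out) <- substring(names(out), nchar(prefix) + 1)
  out
}

xgb_params <- pick_params(params, "xgb_")   # entries meant for XGBoost
rgr_params <- pick_params(params, "rgr_")   # entries meant for ranger
```

The prefix is what lets hyperparameters for several libraries share one flat list without name clashes.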

**nthread** is the number of available threads (cores). XGBoost needs OpenMP installed on the system to parallelize the processing.

- Estimating GPS

```
# Hyperparameter names are the library prefix plus the name from the table.
data_with_gps <- estimate_gps(w,
                              c,
                              params = list(xgb_max_depth = c(3, 4, 5),
                                            xgb_nrounds = c(10, 20, 30, 40)),
                              nthread = 1,
                              sl_lib = c("m_xgboost")
                              )
```
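For readers unfamiliar with the quantity being estimated, the following base-R sketch shows one way a generalized propensity score can be computed: model the exposure given the covariates, then evaluate a density of the residuals at each unit. A plain linear model stands in here for the SuperLearner/XGBoost model, so this is not what `estimate_gps()` does internally. The toy data and all variable names (`c_df`, `e_w`, `res`, `gps_gaussian`, `gps_kernel`) are assumptions for illustration.

```r
set.seed(3)
n    <- 200
c_df <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n))
w    <- 1 + 2 * c_df$cf1 - c_df$cf2 + rnorm(n)

# Stand-in prediction model for the exposure given the covariates.
fit <- lm(w ~ cf1 + cf2, data = c_df)
e_w <- fitted(fit)
res <- w - e_w

# Gaussian variant: normal density around the predicted exposure.
gps_gaussian <- dnorm(w, mean = e_w, sd = sd(res))

# Kernel variant: kernel density estimate of the standardized residuals,
# interpolated back to each unit's own residual.
z   <- res / sd(res)
kde <- density(z)
gps_kernel <- approx(kde$x, kde$y, xout = z, rule = 2)$y
```

The kernel variant avoids assuming normally distributed residuals, at the cost of a bandwidth choice made by `density()`.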

- Estimating Exposure Response Function

```
estimate_npmetric_erf <- function(matched_Y,
                                  matched_w,
                                  matched_counter = NULL,
                                  bw_seq = seq(0.2, 2, 0.2),
                                  w_vals,
                                  nthread)
```
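The roles of `bw_seq` and `w_vals` in the signature above can be illustrated with a generic kernel smoother: try each candidate bandwidth, keep the one with the lowest estimated risk, and evaluate the smoothed curve at the requested exposure values. The sketch below swaps in a Nadaraya-Watson smoother with leave-one-out cross-validation in base R. The package's own smoother and bandwidth criterion may differ, and the helpers `nw_smooth` and `loo_risk`, as well as the toy data, are hypothetical.

```r
set.seed(4)
n         <- 300
matched_w <- runif(n, 0, 10)
matched_Y <- sin(matched_w) + 0.1 * matched_w + rnorm(n, sd = 0.3)
bw_seq    <- seq(0.2, 2, 0.2)
w_vals    <- seq(1, 9, by = 0.5)

# Hypothetical helper: Nadaraya-Watson estimate at the points x_out.
nw_smooth <- function(x, y, x_out, bw) {
  vapply(x_out, function(x0) {
    k <- dnorm((x - x0) / bw)
    sum(k * y) / sum(k)
  }, numeric(1))
}

# Hypothetical helper: leave-one-out mean squared error for one bandwidth.
loo_risk <- function(x, y, bw) {
  pred <- vapply(seq_along(x), function(i) {
    k <- dnorm((x[-i] - x[i]) / bw)
    sum(k * y[-i]) / sum(k)
  }, numeric(1))
  mean((y - pred)^2)
}

# Pick the bandwidth with the lowest risk, then evaluate the curve.
risks   <- vapply(bw_seq, function(bw) loo_risk(matched_w, matched_Y, bw),
                  numeric(1))
best_bw <- bw_seq[which.min(risks)]
erf     <- nw_smooth(matched_w, matched_Y, w_vals, best_bw)
```

A weighted version, in the spirit of `matched_counter`, would multiply the kernel weights `k` by per-unit weights before averaging.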

- Generating Synthetic Data

- Logging

The CausalGPS package logs internal activities to the `CausalGPS.log` file. The file is created in the source file location, and new entries are appended to it. Users can change the log file name (and path) and the logging threshold. The logging mechanism has several thresholds (see the logger package). The two most important are INFO and DEBUG. INFO, which is the default level, logs general information about the process. DEBUG, if activated, logs more detailed information that can be used for debugging purposes.
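As a configuration sketch, changing the log file and threshold might look like the following. It assumes the package exposes a `set_logger()` helper taking `logger_file_path` and `logger_level` arguments; check `?set_logger` in your installed version before relying on it. The file name `CausalGPS_debug.log` is made up, and the call is guarded so the snippet does nothing when the package is not installed.

```r
# Assumption: CausalGPS provides set_logger(logger_file_path, logger_level).
if (requireNamespace("CausalGPS", quietly = TRUE)) {
  log_path <- file.path(tempdir(), "CausalGPS_debug.log")
  CausalGPS::set_logger(logger_file_path = log_path,
                        logger_level = "DEBUG")
}
```

Switching back to `"INFO"` afterwards would keep the log file from growing with debug-level detail during long runs.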