This tutorial will guide you on how to perform GWAS with SLOPE. Analysis consists of three simple steps.

You need to provide paths to three files:

- .fam file with information about observations including phenotype. By default this file is assumed to have six column, with last one containing phenotype. For details see documentation of function
*readPhenotype* - .map file with mapping information about snps. This file is not required for subsequent analysis, but is highly recommended. Note that lack of mapping information will result in less informative plots and results summary.
- .raw file with snps. We assume that snps were previously exported from PLINK with command

**plink --file input --recodeAD --out output**, where*input*is you name of .ped file.

```
library(geneSLOPE)
famFile <- system.file("extdata", "plinkPhenotypeExample.fam", package = "geneSLOPE")
mapFile <- system.file("extdata", "plinkMapExample.map", package = "geneSLOPE")
snpsFile <- system.file("extdata", "plinkDataExample.raw", package = "geneSLOPE")
```

`phenotype <- read_phenotype(filename = famFile)`

When you have phenotype you can move to reading snp data. Depending on data size reading SNPs may long time. As data is very large, snps are filtered with their marginal test p-value. All snps which p-values are larger than threshold \(pValMax\) will be truncated. For details on how to choose \(pValMax\) see How changing parameters affects my analysis?

```
screening.result <- screen_snps(snpsFile, mapFile, phenotype, pValMax = 0.05,
chunkSize = 1e2, verbose=FALSE)
```

Parameter *verbose=FALSE* suppresses progress bar. Default value is *TRUE*.

User look into result of reading and screening dataset

`summary(screening.result)`

```
## Object of class screeningResult
## $X: data matrix
## 90 observations
## 52 snps
## 1000 SNPs were screened
## 52 snps had p-value smaller than 0.05 in marginal test
```

When data is succesfully read, one can move to the second step of analysis.

Last step of analysis is using SLOPE

`slope.result <- select_snps(clumping.result, fdr=0.1)`

```
## Warning in if (type == "slope") {: warunek posiada długość > 1 i tylko
## pierwszy element będzie użyty
```

```
## Warning in select_snps(clumping.result, fdr = 0.1): All lambdas are equal. SLOPE does not guarantee
## False Discovery Rate control
```

As before one can plot and summarize results

`summary(slope.result)`

```
## Object of class selectionResult
## 2 snps selected out of 41 clump representatives
## Effect size for selected snps (absolute values)
## Min: 3.640963
## Mean: 3.768299
## Max: 3.895635
## R square of the final model: 0.9756304
## Kink value: 1
```

`plot(slope.result)`

Like with result of clumping, it is possible to identify interactively clump number which contains specific SNP selected by SLOPE. The procedure is the following. First plot the whole genome, then run function and click on SNP of interest.

```
plot(slope.result)
identify_clump(slope.result)
```

When clump is identified one can zoom into it

`plot(slope.result, clumpNumber = 1)`

It is easy to get information about selected SNPs. To get indices of columns in original SNP matrix they refer to use

`slope.result$selectedSnpsNumbers`

```
## rs2719295_T rs17546815_T
## 222 573
```

If .map file was given, then one can get more information about SNPs

`slope.result$X_info[slope.result$selectedSnpsNumbers,]`

```
## chromosome rs genetic_distance_(morgans)
## 222 8 rs2719295 34.2919
## 573 11 rs17546815 113.7420
## base_pair_position_(bp_units)
## 222 34291873
## 573 113741770
```

For information about SNPs that are part of specific clump use

`summary(slope.result, clumpNumber = 1)`

```
## Summary of 1 selected clump
## chromosome rs genetic_distance_(morgans)
## 222 8 rs2719295 34.2919
## 377 2 rs11124642 38.6580
## 598 2 rs4672803 217.2340
## 906 6 rs325120 147.8600
## base_pair_position_(bp_units)
## 222 34291873
## 377 38657967
## 598 217233745
## 906 147860161
```

There are three numerical parameters that influence result

- \(pValMax\) is the threshold p-value for marginal test. When data is loaded to R, initial screening of snps is performed. For every snp, test for slope coefficient in simple linear regression model \(lm(phenotype \sim snp)\) is performed. All snps with p-value larger than pValMax are discarded. Setting this parameter to too large will significantly increase number of snps on which clumping procedure will be performed. This may cause two technical threats
- Computer might run out of RAM memory
- Clumping procedure might take a lot of time

- \(rho\) is threshold for correlation between snps in clumping procedure. For given snp, every that is correlated with it at least at level \(rho\) will be clumped together. Setting this parameter too high (say 0.7) will cause SLOPE to work on highly correlated snps which might affect FDR control
- \(fdr\) false discover rate (FDR) for SLOPE procedure. The higher \(fdr\) is, the more variables will be accepted to the model. Contrary, small \(fdr\) yields more conservative models

**Input**: \(rho \in (0, 1)\);

- for each SNPs calculate p-value for simple linear regression test, i.e. after assuming linear regression model with a single explanatory variable and testing if slope parameter is nonzero. Vector created from p-values gives hierarchy which is used in next steps;
- select index, Idx, corresponding to the smallest p-value, create group of all SNPs correlated with SNPs Idx at least on level Rho (Pearson correlation);
- define this group as clump and Idx as representative of clump. Exclude entire clump from remained SNPs;
- repeat two previous steps until each SNP is assigned to some clump.