The **listcompr** package is a lightweight collection of functions for *list comprehension*. It is intended as “syntactic sugar” for R and is inspired by the list comprehension capabilities of Python. Besides lists, similar structures like vectors (of numeric or character type), data frames, or named lists can be easily composed. The package may be used for the simple generation of small data sets for “textbook examples”, for unit tests of your R code, or for tiny mathematical tasks.

The package is *not* intended for creating large data sets. The evaluation is done row-wise (and not vector-wise), which makes the evaluation relatively slow. On the other hand, it offers more flexibility to formulate “mathematical” statements. Especially for users not used to the vector-wise evaluation it should be easy to use.

In this section we present the basic features of **listcompr** in some simple examples.

Assume we want to create a vector of all numbers in `1:10` which can be divided by 3 or by 4. We use the function `gen.vector`, taking the number `i` as first argument. The range for `i` and the given condition are passed to the function in the (arbitrarily many) following arguments:

`gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0)`

`## [1] 3 4 6 8 9`

Note that it doesn’t matter whether we use `||` or `|` as the operator: all conditions are evaluated row by row and *not* vector-wise.

Additional conditions are implicitly and-connected, i.e., if we want to explicitly exclude the number 8 we can state:

`gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0, i != 8)`

`## [1] 3 4 6 9`

Of course, we could also combine these into a single condition via `(... || ...) && i != 8`.
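For comparison, the same two vectors can be obtained with vectorised base R. This is only a sketch for contrast and not part of **listcompr**; the variable names `div` and `div_no8` are chosen here for illustration. In the vectorised form the element-wise operators `|` and `&` are required, because `||` and `&&` do not operate element-wise on vectors.

```r
# Vectorised base R counterpart (for comparison; not listcompr code).
i <- 1:10

# Element-wise OR: numbers divisible by 3 or by 4
div <- i[i %% 3 == 0 | i %% 4 == 0]

# Additional condition joined with element-wise AND: exclude 8
div_no8 <- i[(i %% 3 == 0 | i %% 4 == 0) & i != 8]

print(div)
print(div_no8)
```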

Assume we want to create a list of all tuples `(i, j)` where `i` and `j` range over `1:3` and `i <= j` must hold. To this end we use the `gen.list` function, taking the base expression `c(i, j)` as first argument and putting the ranges and conditions for `i` and `j` in the following arguments. This list of vectors can be expressed as:

`gen.list(c(i, j), i = 1:3, j = 1:3, i <= j)`

```
## [[1]]
## [1] 1 1
##
## [[2]]
## [1] 1 2
##
## [[3]]
## [1] 2 2
##
## [[4]]
## [1] 1 3
##
## [[5]]
## [1] 2 3
##
## [[6]]
## [1] 3 3
```

It’s also allowed to use the current value of `i` within the range of `j`. We can omit the condition `i <= j` by changing the range for `j` to `j = i:3`. Moreover, we now use the similar function `gen.data.frame`, which puts each generated vector in a row of a data frame:

```
df <- gen.data.frame(c(i, j), i = 1:3, j = i:3)
knitr::kable(df)
```

| i | j |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
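For readers used to vector-wise evaluation, a rough base R counterpart is to build the full grid with `expand.grid` and filter it afterwards. This is a sketch for comparison only; the names `grid` and `pairs` are illustrative and not taken from the package.

```r
# Base R sketch (for comparison): build all pairs, then filter.
grid <- expand.grid(i = 1:3, j = 1:3)   # the first variable varies fastest
pairs <- grid[grid$i <= grid$j, ]       # keep only rows with i <= j
rownames(pairs) <- NULL                 # renumber rows after filtering
print(pairs)
```

Unlike the `j = i:3` formulation, this approach always materialises the complete grid before discarding rows.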

Assume you have an arbitrary number of square tiles and want to arrange them on your terrace under a certain condition: the number of tiles at the border should equal the number of tiles in the interior (i.e., all tiles that are not at the border). For instance, a terrace with \(6 \times 4\) tiles has \((6-2)\cdot(4-2) = 8\) tiles in the interior and \((6\cdot4)-8 = 16\) tiles at the border, i.e., it does not fulfill the condition. The maximum size of your terrace is \(20 \times 20\) tiles (e.g., limited by the size of your garden). We get all possible terrace sizes satisfying the condition via:

```
df <- gen.data.frame(c(x, y), x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)
knitr::kable(df)
```

| x | y |
|---|---|
| 12 | 5 |
| 8 | 6 |
| 6 | 8 |
| 5 | 12 |

The outer size of a terrace satisfying the condition is either \(6 \times 8\) or \(5 \times 12\) tiles (plus the symmetric solutions). Interestingly, it’s easy to prove that these are the only solutions even if the size of your garden and the number of tiles are unlimited.
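The uniqueness claim can be checked with a short factorisation argument. The following is a sketch, using \(x\) and \(y\) for the outer width and height as in the code above:

\[
(x-2)(y-2) = \frac{xy}{2}
\;\Longleftrightarrow\;
xy - 4x - 4y + 8 = 0
\;\Longleftrightarrow\;
(x-4)(y-4) = 8 .
\]

The factorisations of 8 into two positive integers are \(1 \cdot 8\), \(2 \cdot 4\), \(4 \cdot 2\) and \(8 \cdot 1\), which give \((x, y) \in \{(5, 12), (6, 8), (8, 6), (12, 5)\}\). The factorisations into two negative integers would require \(x \leq 0\) or \(y \leq 0\), so they yield no valid terrace.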

Now let’s prettify the output. Let’s rename the “x” and “y” columns to “width” and “height”. We also add a column for the number of tiles in the inner (which equals the number of tiles at the border).

```
df <- gen.data.frame(c(width = x, height = y, inner_tiles), x = 1:20, y = 1:20,
                     inner_tiles = (x-2)*(y-2), inner_tiles == x*y/2)
knitr::kable(df)
```

| width | height | inner_tiles |
|---|---|---|
| 12 | 5 | 30 |
| 8 | 6 | 24 |
| 6 | 8 | 24 |
| 5 | 12 | 30 |

Now we present some advanced features, which allow us to create somewhat more complex data sets.

For any variable with the name pattern `{varname}_{num}` (underscore and a number), e.g. `a_2`, the range for all these indexed variables can be defined by `{varname}_`.

Assume we want to generate a data frame with all tuples \((a_1, a_2, a_3)\) with \(a_1 < a_2 < a_3\) where \(a_i \in \{1, ..., 4\}\). We simply specify the range for the `a_{i}` variables by `a_ = 1:4`:

```
df <- gen.data.frame(c(a_1, a_2, a_3), a_ = 1:4, a_1 < a_2, a_2 < a_3)
knitr::kable(df)
```

| a_1 | a_2 | a_3 |
|---|---|---|
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 1 | 3 | 4 |
| 2 | 3 | 4 |
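Strictly increasing triples from `1:4` are exactly the 3-element subsets of `1:4`, so the result can be cross-checked against `utils::combn` from base R. This is a sketch for verification only; the object name `res` is illustrative.

```r
# Cross-check with base R: each column of combn(4, 3) is one increasing triple.
res <- t(utils::combn(4, 3))                 # transpose so each triple is a row
colnames(res) <- c("a_1", "a_2", "a_3")
print(res)
```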

Now assume we want to get all tuples \((a_1, ..., a_5) \in \mathbb{N}^5\) with the condition \(\sum_{i} a_i = 6\). We make use of the `<operator> ... <operator>` notation of **listcompr**, which expands expressions in an intuitive way:

```
df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, a_1 + ... + a_5 == 6)
knitr::kable(df)
```

| a_1 | a_2 | a_3 | a_4 | a_5 |
|---|---|---|---|---|
| 2 | 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 | 1 |
| 1 | 1 | 2 | 1 | 1 |
| 1 | 1 | 1 | 2 | 1 |
| 1 | 1 | 1 | 1 | 2 |

The expression expansion also works for function arguments, i.e., we could replace the condition by `sum(a_1, ..., a_5) == 6` (cf. next example).

Let’s consider an example similar to the one above. We want to generate all tuples \((a_1, ..., a_5) \in \{1, ..., 5\}^5\) with \(\sum_{i} a_i = 10\) and \(a_i \leq a_{i+1}\). To avoid writing `a_1 <= a_2, a_2 <= a_3, ...` there is the function `gen.logical.and`, which is a list comprehension helper to generate conditions. We can express this data frame by:

```
df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, sum(a_1, ..., a_5) == 10,
                     gen.logical.and(a_i <= a_(i+1), i = 1:4))
knitr::kable(df)
```

| a_1 | a_2 | a_3 | a_4 | a_5 |
|---|---|---|---|---|
| 2 | 2 | 2 | 2 | 2 |
| 1 | 2 | 2 | 2 | 3 |
| 1 | 1 | 2 | 3 | 3 |
| 1 | 1 | 2 | 2 | 4 |
| 1 | 1 | 1 | 3 | 4 |
| 1 | 1 | 1 | 2 | 5 |
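The six rows can be double-checked with a brute-force, vector-wise computation in base R. This is a sketch for verification, not **listcompr** code; the names `grid`, `m`, `sum_ok` and `sorted_ok` are illustrative.

```r
# Enumerate all 5^5 = 3125 tuples, then filter vector-wise.
grid <- expand.grid(a_1 = 1:5, a_2 = 1:5, a_3 = 1:5, a_4 = 1:5, a_5 = 1:5)
m <- as.matrix(grid)

# Condition 1: the entries sum to 10
sum_ok <- rowSums(m) == 10

# Condition 2: the entries are non-decreasing, a_i <= a_(i+1) for i = 1..4
sorted_ok <- m[, 1] <= m[, 2] & m[, 2] <= m[, 3] &
             m[, 3] <= m[, 4] & m[, 4] <= m[, 5]

res <- grid[sum_ok & sorted_ok, ]
rownames(res) <- NULL
print(res)
```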

We can get all permutations of \((1, 2, 3, 4)\) (which are \(4! = 24\)) by using the following property of permutations: \(\text{perm}(1, ..., n) = \{(a_1, ..., a_n) | a_i \in \{1, ..., n\}, a_i \neq a_j \; \text{for} \; i \neq j\}\). This can be expressed via (we show only the first 8 permutations):

```
df <- gen.data.frame(c(a_1, ..., a_4), a_ = 1:4,
                     gen.logical.and(a_i != a_j, i = 1:4, j = (i+1):4))
knitr::kable(df[1:8,])
```

| a_1 | a_2 | a_3 | a_4 |
|---|---|---|---|
| 4 | 3 | 2 | 1 |
| 3 | 4 | 2 | 1 |
| 4 | 2 | 3 | 1 |
| 2 | 4 | 3 | 1 |
| 3 | 2 | 4 | 1 |
| 2 | 3 | 4 | 1 |
| 4 | 3 | 1 | 2 |
| 3 | 4 | 1 | 2 |

*Note:* Don’t use such an approach for creating large data sets of permutations, e.g., for all the 5040 permutations of `1:7`. This approach is very slow! We recommend the `permn` function from the combinat package.
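If adding a dependency is not desired, a small recursive generator in base R is another option. The helper `permutations` below is hypothetical: it is neither part of **listcompr** nor of combinat, and it is meant as a minimal sketch rather than a tuned implementation.

```r
# Hypothetical helper: all permutations of a vector, built recursively.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    # Fix v[k] as the first element and permute the remaining elements
    for (rest in permutations(v[-k])) {
      out[[length(out) + 1]] <- c(v[k], rest)
    }
  }
  out
}

perms <- permutations(1:4)
print(length(perms))   # number of permutations of 1:4
```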

Conditions created with `gen.logical.and` or `gen.logical.or` can be used in other R packages, e.g., dplyr. To use a generated condition in, e.g., `dplyr::filter`, you need to put the operator `!!` in front of the function which generates the condition.

For example, assume you have a data frame with the results of throwing three dice. You want to get the outcomes where at least two dice show the number 6. We show only the first 8 results:

```
dices <- gen.data.frame(c(a_1, ..., a_3), a_ = 1:6)
res <- dplyr::filter(dices, !!gen.logical.or(a_i == 6 & a_j == 6, i = 1:3, j = (i+1):3))
knitr::kable(res[1:8,])
```

| a_1 | a_2 | a_3 |
|---|---|---|
| 6 | 6 | 1 |
| 6 | 6 | 2 |
| 6 | 6 | 3 |
| 6 | 6 | 4 |
| 6 | 6 | 5 |
| 6 | 1 | 6 |
| 6 | 2 | 6 |
| 6 | 3 | 6 |
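As a plausibility check without dplyr, the same selection can be sketched in base R by counting the sixes per row. The names `dice` and `n_sixes` are illustrative and do not come from the package.

```r
# Base R sketch: all 6^3 outcomes of three dice, keep rows with >= 2 sixes.
dice <- expand.grid(a_1 = 1:6, a_2 = 1:6, a_3 = 1:6)
n_sixes <- rowSums(dice == 6)     # number of sixes in each outcome
res <- dice[n_sixes >= 2, ]
rownames(res) <- NULL

print(nrow(res))                  # 15 outcomes with exactly two sixes + 1 with three
print(utils::head(res, 8))
```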

The list comprehension functions of **listcompr** can also be nested. In the following example we use `gen.data.frame` as the outer function and `gen.vector` (within a sum) to generate a data frame of the sum of all divisors of a whole number (the number itself excluded):

```
df <- gen.data.frame(c(a, sumdiv = sum(gen.vector(x, x = 1:(a-1), a %% x == 0))), a = 2:10)
knitr::kable(df)
```

| a | sumdiv |
|---|---|
| 2 | 1 |
| 3 | 1 |
| 4 | 3 |
| 5 | 1 |
| 6 | 6 |
| 7 | 1 |
| 8 | 7 |
| 9 | 4 |
| 10 | 8 |

Now we want to add a flag to this data frame indicating whether the number `a` is a so-called *perfect number*. A perfect number is equal to the sum of its divisors (the number itself excluded). To this end we use a substitution variable `sumdiv` for the sum of divisors and use it to check whether the number is perfect:

```
df <- gen.data.frame(c(a, sumdiv, perfect = (sumdiv == a)), a = 2:10,
                     sumdiv = sum(gen.vector(x, x = 1:(a-1), a %% x == 0)))
knitr::kable(df)
```

| a | sumdiv | perfect |
|---|---|---|
| 2 | 1 | 0 |
| 3 | 1 | 0 |
| 4 | 3 | 0 |
| 5 | 1 | 0 |
| 6 | 6 | 1 |
| 7 | 1 | 0 |
| 8 | 7 | 0 |
| 9 | 4 | 0 |
| 10 | 8 | 0 |

This means the only perfect number between 1 and 10 is 6.
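The divisor sums can be reproduced with a small base R helper. This is a sketch under two assumptions of ours: the function name `sum_proper_divisors` is hypothetical, and the range is extended to `2:30` purely for illustration so that a second perfect number appears.

```r
# Hypothetical helper: sum of all divisors of a, excluding a itself.
sum_proper_divisors <- function(a) {
  x <- seq_len(a - 1)        # candidate divisors 1, ..., a - 1
  sum(x[a %% x == 0])        # keep those that divide a without remainder
}

a <- 2:30
sumdiv <- vapply(a, sum_proper_divisors, integer(1))
perfect <- a[sumdiv == a]    # numbers equal to the sum of their proper divisors
print(perfect)
```

Using `seq_len(a - 1)` instead of `1:(a-1)` keeps the helper well-defined for `a = 1`, where `1:0` would count downwards.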

The package also offers some functions to compose characters. They can be used to generate lists or vectors of characters or (row-)names of the results.

Consider our example from above where we search for the number of tiles, such that the border and the inner have the same number of tiles. Assume we want a textual output instead of a data frame. We simply pass a character to `gen.vector` where all expressions in `{}`-brackets are evaluated according to the list comprehension result:

```
gen.vector("size: {x}x{y} tiles, where {x*y/2} tiles are at the border/inner",
x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)
```

```
## [1] "size: 12x5 tiles, where 30 tiles are at the border/inner"
## [2] "size: 8x6 tiles, where 24 tiles are at the border/inner"
## [3] "size: 6x8 tiles, where 24 tiles are at the border/inner"
## [4] "size: 5x12 tiles, where 30 tiles are at the border/inner"
```
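For comparison, a vector-wise base R sketch of the same output combines `expand.grid` with `sprintf`. It is not **listcompr** code, and the names `grid`, `ok`, `sel` and `msg` are illustrative.

```r
# Base R sketch: filter the grid first, then format each remaining row.
grid <- expand.grid(x = 1:20, y = 1:20)
ok <- (grid$x - 2) * (grid$y - 2) == grid$x * grid$y / 2
sel <- grid[ok, ]

msg <- sprintf("size: %dx%d tiles, where %d tiles are at the border/inner",
               sel$x, sel$y, as.integer(sel$x * sel$y / 2))
print(msg)
```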

Let’s revisit the example with the divisors. Instead of the sum of the divisors, we want to get all divisors of a number as a vector. Each vector should be stored in a list entry named `divisors_of_{a}`. With the following we get the divisors for all numbers in `5:10`:

`gen.named.list("divisors_of_{a}", gen.vector(x, x = 1:(a-1), a %% x == 0), a = 5:10)`

```
## $divisors_of_5
## [1] 1
##
## $divisors_of_6
## [1] 1 2 3
##
## $divisors_of_7
## [1] 1
##
## $divisors_of_8
## [1] 1 2 4
##
## $divisors_of_9
## [1] 1 3
##
## $divisors_of_10
## [1] 1 2 5
```

We continue with calculating divisors. Assume the numbers `a = 90:95` are given and we want to show a table indicating which numbers within `2:10` are integer divisors of `a`. The following logical matrix shows this result. We use `byrow = TRUE` to specify that the elements of the inner vector refer to the columns and not to the rows.

```
m <- gen.named.matrix("divisors_{a}", gen.named.vector("{x}", a %% x == 0, x = 2:10),
                      a = 90:95, byrow = TRUE)
knitr::kable(m)
```

|    | divisors_90 | divisors_91 | divisors_92 | divisors_93 | divisors_94 | divisors_95 |
|---|---|---|---|---|---|---|
| 2 | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| 3 | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE |
| 4 | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
| 5 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE |
| 6 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
| 7 | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE |
| 8 | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
| 9 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
| 10 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
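The entries of this table can be cross-checked with `outer` from base R, which evaluates the divisibility test for every combination of `x` and `a` at once. This is a sketch for verification only; it reproduces the layout shown in the table (candidate divisors as rows) and does not rely on **listcompr**.

```r
# Base R sketch: logical divisibility matrix via outer().
a <- 90:95
x <- 2:10
m <- outer(x, a, function(x, a) a %% x == 0)   # rows: divisors x, columns: numbers a
dimnames(m) <- list(as.character(x), paste0("divisors_", a))
print(m)
```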

Analogously to the example above, we can create a matrix with character content, denoting either the integer result `a/x` or simply `"no"` if `x` is not an integer divisor of `a`.

```
m <- gen.named.matrix("divisors_{a}",
                      gen.named.vector("{x}", if (a %% x == 0) "factor = {a/x}" else "no", x = 2:10),
                      a = 90:95, byrow = TRUE)
knitr::kable(m)
```

|    | divisors_90 | divisors_91 | divisors_92 | divisors_93 | divisors_94 | divisors_95 |
|---|---|---|---|---|---|---|
| 2 | factor = 45 | no | factor = 46 | no | factor = 47 | no |
| 3 | factor = 30 | no | no | factor = 31 | no | no |
| 4 | no | no | factor = 23 | no | no | no |
| 5 | factor = 18 | no | no | no | no | factor = 19 |
| 6 | factor = 15 | no | no | no | no | no |
| 7 | no | factor = 13 | no | no | no | no |
| 8 | no | no | no | no | no | no |
| 9 | factor = 10 | no | no | no | no | no |
| 10 | factor = 9 | no | no | no | no | no |