The compare function

Ethan Heinzen, Ryan Lennon, Andrew Hanson

20 July, 2018

Introduction

The compare() function (or more precisely, the compare.data.frame() function) can be used to determine and report differences between two data.frames. It was written in the spirit of replacing PROC COMPARE from SAS.

We reexport the compare() generic from the testthat package to avoid namespace conflicts, and write a data.frame S3 method to compare data.frames.

library(arsenal)

Basic examples

We first build two similar data.frames to compare.

df1 <- data.frame(id = paste0("person", 1:3),
                  a = c("a", "b", "c"),
                  b = c(1, 3, 4),
                  c = c("f", "e", "d"),
                  row.names = paste0("rn", 1:3),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = paste0("person", 3:1),
                  a = c("c", "b", "a"),
                  b = c(1, 3, 4),
                  d = paste0("rn", 1:3),
                  row.names = paste0("rn", c(1,3,2)),
                  stringsAsFactors = FALSE)

To compare these datasets, simply pass them to the compare() function:

compare(df1, df2)
Compare Object

Function Call: 
compare.data.frame(x = df1, y = df2)

Shared: 4 variables and 3 observations.
Not shared: 2 variables and 0 observations.

Differences found in 2/3 variables compared.
0 variables compared have non-identical attributes.

Use summary() to get a more detailed summary:

summary(compare(df1, df2))
Variables not shared
version  variable  position  class
x        c         4         character
y        d         4         character
Other variables not compared
No other variables not compared
Observations not shared
No observations not shared
Differences detected by variable
var.x  var.y  n  NAs
id     id     2  0
a      a      2  0
b      b      0  0
First 10 differences detected per variable
var.x  var.y  ..row.names..  values.x  values.y  row.x  row.y
id     id     1              person1   person3   1      1
id     id     3              person3   person1   3      3
a      a      1              a         c         1      1
a      a      3              c         a         3      3
Non-identical attributes
No non-identical attributes

By default, the datasets are compared row-by-row. To change this, use the by= or by.x= and by.y= arguments:

summary(compare(df1, df2, by = "id"))
Variables not shared
version  variable  position  class
x        c         4         character
y        d         4         character
Other variables not compared
No other variables not compared
Observations not shared
No observations not shared
Differences detected by variable
var.x  var.y  n  NAs
a      a      0  0
b      b      2  0
First 10 differences detected per variable
var.x  var.y  id       values.x  values.y  row.x  row.y
b      b      person1  1         4         1      3
b      b      person3  4         1         3      1
Non-identical attributes
No non-identical attributes
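
The effect of by= can be illustrated in base R, without arsenal. The sketch below rebuilds only the id and b columns of df1 and df2, compares b positionally, and then compares it again after pairing rows on id. The names rowwise_diff, merged, and keyed_diff are illustrative only, and merge() is a stand-in here: this document does not describe how compare() matches rows internally.

```r
# Only the id and b columns of df1 and df2 from the example above
df1 <- data.frame(id = paste0("person", 1:3), b = c(1, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = paste0("person", 3:1), b = c(1, 3, 4),
                  stringsAsFactors = FALSE)

# Row-by-row: b holds the same values in the same positions
rowwise_diff <- df1$b != df2$b

# Keyed: pair rows on id first, then compare the paired values
merged <- merge(df1, df2, by = "id", suffixes = c(".x", ".y"))
keyed_diff <- merged$b.x != merged$b.y

# ids whose b values differ once rows are paired by id
merged$id[keyed_diff]
```

Positionally nothing differs in b, but once rows are paired by id, person1 and person3 differ. This is consistent with the two b differences reported by summary(compare(df1, df2, by = "id")).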

A larger example

Let’s muck up the mockstudy data.

data(mockstudy)
mockstudy2 <- muck_up_mockstudy()

We’ve changed row order, so let’s compare by the case ID:

summary(compare(mockstudy, mockstudy2, by = "case"))
Variables not shared
version variable position class
x age 2 integer
x arm 3 character
x fu.time 6 integer
x fu.stat 7 integer
y fu_time 11 integer
y fu stat 12 integer
y Arm 13 character
Other variables not compared
var.x pos.x class.x var.y pos.y class.y
race 5 character race 3 factor
ast 12 integer ast 8 numeric
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
sex sex 1495 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (1741 differences not shown)
var.x var.y case values.x values.y row.x row.y
sex sex 76170 Male Male 26 20
sex sex 76240 Male Male 27 21
sex sex 76431 Female Female 28 22
sex sex 76712 Male Male 29 23
sex sex 76780 Female Female 30 24
sex sex 77066 Female Female 31 25
sex sex 77316 Male Male 32 26
sex sex 77355 Male Male 33 27
sex sex 77591 Male Male 34 28
sex sex 77851 Male Male 35 29
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
Non-identical attributes
var.x var.y name
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label

Column name comparison options

It is possible to change which column names are considered “the same variable”.

Ignoring case

For example, to ignore case in variable names (so that Arm and arm are considered the same), pass tol.vars = "case".

You can do this using comparison.control():

summary(compare(mockstudy, mockstudy2, by = "case", control = comparison.control(tol.vars = "case")))

or pass it through the ... arguments.

summary(compare(mockstudy, mockstudy2, by = "case", tol.vars = "case"))
Variables not shared
version variable position class
x age 2 integer
x fu.time 6 integer
x fu.stat 7 integer
y fu_time 11 integer
y fu stat 12 integer
Other variables not compared
var.x pos.x class.x var.y pos.y class.y
race 5 character race 3 factor
ast 12 integer ast 8 numeric
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 1495 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (1741 differences not shown)
var.x var.y case values.x values.y row.x row.y
sex sex 76170 Male Male 26 20
sex sex 76240 Male Male 27 21
sex sex 76431 Female Female 28 22
sex sex 76712 Male Male 29 23
sex sex 76780 Female Female 30 24
sex sex 77066 Female Female 31 25
sex sex 77316 Male Male 32 26
sex sex 77355 Male Male 33 27
sex sex 77591 Male Male 34 28
sex sex 77851 Male Male 35 29
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label

Treating dots and underscores the same (equivalence classes)

It is possible to treat certain characters or sets of characters as the same by passing a character vector of equivalence classes to the tol.vars= argument.

In short, each string in the vector is split into single characters, and every character in the resulting set is replaced by the first character in the string. For example, passing c("._") would replace all underscores with dots in the column names of both datasets. Similarly, passing c("aA", "BbCc") would replace all instances of "A" with "a" and all instances of "b", "C", or "c" with "B". This is one way to ignore case for certain letters only. Equivalence classes can also be combined with ignoring case entirely, for example by passing c("._", "case").

Passing a single character as an element of this vector will replace that character with the empty string. For example, passing c(" ", ".") would remove all spaces and dots from the column names.
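
The replacement rules described above can be sketched in base R. The function normalize_names() below is hypothetical and is not part of arsenal; it only mimics the behaviour stated in the text (first character wins, a single character is deleted, and the string "case" lowercases everything). How the package implements this internally is not shown in this document.

```r
# Hypothetical helper: apply equivalence classes to a vector of column names
normalize_names <- function(nms, classes) {
  for (cl in classes) {
    if (identical(cl, "case")) {
      # ASSUMPTION: "case" is treated as a keyword meaning "ignore case"
      nms <- tolower(nms)
      next
    }
    chars <- strsplit(cl, "", fixed = TRUE)[[1]]
    if (length(chars) == 1) {
      # A single character is replaced by the empty string
      nms <- gsub(chars, "", nms, fixed = TRUE)
    } else {
      # Every later character is replaced by the first one
      for (ch in chars[-1]) {
        nms <- gsub(ch, chars[1], nms, fixed = TRUE)
      }
    }
  }
  nms
}

mock_names <- normalize_names(c("fu.time", "fu_time", "fu stat", "Arm"),
                              c("._ ", "case"))
letters_only <- normalize_names("AbCc", c("aA", "BbCc"))
stripped <- normalize_names("a b.c", c(" ", "."))
```

With these inputs, fu_time and fu stat end up spelled with dots and Arm becomes arm, so they would line up with fu.time, fu.stat, and arm.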

For mockstudy, let’s treat dots, underscores, and spaces as the same, and ignore case:

summary(compare(mockstudy, mockstudy2, by = "case",
                tol.vars = c("._ ", "case") # dots=underscores=spaces, ignore case
))
Variables not shared
version variable position class
x age 2 integer
Other variables not compared
var.x pos.x class.x var.y pos.y class.y
race 5 character race 3 factor
ast 12 integer ast 8 numeric
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 1495 0
fu.time fu_time 0 0
fu.stat fu stat 0 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (1741 differences not shown)
var.x var.y case values.x values.y row.x row.y
sex sex 76170 Male Male 26 20
sex sex 76240 Male Male 27 21
sex sex 76431 Female Female 28 22
sex sex 76712 Male Male 29 23
sex sex 76780 Female Female 30 24
sex sex 77066 Female Female 31 25
sex sex 77316 Male Male 32 26
sex sex 77355 Male Male 33 27
sex sex 77591 Male Male 34 28
sex sex 77851 Male Male 35 29
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label

Column comparison options

Logical tolerance

Use the tol.logical= argument to change how logicals are compared. By default, they’re expected to be equal to each other.

Numeric tolerance

To allow numeric differences of a certain tolerance, use the tol.num= and tol.num.val= options. tol.num.val= determines the maximum (unsigned) difference tolerated if tol.num="absolute" (default), and determines the maximum (unsigned) percent difference tolerated if tol.num="percent".

Also note the option int.as.num=, which determines whether integers and numerics should be compared despite their class difference. If TRUE, the integers are coerced to numeric. Note that mockstudy$ast is integer, while mockstudy2$ast is numeric:

summary(compare(mockstudy, mockstudy2, by = "case",
                tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                int.as.num = TRUE            # compare integers and numerics
))
Variables not shared
version variable position class
x age 2 integer
Other variables not compared
var.x pos.x class.x var.y pos.y class.y
race 5 character race 3 factor
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 1495 0
fu.time fu_time 0 0
fu.stat fu stat 0 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
ast ast 3 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (1741 differences not shown)
var.x var.y case values.x values.y row.x row.y
sex sex 76170 Male Male 26 20
sex sex 76240 Male Male 27 21
sex sex 76431 Female Female 28 22
sex sex 76712 Male Male 29 23
sex sex 76780 Female Female 30 24
sex sex 77066 Female Female 31 25
sex sex 77316 Male Male 32 26
sex sex 77355 Male Male 33 27
sex sex 77591 Male Male 34 28
sex sex 77851 Male Male 35 29
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
ast ast 86205 27 36 6 3
ast ast 105271 100 36 3 2
ast ast 110754 35 36 1 1
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label

Suppose a tolerance of up to 10 is allowed for ast:

summary(compare(mockstudy, mockstudy2, by = "case",
                tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                int.as.num = TRUE,           # compare integers and numerics
                tol.num.val = 10             # allow absolute differences <= 10
))
Variables not shared
version variable position class
x age 2 integer
Other variables not compared
var.x pos.x class.x var.y pos.y class.y
race 5 character race 3 factor
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 1495 0
fu.time fu_time 0 0
fu.stat fu stat 0 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
ast ast 1 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (1741 differences not shown)
var.x var.y case values.x values.y row.x row.y
sex sex 76170 Male Male 26 20
sex sex 76240 Male Male 27 21
sex sex 76431 Female Female 28 22
sex sex 76712 Male Male 29 23
sex sex 76780 Female Female 30 24
sex sex 77066 Female Female 31 25
sex sex 77316 Male Male 32 26
sex sex 77355 Male Male 33 27
sex sex 77591 Male Male 34 28
sex sex 77851 Male Male 35 29
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
ast ast 105271 100 36 3 2
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label
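
The drop from three ast differences to one can be checked by hand in base R. The sketch below uses the three ast pairs reported before tol.num.val was set. The variable names are illustrative, and the percent rule at the end is only one plausible definition (difference relative to x, tolerance expressed as a fraction); it is an assumption, not necessarily what the package computes.

```r
# ast values for the three cases flagged before tol.num.val was set
ast_x <- c(27L, 100L, 35L)  # integer, as in mockstudy
ast_y <- c(36, 36, 36)      # numeric, as in mockstudy2

# Coercing the integer side puts both columns in the same class
ast_x_num <- as.numeric(ast_x)

# Absolute tolerance: flag only differences larger than 10
abs_flag <- abs(ast_x_num - ast_y) > 10

# ASSUMPTION: percent difference taken relative to x, tolerance of 10% as 0.10
pct_flag <- abs(ast_x_num - ast_y) / abs(ast_x_num) > 0.10
```

Under the absolute rule only the 100 versus 36 pair exceeds the tolerance, which agrees with the single remaining ast row (case 105271) in the output above.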

Factor tolerance

By default, factors are compared to each other based on both the labels and the underlying numeric levels. Set tol.factor="levels" to match only the numeric levels, or set tol.factor="labels" to match only the labels.

summary(compare(mockstudy, mockstudy2, by = "case",
                tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                int.as.num = TRUE,           # compare integers and numerics
                tol.num.val = 10,            # allow absolute differences <= 10
                tol.factor = "labels"        # match only factor labels
))
Variables not shared
version variable position class
x age 2 integer
Other variables not compared
var.x pos.x class.x var.y pos.y class.y
race 5 character race 3 factor
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 0 0
fu.time fu_time 0 0
fu.stat fu stat 0 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
ast ast 1 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (256 differences not shown)
var.x var.y case values.x values.y row.x row.y
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
ast ast 105271 100 36 3 2
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label
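
The distinction between labels and underlying numeric levels can be seen with two small factors in base R. The level order given to sex_y below is an assumption chosen for illustration; the output above only states that the levels attribute of sex differs between the two datasets.

```r
# Same labels in every position, but a different level order
sex_x <- factor(c("Male", "Female", "Male"), levels = c("Male", "Female"))
sex_y <- factor(c("Male", "Female", "Male"), levels = c("Female", "Male"))

# Comparing labels only: nothing differs
labels_differ <- as.character(sex_x) != as.character(sex_y)

# Comparing the underlying integer codes: every position differs
codes_differ <- as.integer(sex_x) != as.integer(sex_y)
```

A situation like this would explain rows such as "Male Male" being reported as differences under the default, and why they disappear once tol.factor = "labels" is set.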

Also note the option factor.as.char=, which determines whether factors and characters should be compared despite their class difference. If TRUE, the factors are coerced to characters. Note that mockstudy$race is a character, while mockstudy2$race is a factor:

summary(compare(mockstudy, mockstudy2, by = "case",
                tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                int.as.num = TRUE,           # compare integers and numerics
                tol.num.val = 10,            # allow absolute differences <= 10
                tol.factor = "labels",       # match only factor labels
                factor.as.char = TRUE        # compare factors and characters
))
Variables not shared
version variable position class
x age 2 integer
Other variables not compared
No other variables not compared
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 0 0
race race 1285 0
fu.time fu_time 0 0
fu.stat fu stat 0 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
ast ast 1 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (1531 differences not shown)
var.x var.y case values.x values.y row.x row.y
race race 76170 Caucasian caucasian 26 20
race race 76240 Caucasian caucasian 27 21
race race 76431 Caucasian caucasian 28 22
race race 76712 Caucasian caucasian 29 23
race race 76780 Caucasian caucasian 30 24
race race 77066 Caucasian caucasian 31 25
race race 77316 Caucasian caucasian 32 26
race race 77591 Caucasian caucasian 34 28
race race 77851 Caucasian caucasian 35 29
race race 77956 Caucasian caucasian 36 30
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
ast ast 105271 100 36 3 2
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label

Character tolerance

Use the tol.char= argument to change how character variables are compared. By default, they are compared as-is, but they can be compared after ignoring case or trimming whitespace or both.

summary(compare(mockstudy, mockstudy2, by = "case",
                tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                int.as.num = TRUE,           # compare integers and numerics
                tol.num.val = 10,            # allow absolute differences <= 10
                tol.factor = "labels",       # match only factor labels
                factor.as.char = TRUE,       # compare factors and characters
                tol.char = "case"            # ignore case in character vectors
))
Variables not shared
version variable position class
x age 2 integer
Other variables not compared
No other variables not compared
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 0 0
race race 0 0
fu.time fu_time 0 0
fu.stat fu stat 0 0
ps ps 1 1
hgb hgb 266 266
bmi bmi 0 0
alk.phos alk.phos 0 0
ast ast 1 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable (256 differences not shown)
var.x var.y case values.x values.y row.x row.y
ps ps 86205 0 NA 6 3
hgb hgb 88714 NA -9 192 186
hgb hgb 88955 NA -9 204 198
hgb hgb 89549 NA -9 229 223
hgb hgb 89563 NA -9 231 225
hgb hgb 89584 NA -9 237 231
hgb hgb 89591 NA -9 238 232
hgb hgb 89595 NA -9 239 233
hgb hgb 89647 NA -9 243 237
hgb hgb 89665 NA -9 244 238
hgb hgb 89827 NA -9 255 249
ast ast 105271 100 36 3 2
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label
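
The combined effect of factor.as.char= and tol.char= can be approximated in base R. The values below are made up for illustration (including the trailing space), and the four comparison rules are a sketch of the options described in the text, not the package's own tolerance functions.

```r
# Hypothetical values: a character column and a factor column
race_x <- c("Caucasian", "African-Am ", "Asian")
race_y <- factor(c("caucasian", "African-Am", "asian"))

# Coerce the factor so both sides are character
race_y_chr <- as.character(race_y)

as_is       <- race_x != race_y_chr
ignore_case <- tolower(race_x) != tolower(race_y_chr)
trim_only   <- trimws(race_x) != trimws(race_y_chr)
both        <- tolower(trimws(race_x)) != tolower(trimws(race_y_chr))
```

Compared as-is, all three pairs differ; ignoring case leaves only the pair with the trailing space, trimming leaves only the case mismatches, and applying both leaves none.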

Date tolerance

Use the tol.date= argument to change how dates are compared. By default, they’re expected to be equal to each other.

Other data type tolerances

Use the tol.other= argument to change how other objects are compared. By default, they’re expected to be identical().

User-defined tolerance functions

Details

The comparison.control() function accepts functions for any of the tolerance arguments in addition to the short-hand character strings. This allows users to create custom tolerance functions to suit their needs.

Any custom tolerance function must accept two vectors as arguments and return a logical vector of the same length. The TRUEs in the results should correspond to elements which are deemed “different”. Note that the numeric and date tolerance functions should also include a third argument for tolerance size (even if it’s not used).

CAUTION: the results should not include NAs, since the logical vector is used to subset the input data.frames. The tol.NA() function is useful for considering any NAs in the two vectors (but not both) as differences, in addition to other criteria.

tol.NA
function (x, y, idx) 
{
    (is.na(x) & !is.na(y)) | (is.na(y) & !is.na(x)) | (!is.na(x) & 
        !is.na(y) & idx)
}
<environment: namespace:arsenal>

The tol.NA() function is used in all default tolerance functions to help handle NAs.
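
To see why this matters, the sketch below runs a naive comparison and the same comparison wrapped in a local copy of the tol.NA() body printed above. The copy is named tol_na only so the example runs without arsenal loaded; the input vectors are made up.

```r
# Local copy of the tol.NA() body shown above
tol_na <- function(x, y, idx) {
  (is.na(x) & !is.na(y)) | (is.na(y) & !is.na(x)) |
    (!is.na(x) & !is.na(y) & idx)
}

x <- c(1, NA, 3, NA, 5)
y <- c(1, 2, NA, NA, 7)

# Naive comparison: NA wherever either side is missing
raw <- x != y

# Wrapped comparison: one-sided NAs count as differences, two-sided NAs do not
safe <- tol_na(x, y, raw)
```

The naive result contains NAs in positions 2 to 4, which would break subsetting. The wrapped result is a plain logical vector: positions 2 and 3 (NA on one side only) and position 5 (5 versus 7) are flagged, while position 4 (NA on both sides) is not.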

Example 1

Suppose we want to ignore any dates which are later in the second dataset than the first. We define a custom tolerance function.

my.tol <- function(x, y, tol)
{
  tol.NA(x, y, x > y)
}

date.df1 <- data.frame(dt = as.Date(c("2017-09-07", "2017-08-08", "2017-07-09", NA)))
date.df2 <- data.frame(dt = as.Date(c("2017-10-01", "2017-08-08", "2017-07-10", "2017-01-01")))
n.diffs(compare(date.df1, date.df2)) # default finds any differences
[1] 3
n.diffs(compare(date.df1, date.df2, tol.date = my.tol)) # our function identifies only the NA as different...
[1] 1
n.diffs(compare(date.df2, date.df1, tol.date = my.tol)) # ... until we change the argument order
[1] 3

Example 2

(Continuing our mockstudy example)

Suppose we’re okay with NAs getting replaced by -9.

tol.minus9 <- function(x, y, tol)
{
  idx1 <- is.na(x) & !is.na(y) & y == -9
  idx2 <- tol.num.absolute(x, y, tol) # find other absolute differences
  return(!idx1 & idx2)
}

summary(compare(mockstudy, mockstudy2, by = "case",
                tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                int.as.num = TRUE,           # compare integers and numerics
                tol.num.val = 10,            # allow absolute differences <= 10
                tol.factor = "labels",       # match only factor labels
                factor.as.char = TRUE,       # compare factors and characters
                tol.char = "case",           # ignore case in character vectors
                tol.num = tol.minus9         # ignore NA -> -9 changes
))
Variables not shared
version variable position class
x age 2 integer
Other variables not compared
No other variables not compared
Observations not shared
version case observation
x 88989 9
x 90158 8
x 99508 7
x 112263 5
Differences detected by variable
var.x var.y n NAs
arm Arm 0 0
sex sex 0 0
race race 0 0
fu.time fu_time 0 0
fu.stat fu stat 0 0
ps ps 1 1
hgb hgb 0 0
bmi bmi 0 0
alk.phos alk.phos 0 0
ast ast 1 0
mdquality.s mdquality.s 0 0
age.ord age.ord 0 0
First 10 differences detected per variable
var.x var.y case values.x values.y row.x row.y
ps ps 86205 0 NA 6 3
ast ast 105271 100 36 3 2
Non-identical attributes
var.x var.y name
arm Arm label
sex sex label
sex sex levels
race race class
race race label
race race levels
bmi bmi label
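
The logic of tol.minus9() can be traced on a few values without arsenal. In the sketch below, tol_na is a local copy of the tol.NA() body printed earlier, abs_tol is an assumed stand-in for tol.num.absolute() (flag absolute differences greater than the tolerance), and minus9_tol mirrors tol.minus9(). The four value pairs echo rows from the output above: an hgb change from NA to -9, the ps change from 0 to NA, and the ast pairs 100 versus 36 and 35 versus 36.

```r
# Local copy of the tol.NA() body shown earlier
tol_na <- function(x, y, idx) {
  (is.na(x) & !is.na(y)) | (is.na(y) & !is.na(x)) |
    (!is.na(x) & !is.na(y) & idx)
}

# ASSUMPTION: stand-in for arsenal's tol.num.absolute()
abs_tol <- function(x, y, tol) tol_na(x, y, abs(x - y) > tol)

# Same structure as tol.minus9() above, using the stand-in
minus9_tol <- function(x, y, tol) {
  na_to_minus9 <- is.na(x) & !is.na(y) & y == -9
  other_diffs <- abs_tol(x, y, tol)
  !na_to_minus9 & other_diffs
}

x <- c(NA, 0, 100, 35)
y <- c(-9, NA, 36, 36)
flagged <- minus9_tol(x, y, tol = 10)
```

Only the second and third pairs are flagged: NA to -9 is excused, 0 to NA is still a difference, 100 versus 36 exceeds the tolerance of 10, and 35 versus 36 does not. This mirrors the summary above, where hgb drops to 0 differences while ps and ast keep one each.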

Extract Differences

Differences can be easily extracted using the diffs() function. If you only want to determine how many differences were found, use the n.diffs() function.

cmp <- compare(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), int.as.num = TRUE)
n.diffs(cmp)
[1] 1765
head(diffs(cmp))
  var.x var.y  case values.x values.y row.x row.y
1   sex   sex 76170     Male     Male    26    20
2   sex   sex 76240     Male     Male    27    21
3   sex   sex 76431   Female   Female    28    22
4   sex   sex 76712     Male     Male    29    23
5   sex   sex 76780   Female   Female    30    24
6   sex   sex 77066   Female   Female    31    25

Differences can also be summarized by variable.

diffs(cmp, by.var = TRUE)
         var.x       var.y    n NAs
1          arm         Arm    0   0
2          sex         sex 1495   0
3      fu.time     fu_time    0   0
4      fu.stat     fu stat    0   0
5           ps          ps    1   1
6          hgb         hgb  266 266
7          bmi         bmi    0   0
8     alk.phos    alk.phos    0   0
9          ast         ast    3   0
10 mdquality.s mdquality.s    0   0
11     age.ord     age.ord    0   0

To report differences from only a few variables, pass a character vector of variable names to the vars= argument of diffs().

diffs(cmp, vars = c("ps", "ast"), by.var = TRUE)
  var.x var.y n NAs
5    ps    ps 1   1
9   ast   ast 3   0
diffs(cmp, vars = c("ps", "ast"))
     var.x var.y   case values.x values.y row.x row.y
1496    ps    ps  86205        0       NA     6     3
1763   ast   ast  86205       27       36     6     3
1764   ast   ast 105271      100       36     3     2
1765   ast   ast 110754       35       36     1     1

Appendix

Structure of the Object

(This section is just as much for my use as for yours!)

obj <- compare(mockstudy, mockstudy2, by = "case")

There are two main components in the "compare.data.frame" object, each with its own print method.

The frame.summary contains:

print(obj$frame.summary)
  version ncol nrow   by        attrs       unique n.shared
1       x   14 1499 case 3 attributes 4 unique obs     1495
2       y   13 1495 case 3 attributes 0 unique obs     1495

The vars.summary contains:

print(obj$vars.summary)
         var.x pos.x         class.x       var.y pos.y         class.y           values        attrs
8         case     1         integer        case     1         integer      by-variable 0 attributes
17         sex     4          factor         sex     2          factor 1495 differences 2 attributes
16        race     5       character        race     3          factor     Not compared 3 attributes
15          ps     8         integer          ps     4         integer    1 differences 0 attributes
13         hgb     9         numeric         hgb     5         numeric  266 differences 0 attributes
7          bmi    10         numeric         bmi     6         numeric    0 differences 1 attributes
4     alk.phos    11         integer    alk.phos     7         integer    0 differences 0 attributes
6          ast    12         integer         ast     8         numeric     Not compared 0 attributes
14 mdquality.s    13         integer mdquality.s     9         integer    0 differences 0 attributes
3      age.ord    14 ordered, factor     age.ord    10 ordered, factor    0 differences 0 attributes
2          age     2         integer        <NA>    NA              NA     Not compared 0 attributes
5          arm     3       character        <NA>    NA              NA     Not compared 0 attributes
11     fu.time     6         integer        <NA>    NA              NA     Not compared 0 attributes
10     fu.stat     7         integer        <NA>    NA              NA     Not compared 0 attributes
12        <NA>    NA              NA     fu_time    11         integer     Not compared 0 attributes
9         <NA>    NA              NA     fu stat    12         integer     Not compared 0 attributes
1         <NA>    NA              NA         Arm    13       character     Not compared 0 attributes