Mean Relative Indifference

A Comparison of R Comparison Functions

Brodie Gaslam

Most R object comparison functions are good at telling you that objects are different, but less so at conveying how they are different. I wrote diffobj to provide an “aha, that’s how they are different” comparison. In this vignette I will compare diffPrint to all.equal and to testthat::compare.

Disclaimer: I picked the examples here to showcase diffobj capabilities, not to carry out a fair and balanced comparison of these comparison functions. Nonetheless, I hope you will find the examples representative of common situations where comparison of R objects is useful.

Vectors

I defined four pairs of numeric vectors for us to compare. I purposefully hid the variable definitions to simulate a comparison of unknown objects.

Stage 1

all.equal(A1, B1)
## [1] "Mean relative difference: 0.1"

The objects are different… At this point I would normally print both A1 and B1 to try to figure out how that difference came about since the “mean relative difference” is unhelpful.

testthat::compare(A1, B1)
## 1/10 mismatches
## [10] 10 - 11 == -1

testthat::compare does a better job, but I still feel the need to look at A1 and B1.

diffPrint(A1, B1)
@@ 1 @@                       @@ 1 @@
< [1] 1 2 3 4 5 6 7 8 9 10    > [1] 1 2 3 4 5 6 7 8 9 11

Aha, that’s how they are different!
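
The definitions stay hidden in this vignette, but one pair of vectors that would be consistent with the output above is sketched below. The values of A1 and B1 are assumptions inferred from the printed results, not the original definitions. The sketch also recomputes by hand the figure all.equal reports which, as far as I understand the default behaviour, is the mean absolute difference over the mismatched elements divided by the mean absolute value of the matching target elements.

```r
# Assumed definitions: inferred from the output above, not the vignette's own.
A1 <- as.numeric(1:10)
B1 <- replace(A1, 10, 11)   # same as A1 except for the last element

# Only the mismatched elements enter the calculation.
differs <- A1 != B1
rel_diff <- mean(abs(A1[differs] - B1[differs])) / mean(abs(A1[differs]))

which(differs)      # position of the mismatch
rel_diff            # hand-computed relative difference
all.equal(A1, B1)   # compare with the message all.equal produces
```

The single number hides both where the mismatch is and how many elements are involved, which is why printing the objects is usually the next step.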

Stage 2

Let’s up the difficulty a little bit:

testthat::compare(A2, B2)
## 20/20 mismatches (average diff: 1.9)
## [1] 1 - 20 == -19
## [2] 2 -  1 ==   1
## [3] 3 -  2 ==   1
## [4] 4 -  3 ==   1
## [5] 5 -  4 ==   1
## [6] 6 -  5 ==   1
## [7] 7 -  6 ==   1
## [8] 8 -  7 ==   1
## [9] 9 -  8 ==   1
## ...

If you look closely you will see that despite the reported 20/20 mismatches, the two vectors are actually similar, at least in the visible part of the output. With diffPrint it is obvious that B2 is the same as A2, except that the last value has been moved to the first position:

diffPrint(A2, B2)
@@ 1,2 @@                            @@ 1,2 @@
< [1] 1 2 3 4 5 6 7 8 9 10 11        > [1] 20 1 2 3 4 5 6 7 8 9 10
< [12] 12 13 14 15 16 17 18 19 20    > [12] 11 12 13 14 15 16 17 18 19
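
A base R sketch of why an element-by-element comparison reports 20/20 mismatches here. The definitions of A2 and B2 are assumptions inferred from the diffPrint output, not the hidden originals.

```r
# Assumed definitions, inferred from the diffPrint output above.
A2 <- as.numeric(1:20)
B2 <- c(A2[20], A2[-20])    # last value moved to the first position

sum(A2 != B2)               # number of positions that mismatch
mean(abs(A2 - B2))          # average absolute difference

# Dropping the moved value from each side leaves identical vectors.
identical(A2[-20], B2[-1])
```

A positional comparison treats the one-element shift as twenty separate changes, whereas a diff aligns the shared run of values first.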

Stage 3

testthat::compare throws in the towel as soon as lengths are unequal:

testthat::compare(A3, B3)
## Lengths differ: 20 is not 21

all.equal does the same. diffPrint is unfazed:

diffPrint(A3, B3)
@@ 1,2 @@                            @@ 1,2 @@
< [1] 1 2 3 4 5 6 7 8 9 10 11        > [1] 20 21 1 2 3 4 5 6 7 8 9
< [12] 12 13 14 15 16 17 18 19 20    > [12] 10 11 12 13 14 15 16 17 18 19
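
With unequal lengths an element-wise comparison has nothing to line up, although base R set operations can still hint at what changed. A3 and B3 below are assumed definitions that are consistent with the output above, not the hidden originals.

```r
# Assumed definitions, inferred from the diffPrint output above.
A3 <- as.numeric(1:20)
B3 <- c(20, 21, A3[-20])    # 21 elements: two values in front of 1..19

c(length(A3), length(B3))
all.equal(A3, B3)           # reports only the length mismatch
setdiff(B3, A3)             # values that appear only in B3

# Everything after the first two elements of B3 matches A3 minus its last value.
identical(A3[-20], B3[-(1:2)])
```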

diffPrint also produces useful output for largish vectors:

A4 <- 1:1e4
B4 <- c(1e4 + 1, A4[-c(4:7, 9e3)])
diffPrint(A4, B4)
@@ 1,4 @@                            @@ 1,4 @@
< [1] 1 2 3 4 5                      > [1] 10001 1 2 3 8
< [6] 6 7 8 9 10                     > [6] 9 10 11 12 13
  [11] 11 12 13 14 15                  [11] 14 15 16 17 18
  [16] 16 17 18 19 20                  [16] 19 20 21 22 23
@@ 1798,5 @@                         @@ 1798,5 @@
  [8986] 8986 8987 8988 8989 8990      [8986] 8989 8990 8991 8992 8993
  [8991] 8991 8992 8993 8994 8995      [8991] 8994 8995 8996 8997 8998
< [8996] 8996 8997 8998 8999 9000    > [8996] 8999 9001 9002 9003 9004
  [9001] 9001 9002 9003 9004 9005      [9001] 9005 9006 9007 9008 9009
  [9006] 9006 9007 9008 9009 9010      [9006] 9010 9011 9012 9013 9014

Do note that the comparison algorithm scales with the square of the number of differences, so very large and different vectors will be slow to process.
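
One rough way to see how run time grows on your own machine is to time diffPrint while increasing the number of differences. The helper time_diff below is hypothetical (it is not part of diffobj), the vector sizes are arbitrary, and the timings will vary between machines, so treat this as a sketch rather than a benchmark.

```r
# Hypothetical helper: time diffPrint on two integer vectors that differ in
# `n_diff` evenly spread positions. Returns NA if diffobj is not installed.
time_diff <- function(n_diff, n = 2000) {
  if (!requireNamespace("diffobj", quietly = TRUE)) return(NA_real_)
  a <- seq_len(n)
  b <- a
  idx <- unique(round(seq(1, n, length.out = n_diff)))
  b[idx] <- -b[idx]                       # flip the sign at the chosen positions
  unname(system.time(diffobj::diffPrint(a, b))["elapsed"])
}

# Elapsed seconds for an increasing number of differences.
timings <- sapply(c(10, 100, 500), time_diff)
timings
```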

Objects

R Core and package authors put substantial effort into print and show methods. diffPrint takes advantage of this. Compare:

all.equal(iris, iris[-60,])
## [1] "Attributes: < Component \"row.names\": Numeric: lengths (150, 149) differ >"
## [2] "Component \"Sepal.Length\": Numeric: lengths (150, 149) differ"             
## [3] "Component \"Sepal.Width\": Numeric: lengths (150, 149) differ"              
## [4] "Component \"Petal.Length\": Numeric: lengths (150, 149) differ"             
## [5] "Component \"Petal.Width\": Numeric: lengths (150, 149) differ"              
##  [ reached getOption("max.print") -- omitted 3 entries ]

to:

diffPrint(iris, iris[-60,])
@@ 59,5 / 59,4 @@
~ Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  58 4.9 2.4 3.3 1.0 versicolor
  59 6.6 2.9 4.6 1.3 versicolor
< 60 5.2 2.7 3.9 1.4 versicolor
  61 5.0 2.0 3.5 1.0 versicolor
  62 5.9 3.0 4.2 1.5 versicolor
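
The underlying idea, comparing what print shows rather than the raw data, can be roughly approximated in base R. This is only an illustration of the principle under my own assumptions, not how diffobj is implemented, and it loses the alignment and context lines that diffPrint provides.

```r
# Capture the printed representation of each object as character lines.
printed_a <- capture.output(print(iris))
printed_b <- capture.output(print(iris[-60, ]))

# Lines that appear in the first printout but not in the second.
only_in_a <- setdiff(printed_a, printed_b)
only_in_a
```

Because the row names are part of each printed line, the removed row stands out immediately, which is the same information all.equal buries under a list of length mismatches.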

And:

all.equal(lm(hp ~ disp, mtcars), lm(hp ~ cyl, mtcars))
## [1] "Component \"coefficients\": Names: 1 string mismatch"          
## [2] "Component \"coefficients\": Mean relative difference: 2.778944"
## [3] "Component \"residuals\": Mean relative difference: 0.7074011"  
## [4] "Component \"effects\": Names: 1 string mismatch"               
## [5] "Component \"effects\": Mean relative difference: 0.5907086"    
##  [ reached getOption("max.print") -- omitted 9 entries ]

to:

diffPrint(lm(hp ~ disp, mtcars), lm(hp ~ cyl, mtcars))
@@ 1,8 @@                                   @@ 1,8 @@

  Call:                                       Call:
< lm(formula = hp ~ disp, data = mtcars)    > lm(formula = hp ~ cyl, data = mtcars)

  Coefficients:                               Coefficients:
< (Intercept) disp                          > (Intercept) cyl
< 45.7345 0.4376                            > -51.05 31.96

In these examples I limited all.equal output to five lines for the sake of brevity. Also, since testthat::compare reverts to all.equal output with more complex objects, I omit it from this comparison.
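
The "reached getOption("max.print")" notes in the output suggest the truncation was done through the max.print option; that is an inference from the printed output rather than something stated here. A minimal sketch of two ways to keep all.equal output short:

```r
msgs <- all.equal(iris, iris[-60, ])

# Option 1: keep only the first five messages.
first_five <- head(msgs, 5)
first_five

# Option 2: let print truncate, then restore the previous setting.
old <- options(max.print = 5)
print(msgs)
options(old)
```

The first option changes the object you work with, while the second only changes how it is displayed.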

Parting Thoughts

Another candidate comparison function is compare::compare. I omitted it from this vignette because it focuses more on similarities than on differences. Additionally, the print methods of testthat::compare and compare::compare conflict, so the two cannot be used together.

For a more thorough exploration of diffobj methods and their features please see the primary diffobj vignette.