United Nations Voting Correlations

David Robinson

2017-08-13

Here we’ll examine an example application of the widyr package, particularly the pairwise_cor and pairwise_dist functions. We’ll use the data on United Nations General Assembly voting from the unvotes package:

library(dplyr)
library(unvotes)

un_votes
## # A tibble: 738,764 x 4
##     rcid                  country country_code   vote
##    <int>                    <chr>        <chr> <fctr>
##  1     3 United States of America           US    yes
##  2     3                   Canada           CA     no
##  3     3                     Cuba           CU    yes
##  4     3                    Haiti           HT    yes
##  5     3       Dominican Republic           DO    yes
##  6     3                   Mexico           MX    yes
##  7     3                Guatemala           GT    yes
##  8     3                 Honduras           HN    yes
##  9     3              El Salvador           SV    yes
## 10     3                Nicaragua           NI    yes
## # ... with 738,754 more rows

This dataset has one row for each country for each roll call vote. We’re interested in finding pairs of countries that tended to vote similarly.

Pairwise correlations

Notice that the vote column is a factor, with levels (in order) “yes”, “abstain”, and “no”:

levels(un_votes$vote)
## [1] "yes"     "abstain" "no"

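Because the levels run from agreement to opposition, converting the factor to its integer codes gives an ordinal scale on which a correlation is meaningful. We can then measure country-to-country agreement across all roll call votes using the pairwise_cor function.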

library(widyr)

cors <- un_votes %>%
  mutate(vote = as.numeric(vote)) %>%
  pairwise_cor(country, rcid, vote, use = "pairwise.complete.obs", sort = TRUE)
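
The mutate() step relies on how R converts a factor to a number: as.numeric() returns the position of each value's level, not the label itself. A minimal illustration on a made-up vector (the toy `vote` values here are an assumption, not rows from un_votes):

```r
# Toy factor with the same level order as un_votes$vote
vote <- factor(c("yes", "no", "abstain", "yes"),
               levels = c("yes", "abstain", "no"))

# as.numeric() gives the level positions: yes = 1, abstain = 2, no = 3
codes <- as.numeric(vote)
codes
## [1] 1 3 2 1
```

If the levels were stored in a different order (for example alphabetically), the codes would no longer form a yes/abstain/no scale, so it is worth checking levels() before converting.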

cors
## # A tibble: 39,800 x 3
##             item1          item2 correlation
##             <chr>          <chr>       <dbl>
##  1       Slovakia Czech Republic   0.9888333
##  2 Czech Republic       Slovakia   0.9888333
##  3      Lithuania        Estonia   0.9714049
##  4        Estonia      Lithuania   0.9714049
##  5      Lithuania         Latvia   0.9696069
##  6         Latvia      Lithuania   0.9696069
##  7        Germany  Liechtenstein   0.9677790
##  8  Liechtenstein        Germany   0.9677790
##  9       Slovakia       Slovenia   0.9657651
## 10       Slovenia       Slovakia   0.9657651
## # ... with 39,790 more rows
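
To make the result above more concrete, here is a rough base-R sketch of the idea behind a pairwise correlation: spread the data into a matrix with one row per roll call and one column per country, then correlate the columns. This is an illustration on invented data, not a description of how widyr implements pairwise_cor internally; the `toy` table and its values are assumptions.

```r
# Invented votes, already coded 1 = yes, 2 = abstain, 3 = no.
# Country C did not vote on roll call 4.
toy <- data.frame(
  country = c(rep("A", 4), rep("B", 4), rep("C", 3)),
  rcid    = c(1:4, 1:4, 1:3),
  vote    = c(1, 1, 3, 3,   # A
              1, 2, 3, 3,   # B: close to A
              3, 3, 1),     # C: opposite of A
  stringsAsFactors = FALSE
)

# One row per roll call, one column per country; missing votes become NA
wide <- tapply(toy$vote, list(toy$rcid, toy$country), mean)

# Correlate every pair of columns, using only the roll calls both countries voted on
toy_cor <- cor(wide, use = "pairwise.complete.obs")
round(toy_cor, 2)
```

Here A and B come out strongly positively correlated (about 0.90) and A and C perfectly negatively correlated. The matrix is symmetric, which is why each pair appears twice, once in each direction, in the tidy output above.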

We could, for example, find the countries that the US is most and least in agreement with:

US_cors <- cors %>%
  filter(item1 == "United States of America")

# Most in agreement (cors is already sorted in descending order by sort = TRUE)
US_cors
## # A tibble: 199 x 3
##                       item1                                                item2 correlation
##                       <chr>                                                <chr>       <dbl>
##  1 United States of America United Kingdom of Great Britain and Northern Ireland   0.5755822
##  2 United States of America                                               Canada   0.5594441
##  3 United States of America                                               Israel   0.5401690
##  4 United States of America                                          Netherlands   0.5154255
##  5 United States of America                                           Luxembourg   0.5049859
##  6 United States of America                                            Australia   0.5018343
##  7 United States of America                                              Belgium   0.4964066
##  8 United States of America                                                Italy   0.4666960
##  9 United States of America                                          New Zealand   0.4581041
## 10 United States of America                                                Japan   0.4577422
## # ... with 189 more rows

# Least in agreement
US_cors %>%
  arrange(correlation)
## # A tibble: 199 x 3
##                       item1                item2 correlation
##                       <chr>                <chr>       <dbl>
##  1 United States of America              Belarus  -0.3584770
##  2 United States of America       Czechoslovakia  -0.3297787
##  3 United States of America                 Cuba  -0.3061703
##  4 United States of America   Russian Federation  -0.3006679
##  5 United States of America                Egypt  -0.2467654
##  6 United States of America                India  -0.2430560
##  7 United States of America Syrian Arab Republic  -0.2380441
##  8 United States of America          Afghanistan  -0.2289134
##  9 United States of America              Ukraine  -0.2251402
## 10 United States of America  Yemen Arab Republic  -0.2244060
## # ... with 189 more rows

This can be particularly useful when visualized on a map.

library(maps)
library(fuzzyjoin)
library(countrycode)
library(ggplot2)

world_data <- map_data("world") %>%
  regex_full_join(iso3166, by = c("region" = "mapname")) %>%
  filter(region != "Antarctica")

US_cors %>%
  mutate(a2 = countrycode(item2, "country.name", "iso2c")) %>%
  full_join(world_data, by = "a2") %>%
  ggplot(aes(long, lat, group = group, fill = correlation)) +
  geom_polygon(color = "gray", linewidth = .1) +  # on ggplot2 older than 3.4.0, use size = .1
  scale_fill_gradient2() +
  coord_quickmap() +
  theme_void() +
  labs(title = "Correlation of each country's UN votes with the United States",
       subtitle = "Blue indicates agreement, red indicates disagreement",
       fill = "Correlation w/ US")

Visualizing clusters in a network

Another useful kind of visualization is a network plot, which can be created with Thomas Pedersen’s ggraph package. We can filter for pairs of countries with correlations above a particular threshold.

library(ggraph)
library(igraph)

cors_filtered <- cors %>%
  filter(correlation > .6)

continents <- tibble(country = unique(un_votes$country)) %>%
  filter(country %in% cors_filtered$item1 |
         country %in% cors_filtered$item2) %>%
  mutate(continent = countrycode(country, "country.name", "continent"))

set.seed(2017)

cors_filtered %>%
  graph_from_data_frame(vertices = continents) %>%
  ggraph() +
  geom_edge_link(aes(edge_alpha = correlation)) +
  geom_node_point(aes(color = continent), size = 3) +
  geom_node_text(aes(label = name), check_overlap = TRUE, vjust = 1, hjust = 1) +
  theme_void() +
  labs(title = "Network of countries with correlated United Nations votes")
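
Since cors lists every pair twice, once in each direction, the filtered table passed to graph_from_data_frame contains two edges per pair of countries. If that matters for the plot, one possible approach is to keep only the rows where item1 sorts before item2. A small sketch on an invented table (the `pairs` data frame is hypothetical):

```r
# Invented symmetric pair table, shaped like the output of pairwise_cor
pairs <- data.frame(
  item1       = c("A", "B", "A", "C"),
  item2       = c("B", "A", "C", "A"),
  correlation = c(0.9, 0.9, 0.7, 0.7),
  stringsAsFactors = FALSE
)

# Keep one row per unordered pair
undirected <- pairs[pairs$item1 < pairs$item2, ]
undirected
```

The same condition could be written as filter(item1 < item2) in the dplyr pipeline used throughout this document.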

Choosing the threshold for filtering correlations (or other measures of similarity) typically requires some trial and error. Setting too high a threshold will make a graph too sparse, while too low a threshold will make a graph too crowded.
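
One way to shorten the trial and error is to tabulate how many edges and nodes survive at several candidate thresholds before drawing anything. The sketch below uses an invented table of correlations (`toy_cors` and the threshold values are assumptions) and base R only, so it runs without extra packages:

```r
# Invented pairwise correlations, one row per pair
toy_cors <- data.frame(
  item1       = c("A", "A", "A", "B", "B", "C"),
  item2       = c("B", "C", "D", "C", "D", "D"),
  correlation = c(0.95, 0.72, 0.40, 0.65, 0.55, 0.81),
  stringsAsFactors = FALSE
)

thresholds <- c(0.5, 0.6, 0.7, 0.8, 0.9)

# For each threshold, count the remaining edges and the distinct nodes they touch
threshold_summary <- data.frame(
  threshold = thresholds,
  n_edges = sapply(thresholds, function(th) sum(toy_cors$correlation > th)),
  n_nodes = sapply(thresholds, function(th) {
    kept <- toy_cors[toy_cors$correlation > th, ]
    length(unique(c(kept$item1, kept$item2)))
  })
)
threshold_summary
```

Applied to the real cors table, a summary like this gives a rough sense of where the graph changes from crowded to sparse, which narrows the range of thresholds worth plotting.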