Exploring the 2013 Election data

Ben Marwick, Di Cook, Heike Hofmann, Rob Hyndman,Carson Sievert, Nicholas Tierney, Fang Zhou

2017-05-24

Introduction

eechidna (Electoral Commission Handling Intelligent Data Nationally for Australia) is an R package that makes it easy to look at the data from the 2011 Australian Census, and the 2013 Federal Election.

This vignette documents how to access the data from the 2013 Federal Election (House of Representatives (first preferences and two party preferred). We obtained these data from http://results.aec.gov.au/ . This package incorporates data that is copyright Commonwealth of Australia (Australian Electoral Commission) 2013.

In this vignette we show a few typical methods to explore the data, inspired by Peter Ellis’s work on NZ election data.

2013 Federal Election House of Representatives data

In the eechidna package we have four election datasets, summarised here:

Data object name Description
aec2013_fp first preference votes for candidates at each polling place
aec2013_fp_electorate first preference votes for candidates at each electorate
aec2013_2pp two party preferred votes for candidates at each polling place
aec2013_2pp_electorate two party preferred votes for candidates at each electorate

We can access and explore each of these using common methods in R, for example:

library(eechidna)
library(plyr)
library(dplyr)
library(purrr)
library(knitr)
library(broom)
library(tidyr)
library(purrrlyr)
# inspect the data frame
glimpse(aec2013_fp)
## Observations: 82,398
## Variables: 26
## $ ID                 <int> 101, 101, 101, 101, 101, 101, 101, 101, 101...
## $ Electorate         <chr> "Canberra", "Canberra", "Canberra", "Canber...
## $ PollingPlaceID     <int> 8829, 8829, 8829, 8829, 8829, 8829, 8829, 6...
## $ PollingPlace       <chr> "Barton", "Barton", "Barton", "Barton", "Ba...
## $ CandidateID        <int> 24857, 24895, 24239, 24118, 24122, 24238, 9...
## $ Surname            <chr> "SEFTON", "MELROSE", "HANLEY", "MAHER", "BR...
## $ GivenNm            <chr> "Tom", "Julie", "Tony", "Damien", "Gai", "N...
## $ BallotPosition     <int> 1, 2, 3, 4, 5, 6, 999, 1, 2, 3, 4, 5, 6, 99...
## $ Elected            <chr> "N", "N", "N", "N", "Y", "N", "N", "N", "N"...
## $ HistoricElected    <chr> "N", "N", "N", "N", "Y", "N", "N", "N", "N"...
## $ PartyAb            <chr> "LP", "GRN", "PUP", "BTA", "ALP", "SPA", ""...
## $ PartyNm            <chr> "Liberal", "The Greens", "Palmer United Par...
## $ OrdinaryVotes      <int> 742, 338, 40, 88, 795, 23, 69, 290, 143, 40...
## $ Swing              <dbl> 4.06, -7.89, 1.97, 4.34, -3.64, 1.14, -0.19...
## $ State              <chr> "ACT", "ACT", "ACT", "ACT", "ACT", "ACT", "...
## $ PollingPlaceTypeID <int> 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5...
## $ PremisesNm         <chr> "Telopea Park School", "Telopea Park School...
## $ PremisesAddress1   <chr> "New South Wales Cres", "New South Wales Cr...
## $ PremisesAddress2   <chr> "", "", "", "", "", "", "", "", "", "", "",...
## $ PremisesAddress3   <chr> "", "", "", "", "", "", "", "", "", "", "",...
## $ PremisesSuburb     <chr> "BARTON", "BARTON", "BARTON", "BARTON", "BA...
## $ PremisesStateAb    <chr> "ACT", "ACT", "ACT", "ACT", "ACT", "ACT", "...
## $ PremisesPostCode   <int> 2600, 2600, 2600, 2600, 2600, 2600, 2600, 2...
## $ Latitude           <dbl> -35.31510, -35.31510, -35.31510, -35.31510,...
## $ Longitude          <dbl> 149.1350, 149.1350, 149.1350, 149.1350, 149...
## $ uid                <chr> "1018829SEFTONTom7424.06", "1018829MELROSEJ...
# show the first few rows
head(aec2013_fp) %>% kable
ID Electorate PollingPlaceID PollingPlace CandidateID Surname GivenNm BallotPosition Elected HistoricElected PartyAb PartyNm OrdinaryVotes Swing State PollingPlaceTypeID PremisesNm PremisesAddress1 PremisesAddress2 PremisesAddress3 PremisesSuburb PremisesStateAb PremisesPostCode Latitude Longitude uid
101 Canberra 8829 Barton 24857 SEFTON Tom 1 N N LP Liberal 742 4.06 ACT 1 Telopea Park School New South Wales Cres BARTON ACT 2600 -35.3151 149.135 1018829SEFTONTom7424.06
101 Canberra 8829 Barton 24895 MELROSE Julie 2 N N GRN The Greens 338 -7.89 ACT 1 Telopea Park School New South Wales Cres BARTON ACT 2600 -35.3151 149.135 1018829MELROSEJulie338-7.89
101 Canberra 8829 Barton 24239 HANLEY Tony 3 N N PUP Palmer United Party 40 1.97 ACT 1 Telopea Park School New South Wales Cres BARTON ACT 2600 -35.3151 149.135 1018829HANLEYTony401.97
101 Canberra 8829 Barton 24118 MAHER Damien 4 N N BTA Bullet Train For Australia 88 4.34 ACT 1 Telopea Park School New South Wales Cres BARTON ACT 2600 -35.3151 149.135 1018829MAHERDamien884.34
101 Canberra 8829 Barton 24122 BRODTMANN Gai 5 Y Y ALP Australian Labor Party 795 -3.64 ACT 1 Telopea Park School New South Wales Cres BARTON ACT 2600 -35.3151 149.135 1018829BRODTMANNGai795-3.64
101 Canberra 8829 Barton 24238 BURT Nicolle 6 N N SPA Secular Party of Australia 23 1.14 ACT 1 Telopea Park School New South Wales Cres BARTON ACT 2600 -35.3151 149.135 1018829BURTNicolle231.14

We can see that the data contains 82,398 rows and 26 columns. Each of the four data objects can be accessed and inspected in this way. Note that some parties have slightly different names in different states. For example there is ‘The Nationals’ and the ‘National Party’. We’ve simplified the data by renaming some of the parties to the varient of the name that won the most votes.

Which party won the election?

We can summarise the data to discover some basic details about the election. Let’s start by reproducing the overall result of the election by finding out which party won the most electorates according to the two party preferred votes:

who_won <- aec2013_2pp_electorate %>% 
  group_by(PartyNm) %>% 
  tally() %>% 
  arrange(desc(n)) 

# inspect
who_won %>% 
  kable()
PartyNm n
Liberal 58
Australian Labor Party 55
Liberal National Party of Queensland 22
The Nationals 9
Independent 2
Country Liberals (NT) 1
Katter’s Australian Party 1
Palmer United Party 1
The Greens 1
# plot
library(ggplot2)
library(scales)
ggplot(who_won, 
       aes(reorder(PartyNm, n), 
           n)) +
  geom_point(size = 2) + 
  coord_flip() + 
  scale_y_continuous(labels = comma) +
  theme_bw() +
  ylab("Total number of electorates") +
  xlab("Party") +
  theme(text = element_text(size=10))

This result above matches the official AEC result, showing the Liberal party winning the most electorates.

Which party received the most ordinary votes?

An alternative way to evaluate the outcome of the election is by counting the number of ordinary first preference votes for each party (not including postal votes, preference flows, etc.). Here we can find the total number of ordinary votes for each party:

total_votes_for_parties <- aec2013_fp %>% 
  select(PartyNm, OrdinaryVotes) %>% 
  group_by(PartyNm) %>% 
  summarise(total_votes = sum(OrdinaryVotes, rm.na = TRUE)) %>% 
  ungroup() %>%
  arrange(desc(total_votes))

#  inspect
total_votes_for_parties %>% 
  head %>% # just the first 10 rows
  kable
PartyNm total_votes
Australian Labor Party 3611179
Liberal 3392461
Liberal National Party of Queensland 932725
The Greens 898411
Informal 709482
Palmer United Party 595217

The table above shows a similar result, with the Liberal Party recieving the most ordinary votes.

We can plot the ordinary vote count like so:

ggplot(total_votes_for_parties, 
       aes(reorder(PartyNm, total_votes), 
           total_votes)) +
  geom_point(size = 2) + 
  coord_flip() + 
  scale_y_continuous(labels = comma) +
  theme_bw() +
  ylab("Total ordinary votes") +
  xlab("Party") +
  theme(text = element_text(size=10))

Which candidate did the best?

There are a few ways we can answer this question. We can see who received the highest number of ordinary votes, like this:

who_most_votes <- 
aec2013_fp %>% 
  filter(CandidateID != 999) %>% #exclude informal votes
  mutate(candidate_full_name = paste0(GivenNm, " ", Surname, " (", CandidateID, ")")) %>% 
  group_by(candidate_full_name) %>% 
  summarise(total_votes_for_candidate = sum(OrdinaryVotes, rm.na = TRUE)) %>% 
  arrange(desc(total_votes_for_candidate))
# inspect
who_most_votes %>% 
  mutate(total_votes_for_candidate = 
           prettyNum(total_votes_for_candidate, 
                     big.mark = ","))  %>% 
  head %>% 
  kable
candidate_full_name total_votes_for_candidate
Andrew LEIGH (24147) 50,592
Mark COULTON (23248) 49,239
Alex HAWKE (23206) 48,833
Scott MORRISON (23219) 48,337
John COBB (23245) 47,929
Bronwyn BISHOP (23184) 47,911

But this is a crude measure of how successful a candidate was in the election, because electorates with more people will result in more votes for the winner, even if the victory is only by a tiny margin. So let’s remove the effect of the population size in the electorates by finding out who won the highest proportion of votes in their electorate.

who_most_votes_prop <- 
aec2013_fp %>% 
  filter(CandidateID != 999) %>% #exclude informal votes
  mutate(candidate_full_name = 
           paste0(GivenNm, " ", Surname, " (", CandidateID, ")")) %>% 
  group_by(Electorate, candidate_full_name) %>% 
  summarise(sum_votes = sum(OrdinaryVotes))  %>% 
  mutate(prop_votes = round(sum_votes / sum(sum_votes), 3),
         sum_votes = prettyNum(sum_votes, ",")) %>% 
  ungroup %>% 
  arrange(desc(prop_votes))

# inspect
who_most_votes_prop %>% 
  data.frame %>% 
  head %>% 
  kable
Electorate candidate_full_name sum_votes prop_votes
Mitchell Alex HAWKE (23206) 48,832 0.659
Bradfield Paul FLETCHER (23200) 47,869 0.653
Parkes Mark COULTON (23248) 49,238 0.632
Wentworth Malcolm TURNBULL (23199) 46,367 0.631
Mackellar Bronwyn BISHOP (23184) 47,910 0.624
Curtin Julie BISHOP (23840) 41,589 0.623

In the above table (which shows just the first ten rows for convienience) we see that Alex Hawke received the highest proportion of the vote in any electorate, as well as a few more familiar names doing very well in their electorates. We can see that taking about 65% of the vote in an electorate counts as a huge win. But what is the smallest winning margin that occured in the 2013 election? Let’s find the candidates who won their electorate by the slimmest advantage over their competitors.

Which candidate won with the smallest number of votes?

who_won_least_votes_prop <- 
 aec2013_fp %>% 
   filter(CandidateID != 999) %>% # keep only the winners
   mutate(candidate_full_name = 
            paste0(GivenNm, " ", Surname, " (", CandidateID, ")")) %>% 
   group_by(Electorate, candidate_full_name) %>% 
   summarise(sum_votes = sum(OrdinaryVotes)) %>% 
   mutate(prop_votes = round(sum_votes / sum(sum_votes), 2)) %>% 
   ungroup %>% 
   left_join(aec2013_2pp_electorate %>% 
               mutate(candidate_full_name = 
            paste0(GivenNm, " ", Surname, " (", CandidateID, ")")), "candidate_full_name") %>% 
   filter(Elected == "Y") %>% 
   select(Electorate.x, 
          candidate_full_name, 
          prop_votes, 
          PartyNm) %>% 
   arrange(prop_votes)

# have a look
who_won_least_votes_prop %>%
 head %>%
 kable
Electorate.x candidate_full_name prop_votes PartyNm
Fairfax Clive Frederick PALMER (23396) 0.27 Palmer United Party
Kennedy Bob KATTER (24318) 0.29 Katter’s Australian Party
Melbourne Ports Michael DANBY (23314) 0.32 Australian Labor Party
Indi Cathy McGOWAN (23288) 0.33 Independent
Richmond Justine ELLIOT (24531) 0.34 Australian Labor Party
Bendigo Lisa CHESTERS (23318) 0.37 Australian Labor Party

The table above shows that Clive Palmer won his electorate by the smallest margin, just 27% of the ordinary votes. Bob Katter also won with a very small margin, 29% of the first preferences. Let’s see the distribution of votes in Fairfax to see a bit more detail:

fairfax <- 
aec2013_fp %>% 
  filter(CandidateID != 999) %>% # exclude informal votes
  filter(Electorate == "Fairfax") %>% 
  mutate(candidate_full_name = 
           paste0(GivenNm, " ", 
                  Surname, 
                  " (", 
                  CandidateID, 
                  ")")) %>% 
  group_by(Electorate, candidate_full_name) %>% 
  summarise(sum_votes = sum(OrdinaryVotes)) %>% 
  mutate(prop_votes = round(sum_votes / sum(sum_votes), 2),
         sum_votes = prettyNum(sum_votes, big.mark = ",")) %>% 
  ungroup %>% 
  arrange(desc(prop_votes))

# inspect
fairfax %>% 
  head %>% 
  kable
Electorate candidate_full_name sum_votes prop_votes
Fairfax Ted O’BRIEN (23792) 28,590 0.41
Fairfax Clive Frederick PALMER (23396) 19,110 0.27
Fairfax Elaine Lexie HUGHES (23746) 12,612 0.18
Fairfax David KNOBEL (23946) 5,758 0.08
Fairfax Angela MEYER (24815) 1,143 0.02
Fairfax Ray SAWYER (24322) 1,245 0.02

We see that the majority of ordinary votes of first preferences in Fairfax actually went to Ted O’Brien, not to Clive Palmer. Palmer’s win is due to the preference flow from minor conservative parties, details of this are here: http://www.tallyroom.com.au/19908

How did each electorate vote in each state?

We can look at the proportion of voters for a particular party in each electorate, grouped by each state. This plot was inspired by Peter Ellis’s work on the NZ election data.

# summarise and compute proportion of votes for a particular party
p <- aec2013_fp %>%
    filter(CandidateID != 999) %>% # exclude informal votes
  group_by(Electorate, State) %>%
  summarise(
    TotalVotes = sum(OrdinaryVotes),
    ProportionLabor = round(sum(OrdinaryVotes[PartyNm == "Australian Labor Party"]) / TotalVotes, 3)) %>%
  filter(TotalVotes != 0) %>% 
  arrange(desc(ProportionLabor)) %>% 
  group_by(State) %>% 
  # send the data to the plotting function
  do(plots=ggplot(data = .) + 
       aes(x = ProportionLabor, 
           y = reorder(Electorate, ProportionLabor), 
           size = TotalVotes, 
           label = State) +
       geom_point() +
       ylab("Electorate") +
       labs(title = .$State) + 
       scale_x_continuous("Proportion voting Labor Party", 
                          label = percent) +
       scale_size("Number of\nvotes cast", 
                  label = comma)  +
       theme_bw() +
       theme(text = element_text(size=10)))

# draw the plots
library(gridExtra)
n <- length(p$plots)
nCol <- floor(sqrt(n))
do.call("grid.arrange", c(p$plots, ncol=nCol))

How are party votes correlated?

We can also see how party votes are correlated, for example here we can see a strong negative correlation between votes for Labor and votes for Liberal (this plot also inspired by Peter Ellis). Green and Labor have a slight negative correlation, and Green and Liberal have a slightly more negative correlation:

# Comparing party and candidate votes of several parties -------
proportions <- aec2013_fp %>%
  filter(CandidateID != 999) %>% # exclude informal votes
  group_by(Electorate) %>%
  summarise(Prop_Labour = sum(OrdinaryVotes[PartyNm == "Australian Labor Party"]) / sum(OrdinaryVotes),
            Prop_Coalition = sum(
              OrdinaryVotes[PartyNm == "Liberal"],
              OrdinaryVotes[PartyNm == " Country Liberal Party"],
              OrdinaryVotes[PartyNm == "The Nationals"],
              OrdinaryVotes[PartyNm == "Country Liberals (NT)"]
              ) / sum(OrdinaryVotes),
            Prop_Greens = sum(OrdinaryVotes[PartyNm == "The Greens"]) / sum(OrdinaryVotes)) 

# make a scatterplot matrix
library(GGally)
ggpairs(proportions, columns = 2:ncol(proportions)) + theme_bw()

How do election results correlate with census data?

One of our motivations for creating this package was to make it easy to explore relationships between political views and socio-economic conditions across Australia. With a few simple R functions, we can join the census data to the election data and build models to explore relationships.

Here’s how we join the two datasets together, the common variable is the Electorate:

census_and_election <- left_join(aec2013_2pp_electorate, 
                                 abs2011,
                                 by = c("Electorate" = "Electorate"))

Now we can easily explore some relationships between the census data and the two party preferring voting patterns. For example, what census variable mostly strongly correlates with the percentage that vote Labor? The first step in these kinds of explorations is always to visualise the data:

# subset only the columns we want for the model
census_and_election_subset <-  
  census_and_election %>% 
  ungroup %>% 
  select(Electorate, 
         Average_Australian_Labor_Party_Percentage_in_electorate,
         Population:NotOwned)  %>% 
  rename(ALP_perc = Average_Australian_Labor_Party_Percentage_in_electorate)

library(corrplot)
M <- cor(census_and_election_subset[, c(2:ncol(census_and_election_subset))], 
         use = "pairwise.complete.obs")
corrplot.mixed(M, 
               lower="ellipse", 
               upper="number", 
               tl.pos = "lt",
               tl.cex = 0.5,
               tl.col = "black",
               number.cex= 0.5)

It’s a little hard to read because we have so many variables. But we can spot some potentially interesting correlations, as well as some obvious ones.

The table below gives an overall summary of the model containing all the census variables. We see that the p-value is very low, indicating that we have some interactions that are probably not due to chance alone.

options(scipen = 10) # for more readable numbers
census_variables <- names(abs2011)[-c(1:3)]

# compute the multiple regressions
mutliple_regression_model <- 
  census_and_election %>% 
  ungroup %>% 
  select(
         Average_Australian_Labor_Party_Percentage_in_electorate,
         Population:NotOwned) %>% 
  lm(Average_Australian_Labor_Party_Percentage_in_electorate ~ ., 
     data = . )

mutliple_regression_model %>% 
  glance %>% 
  dmap(round, 3) %>% 
  kable
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
0.748 0.677 6.38 10.648 0 33 -465.595 999.189 1101.095 4680.475 115

The table below shows the variables that have a significant effect on the percentage of Labor votes across all electorates:

# find the variables with a significant effect
mutliple_regression_model %>% 
  tidy %>% 
  filter(p.value < 0.05) %>% 
  dmap_if(is.numeric, round, 3) %>% 
  arrange(p.value) %>% 
  kable
term estimate std.error statistic p.value
Unemployed 4.197 1.017 4.127 0.000
NotOwned -2.532 0.693 -3.651 0.000
EnglishOnly -0.764 0.230 -3.324 0.001
Internet 3.285 1.001 3.280 0.001
Bachelor -1.607 0.706 -2.277 0.025

In the plots below, we can see these census variables that have a statistically significant relationship with the percentage of Labor votes. There seems to be a positive relationship voting Labor and having a Bachelor degree, having internet access, being unemployed and not owning a home. We see EnglishOnly as a negatively correlated variable, meaning that increases in the proportion of households that speak English only correlate with decreases in the percentage of the Labor vote in that electorate. In general, more educated and multicultural electorates tend to vote more for Labor.

# model that includes all census variables
all_vars <- mutliple_regression_model %>% 
  tidy %>% 
  filter(p.value < 0.05) %>% 
  arrange(p.value) 

census_and_election %>% 
  ungroup %>% 
  select_(.dots = all_vars$term,
          "Average_Australian_Labor_Party_Percentage_in_electorate")  %>% 
  gather(variable, 
         value, 
         -Average_Australian_Labor_Party_Percentage_in_electorate) %>% 
  ggplot(aes(value, Average_Australian_Labor_Party_Percentage_in_electorate)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap( ~ variable, scales = "free") +
  theme_bw()

Senate results

A quick look at the senate results for 2013, using barchart by overall composition of the senate, and by state.

library(readr)
senate <- read_csv("http://results.aec.gov.au/17496/Website/Downloads/SenateSenatorsElectedDownload-17496.csv", 
    skip = 1)

coalition <- c("Country Liberals (NT)", "Liberal", "Liberal National Party of Queensland", 
    "The Nationals")

labor <- c("Australian Labor Party", "Australian Labor Party (Northern Territory) Branch", 
    "Labor")

greens <- c("The Greens", "Australian Greens", "The Greens (WA)")

senate <- senate %>% mutate(PartyNm = ifelse(as.character(PartyNm) %in% coalition, 
    "Liberal National Coalition", PartyNm))

senate <- senate %>% mutate(PartyNm = ifelse(as.character(PartyNm) %in% labor, 
    "Australian Labor Party", PartyNm))

senate <- senate %>% mutate(PartyNm = ifelse(as.character(PartyNm) %in% greens, 
    "Australian Greens", PartyNm))

senate$PartyNm <- factor(senate$PartyNm, levels = names(sort(table(senate$PartyNm), 
    decreasing = T)))

# Order = Liberal National Coalition, Australian Labor Party, The Greens,
# Palmer, motoring, sports, family first, Lib Dems, Nick Xenophon
partycolours = c("#0066CC", "#FF0033", "#009900", "#FFFF00", "#00008B", "#0000FF", 
    "#87CEFA", "#C71585", "#FF4500")

ggplot(data = senate, aes(x = PartyNm, fill = PartyNm)) + geom_bar() + xlab("") + 
    ylab("") + scale_fill_manual(name = "Party", values = partycolours) + coord_flip() + 
    theme(legend.position = "None")

senate$StateAb <- factor(senate$StateAb, levels = c("QLD", "NSW", "WA", "VIC", 
    "SA", "TAS", "ACT", "NT"))

ggplot(data = senate, aes(x = StateAb, fill = PartyNm)) + geom_bar() + xlab("") + 
    ylab("") + scale_fill_manual(name = "Party", values = partycolours)