2.5 A further example: the Scottish independence referendum


Example: The Scottish Referendum on Independence
The results of the Scottish referendum on independence were of enormous interest to the UK. The Financial Times reported the voting patterns separately by Council area, together with a variety of characteristics of each Council area. The file scottish-referendum.dat contains this information, together with data on population from the General Register Office for Scotland. Specifically, the information in this file is:

Scottish.Council         the council name
Voted.no                 the percentage of people who voted no
Turnout                  the percentage of people who voted
Population               the number of people eligible to vote
Unemployment.rate        the percentage unemployed
Scottish.identity.only   the percentage who identify themselves as Scottish only
Aged.16                  the percentage who are 16 years of age
Aged.over.50             the percentage who are over 50 years of age
Aged.over.65             the percentage who are over 65 years of age

We can use this information to explore the demographic and geographic patterns of voting.


The first thing we need to do is to read the data. The rp.datalink function from the rpanel package provides a convenient way of locating the dataset, as described in Section 2.4 above. Here we also use the head function to inspect the first few rows.

path <- rp.datalink('scottish_referendum')
ref  <- read.table(path, header = TRUE)
head(ref)
##        Scottish.Council Voted.no Turnout Population Unemployment.rate
## 1              Aberdeen    58.61   81.75     227130               1.2
## 2         Aberdeenshire    60.36   87.19     257740               0.7
## 3                 Angus    56.32   85.84     116240               2.0
## 4       Argyll and Bute    58.52   88.21      88050               1.9
## 5      Clackmannanshire    53.80   88.59      51280               3.8
## 6 Dumfries and Galloway    65.67   87.49     150270               2.4
##   Scottish.identity.only Aged.16 Aged.over.50 Aged.over.65
## 1                  54.73    0.83        32.13        14.38
## 2                  61.27    1.20        37.29        16.07
## 3                  66.81    1.20        41.31        19.88
## 4                  57.39    1.14        44.67        21.93
## 5                  66.98    1.17        36.61        15.96
## 6                  59.62    1.19        44.07        21.84
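
Beyond head, a few other base R functions give a quick overview of a data frame. The sketch below is not part of the original analysis. To keep it self-contained, it rebuilds a small data frame, here called ref6 (a name introduced only for illustration), from four of the columns in the six rows printed above, rather than reading the full file.

```r
# Illustrative only: four columns from the six rows shown by head(ref)
ref6 <- data.frame(
  Scottish.Council = c("Aberdeen", "Aberdeenshire", "Angus",
                       "Argyll and Bute", "Clackmannanshire",
                       "Dumfries and Galloway"),
  Voted.no   = c(58.61, 60.36, 56.32, 58.52, 53.80, 65.67),
  Turnout    = c(81.75, 87.19, 85.84, 88.21, 88.59, 87.49),
  Population = c(227130, 257740, 116240, 88050, 51280, 150270)
)

dim(ref6)               # number of rows and columns
names(ref6)             # variable names
str(ref6)               # storage type of each variable
summary(ref6$Voted.no)  # minimum, quartiles, mean and maximum
```

The same calls applied to ref would describe the full dataset.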

R can act as a simple calculator. For example, consider the variables recording the percentage of people voting ‘no’ and the percentage of people who voted. We can calculate the number of people who voted in each Council area, and then confirm the overall percentage who voted ‘no’ across the country, by

ref$Vote <- ref$Population * ref$Turnout / 100
sum(ref$Voted.no * ref$Vote) / sum(ref$Vote)
## [1] 55.31357

The first instruction multiplies each population by the corresponding turnout percentage and divides by 100 to find the number of people who voted in each Council area. Notice that the operation is performed element by element, pairing each element of ref$Population with the corresponding element of ref$Turnout. The resulting vector of numbers is stored in a new column of the ref data frame, with the variable name Vote. The second instruction calculates a weighted average of the percentages voting ‘no’, using the number of people voting in each Council area as the weights. This gives the overall percentage who voted ‘no’.
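
As a small check on this reasoning, the same weighted average can be obtained from the base R function weighted.mean. The sketch below uses only the six Council areas printed by head earlier in this section, so its result describes those six areas and not the country as a whole. The object names by.hand and built.in are introduced here purely for illustration.

```r
# Values copied from the six rows printed by head(ref)
Population <- c(227130, 257740, 116240, 88050, 51280, 150270)
Turnout    <- c(81.75, 87.19, 85.84, 88.21, 88.59, 87.49)
Voted.no   <- c(58.61, 60.36, 56.32, 58.52, 53.80, 65.67)

# Element-wise arithmetic: one number of voters per Council area
Vote <- Population * Turnout / 100

# Weighted average computed by hand and with the built-in function
by.hand  <- sum(Voted.no * Vote) / sum(Vote)
built.in <- weighted.mean(Voted.no, Vote)
all.equal(by.hand, built.in)

# The unweighted mean treats every Council area as equally large
mean(Voted.no)
```

The weighted and unweighted means differ because the Council areas differ greatly in the number of people who voted.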

It would be interesting to produce a scatterplot to explore the relationship between the percentage of people who voted ‘no’ and the unemployment rate of the Council areas. The plot function can do this for us. The resulting plot suggests quite a strong relationship, with the percentage voting ‘no’ tending to be lower in Council areas where unemployment is higher.

plot(ref$Unemployment.rate, ref$Voted.no)
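
The plot function also accepts a formula together with a data argument, which avoids repeating the data frame name and allows more readable axis labels to be supplied through xlab and ylab. This is a minimal sketch rather than the approach used in the text. It uses a small illustrative data frame, ref6 (a name introduced here), built from the six rows printed by head earlier in this section.

```r
# Illustrative only: two columns from the six rows shown by head(ref)
ref6 <- data.frame(
  Unemployment.rate = c(1.2, 0.7, 2.0, 1.9, 3.8, 2.4),
  Voted.no          = c(58.61, 60.36, 56.32, 58.52, 53.80, 65.67)
)

# Formula interface: response ~ explanatory variable
plot(Voted.no ~ Unemployment.rate, data = ref6,
     xlab = "Unemployment rate (%)",
     ylab = "Voted 'no' (%)")
```

Replacing ref6 by ref would produce the corresponding plot for all Council areas.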

It might be more helpful to plot the Council names instead of simple points. Here we add the argument type = "n" to the plot function to stop any points being plotted and then use the text function to plot the Council names instead.

plot(ref$Unemployment.rate, ref$Voted.no, type = "n")
text(ref$Unemployment.rate, ref$Voted.no, ref$Scottish.Council)
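
Long Council names can overlap one another or run past the edge of the plotting region. One possible remedy, sketched here under the assumption that smaller text and slightly wider axis limits are acceptable, is to use the cex argument of text and the xlim argument of plot. The vectors below hold the six rows printed by head earlier in this section, and the particular values 0.7 and 0.5 are arbitrary choices for illustration.

```r
# Values copied from the six rows printed by head(ref)
unemployment <- c(1.2, 0.7, 2.0, 1.9, 3.8, 2.4)
voted.no     <- c(58.61, 60.36, 56.32, 58.52, 53.80, 65.67)
council      <- c("Aberdeen", "Aberdeenshire", "Angus",
                  "Argyll and Bute", "Clackmannanshire",
                  "Dumfries and Galloway")

# Widen the horizontal axis so that the labels are not cut off
plot(unemployment, voted.no, type = "n",
     xlim = range(unemployment) + c(-0.5, 0.5))

# cex below 1 shrinks the text relative to the default size
text(unemployment, voted.no, council, cex = 0.7)
```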