2.5 A further example: the Scottish independence referendum
Example: The Scottish Referendum on Independence
The results of the Scottish referendum on independence were of enormous interest to the UK. The Financial Times reported the voting patterns separately by Council area, together with a variety of characteristics of each Council area. The file scottish-referendum.dat contains this information, together with data on population from the General Register Office for Scotland. Specifically, the information in this file is:
Scottish.Council: the council name
Voted.no: the percentage of people who voted no
Turnout: the percentage of people who voted
Population: the number of people eligible to vote
Unemployment.rate: the percentage unemployed
Scottish.identity.only: the percentage who identify themselves as Scottish only
Aged.16: the percentage who are 16 years of age
Aged.over.50: the percentage who are over 50 years of age
Aged.over.65: the percentage who are over 65 years of age

We can use this information to explore the demographic and geographic patterns of voting.
The first thing we need to do is to read the data. The rp.datalink function from the rpanel package provides a convenient way of locating the dataset, as described in Section 2.4 above. Here we also use the head function to inspect the first few rows.
## Scottish.Council Voted.no Turnout Population Unemployment.rate
## 1 Aberdeen 58.61 81.75 227130 1.2
## 2 Aberdeenshire 60.36 87.19 257740 0.7
## 3 Angus 56.32 85.84 116240 2.0
## 4 Argyll and Bute 58.52 88.21 88050 1.9
## 5 Clackmannanshire 53.80 88.59 51280 3.8
## 6 Dumfries and Galloway 65.67 87.49 150270 2.4
## Scottish.identity.only Aged.16 Aged.over.50 Aged.over.65
## 1 54.73 0.83 32.13 14.38
## 2 61.27 1.20 37.29 16.07
## 3 66.81 1.20 41.31 19.88
## 4 57.39 1.14 44.67 21.93
## 5 66.98 1.17 36.61 15.96
## 6 59.62 1.19 44.07 21.84
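For readers who do not have the file to hand, the read-and-inspect step can be sketched as follows. This is only an illustration: it assumes a whitespace-delimited layout with a header row, which is a guess about the format of scottish-referendum.dat, and it types in three of the rows printed above (with a subset of the columns) instead of reading the real file. The name ref_demo is hypothetical and is used so that the full ref dataframe is not overwritten.

```r
# Assumption: whitespace-delimited text with a header row.
# Values are copied from the first three rows of the output above;
# these three council names contain no spaces, so they parse cleanly.
txt <- "Scottish.Council Voted.no Turnout Population Unemployment.rate
Aberdeen 58.61 81.75 227130 1.2
Aberdeenshire 60.36 87.19 257740 0.7
Angus 56.32 85.84 116240 2.0"

# read.table accepts literal text in place of a file name
ref_demo <- read.table(text = txt, header = TRUE)

head(ref_demo)  # the first rows (here, all three)
str(ref_demo)   # column names and types
```

With the real file, the text argument would be replaced by the file location.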
R can act as a simple calculator. For example, consider the variables recording the percentage of people who voted ‘no’ and the percentage of people who voted. We can find the number of people who voted in each region, and then confirm the percentage of those who voted ‘no’ across the country, by
## [1] 55.31357
The first instruction multiplies each population by the corresponding turnout proportion to find the number of people who voted in each region. Notice that the operation is performed for each element of ref$Population and the corresponding element of ref$Turnout. The resulting vector of numbers is stored in a new component of the ref dataframe, with the variable name Vote. The second instruction calculates a weighted average of the percentages voting ‘no’, using the number of people voting in each region as the weights. This gives the overall percentage who voted ‘no’.
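The two steps described above might look like the sketch below. It is not necessarily the exact code used to produce the figure quoted earlier: it runs on a hypothetical stand-in dataframe, ref_head, holding only the six councils printed by head, so its result will differ from the national figure computed from the full data.

```r
# Stand-in data: the six councils printed by head() above, not the full file
ref_head <- data.frame(
  Scottish.Council = c("Aberdeen", "Aberdeenshire", "Angus",
                       "Argyll and Bute", "Clackmannanshire",
                       "Dumfries and Galloway"),
  Voted.no   = c(58.61, 60.36, 56.32, 58.52, 53.80, 65.67),
  Turnout    = c(81.75, 87.19, 85.84, 88.21, 88.59, 87.49),
  Population = c(227130, 257740, 116240, 88050, 51280, 150270)
)

# Turnout is a percentage, so divide by 100 to obtain a proportion.
# The arithmetic is applied element by element down the columns.
ref_head$Vote <- ref_head$Population * ref_head$Turnout / 100

# Weighted average of the 'no' percentages, weighted by votes cast
sum(ref_head$Voted.no * ref_head$Vote) / sum(ref_head$Vote)

# The built-in weighted.mean function performs the same calculation
weighted.mean(ref_head$Voted.no, ref_head$Vote)
```

Replacing ref_head by ref applies the same calculation to all the Council areas.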
It would be interesting to produce a scatterplot to explore the relationship between the percentage of people who voted ‘no’ and the unemployment rate of the Council regions. The plot function can do this for us. There seems to be quite a strong relationship here, with the ‘no’ vote tending to be lower in regions where unemployment is higher.
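A minimal sketch of such a scatterplot is given below. It again uses a hypothetical stand-in dataframe, ref_head, built from the six councils printed earlier, and the axis labels are an addition for readability. Six points are far too few to judge the relationship, so the full ref dataframe should be used for that.

```r
# Stand-in data: the six councils shown by head() earlier, not the full file
ref_head <- data.frame(
  Unemployment.rate = c(1.2, 0.7, 2.0, 1.9, 3.8, 2.4),
  Voted.no          = c(58.61, 60.36, 56.32, 58.52, 53.80, 65.67)
)

# The x-axis variable comes first and the y-axis variable second
plot(ref_head$Unemployment.rate, ref_head$Voted.no,
     xlab = "Unemployment rate (%)",
     ylab = "Voted no (%)")
```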
It might be more helpful to plot the Council names instead of simple points. Here we add the argument type = "n" to the plot function to stop any points being plotted and then use the text function to plot the Council names instead.
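# Set up the axes without drawing any points
plot(ref$Unemployment.rate, ref$Voted.no, type = "n")
# Write each Council name at its (unemployment rate, 'no' vote) position
text(ref$Unemployment.rate, ref$Voted.no, ref$Scottish.Council)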