8.2 Comparing proportions
Example: Smoking and lung cancer
In a famous historical study of the association between smoking and lung cancer, Doll & Hill compared the numbers of smokers and non-smokers in samples of lung cancer patients and controls. The data for females are shown below.

                 cases   controls
  smokers          41       28
  non-smokers      19       32
Is there evidence of a link between smoking and lung cancer?
Details of how these data were collected are given in the paper. There are interesting questions here about what constitutes an appropriate control group. In fact other hospital patients, not suffering from lung cancer, were used.
The principal question of interest is whether the proportion of smokers among the cases is different from the proportion of smokers among the controls. We denote the underlying true proportions among the cases and controls by $p_1$ and $p_2$ respectively, with corresponding sample sizes $n_1$ and $n_2$. We can estimate the true proportions by the sample proportions,
$$\hat{p}_1 = \frac{41}{60} = 0.683, \qquad \hat{p}_2 = \frac{28}{60} = 0.467.$$
We can also calculate the standard error of each sample proportion as
$$\mathrm{se}(\hat{p}_i) = \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i)}{n_i}}, \qquad i = 1, 2.$$
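The calculation of a sample proportion and its standard error can be sketched in a few lines of Python. This is only an illustration using the counts from the table above; the function name `sample_proportion` and the variable names are hypothetical choices, not part of the original study or text.

```python
import math

# Counts for females from the table above, as (smokers, non-smokers).
cases = (41, 19)
controls = (28, 32)


def sample_proportion(smokers, non_smokers):
    """Return the sample proportion of smokers and its estimated standard error.

    Hypothetical helper for illustration: the standard error is
    sqrt(p_hat * (1 - p_hat) / n), with n the group sample size.
    """
    n = smokers + non_smokers
    p_hat = smokers / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


p1_hat, se1 = sample_proportion(*cases)
p2_hat, se2 = sample_proportion(*controls)

print(f"cases:    proportion = {p1_hat:.3f}, standard error = {se1:.3f}")
print(f"controls: proportion = {p2_hat:.3f}, standard error = {se2:.3f}")
```

Both groups contain 60 women, so the two standard errors differ only because the sample proportions differ.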
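However, it is the difference between the two groups which is of interest to us. A natural estimate of the difference in proportions is the difference of the sample proportions,
$$\hat{p}_1 - \hat{p}_2 = \frac{41}{60} - \frac{28}{60} = 0.217.$$
We can also calculate the standard error of this difference by combining the individual standard errors, as follows:
$$\mathrm{se}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}.$$
Notice that the squared standard errors are added together, despite the fact that the estimates of the proportions are being subtracted. This is because we are measuring uncertainty, and the uncertainty of the difference combines the uncertainties of the individual components. With the present data this gives
$$\mathrm{se}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{0.683 \times 0.317}{60} + \frac{0.467 \times 0.533}{60}} = 0.088.$$
A confidence interval for the difference in proportions, taking two standard errors either side of the estimate for approximately 95% coverage, is then:
$$\hat{p}_1 - \hat{p}_2 \pm 2\,\mathrm{se}(\hat{p}_1 - \hat{p}_2) = 0.217 \pm 0.176, \quad \text{i.e. } (0.04,\ 0.39).$$
Since this confidence interval does not contain $0$, we have clear evidence that the proportions of smokers in the case and control groups are different.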
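The steps for the difference in proportions can be checked numerically with a short Python sketch. The function name `diff_proportion_ci` and its `multiplier` argument are hypothetical, introduced here for illustration. The default multiplier of 2 is an assumption corresponding to an approximate 95% interval; using 1.96 instead changes the limits only in the third decimal place.

```python
import math


def diff_proportion_ci(x1, n1, x2, n2, multiplier=2.0):
    """Estimate p1 - p2 with its standard error and a confidence interval.

    x1, x2: numbers of smokers in each group; n1, n2: group sample sizes.
    multiplier: number of standard errors either side of the estimate
    (2.0 is an assumed value giving roughly 95% coverage).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    diff = p1_hat - p2_hat
    # The squared standard errors add, even though the estimates are subtracted.
    se_diff = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    interval = (diff - multiplier * se_diff, diff + multiplier * se_diff)
    return diff, se_diff, interval


# Female cases: 41 smokers out of 60; female controls: 28 smokers out of 60.
diff, se_diff, (lower, upper) = diff_proportion_ci(41, 60, 28, 60)

print(f"difference in proportions = {diff:.3f}")
print(f"standard error            = {se_diff:.3f}")
print(f"interval                  = ({lower:.3f}, {upper:.3f})")
print("interval excludes 0:", lower > 0 or upper < 0)
```

The lower limit lies only a little above zero, so the conclusion depends on the interval as a whole rather than on the point estimate alone.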