15.5 Principles of experimental design
In an ideal world, experimental design would be relatively straightforward. Variation would not be a problem and we could design experiments on the basis that if we did the same thing twice, the results would be identical. We could do the experiment on all the material (or at any rate the result would be the same no matter which individuals/piece of tissue/cell culture was used). We would be in control of all the factors which affect the outcome. But if that were the case, we would already know the answer!
The main problems are:
- variation;
- not knowing what will affect the outcome;
- variation;
- we can’t do the experiment on the entire population of possible material;
- variation.
Here are three key concepts that may help in focusing our thinking about designing experiments.
- Control: This means more than ensuring we have a relevant control group against which to compare ‘treatment’ groups. It also refers to the careful choice of settings for the factors whose effects we would like to assess. It is also important to recognise those factors we can’t completely control but which we know will have an effect on what we measure. The discussion of blocks below is one example of this.
- Replicate: The more data we have, the more precise our estimates will be. However, there also needs to be a realistic balance between the information gained from the experiment and the resources available to collect the data. As noted above, ethical issues can also arise.
- Randomise: This is a very good way of avoiding the biases which can arise if we make our own decisions on how to allocate ‘treatments’ to experimental units.
15.5.1 Control
15.5.1.1 Holding Factors Constant
The classical approach to controlling factors which you know about is to experiment with one factor whilst holding all the others constant. This is often referred to as ‘One Factor at a Time’ experimentation. It works well for simple experimental situations, but can give a misleading picture in more complex ones, for example when there are interactions between the factors being examined. There are many instances in the literature of a number of subjects where one group of researchers has found one result for a particular factor and a second group has found a very different result. This apparent paradox may be explained by the fact that one or more of the factors being held constant by both groups is being held constant at a different level (pH, temperature, Na⁺ concentration, etc.). There are thus inherent dangers in one-factor-at-a-time experimentation, of which an experimenter must be aware.
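The danger can be seen with a small numerical sketch. The response function below is entirely hypothetical, chosen only so that the two factors interact strongly; two groups each vary factor A ‘one at a time’, but because they hold factor B constant at different levels they reach opposite conclusions about A:

```python
# Sketch of how 'One Factor at a Time' experimentation can mislead when
# factors interact. The response function is hypothetical, chosen only
# to contain a strong A-by-B interaction term.

def response(a, b):
    """Hypothetical response with an interaction between factors A and B."""
    return 2.0 * a + 1.0 * b - 3.0 * a * b

# Group 1 holds B constant at a low level; Group 2 at a high level.
effect_of_a_at_low_b = response(1, 0) - response(0, 0)
effect_of_a_at_high_b = response(1, 1) - response(0, 1)

print(effect_of_a_at_low_b)   # 2.0  -> Group 1 concludes A raises the response
print(effect_of_a_at_high_b)  # -1.0 -> Group 2 concludes A lowers it
```

Neither group is wrong about what they observed; the apparent contradiction is entirely due to the level at which B was held constant.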
15.5.1.2 More Complex Designs
Traditionally the approach to dealing with multiple interacting factors has been the Factorial Experiment, but with three or more factors, this can require very large numbers of experimental units. There are alternatives to the factorial experiment, which make quite reasonable assumptions and which can reduce significantly the number of experimental units required. Such techniques can be very powerful, but should be used with care under the guidance of someone who is aware of the assumptions and what they mean in practice.
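The growth in experiment size is easy to see by enumerating a full factorial directly; the factors and levels below are purely illustrative:

```python
from itertools import product

# The number of runs in a full factorial design is the product of the
# numbers of levels of each factor. Factors and levels are illustrative.
factors = {"pH": [6.5, 7.0, 7.4], "temp_C": [25, 30, 37], "Na_mM": [100, 150]}

runs = list(product(*factors.values()))
print(len(runs))  # 3 * 3 * 2 = 18 experimental units per replicate
```

With, say, three replicates of each combination, this modest three-factor design already needs 54 experimental units, which is why fractional designs become attractive as the number of factors grows.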
15.5.1.3 Blocks
The second way to control factors which you know about is to divide the experimental material into sub-parts, where each sub-part consists of material which is very similar. Cell material may be divided into small, medium and large cells; people into male and female, or underweight, normal and overweight; and so on. These sub-parts are referred to as Blocks. Within blocks, the variation is much reduced compared with that found in the material as a whole, and it thus obscures the differences due to treatments rather less than would otherwise be the case. In general terms, the whole experiment is conducted in each block and the results from the different blocks are essentially pooled. Blocking makes for much more efficient experimentation.
Suppose for example that to do all the assays required within an experiment, you will have to use more than one batch of reagents. There is a possibility that the different batches will differ slightly in some respect, and this difference could affect the outcome of the assays conducted using the two batches. It would therefore be better to conduct the whole experiment (perhaps with no replication) using a single batch, and then to do it a second time using another batch to get the replication, rather than doing half the treatments with one batch and the other half with a second (possibly slightly different) batch. Similarly, if you can’t fit all your vessels in one water bath, it would be better to have one from each treatment in each water bath rather than all of one treatment in one water bath and all of the other treatment in another, unless of course the temperature of the water bath is the experimental treatment.
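The reagent-batch example can be sketched as a randomised complete block design: each batch is treated as a block containing the full set of treatments, run in a random order within the block. The treatment and batch names are hypothetical:

```python
import random

# Randomised complete block design sketch: the whole experiment is run
# once within each reagent batch (block), with treatment order
# randomised within each block. Names are hypothetical.
treatments = ["control", "drug_A", "drug_B"]
blocks = ["batch_1", "batch_2", "batch_3"]  # one replicate per block

random.seed(1)  # fixed seed only so this sketch is reproducible
design = {}
for block in blocks:
    order = treatments[:]
    random.shuffle(order)  # random run order within the block
    design[block] = order

for block, order in design.items():
    print(block, order)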
15.5.2 Replicate
Given that there is variation in measurements taken under seemingly ‘identical’ conditions, even after taking precautions as outlined in the previous section, then the best that we can do is to measure as accurately as possible how big that variation is. Armed with a reasonable estimate of the variation, you will be able to see if the treatment effects are larger than the ‘background’ variation. Unfortunately the larger the variation, the larger the number of replicates of each treatment you will need to:
- estimate the background variation accurately;
- see the treatment effects distinctly through the background variation.
You can usually get an estimate of the likely variation for a particular experiment from published work or from previous experiments which you or your colleagues have carried out. From that, and a knowledge of the approximate size of the expected treatment effects, you can calculate the number of replicates required to give you a reasonable chance of detecting the treatment effects against the background variation. More experiments produce inconclusive results because of inadequate replication than for any other reason.
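As a rough sketch of such a calculation, the function below uses the standard normal-approximation sample-size formula for comparing two group means; the standard deviation and effect size fed into it are illustrative placeholders, and a statistician (or dedicated power-analysis software) should be consulted for a real study:

```python
import math
from statistics import NormalDist

def replicates_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means.

    sigma: estimated background standard deviation (from prior work);
    delta: smallest treatment effect worth detecting.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

# Illustrative values: background sd of 10 units, expected effect of 8 units.
print(replicates_per_group(sigma=10, delta=8))  # 25 replicates per group
```

Note how the required replication scales with the square of the ratio of background variation to effect size: halving the detectable effect quadruples the number of replicates needed.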
Again there are statistical techniques which can reduce replication by concentrating it all in one part of the experimental design, but these techniques should be used with caution, under the guidance of someone who is fully aware of the assumptions and what they mean in practice.
15.5.3 Randomise
Having dealt with the causes of variation which we know about, we must ensure that those causes that we don’t know about are evenly distributed over the treatments, so as not to introduce bias into the experiment. This sounds like an impossible task since, by definition, we do not know what these causes are. Luckily there is a very powerful statistical technique which comes to our rescue: randomisation.
The principle is simple: ensure that each piece of experimental material has an equal chance of being assigned to each treatment. That way it is very unlikely that all the healthy material will be assigned to one treatment and all the unhealthy material to another; each treatment should get an approximately equal share of each. In practice you have to use some external mechanism to generate a random sequence, as human attempts to be random are usually very systematic.
Randomisation should avoid the introduction of any selection bias. Techniques for randomisation include the use of random number tables, but these require that a complete list of the pieces of experimental material can be enumerated. Randomisation should mean that, having accounted for the factors you know about (which you were able to control), those that you do not know about will be prevented from interfering with the experiment. Circumstances may arise where it is not possible to randomise over some factors; under these circumstances, the only solution is to measure the variable or factor concerned and use it as a covariate.
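A minimal sketch of such an external mechanism: enumerate the experimental units (here hypothetical animals), shuffle the list with a pseudo-random generator, and deal the units out evenly across the treatments:

```python
import random

# Randomisation sketch: every unit has an equal chance of receiving each
# treatment. Unit and treatment names are hypothetical.
units = [f"animal_{i:02d}" for i in range(1, 13)]  # 12 enumerated units
treatments = ["control", "low_dose", "high_dose"]

random.seed(42)  # fixed seed only so this sketch is reproducible
shuffled = units[:]
random.shuffle(shuffled)

# Deal the shuffled units out in turn, giving equal group sizes.
allocation = {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}
for t, group in allocation.items():
    print(t, group)
```

Because the allocation is dealt from a shuffled list rather than chosen by the experimenter, any unknown differences between animals are spread across the treatments by chance rather than by (possibly unconscious) preference.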
15.5.4 Replication and Pseudo-Replication
Consider an experiment where each treatment is replicated three times. This provides a means of dealing with the inherent variation in the results obtained, even if we have carried out identical manipulations, using the same batches of materials. The three replicates give us a measure of how variable the process is. However, if the process is very variable it may be difficult to distinguish between the treatments.
In some settings, when we record the value from each replicate we may be able to do so multiple times. If we can, why not do this three times for each replicate? This would give us a replication of nine rather than three for very little extra effort. Unfortunately, this is not true replication.
Most experimental methods involve a number of steps, illustrated here in the context of an experiment requiring the preparation of cellular suspensions:
- collect material;
- prepare cellular suspension;
- subject cellular suspension to one of a number of treatments;
- extract an indicator component as a surrogate for treatment success;
- assay indicator component.
Each of these steps may be replicated a variable number of times. Replication applied early in the process will lead to extra replication of later steps and it is thus tempting to replicate later steps more than earlier ones. In each case however the replication measures something different.
- Collect material: Here the replication measures the variation in collection, or the variation between plants, animals, etc. This helps you to see how well the results will apply to other individuals. This is all about variation in the source material. Increasing the number of individuals measured obviously increases the cost, and it is usual to try to keep this to a minimum. However, if several individuals are not independently subjected to the same ‘treatment’ or environmental influence, there is no true replication in the study and the remainder of the steps measure within-individual variability, in the absence of any indication of between-individual variability.
- Prepare cellular suspension: Here the replication measures the variability in the preparation of the suspension. This may be due to different people carrying out the preparation, use of different apparatus, different reagent stocks, temperature, etc. This helps you to see how different the results are likely to be for multiple preparations. This is all about variation in technique, and there will obviously be a need for more replication at this level if the preparation is very heterogeneous.
- Apply treatment: This is the replication of the treatment. It measures how different applications of the same treatment are, and enables us to assess whether the differences due to treatment are larger than the differences we see between replications of the same treatment. This is what the experiment is all about.
- Extract indicator: Here the replication measures the variation in the extraction process, which will be due to reasons similar to those outlined under ‘Prepare cellular suspension’ above. This is again all about variation in technique.
- Assay indicator: Here the replication measures the variation in the assay process, again for reasons similar to those outlined above. This is again all about variation in technique.
Each of these sources of variation (and others) may be important in the overall conduct of the experiment; but they do measure different things. The importance of these components of variation depends on their relative size. Assessment of the relative size of components of variation is important in assessing the viability of sampling, preparation, extraction or assay procedures. In general terms, the variation due to these procedures should be less than that due to the treatment. If this is not true, it will not be possible to arrive at a justifiable conclusion.
The replication that we applied was at the ‘Apply treatment’ stage. The additional replication proposed is at the ‘Assay indicator’ stage, and it contributes nothing to our estimation of the variation due to treatment, and thus nothing to our ability to distinguish between the treatments.
When replication is carried out at an inappropriate stage, this is referred to as pseudo-replication. Pseudo-replication misleads you into attributing the wrong number of degrees of freedom to your treatment comparisons and thus to think that the inherent variation between replicates is smaller than it really is. This in turn can lead you to think that there are real differences between treatments when there are not. Most commonly pseudo-replication occurs by substituting extra replication at the final assay stage for replication at the treatment stage, as it is easier and requires less effort and resources. This phenomenon is to be found quite widely in experimental work and often finds its way into published papers.
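A small simulation illustrates the problem. Assuming hypothetical variance components, it compares the naive standard error obtained by treating nine assay values (three repeat assays on each of three true replicates) as nine independent observations with the correct standard error computed from the three replicate means; the naive figure suggests spurious precision:

```python
import random
import statistics

# Pseudo-replication sketch: 3 true replicates, each assayed 3 times.
# The variance components below are hypothetical.
random.seed(0)
SD_BETWEEN_REPS = 2.0  # biological, between-replicate variation
SD_ASSAY = 0.5         # repeat-assay (technical) variation

naive_ses, correct_ses = [], []
for _ in range(2000):
    rep_effects = [random.gauss(0, SD_BETWEEN_REPS) for _ in range(3)]
    # 9 assay values: 3 repeat assays on each replicate
    assays = [r + random.gauss(0, SD_ASSAY) for r in rep_effects for _ in range(3)]
    rep_means = [statistics.mean(assays[i * 3:(i + 1) * 3]) for i in range(3)]
    naive_ses.append(statistics.stdev(assays) / 9 ** 0.5)       # pretends n = 9
    correct_ses.append(statistics.stdev(rep_means) / 3 ** 0.5)  # true n = 3

# The pseudo-replicated analysis reports a markedly smaller standard error
# than the honest analysis based on the replicate means.
print(round(statistics.mean(naive_ses), 2), round(statistics.mean(correct_ses), 2))
```

The repeat assays mostly re-measure the same biological material, so they shrink the apparent standard error without adding any information about between-replicate variation, which is exactly why pseudo-replication produces spurious ‘significant’ treatment differences.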
The statistical significance of a particular treatment or phenomenon can only be determined relative to the variation that would be expected if the treatment or phenomenon did not apply. This means that a source of ‘error’ must be identified and measured which provides this comparison. A failure to do this will result in an experiment or study in which it is not possible to determine the statistical significance of what has been observed.