Section 3: Measures of Dispersion
From Research Methods in Psychology
What Are You Studying?
In this section we consider a second type of summary statistic that can be used to describe data: a measure of dispersion, or spread. We first look at why measures of dispersion are important: it can be difficult to interpret a measure of average without a measure of dispersion. Then we investigate three different measures of dispersion: the range, the interquartile range and the standard deviation.
Introduction
| Crucial Concept: Dispersion is the degree of spread in a variable: it describes how far the scores tend to lie from the average. When describing a variable it is vital to describe the central tendency, but the central tendency means little without a measure of dispersion, or spread. |
An experiment was carried out to examine the effects of a mind-enhancing drug on memory performance. Participants in an experimental group were given a small dose of the drug before taking a memory test; participants in the control group were given a placebo before taking the same test. The results are shown in Table 1. Look at the table and decide: did the drug have an effect on memory? How much of an effect was it?
Table 1. Mean memory test scores for each group
| Group | Mean Score |
|---|---|
| Control (placebo) | 70 |
| Experimental (drug) | 80 |
You can answer the first question: yes, the drug seemed to have an effect on memory, because the group that took the drug did better than the control group. You cannot answer the second question, how much of an effect it was, because I haven’t given you any information about the spread, or dispersion, of the scores. Have a look at Figure 8 and Figure 9. The mean of the scores is the same in each graph, but the graphs look very different because the spread, or dispersion, of the scores is very different. This shows that it is very hard to interpret a measure of central tendency without a measure of dispersion: on its own, a measure of central tendency can be meaningless or, worse, misleading.
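A small Python sketch makes the same point as Figures 8 and 9. The two sets of scores below are hypothetical, invented purely for illustration (they are not the data behind Table 1 or the figures): both have a mean of 70, but one set is tightly clustered and the other is widely spread, so the mean alone cannot tell them apart.

```python
from statistics import mean

# Hypothetical memory-test scores, invented purely for illustration.
groups = {
    "tightly clustered": [68, 69, 70, 70, 71, 72],
    "widely spread": [40, 55, 70, 70, 85, 100],
}

for label, scores in groups.items():
    # Same mean in both groups, but very different lowest and highest scores.
    print(f"{label}: mean = {mean(scores)}, lowest = {min(scores)}, highest = {max(scores)}")
```

Reporting only "mean = 70" for either group would hide the fact that one group's scores cover a span fifteen times wider than the other's.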
We will now look at three different measures of dispersion: the range, the interquartile range and the standard deviation.
Range
| Crucial Concept: The range is the distance from the highest score to the lowest score. |
The range is the simplest measure of dispersion: it is simply the distance between the highest score and the lowest score. It can be expressed either as a single number or as the lowest and highest scores. We will find the range of the variable x, which consists of nine scores: 4, 11, 17, 12, 3, 15, 10, 2, 8. To find the range, we identify the lowest value (2) and the highest value (17). Expressed as a single figure, the range is calculated as:
Range = highest value - lowest value = 17 - 2 = 15
Alternatively, it is expressed as the lowest and highest scores: the range was 2 to 17.
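The same calculation takes only a few lines of Python. This minimal sketch uses the nine scores of x from the example above and reports the range both ways:

```python
# The nine scores of x from the worked example.
x = [4, 11, 17, 12, 3, 15, 10, 2, 8]

lowest, highest = min(x), max(x)
range_single = highest - lowest  # range as a single figure (avoids shadowing built-in range)

print(f"Range = {highest} - {lowest} = {range_single}")  # Range = 17 - 2 = 15
print(f"The range was {lowest} to {highest}")            # The range was 2 to 17
```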
Problems with the Range
The range suffers from one huge problem: it is massively affected by outliers. If one person gets an extreme score, the range is badly distorted, and two distributions whose spreads are actually very similar can suddenly appear very different because one of them contains an outlier. Because of this distorting effect, the range is very rarely used in psychological research. It is most commonly used to describe some aspect of a sample that does not need to be summarised with any great precision, for example “the ages of the participants ranged from 18 to 48” or “class sizes ranged from 23 to 36.”
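The outlier problem can be seen in a short sketch. The second dataset is hypothetical: it is identical to x except that the score of 17 has been replaced by an extreme score of 60. Changing that single value nearly quadruples the range, even though the other eight scores are untouched.

```python
def score_range(scores):
    """Range as a single figure: highest score minus lowest score."""
    return max(scores) - min(scores)

x = [4, 11, 17, 12, 3, 15, 10, 2, 8]
# Hypothetical: the same scores, except one person produced an extreme score of 60.
x_with_outlier = [4, 11, 60, 12, 3, 15, 10, 2, 8]

print(score_range(x))               # 15
print(score_range(x_with_outlier))  # 58
```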
Interquartile Range
| Crucial Concept: The interquartile range is the range of the central 50% of a distribution. It is associated with the median. |
The interquartile range (IQR) is used with ordinal data and with non-normal distributions. It is usually reported alongside the median, which it resembles in the way it is calculated. To find the interquartile range, the scores are placed in rank order and counted. The halfway point is the median (as we saw previously). The IQR is the distance between the point one quarter of the way through the ranked scores (the first quartile) and the point three quarters of the way through (the third quartile). Unlike the range, the IQR does not extend to the ends of the scale, and is therefore not affected by outliers. It is also not greatly affected by skew and kurtosis. You might come across the semi-interquartile range: this is simply the interquartile range divided by 2.
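Textbooks and software packages use slightly different rules for locating the quartiles, so the exact IQR can vary a little between them. The sketch below assumes Python's standard-library `statistics.quantiles`, whose default ("exclusive") method places the quartiles at positions (n + 1)/4 and 3(n + 1)/4 in the ranked scores, interpolating between neighbouring scores where needed. It reuses the nine scores of x and the hypothetical outlier version from the range example.

```python
from statistics import quantiles

def iqr(scores):
    """Interquartile range: third quartile minus first quartile (default 'exclusive' method)."""
    q1, _, q3 = quantiles(scores, n=4)  # three cut points split the ranked scores into quarters
    return q3 - q1

x = [4, 11, 17, 12, 3, 15, 10, 2, 8]
x_with_outlier = [4, 11, 60, 12, 3, 15, 10, 2, 8]  # hypothetical extreme score

q1, median, q3 = quantiles(x, n=4)
print(q1, median, q3)       # 3.5 10.0 13.5
print(iqr(x))               # 10.0
print(iqr(x) / 2)           # semi-interquartile range: 5.0
print(iqr(x_with_outlier))  # still 10.0: the outlier lies outside the central 50%
```

Passing `method="inclusive"` to `quantiles` instead gives quartiles of 4.0 and 12.0 for these data, an IQR of 8.0, which illustrates how the chosen convention can shift the result.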
The Standard Deviation and the Variance
| Crucial Concept: The standard deviation is a measure of the width of a normal distribution. |
Finally, we shall look at the standard deviation (s) and the variance (s²). The two are very closely related: the variance is simply the square of the standard deviation (i.e. the standard deviation multiplied by itself). The standard deviation is a measure of dispersion which, like the mean, takes all of the values in the dataset into account when it is calculated. It is also like the mean in that we must assume that we have a normal distribution before it is reasonable to use it.
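To show how every value feeds into the standard deviation, the sketch below computes the sample variance and standard deviation of x step by step: find the mean, take each score's deviation from it, square the deviations, add them up, divide by n - 1, and take the square root to get the standard deviation. The n - 1 divisor is the usual convention for the sample statistics s and s²; the result is checked against Python's `statistics.variance` and `statistics.stdev`, which follow the same convention.

```python
import math
import statistics

x = [4, 11, 17, 12, 3, 15, 10, 2, 8]
n = len(x)
mean_x = sum(x) / n

# Every score contributes: square each deviation from the mean and add them up.
sum_sq_dev = sum((value - mean_x) ** 2 for value in x)

sample_variance = sum_sq_dev / (n - 1)  # s squared
sample_sd = math.sqrt(sample_variance)  # s

print(round(sample_variance, 2), round(sample_sd, 2))  # 28.11 5.3

# Agrees with the standard library's sample formulas.
assert math.isclose(sample_variance, statistics.variance(x))
assert math.isclose(sample_sd, statistics.stdev(x))
```

Because each score enters the sum of squared deviations, changing any one of them changes s; in the squared form, scores far from the mean carry particularly heavy weight.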
| Crucial Concept: The variance is the square of the standard deviation. |
There is an additional complication with the standard deviation: it comes in two different ‘flavours’. What we usually mean when talking about the standard deviation and the variance are the sample standard deviation (s) and the sample variance (s²). You might also come across the population standard deviation, represented by the Greek letter sigma (σ), and the population variance (σ²).
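The two flavours differ only in the divisor: the sample variance s² divides the sum of squared deviations by n - 1, whereas the population variance σ² divides by N, the number of scores in the whole population. Python's standard library provides both (`statistics.stdev` and `statistics.variance` for samples, `statistics.pstdev` and `statistics.pvariance` for populations). This sketch treats the nine scores of x first as a sample and then, for comparison, as a complete population.

```python
import math
import statistics

x = [4, 11, 17, 12, 3, 15, 10, 2, 8]

s = statistics.stdev(x)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(x)  # population standard deviation (divides by N)

print(f"s = {s:.2f}, sigma = {sigma:.2f}")  # s = 5.30, sigma = 5.00

# In both flavours, the variance is the standard deviation squared.
assert math.isclose(statistics.variance(x), s ** 2)
assert math.isclose(statistics.pvariance(x), sigma ** 2)
```

With only nine scores the two versions differ noticeably; as the number of scores grows, n - 1 and N become almost the same and so do the two results.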
The Standard Deviation and the Normal Distribution
The standard deviation is a very useful statistical tool, particularly when combined with a normal distribution. In a normal distribution, we can express any value in terms of its distance from the mean in standard deviation units; this is called a z-score, or standardised score. To convert a value to a z-score, we subtract the mean and divide by the standard deviation: z = (value - mean) / standard deviation.
The table below shows some original scores, and their z-scores, from a distribution whose mean we know to be 100 and whose standard deviation we know to be 10. The original value of 100 is equal to the mean, and so has a z-score of zero. The value 120 is 2 SDs above the mean, and so has a z-score of 2.0, and so on.
| Original Score | z-score |
|---|---|
| 100 | 0.0 |
| 120 | 2.0 |
| 80 | -2.0 |
| 164 | 6.4 |
| 48 | -5.2 |
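The conversion in the table can be checked with a short sketch. The function name `z_score` is our own, chosen for illustration; it simply subtracts the mean and divides by the standard deviation, using the known mean of 100 and standard deviation of 10 from the table.

```python
def z_score(value, mean, sd):
    """Distance of value from the mean, in standard deviation units."""
    return (value - mean) / sd

MEAN, SD = 100, 10  # known mean and SD of the distribution in the table

for original in [100, 120, 80, 164, 48]:
    print(original, z_score(original, MEAN, SD))  # reproduces the z-score column
```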
One advantage of z-scores over raw scores is that, if we know (or can assume) that the original variable has a normal distribution, we know how often scores in different ranges will occur. For example, in a normal distribution half of the scores will be above the mean and half will be below it. This means that if I have a z-score of 0.0, you know that half of the people in the sample will have a higher score than me, and half will have a lower score.
We can consult tables, or use a computer, to find out how many people will score higher or lower than a particular value. Some of those values are shown in the figure below.
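As one way of "using a computer" instead of printed tables, Python's standard library (version 3.8 or later) includes `statistics.NormalDist`. For the standard normal distribution (mean 0, SD 1), its `cdf` method gives the proportion of scores expected below a given z-score, so one minus that value is the proportion above it. The helper name `proportion_below` is hypothetical, introduced only for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_below(z):
    """Proportion of a normal distribution expected to score below z."""
    return standard_normal.cdf(z)

print(proportion_below(0.0))                # 0.5: half score below the mean
print(round(1 - proportion_below(2.0), 4))  # 0.0228: about 2.3% score above z = 2
within_1_sd = proportion_below(1) - proportion_below(-1)
print(round(within_1_sd, 4))                # 0.6827: about 68% lie within 1 SD of the mean
```

In the table's distribution (mean 100, SD 10), a score of 120 has z = 2.0, so only about 2.3% of people would be expected to score higher than that.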
Summary
Measures of central tendency should always be accompanied by a measure of dispersion, as they cannot be interpreted without one. We have looked at three measures of dispersion: the range, which is not used very often; the interquartile range, which should be used with the median; and the standard deviation, which should be used with the mean. The variance is also a measure of dispersion; it is simply the square of the standard deviation.
Further Reading
It is difficult to cover descriptive statistical analysis without getting a little bit, well, for want of a better word, dull. Huff’s book “How to Lie with Statistics” (Penguin) gets around this problem in an original way: it looks at how you should not use descriptive statistics if you want to get your point across, and how other people might use descriptive statistics to disguise the nature of their data. Another book which takes an interesting approach is “Elements of Graph Design” by Kosslyn. Kosslyn is a cognitive psychologist who has carried out research into mental imagery and visualisation processes; in this book he takes the principles that have come from research in this area and applies them to graphs.



