Section 2: Measures of Central Tendency
From Research Methods in Psychology
What are you studying?
This section is all about averages, or rather measures of central tendency, as the word average is ambiguous and is better avoided. We look at three different meanings of the word average: the mean, the median and the mode, and we consider when it is appropriate to use each of these measures.
Introduction
The word ‘average’ is ambiguous. In common usage, average refers only to what is technically known as the arithmetic mean, whereas in statistics, average can refer to any of several measures of central tendency. Therefore this book will largely avoid the word ‘average’, and whenever you hear someone else use the word ‘average,’ you should always think about which average they mean. The purpose of measuring central tendency is to describe a group of individual scores with a single measurement. The value we use to describe the group will be the single value that we consider to be most representative of all the individual scores. Central scores are the most representative, so (in everyday language) the goal of central tendency is to find the ‘average’ or ‘typical’ score. This average value can then be used to provide a simple description of the entire population or sample. Be careful not to say that this is the score of the average person though, or you will find yourself saying things like “The average family has 1.9 children.” Instead you should say “The mean number of children per family is 1.9.”
Measures of central tendency are useful for making comparisons between groups of individuals or between sets of figures. They reduce a large number of measurements to a single figure, and thus make comparisons easy. The mean temperature in Central England in July, from 1961 to 1990, was 16.1 degrees. Over the same period in September it was 13.6 degrees. We have summarised a lot of information in two numbers, and we can say that it is likely to be warmer in July than in September in Central England. Unfortunately, there is no single, standard procedure for determining central tendency. The problem is that no single measure will always produce a typical, representative value in every situation. Therefore, we will consider three different ways to measure central tendency: the mean, the median and the mode. These three statistical measures are computed differently and have different characteristics. To decide which of these is best for any particular distribution, you should keep in mind that the general purpose of central tendency is to find the single most representative score.
Mean
Crucial Concept: To calculate the mean, you sum (add up) each individual score, and divide by the number of scores. The mean is what we all think of as the average. Strictly speaking, it is called the arithmetic mean, because there are other types of mean. (Although you are very unlikely to come across one.)
As you probably know, adding up all of the scores and dividing the total by the number of individual scores gives the mean. This is written in equation form as Equation 1.
The scores of whatever we have measured, whether it is temperature, IQ, or time taken to solve a problem, we will now refer to as x. In statistical formulae, ‘x’ represents all the separate values in a particular distribution. The mean of the variable x is written as $\bar{x}$, and pronounced ‘x bar.’ Even if you know how to calculate the mean, have a look at Equation 1 to get used to using this sort of equation.
Equation 1: $\bar{x} = \frac{\sum x}{N}$
In this equation the Greek letter Σ, or sigma, means “add up.” So this equation says “add up x” (where x is a set of scores) and “divide by N” (where N is the number of scores in x).
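The rule "add up x and divide by N" can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the problem-solving times below are made up for the example and do not come from the text.

```python
def mean(scores):
    """Arithmetic mean: add up the scores (sigma x) and divide by N."""
    n = len(scores)              # N, the number of scores
    if n == 0:
        raise ValueError("the mean of an empty set of scores is undefined")
    return sum(scores) / n       # sigma x divided by N


# Hypothetical times (in seconds) taken to solve a problem; invented data.
times = [12, 15, 11, 18, 14]
x_bar = mean(times)
print(x_bar)  # 14.0
```

Here the scores sum to 70 and there are 5 of them, so x bar is 70 / 5 = 14.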
Assumptions
We often need to make assumptions in everyday life; in the UK, we assume that everyone else will drive on the left-hand side of the road. We assume that when we press a light switch, the light will come on. We assume that if we do enough work and understand enough then we will pass our exams. We make similar assumptions in statistics: we assume, often, that the data will be distributed normally. If the distribution we are dealing with is not normal, these assumptions are wrong (statisticians usually say violated), and it follows that our statistical analysis will be wrong. However, whilst our assumptions will usually be broadly correct, they will never be exactly correct. People sometimes drive on the wrong side of the road (when parking, for example) but most of the time we manage not to crash into them. If people drove on the wrong side of the road a lot, we would have more serious problems. Similarly, if our assumptions about data are wrong, but not too wrong, we need to be aware that our statistics will not be perfectly correct, but as long as the assumptions are not violated to any great extent, we will be able to find some useful results from our analysis.
The mean is a useful statistic only if:
- The distribution of your data is symmetrical - not too much skew, and no outliers.
- The data are measured at the interval or ratio level. We saw in Chapter 6 that data can be measured at a number of levels.
It would not be sensible to say that half of the people were wearing blue shoes, and half of the people were wearing yellow shoes, and that therefore the mean shoe colour was green.
Digression Box: The mean, the median and lying with statistics
The mean is sensitive to skew, and can be distorted if there is skew in the data. This is a good example of how people can lie with statistics (but only to people who do not understand statistics). In most societies, pay is not normally distributed. It is positively skewed: most people earn a small to moderate amount of money, a smaller number of people earn a lot of money, and a very small number of people earn a very substantial amount of money. The mean amount of money that people earn can increase just by increasing the amount of money that a small number of people earn. If 99 people earn £10,000 per year, and one person earns £1,000,000 per year, the mean amount of money earned is £19,900. If I double the pay of the one person who earns £1,000,000 per year, then the mean pay becomes £29,900. I can say, “Thanks to me, the average person has had a pay rise of over 50%,” although 99% of people have had no extra cash.
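The arithmetic in the digression box can be checked with a short Python sketch. It uses the standard library's `statistics.mean` and `statistics.median`; the variable names are chosen for this illustration only, and the figures are the ones given in the box.

```python
from statistics import mean, median

# 99 people on 10,000 per year and one person on 1,000,000 per year
pay_before = [10_000] * 99 + [1_000_000]
# The same group after doubling only the top earner's pay
pay_after = [10_000] * 99 + [2_000_000]

mean_before = mean(pay_before)   # 19,900
mean_after = mean(pay_after)     # 29,900

# Percentage rise in the mean, even though 99 people got nothing extra
rise = (mean_after - mean_before) / mean_before * 100

median_before = median(pay_before)
median_after = median(pay_after)

print(mean_before, mean_after, rise)
print(median_before, median_after)
```

The mean rises by just over 50%, while the median stays at £10,000 in both cases, which is why the median is often the more representative figure for skewed data such as pay.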
Median
Crucial Concept: The median is the central score - the one in the middle.
The median is the second most common measure of central tendency. It is the middle score in a set of scores. The median is useful when the mean is not valid, either because the data are not symmetrically or normally distributed, or because the data are measured at an ordinal level. If the scores are placed in ascending order of size from the smallest to the largest, the median is the middle one. To find the median when there is an odd number of scores in the distribution, halve the number of scores and then take the next whole number up; that will be the ordinal position of the median. For example, suppose there are 29 scores. Sort the scores into ascending order (from the lowest to the highest); half of 29 is 14.5, which rounds up to 15, so the median is the fifteenth score. If there is an even number of scores, the median is the mean of the two middle scores. If there are 18 scores, the median will be the number halfway between the 9th and the 10th scores.
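The procedure just described (sort, then pick the middle position, or average the two middle scores) might be sketched in Python as follows. The function name `median_by_position` is hypothetical, and the scores 1 to 29 and 1 to 18 are invented purely so that the position of the median is easy to see.

```python
import math


def median_by_position(scores):
    """Median found by sorting the scores and locating the middle position."""
    ordered = sorted(scores)             # ascending order, lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set of scores is undefined")
    if n % 2 == 1:
        # Odd number of scores: halve n and take the next whole number up,
        # e.g. 29 scores -> 14.5 -> the 15th score.
        position = math.ceil(n / 2)
        return ordered[position - 1]     # Python lists count from 0
    # Even number of scores: mean of the two middle scores,
    # e.g. 18 scores -> halfway between the 9th and 10th scores.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_example = median_by_position(range(1, 30))    # 29 scores: 1, 2, ..., 29
even_example = median_by_position(range(1, 19))   # 18 scores: 1, 2, ..., 18
print(odd_example, even_example)  # 15 9.5
```

With the scores 1 to 29 the fifteenth score is 15, and with the scores 1 to 18 the median falls halfway between 9 and 10.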
Mode
Crucial Concept: The mode is the most common.
The final measure of central tendency is the mode. We need to know what it is, although it is rarely reported in psychological research. The definition of the mode is that it is the most frequent score in the distribution, or the most common observation among a group of scores (you could think of it as the most ‘fashionable’ score). The mode is the only measure of central tendency appropriate for nominal data. (A distribution which has two peaks is called a bimodal distribution, and is highly kurtosed.)
It is usually not very useful to report the mode. If I report that the modal (i.e. most common) shoe colour amongst a group was black, this tells you very little. Was everyone wearing black shoes? Were there three people wearing black shoes, and two people wearing shoes of every other colour?
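A small Python sketch shows both how the mode can be found for nominal data and why it says so little on its own. The shoe colours below are invented for the illustration, and the function name `modes` is hypothetical; it returns a list because a distribution can have more than one most common value.

```python
from collections import Counter


def modes(observations):
    """Return every value that shares the highest frequency."""
    counts = Counter(observations)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Invented data: three people in black shoes, two in each other colour.
shoe_colours = ["black", "black", "black",
                "brown", "brown",
                "green", "green",
                "red", "red"]

modal_colours = modes(shoe_colours)
frequencies = Counter(shoe_colours)
print(modal_colours)   # ['black']
print(frequencies)     # the full frequency table is far more informative
```

The mode is black here, but it would also be black if all nine people wore black shoes, so reporting the frequencies alongside the mode is usually more helpful.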
Ordinal Data
When data are measured on an ordinal scale (see Chapter 4), we have to consider carefully whether to use the mean or the median, or even the mode. Opinions differ between psychologists, and so I have to be careful about what advice to give. There is nothing wrong with this difference in styles: Abelson (1995) defines four styles of data presentation (stuffy, brash, liberal and conservative), and you need to choose your style according to the demands of the audience.
The problem is that there is a very fuzzy line between what could definitely be called ordinal data, and what could definitely be called interval data. Some (very strict) psychologists would argue that the scales used with things like tests for IQ, personality and attitudes must be ordinal scales and the data produced must be considered to be ordinal data. The majority would argue that these can be considered interval data, and therefore it is OK to use the mean. In most psychological journals, you will find that most researchers, most of the time, treat their data as interval data. (Note that treating your data as if they were measured on an interval scale is not the same as saying that your data are measured on an interval scale. You know the data are not interval, but you also know that it will probably do no harm to treat them as if they were interval.)
Section Summary
The word “average” means three different things, and should therefore be avoided. The mean is the average as we usually think of it. The median is the middle score, and the mode is the most common score. The mean should be used with interval data, when the data are symmetrically (normally) distributed. The median should be used with skewed distributions or ordinal data. The mode should usually be used with nominal (categorical) data. If the distribution of your data is skewed, the mean will be pulled in the direction of the skew. It is sometimes difficult to decide whether the data are measured on an ordinal scale, in which case the median should be used, or on an interval scale, in which case the mean should be used. You choose the style of your presentation to fit in with the expectations of your audience.


