Section 8.3: Interval / Ordinal Dependent Variable
From Research Methods in Psychology
Choosing a Statistical Test
To decide which statistical test you should use, you need to know three things.
- Whether it is appropriate to use a parametric test or not.
- Whether the independent variable is within-participants (repeated measures) or between-participants (independent groups).
- How many levels the independent variable has.
The Structure of Experiments
In Chapter 3, we looked at the structure of experiments in some detail. In this section, we will look at how to analyse data from experimental research. If you are not familiar with the material in Chapter 3, you can have a look at the ‘Quick Reminder’ box, or it might be worth quickly revising Chapter 3 before reading this section.
Quick Reminder: An experiment investigated the effects of three types of noise (silence, hammering, music) on the ability to learn two types of word (abstract and concrete). We would describe this experiment as having two independent variables:
- Noise, which has three levels: silence, hammering, music.
- Word type, which has two levels: abstract and concrete.
The experiment has one dependent variable: number of words remembered. The independent variables could each be between participants or within participants. If they were between participants, different people would take part in each level. If an independent variable were within participants, the same people would take part in all of its levels.
An experiment comprises one or more independent variables, each of which has two or more levels, and one or more dependent variables.
In this section we will look at:
- Experiments with one independent variable
- Experiments with more than one independent variable.
We will first consider experiments with interval or ordinal dependent variables, and then move on to experiments with nominal dependent variables.
The figure below shows a flow diagram that will help you to choose an appropriate statistical test when you are analysing data from an experiment that has one dependent variable. The table shows the same information represented in a different way – you can use whichever you feel most comfortable with.
| | Between participants: parametric | Between participants: non-parametric | Within participants: parametric | Within participants: non-parametric |
|---|---|---|---|---|
| 2 levels | Independent groups t-test | Mann-Whitney U test / Wilcoxon rank-sum test | Repeated measures t-test | Wilcoxon signed-ranks test |
| 3 or more levels | One-way ANOVA | Kruskal-Wallis test | One-way repeated measures ANOVA | Friedman’s test |
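Because the three questions above fully determine the choice, the table can be read as a simple lookup. The sketch below encodes it in Python. The function name `choose_test` is hypothetical, and the function only looks the answer up in the table: deciding whether the parametric assumptions are actually met is still up to you.

```python
def choose_test(parametric: bool, within: bool, levels: int) -> str:
    """Look up the test suggested by the table for one independent variable.

    parametric: True if the normality and interval-data assumptions are met.
    within: True for a within-participants (repeated measures) design.
    levels: number of levels of the independent variable.
    """
    if levels < 2:
        raise ValueError("an independent variable needs at least two levels")
    two_levels = levels == 2
    if not within:  # between-participants columns of the table
        if parametric:
            return "Independent groups t-test" if two_levels else "One-way ANOVA"
        return ("Mann-Whitney U test / Wilcoxon rank-sum test"
                if two_levels else "Kruskal-Wallis test")
    # within-participants columns of the table
    if parametric:
        return ("Repeated measures t-test"
                if two_levels else "One-way repeated measures ANOVA")
    return "Wilcoxon signed-ranks test" if two_levels else "Friedman's test"


print(choose_test(parametric=False, within=True, levels=3))  # prints Friedman's test
```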
Experiments with One Independent Variable
Between Participants Design
In the simplest form of experiment, we have one independent variable that has two levels. For example, we might be investigating the effects of noise on learning. To do this we would randomly divide our participants into two groups. We might give each group the task of learning a list of 100 words in five minutes. Then, we would ask them to recall as many of the words as possible. One group (the control group) would undertake this task in silence. The second group of people would undertake the learning task while a tape player was playing classical music in the room.
We first need to decide whether to use a parametric or a non-parametric test. As we saw in the previous section, a parametric test is preferable if the assumptions that it makes are satisfied. All parametric tests make the following assumptions:
- Normal distribution. We must first check that the data are normally distributed. To do this, you can draw a histogram, or calculate a skewness and a kurtosis statistic. Note that it is the distribution within each group that must be normal; we do not require that the distribution for both groups combined is normal.
- Interval data. To use a parametric test, we also need to consider whether our dependent variable is measured on an interval scale. In the case of our experiment, the dependent variable (number of words recalled) is measured on an interval scale.
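As a rough illustration of the skewness and kurtosis check, the sketch below computes both statistics for each group separately. It assumes the simple moment-based definitions (statistics packages usually apply small-sample adjustments, so their figures will differ somewhat), and the function names are hypothetical.

```python
import statistics


def skewness_kurtosis(scores):
    """Return (skewness, excess kurtosis) using simple moment formulas.

    Both values are 0 for a perfectly normal distribution. These are the
    unadjusted moment versions, not the bias-corrected ones many packages use.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n
    if m2 == 0:
        raise ValueError("all scores are identical; shape is undefined")
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def check_groups(groups):
    """Check each group separately, never the combined data."""
    return {name: skewness_kurtosis(scores) for name, scores in groups.items()}
```

Passing a dictionary such as `{"silence": [...], "music": [...]}` to `check_groups` returns one (skewness, kurtosis) pair per group.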
If you follow the decision chart shown in Figure 2, you will see that if the above two assumptions are satisfied, we can use an independent groups t-test. If they are not satisfied, we should use a Mann-Whitney U test or a Wilcoxon rank-sum test.
We will examine both of these options in turn.
The Independent Groups t-Test
(Also known as the unpaired t-test, the between-subjects t-test, or the uncorrelated t-test.)
Concept: The independent groups t-test compares the means of two unrelated samples to determine whether they are significantly different from one another.
We have already assumed that we are dealing with interval data and a normal distribution; these are the assumptions made by all parametric tests. The independent groups t-test makes an additional assumption, called homogeneity of variance. For the data to satisfy this assumption, the variances (and therefore the standard deviations) of the two groups should be the same, or at least similar. However, the t-test is robust against violation of this assumption: the standard deviations can be fairly different, and the t-test will still perform satisfactorily as long as the two groups are of similar size. In addition, if the assumption is violated, computer programs can carry out a modified version of the t-test (usually called Welch's t-test) which does not make this assumption. (You could do it by hand too, but it is hard work, so you don't tend to see it given in books.)
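To show what the homogeneity-of-variance assumption changes, here is a sketch that computes t and its degrees of freedom both ways: the standard pooled-variance version and the Welch version that drops the assumption. The function name `independent_t` is hypothetical. It stops at t and df because turning these into a p value needs the t distribution, which Python's standard library does not provide; in practice a package such as SciPy (`scipy.stats.ttest_ind`, whose `equal_var=False` option gives the Welch version) would do the whole job.

```python
import math
import statistics


def independent_t(group1, group2, equal_var=True):
    """Return (t, df) for two independent groups.

    equal_var=True: Student's t-test with a pooled variance (assumes
    homogeneity of variance). equal_var=False: Welch's version, which does
    not, and usually gives a non-integer df.
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.fmean(group1), statistics.fmean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 denominators
    if equal_var:
        pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
        standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        a, b = var1 / n1, var2 / n2
        standard_error = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    # The sign of t only reflects which group was entered first.
    return (mean1 - mean2) / standard_error, df
```

When the two variances are equal and the groups are the same size, both versions give the same t and df; they diverge as the variances and group sizes become more unequal.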
Tip: It is often important to note whether a letter which represents something is in upper case or lower case. Sometimes an upper case letter means something different to a lower case letter (r and R are not the same thing).
You will sometimes find that the value for t is negative. The sign only reflects which group's mean was subtracted from the other, so for a two-tailed test you can ignore the minus sign.
When you report an independent groups t-test, you should report the value of t, the degrees of freedom (df) and the probability value. For example t=3.4, df=24, p=0.002 (2-tailed).
You might hear the t-test referred to as Student's t-test. It was developed by William Gosset, who published under the pseudonym ‘Student’.
Mann-Whitney U-Test / Wilcoxon Rank Sum Test
Concept: The Mann-Whitney test and the Wilcoxon Rank Sum test are non-parametric tests for independent designs.
The Mann-Whitney U test and the Wilcoxon rank-sum test are used when the parametric assumptions for an independent groups t-test are not satisfied (i.e. the data are not normally distributed or not measured on an interval scale). These tests are the non-parametric counterparts of the independent groups t-test. The two tests are equivalent: they will give identical answers, and it doesn’t matter which one you use.
The Mann-Whitney U test is slightly simpler to calculate than the Wilcoxon rank-sum test, so if you are doing your analysis by hand, you should use it. (There is a second Wilcoxon test, which we will come across later on, so if you use the Mann-Whitney U test, you are less likely to confuse the two.)
To report a Mann-Whitney U test, you should give the value of U (the test statistic), the size of both of the groups, and the probability value. For example U=22.5, N1=6, N2=8, p = 0.85.
A Wilcoxon rank-sum test is reported in the same way: W=43.5, N1=6, N2=8, p=0.85.
If you have a large sample, you might have to convert the U or W statistic to a Z statistic. In this case, you should also report the Z statistic.
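Both statistics come from ranking all of the scores together, which is why the two tests always agree. The sketch below (hypothetical function names) computes the rank sum W for the first group, derives U from it, and adds the large-sample z conversion. It assumes tied scores share the mean of their ranks and applies no tie correction to z, so a statistics package may report a slightly different z.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    first, last = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def mann_whitney(group1, group2):
    """Return (U, W, z) for two independent groups.

    W is the Wilcoxon rank sum for group1 and U = W - n1(n1 + 1)/2, so the two
    statistics carry the same information. Some texts report the smaller of
    U and n1*n2 - U instead. z is the normal approximation without a tie
    correction.
    """
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    w = sum(ranks[:n1])
    u = w - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, w, z
```

With six people in the first group, n1(n1 + 1)/2 = 21, which is exactly the gap between the U and W reporting examples above (43.5 = 22.5 + 21).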
Three or More Levels of the Independent Variable
In a slightly more complex experiment, we might test participants under three levels of an independent variable. In the previous example, investigating the effects of noise on learning, we had two groups: silence and noise. However, we might be interested in the effects of different kinds of noise. In that case, we could have three groups:
- Silence.
- Classical music.
- Hammering.
These three groups constitute the three levels of independent variable.
One-way Analysis of Variance
Concept: One-way analysis of variance is a parametric test which compares the means of three or more groups.
Analysis of variance (usually shortened to ANOVA) is a whole family of statistical tests. A one-way analysis of variance compares the means of three or more groups to see whether they differ from one another.
ANOVA tests are parametric statistical tests, which means that they make the same assumptions as the t-test:
- Normal distribution
- Interval data
A one-way ANOVA, like the t-test, makes an assumption of homogeneity of variance: it assumes that the standard deviations in the different conditions are equal, or at least similar to each other. As with the t-test, many computer programs will calculate Levene’s test for homogeneity of variance, which tests this assumption. A one-way ANOVA is robust against violation of this assumption (as the t-test was), as long as the sample size in each group is approximately equal. A correction for unequal variances similar to the one we saw for the t-test is also available for one-way ANOVA (for example, Welch's ANOVA).
When you report the results of an ANOVA, you should report the test statistic, F, two sets of degrees of freedom (not just one, as with the t-test) and the probability value. For example F=6.62, df=2, 15, p=0.009.
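To see where the two sets of degrees of freedom come from, the sketch below computes F for independent groups by splitting the variation into a between-groups part and a within-groups part. The function name is hypothetical, and as before it stops at F and its dfs rather than computing a p value.

```python
import statistics


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for two or more independent groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    group_means = [statistics.fmean(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within groups: spread of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1                # number of groups - 1
    df_within = len(all_scores) - len(groups)   # total participants - number of groups
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

In the reporting example above, df=2, 15 corresponds to three groups (3 - 1 = 2) and 18 participants in total (18 - 3 = 15).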
Tip: When you use ANOVA, there is no distinction between one-tailed and two-tailed hypotheses.
Finally, you should note that one-way ANOVA tests the null hypothesis that all of the group means are equal. With three groups:
$$H_0: \mu_1 = \mu_2 = \mu_3$$
Because the null hypothesis states that the means are all equal, the null hypothesis can be false for more than one reason. It may be, for example, that Group 1 and Group 2 are significantly different from one another; it may be that Group 2 is significantly different from Group 3; and so on. Because a significant value for F can have more than one possible meaning, you may have to investigate your data further to find out what differences exist.
The first, and simplest, approach to checking the differences is to plot the means, along with their 95% confidence intervals, on a graph. This graph will often show where the differences lie. For example, the figure below shows the mean and 95% confidence interval for the number of words remembered by the three groups who learned in the silence, classical music and hammering conditions. There is little difference between the silence and classical music groups, but the mean score for the hammering group is much lower, and its 95% confidence interval does not overlap with those of the other two groups.
Another way to make comparisons between groups is to carry out follow-up tests. There are two types: planned contrasts and unplanned (post-hoc) contrasts. Use planned contrasts when you have a theoretical reason to believe that some specific differences will occur and you want to investigate those differences. Use unplanned contrasts to compare every group with every other group.
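Comparing every group with every other group means the number of comparisons grows quickly: k groups give k(k - 1)/2 pairs, and each extra comparison adds a chance of a false positive. The sketch below lists the pairs and, purely as an assumption for illustration, applies a Bonferroni adjustment (one common correction among several, not discussed in the text above). The function name is hypothetical.

```python
from itertools import combinations


def unplanned_comparisons(group_names, alpha=0.05):
    """List every pair of groups and a Bonferroni-adjusted alpha per comparison.

    Bonferroni simply divides alpha by the number of comparisons; other
    post-hoc procedures exist and are often less conservative.
    """
    pairs = list(combinations(group_names, 2))
    if not pairs:
        raise ValueError("need at least two groups")
    return pairs, alpha / len(pairs)


pairs, per_test_alpha = unplanned_comparisons(["silence", "classical music", "hammering"])
for a, b in pairs:
    print(f"{a} vs {b}: test at alpha = {per_test_alpha:.4f}")
```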
Kruskal-Wallis Test
Concept: The Kruskal-Wallis test is a non-parametric test for comparing three or more groups.
The Kruskal-Wallis test is the non-parametric equivalent of a one-way ANOVA, used for comparing the distributions of three or more groups when the assumptions for using one-way ANOVA have been violated. The test gives a χ² statistic (written as ‘chi-square’ if you don’t have the symbol available, and pronounced ‘ky’, like ‘sky’ without the ‘s’). The χ², along with its degrees of freedom and associated probability, should be reported, for example chi-square=9.1, df=2, p=0.011.
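The statistic is built from the ranks of all the scores taken together. The sketch below computes it (often labelled H) with k - 1 degrees of freedom. It assumes the basic formula with no correction for ties, so packages may give a slightly larger value when there are tied scores; the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    first, last = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis(*groups):
    """Return (H, df); H is compared with a chi-square distribution on df = k - 1."""
    scores = [x for g in groups for x in g]
    ranks = average_ranks(scores)
    n = len(scores)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks belonging to this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1
```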
Within Participants Design
A within-participants design means that the same individuals are tested and measured under all conditions of the independent variable. Although this design is more efficient, in that fewer participants are required, it does raise some complex methodological issues, which are discussed in the section on repeated measures designs.
As with the independent groups design, the first thing that you must decide is whether to use a parametric or a non-parametric test. To remind you, the parametric test assumes that the data have:
- Normal distribution. (It is a little tricky to say exactly what is assumed to be normal. For repeated measures, it is usually the differences between the scores.)
- Interval level data.
Repeated Measures t-test
Concept: The 'repeated measures t-test' is a parametric test used to compare the means of two related groups (usually the same people measured twice).
The repeated measures t-test is very similar to the independent groups t-test that we looked at in the previous section. You will sometimes see the repeated measures t-test referred to as the matched pairs t-test, the correlated t-test, or the within-participants t-test. All of these are the same thing.
To carry out a repeated measures t-test, each participant has to have been measured twice, so you should have pairs of values for each individual.
The repeated measures t-test assumes that the data are normally distributed, just like the independent groups t-test, but the assumption is a little bit different. This time, we do not assume that the raw scores are normally distributed; rather we assume that the differences between the scores are normally distributed. If we were carrying out an experiment on the effects of noise on learning, we would test each person under two conditions, once in silence, and once in a noisy environment.
We would then have two scores for each person, as shown in the table below: one from the noise condition, and one from the silence condition. The difference between these scores is shown in the ‘Difference’ column. It is these difference scores that we assume to have a normal distribution, although, as with the independent groups t-test, the test is quite robust to violations of this assumption. The repeated measures t-test does not make any assumption about homogeneity of variance.
| Silence | Noise | Difference |
|---|---|---|
| 50 | 35 | 15 |
| 40 | 45 | -5 |
| 38 | 22 | 16 |
| 91 | 86 | 5 |
When you have carried out a t-test, you will sometimes find that the value for t is negative, and, as with the independent groups t-test, you can ignore the minus sign. When you report a repeated measures t-test, you should report the value for t, the degrees of freedom and the probability, for example, t=0.38, df=5, p=0.723.
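The repeated measures t-test works entirely on the difference column: t is the mean difference divided by its standard error. The sketch below reproduces the calculation for the four rows of the table above. The function names are hypothetical, and as with the independent groups sketch it stops at t and df rather than computing a p value.

```python
import math
import statistics

# The four pairs of scores from the table above (one row per participant).
silence = [50, 40, 38, 91]
noise = [35, 45, 22, 86]


def difference_scores(condition_a, condition_b):
    """One difference per participant; these are what we assume to be normal."""
    return [a - b for a, b in zip(condition_a, condition_b)]


def repeated_measures_t(condition_a, condition_b):
    """Return (t, df) for two related sets of scores."""
    diffs = difference_scores(condition_a, condition_b)
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


t, df = repeated_measures_t(silence, noise)
print(f"t={t:.2f}, df={df}")  # prints t=1.57, df=3
```

Note that df is the number of pairs minus one, not the number of scores minus two.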
The Wilcoxon Signed-Ranks Test
Concept: The Wilcoxon signed ranks test is a non-parametric test to compare two related groups (usually the same people measured twice).
The Wilcoxon signed ranks test is used when the assumptions of the repeated measures t-test are violated. The test statistic from a Wilcoxon test is T, although sometimes this will be converted to a Z. To report a Wilcoxon signed-ranks test, you should report the T or Z statistic, along with the N and the probability. For example T=8.0, N=6, p=0.60, or Z=0.52, N=6, p=0.60.
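Like the t-test, the signed-ranks test starts from the difference scores, but it ranks their sizes instead of averaging them. The sketch below applies it to the four pairs from the earlier table. It assumes that zero differences are dropped, that tied differences share the mean rank, and that z uses the plain normal approximation with no tie or continuity correction, so packages may report slightly different z values; the function names are hypothetical.

```python
import math

silence = [50, 40, 38, 91]
noise = [35, 45, 22, 86]


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    first, last = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def wilcoxon_signed_ranks(condition_a, condition_b):
    """Return (T, N, z): T is the smaller of the positive and negative rank sums."""
    diffs = [a - b for a, b in zip(condition_a, condition_b) if a != b]  # drop zeros
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])  # rank the sizes of the differences
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(positive, negative)
    z = (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, n, z


print(wilcoxon_signed_ranks(silence, noise))
```

For the table data, T is 1.5: only the -5 difference is negative, and it shares ranks 1 and 2 with the +5 difference.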
Three or More Levels of the Independent Variable
We have seen that the independent groups design can have three or more levels of the independent variable. A repeated measures study can also be designed with three or more levels of the independent variable. In the previous example, we tested participants under two conditions, silence and noise. If we were interested in the effects of different kinds of noise, we might have three conditions:
- Silence.
- Classical music.
- Hammering.
Repeated Measures Analysis of Variance
Concept: Repeated measures ANOVA is used to compare the means of scores from the same people measured on three (or more) occasions.
Repeated measures analysis of variance is the second member of the ANOVA family that we have encountered. Repeated measures ANOVA makes the assumptions that we encountered before:
- Normal distribution.
- Interval data
A repeated measures ANOVA also makes a nasty third assumption, called sphericity. Sphericity, briefly, means that the variances of the differences between each pair of conditions need to be equal. Andy Field has written A Bluffer's Guide to Sphericity. Repeated measures ANOVA is not especially robust against violation of this assumption, so we do need to worry about it. Luckily, we only need to worry about it a little, because when you carry out a repeated measures ANOVA using a computer, it should automatically test the sphericity assumption for you and carry out a correction which compensates for any violation. There is more than one type of correction, but the most commonly used is the Greenhouse-Geisser epsilon correction.
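To make the sphericity assumption concrete, the sketch below computes the variance of the difference scores for every pair of conditions; sphericity says these variances should all be roughly equal. This is only an informal look at the assumption: formal tests (such as Mauchly's test) and the Greenhouse-Geisser correction are left to a statistics package. The function name and the dictionary layout are assumptions for illustration.

```python
from itertools import combinations
import statistics


def difference_variances(conditions):
    """Variance of the difference scores for every pair of conditions.

    conditions maps a condition name to a list of scores, one per participant,
    with participants in the same order in every list.
    """
    result = {}
    for a, b in combinations(conditions, 2):
        diffs = [x - y for x, y in zip(conditions[a], conditions[b])]
        result[(a, b)] = statistics.variance(diffs)
    return result
```

With three conditions (silence, classical music, hammering) this gives three variances to compare; wildly different values suggest that sphericity is violated.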
When you report a repeated measures ANOVA, you should report the F, df and p values, as with the one-way ANOVA.
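Because each participant provides a score in every condition, repeated measures ANOVA can remove the differences between participants from the error term. The sketch below shows this partition and the two dfs that are reported. It is the uncorrected F: it assumes sphericity holds and applies no Greenhouse-Geisser adjustment. The function name and the row-per-participant layout are assumptions for illustration.

```python
import statistics


def repeated_measures_anova(scores):
    """Return (F, df_conditions, df_error), assuming sphericity (no correction).

    scores[i][j] is participant i's score in condition j.
    """
    n = len(scores)       # participants
    k = len(scores[0])    # conditions
    all_scores = [x for row in scores for x in row]
    grand = statistics.fmean(all_scores)
    condition_means = [statistics.fmean([row[j] for row in scores]) for j in range(k)]
    participant_means = [statistics.fmean(row) for row in scores]
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_conditions = n * sum((m - grand) ** 2 for m in condition_means)
    # Differences between participants are removed from the error term.
    ss_participants = k * sum((m - grand) ** 2 for m in participant_means)
    ss_error = ss_total - ss_conditions - ss_participants
    df_conditions, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f, df_conditions, df_error
```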
Friedman’s Test
Concept: Friedman’s test is a non-parametric test, used to compare three (or more) related groups (usually the same people measured on three occasions).
Friedman’s test is used when the assumptions for a repeated measures analysis of variance have been violated. The test statistic is χ² (chi-square), which should be reported with its associated df and probability value, e.g. chi-square = 17.3, df=5, p=0.0039.
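Unlike the Kruskal-Wallis test, Friedman’s test ranks the scores within each participant, across that participant's conditions, and then compares the rank totals for each condition. A minimal sketch, assuming the basic formula with no correction for ties (the function names and row-per-participant layout are hypothetical):

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    first, last = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def friedman(scores):
    """Return (chi_square, df) with df = k - 1.

    scores[i][j] is participant i's score in condition j.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:  # rank within each participant, not across everyone
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi_square = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi_square, k - 1
```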
Interactions: More than One Independent Variable
Concept: An interaction effect occurs when the effect of one variable is dependent upon the level of a second independent variable.
Experiments are not limited to one independent variable. If an experiment has two independent variables, we are interested in the effect of each of those independent variables. However, we are also interested in a third effect - the interaction effect.
An interaction effect occurs when two variables do not simply add together their influence; rather they combine in some more complex way. An example: An experiment was carried out to investigate the effects of using the ‘method of loci’ in learning a list of words. (The method of loci is a method of improving memory, which involves visualising the words to be remembered in specific locations.) Participants were split into two groups. One group was trained in the use of the method of loci, the second group were given no such training. The first independent variable is the method of loci.
The participants were then further split. One half of each group was asked to learn a list of concrete words - words that are easily visualised and refer to real, concrete objects, such as ‘chair,’ ‘potato,’ ‘bottle’. The second half of each group was asked to remember a list of abstract words - words that are not easily visualised, and refer to abstract concepts, such as ‘love’, ‘justice’, and ‘reality’.
The results of the experiment are shown in the table below. Looking first at the mean scores in the end column, you can see that the mean score for the method of loci group (23) is considerably higher than the mean score for the control group (12). We would therefore conclude that the method of loci improves people’s ability to remember words. Second, if we look at the mean scores for the concrete and abstract words, we can see that the mean score for the concrete words (22) is higher than the mean score for the abstract words (13). We can therefore conclude that concrete words are easier to remember than abstract words. The two effects of the independent variables are referred to as main effects.
If we stop there, we have failed to tell the whole story: there is a third question to ask of the data. Is the effect of using the method of loci the same regardless of whether we have concrete or abstract words? The easiest way to answer this third question is to plot a chart, such as the figure below. This chart shows that the method of loci improves recall of both abstract and concrete words, but the improvement for concrete words is much greater than the improvement for abstract words. This difference in the size of the method of loci effect, depending on word type, is called an interaction effect.
Tip: Different Slopes for Different Folks. A different way of thinking about an interaction effect is to think of a main effect as a slope. The change in the number of words recalled, from abstract to concrete words, can be represented as a slope. If this change differs according to whether participants used the method of loci or not, the slopes will differ. Gary McClelland refers to this as ‘different slopes for different folks.’
In an experiment that has two IVs, we are looking at three different effects, and testing three different null hypotheses. In an experiment where the two IVs are the method of loci (present or absent) and the word type (abstract or concrete) we test three hypotheses:
- There will be no difference between the number of abstract and concrete words recalled. This is a main effect.
- There will be no difference between the number of words recalled by the method of loci group and the control group. This is a main effect.
- The difference between the concrete words and the abstract words will not depend upon the use of the method of loci. This is an interaction effect.
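The three effects can be read straight from the four cell means of a 2 × 2 design: each main effect compares averages across rows or columns, and the interaction is the difference between the two "slopes" (the word-type effect with and without the method of loci). The sketch below computes these descriptive contrasts only, not the F tests. It assumes equal numbers of participants in each cell, and the function name and labels are hypothetical.

```python
def two_by_two_effects(cells):
    """Return (loci main effect, word-type main effect, interaction contrast).

    cells maps (training, word_type) to a cell mean, where training is
    "loci" or "control" and word_type is "concrete" or "abstract".
    Assumes equal cell sizes, so marginal means are simple averages.
    """
    loci_concrete = cells[("loci", "concrete")]
    loci_abstract = cells[("loci", "abstract")]
    control_concrete = cells[("control", "concrete")]
    control_abstract = cells[("control", "abstract")]
    loci_effect = ((loci_concrete + loci_abstract) / 2
                   - (control_concrete + control_abstract) / 2)
    word_effect = ((loci_concrete + control_concrete) / 2
                   - (loci_abstract + control_abstract) / 2)
    # Difference between the two slopes: 0 means parallel lines (no interaction).
    interaction = ((loci_concrete - loci_abstract)
                   - (control_concrete - control_abstract))
    return loci_effect, word_effect, interaction
```

An interaction contrast of zero corresponds to parallel lines on the chart; the further it is from zero, the more the slopes differ.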
In your practical reports, you need to make it clear which of these effects you are interested in testing, and which effects are relevant for your theory. You also need to report the F, df and p, for each of those tests.
Some more graphs, showing different sorts of interactions, are shown in the figures below.
More Complex Designs
If an experiment has two independent variables (such as we have been describing), and each of the independent variables has two levels, it is described as a 2 × 2 design.
It is possible to extend this design in two ways. First, an independent variable can have more than two levels. If a study had one IV with three levels and one IV with two levels, it would be described as a 3×2 design. A study which had one IV with four levels and a second IV with three levels is described as a 4×3 design.
It is also possible for a study to have more than two independent variables. A study that has three independent variables, each of which has two levels, is called a 2×2×2 design. If a study has three independent variables (we will call them A, B and C), seven hypotheses are tested:
- The main effect of A
- The main effect of B
- The main effect of C
- The A×B interaction effect. (This is described as a two-way interaction.)
- The A×C interaction effect.
- The B×C interaction effect.
- The A×B×C interaction effect. (This is described as a three-way interaction.)
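The list above follows a simple pattern: every non-empty combination of the independent variables is one effect, so k independent variables give 2^k - 1 effects (seven for three IVs). The sketch below generates the list and the design name for a fully crossed design; the function names are hypothetical.

```python
from itertools import combinations


def factorial_effects(factors):
    """List every main effect and interaction in a fully crossed design."""
    effects = []
    for size in range(1, len(factors) + 1):  # 1 = main effects, 2 = two-way, ...
        for combo in combinations(factors, size):
            effects.append("×".join(combo))
    return effects


def design_name(levels):
    """Describe a design from the number of levels of each IV, e.g. [3, 2] -> '3×2'."""
    return "×".join(str(n) for n in levels)


print(factorial_effects(["A", "B", "C"]))
print(design_name([2, 2, 2]))  # prints 2×2×2
```

With four independent variables the same function lists 15 effects, which is one reason designs with many IVs quickly become hard to interpret.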
In the examples that we have been discussing so far, we have only been talking about independent groups (between-participants) designs. It is also possible to have a repeated measures design with more than one independent variable. If we were to carry out an experiment to investigate the efficacy of the method of loci for remembering concrete and abstract words, each person would participate in all four combinations of the two independent variables:
- Method of loci, concrete words
- Method of loci, abstract words
- No method of loci, concrete words
- No method of loci, abstract words
This would be called a 2×2 repeated measures design.
Finally, we can have an experiment in which one or more independent variables use independent groups, and one or more independent variables use repeated measures. This is referred to as a mixed design. If we were investigating the efficacy of the method of loci for abstract and concrete words with a mixed design, we would split participants into two groups: one group would use the method of loci and one would not, and both groups would be asked to remember both abstract and concrete words. To give it its full title, we would refer to this as a 2×2 mixed design.







