Section 8.2: Confidence Intervals

From Research Methods in Psychology

Confidence Intervals: Making Single Variable Inferences

The first type of inferential statistic that we shall look at is the confidence interval. A confidence interval is used for making inferences from a sample statistic about the value of a parameter in the population.

Concept: The terms parameter and statistic can both refer to anything that can be measured and given a value. A parameter is the value in the population. A statistic is the value that we have found in our sample. If we are interested in the mean height of psychology students in the UK, the true mean height is a parameter - sometimes called a population parameter. If we measure the height of a sample of psychology students, and find the average, we will have a statistic - sometimes called a sample statistic. We estimate a population parameter using a sample statistic.


When we have calculated our sample statistic, we have a sample estimate of a population parameter. The sample statistic is our best guess at the value of the population parameter. To help us to estimate the population parameter, we can calculate (or ask a computer to calculate for us) a confidence interval around the parameter estimate – this tells us within what range of numbers the true value is likely to be.

Concept: The standard error of a parameter estimate is the standard deviation of the sampling distribution of that statistic, that is, of the values the statistic would take if we drew many samples of the same size. This definition sounds rather confusing, but it's mentioned now in case you come across it elsewhere. The standard error is not very interesting on its own, but it is used to calculate confidence intervals and other statistics.
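
For the mean, the standard error is usually estimated from the sample standard deviation and the sample size. As a sketch, write s for the sample standard deviation, n for the sample size, x-bar for the sample mean, and t* for a critical value taken from the t distribution with n - 1 degrees of freedom. These symbols are our own shorthand for illustration, not notation used elsewhere in this section.

```latex
\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}},
\qquad
\text{confidence interval} = \bar{x} \pm t^{*} \times \mathrm{SE}_{\bar{x}}
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.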


Concept: If we calculate the 95% confidence interval of a mean, that indicates that if we repeated the study many times and calculated an interval each time, 95% of those intervals would contain the population mean (the population parameter).

This will make more sense when we look at an example. Suppose we are interested in knowing the mean number of textbooks on research methods and statistics owned by psychology students. We want to know the population parameter value. To find out the population value for sure, we would need to ask every student how many books they owned – this is unfeasible, as we do not have the time, or the inclination to do it. Instead, we take a representative sample (see Chapter 6 to see what we mean by a representative sample) of students, and find out how many textbooks the students in our sample own.

Suppose we ask 10 students how many books they own, and we find that the mean number of books owned is 12 and the standard deviation is 4. Our sample estimate of the population value is 12, but we want to know what the population value is likely to be. We would be very lucky indeed to find that our sample statistic exactly matched the population parameter.

We calculate something called the 95% confidence limit and in this case, with our sample of 10 students, it is equal to 3.48. This result means that the 95% confidence interval runs from 12 - 3.48 = 8.52 to 12 + 3.48 = 15.48. If we say that the true mean number of textbooks owned by psychology students is between 8.52 and 15.48, and we make statements of this kind every time we run such a study, we will be wrong only 5% of the time (i.e. 1 time in 20) and we will be correct 95% of the time (i.e. 19 times in 20).

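Note that, strictly speaking, this doesn't mean that there is a 95% probability that the true mean lies within this particular confidence interval. The true mean is a fixed value, so it either is inside the interval or it is not; the 95% describes how often intervals calculated in this way capture it.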
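
The "19 times in 20" reading can be checked by simulation. The following is a minimal sketch, not part of the original example: it assumes a hypothetical normally distributed population with mean 12 and standard deviation 4, repeats a 10-student study many times, and counts how often the t-based 95% interval contains the true mean. The critical value 2.262 (t with 9 degrees of freedom) is hard-coded from standard tables, and all names are our own.

```python
import random
import statistics

# Hypothetical population, chosen only for illustration.
TRUE_MEAN = 12.0
TRUE_SD = 4.0
N = 10            # students per simulated study
T_CRIT = 2.262    # two-sided 95% critical value of t with N - 1 = 9 df
STUDIES = 5000    # number of simulated repetitions of the study


def interval_covers_true_mean(rng):
    """Run one simulated study; report whether its 95% CI contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5   # standard error of the mean
    half_width = T_CRIT * se
    return mean - half_width <= TRUE_MEAN <= mean + half_width


rng = random.Random(1)  # fixed seed so the run is repeatable
hits = sum(interval_covers_true_mean(rng) for _ in range(STUDIES))
coverage = hits / STUDIES
print(f"{coverage:.3f} of the simulated intervals contained the true mean")
```

The printed proportion should come out close to 0.95. Each individual interval either contains 12 or does not; it is the procedure that succeeds about 95% of the time.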


Image:figure8.1.gif

Being wrong 1 time in 20 is reasonably accurate, but it may not be accurate enough for our needs: being right 19 times out of 20 is not enough if the consequences of being wrong are very serious. If we want a higher level of confidence, we can use different limits to calculate our confidence intervals. The 99% confidence limit, for example, is 4.85. If we want an interval that will include the population parameter 99 times out of 100, we need to say that the true value of the mean is from 12 - 4.85 = 7.15 to 12 + 4.85 = 16.85.

In this first example, we only collected data from 10 students. We could improve the precision of our estimate by collecting data from a larger number. If we asked 20 students and found the same mean and standard deviation (although this is unlikely), the 95% confidence limit would shrink to 2.18, and the 95% confidence interval would therefore run from 9.82 to 14.18.

There are two things to note here. The first is that, if you want to have more confidence that you have included the population value in your confidence interval, you are going to have wider confidence intervals. The second is that if you have a larger sample size, your confidence intervals will shrink. When you have a small sample size, a small increase in sample size causes a large reduction in the confidence limits. With larger sample sizes, a large increase in sample size causes a smaller reduction in the confidence limits.
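
Both patterns can be seen in a short calculation. This is a rough sketch under stated assumptions: the function name half_width is hypothetical, and it uses the normal approximation (a z critical value) rather than the t distribution, so its numbers are illustrative only and will not match the confidence limits quoted above. A t-based calculation would give somewhat wider limits for small samples.

```python
from statistics import NormalDist


def half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / n ** 0.5


SD = 4.0  # sample standard deviation from the textbook example

# Higher confidence gives a wider interval at a fixed sample size.
w95 = half_width(SD, 10, 0.95)
w99 = half_width(SD, 10, 0.99)
print(f"n=10: 95% half-width {w95:.2f}, 99% half-width {w99:.2f}")

# Doubling the sample size repeatedly: each doubling removes less width.
sizes = [10, 20, 40, 80, 160]
widths = [half_width(SD, n) for n in sizes]
for n, w in zip(sizes, widths):
    print(f"n={n:>3}: 95% half-width {w:.2f}")

# Absolute reduction in half-width gained by each successive doubling.
reductions = [a - b for a, b in zip(widths, widths[1:])]
```

Going from 10 to 20 students adds only 10 people but removes more width than going from 80 to 160, which adds 80.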


Confidence intervals are often plotted on graphs - a point representing the sample statistic, and a line extending to show the range of the interval, as shown in the figure below.

Image:figure8.2.gif

To calculate the standard error of a mean, you have to make the two assumptions made by parametric tests: that the data are normally distributed and that they are measured on an interval scale. (The confidence interval of a mean is, however, very robust to non-normality, particularly as the sample size increases.)

Summary: In this section, we used the mean to illustrate confidence limits and intervals, but confidence limits and intervals can be calculated for many different kinds of sample statistics. Most common is the mean, but you will also see (and can calculate) confidence intervals for correlations and regression lines. Not every sample statistic has a simple formula for its confidence interval - most notably, there is no simple formula of this kind for the median, although other methods exist.