Section 1: Initial Description of Data

From Research Methods in Psychology

WHAT ARE YOU STUDYING?

In this section we will consider how you can describe the distribution of a set of data. We will look at some frequency tables, and then frequency plots. We will examine the shape of distributions, particularly with reference to the normal distribution.

Descriptive Data Analysis

Descriptive data analysis is the use of techniques that take a set of raw scores and summarise them in a form that is more manageable. This chapter will show you several procedures that can be used to carry out descriptive data analysis.

Crucial Tip: Descriptive statistics simply describe what you have found in your sample. They do not attempt to go beyond the data obtained: they make no predictions about whether similar results would be likely if the research were replicated, and they do not explain what caused the result. Descriptive statistics show the results of a study in a way that can be easily understood.

A good place to start to describe your data is to look at the frequencies of different scores. How often does a particular value come up?

One way of summarising data is presented in Table 1, which shows the number of times each score appears. The table shows the results correctly, but how long does it take you to see what the most common score is? And the second most common score? The information is correct, but it is not in a clear and easy-to-read form.

Table 1: Frequency Table

Value (Score) Frequency (number of pupils achieving that score)
0 5
1 3
2 8
3 9
4 4
5 8
6 8
7 3
8 1
9 1
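A frequency table like Table 1 is built by counting how often each value occurs in the raw scores. The sketch below shows one way this might be done in Python. The list raw_scores is invented for illustration (it is not the data behind Table 1), and frequency_table is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw scores, invented for illustration only
# (these are NOT the pupils' scores summarised in Table 1).
raw_scores = [3, 1, 2, 3, 0, 2, 3, 5, 2, 1]


def frequency_table(scores):
    """Return (value, frequency) pairs, sorted by value."""
    counts = Counter(scores)  # maps each value to the number of times it occurs
    return sorted(counts.items())


# Print one row per value, in the same two-column layout as Table 1.
print("Value\tFrequency")
for value, freq in frequency_table(raw_scores):
    print(f"{value}\t{freq}")
```

Note that a value nobody achieved (4 in this made-up list) does not appear in the output at all, which is one reason a plot along a scale can be easier to read than the table.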

The information in Table 1 can be represented instead in a histogram.

Crucial Concept: Histograms show the frequency of particular numbers appearing in a dataset.

Histograms show the shape of a distribution (more on this later), and show outliers (more on this later). A histogram can be drawn with bars, or with a line joining up the points. Along the x (horizontal) axis you will find the possible scores. On the y (vertical) axis you will find the number of people who have that score.

A second way of presenting the same information is shown in Figure 1. Notice now that from the graph in Figure 1, it is much quicker and easier to answer questions such as “What is the most common score?” (Three is the most common score since nine pupils achieved it.)

Figure 1: Histogram of the scores in Table 1
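If you have no plotting software to hand, a rough text version of the histogram can be produced from the frequencies in Table 1. This is only a sketch: the plot is turned on its side (scores run down the page and the bars run across), and the names table_1, text_histogram and most_common_scores are made up for this example.

```python
# Frequencies copied from Table 1: score -> number of pupils with that score.
table_1 = {0: 5, 1: 3, 2: 8, 3: 9, 4: 4, 5: 8, 6: 8, 7: 3, 8: 1, 9: 1}


def text_histogram(freqs):
    """Build one line of text per score, covering every value on the scale."""
    lines = []
    for score in range(min(freqs), max(freqs) + 1):
        count = freqs.get(score, 0)  # a score nobody achieved still gets a row
        lines.append(f"{score} | {'#' * count} ({count})")
    return lines


def most_common_scores(freqs):
    """Return every score that shares the highest frequency."""
    top = max(freqs.values())
    return [score for score, count in freqs.items() if count == top]


print("\n".join(text_histogram(table_1)))
print("Most common score(s):", most_common_scores(table_1))
```

The longest bar belongs to a score of 3, which matches the answer read from Figure 1. Looping over the whole range of scores, rather than only the scores that occur, keeps the values positioned on a scale.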

Crucial Tip: What is the difference between a histogram and a bar chart? A histogram (if drawn with bars) is a special kind of bar chart. A bar chart may show anything on the x-axis or on the y-axis. A histogram always shows values on the x-axis and frequency on the y-axis. In addition, the values on the x-axis of a histogram represent a scale, and should be positioned according to that scale.

Histograms and Distributions

When a histogram is plotted, you can see the ‘shape’ of the distribution of the variable (i.e. how many had low scores, how many in the middle, how many high scores). Because they give a visual representation of the shape of a distribution, histograms are very important in data analysis.

Crucial Concept: Strictly speaking, a distribution is a mathematical function that describes the probability of any score occurring. Less strictly speaking, you can think of a distribution as the shape of a curve on a histogram. One of the most common distributions is the normal distribution (also known as the Gaussian distribution). A very large number of naturally occurring variables are normally distributed. A large number of statistical tests make the assumption that the data (or something about the data) form a normal distribution.

Figure 2: A Normal Distribution

Crucial Concept: A normal distribution is symmetrical and bell-shaped. It curves outwards at the top and then inwards nearer the bottom, with the tails getting thinner and thinner.

Figure 2 shows a perfect normal distribution. Your actual data will never form a perfect normal distribution, but they may come close: if the distribution formed by your data is symmetrical and approximately bell-shaped, then you have something close to a normal distribution.
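One informal way to judge how close a set of scores is to a normal distribution is to compare the observed frequencies with the frequencies a normal curve would predict. The sketch below does this for the Table 1 scores using Python's statistics.NormalDist. It rests on some assumptions that are not part of the text above: the normal curve is given the same mean and standard deviation as the sample, and each whole-number score is treated as covering the interval from half a point below to half a point above it. It is an eyeball comparison, not a formal test of normality.

```python
from statistics import NormalDist

# Frequencies copied from Table 1: score -> number of pupils with that score.
table_1 = {0: 5, 1: 3, 2: 8, 3: 9, 4: 4, 5: 8, 6: 8, 7: 3, 8: 1, 9: 1}


def expand(freqs):
    """Turn a frequency table back into a flat list of individual scores."""
    return [score for score, count in freqs.items() for _ in range(count)]


def expected_normal_frequencies(freqs):
    """Frequencies predicted by a normal curve with the sample's mean and SD."""
    scores = expand(freqs)
    n = len(scores)
    fitted = NormalDist.from_samples(scores)  # uses the sample mean and SD
    expected = {}
    for score in sorted(freqs):
        # Assumption: a whole-number score stands for score - 0.5 to score + 0.5.
        share = fitted.cdf(score + 0.5) - fitted.cdf(score - 0.5)
        expected[score] = n * share
    return expected


expected = expected_normal_frequencies(table_1)
print("Score\tObserved\tExpected if normal")
for score in sorted(table_1):
    print(f"{score}\t{table_1[score]}\t\t{expected[score]:.1f}")
```

Large gaps between the two columns suggest that the shape of the data departs from the bell shape. Small gaps are to be expected in any real sample.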


Section Summary

When exploring your data, the first thing you should look at is the distribution of the data. This is done most easily using a histogram (frequency plot) or a box and whisker plot. The histogram will show you whether the data are approximately normally distributed. If they are not, this may be because the data form the wrong shape, or because of a small number of outliers. If the data form the wrong-shaped distribution, the distribution may be skewed (non-symmetrical) or kurtosed (too flat, or too peaked). Skew is more serious than kurtosis. (The shape of the data may be changed using transformations.) Outliers might be errors. If they are errors, they should be fixed; if they are not errors, they should probably be removed.
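Skew and kurtosis can be given numerical values as well as being judged by eye. The sketch below uses one common pair of definitions, based on the average of the cubed and fourth-power deviations from the mean. Statistical packages often apply slightly different small-sample corrections, so their output may not match these figures exactly. The two example lists are invented for illustration.

```python
def central_moment(scores, k):
    """Average of (score - mean) ** k over all scores."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def skewness(scores):
    """0 for a symmetrical set; positive when the long tail is on the right."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """About 0 for normal data; negative when flatter, positive when more peaked."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


# Hypothetical example data, invented for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 9]

print("Symmetric: skew", round(skewness(symmetric), 2),
      "kurtosis", round(excess_kurtosis(symmetric), 2))
print("Right-tailed: skew", round(skewness(right_tailed), 2),
      "kurtosis", round(excess_kurtosis(right_tailed), 2))
```

The symmetrical list gives a skew of zero, while the single high score of 9 in the second list pulls its skew above zero.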
