Friday 13 January 2017

MODULE 3: ENTERING DATA INTO SPSS

3.1 ENTERING DATA INTO THE DATA EDITOR
When you first load SPSS it provides a blank data editor titled Untitled1. New data must be entered in a logical way. The SPSS Data Editor is arranged so that each row holds the data from one entity and each column represents one variable. There is no distinction between independent and dependent variables: each variable, of either type, goes in its own column. The key point is that each row represents one entity’s data (be that entity a human, mouse, tulip, business, or water sample), so any information about that case is entered across the data editor.
For example, imagine you were interested in sex differences in perceptions of pain created by hot and cold stimuli. You could place some people’s hands in a bucket of very cold water for a minute and ask them to rate how painful they thought the experience was on a scale of 1 to 10. You could then ask them to hold a hot potato and again measure their perception of pain. Imagine I was a participant. You would have a single row representing my data, with a separate column for my name, my gender, my pain perception for cold water and my pain perception for a hot potato: Abayomi, male, 8, 10.
The column with the information about my gender is a grouping variable: I can belong to either the group of males or the group of females, but not both. As such, this variable is a between-group variable (different people belong to different groups). Rather than representing groups with words, in SPSS we use numbers. This involves assigning each group a number and then telling SPSS which number represents which group. Between-group variables are therefore represented by a single column in which the group to which the person belonged is defined using a number. For example, we might decide that if a person is male we give them the number 0, and if they’re female we give them the number 1. We then have to tell SPSS that every time it sees a 1 in that column the person is female, and every time it sees a 0 the person is male. Variables that specify to which of several groups a person belongs can also be used to split up data files.
Finally, the two measures of pain are a repeated measure (all participants were subjected to hot and cold stimuli). Therefore, levels of this variable are entered in separate columns (one for pain to a hot stimulus and one for pain to a cold stimulus).
The data editor is made up of lots of cells, which are just boxes in which data values can be placed. When a cell is active it becomes highlighted in blue. You can move around the data editor, from cell to cell, using the arrow keys (found on the right of the keyboard) or by clicking the mouse on the cell that you wish to activate. To enter a number into the data editor simply move to the cell in which you want to place the data value, type the value, then press the appropriate arrow key for the direction in which you wish to move. So, to enter a row of data, move to the far left of the row, type the value and then press the right arrow key (this inputs the value and then moves you into the next cell on the right).
In summary, there is a simple rule for how variables should be placed in the SPSS Data Editor: data from different things go in different rows of the data editor, whereas data from the same things go in different columns of the data editor. As such, each person (or mollusc, goat, organization, or whatever you have measured) is represented in a different row. Data within each person (or mollusc etc.) go in different columns. So, if you’ve prodded your mollusc, or human, several times with a pencil and measured how much it twitches as an outcome, then each prod will be represented by a column. In experimental research this means that any variable measured with the same participants (a repeated measure) should be represented by several columns (each column representing one level of the repeated-measures variable). However, any variable that defines different groups of things (such as when a between-group design is used and different participants are assigned to different levels of the independent variable) is defined using a single column. This idea will become clearer as you learn about how to carry out specific procedures.
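The layout rule can also be pictured outside SPSS. The following is only a rough Python sketch of the pain example, not anything SPSS itself runs: the second participant, the column names (pain_cold, pain_hot) and the helper functions are made up for illustration. Each dictionary is one row (one participant), gender is stored as a numeric code, and the two pain ratings sit in separate columns.

```python
# Numeric codes for the grouping variable, as in the example above.
GENDER_LABELS = {0: "male", 1: "female"}

# One dictionary per participant = one row in the data editor.
# The second participant and all column names are hypothetical.
rows = [
    {"name": "Abayomi", "gender": 0, "pain_cold": 8, "pain_hot": 10},
    {"name": "Participant2", "gender": 1, "pain_cold": 6, "pain_hot": 9},
]


def describe(row):
    """Return a readable summary of one row, decoding the gender code."""
    gender = GENDER_LABELS[row["gender"]]
    return f"{row['name']} ({gender}): cold={row['pain_cold']}, hot={row['pain_hot']}"


def split_by(data, variable):
    """Group rows by the value of a grouping variable (like splitting a file)."""
    groups = {}
    for row in data:
        groups.setdefault(row[variable], []).append(row)
    return groups


print(describe(rows[0]))
print(sorted(split_by(rows, "gender")))
```

Note that the between-group variable (gender) needs only one column, while the repeated measure (pain) needs one column per level.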


3.2 THE SPSS VARIABLE VIEW WINDOW
The Variable View sheet holds the information about each variable that is stored with the dataset. The following have to be defined for each variable:
  • Name

The first character of the variable name must be alphabetic.
Variable names must be unique and no longer than 64 characters.
Spaces are NOT allowed.
  • Type

This column specifies the type of variable. Click in the Type cell to change it. The two basic types of variables that you will use are numeric and string.

  • Width

Width allows you to determine the number of characters SPSS will allow to be entered for the variable.



  • Decimals

The number of decimal places. It has to be less than or equal to 16.

  • Label

Here you can describe the variable in more detail. Unlike the name, the label may contain spaces, and it can be up to 256 characters long.



  • Values

This is used to specify which numbers represent which categories when the variable represents a category.

Defining the value labels
Click the cell in the Values column.
The value and the label can each be up to 60 characters long.
After entering each value and its label, click Add; when all values are defined, click OK.
  • Missing

This column is for assigning numbers to missing data.

  • Columns

Enter a number into this column to set the width of the column, that is, how many characters are displayed in the column. This differs from ‘width’, which determines the width of the variable itself: you could have a variable of 10 characters, but by setting the column width to 8 you would see only 8 of the 10 characters in the data editor. It can be useful to increase the column width if you have a string variable that exceeds 8 characters, or a coding variable with value labels that exceed 8 characters.
  • Align

You can use this column to select the alignment of the data in the corresponding column of the data editor. You can align the data to the left, right, or center.
  • Measure

This is where you define the level at which a variable was measured (nominal, ordinal or scale).
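The naming rules listed under Name can be checked mechanically. The function below is a simplified Python sketch of those rules only, not SPSS's own validation. It takes 64 characters as the maximum length and treats names as case-insensitive when checking uniqueness; both of those details, and the function name itself, are assumptions made for this illustration.

```python
MAX_NAME_LENGTH = 64  # assumed maximum length for this sketch


def name_problems(name, existing_names=()):
    """Return a list of rule violations for a proposed variable name."""
    problems = []
    if not name or not name[0].isalpha():
        problems.append("must start with a letter")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("too long")
    if " " in name:
        problems.append("spaces are not allowed")
    # Assumption: names differing only in case count as duplicates.
    if name.lower() in {existing.lower() for existing in existing_names}:
        problems.append("name already in use")
    return problems


print(name_problems("pain_cold"))                  # no problems
print(name_problems("pain cold"))                  # contains a space
print(name_problems("Gender", ["name", "gender"]))  # duplicate name
```

A name that returns an empty list satisfies the rules as stated here. SPSS may apply further restrictions (for example, reserved words) that this sketch does not cover.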
3.2.1 LEVELS OF MEASUREMENT
There are three levels of data. They are:

  • Nominal level: Data that is classified into categories and cannot be arranged in any particular order. E.g. eye colour, gender, religious affiliation.
  • Ordinal level: Data that can be arranged in some order, but for which the differences between data values cannot be determined or are meaningless. For example: during a taste test of 4 soft drinks, Pepsi was ranked number 1, Sprite number 2, 7 Up number 3, and Coca-Cola number 4.
  • Scale: Scale data can be either interval or ratio. Interval level: similar to the ordinal level, with the additional property that meaningful differences between data values can be determined. There is no natural zero point. For example: temperature on the Fahrenheit scale. Ratio level: the interval level with an inherent zero starting point. Differences and ratios are both meaningful at this level of measurement. For example: monthly income of surgeons, or distance travelled by a manufacturer’s representatives per month.
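A short worked example may help to show why ratios are not meaningful at the interval level. The Python sketch below uses made-up temperatures and incomes. It converts the same two temperatures from Fahrenheit to Celsius and shows that their ratio changes with the unit, whereas the ratio of two incomes stays the same when the unit changes.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9


# Interval level: hypothetical temperatures of 80 and 40 degrees Fahrenheit.
warm_f, cool_f = 80.0, 40.0
warm_c, cool_c = fahrenheit_to_celsius(warm_f), fahrenheit_to_celsius(cool_f)

ratio_f = warm_f / cool_f  # 2 in Fahrenheit
ratio_c = warm_c / cool_c  # about 6 in Celsius: the "ratio" depends on the unit

# Ratio level: hypothetical monthly incomes, in dollars and then in cents.
income_a, income_b = 4000, 2000
ratio_dollars = income_a / income_b
ratio_cents = (income_a * 100) / (income_b * 100)  # unchanged by the unit

print(ratio_f, round(ratio_c, 2))
print(ratio_dollars, ratio_cents)
```

Because the Fahrenheit zero is arbitrary, "twice as hot" has no fixed meaning. Income has a true zero, so "twice the income" holds in any currency unit.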

3.3 MISSING VALUES
Although as researchers we strive to collect complete sets of data, it is often the case that we have missing data. Missing data can occur for a variety of reasons: in long questionnaires participants accidentally miss out questions; in experimental procedures mechanical faults can lead to a datum not being recorded; and in research on delicate topics (e.g. sexual behaviour) participants may exercise their right not to answer a question. However, just because we have missed out on some data for a participant doesn't mean that we have to ignore the data we do have (although it sometimes creates statistical difficulties).
Nevertheless, we do need to tell SPSS that a value is missing for a particular case. The principle behind missing values is quite similar to that of coding variables in that we choose a numeric value to represent the missing data point. This value tells SPSS that there is no recorded value for a participant for a certain variable. The computer then ignores that cell of the data editor (it does not use the value you select in the analysis). You need to be careful that the chosen code doesn't correspond to any naturally occurring data value. For example, if we tell the computer to regard the value 9 as a missing value and several participants genuinely scored 9, then the computer will treat their data as missing when, in reality, it is not.
To specify missing values, click in the column labelled Missing in the variable view and then click the button that appears in the cell to open the Missing Values dialog box (Figure 3.9). By default, SPSS assumes that no missing values exist, but if you do have data with missing values you can choose to define them in one of three ways. The first is to select discrete values (by clicking on the circle next to where it says Discrete missing values), which are single values that represent missing data. SPSS allows you to specify up to three discrete values to represent missing data.
The reason why you might choose to have several numbers to represent missing values is that you can assign a different meaning to each discrete value. For example, you could have the number 8 representing a response of ‘not applicable’, a code of 9 representing a ‘don’t know’ response, and a code of 99 meaning that the participant failed to give any response. As far as the computer is concerned it will ignore any data cell containing these values; however, using different codes may be a useful way to remind you of why a particular score is missing. Usually, one discrete value is enough. In an experiment in which attitudes are measured on a 100-point scale (so scores vary from 1 to 100) you might choose 666 to represent missing values, because this value cannot occur in the data that have been collected.
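To see what "ignoring" a missing-value code means in practice, here is a small Python sketch. It is an illustration of the idea, not SPSS's own procedure: the scores are made up, a hypothetical 1 to 7 rating scale is assumed (so that the codes 8, 9 and 99 cannot be genuine answers), and the function names are invented for this example.

```python
# Discrete missing-value codes and what each one means (from the example above).
MISSING_CODES = {8: "not applicable", 9: "don't know", 99: "no response"}

# Hypothetical ratings on a 1 to 7 scale, with three missing-value codes mixed in.
scores = [3, 5, 9, 7, 99, 8, 4]


def valid_values(values, missing_codes):
    """Keep only the values that are not declared as missing."""
    return [v for v in values if v not in missing_codes]


def mean_ignoring_missing(values, missing_codes):
    """Mean of the valid values, or None if every value is missing."""
    valid = valid_values(values, missing_codes)
    return sum(valid) / len(valid) if valid else None


naive_mean = sum(scores) / len(scores)  # wrongly treats 8, 9 and 99 as real scores
clean_mean = mean_ignoring_missing(scores, MISSING_CODES)

print(valid_values(scores, MISSING_CODES))
print(clean_mean)
```

If the codes were not declared as missing, the naive mean would be pulled far above the top of the scale. This is why the codes must be declared and must not overlap with real scores.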

3.4 SPSS KEYWORDS
Using SPSS keywords in syntax, especially TO and ALL, greatly speeds up many typical tasks.

SPSS Main Keywords
Expression   Meaning                                                  Returns
ALL          all variables (not previously addressed in statement)    Variable(s)
TO           all variables between and including two variables        Variable(s)
BY           split outcome of one variable by values of another       Nothing
WITH         compare one variable with another                        Nothing
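The TO keyword depends on the order of the variables in the data file: it refers to every variable from the first one named to the second one named, inclusive. The Python sketch below imitates that expansion for a hypothetical variable list. The function name, the variable names and the choice to raise an error when the two names are given in reverse order are all assumptions made for this illustration, not SPSS behaviour taken from the text above.

```python
def expand_to(first, last, file_order):
    """Return the variables from `first` to `last`, inclusive, in file order."""
    start = file_order.index(first)
    end = file_order.index(last)
    if start > end:
        # Assumption for this sketch: a reversed range is treated as an error.
        raise ValueError(f"{first} comes after {last} in the file")
    return file_order[start:end + 1]


# Hypothetical variable order, matching the pain example from section 3.1.
variables = ["name", "gender", "pain_cold", "pain_hot"]

print(expand_to("gender", "pain_hot", variables))  # like writing: gender TO pain_hot
print(list(variables))                             # like writing: ALL
```

Since TO follows file order, rearranging the columns in the data editor changes which variables a given TO expression picks up.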

Watch out for Module 4, where you will start applying what you have learnt in Modules 1-3. Feel free to contact me with any questions.
