Dealing with confounds (Reisberg, Ch. 3)
Imagine an experiment in which research participants are asked to recognize
letter strings briefly presented on a computer screen. In the first 50 trials,
the letter strings are random sequences ("okbo," "pmla," and so on). In the next
50 trials, the letter strings are all common four-letter words ("book," "lamp,"
"tree," and so on). Let's say that the participants are able, on average, to
read 30% of the random sequences and 65% of the words. This is a large
difference; what should we conclude from it?
In fact, we can conclude nothing from this (fictional) experiment, because the
procedure just described is flawed. The data tell us that participants did much
better with the words, but why is this? One possibility is that words are, in
fact, easier to recognize than nonwords. A different possibility, however, is
that we are instead seeing an effect of practice: Maybe the participants
did better with the word trials, not because words are special, but simply
because the words came later in the experiment, after the participants had
gained some experience with the procedure. Likewise, participants did worse with
the nonwords, not because these are hard to recognize, but because they were
presented before any practice or warm-up.
To put this in technical terms, the experiment just described is invalid;
that is, it does not measure what it is intended to measure, namely, the
difference between words and nonwords. The experiment is invalid because a
confound is present: an extra variable that could have caused the observed
data pattern. The confound, of course, is sequence, and the confound
makes the data ambiguous: Maybe words were better recognized because they're
words. Or maybe the words were better recognized simply because they came
second. With no way in these data to choose between these interpretations, we
cannot say which is the correct interpretation, and hence we can draw no
conclusions from the study.
How should this experiment have been designed? One possibility is to
counterbalance the sequence of trials: For half of the participants, we
would show the words first, then the random letters. For the other half, we
would use the reverse order. This setup doesn't remove the effect of practice,
but it arranges things so that practice has the same impact on both conditions,
and the effect of practice is counterbalanced. Specifically, practice would help
the words half the time, and would help the nonwords half the time. If,
therefore, we observe a difference between the conditions, it cannot be a result
of practice: Practice had the same effect on both conditions, and so could not
have caused a difference between the conditions.
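The counterbalancing scheme described above can be sketched in a few lines of code. This is only an illustration, not part of the original study: the stimulus lists and the even/odd assignment rule are assumptions chosen to make the logic concrete.

```python
# Illustrative sketch of counterbalancing trial order across participants.
# The word/nonword lists echo the chapter's examples; the extra items and
# the even/odd assignment rule are hypothetical.

WORDS = ["book", "lamp", "tree", "dust"]      # common four-letter words
NONWORDS = ["okbo", "pmla", "rtke", "zxqu"]   # random letter sequences

def trial_sequence(participant_id):
    """Half the participants see words first; the other half see nonwords first."""
    if participant_id % 2 == 0:
        return WORDS + NONWORDS   # order A: words, then nonwords
    return NONWORDS + WORDS       # order B: nonwords, then words

# With an even number of participants, each condition comes first equally
# often, so any practice effect falls on both conditions equally and
# cancels out when the two groups are averaged together.
orders = [trial_sequence(p) for p in range(4)]
```

Because practice now helps the words for half the participants and the nonwords for the other half, any remaining difference between conditions cannot be attributed to practice.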
As it happens, we know how this experiment comes out when properly done:
words are, in fact, easier to recognize. Our point here, though, lies in what it
takes for the experiment to be "properly done." In this and in all experiments,
we need to remove confounds, so that we can be sure what lies
beneath the data pattern. We have various techniques available for dealing with
confounds; we've mentioned just one of them here. The key is that the confounds
must be removed; only then can we legitimately draw conclusions from the
study.