Dealing with confounds (Reisberg, Ch. 3)

Imagine an experiment in which research participants are asked to recognize letter strings briefly presented on a computer screen. In the first 50 trials, the letter strings are random sequences ("okbo," "pmla," and so on). In the next 50 trials, the letter strings are all common four-letter words ("book," "lamp," "tree," and so on). Let's say that the participants are able, on average, to read 30% of the random sequences and 65% of the words. This is a large difference; what should we conclude from it?

In fact, we can conclude nothing from this (fictional) experiment, because the procedure just described is flawed. The data tell us that participants did much better with the words, but why is this? One possibility is that words are, in fact, easier to recognize than nonwords. A different possibility, however, is that we are instead seeing an effect of practice: Maybe the participants did better with the word trials, not because words are special, but simply because the words came later in the experiment, after the participants had gained some experience with the procedure. Likewise, participants did worse with the nonwords, not because these are hard to recognize, but because they were presented before any practice or warm-up.

To put this in technical terms, the experiment just described is invalid - that is, it does not measure what it is intended to measure, namely, the difference between words and nonwords. The experiment is invalid because a confound is present - an extra variable that could have caused the observed data pattern. The confound, of course, is sequence, and the confound makes the data ambiguous: Maybe the words were better recognized because they're words. Or maybe the words were better recognized simply because they came second. With no way in these data to choose between these interpretations, we can draw no conclusions from the study.

How should this experiment have been designed? One possibility is to counterbalance the sequence of trials: For half of the participants, we would show the words first, then the random letters. For the other half, we would use the reverse order. This setup doesn't remove the effect of practice, but it arranges things so that practice has the same impact on both conditions: Practice would help the words half the time, and would help the nonwords half the time. If, therefore, we observe a difference between the conditions, it cannot be a result of practice: Practice had the same effect on both conditions, and so could not have caused a difference between the conditions.
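The assignment scheme described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the function and condition names are invented for this example, not part of the original study): half the participants are assigned the words-first order and half the nonwords-first order, so any practice effect falls equally on both conditions.

```python
import random

def counterbalanced_orders(participant_ids, seed=0):
    """Assign each participant a block order so that half see words
    first and half see nonwords first (an illustrative sketch of the
    counterbalancing described in the text; names are hypothetical)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # randomize who lands in which half
    half = len(ids) // 2
    orders = {}
    for pid in ids[:half]:
        orders[pid] = ("words", "nonwords")   # words-first group
    for pid in ids[half:]:
        orders[pid] = ("nonwords", "words")   # nonwords-first group
    return orders

orders = counterbalanced_orders(range(8))
# With 8 participants, 4 see each order, so practice helps
# the word condition and the nonword condition equally often.
```

Because each condition appears first for exactly half of the participants, a practice effect raises scores in both conditions by the same amount and cannot, by itself, produce a difference between them.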

As it turns out, we know how this experiment would turn out when properly done - words are, in fact, easier to recognize. Our point here, though, lies in what it takes for the experiment to be "properly done." In this and in all experiments, we need to remove confounds, so that we can be sure what lies beneath the data pattern. We have various techniques available for dealing with confounds; we've mentioned just one of them here. The key is that the confounds must be removed; only then can we legitimately draw conclusions from the study.