In the association experiment, first used by Galton (1880), subjects are asked to respond to a stimulus word with the first word that comes to mind. In psychology, these associative responses have been explained by the principle of learning by contiguity: ``Objects once experienced together tend to become associated in the imagination, so that when any one of them is thought of, the others are likely to be thought of also, in the same order of sequence or coexistence as before. This statement we may name the law of mental association by contiguity.'' (William James, 1890, p. 561).
When the association experiment is conducted with many subjects, tables are obtained which list the frequencies of particular responses to the stimulus words. These tables are called association norms. Many studies in psychology give evidence that there is a relation between the perception, learning and forgetting of verbal material and the associations between words.
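As an illustration of what such a table contains, the following minimal sketch tallies response frequencies per stimulus word. The function name `build_association_norms` and the toy responses are hypothetical and serve only to show the data structure; they are not taken from any published norms.

```python
from collections import Counter, defaultdict


def build_association_norms(responses):
    """Tally how often each response was given to each stimulus.

    `responses` is an iterable of (stimulus, response) pairs, one pair
    per answer given by a subject (hypothetical input format).
    """
    norms = defaultdict(Counter)
    for stimulus, response in responses:
        norms[stimulus.lower()][response.lower()] += 1
    return norms


# Invented toy data, for illustration only.
toy_responses = [
    ("butter", "bread"), ("butter", "bread"), ("butter", "milk"),
    ("cold", "hot"), ("cold", "ice"), ("cold", "hot"),
]

norms = build_association_norms(toy_responses)

# The most frequent (primary) response to each stimulus, with its count.
primary = {s: counts.most_common(1)[0] for s, counts in norms.items()}
```

Each row of the resulting table corresponds to one stimulus word, and the counts in that row are the response frequencies that an association norm reports.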
If we assume that word associations determine language production, then it should be possible to estimate the strength of the associative relation between two words from the relative frequency with which these words co-occur in texts. Church et al. (1989), Wettler & Rapp (1989) and Church & Hanks (1990) describe algorithms which do this. However, the validity of these algorithms has not been tested by systematic comparison with the associations produced by human subjects. This paper describes such a comparison and shows that corpus-based computations of word associations are similar to association norms collected from human subjects.
According to the law of association by contiguity, the association strength between two words should be a function of the relative frequency with which the two words are perceived together, i.e. the relative frequency with which they occur together. Furthermore, the association strength between words should determine word selection during language or speech production: only those words can be uttered or written down which associatively come to mind. If this assumption holds, then it should be possible to predict word associations from the co-occurrences of words in texts.
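To make the notion of relative co-occurrence frequency concrete, the sketch below computes one simple estimate: the proportion of occurrences of a stimulus word that have the response word within a fixed window of neighbouring tokens. The function name, the window size and the toy text are assumptions made for illustration; this is not the algorithm of any of the cited papers.

```python
def cooccurrence_strength(tokens, stimulus, response, window=5):
    """Proportion of stimulus occurrences with `response` nearby.

    Illustrative assumption: "occurring together" means that the
    response appears within `window` tokens before or after the stimulus.
    """
    occurrences = 0
    joint = 0
    for i, token in enumerate(tokens):
        if token != stimulus:
            continue
        occurrences += 1
        start = max(0, i - window)
        # Neighbouring tokens on both sides, excluding the stimulus itself.
        context = tokens[start:i] + tokens[i + 1:i + 1 + window]
        if response in context:
            joint += 1
    # A stimulus that never occurs is given strength 0.0.
    return joint / occurrences if occurrences else 0.0


# Invented toy text, for illustration only.
toy_text = (
    "bread and butter go well together but butter without bread is rare"
).split()

strong = cooccurrence_strength(toy_text, "butter", "bread", window=2)
weak = cooccurrence_strength(toy_text, "butter", "go", window=2)
```

Under this estimate, a response that accompanies every occurrence of the stimulus receives the maximal strength of 1.0, and a response that never appears near it receives 0.0. Ranking candidate responses by such a value yields predictions that can be set against the response frequencies in association norms.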