Table 1 compares a few sample association lists predicted by our system with the associative responses given by the subjects in the Russell & Jenkins experiment. A complete list of the predicted and observed responses is given in table 2. For each of the 100 stimulus words used in the Russell & Jenkins association experiment, it shows a) its corpus frequency, b) the primary response, i.e. the most frequent response given by the subjects, c) the number of subjects who gave the primary response, d) the predicted response and e) the number of subjects who gave the predicted response.
The evaluation of the predictions has to take into account that association norms are conglomerates of the answers of different subjects, which differ considerably from each other. A prediction could therefore be considered satisfactory if the difference between the predicted and the observed responses were about equal to the difference between an average subject and the rest of the subjects. The following interpretations look for such correspondences.
For 17 of the 100 stimulus words, the predicted response is identical to the observed primary response. This compares with an average of 37 primary responses given per subject in the Russell & Jenkins experiment.
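The two figures in the preceding paragraph can be computed as sketched below. This is a minimal sketch under stated assumptions, not the original evaluation code: the association norms are assumed to be a dictionary mapping each stimulus to the list of all subjects' responses, with every subject at the same position in every list; the primary response is taken from the complete norms (including the subject being scored); and the function names and toy responses are hypothetical.

```python
from collections import Counter

def primary_response(responses):
    # Most frequent response; ties go to the response encountered first.
    return Counter(responses).most_common(1)[0][0]

def predicted_primary_rate(norms, predictions):
    """Percentage of stimuli whose predicted response equals the primary response."""
    hits = sum(predictions[s] == primary_response(r) for s, r in norms.items())
    return 100 * hits / len(norms)

def subject_primary_rate(norms):
    """Percentage of a subject's responses that equal the primary response,
    averaged over all subjects (assumes aligned subject positions)."""
    primaries = {s: primary_response(r) for s, r in norms.items()}
    n_subjects = len(next(iter(norms.values())))
    rates = []
    for i in range(n_subjects):
        hits = sum(norms[s][i] == primaries[s] for s in norms)
        rates.append(100 * hits / len(norms))
    return sum(rates) / len(rates)

# Invented toy data (not the Russell & Jenkins norms): 4 subjects, 2 stimuli.
norms = {
    "table": ["chair", "chair", "chair", "food"],
    "dark": ["light", "light", "night", "black"],
}
predictions = {"table": "chair", "dark": "night"}  # hypothetical predictions

print(predicted_primary_rate(norms, predictions))  # 50.0
print(subject_primary_rate(norms))                 # 62.5
```

In the paper's terms, the first function corresponds to the 17 out of 100 stimulus words and the second to the average of 37 primary responses per subject.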
A slightly better correspondence between the predicted and the observed associations is obtained when we consider how many subjects gave the predicted response: averaged over all stimulus words and all subjects, a predicted response was given by 12.6% of the subjects. By comparison, the associative response of an arbitrary subject was given by 21.9% of the remaining subjects.
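The comparison in the preceding paragraph pairs a score for the system with a leave-one-out baseline for the subjects. A minimal sketch, assuming the same hypothetical data layout (each stimulus mapped to the list of all subjects' responses) and invented toy data: for the system, each stimulus contributes the share of subjects who gave the predicted word; for the baseline, each subject's response contributes the share of the remaining n - 1 subjects who gave the same word.

```python
from collections import Counter

def predicted_agreement(norms, predictions):
    """Average over stimuli of the share (in %) of subjects giving the predicted response."""
    shares = [Counter(r)[predictions[s]] / len(r) for s, r in norms.items()]
    return 100 * sum(shares) / len(shares)

def subject_agreement(norms):
    """Leave-one-out baseline: average over all subjects and stimuli of the
    share (in %) of the remaining subjects who gave the same response."""
    shares = []
    for responses in norms.values():
        counts = Counter(responses)
        n = len(responses)
        # counts[r] - 1 excludes the subject's own response from the tally.
        shares.extend((counts[r] - 1) / (n - 1) for r in responses)
    return 100 * sum(shares) / len(shares)

# Invented toy data (not the Russell & Jenkins norms): 4 subjects, 2 stimuli.
norms = {
    "table": ["chair", "chair", "chair", "food"],
    "dark": ["light", "light", "night", "black"],
}
predictions = {"table": "chair", "dark": "night"}  # hypothetical predictions

print(predicted_agreement(norms, predictions))  # 50.0
print(round(subject_agreement(norms), 1))       # 33.3
```

Excluding the subject's own response is what makes the baseline comparable to the system score: the system is an outside predictor, so each subject is likewise scored against the others only.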
When only the 27 stimulus words whose primary response was given by at least 500 subjects are considered, the response of an arbitrary subject was given by 45.5% of the other subjects on average. By comparison, the predicted response to one of these 27 stimulus words was given by 32.6% of the subjects. This means that the predictions improve for stimulus words where the variation among subjects is small.
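Restricting the evaluation to low-variation stimuli amounts to filtering the norms by the frequency of the primary response before scoring. A minimal sketch, assuming the same hypothetical data layout and invented toy data; the paper's threshold of 500 subjects is scaled down here to 3 of 4 toy subjects.

```python
from collections import Counter

def low_variation(norms, min_primary):
    """Keep only stimuli whose primary (most frequent) response was given
    by at least `min_primary` subjects."""
    return {s: r for s, r in norms.items()
            if Counter(r).most_common(1)[0][1] >= min_primary}

def predicted_agreement(norms, predictions):
    """Average share (in %) of subjects giving the predicted response."""
    shares = [Counter(r)[predictions[s]] / len(r) for s, r in norms.items()]
    return 100 * sum(shares) / len(shares)

# Invented toy data (not the Russell & Jenkins norms).
norms = {
    "table": ["chair", "chair", "chair", "food"],  # primary given by 3 subjects
    "dark": ["light", "light", "night", "black"],  # primary given by 2 subjects
}
predictions = {"table": "chair", "dark": "night"}  # hypothetical predictions

subset = low_variation(norms, min_primary=3)
print(sorted(subset))                            # ['table']
print(predicted_agreement(norms, predictions))   # 50.0
print(predicted_agreement(subset, predictions))  # 75.0
```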
On the other hand, 35 of the predicted responses were given by no subject at all, whereas an average subject gives only 5.9 out of 100 responses that are given by no other subject. In about half of these cases, we attribute this poor performance to the lack of representativeness of the corpus. For example, the predictions combustion for the stimulus bed and brokerage for house can be explained by the specific usage of these words in the DOE scientific abstracts and in the Wall Street Journal, respectively.
In most of the other cases, syntagmatic associations (words that are often used together) are predicted instead of paradigmatic associations (words that are used in similar contexts). Examples are the prediction of term for the stimulus long, where most subjects answered short, and the prediction of folk for music, where most subjects responded song.
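The two failure rates discussed above can be computed as sketched below. Again a minimal sketch with hypothetical names and invented toy data: a predicted response counts as unattested if no subject gave it, and a subject's response counts as unique if no other subject gave it for the same stimulus.

```python
from collections import Counter

def predicted_unattested(norms, predictions):
    """Percentage of stimuli whose predicted response was given by no subject."""
    misses = sum(predictions[s] not in r for s, r in norms.items())
    return 100 * misses / len(norms)

def subject_unique(norms):
    """Percentage of all subject responses that no other subject gave
    for the same stimulus."""
    unique = total = 0
    for responses in norms.values():
        counts = Counter(responses)
        unique += sum(1 for r in responses if counts[r] == 1)
        total += len(responses)
    return 100 * unique / total

# Invented toy data (not the Russell & Jenkins norms).
norms = {
    "table": ["chair", "chair", "chair", "food"],
    "dark": ["light", "light", "night", "black"],
}
predictions = {"table": "chair", "dark": "room"}  # "room" is given by no subject

print(predicted_unattested(norms, predictions))  # 50.0
print(subject_unique(norms))                     # 37.5
```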
|stimulus||corpus frequency||primary response||f (primary response)||predicted response||f (predicted response)|
Using the corpora listed in section 4, the same simulation as described above was conducted for German. For the computation of the associative strengths, formula 6 was again used. For optimal results, only a small adjustment had to be made to parameter alpha (from 0.66 to 0.68). However, a significant change was necessary for parameters and , which, again for ease of parameter optimization, were assumed to be identical. Both had to be reduced by a factor of four, from a value of 0.00002 to a value of 0.000005. Apart from these parameters, nothing was changed in the algorithm.
Table 3 compares the quantitative results given above for both languages. The figures can be interpreted as follows: in English, an average of 21.9% of the other subjects give the same response as an arbitrary subject, compared with 8.7% in German, so the variation among subjects is much smaller in English than in German. This is reflected in the simulation results, where the corresponding figures (12.6% and 6.9%) have a similar ratio, though at a lower level.
This observation is confirmed when only stimuli with low variation of the associative responses are considered: in both languages, the decrease in variation is of about the same order of magnitude in the experiment and in the simulation. Overall, measured against the agreement among the subjects themselves, the simulation results are somewhat better for German than for English. This may be surprising, since with a total of 33 million words the English corpus is larger than the German one with 21 million words. However, a closer look at the texts shows that the German corpus, which includes popular newspapers and spoken language, is clearly more representative of everyday language.
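The comparisons in the last two paragraphs can be made explicit with a few ratios computed from the Table 3 figures. The sketch below uses hypothetical function names and numbers copied from Table 3. It computes the English/German ratio over all stimuli, the factor by which agreement increases for low-variation stimuli, and the level of the simulation relative to the subjects over all stimuli.

```python
# Percentages copied from Table 3 as (all stimuli, low-variation stimuli).
TABLE3 = {
    "English": {"subjects": (21.9, 45.5), "simulation": (12.6, 32.6)},
    "German": {"subjects": (8.7, 18.1), "simulation": (6.9, 15.6)},
}

def language_ratios(table):
    """English/German ratio over all stimuli, for subjects and for the simulation."""
    subj = table["English"]["subjects"][0] / table["German"]["subjects"][0]
    sim = table["English"]["simulation"][0] / table["German"]["simulation"][0]
    return round(subj, 2), round(sim, 2)

def summarize(table):
    """Per language: increase factor (low-variation / all stimuli) for the
    subjects and for the simulation, plus the simulation level relative to
    the subjects over all stimuli."""
    result = {}
    for lang, rows in table.items():
        subj_all, subj_low = rows["subjects"]
        sim_all, sim_low = rows["simulation"]
        result[lang] = (
            round(subj_low / subj_all, 2),
            round(sim_low / sim_all, 2),
            round(sim_all / subj_all, 2),
        )
    return result

print(language_ratios(TABLE3))  # (2.52, 1.83)
for lang, (subj_gain, sim_gain, relative) in summarize(TABLE3).items():
    print(lang, subj_gain, sim_gain, relative)
# English 2.08 2.59 0.58
# German 2.08 2.26 0.79
```

Both English/German ratios exceed 1, so agreement is higher for English in the experiment and in the simulation alike. A relative level closer to 1 means the simulation comes closer to the agreement the subjects reach among themselves; this level is higher for German (0.79) than for English (0.58).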
|measure||English||German|
|percentage of subjects who give the predicted associative response||12.6%||6.9%|
|percentage of other subjects who give the response of an arbitrary subject||21.9%||8.7%|
|percentage of subjects who give the predicted associative response for stimuli with little response variation||32.6%||15.6%|
|percentage of other subjects who give the response of an arbitrary subject for stimuli with little response variation||45.5%||18.1%|
|percentage of cases where the predicted response is identical to the observed primary response||17.0%||19.0%|
|percentage of cases where the response of an arbitrary subject is identical to the observed primary response||37.5%||22.5%|
|percentage of cases where the predicted response is given by no subject||35.0%||57.0%|
|percentage of cases where the response of an arbitrary subject is given by no other subject||5.9%||19.8%|