
Results

Table 1 compares a few sample association lists predicted by our system with the associative responses given by the subjects in the Russell & Jenkins experiment. A complete list of the predicted and observed responses is given in table 2. For all 100 stimulus words used in the association experiment conducted by Russell & Jenkins, it shows a) the corpus frequency of the stimulus word, b) the primary response, i.e. the most frequent response given by the subjects, c) the number of subjects who gave the primary response, d) the predicted response and e) the number of subjects who gave the predicted response.

The evaluation of the predictions has to take into account that association norms are conglomerates of the answers of different subjects, which differ considerably from each other. A prediction could be considered satisfactory if the difference between the predicted and the observed responses were about equal to the difference between an average subject and the rest of the subjects. The following interpretations look for such correspondences.

Stimulus  Predicted  Strength  Observed   No. of
          response             response   subjects
blue      green      2.144     sky        175
          red        1.128     red        160
          yellow     1.000     green      125
          white      0.732     color       66
          flowers    0.614     yellow      56
          sky        0.600     black       49
          colors     0.538     white       44
          eyes       0.471     water       36
          bright     0.457     grey        28
          color      0.413     boy         20
butter    bread      0.886     bread      637
          milk       0.256     yellow      81
          eggs       0.197     soft        30
          lb         0.179     fat         24
          sugar      0.157     food        22
          fat        0.147     knife       20
          peanut     0.145     eggs        16
          fats       0.138     cream       14
          flavor     0.130     milk        13
          wheat      0.128     cheese       9
baby      mother     0.618     boy        162
          foods      0.427     child      142
          breast     0.353     cry        113
          feeding    0.336     mother      71
          infant     0.249     girl        51
          birth      0.245     small       43
          born       0.242     infant      27
          milk       0.208     cute        21
          her        0.206     little      18
          nursing    0.202     blue        17
cold      hot        1.173     hot        348
          warm       1.164     snow       218
          weather    0.736     warm       168
          winter     0.603     winter      66
          climate    0.474     ice         29
          air        0.424     Minnesota   13
          war        0.342     wet         13
          wet        0.333     dark        10
          water      0.330     sick         9
          dry        0.315     heat         8

Table 1. Sample association lists as predicted by the system, compared with the associative responses given by the subjects in the Russell & Jenkins experiment.

For 17 out of the 100 stimulus words the predicted response is equal to the observed primary response. This compares to an average of 37 primary responses given by a subject in the Russell & Jenkins experiment. A slightly better correspondence between the predicted and the observed associations is obtained when one considers how many subjects gave the predicted response: averaged over all stimulus words and all subjects, a predicted response was given by 12.6% of the subjects. By comparison, an associative response of an arbitrary subject was given by 21.9% of the remaining subjects.
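The two percentages compared here can be made concrete with a small sketch. It assumes that the responses to one stimulus word are available as a table of response counts, and that "the response of an arbitrary subject given by the remaining subjects" is computed as the expected share of other subjects who agree with a randomly drawn subject. The function names and the toy counts are hypothetical and only illustrate the comparison; they are not taken from the original computation.

```python
from collections import Counter


def agreement_with_others(counts):
    """Expected share of the other subjects who give the same response
    as a randomly chosen subject (one stimulus word).

    Assumption: a subject who gave a response shared by n subjects
    agrees with n - 1 of the N - 1 remaining subjects.
    """
    n_subjects = sum(counts.values())
    if n_subjects < 2:
        raise ValueError("need at least two subjects")
    matching_pairs = sum(n * (n - 1) for n in counts.values())
    return matching_pairs / (n_subjects * (n_subjects - 1))


def predicted_share(counts, predicted):
    """Share of all subjects who gave the predicted response."""
    n_subjects = sum(counts.values())
    return counts.get(predicted, 0) / n_subjects


# Hypothetical toy norm for one stimulus word with 10 subjects.
toy_counts = Counter({"hot": 6, "snow": 3, "ice": 1})

print(f"arbitrary subject vs. others: {agreement_with_others(toy_counts):.3f}")
print(f"predicted response 'snow':    {predicted_share(toy_counts, 'snow'):.3f}")
```

Averaging both quantities over all stimulus words would give figures of the kind reported above, under the stated assumptions.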

When only those 27 stimulus words are considered whose primary response was given by at least 500 subjects, an arbitrary response was given by 45.5% of the other subjects on average. By comparison, the predicted response to one of these 27 stimulus words was given by 32.6% of the subjects. Relative to the agreement among subjects, the predictions thus improve for stimulus words where the variation among subjects is small: the prediction reaches about 72% of the human figure (32.6/45.5), compared with about 58% (12.6/21.9) over all stimulus words.

On the other hand, 35 of the predicted responses were given by no subject at all, whereas an average subject gives only 5.9 out of 100 responses that are given by no other subject. In about half of these cases we attribute the poor performance to the corpus not being representative. For example, the prediction "combustion" for the stimulus "bed" and the prediction "brokerage" for "house" can be explained by specific verbal usage in the DOE scientific abstracts and in the Wall Street Journal, respectively.
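The human baseline of 5.9 idiosyncratic responses per 100 stimulus words can be illustrated as follows. The sketch assumes that, for each stimulus word, the share of subjects whose response was given by nobody else equals the number of responses with a count of one divided by the number of subjects, and that these shares are summed over the stimulus words. The function name and the toy counts are hypothetical.

```python
def idiosyncratic_share(counts):
    """Share of subjects whose response was given by no other subject
    (one stimulus word). Assumption: such responses have a count of 1."""
    n_subjects = sum(counts.values())
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / n_subjects


# Hypothetical toy norms for two stimulus words, 10 subjects each.
toy_norms = {
    "cold": {"hot": 6, "snow": 3, "ice": 1},
    "butter": {"bread": 8, "knife": 1, "soft": 1},
}

# Expected number of responses by an average subject that nobody else gave,
# summed over the stimulus words.
expected_unique = sum(idiosyncratic_share(c) for c in toy_norms.values())
print(f"expected idiosyncratic responses: {expected_unique:.2f}")
```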

In most of the other cases, syntagmatic associations (words that are often used together) are predicted instead of paradigmatic associations (words that are used in similar contexts). Examples are the prediction of "term" for the stimulus "long", where most subjects answered with "short", and the prediction of "folk" for "music", where most subjects responded with "song".

stim freq par f (par) pred f (pred)
afraid 692 fear 261 am 0
anger 615 mad 351 expression 0
baby 1157 boy 162 mother 71
bath 244 clean 314 hot 10
beautiful 812 ugly 209 love 0
bed 1295 sleep 584 combustion 0
Bible 593 God 236 Society 0
bitter 541 sweet 652 sweet 652
black 4250 white 751 white 751
blossom 50 flower 672 flower 672
blue 1676 sky 175 green 125
boy 1174 girl 768 girl 768
bread 863 butter 610 wheat 4
butter 426 bread 637 bread 637
butterfly 68 moth 144 fish 0
cabbage 116 head 165 potatoes 0
carpet 138 rug 460 red 27
chair 577 table 493 clock 0
cheese 566 crackers 108 milk 47
child 8897 baby(ies) 159 care 10
citizen 525 U.S.(A.) 114 senior 0
city 8125 town 353 pop 0
cold 2003 hot 348 hot 348
comfort 386 chair 117 ease 76
command 799 order 196 army 102
cottage 137 house 298 cheese 111
dark 1695 light 829 brown 1
deep 2418 shallow 318 sea 77
doctor 766 nurse 238 patient 11
dream 629 sleep 453 sleep 453
eagle 92 bird 550 bird 550
earth 1429 round 130 rare 0
eating 2823 food 390 habits 0
foot 1169 shoe(s) 232 square 0
fruit 1841 apple 378 vegetable 114
girl 1096 boy 704 boy 704
green 1686 grass 262 blue 122
hammer 173 nail(s) 537 string 0
hand 5146 foot(ee) 255 On 0
hard 3502 soft 674 hit 1
head 5350 hair 129 tail 17
health 11433 sickness 250 mental 0
heavy 3497 light 583 ion 0
high 25220 low 675 low 675
house 3059 home 247 brokerage 0
hungry 268 food 362 eat 174
joy 246 happy 209 fear 5
justice 1314 peace 250 criminal 1
king 1983 queen 751 emperor 1
lamp 330 light 633 light 633

Table 2, part 1. Observed and predicted associative responses to stimulus words 1 to 50. Column headings: stim = stimulus word; freq = corpus frequency of the stimulus word; par = primary associative response; f (par) = number of subjects who gave the primary associative response; pred = predicted associative response; f (pred) = number of subjects who gave the predicted associative response.

stim freq par f (par) pred f (pred)
light 7538 dark 647 dark 647
lion 182 tiger 261 sea 2
long 16437 short 758 term 0
loud 230 soft 541 noise 210
man 7472 woman(e) 767 woman 767
memory 3230 mind 119 deficits 0
moon 295 stars 205 sun 168
mountain 1066 hill(s) 266 ranges 0
music 3635 song(s) 183 folk 0
mutton 39 lamb 365 beef 32
needle 208 thread 464 sharing 0
ocean 1066 water 314 floor 6
priest 311 church 328 Catholic 189
quiet 673 loud 348 sleep 53
red 3029 white 221 yellow 19
religion 1224 church 285 Christianity 5
river 1624 water 246 flows 0
rough 457 smooth 439 smooth 439
salt 2158 pepper 430 sugar 83
scissors 25 cut 671 pair 1
sheep 854 wool 201 cattle 15
short 7388 tall 397 term 0
sickness 207 health 376 motion 0
sleep 1843 bed 238 hrs 0
slow 1858 fast 752 wave 0
smooth 690 rough 328 muscle 1
soft 1681 hard 445 drink 10
soldier 321 army 187 army 187
sour 154 sweet 568 sweet 568
spider 97 web 454 tail 0
square 1430 round 372 root 22
stem 796 flower 402 brain 2
stomach 501 food 211 cancer 1
stove 98 hot 235 kitchen 16
street 859 avenue 190 corner 20
sweet 700 sour 434 potatoes 0
swift 184 fast 369 rivers 0
table 2396 chair 840 honour 0
thief 63 steal 286 catch 2
thirsty 32 water 348 drink 296
tobacco 1056 smoke 515 textiles 0
trouble 1108 bad 89 ran 0
whiskey 63 drink(s) 284 beer 52
whistle 77 stop 131 train 89
white 4807 black 617 black 617
window 816 door 191 glass 171
wish 2061 want 124 I 2
woman 2995 man(e) 646 yr 0
working 5366 hard 132 class 3
yellow 1188 blue 156 green 89
MEAN 2064.78 - 377.52 - 127.34

Table 2, part 2. Observed and predicted associative responses to stimulus words 51 to 100.  
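The summary figures discussed in the text can be recomputed from rows in the layout of Table 2. The following sketch parses a handful of rows copied from the table. The helper names are hypothetical, and the matching rule (dropping a parenthesised variant such as "(e)" before comparing the primary with the predicted response) is an assumption about how the table's notation should be read. The subject count of 1008 is the one stated in the notes to Table 3.

```python
import re

N_SUBJECTS = 1008  # subjects in the English experiment (see notes to Table 3)

# A few rows copied from Table 2: stim freq par f(par) pred f(pred)
SAMPLE_ROWS = """\
bitter 541 sweet 652 sweet 652
bed 1295 sleep 584 combustion 0
man 7472 woman(e) 767 woman 767
baby 1157 boy 162 mother 71
long 16437 short 758 term 0
"""


def parse_row(line):
    """Split one whitespace-separated table row into named fields."""
    stim, freq, par, f_par, pred, f_pred = line.split()
    return {"stim": stim, "freq": int(freq), "par": par,
            "f_par": int(f_par), "pred": pred, "f_pred": int(f_pred)}


def base_form(word):
    """Drop a parenthesised variant, e.g. 'woman(e)' -> 'woman' (assumed rule)."""
    return re.sub(r"\([^)]*\)", "", word)


def summarize(rows, n_subjects=N_SUBJECTS):
    """Return (matches with primary response, predictions given by no subject,
    mean share of subjects who gave the predicted response)."""
    hits = sum(1 for r in rows if base_form(r["par"]).lower() == r["pred"].lower())
    unseen = sum(1 for r in rows if r["f_pred"] == 0)
    mean_share = sum(r["f_pred"] for r in rows) / (len(rows) * n_subjects)
    return hits, unseen, mean_share


rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]
hits, unseen, mean_share = summarize(rows)
print(f"matches: {hits}, given by no subject: {unseen}, mean share: {mean_share:.1%}")
```

Applied to all 100 rows, the same counting should correspond to the 17 matches and 35 unobserved predictions reported above. The mean of 127.34 in the f (pred) column, divided by 1008 subjects, gives about 12.6%.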

Using the corpora listed in section 4, the same simulation as described above was conducted for German. The associative strengths were again computed with formula 6. For optimal results, parameter alpha needed only a small adjustment (from 0.66 to 0.68). A significant change, however, was necessary for the two other parameters, which for ease of parameter optimization were again assumed to be identical: both had to be reduced by a factor of four, from 0.00002 to 0.000005. Apart from these parameters, nothing was changed in the algorithm.

Table 3 compares the quantitative results given above for both languages. The figures can be interpreted as follows: with an average of 21.9% of the other subjects giving the same response as an arbitrary subject, the variation among subjects is much smaller in English than it is in German (8.7%). This is reflected in the simulation results, where the two figures (12.6% and 6.9%) stand in a similar ratio, albeit at a lower level.

This observation is confirmed when only stimuli with low variation of the associative responses are considered. In both languages, the decrease in variation is of about the same order of magnitude for experiment and simulation. Overall, the simulation results are somewhat better for German than they are for English. This may be surprising, since with a total of 33 million words the English corpus is larger than the German corpus with 21 million words. However, a closer look at the texts shows that the German corpus, by incorporating popular newspapers and spoken language, is clearly more representative of everyday language.

Description                                                    English  German
percentage of subjects who give the predicted associative       12.6%    6.9%
  response
percentage of other subjects who give the response of an        21.9%    8.7%
  arbitrary subject
percentage of subjects who give the predicted associative       32.6%   15.6%
  response for stimuli with little response variation *)
percentage of other subjects who give the response of an        45.5%   18.1%
  arbitrary subject for stimuli with little response
  variation *)
percentage of cases where the predicted response is             17.0%   19.0%
  identical to the observed primary response
percentage of cases where the response of an arbitrary          37.5%   22.5%
  subject is identical to the observed primary response
percentage of cases where the predicted response is given       35.0%   57.0%
  by no subject **)
percentage of cases where the response of an arbitrary           5.9%   19.8%
  subject is given by no other subject **)

Table 3: Comparison of results between simulation and experiment for English and German. Notes: *) Little response variation is defined slightly differently for English and German: in the English study, only those 27 stimulus words are considered whose primary response is given by at least 500 out of 1008 subjects. In the German study, only those 26 stimulus words are taken into account whose primary response is given by at least 100 out of 331 subjects. **) When comparing the English and German experimental figures, it should be kept in mind that the American experiment was conducted with 1008 subjects, but the German experiment with only 331.
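One way to read Table 3 is to divide each simulation figure by the corresponding experimental figure, which expresses the prediction as a fraction of the agreement reached by an arbitrary subject. The ratio is an illustrative measure chosen for this sketch, not one defined in the text, and the names in the code are hypothetical. The percentages are copied from the first four rows of Table 3.

```python
# (simulation %, experiment %) pairs copied from Table 3
TABLE_3 = {
    "all stimuli": {
        "English": (12.6, 21.9),
        "German": (6.9, 8.7),
    },
    "little response variation": {
        "English": (32.6, 45.5),
        "German": (15.6, 18.1),
    },
}


def relative_score(predicted_pct, baseline_pct):
    """Prediction agreement as a fraction of the human baseline (assumed measure)."""
    return predicted_pct / baseline_pct


scores = {
    (condition, language): relative_score(*pair)
    for condition, by_language in TABLE_3.items()
    for language, pair in by_language.items()
}

for (condition, language), score in scores.items():
    print(f"{condition:26s} {language:8s} {score:.2f}")
```

Under this measure the German ratios are higher than the English ones in both conditions, and both languages improve when only stimuli with little response variation are considered.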



Reinhard Rapp
Tue Aug 13 18:20:02 MET DST 1996