
Computation of the association strengths

The text corpora were read in word by word. Whenever one of the 100 stimulus words occurred, it was determined which other words occurred within a distance of twelve words to the left or to the right of the stimulus word, and for every such pair a counter was updated. The frequencies of co-occurrence H(i&j) defined in this way, the frequencies of the single words H(i), and the total number of words in the corpus Q were stored in tables. Using these tables, the probabilities in formula (4) can be replaced by relative frequencies:

$$\hat{r}_{i,j} = \frac{H(i\&j)/H(j)}{H(i)/Q} = \frac{Q}{H(i)} \cdot \frac{H(i\&j)}{H(j)} \qquad (5)$$
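The counting procedure that produces these tables might be sketched in Python as follows. This is only a minimal illustration: the function name `count_cooccurrences`, the in-memory token list (the original read the corpora word by word), and the toy sentence are assumptions, and the window is shortened from twelve to two words for the example.

```python
from collections import Counter


def count_cooccurrences(tokens, stimuli, window=12):
    """Build the three tables: co-occurrence counts, word counts, corpus size.

    Hypothetical helper: `tokens` is a list of words, `stimuli` a set of
    stimulus words, `window` the distance searched to the left and right.
    """
    freq = Counter(tokens)   # single-word frequencies H(i)
    total = len(tokens)      # corpus size Q
    cooc = Counter()         # co-occurrence counts, keyed by (stimulus, other word)
    for pos, word in enumerate(tokens):
        if word not in stimuli:
            continue
        lo = max(0, pos - window)
        hi = min(total, pos + window + 1)
        for k in range(lo, hi):
            if k != pos:     # skip the stimulus occurrence itself
                cooc[(word, tokens[k])] += 1
    return cooc, freq, total


# Toy corpus, invented for illustration only.
tokens = "the cat sat on the mat while the dog sat near the cat".split()
cooc, freq, total = count_cooccurrences(tokens, {"cat"}, window=2)
print(cooc[("cat", "the")], freq["the"], total)
```

These three tables are all that formula (5) needs.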

In this formula the first term on the right side does not depend on j and therefore has no effect on the prediction of the associative response. With H(j) in the denominator of the second term, estimation errors have a strong impact on the association strengths for rare words. Formula (5) therefore had to be modified so that words with low corpus frequencies are weakened.

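$$\hat{r}_{i,j} = \begin{cases} \dfrac{H(i\&j)}{H(j)^{\alpha}} & \text{for } H(j) > \beta \cdot Q \\[2ex] \dfrac{H(i\&j)}{\gamma \cdot Q} & \text{for } H(j) \le \beta \cdot Q \end{cases} \qquad (6)$$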

According to our model, the word j with the highest associative strength $\hat{r}_{i,j}$ to the stimulus word i should be the associative response. The best results were observed when parameter $\alpha$ was set to 0.66. Parameters $\beta$ and $\gamma$ turned out to be relatively uncritical and, to simplify parameter optimization, were both set to the same value of 0.00002.
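As a rough illustration of how a response could be selected, the following Python sketch assumes that formula (6) has the piecewise form H(i&j)/H(j)^α for H(j) > β·Q and H(i&j)/(γ·Q) otherwise. The function names, the toy counts, and the decision to skip the stimulus itself as a candidate are assumptions made for the example, not part of the original procedure.

```python
def association_strength(h_ij, h_j, q, alpha=0.66, beta=0.00002, gamma=0.00002):
    """Assumed form of formula (6): damp frequent words, cap rare ones."""
    if h_j > beta * q:
        return h_ij / h_j ** alpha
    return h_ij / (gamma * q)


def predict_response(stimulus, cooc, freq, total, **params):
    """Return the candidate word with the highest strength and its score."""
    best_word, best_score = None, 0.0
    for (i, j), h_ij in cooc.items():
        if i != stimulus or j == stimulus:   # assumption: stimulus is not a candidate
            continue
        score = association_strength(h_ij, freq[j], total, **params)
        if score > best_score:
            best_word, best_score = j, score
    return best_word, best_score


# Invented counts for a hypothetical corpus of one million words.
total = 1_000_000
freq = {"butter": 500, "the": 60_000, "brioche": 5}
cooc = {("bread", "butter"): 40, ("bread", "the"): 300, ("bread", "brioche"): 3}

# Ranking by the unmodified ratio H(i&j)/H(j) from formula (5) favours the rare word.
raw_best = max(freq, key=lambda j: cooc[("bread", j)] / freq[j])

word, score = predict_response("bread", cooc, freq, total)
print(raw_best, word)
```

With these invented counts the unmodified ratio ranks the rare word "brioche" first, whereas the modified strength selects "butter": the word seen only five times falls below the threshold β·Q = 20 and is divided by γ·Q instead of by its own small frequency.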

Ongoing research shows that formula (6) has a number of weaknesses. For example, it does not discriminate between words with a co-occurrence frequency of zero, as discussed by Gale & Church (1990) in a comparable context. However, since the results reported later are acceptable, it probably gets the major issues right. One is that subjects usually respond with common, i.e. frequent, words in the free association task. The other is that estimates of co-occurrence frequencies for low-frequency words are too poor to be useful.



Reinhard Rapp
Tue Aug 13 18:20:02 MET DST 1996