Symbols, Meaning and Statistics

Richard Sproat


Presenter: Richard Sproat (introduced by Philipp Koehn)
Type: Invited talk
Venue: EMNLP 2009 in Singapore
Date: August 6, 2009
Recording: Bonnie Dorr
Duration: 62 minutes



Of all artifacts left behind by ancient cultures, few are as evocative as symbols. When one looks at an inscription on stone or clay, it is natural to ask: What does it mean? Was this a form of writing, or some sort of non-linguistic system? If it was writing, can we hope to decipher it?

In this talk, I examine these and related questions, and the possible role of statistical methods in answering them. I start with some highlights from the history of successful decipherment. I then review work on computational approaches to decipherment and assess how useful these approaches are likely to be. One area where they are clearly useful, in a sense, is in generating pseudodecipherments, and I present one such case as a reductio ad absurdum of attempts to decipher artifacts like the Phaistos disk. I also discuss a topic that made the rounds of the popular science press earlier this year: the claimed "entropic evidence" that the 4000-year-old Indus Valley symbol system constituted a script. I present simple counterevidence to the usefulness of the proposed measure for supporting any such claim, and I review the large body of archaeological and comparative cultural evidence against the script hypothesis, which must in any case be taken into account in any complete discussion of this subject.

(Portions of this talk are based on joint work with Steve Farmer and Michael Witzel, and on a tutorial at NAACL 2009 co-presented with Kevin Knight.)