Language Technology for the Internet:
SGML, XML and UNICODE
 
Christian Otto
 
Universität Erlangen
Computational Linguistics
 
cnotto@linguistik.uni-erlangen.de
 
 
Abstract

The main goal of this tutorial is to introduce established technologies for handling textual data, in particular the meta-languages SGML and XML and the character encoding system UNICODE. It focuses on the interchange, reusability and customization of textual data, with particular attention to the Internet as a medium.

The tutorial begins with an introduction to general problems of encoding textual data, using examples from lexicographic work and corpus tagging, which leads to SGML, the Standard Generalized Markup Language. The SGML part gives a brief overview of the meta-language itself and introduces the basic elements of SGML needed to create a simple Document Type Definition (DTD) for dictionaries.
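
As a rough illustration of the kind of markup a simple dictionary DTD might describe, the following sketch parses one small dictionary entry. The element names (dictionary, entry, headword, pos, sense), the sample entry and the helper read_entries are hypothetical and not taken from the tutorial. The example uses XML syntax rather than full SGML, and Python's standard-library parser only checks well-formedness; it does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary document with an internal DTD subset.
# The DTD states that a dictionary holds one or more entries and that
# each entry has a headword, a part of speech and one or more senses.
DOC = """<!DOCTYPE dictionary [
  <!ELEMENT dictionary (entry+)>
  <!ELEMENT entry (headword, pos, sense+)>
  <!ELEMENT headword (#PCDATA)>
  <!ELEMENT pos (#PCDATA)>
  <!ELEMENT sense (#PCDATA)>
]>
<dictionary>
  <entry>
    <headword>Haus</headword>
    <pos>noun</pos>
    <sense>house</sense>
    <sense>building</sense>
  </entry>
</dictionary>
"""


def read_entries(text):
    """Return each <entry> as a plain dict (no DTD validation is done)."""
    root = ET.fromstring(text)
    entries = []
    for entry in root.findall("entry"):
        entries.append({
            "headword": entry.findtext("headword"),
            "pos": entry.findtext("pos"),
            "senses": [sense.text for sense in entry.findall("sense")],
        })
    return entries


if __name__ == "__main__":
    for item in read_entries(DOC):
        print(item)
```

Because the structure is explicit in the markup, the same entry can be reused for different purposes (print layout, online lookup, corpus annotation) without re-keying the data.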

Next, we contrast SGML with the newer Extensible Markup Language (XML), bearing in mind its special features for Internet applications.

Finally, we discuss the character encoding system UNICODE against the background of worldwide publishing and distribution of textual data via the Internet.
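
To make the contrast between ASCII and UNICODE concrete, here is a minimal sketch using only Python's standard library. The helper names fits_ascii and describe and the sample word are assumptions chosen for illustration; they are not part of the tutorial.

```python
import unicodedata


def fits_ascii(text):
    """Return True if every character of text can be encoded in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def describe(text):
    """List (code point, character name, UTF-8 byte length) per character."""
    return [
        (
            "U+%04X" % ord(ch),
            unicodedata.name(ch, "<unnamed>"),  # fallback for unnamed characters
            len(ch.encode("utf-8")),
        )
        for ch in text
    ]


if __name__ == "__main__":
    word = "Stra\u00dfe"  # German "Strasse" written with the sharp s
    print(fits_ascii(word))
    for row in describe(word):
        print(row)
```

A word containing a character outside ASCII cannot be represented in that encoding at all, whereas every character has a UNICODE code point; UTF-8 is only one of several ways to serialize those code points as bytes.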

The tutorial provides ideas on how to deal with natural language data in the new electronic media, especially in academic work and research.

Keywords: text encoding, interchange, reusability, customization, meta-language, SGML, DTD, stylesheets, DSSSL, XML, XSL, HTML, online publishing, character encoding, ASCII, UNICODE, UCS, Internet
 
(This tutorial will be given in German.)