From Web Content Mining to Natural Language Processing
Venue: ACL 2007 in Prague
Date: June 24, 2007
pdf presentation slides (1.2 MB)
flv flash video (714 MB, 320 * 240 pixels)
mpeg2 video part 1 (1300 MB, 720 * 480 pixels)
mpeg2 video part 2 (800 MB, 720 * 480 pixels)
Web mining is a growing research area. It consists of Web usage mining, Web structure mining, and Web content mining. Web usage mining refers to the discovery of user access patterns from Web usage logs. Web structure mining tries to discover useful knowledge from hyperlinks. Web content mining aims to extract/mine useful information or knowledge from Web page contents. This tutorial focuses on Web content mining and its extensive connection with natural language processing (NLP).
In the past few years, there has been a rapid expansion of activity in Web content mining. This is not surprising given the huge amount of valuable information of almost every imaginable type on the Web and the significant economic benefits of mining it. However, because Web data is heterogeneous and lacks structure, automated discovery of targeted or unexpected knowledge and information still presents many challenging problems. This tutorial introduces several such problems and some state-of-the-art techniques for dealing with them, e.g., data/information extraction, Web information integration, opinion mining, and information synthesis. These problems all have strong connections with NLP. The tutorial pays special attention to these connections and discusses how NLP researchers may contribute towards solving these problems. Many real-life examples are given to help participants understand the research concepts and see how the technologies may be deployed in real-life applications. The tutorial thus has a mix of research and industry flavor, addressing seminal research ideas and looking at the technology from an industry angle.
For further reading, you may want to refer to the presenter's related textbook "Web Data Mining".