From Web Content Mining to Natural Language Processing

Bing Liu




Presenter: Bing Liu
Type: Tutorial
Venue: ACL 2007 in Prague
Date: June 24, 2007
Recording: Reinhard Rapp
Duration: 150 minutes



pdf presentation slides (1.2 MB)

flv flash video (714 MB, 320 * 240 pixels)

mpeg2 video part 1  (1300 MB, 720 * 480 pixels)

mpeg2 video part 2  (800 MB, 720 * 480 pixels)



Web mining is a growing research area. It consists of Web usage mining, Web structure mining, and Web content mining. Web usage mining refers to the discovery of user access patterns from Web usage logs. Web structure mining tries to discover useful knowledge from hyperlinks. Web content mining aims to extract/mine useful information or knowledge from Web page contents. This tutorial focuses on Web content mining and its extensive connection with natural language processing (NLP).

In the past few years, there has been a rapid expansion of activity in Web content mining. This is not surprising given the huge amount of valuable information of almost any imaginable type on the Web and the significant economic benefits of mining it. However, due to the heterogeneity and lack of structure of Web data, automated discovery of targeted or unexpected knowledge/information still presents many challenging problems. This tutorial introduces several such problems and some state-of-the-art techniques for dealing with them, e.g., data/information extraction, Web information integration, opinion mining, and information synthesis. These problems all have strong connections with NLP. The tutorial pays special attention to these connections and discusses how NLP researchers may contribute towards solving these problems. Many real-life examples are given to help participants understand the research concepts and see how the technologies may be deployed in real-life applications. The tutorial thus has a mix of research and industry flavor, addressing seminal research ideas and looking at the technology from an industry angle.

For further reading, you may want to refer to the presenter's related textbook "Web Data Mining".


Tutorial Outline

  1. Introduction
  2. Data extraction
  3. Information integration
  4. Opinion mining
  5. Information synthesis
  6. Web page pre-processing
  7. Some other topics
  8. Summary