WWW Data Mining
Web mining tasks
- Resource discovery: locating documents and services on the Web
  - WebCrawler, AltaVista: return too many irrelevant or outdated results
  - Future: automatic text categorization and construction of directories
- Information extraction: automatically extracting information from newly discovered Web sources
  - Harvest: uses a model of semi-structured documents
  - Internet Learning Agent and ShopBot: learn about Web services
- Generalization: uncovering general patterns at individual sites and across multiple sites
  - Relies on feedback from users to solve the labeling problem
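
The three tasks above can be read as stages of one pipeline. The following is only a toy sketch of that reading, not how WebCrawler, Harvest, or ShopBot work: the in-memory `PAGES` dictionary, the `<b>Label:</b> value` layout, and the function names `discover`, `extract`, and `generalize` are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical in-memory "Web": URL -> page text (no network access).
PAGES = {
    "http://shop-a.example/item1": "<b>Item:</b> Modem <b>Price:</b> $49.99",
    "http://shop-b.example/item7": "<b>Item:</b> Modem <b>Price:</b> $54.50",
    "http://news.example/story": "Local team wins the championship game.",
}

# Assumed semi-structured layout: a bold label followed by its value.
FIELD = re.compile(r"<b>(\w+):</b>\s*([^<]+)")


def discover(pages, keyword):
    """Resource discovery: keep URLs whose text mentions the keyword."""
    return [url for url, text in pages.items() if keyword.lower() in text.lower()]


def extract(text):
    """Information extraction: read label/value pairs from semi-structured text."""
    return {label.lower(): value.strip() for label, value in FIELD.findall(text)}


def generalize(records):
    """Generalization: count how often each field name recurs across sites."""
    return Counter(field for record in records for field in record)


urls = discover(PAGES, "modem")
records = [extract(PAGES[u]) for u in urls]
patterns = generalize(records)
print(urls)
print(records)
print(patterns)
```

Keyword matching in `discover` is deliberately naive, which mirrors the outline's complaint about irrelevant results; a real system would replace it with text categorization, and `generalize` would need user feedback or labels rather than simple counting.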