Distributed hypertext resource discovery through examples
DSpace at IIT Bombay
Field | Value |
Title |
Distributed hypertext resource discovery through examples
|
|
Creator |
CHAKRABARTI, S
VAN DEN BERG, MH
DOM, BE |
|
Description |
We describe the architecture of a hypertext resource discovery system using a relational database. Such a system can answer questions that combine page contents, meta-data, and hyperlink structure in powerful ways, such as "find the number of links from an environmental protection page to a page about oil and natural gas over the last year." A key problem in populating the database in such a system is to discover web resources related to the topics involved in such queries. We argue that a keyword-based "find similar" search based on a giant all-purpose crawler is neither necessary nor adequate for resource discovery. Instead we exploit two properties: pages tend to cite pages with related topics, and, given that a page u cites a page about a desired topic, it is very likely that u cites additional desirable pages. We exploit these properties by using a crawler controlled by two hypertext mining programs: (1) a classifier that evaluates the relevance of a region of the web to the user's interest, and (2) a distiller that evaluates a page as an access point for a large neighborhood of relevant pages. Our implementation uses IBM's Universal Database, not only for robust data storage, but also for integrating the computations of the classifier and distiller into the database. This results in a significant increase in I/O efficiency: a factor of ten for the classifier and a factor of three for the distiller. In addition, ad-hoc SQL queries can be used to monitor the crawler and to change crawling strategies dynamically. We report on experiments to establish that our system is efficient, effective, and robust.
|
|
Publisher |
MORGAN KAUFMANN PUB INC
|
|
Date |
2011-10-27T07:34:57Z
2011-12-15T09:12:18Z
1999 |
|
Type |
Proceedings Paper
|
|
Identifier |
PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 375-386
1-55860-615-7
http://dspace.library.iitb.ac.in/xmlui/handle/10054/16222
http://hdl.handle.net/100/2582 |
|
Source |
25th International Conference on Very Large Data Bases, EDINBURGH, SCOTLAND, SEP 07-10, 1999
|
|
Language |
English
|
|
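The example question quoted in the Description ("find the number of links from an environmental protection page to a page about oil and natural gas over the last year") can be illustrated with a small relational sketch. Everything below is an assumption made for illustration: the table names page and link, their columns, the sample rows, and the use of SQLite in place of IBM's Universal Database. It is not the schema or code of the system described in the paper.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (url TEXT PRIMARY KEY, topic TEXT);
        CREATE TABLE link (src TEXT, dst TEXT, found TEXT);  -- found: ISO date
    """)
    # Made-up pages, each labeled with the topic a classifier might assign.
    conn.executemany("INSERT INTO page VALUES (?, ?)", [
        ("epa.example/a", "environmental protection"),
        ("epa.example/b", "environmental protection"),
        ("oil.example/x", "oil and natural gas"),
        ("misc.example/y", "cycling"),
    ])
    # Made-up hyperlinks with the date on which each was observed.
    conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
        ("epa.example/a", "oil.example/x", "1999-03-01"),
        ("epa.example/b", "oil.example/x", "1999-05-12"),
        ("epa.example/a", "misc.example/y", "1999-04-02"),
        ("epa.example/b", "oil.example/x", "1997-01-15"),  # outside the window
    ])
    return conn


def count_links(conn, src_topic, dst_topic, since):
    """Count links from pages on src_topic to pages on dst_topic seen on or after `since`."""
    row = conn.execute("""
        SELECT COUNT(*)
        FROM link
        JOIN page AS s ON s.url = link.src
        JOIN page AS d ON d.url = link.dst
        WHERE s.topic = ? AND d.topic = ? AND link.found >= ?
    """, (src_topic, dst_topic, since)).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo_db()
    print(count_links(conn, "environmental protection",
                      "oil and natural gas", "1998-09-01"))
```

The query joins the link table to the page table twice, once for each end of a hyperlink, so that page topics and link structure are combined in a single statement. Dates are stored as ISO strings so that plain string comparison orders them correctly.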