Record Details

Scaling up the ALIAS duplicate elimination system : a demonstration

DSpace at IIT Bombay

 
Field Value
 
Title Scaling up the ALIAS duplicate elimination system : a demonstration
 
Creator SARAWAGI, S
KIRPAL, A
 
Description Duplicate elimination is an important stage in integrating data from multiple sources. The challenges involved are finding a robust deduplication function that can identify when two records are duplicates, and efficiently applying that function to very large lists of records. In ALIAS, the task of designing a deduplication function is eased by learning the function from examples of duplicates and non-duplicates, and by using active learning to spot such examples effectively [1]. Here we investigate the issues involved in efficiently applying the learnt deduplication system to large lists of records. We demonstrate the working of the ALIAS evaluation engine and highlight the optimizations it uses to significantly cut down the number of record pairs that need to be explicitly materialized.
 
Publisher IEEE
 
Date 2011-10-24T14:16:07Z
2011-12-15T09:11:39Z
2003
 
Type Proceedings Paper
 
Identifier 19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 783-785
0-7803-7665-X
1063-6382
http://dx.doi.org/10.1109/ICDE.2003.1260867
http://dspace.library.iitb.ac.in/xmlui/handle/10054/15440
http://hdl.handle.net/100/2202
 
Source 19th International Conference on Data Engineering, BANGALORE, INDIA, MAR 05-08, 2003
 
Language English
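
The Description field above mentions cutting down the number of record pairs that must be explicitly materialized. The record does not say which optimizations ALIAS uses, so the following is only a minimal sketch of one generic way to avoid comparing every pair: an inverted index on tokens that yields only pairs sharing at least one token, followed by a simple similarity test. The function names, the whitespace tokenizer, the Jaccard measure, and the 0.5 threshold are all assumptions made for illustration; a learnt deduplication function would take the place of the fixed threshold.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record):
    """Lowercased whitespace tokens of a record (a deliberately crude tokenizer)."""
    return set(record.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(records):
    """Return index pairs (i, j), i < j, of records sharing at least one token.

    Pairs with no token in common are never generated, so they are never
    compared. This is a generic filter, not the ALIAS evaluation engine.
    """
    index = defaultdict(list)  # token -> ids of records containing it
    for i, rec in enumerate(records):
        for tok in tokens(rec):
            index[tok].append(i)  # ids are appended in ascending order
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(ids, 2))
    return pairs


def find_duplicates(records, threshold=0.5):
    """Apply the (assumed) similarity test only to the candidate pairs."""
    toks = [tokens(r) for r in records]
    return sorted(
        (i, j)
        for i, j in candidate_pairs(records)
        if jaccard(toks[i], toks[j]) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical records, invented for this example.
    records = [
        "john smith 12 main street",
        "smith john 12 main st",
        "alice wong 5 lake road",
        "bob ray 9 hill lane",
    ]
    print(len(candidate_pairs(records)), "candidate pair(s) out of 6 possible")
    print(find_duplicates(records))
```

On the four hypothetical records, only the first two share any token, so a single pair is examined instead of all six. A filter of this kind can miss duplicates that share no token at all (for example, records that differ only by abbreviation), which is the trade-off for generating fewer pairs.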