Record Details

Queries over unstructured data : probabilistic methods to the rescue (Keynote)

DSpace at IIT Bombay

 
Field Value
 
Title Queries over unstructured data : probabilistic methods to the rescue (Keynote)
 
Creator SARAWAGI, S
 
Subject imprecise data models
information extraction
duplicate elimination
conditional random fields
 
Description Unstructured data like emails, addresses, invoices, call transcripts, reviews, and press releases are now an integral part of any large enterprise. A challenge of modern business intelligence applications is analyzing and querying data seamlessly across structured and unstructured sources. This requires the development of automated techniques for extracting structured records from text sources and resolving entity mentions in data from various sources. The success of any automated method for extraction and integration depends on how effectively it unifies diverse clues in the unstructured source and in existing structured databases. We argue that statistical learning techniques like Conditional Random Fields (CRFs) provide an accurate, elegant and principled framework for tackling these tasks. Given the inherent noise in real-world sources, it is important to capture the uncertainty of the above operations via imprecise data models. CRFs provide a sound probability distribution over extractions but are not easy to represent and query in a relational framework. We present methods of approximating this distribution to query-friendly row and column uncertainty models. Finally, we present models for representing the uncertainty of de-duplication and algorithms for various Top-K count queries on imprecise duplicates.
 
Publisher SPRINGER-VERLAG BERLIN
 
Date 2011-10-22T08:05:25Z
2011-12-15T09:10:49Z
2010
 
Type Proceedings Paper
 
Identifier ENABLING REAL-TIME BUSINESS INTELLIGENCE,41,1-13
978-3-642-14558-2
1865-1348
http://dspace.library.iitb.ac.in/xmlui/handle/10054/14847
http://hdl.handle.net/100/1676
 
Source 3rd International Workshop on Business Intelligence for the Real-Time Enterprise, Lyon, France, Aug 24, 2009
 
Language English