Record Details

iDiff: Informative summarization of differences in multidimensional aggregates

DSpace at IIT Bombay

 
 
Field Value
 
Title iDiff: Informative summarization of differences in multidimensional aggregates
 
Creator SARAWAGI, S
 
Subject multidimensional databases
olap
olap-mining integration
difference mining
data summarization
advanced aggregates
 
Description Multidimensional OLAP products provide an excellent opportunity for integrating mining functionality because of their widespread acceptance as a decision support tool and their existing heavy reliance on manual, user-driven analysis. Most OLAP products are rather simplistic and rely heavily on the user's intuition to manually drive the discovery process. Such ad hoc user-driven exploration gets tedious and error-prone as data dimensionality and size increase. Our goal is to automate these manual discovery processes. In this paper we present an example of such automation through an iDiff operator that in a single step returns summarized reasons for drops or increases observed at an aggregated level. We formulate this as a problem of summarizing the difference between two multidimensional arrays of real numbers. We develop a general framework for such summarization and propose a specific formulation for the case of OLAP aggregates. We develop an information-theoretic formulation for expressing the reasons that is compact and easy to interpret. We design an efficient dynamic programming algorithm that requires only one pass over the data and uses a small amount of memory independent of the data size. This allows easy integration with existing OLAP products. Our prototype has been tested on the Microsoft OLAP server, DB2/UDB and Oracle 8i. Experiments using the OLAP benchmark demonstrate (1) scalability of our algorithm as the size and dimensionality of the cube increase and (2) feasibility of getting interactive answers with modest hardware resources.
 
Publisher KLUWER ACADEMIC PUBL
 
Date 2011-08-17T05:58:26Z
2011-12-26T12:55:24Z
2011-12-27T05:39:20Z
2001
 
Type Article
 
Identifier DATA MINING AND KNOWLEDGE DISCOVERY, 5(4), 255-276
1384-5810
http://dx.doi.org/10.1023/A:1011494927464
http://dspace.library.iitb.ac.in/xmlui/handle/10054/9771
http://hdl.handle.net/10054/9771
 
Language en