iDiff: Informative summarization of differences in multidimensional aggregates
DSpace at IIT Bombay
Field | Value | |
Title |
iDiff: Informative summarization of differences in multidimensional aggregates
|
|
Creator |
SARAWAGI, S
|
|
Subject |
multidimensional databases
OLAP
OLAP-mining integration
difference mining
data summarization
advanced aggregates |
|
Description |
Multidimensional OLAP products provide an excellent opportunity for integrating mining functionality because of their widespread acceptance as a decision support tool and their existing heavy reliance on manual, user-driven analysis. Most OLAP products are rather simplistic and rely heavily on the user's intuition to manually drive the discovery process. Such ad hoc user-driven exploration gets tedious and error-prone as data dimensionality and size increase. Our goal is to automate these manual discovery processes. In this paper we present an example of such automation through an iDiff operator that in a single step returns summarized reasons for drops or increases observed at an aggregated level. We formulate this as a problem of summarizing the difference between two multidimensional arrays of real numbers. We develop a general framework for such summarization and propose a specific formulation for the case of OLAP aggregates. We develop an information-theoretic formulation for expressing the reasons that is compact and easy to interpret. We design an efficient dynamic programming algorithm that requires only one pass over the data and uses a small amount of memory independent of the data size. This allows easy integration with existing OLAP products. Our prototype has been tested on the Microsoft OLAP server, DB2/UDB and Oracle 8i. Experiments using the OLAP benchmark demonstrate (1) scalability of our algorithm as the size and dimensionality of the cube increase and (2) feasibility of getting interactive answers with modest hardware resources.
|
|
Publisher |
KLUWER ACADEMIC PUBL
|
|
Date |
2011-08-17T05:58:26Z
2011-12-26T12:55:24Z
2011-12-27T05:39:20Z
2001 |
|
Type |
Article
|
|
Identifier |
DATA MINING AND KNOWLEDGE DISCOVERY, 5(4), 255-276
1384-5810
http://dx.doi.org/10.1023/A:1011494927464
http://dspace.library.iitb.ac.in/xmlui/handle/10054/9771
http://hdl.handle.net/10054/9771 |
|
Language |
en
|
|