
An Efficient Sheet Partition Technique for Very Large Relational Tables in OLAP

Sung-Hyun Shin1, Hun-Young Choi2, Jinho Kim2, Yang-Sae Moon2, and Sang-Wook Kim1

1 College of Information & Communications, Hanyang University, Korea
2 Department of Computer Science, Kangwon National University, Korea

{shshin, hychoi, jhkim, ysmoon}@kangwon.ac.kr, [email protected]

1 Introduction

Spreadsheets such as Microsoft Excel are OLAP (On-Line Analytical Processing) [2] applications for easily analyzing complex multidimensional data. In general, spreadsheets provide grid-like graphical interfaces together with various chart tools [4,5]. However, previous work on OLAP spreadsheets adopts a naive approach that directly retrieves, transmits, and presents all the resulting data at once. Thus, it is difficult to apply the previous work to very large relational tables with millions of rows or columns because of the communication and space overhead.

In this paper we propose an efficient spreadsheet-based interface to incrementally browse the query result on very large relational tables. The proposed interface exploits the sheet partition technique, which selectively browses small parts of the resulting table. Our sheet partition technique first divides a large resulting table into many small-sized sheets, called partitions, and then browses the partitions one by one according to the user’s request. More precisely, the technique works as follows: (1) the client, i.e., the user, issues a query together with a specific column (or row); (2) the server stores the query result on the given column (or row) as temporary data; (3) the server provides an initial partition, which is constructed by using the initial column (or row) range of the temporary data; and (4) the client repeatedly interacts with the server to browse more partitions. Since the sheet partition technique enables us to use a few small-sized partitions instead of a single large-sized sheet, we can reduce the communication and space overhead. Also, we can easily analyze large relational tables by exploiting the concept of divide and conquer in spreadsheet applications.

2 The Proposed Sheet Partition Technique

There have been many efforts to provide spreadsheet-like views for easy analysis of multidimensional data. In [5], Witkowski et al. defined spreadsheet-typed tables in relational databases by extending standard SQL statements. In [6], Witkowski et al. also proposed an Excel-based analysis method to exploit various powerful Excel functions in handling the original relational data. These works, however, do not consider very large relational tables with millions of rows or columns [1] when presenting the tables as spreadsheets. Therefore, in this paper we focus on the spreadsheet interface for large relational tables.

R. Cooper and J. Kennedy (Eds.): BNCOD 2007, LNCS 4587, pp. 176–179, 2007. © Springer-Verlag Berlin Heidelberg 2007

The sheet partition technique enables us to analyze a few small-sized sheets partitioned from a single large-sized sheet. Figure 1 shows the overall working framework of our sheet partition technique. In Step (1), a user indicates column-based or row-based partitions by providing a specific column or row together with a query. In Step (2), the server evaluates the query and stores the result as temporary data. In Step (3), the server provides an initial partition to the user in the form of a spreadsheet. Finally, in Steps (4) and (5), the client repeatedly interacts with the server to get more partitioned sheets in response to the user’s additional requests. Thus, by using the sheet partition technique, we do not need to handle a large-sized table at once; instead, we can incrementally process the table through a few small-sized partitions that are selected by continuous user interaction.

[Figure 1 (diagram not reproduced). Components: Database (Source Tables), RDBMS Engine, Temporary Data (on a Column or Row), Sheet Partition Engine, and Partitioned Sheets. Labeled flows: (1) column or row with a query; (2) build the temporary data based on the given column or row; (3) sheets; (4) request more; (5) more sheets.]

Fig. 1. An overall query processing framework using the sheet partition technique
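As an illustration only, the request loop of Fig. 1 can be mimicked by a small in-memory Python sketch. The function names, the sample rows, and the values_per_sheet parameter are hypothetical and are not taken from the implementation described in this paper; a plain list stands in for the RDBMS, and the temporary data is assumed to be the sorted distinct values of the given column.

```python
def build_temporary_data(rows, key_index):
    """Step (2): keep the sorted distinct values of the user-given column."""
    return sorted({row[key_index] for row in rows})


def serve_partitions(rows, key_index, temporary_data, values_per_sheet):
    """Steps (3)-(5): yield one partitioned sheet per client request."""
    for start in range(0, len(temporary_data), values_per_sheet):
        wanted = set(temporary_data[start:start + values_per_sheet])
        # Only the rows of the requested partition are sent to the client.
        yield [row for row in rows if row[key_index] in wanted]


# Step (1): the "query result" and the column chosen by the user (index 0).
query_result = [("Apple", 3), ("Pear", 5), ("Apple", 7), ("Kiwi", 2), ("Pear", 1)]
temporary_data = build_temporary_data(query_result, key_index=0)
session = serve_partitions(query_result, 0, temporary_data, values_per_sheet=2)

initial_sheet = next(session)  # Step (3): the initial partition
more_sheet = next(session)     # Steps (4)-(5): the client requests more
```

Because the generator produces a sheet only when the client asks for it, no more than one partition is materialized per request in this toy model, which is the property the framework relies on.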

We first explain the column-based sheet partition. This method uses a user-specified column to partition a large table into small sheets, and provides a few selected sheets as the resulting views. Figure 2 shows an algorithm StoreAColumn that retrieves the distinct values of the user-specified column and stores them in a temporary array, which will be used to construct the partitions based on the column values. In Line 1, we first declare a temporary array ColValues[1..count] to store the values of the given column. Here, count is the total number of distinct values of the column. In Line 2, we then declare Cursor as the select statement that retrieves the values of the specific column. Since the column may have duplicate values, we explicitly specify the quantifier distinct. In Lines 3 to 6, we finally store the resulting values obtained by Cursor in ColValues[]. Eventually, the array ColValues[] contains the distinct values of the given column.

Figure 3 shows an algorithm ColumnBasedSheet that retrieves the tuples whose column values fall in the user-specified range. We note that the algorithm uses the array ColValues[] obtained by StoreAColumn in Figure 2. The inputs to the algorithm are from and to of a sheet (Line 1). Using them as indexes of ColValues[], we select the tuples from the table (Line 2). We then store the tuples in result (Line 3), and return them as the partitioned sheet (Line 5). Therefore, using ColumnBasedSheet we can interactively and repeatedly access the partitioned sheets by changing the input range (i.e., from and to).


Algorithm StoreAColumn(Table table, Column column)
1  declare string ColValues[1..count];
2  cursor Cursor is select distinct column from table order by column;
3  open Cursor; i := 1;
4  while Cursor is not null
5      fetch Cursor into ColValues[i++];
6  close Cursor;

Fig. 2. An algorithm StoreAColumn for retrieving distinct values of the given column
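A runnable counterpart of StoreAColumn might look like the following Python sketch. It assumes SQLite (through the standard sqlite3 module) instead of SQL Server, and the table sales, its columns, and the sample rows are hypothetical; it is meant as an illustration rather than the implementation used in this paper.

```python
import sqlite3


def store_a_column(conn, table, column):
    """Return the sorted distinct values of `column`, like ColValues[] in Fig. 2."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted here;
    # a real system should validate them against the schema first.
    query = f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY "{column}"'
    cursor = conn.execute(query)
    col_values = [row[0] for row in cursor]  # fetch until the cursor is exhausted
    cursor.close()
    return col_values


# Hypothetical sample data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (fruit TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Pear", 5), ("Apple", 3), ("Apple", 7), ("Kiwi", 2)],
)
col_values = store_a_column(conn, "sales", "fruit")
```

A Python list grows as values are fetched, so the sketch does not need to know count in advance, unlike the fixed-size array declared in Line 1 of Fig. 2.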

Algorithm ColumnBasedSheet(int from, int to)
1  for i := from to to
2      select * from [table name] where [column name] = ColValues[i];
3      store the selected tuples into result;
4  end-of-for
5  return result as the current column-based sheet;

Fig. 3. An algorithm ColumnBasedSheet for constructing a column-based sheet
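The loop of Fig. 3 can be sketched in Python as follows, again assuming SQLite through the standard sqlite3 module. The table sales and its rows are hypothetical, col_values is written out by hand to stand for the array that StoreAColumn would produce, and the parameters are named start and stop because from is a reserved word in Python.

```python
import sqlite3


def column_based_sheet(conn, table, column, col_values, start, stop):
    """Collect the tuples for ColValues[start..stop] (1-based, inclusive)."""
    result = []
    # The column value is bound as a parameter; identifiers are only quoted.
    query = f'SELECT * FROM "{table}" WHERE "{column}" = ?'
    for value in col_values[start - 1:stop]:
        result.extend(conn.execute(query, (value,)).fetchall())
    return result


# Hypothetical sample data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (fruit TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Pear", 5), ("Apple", 3), ("Apple", 7), ("Kiwi", 2)],
)
col_values = ["Apple", "Kiwi", "Pear"]  # sorted distinct values of "fruit"

first_sheet = column_based_sheet(conn, "sales", "fruit", col_values, 1, 2)
next_sheet = column_based_sheet(conn, "sales", "fruit", col_values, 3, 3)
```

Like Fig. 3, the sketch issues one select statement per column value. A single statement with an IN list would presumably reduce the number of round trips, but that variant is not discussed in this paper.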

We now briefly explain the row-based sheet partition. The method adds an index attribute to the source table (or the join table) to generate serial numbers by which rows are accessed. We consider five methods of assigning an index attribute to the table: (1) adding an index attribute to the source table, (2) creating a duplicated table containing an index attribute, (3) creating a virtual source table using a cursor, (4) creating a join table using primary keys of source tables, and (5) creating a virtual join table using two or more cursors. These row-based methods are more complex than the column-based ones because they rely on an index. We omit the details of the row-based sheet partition algorithms due to space limitations. We are currently implementing all five methods to find an optimal strategy.
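Since the row-based algorithms are omitted here, the following Python sketch is only one guess at how method (2), a duplicated table containing an index attribute, could work. It assumes SQLite through the standard sqlite3 module; the names sales, sales_indexed, idx, and rows_per_sheet are hypothetical, and the source table is assumed not to have a column called idx already.

```python
import sqlite3


def build_indexed_copy(conn, table, indexed_table):
    """Duplicate `table` into `indexed_table` with a serial `idx` attribute."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    col_list = ", ".join(f'"{c}"' for c in columns)
    # An INTEGER PRIMARY KEY that is omitted on insert receives 1, 2, 3, ...
    conn.execute(
        f'CREATE TABLE "{indexed_table}" (idx INTEGER PRIMARY KEY, {col_list})'
    )
    conn.execute(
        f'INSERT INTO "{indexed_table}" ({col_list}) '
        f'SELECT {col_list} FROM "{table}" ORDER BY rowid'
    )


def row_based_sheet(conn, indexed_table, sheet_no, rows_per_sheet=10):
    """Return sheet number `sheet_no` (1-based) as a range of serial numbers."""
    first = (sheet_no - 1) * rows_per_sheet + 1
    last = sheet_no * rows_per_sheet
    query = f'SELECT * FROM "{indexed_table}" WHERE idx BETWEEN ? AND ? ORDER BY idx'
    return conn.execute(query, (first, last)).fetchall()


# Hypothetical sample data: 25 rows, hence three sheets of at most ten tuples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(f"item{i}", i) for i in range(1, 26)]
)
build_indexed_copy(conn, "sales", "sales_indexed")
sheet1 = row_based_sheet(conn, "sales_indexed", 1)
sheet3 = row_based_sheet(conn, "sales_indexed", 3)
```

Copying the table costs extra space but leaves the source table untouched, whereas method (1) would alter the source table itself; which of the five methods performs best is left open in the text above.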

3 Implementation of the Sheet Partition Technique

The hardware platform is an Intel Pentium IV PC. The software platform is Microsoft Windows XP and the Microsoft SQL Server 2005 DBMS [3]. As the experimental data, we use the fact table sales_fact_1988 of FoodMart2000 provided in SQL Server 2005.

Figure 4 shows example screen captures obtained by the sheet partition technique. Figure 4(a) shows a screen capture produced by the column-based sheet partition. The sheet in Figure 4(a) is obtained from the value ‘Apple’ of the column ‘fruit.’ Figure 4(b) shows a screen capture produced by the row-based sheet partition. The sheet in Figure 4(b) is obtained by dividing a large source table into small sheets, each of which contains ten tuples.


(a) Column-based sheet partition (b) Row-based sheet partition

Fig. 4. An example of screen captures for the partitioned sheets

4 Conclusions

Spreadsheets are widely used in OLAP for efficient and easy analysis of complex data. In this paper we have proposed the sheet partition technique, which divides a large-sized table into small-sized sheets and incrementally browses only a few selected sheets. Our sheet partition technique employs column-based or row-based methods. The column-based method partitions a large table based on ranges of the given column, and the row-based method partitions it based on serial numbers of an index attribute. We have designed and implemented the partition algorithms to confirm the practical effectiveness of our technique. We are now performing various experiments to find an optimal strategy for the row-based partition method.

Acknowledgements

This work was supported by the Ministry of Science and Technology (MOST)/Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).

References

1. Agrawal, R., et al.: Storage and Querying of E-Commerce Data. In: Proc. the 27th Int’l Conf. on Very Large Data Bases, Roma, Italy, pp. 149–158 (September 2001)
2. Chaudhuri, S., Dayal, U.: An Overview of Data Warehousing and OLAP Technology. SIGMOD Record 26(1), 65–74 (1997)
3. Microsoft SQL Server (2005) http://www.microsoft.com/sql/
4. Raman, V., et al.: Scalable Spreadsheets for Interactive Data Analysis. In: Proc. ACM SIGMOD Workshop on DMKD, Philadelphia (May 1999)
5. Witkowski, A., et al.: Spreadsheets in RDBMS for OLAP. In: Proc. Int’l Conf. on Management of Data, ACM SIGMOD, San Diego, California, pp. 52–63 (June 2003)
6. Witkowski, A., et al.: Query By Excel. In: Proc. the 31st Int’l Conf. on Very Large Data Bases, Trondheim, Norway, pp. 1204–1215 (September 2005)

