Online Data Migration

Project Award Number IIS-0328393

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0328393. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Principal Investigator

Betty J. Salzberg
College of Computer and Information Science
Northeastern University

Boston, MA 02115
Phone: (617) 373-2229
Fax: (617) 373-5121
Email: salzberg@ccs.neu.edu
URL: http://www.ccs.neu.edu/home/salzberg

Keywords

reorganization, concurrency, recovery, parallel systems, indexing, load balancing

Project Summary

This project explores techniques for solving several data migration problems. The requirements for such techniques include availability, search correctness, restartability without excessive log growth, low impact on system throughput, completion in reasonable time, limited extra disk space usage, and incremental improvement as reorganization progresses. Topics include (1) movement of data in a parallel system for load balancing, (2) merging of indexes when data is moved, and (3) new hashing algorithms for load balancing.

Publications and Products

“The BTR-Tree: Path-Defined Version-Range Splitting in a Branched and Temporal Structure,” with L. Jiang, D. Lomet and M. Barrena, SSTD 2003, pp. 28-45.

“On Spatial-Range Closest-Pair Query,” with J. Shan and D. Zhang, SSTD 2003, pp. 252-269.

“The CenSSIS Image Database,” with H. Wu, B. Norum, J. Newmark, C. Warner, C. DiMarzio, and D. Kaeli, SSDBM, 2003, pp. 117-126.

“A Framework for Access Methods for Versioned Data,” with Linan Jiang, David Lomet, Jing Shan, Manuel Barrena and Evangelos Kanoulas, EDBT 2004, pp. 730-747.

“Online Event-Driven Subsequence Matching over Financial Data Streams,” with Huanmei Wu and Donghui Zhang, SIGMOD 2004, pp. 23-34.

“Supporting Load Balancing and Efficient Reorganization During System Scaling,” with Feng Zhu, Xiaowei Sun, and Svein-Olaf Hvasshovd, IPDPS 2005, page 49a.

“Subsequence Matching for Tumor Respiratory Motion Analysis,” with Huanmei Wu, Gregory Sharp, Steve Jiang, Hiroki Shirato and David Kaeli, SIGMOD 2005, pp. 682-693.

“Online B-tree Merging,” with X. Sun, R. Wang, and C. Zou, SIGMOD 2005, pp. 335-346.

“Close Pair Queries in Moving Object Databases,” with P. Zhou, D. Zhang, G. Cooperman and G. Kollios, GIS, 2005, pp. 2-11.

“The hB-pi* Tree: An Optimized Comprehensive Access Method for Frequent-Update Multidimensional Point Data,” with P. Zhou, NU-CCIS-05-07, Boston, MA, 2005.

“Log-Based Recovery for Service-Oriented Applications,” with R. Wang, submitted for publication.

“Derivation of the tumor position from external respiratory surrogates with periodical updating of external/internal correlation,” E. Kanoulas, J. Aslam, S. Jiang, and G. Sharp, accepted as a poster at the AAPM 48th Annual Meeting, July 30-Aug. 3, 2006.

 

Project Impact

  1. Human Resources
    Mr. Rui Wang and Mr. Panfeng Zhou, Northeastern University Ph.D. students, are working on this project. Our working group also includes Mr. Evangelos Kanoulas, Ms. Jing Shan, and Ms. Huanmei Wu, who are also Northeastern University Ph.D. students, and Prof. Donghui Zhang and Prof. David Kaeli of Northeastern University.
  2. Education and curriculum development at all levels
    The PI teaches graduate courses on parallel and distributed databases and on transaction processing. Her research has strongly influenced the content of these courses.
  3. Industry collaboration
    This work is in collaboration with Prof. Svein-Olaf Hvasshovd of ClustRa and of NUST in Norway. Other participants are Dr. Chendong Zou of IBM, Dr. Linan Jiang of Oracle, Dr. David Lomet of Microsoft, and Drs. Gregory Sharp and Steve Jiang of Massachusetts General Hospital.

 

Goals, Objectives and Targeted Activities

The goal of this project is to enable 24/7 availability in the face of rapidly expanding and contracting databases that are distributed over collections of computer nodes. The targeted activities are:

 

  1. Simulation experiments and papers about migrating data in parallel systems.
  2. Algorithms for merging of indexes when data is migrated.
  3. Hashing algorithms to improve load balancing.

Area Background

Understanding online reorganization requires a careful study of the different modules of a large DBMS which must interact with the reorganization process. If we want the process to be restartable after a system failure, we need to carefully log its progress. If we want other users to see a consistent picture of the database, we must do locking. We also try to make the amount of logging needed for reorganization small and we try to avoid unnecessary I/O. Thus a deep understanding of the logging, recovery and indexing components of the DBMS is necessary.
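As a rough illustration of the restartability and log-size concerns described above, the following Python sketch copies records in batches and writes one small log record per completed batch, so that a restart after a failure resumes from the last logged batch instead of starting over. It is not the project's algorithm. The function names, the JSON log format, the `batch_size` and `crash_after` parameters, and the use of in-memory dictionaries in place of database nodes are all assumptions made for the example, and locking for concurrent users is left out entirely.

```python
import json
import os
import tempfile


def last_checkpoint(log_path):
    """Return the index of the last batch recorded in the log, or -1 if none."""
    if not os.path.exists(log_path):
        return -1
    last = -1
    with open(log_path) as log:
        for line in log:
            last = json.loads(line)["batch_done"]
    return last


def migrate(source, target, log_path, batch_size=2, crash_after=None):
    """Copy records from source to target in batches.

    One small log record is written per completed batch (not per record),
    which keeps the log short while still letting a restart skip finished work.
    crash_after is a hypothetical hook that simulates a system failure
    just before the batch with that index is processed.
    """
    keys = sorted(source)  # fixed order keeps batch numbering stable across restarts
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
    start = last_checkpoint(log_path) + 1  # resume after the last logged batch
    with open(log_path, "a") as log:
        for index in range(start, len(batches)):
            if crash_after is not None and index == crash_after:
                raise RuntimeError("simulated system failure")
            for key in batches[index]:
                target[key] = source[key]  # idempotent, so redoing a batch is safe
            log.write(json.dumps({"batch_done": index}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the checkpoint to disk before continuing
```

Calling `migrate` a second time with the same log file after a failure continues from the first unlogged batch. Because each copy is idempotent, a batch that was partly applied before the failure can simply be redone.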

There is a trend towards making changes to systems automatic, eliminating many tasks of the database administrator. This means that reorganization, which may be necessary for load balancing, because a node in a parallel system has failed, or because a node has been added to a parallel system, should be done by software. We are hoping to write some of that software.

E-commerce systems in particular need to be scalable and to provide 24/7 service, so the databases supporting e-commerce must be able to be expanded and changed without going off-line. There is thus enhanced interest in these problems.

Area References

C. Zou and B. Salzberg, “Safely and Efficiently Updating References During On-line Reorganization,” VLDB 1998.

C. Zou and B. Salzberg, “On-line Reorganization of Sparsely-Populated B+-trees,” ACM SIGMOD Conference, June 1996, Montreal, pp. 115-124.

C. Zou and B. Salzberg, “Towards Efficient Online Database Reorganization,” invited article, Data Engineering Bulletin, vol. 19, no. 2, June 1996, pp. 33-40.

C. Zou, B. Salzberg and R. Ladin, “On-line Reorganization: A Position Paper,” High Performance Transaction Systems Workshop, 1995 (in workshop notes).

B. Salzberg and A. Dimock, “Principles of Transaction-Based On-Line Reorganization,” Proc. 18th VLDB, 1992, Vancouver, pp. 511-520.

J. Gray and A. Reuter, “Transaction Processing: Concepts and Techniques,” Morgan Kaufmann, 1993.

Svein-Olaf Hvasshovd, Oystein Torbjornsen, Svein Erik Bratsberg and Per Holager, “The ClustRa Telecom Database: High Availability, High Throughput and Real-Time Response,” Very Large Databases Conference, 1995, Montreal, pp. 469-477.

C. Mohan and Inderpal Narang, “Algorithms for Creating Indexes for Very Large Tables Without Quiescing Updates,” ACM SIGMOD Conference, 1992, pp. 361-370.

V. Srinivasan and Mike Carey, “On-line Index Reconstruction Algorithms,” High Performance Transaction Processing Workshop, 1991.

G. H. Sockut, T. A. Beavin and C. C. Chang, “A Method for Online Reorganization of a Database,” IBM Systems Journal, vol. 36, no. 3, pp. 411-436, 1997.

Nagavamsi Ponnekanti and Hanuma Kodavalla, “Online Index Rebuild,” Technical report, Sybase, 1999.

Potential Related Projects

Mr. Rui Wang and the P.I. are working on application recovery for distributed systems with multithreaded memory-sharing recovery units. Because this is also needed for 24/7 availability and also includes logging and interaction with DBMSs, this can be considered a related project.

Project Websites

http://www.ccs.neu.edu/home/salzberg/proj2004.html (this page)

http://www.ccs.neu.edu/research/dblab/ (a description of the database projects and the database lab at Northeastern University)