Report EDI-INF-RR-0632

Informatics Report Series

Title: Archiving Scientific Data

Authors: Peter Buneman ; Sanjeev Khanna ; Keishi Tajima ; Wang-Chiew Tan

Date: 2004

Publication Title: ACM Transactions on Database Systems (TODS) (SIGMOD/PODS Special Issue)

Publisher: ACM

Publication Type: Journal Article

Publication Status: Published

Volume No: 29(1)

Page Nos: 2-42

DOI: 10.1145/974750.974752

Abstract: Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierarchical format and has a key structure that provides a canonical identification for each element of the hierarchy. In this paper we exploit these properties to develop an archiving technique that is both efficient in its use of space and preserves the continuity of elements through versions of the database, something that is not provided by traditional minimum-edit-distance diff approaches. The approach also uses timestamps. All versions of the data are merged into one hierarchy where an element appearing in multiple versions is stored only once along with a timestamp. By identifying the semantic continuity of elements and merging them into one data structure, our technique is capable of providing meaningful change descriptions. The archive also allows us to easily answer certain temporal queries, such as retrieving any specific version from the archive and finding the history of an element. This is in contrast with approaches that store a sequence of deltas, where such operations may require undoing a large number of changes or significant reasoning with the deltas. A suite of experiments also demonstrates that our archive does not incur any significant space overhead when compared with diff approaches. Another useful property of our approach is that we use XML format to represent hierarchical data, and the resulting archive is also in XML. Hence, XML tools can be applied directly to our archive. In particular, we apply an XML compressor to our archive, and our experiments show that our compressed archive outperforms compressed diff-based repositories in space efficiency. We also show how we can extend our archiving tool to an external memory archiver for higher scalability and describe various index structures that can further improve the ef
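
The merging idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or archive format: it assumes each version is a nested Python dict whose dict keys play the role of the canonical element keys, and it attaches an explicit set of version numbers to every node. The names `Node`, `merge`, `checkout`, and `history`, and the sample gene data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """One keyed element of the merged archive (hypothetical structure)."""
    versions: Set[int] = field(default_factory=set)               # versions containing this element
    values: Dict[object, Set[int]] = field(default_factory=dict)  # leaf value -> versions holding it
    children: Dict[str, "Node"] = field(default_factory=dict)     # child key -> merged child node


def merge(node: Node, version: int, data) -> None:
    """Merge one version into the archive, matching elements by key."""
    node.versions.add(version)
    if isinstance(data, dict):
        for key, child in data.items():
            # An element with the same key path is stored once across versions.
            merge(node.children.setdefault(key, Node()), version, child)
    else:
        # Leaf values must be hashable (str, int, ...) in this sketch.
        node.values.setdefault(data, set()).add(version)


def checkout(node: Node, version: int):
    """Rebuild one version by keeping only what is stamped with it."""
    kept = {key: checkout(child, version)
            for key, child in node.children.items()
            if version in child.versions}
    if kept:
        return kept
    for value, stamps in node.values.items():
        if version in stamps:
            return value
    return {}  # element existed in this version but was an empty container


def history(root: Node, path: List[str]) -> Dict[object, List[int]]:
    """Return each value a leaf element has held, with the versions holding it."""
    node = root
    for key in path:
        node = node.children[key]
    return {value: sorted(stamps) for value, stamps in node.values.items()}


# Three illustrative versions of a tiny keyed database (made-up data).
v1 = {"gene:BRCA1": {"name": "BRCA1", "length": 100}, "gene:TP53": {"name": "TP53"}}
v2 = {"gene:BRCA1": {"name": "BRCA1", "length": 120}}
v3 = {"gene:BRCA1": {"name": "BRCA1", "length": 120}, "gene:MYC": {"name": "MYC"}}

archive = Node()
for number, snapshot in enumerate([v1, v2, v3], start=1):
    merge(archive, number, snapshot)
```

With this structure, retrieving a version is a single filtered traversal (`checkout(archive, 2)`), and the history of an element is a lookup along its key path (`history(archive, ["gene:BRCA1", "length"])`), with no sequence of deltas to undo. Stamping every node explicitly keeps the sketch short; it makes no attempt at the space efficiency the abstract reports.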

BibTeX format
@Article{EDI-INF-RR-0632,
  author = {Peter Buneman and Sanjeev Khanna and Keishi Tajima and Wang-Chiew Tan},
  title = {Archiving Scientific Data},
  journal = {ACM Transactions on Database Systems (TODS) (SIGMOD/PODS Special Issue)},
  publisher = {ACM},
  year = 2004,
  volume = {29(1)},
  pages = {2-42},
  doi = {10.1145/974750.974752},
  url = {http://www.soe.ucsc.edu/%7Ewctan/papers/2004/tods.pdf},
}
