Informatics Report Series


Report EDI-INF-RR-0826


Title: Tradeoffs in XML Database Compression
Authors: James Cheney
Date: Mar 2006
Publication Title: Proceedings of the 2006 IEEE Data Compression Conference
Publication Type: Conference Paper
Abstract:
Large XML data files, or XML databases, are now a common way to distribute scientific and bibliographic data, and storing such data efficiently is an important concern. A number of approaches to XML compression have been proposed in the last five years. The most competitive approaches employ one or more statistical text compressors based on PPM or arithmetic coding in which some of the context is provided by the XML document structure. The purpose of this paper is to investigate the relationship between the extant proposals in more detail. We review the two main statistical modeling approaches proposed so far, and evaluate their performance on two representative XML databases. Our main finding is that while a recently proposed multiple-model approach can provide better overall compression for large databases, it uses much more memory and converges more slowly than a single-model approach.
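The single-model versus multiple-model tradeoff described in the abstract can be illustrated with a toy experiment. The sketch below is not the paper's method and makes several simplifying assumptions. It replaces PPM with an adaptive order-0 model using add-one (Laplace) counts. It measures ideal code length rather than running an arithmetic coder. It ignores the structural context that a single-model compressor would add. Its input is hypothetical (tag, text) pairs standing in for parsed XML leaves. All names (`AdaptiveModel`, `single_model`, `multiple_models`) are illustrative.

```python
import math
from collections import defaultdict


class AdaptiveModel:
    """Adaptive order-0 model with add-one (Laplace) counts over bytes.

    A simplified stand-in for a PPM model (an assumption of this sketch):
    instead of running an arithmetic coder, it sums the ideal code length
    -log2 p(symbol) that such a coder would approach.
    """

    def __init__(self, alphabet_size=256):
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(int)
        self.total = 0

    def cost(self, symbol):
        # Predict with the current counts, then update them (adaptive coding).
        p = (self.counts[symbol] + 1) / (self.total + self.alphabet_size)
        self.counts[symbol] += 1
        self.total += 1
        return -math.log2(p)

    def states(self):
        # Distinct symbols with stored counts: a crude proxy for memory use.
        return len(self.counts)


def single_model(records):
    """Encode all text content with one shared model."""
    model = AdaptiveModel()
    bits = sum(model.cost(ch) for _, text in records for ch in text)
    return bits, model.states()


def multiple_models(records):
    """Encode each element's text with a separate model keyed by its tag."""
    models = defaultdict(AdaptiveModel)
    bits = sum(models[tag].cost(ch) for tag, text in records for ch in text)
    return bits, sum(m.states() for m in models.values())


if __name__ == "__main__":
    # Hypothetical (tag, text) pairs, not data from the paper.
    small = [("a", "xy"), ("b", "xy")]
    large = [("year", "1999"), ("name", "abc")] * 50
    for name, recs in [("small", small), ("large", large)]:
        s_bits, s_states = single_model(recs)
        m_bits, m_states = multiple_models(recs)
        print(f"{name}: single {s_bits:.1f} bits / {s_states} states, "
              f"multiple {m_bits:.1f} bits / {m_states} states")
```

On the small input, the shared model is cheaper because each per-tag model must learn the same statistics separately. On the larger input, where the two tags use disjoint alphabets, the per-tag models win. The per-tag models also never store fewer count entries than the shared one. This only loosely mirrors the memory and convergence tradeoff the paper reports for real PPM-based compressors.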
BibTeX format
@InProceedings{EDI-INF-RR-0826,
author = { James Cheney },
title = {Tradeoffs in XML Database Compression},
booktitle = {Proceedings of the 2006 IEEE Data Compression Conference},
year = 2006,
month = {Mar},
url = {http://doi.ieeecomputersociety.org/10.1109/DCC.2006.79},
}


Unless explicitly stated otherwise, all material is copyright The University of Edinburgh