Informatics Report Series


Report EDI-INF-RR-0321


Title: An Empirical Evaluation of Simple DTD-Conscious Compression Techniques
Authors: James Cheney
Date: Jun 2005
Publication Title: Proc. WebDB Workshop 2005
Publication Type: Conference Paper
Abstract:
Since XML markup often displays a high degree of redundancy, ordinary text compressors (gzip [7], bzip2 [15], etc.) are frequently used for XML storage and transmission. Text compressors perform adequately for archiving XML files in many situations; however, they are blind to the underlying structure of the XML document and so may miss compression opportunities. Because of this, researchers have studied, and companies have marketed, XML compression tools. In previous work [5], we developed a streaming XML-conscious compressor, xmlppm, and showed that it provides compression superior to other contemporary text and XML-conscious compression techniques (including XMill [13]). The purpose of this paper is to investigate whether DTD information can be used to improve compression in xmlppm enough to justify the added implementation effort. We consider the minimum-length coding problem for valid XML: given a data source producing XML conforming to a DTD, find the smallest possible encoding. We assume both sender and receiver have access to identical copies of the DTD.
BibTeX format
@InProceedings{EDI-INF-RR-0321,
author = { James Cheney },
title = {An Empirical Evaluation of Simple DTD-Conscious Compression Techniques},
booktitle = {Proc. WebDB Workshop 2005},
year = 2005,
month = {Jun},
url = {http://homepages.inf.ed.ac.uk/jcheney/publications/cheney05webdb.pdf},
}



Unless explicitly stated otherwise, all material is copyright The University of Edinburgh