Title: An Empirical Evaluation of Simple DTD-Conscious Compression Techniques
Authors: James Cheney
Date: Jun 2005
Publication Title: Proc. WebDB Workshop 2005
Publication Type: Conference Paper
Since XML markup often displays a high degree of redundancy, ordinary text compressors (gzip [7], bzip2 [15], etc.) are frequently used for XML storage and transmission. Text compressors perform adequately for archiving XML files in many situations; however, they are blind to the underlying structure of the XML document and so may miss compression opportunities. Because of this, researchers have studied, and companies have marketed, XML compression tools. In previous work [5], we developed a streaming XML-conscious compressor, xmlppm, and showed that it provides better compression than other contemporary text and XML-conscious compression techniques (including XMill [13]). The purpose of this paper is to investigate whether DTD information can be used to improve compression in xmlppm enough to justify the added implementation effort. We consider the minimum-length coding problem for valid XML: given a data source producing XML that conforms to a DTD, find the smallest possible encoding. We assume that both sender and receiver have access to identical copies of the DTD.
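
To make the shared-DTD assumption concrete, the following is a minimal Python sketch, not the method used in xmlppm. It assumes a hypothetical three-rule DTD (shown in the comments) in which a record always contains a name followed by an email, and it assumes the text values contain neither NUL bytes nor characters that need XML escaping. Under those assumptions the element structure is fully determined by the DTD, so only the text content has to be transmitted. The names FIELDS, encode, and decode are illustrative only.

```python
# Hypothetical DTD held by both sender and receiver (an assumption
# made for illustration, not taken from the paper):
#   <!ELEMENT record (name, email)>
#   <!ELEMENT name   (#PCDATA)>
#   <!ELEMENT email  (#PCDATA)>
FIELDS = ("name", "email")  # child order is fixed by the content model


def encode(values):
    """Transmit only the text content; the tags are implied by the DTD.

    Assumes no value contains a NUL byte, which is used as a separator.
    """
    return "\x00".join(values).encode("utf-8")


def decode(payload):
    """Rebuild the full document from the text content and the shared DTD.

    Assumes the values need no XML escaping.
    """
    values = payload.decode("utf-8").split("\x00")
    body = "".join(
        f"<{tag}>{text}</{tag}>" for tag, text in zip(FIELDS, values)
    )
    return f"<record>{body}</record>"


xml = "<record><name>Ann</name><email>ann@example.org</email></record>"
payload = encode(["Ann", "ann@example.org"])

# The receiver recovers the original document from the shorter payload.
print(decode(payload) == xml)
print(len(payload), "bytes sent instead of", len(xml.encode("utf-8")))
```

Realistic DTDs also allow choices, optional elements, and repetition, so a decoder generally needs some structural information in addition to the text. The sketch covers only the fully deterministic case, where that information costs nothing.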
