- Abstract:
-
Sorting hierarchical data in external memory is needed in a wide variety of applications including archiving scientific data and dealing with large XML datasets. The topic of sorting hierarchical data has received little attention form the research community so far. In this paper, we focus on sorting arbitrary hierarchical datasets that exceed the size of physical memory. We propose HErMeS, an algorithm that generalizes the most widely-used techniques for sorting flat data in external memory, namely multiway merge-sort and replacement selection. HErMeS efficiently takes into consideration the hierarchical nature of the data in order to minimize the number of disk accesses and optimize the usage of available memory. The algorithm's theoretical bounds with respect to the structure of the hierarchical dataset are extracted, while an experimental study demonstrates that our implementation of the algorithm meets its theoretical expectations. Using several workloads, we compare our algorithm to existing approaches and show that it outperforms them by a significant factor. These results, we believe, prove our technique to be a viable and scalable solution.
- Links To Paper
- 1st Link
- Bibtex format
- @Misc{EDI-INF-RR-1217,
- author = {
Ioannis Koltsidas
and Heiko Mueller
and Stratis Viglas
},
- title = {Sorting Hierarchical Data in External Memory},
- year = 2007,
- url = {http://homepages.inf.ed.ac.uk/s0679010/hermes-TR.pdf},
- }
|