Informatics Report Series


Report   

EDI-INF-RR-0856


Related Pages

Report (by Number) Index
Report (by Date) Index
Author Index
Institute Index

Home
Title:Efficient implementation of content models with numerical occurrence constraints
Authors: Henry Thompson
Date:May 2006
Publication Title:XTech 2006, Amsterdam, Netherlands
Publisher:IDE Alliance
Publication Type:Conference Paper Publication Status:Published
Page Nos:12
Abstract:
W3C XML Schema allows numerical occurrence ranges in the regular expressions over element declarations and wildcards which it uses to express content models, to e.g. to allow between 2 and 10 occurrences of some element. In [ThomTob03] we introduced an approach to translating such regular expressions into deterministic finite-state automata by extending the standard translation for regular expressions without numerical occurrence ranges [KThom68], [AhoUllman77] to unfold terms with numerical occurrence ranges. Although straightforward, the Thompson and Tobin algorithm is expensive in terms of space. When numerical occurrence ranges are nested, the resulting FSA has a number of states proportional to the product of the length of the ranges. In the worst case, this renders the algorithm unusable. In this paper I present a new approach to implementing such models without unfolding. It uses FSAs augmented with ranges, and is space-efficient even when nested ranges are involved. The deterministic interpretations (greedy or modest) of such FSAs are also time-efficient, but fail to recognise languages involving certain configurations of nested ranges correctly. Implementers may none-the-less find the construction described herein useful, as the configurations which are not recognised correctly by the deterministic interpretations are a) detectable at conversion time and b) extremely rare in practice, so falling back to a non-deterministic interpretation in such cases is probably acceptable.
Links To Paper
1st Link
Bibtex format
@InProceedings{EDI-INF-RR-0856,
author = { Henry Thompson },
title = {Efficient implementation of content models with numerical occurrence constraints},
book title = {XTech 2006, Amsterdam, Netherlands},
publisher = {IDE Alliance},
year = 2006,
month = {May},
pages = {12},
url = {http://xtech06.usefulinc.com/schedule/paper/118},
}


Home : Publications : Report 

Please mail <reports@inf.ed.ac.uk> with any changes or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh