Relaxell (A RELAX NG schema validator written in Haskell)

Proposer: David Aspinall

Self-Proposed: No

Supervisor: David Aspinall, 6505177, da@inf.ed.ac.uk

Subject Areas: Algorithm Design, Formal methods: Specification Verification and Testing, Programming Languages and Functional Programming, Software Engineering, WWW Tools and Programming

Suitable for the following degrees: MSc in Informatics, MSc in Computer Science

Principal goal of the project: To write an XML validator for the schema language RELAX NG and, in particular, to use it to support type-safe programming with schema-valid XML documents.

Description of the project:

RELAX NG is a powerful and elegant schema language for XML. It supports a human-readable "compact" notation that allows one to write grammars in an understandable way.

The aim of this project is to implement a validator for RELAX NG in Haskell that, given a schema and a document as input, checks that the document satisfies the schema. Once some technicalities of parsing XML have been mastered (hopefully by reusing parts of existing Haskell tools), validation should be reasonably straightforward: it can follow the RELAX NG specification and published descriptions of the validation algorithm.
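As a rough illustration of what the core of such a validator might look like, here is a minimal sketch in the style of derivative-based validation, one commonly described approach. It is not the full algorithm from the specification: the document model (Node), the pattern subset (Pattern), and all function names are assumptions made for this example, and attributes, namespaces, interleave, datatypes, and whitespace handling are left out.

```haskell
-- A deliberately tiny document model (hypothetical): no attributes or namespaces.
data Node = Element String [Node] | Text String
  deriving (Eq, Show)

-- A small subset of RELAX NG patterns (hypothetical encoding).
data Pattern
  = Empty                    -- matches the empty sequence
  | NotAllowed               -- matches nothing
  | AnyText                  -- "text": zero or more text nodes
  | Elem String Pattern      -- element name { p }
  | Group Pattern Pattern    -- p1, p2
  | Choice Pattern Pattern   -- p1 | p2
  | OneOrMore Pattern        -- p+
  deriving (Eq, Show)

-- p* expressed as (p+ | empty).
zeroOrMore :: Pattern -> Pattern
zeroOrMore p = Choice (OneOrMore p) Empty

-- Does the pattern match the empty sequence of nodes?
nullable :: Pattern -> Bool
nullable Empty         = True
nullable NotAllowed    = False
nullable AnyText       = True
nullable (Elem _ _)    = False
nullable (Group p q)   = nullable p && nullable q
nullable (Choice p q)  = nullable p || nullable q
nullable (OneOrMore p) = nullable p

-- Smart constructors that simplify away NotAllowed and Empty.
choice :: Pattern -> Pattern -> Pattern
choice NotAllowed q = q
choice p NotAllowed = p
choice p q          = Choice p q

group :: Pattern -> Pattern -> Pattern
group NotAllowed _ = NotAllowed
group _ NotAllowed = NotAllowed
group Empty q      = q
group p Empty      = p
group p q          = Group p q

-- The derivative: what remains to be matched after consuming one node.
deriv :: Pattern -> Node -> Pattern
deriv (Choice p q) n = choice (deriv p n) (deriv q n)
deriv (Group p q) n
  | nullable p = choice first (deriv q n)
  | otherwise  = first
  where first = group (deriv p n) q
deriv (OneOrMore p) n = group (deriv p n) (choice (OneOrMore p) Empty)
deriv AnyText (Text _) = AnyText
deriv (Elem name p) (Element name' kids)
  | name == name' && matches p kids = Empty
deriv _ _ = NotAllowed

-- A node sequence is valid if the pattern left after consuming it is nullable.
matches :: Pattern -> [Node] -> Bool
matches p = nullable . foldl deriv p

-- Example schema, written in compact syntax as:
--   element addressBook { element card { element name { text },
--                                        element email { text } }* }
addressBook :: Pattern
addressBook =
  Elem "addressBook"
    (zeroOrMore (Elem "card" (Group (Elem "name" AnyText) (Elem "email" AnyText))))
```

With these definitions, a document is checked by calling matches addressBook [doc], where doc is the root Element. A real validator would first parse the schema and the XML input into such structures, which is where existing Haskell XML tools could be reused.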

The next part of the project will use the validator to support type-safe programming with XML conforming to a RELAX NG schema. In essence, the idea is to provide a RelaxToHaskell tool similar to HaXml's "DtdToHaskell" (see reference below), which converts a schema into a Haskell data type definition, together with parsing and unparsing functions. A parsing function accepts a document which is valid with respect to the schema and produces (in memory) an element of the datatype reflecting the document's structure in terms of the schema. Conversely, an element of the datatype can be written out as XML data which is automatically guaranteed to validate with respect to the schema. There are interesting possibilities for using Haskell's type system to support this in clever ways, including conversion between XSD datatypes (which RELAX NG supports) and Haskell values.
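To make the idea concrete, the following sketch shows the kind of code such a tool might generate for the compact-syntax schema element card { element name { text }, element email { text } }. Everything here is hypothetical: the Node type stands in for whatever XML representation the project adopts, and the names Card, parseCard, and unparseCard are illustrative only, not the output of any existing tool.

```haskell
-- Stand-in XML representation (hypothetical): no attributes or namespaces.
data Node = Element String [Node] | Text String
  deriving (Eq, Show)

-- Data type a generator might derive from the schema:
--   element card { element name { text }, element email { text } }
data Card = Card
  { cardName  :: String
  , cardEmail :: String
  } deriving (Eq, Show)

-- Collect the text of a content sequence; fails if any child is not text.
textOf :: [Node] -> Maybe String
textOf nodes = concat <$> traverse fromText nodes
  where
    fromText (Text s) = Just s
    fromText _        = Nothing

-- Parsing: succeeds only on documents with the shape the schema prescribes.
parseCard :: Node -> Maybe Card
parseCard (Element "card" [Element "name" n, Element "email" e]) =
  Card <$> textOf n <*> textOf e
parseCard _ = Nothing

-- Unparsing: every Card value is written out in the shape the schema requires.
unparseCard :: Card -> Node
unparseCard (Card n e) =
  Element "card" [Element "name" [Text n], Element "email" [Text e]]
```

In this sketch the guarantee of schema-valid output comes from the types: a Card cannot omit its name or email, so unparseCard cannot produce a card element that lacks them, and parseCard (unparseCard c) returns Just c for any c.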

Some examples should be provided to demonstrate the tools.

Resources Required: Nothing special (but a Haskell compiler must be installed)

Degree of Difficulty: Challenging

Background Needed: Ideally, knowledge of Haskell, or of another functional programming language so that the learning curve is less steep

References: