| 
      
    
    Abstract:
 Normalization as a way of producing good database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While in the relational world the criteria for being well-designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models.
 Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
    Bibtex format:
@Article{EDI-INF-RR-0833,
  author    = {Leonid Libkin and Marcelo Arenas},
  title     = {An information-theoretic approach to normal forms for relational and XML data.},
  journal   = {Journal of the ACM},
  publisher = {ACM},
  year      = 2005,
  volume    = {52},
  pages     = {246--283},
  doi       = {10.1145/1059513.1059519},
  url       = {http://homepages.inf.ed.ac.uk/libkin/papers/jacm-pods03.ps.gz},
} |