Database Systems Implementation

Semester 2, 3rd year

Lecturer: Stratis Viglas
Appleton Tower, AT 2.11
Tel: +44 (0) 131 650 5183

This course has stopped being offered from the 2008/2009 academic year onwards.

Pre-requisites
Announcements
Schedule
Syllabus
Reading List
Assignments
Software

Pre-requisites

Students must have taken the Database Systems course. If you have not taken Database Systems or an equivalent course and still want to enroll in Database Systems Implementation please contact the instructor to obtain consent.

Announcements

Watch this space for any announcements concerning the course.

Lecture Schedule

Syllabus

Relational Databases Overview

Relational Schema

The relational schema is a table-oriented storage approach for the data we want to keep in the database. We will see what the table layout is and how this simple layout is used to manipulate the data.

Storage Structures

There are different storage structures associated with a relational table. In this part of the course, we will see what these structures are and how such a structure is tailored more towards certains operations and evaluation algorithms.

Query Evaluation

SQL

The first step at issuing a query to a dabase system is to declaratively specify the properties of the result data. This is usually achieved through a query language like SQL. The ouput of this, as far as the system is concerned, is an annotated syntax tree containing all the relevant information for the query. The system will then translate this tree into a procedural algebraic representation that is easier to optimise.

Relational Algebra and Equivalence Rules

The first step towards manipulating relational data is coming up with a procedural abstraction of the operations that need to be carried out over them. This is usually done through relational algebra. We will spend some time discussing the most commonly used relational operators, those that lead us to the query paradigm known as "conjunctive queries." We will the focus on the properties of the operators, namely the equivalence rules that allow us to transform expressions into equivalent forms. This is a key aspect of query optimisation.

Tuple-based Evaluation

Most relational engines employ what is called a tuple-based evaluation paradigm. That is, the evaluation plan is organized as a tree of relational operators; once the plan is to be executed, the operators pass tuples from one to another, until the inputs are exhausted and the complete result set is generated.

Cost-based Query Optimisation

Join Algorithms

One of the most important database operators is the join. That is because of its powerful semantics for the relational schema, as well as its cost as an operation. As such, a lot of research has gone into coming up with good join evaluation algorithms. Notice that there is no single good join evaluation algorithm; this is largely dependent on the input data themselves. We will deal with the different join evaluation algorithms and with what is called "join ordering." That is, ways of specifying the order of join evaluation in a query with multiple joins. This leads to ways in which the search space for a given query is explored in the course of query optimisation.

Cost of an Evaluation Plan

The algebraic rewrites of a query, along with the different evaluation algorithms, might be equivalent in terms of operator semantics, but they are quite different in terms of computational cost. We will see what this cost is in the context of relational databases and how this cost can be used to assess the relative performance of different evaluation plans.

Search Space Exploration

One of the fundamentals of query optimisation is the way the search space is explored. Each algebraic representation for a query, given the algebraic rewrite rules along with the different join evaluation algorithms, defines a search space that the optimiser has to explore in order to pich the best possible evaluation plan. It turns out that even for simple queries, this search space can be quite large and the time spent exploring it can, under circumstances, be comparable to the time needed to evaluate the query. As such, the optimisers use a set of heuristics and of pruning methods to limit the search space and explore a subset that contains good evaluation plans.

Selection of Evaluation Plan

This is where it all comes together. A query optimiser, after having an initial algebraic version of the query, explores the search space for that query and according to its cost model picks the evaluation plan that bears the minimum estimated cost for the portion of the search space explored. We will see how this is done.

Reading List

Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems. (Third Edition) McGraw-Hill 2003.

Note that this is not required text; it is merely recommended. You should be fine with the slides used during the lectures alone. There might be additional course material with respect to the software we will be using for the assignments. This material will be handed out in due time.

Assignments

Software

In the context of the course, we will be using an experimental relational database system, written in Java, called attica.

Attica can be found at this location. Visit the page to obtain the software and installation instructions.

Home : Teaching : Courses