Advanced Databases

Coursework assignment 2: Merge Join

In this assignment, you will have to implement the Merge Join algorithm on attica.

Your implementation

The steps you need to follow are the following:

In more detail:

MergeJoin.java is under org/dejave/attica/engine/operators of your source installation (or in some other place you may want to store it depending on what environment you are using). This is the main class that you will have to modify. The class's constructor has the following signature:

/**
  * Construct a new mergejoin operator
  *
  * @param left the left input operator
  * @param right the right input operator
  * @param sm the storage manager
  * @param predicate the predicate evaluated by this join operator
  * @throws EngineException thrown whenever the operator cannot be
  * properly constructed
  */
public MergeJoin (Operator left,
            Operator right,
            StorageManager sm,
            int leftSlot,
            int rightSlot,
            Predicate predicate)
throws EngineException

The first three arguments are obvious and their proper values will be taken care of by the caller of the constructor -- in this case, attica's heuristics-based optimizer. The fourth and fifth argument should concern you, but only minimally. They are the indexes of the sort-attributes for either input relation (leftSlot for the left input, rightSlot for the right input.) You will need to access these attributes of the corrresponding tuple to "advance" your relation pointers when implementing the merging phase. The last argument is the predicate itself. The predicate evaluator will compute the value of the predicate (details following), while the argument value to the constructor will, again, be taken care of by the optimiser.

Moving on into MergeJoin.java, there is a method called initTempFiles(). You should write your own code here in order to generate any temporary files that may be used for join evaluation. You already know how to do so from your previous assignment.

The next step is implementing the merge join algorithm. For this, you will have to modify the setup() method of the MergeJoin class. Of course, not your entire implementation has to be in this single method! But when the method exits, the join result should be stored in the designated output file. In order to implement the merging phase, you will need to access the attributes on which the input relations are sorted (indexed by leftSlot for the left input and rightSlot for the right input.)

To look at another join implementation, consult NestedLoopsJoin.java. A crucial point is making the right call to the predicate evaluator. Assuming that the two tuples the join predicates is being evaluated on are leftTuple and rightTuple, then the call you need to make is the following:

Tuple rightTuple = rightMan.nextTuple();
PredicateTupleInserter.insertTuples(leftTuple, rightTuple, getPredicate());
boolean value = PredicateEvaluator.evaluate(getPredicate());
if (value) {
    // the predicate is true -- store the new tuple
    Tuple newTuple = combineTuples(leftTuple, rightTuple);
    outputMan.insertTuple(newTuple);
}

After you have implemented the merge phase of merge join, the final result should be stored in the output file pointed to by the outputFile field of MergeJoin (which outputManager writes to.) This file will then be scanned during the retrieval of the output relation. Again, you might want to take a look at NestedLoopsJoin.java to see how this is done.

The final step you should take: open PlanBuilder.java under attica/engine/optimiser of your source installation. Go to line 971 and uncomment all lines until line 1020.

Then substitute the calls to ExternalSort with your own implementation of ExternalSort -- remember to change the name if you have named it differently (if you have not submitted the first assignment, a solution will be provided.) .

Then comment out lines 1023 and 1024. That way, the next time a join is specified in a query, your code will be executed if it is an equi-join. In all other cases the system will choose nested loops.

What you need to hand in

You will have to hand in the compiled version of your source tree, along with the source for your implementation of MergeJoin.java.

The submission deadline is Monday, 17 March, 4:00 pm.

Use the submit program to make the submission. For instance, to submit the compiled version of the source tree, use:

submit adbs 2 attica.jar

You get the idea for the source file. If you have modified any other source files of the code base, please submit them as well. You might also think of handing in a description (a text file will do) of what you did if you think something is worth mentioning. It is not compulsory, but it might make marking the assignment easier. You can write whatever you want in that file, ranging from implementation issues and problems you faced (hopefully, along with the solutions you provided!) to comments about the code in general.

That is all. For any questions, contact me.

Good luck.
--stratis


Home : Teaching : Courses : Adbs 

Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk
Please contact our webadmin with any comments or corrections. Logging and Cookies
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh