In this assignment, you will have to implement the Merge Join algorithm on attica.
The steps you need to follow are the following:
MergeJoin.java
file of your
source distribution, PlanBuilder.java
) so that
it will pick up your code.In more detail:
MergeJoin.java
is under
org/dejave/attica/engine/operators
of your source
installation (or in some other place you may want to store it depending
on what environment you are using). This is the main class that you will
have to modify. The class's constructor has the following signature:
/**
  * Construct a new mergejoin operator
  *
  * @param left the left input operator
  * @param right the right input operator
  * @param sm the storage manager
  * @param predicate the predicate evaluated by this join
operator
  * @throws EngineException thrown whenever the operator cannot
be
  * properly constructed
  */
public MergeJoin (Operator left,
            Operator right,
            StorageManager sm,
            int leftSlot,
            int rightSlot,
            Predicate predicate)
throws EngineException
The first three arguments are obvious and their proper values will
be taken care of by the caller of the constructor -- in this case,
attica's heuristics-based optimizer. The fourth and
fifth argument should
concern you, but only minimally. They are
the indexes of the sort-attributes for either input relation
(leftSlot
for the left input, rightSlot
for the right input.) You will need to access these
attributes of the corrresponding tuple to "advance" your relation
pointers when implementing the merging phase.
The last argument
is the predicate itself. The predicate
evaluator will compute the value of the predicate (details following),
while the argument value to the constructor will, again,
be taken care of by the optimiser.
Moving on into MergeJoin.java
, there is a method
called initTempFiles()
. You should write your own
code here in order to generate any temporary files that
may be used for join evaluation. You already
know how to do so from your previous assignment.
The next step is implementing the merge join algorithm. For this,
you will have to modify the setup()
method of the
MergeJoin
class. Of course, not your
entire implementation has to be in this single method! But
when the method exits, the join result should be stored
in the designated output file. In order to implement
the merging phase, you will need to access
the attributes on which the input relations are sorted
(indexed by leftSlot
for the left input and
rightSlot
for the right input.)
To look at another join implementation, consult
NestedLoopsJoin.java
. A crucial point is making the right
call to the predicate evaluator. Assuming that the two tuples the join
predicates is being evaluated on are leftTuple
and
rightTuple
, then the call you need to make is the
following:
Tuple rightTuple = rightMan.nextTuple();
PredicateTupleInserter.insertTuples(leftTuple, rightTuple,
getPredicate());
boolean value = PredicateEvaluator.evaluate(getPredicate());
if (value) {
    // the predicate is true -- store the new tuple
    Tuple newTuple = combineTuples(leftTuple, rightTuple);
    outputMan.insertTuple(newTuple);
}
After you have implemented the merge phase of merge join,
the final result should be stored in the output file pointed
to by the outputFile
field of MergeJoin
(which
outputManager
writes to.) This file will then be scanned
during the retrieval of the output relation. Again, you might
want to take a look
at NestedLoopsJoin.java
to see how this is done.
The final step you should take: open PlanBuilder.java
under attica/engine/optimiser
of your source installation.
Go to line 971
and uncomment all lines until
line 1020
.
Then substitute the calls to ExternalSort with your own implementation of ExternalSort -- remember to change the name if you have named it differently (if you have not submitted the first assignment, a solution will be provided.) .
Then comment out lines
1023
and 1024
. That way,
the next time a join is specified in a query, your code
will be executed if it is an equi-join. In all other
cases the system will choose nested loops.
You will have to hand in the compiled version of your source tree, along
with the source for your implementation of MergeJoin.java
.
The submission deadline is Monday, 17 March, 4:00 pm.
Use the submit
program to make the submission. For instance,
to submit the compiled version of the source tree, use:
submit adbs 2 attica.jar
You get the idea for the source file. If you have modified any other source files of the code base, please submit them as well. You might also think of handing in a description (a text file will do) of what you did if you think something is worth mentioning. It is not compulsory, but it might make marking the assignment easier. You can write whatever you want in that file, ranging from implementation issues and problems you faced (hopefully, along with the solutions you provided!) to comments about the code in general.
That is all. For any questions, contact me.
Good luck.
--stratis
Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk Please contact our webadmin with any comments or corrections. Logging and Cookies Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh |