SWS Coursework 2015/16
Submission should be a zipped file containing the files described below, submitted via the online submit system as follows:
submit sws 2 student_matriculation_id.zip
Deadline: Friday 25th of March 2016, 4pm
Marks: 50
Design three SPARQL queries that provide a representative sample of information that can be extracted from the dataset assigned to you in Assignment 1. In this case, representative means:
- Coverage of the queries with respect to the whole dataset. Your queries should compute results using different parts of the dataset and, altogether, they should be able to return a significant part of the data as results. To assess the coverage of your queries you can think about the following questions:
  - Is there a lot of data that is not used by the queries? How much of the dataset could you remove while still being able to run the queries without losing any result?
  - Imagine that these queries are your only way to access the dataset: how much of the original data could you retrieve?
- Usefulness. The results of the queries should represent interesting views of the dataset. For example:
  - by highlighting some interesting connection between the entities in the dataset,
  - by selecting some entities with some important properties,
  - by computing results which would be useful in some realistic applications,
  - etc.
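One rough way to put a number on the coverage questions above is to count how many triples are matched by at least one triple pattern taken from your queries. The sketch below is only an illustration under stated assumptions: triples are plain Python tuples, `None` acts as a wildcard, and the names `matches` and `coverage` are hypothetical helpers, not part of any SPARQL library. Because it ignores joins and filters, the figure it gives is an optimistic estimate of what the queries really use.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A pattern uses None as a wildcard, e.g. (None, "ex:teaches", None).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every non-wildcard position of the pattern equals the triple's term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def coverage(dataset: Iterable[Triple], patterns: Iterable[Pattern]) -> float:
    """Fraction of distinct triples matched by at least one pattern."""
    triples: Set[Triple] = set(dataset)
    if not triples:
        return 0.0
    pats = list(patterns)
    used = {t for t in triples if any(matches(t, p) for p in pats)}
    return len(used) / len(triples)
```

A low value suggests that large parts of the dataset could be removed without changing any query result, which is the situation the first coverage question asks you to look for.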
Evaluate the queries against your dataset and store the results. In order to do so, you might want to use one of the many existing tools and libraries to evaluate SPARQL queries against local RDF files. Apache Jena is a good tool to work with RDF and SPARQL, and it also allows federated queries. You might also want to look at this tutorial on how to query RDF with SPARQL:
http://www.inf.ed.ac.uk/teaching/courses/masws/Coding/build/html/index.html
In addition to the three SPARQL queries mentioned before, create a SPARQL query that combines your local dataset with 3rd party data. This query should produce results that could not be computed from any single source alone (e.g. only using your dataset). You have two options to do so:
- You can create a federated query that combines your local dataset with one or more remote endpoints. Note: this is the preferred option and, if done correctly, it should be the fastest to implement.
- If you have problems executing a federated query you can also create a small local dataset which combines a sample of your data with a sample of the 3rd party data into a single file. You should then be able to query this small combined set of data relatively easily. Note: you will not lose marks if you choose this option.
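For the second option, one possible way to build the combined file is sketched below. It assumes that both samples are available in N-Triples, where each line holds exactly one triple, so merging reduces to collecting and de-duplicating lines. The function names `sample_lines` and `merge_ntriples` are hypothetical, and the sketch does not rename blank nodes, which could clash if both sources use the same labels.

```python
import random
from typing import Iterable, List


def sample_lines(lines: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k distinct triples (one per line), skipping blanks and comments."""
    triples = sorted(
        {ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")}
    )
    if len(triples) <= k:
        return triples
    # A fixed seed keeps the sample reproducible between runs.
    return random.Random(seed).sample(triples, k)


def merge_ntriples(local: Iterable[str], remote: Iterable[str], k: int) -> List[str]:
    """Combine up to k triples from each source into one sorted, duplicate-free list."""
    combined = set(sample_lines(local, k)) | set(sample_lines(remote, k))
    return sorted(combined)
```

Writing `"\n".join(merge_ntriples(...))` to a `.nt` file gives a single dataset to query. A purely random sample may drop the very triples that link the two sources, so in practice you would probably select the sample around the entities your combined query joins on.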
When you have completed this task, write the report for the first part of this assignment by answering the following questions (the number in brackets indicates the percentage of marks for each question):
- [30/100] Execute your queries against your RDF dataset, using a standard SPARQL query engine.
  - Briefly explain how you executed the queries (e.g. which tools you used and how).
  Then for each query:
  - List the query.
  - Briefly describe its coverage and usefulness (i.e. why it is representative).
  - Include the result set of the query, limited to no more than the first 10 results.
- [20/100] Question about the SPARQL query that combines your data with 3rd party sources:
  - List the SPARQL query.
  - Briefly describe the meaning of the query, i.e. what it is supposed to compute.
  - Explain which external data sources are being included and why they are needed.
  - Execute this query and include in your report the result set, limited to the first 10 results.
  - Only if you did not create a federated query and are querying 3rd party data locally: explain how you obtained this data, and how you combined it with your own dataset.
Aim of coursework:
You will build an ontology to model the domain of Computing (Computer Science or Informatics) and the related domain of people (students, lecturers, etc.) at the university level. The detail and granularity of your analysis should be adapted to a student's perspective of the domain. Your formalism should capture the most significant aspects of the domain. You will be required to model your ontology using Protege, an ontology editor (http://protege.stanford.edu/).
To build your ontology, you must complete the tasks below:
1. Analysis of your domain [5/100]
You should identify and discuss the key concepts and relations of your domain. Briefly explain what kinds of queries your ontology should help a domain expert to answer. List 5 queries that you expect your ontology to be capable of answering.
2. Concepts and Concept Hierarchies [10/100]
In this task, you must identify the most significant concepts (classes) that exist among objects in your domain. Also identify any concept hierarchies (class/sub-class) that exist. Concepts can be physical objects or intangible notions of objects that exist within the domain. You should specify these concepts and hierarchies in Description Logic.
Some examples of concepts in this domain include: Person, Student, Staff, Course, etc. An example of a concept hierarchy is Student ⊑ Person.
It is also important to show where classes are disjoint (e.g. Staff ⊓ Student ≡ ⊥) or equivalent (e.g. Staff ≡ Employee), and to explain why in a plain sentence.
You should have at least 8 concepts, 5 concept hierarchies, 4 disjoint classes and 3 equivalent classes in addition to the examples given. You should have no more than 30 concepts, subclasses, disjoint and equivalence classes in total.
3. Relations (Properties) and Relation Hierarchies [10/100]
Identify the most significant relations (both object properties and data properties) that hold among objects in your domain. Also identify any relation hierarchies that exist, where one relation between two objects is more general than another. You will also identify inverse relations of some object properties.
Represent your identified relations and relation hierarchies in Description Logic. Explain or describe the meaning of each in a line or two of plain text.
You should have at least 5 object properties and 2 data properties, 2 inverse properties and 2 relation hierarchies. You should indicate the correct domains and ranges for properties.
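As an illustration of what such statements can look like in Description Logic, the block below writes a domain, a range, an inverse and a relation hierarchy for the relation teaches. Only teaches, Staff and Course appear elsewhere in this handout; taughtBy and lectures are hypothetical names introduced for the example, and your own properties will differ.

```latex
\begin{align*}
\exists \mathit{teaches}.\top &\sqsubseteq \mathit{Staff}
  && \text{(domain: whoever teaches something is Staff)} \\
\top &\sqsubseteq \forall \mathit{teaches}.\mathit{Course}
  && \text{(range: whatever is taught is a Course)} \\
\mathit{taughtBy} &\equiv \mathit{teaches}^{-}
  && \text{(inverse property)} \\
\mathit{lectures} &\sqsubseteq \mathit{teaches}
  && \text{(relation hierarchy: lecturing is a kind of teaching)}
\end{align*}
```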
4. Defining Semantic Axioms of Domain [10/100]
In order to give meaning to your ontology, you have to define theoretical axioms that link several concepts in your domain. You should also use axioms to give meaning to new concepts in terms of logical combinations of other concepts in your domain. Encode your axioms using Description Logic.
For example, in first order logic, we can give meaning to a concept Teacher as:
∀x [Teacher(x) ↔ Staff(x) ∧ ∃y [Course(y) ∧ teaches(x, y)]]
You should formally define and explain at least 5 axioms using the concepts and relations already defined in your model.
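For comparison, the first-order definition of Teacher given above corresponds to the following Description Logic axiom, which uses only the concepts and the relation from that example:

```latex
\mathit{Teacher} \equiv \mathit{Staff} \sqcap \exists \mathit{teaches}.\mathit{Course}
```

The existential restriction on teaches plays the role of the quantified variable y: an individual is a Teacher exactly when it is Staff and teaches at least one Course.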
5. Modelling Concepts and Relations in Protege [8/100]
Using Protege, create your ontology by specifying your concepts, class hierarchies, relations, relation hierarchies and axioms of the domain. Include at least one instance of each concept in your vocabulary. Generate a visual output of your ontology in Protege. Your OWL script should be valid and be capable of answering queries. You will submit the OWL script that Protege generates.
6. Evaluation of OWL File [7/100]
Evaluate the correctness of your ontology by running queries in Protege and checking their output. Using the 5 queries listed in question 1, query your ontology using the DL Query Tab in Protege and save the output generated. Creating queries which require the use of at least one axiom will earn you extra marks. You can use the default reasoner in Protege, or download one if your installation of Protege has no reasoner. See http://protege.stanford.edu/.
You will submit each query and the results that Protege outputs. In no more than two sentences for each query, explain whether or not the Protege outputs of your query are correct based on the formalized vocabulary. Show any limitations of your vocabulary or axioms.
Submission
Create a PDF file containing your answers to the questions of the first part of the assignment, along with your representation and explanations of your concepts, relations and axioms of this second part of the assignment. Your ontology in Protege should be saved as an OWL or RDF file. The graphical representation of your ontology generated from Protege should be saved as a JPEG file. Zip these three files and save with your student number as the filename.
Submit these files together in a single zipped folder to the assignment submission system:
submit sws 2 student_matriculation_id.zip