==========================
 Querying RDF with SPARQL
==========================

Introduction
------------

General material on SPARQL can be found on the MASWS Wiki SPARQL/ARQ page.

Implementations
---------------

ARQ
    There are a number of SPARQL implementations around. So far, the best one
    I have found is ARQ, which is based on the Jena toolkit. It is available
    as the command :command:`arq` within the set of command-line tools which
    come with the Jena download.

Installing ARQ
    See the notes on Jena for more information on installing and using Jena.

ARQ Documentation
    Documentation can be found in the ARQ Tutorial, which is pretty good, and
    links to a few other resources.

ARQ 2.8.8
---------

ARQ-2.8.8 should also be available on DICE in the directory
:file:`/usr/share/java/arq/bin/`. If you want to use this version, you will
have to set a different environment variable, namely :envvar:`ARQROOT`::

    export ARQROOT=/usr/share/java/arq
    export PATH="${ARQROOT}/bin:${PATH}"

However, everything in these notes also works with the earlier version of ARQ
available in :file:`/usr/share/java/jena/bin/`.

Querying
--------

Here's an example of a simple SPARQL query (downloadable as
:download:`example-01.rq <../sparql/example-01.rq>`):

.. include:: ../sparql/example-01.rq
   :literal:

Most of this should be familiar to you, but the ``FROM`` clause may be new.
This says that the query should be run against the RDF data to be found at
``http://homepages.inf.ed.ac.uk/ewan/foaf.n3``. Note that the URI has to be
addressable via HTTP when the query is executed.
To call the ARQ SPARQL query engine using the SPARQL query in
:file:`example-01.rq`, we use an instruction like the following on the
command line::

    dice:~> arq --query example-01.rq

Given my FOAF file, :command:`arq` will print the following output to the
terminal::

    ---------------------------------
    | name1        | name2          |
    =================================
    | "Ewan Klein" | "Harry Halpin" |
    ---------------------------------

We are allowed to have more than one ``FROM`` clause in a query, and the
resulting graphs are merged. This is shown in the next example (downloadable
as :download:`example-02.rq <../sparql/example-02.rq>`), where we query both
my FOAF file and Harry Halpin's.

.. include:: ../sparql/example-02.rq
   :literal:

As you can observe here, SPARQL triple patterns allow the same abbreviatory
syntax as we have already seen for N3 / Turtle. Running the query gives the
following result set::

    --------------------------------------
    | name1          | name2             |
    ======================================
    | "Harry Halpin" | "Daniel Weitzner" |
    | "Harry Halpin" | "Tim Berners-Lee" |
    | "Harry Halpin" | "Dan Connolly"    |
    | "Harry Halpin" | "Ian Davis"       |
    | "Harry Halpin" | "Paolo Bouquet"   |
    | "Ewan Klein"   | "Harry Halpin"    |
    --------------------------------------

The next query is run against a couple of data files, namely
:download:`knows.n3 <../rdf/knows.n3>` and
:download:`cafes.n3 <../rdf/cafes.n3>`. In this case, I want to recover cafes
that are loved by people who either know someone or are known by someone
(recall that we are not treating ``foaf:knows`` as symmetric). In order to
match these two alternatives, I use the ``UNION`` keyword, as shown here
(downloadable as :download:`example-03.rq <../sparql/example-03.rq>`):

.. include:: ../sparql/example-03.rq
   :literal:

Notice that I don't have a ``FROM`` clause in the SPARQL query.
In order to get the data, I can specify one or more local files on the
command line using the ``--data`` option::

    arq --query=example-03.rq --data=../rdf/knows.n3 --data=../rdf/cafes.n3

Here are the results::

    --------------------
    | cafe      | x    |
    ====================
    | :vittoria | :stu |
    | :pyard    | :amy |
    | :aroast   | :stu |
    | :ebagel   | :amy |
    | :ebagel   | :bea |
    | :ebagel   | :bea |
    --------------------

As you can see, there's a duplicate line at the bottom of the results. To
eliminate this, we can use the ``DISTINCT`` keyword.

.. include:: ../sparql/example-04.rq
   :literal:

Exercises
---------

#. Download the relevant queries and try running the examples given above.

#. Next, modify the queries in various ways and see what results you get.

#. Finally, run some queries against your own RDF data, both local and remote.
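
If the duplicate row in the ``UNION`` results seems puzzling while you work
through the exercises, the following rough Python sketch may help. It is not
how ARQ works internally. The triples are hypothetical toy data, chosen only
so that the rows resemble the table above (the real contents of
:file:`knows.n3` and :file:`cafes.n3` may well differ), and the helper names
``union_query`` and ``distinct`` are made up for illustration. The idea is
that each branch of a ``UNION`` contributes its own solutions, so a person who
matches both branches shows up twice, and ``DISTINCT`` then drops the repeat.

```python
# Hypothetical toy data; the real knows.n3 / cafes.n3 may differ.
KNOWS = [("amy", "bea"), ("bea", "stu")]  # (subject, object) of foaf:knows
LOVES = [  # (person, cafe)
    ("stu", "vittoria"),
    ("amy", "pyard"),
    ("stu", "aroast"),
    ("amy", "ebagel"),
    ("bea", "ebagel"),
]


def union_query(knows, loves):
    """Return (cafe, x) rows for x who knows someone UNION x whom someone knows."""
    knowers = [s for s, _ in knows]  # branch 1: ?x foaf:knows ?y
    known = [o for _, o in knows]    # branch 2: ?y foaf:knows ?x
    people = knowers + known         # the two bags of solutions are concatenated
    # Join each solution for x with the cafes that x loves.
    return [(cafe, x) for x in people for person, cafe in loves if person == x]


def distinct(rows):
    """Drop repeated rows, keeping the first occurrence of each."""
    return list(dict.fromkeys(rows))


rows = union_query(KNOWS, LOVES)
print(len(rows), "rows without DISTINCT")
print(len(distinct(rows)), "rows with DISTINCT")
```

In this toy data ``bea`` both knows someone and is known by someone, so her
row is produced once by each branch. The row order here need not match ARQ's,
since SPARQL does not fix an order unless the query asks for one.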