SPARQL is pronounced ‘sparkle’ and stands for SPARQL Protocol and RDF Query Language (a recursive acronym). It looks similar to SQL but is also quite different, since there are no fixed field and table names to work with; instead there are named graphs and links attached to links.

The easiest way to play with SPARQL is to navigate to the YASGUI demo or to clone the GitHub repo. This UI gives you access to many open SPARQL endpoints, including DBPedia, PubChem and WordNet.

There are a few gotchas when fiddling with SPARQL:

  • triple stores have ‘databases’ and the SPARQL endpoint usually includes the name of the database.
  • furthermore, endpoints to select data are different from endpoints to update data. For example, if you use Fuseki with a ‘Test’ database you have a read endpoint ‘http://localhost:3030/Test/sparql’ and an update endpoint ‘http://localhost:3030/Test/update’.
  • every dataset has a default graph which you are using unless you specifically use a different one.
  • field names in SQL are replaced with placeholders starting with a question mark.
  • a triple ends with a dot, for example ‘?s ?p ?o.’. If you do not use a dot the parser will interpret whatever comes next as part of the triple specification.
  • explicit URIs are enclosed in ‘<’ and ‘>’.
  • you can use prefixes (like in XML) which you define at the beginning of the query, like so: ‘PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>’.
  • a semi-colon can be used to specify multiple triples with the same subject. For example,
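```sparql
# assuming a prefix ‘:’ has been defined
?s :has_friend ?friend ;
   :has_car    ?car .
```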

will put a constraint on the subject ‘s’ such that it has both a predicate ‘has_friend’ and ‘has_car’. Note the dot at the end to tell the parser that the constraint is finished.
  • the ‘?s ?p ?o’ triple stands for ‘subject predicate object’, but so does ‘?q ?x2 ?card’; the placeholder names are arbitrary.

The book Learning SPARQL by Bob DuCharme is an excellent overview and teaches you way more than you will probably ever need.

Below is a collection of SPARQL queries I often copy/paste. A cheat sheet of sorts, if you wish.

## All graph triples

Selecting everything in the default graph is as simple as:
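```sparql
SELECT * WHERE {
  ?s ?p ?o .
}
```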

If you want all triples in all (named) graphs:
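```sparql
SELECT * WHERE {
  GRAPH ?g { ?s ?p ?o . }
}
```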

or a specific named graph:
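```sparql
# the graph name below is a placeholder
SELECT * WHERE {
  GRAPH <http://example.org/myGraph> { ?s ?p ?o . }
}
```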

## Delete all

Deleting everything is similar to the selection above:
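```sparql
DELETE WHERE { ?s ?p ?o . }
```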

or something more specific:
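```sparql
# hypothetical prefix and predicate
PREFIX : <http://example.org/>
DELETE WHERE { ?s :has_car ?o . }
```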

or
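for instance, wiping a whole named graph in one go (the graph name is a placeholder):

```sparql
CLEAR GRAPH <http://example.org/myGraph>
```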

Note that one cannot select or delete a single node or predicate as such: nodes and links are concepts from a graph database and do not exist on their own in a triple store, only triples do.

## Collecting triples

A common situation is to select all the triples of nodes which are themselves the result of a query/constraint. Say you have triples of the type
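```sparql
# hypothetical predicate
?something :has_friend ?friend .
```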

but you actually want all the triples attached to ‘?something’. This is done with a subquery:
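```sparql
SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  {
    SELECT (?something AS ?s) WHERE {
      ?something :has_friend ?friend .
    }
  }
}
```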

where we have used an alias to convert ‘something’ to ‘s’, though the main query could have been ‘?something ?k ?m’ as well.

## Counting

Counting things requires an alias like so:
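```sparql
SELECT (COUNT(?s) AS ?total) WHERE {
  ?s ?p ?o .
}
```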

## Using text

You can place various additional constraints on the placeholders, like filtering the content of the URI:
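```sparql
# ‘example.org’ is an arbitrary string to match against
SELECT ?s WHERE {
  ?s ?p ?o .
  FILTER( CONTAINS(STR(?s), 'example.org') )
}
```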

You can also use regular expressions and the like.
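For instance, a case-insensitive regular-expression filter on the object:

```sparql
SELECT ?s ?o WHERE {
  ?s ?p ?o .
  FILTER( REGEX(STR(?o), '^joh?n', 'i') )
}
```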

## Get degrees

Performing typical graph-like operations and traversals is a challenge with SPARQL. Some vendors (like Stardog) implement both SPARQL and TinkerPop Gremlin in order to make such operations possible. Gremlin complements SPARQL in a way.

Still, if you only need, for example, simple degree measurements you can use:
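```sparql
# out-degree: the number of outgoing links per node
SELECT ?node (COUNT(?neighbor) AS ?degree) WHERE {
  ?node ?p ?neighbor .
}
GROUP BY ?node
ORDER BY DESC(?degree)
```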

## DBPedia

Querying DBPedia can be a lot of fun, but be aware that some queries will time out and that DBPedia consists of billions of triples:
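```sparql
# the ten most populous cities in DBPedia
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
```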

or
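for instance, fetching the abstracts of a single resource via the standard ‘dbr’ resource prefix:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:Amsterdam dbo:abstract ?abstract .
}
```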

## Constrain to English only

Triple stores and SPARQL are intrinsically multilingual. Literals can be adorned with languages, and if you only want, for example, the English ones you can use a filter with a language match:
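```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:Amsterdam dbo:abstract ?abstract .
  FILTER( LANGMATCHES(LANG(?abstract), 'en') )
}
```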

## Using dates

Datetime literals can be filtered out with an appropriate schema prefix like so:
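```sparql
# people born after 1980, assuming DBPedia-style birth dates
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?birth WHERE {
  ?person dbo:birthDate ?birth .
  FILTER( ?birth >= '1980-01-01'^^xsd:date )
}
LIMIT 10
```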

## Insert data
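A single-triple insert (the prefix and resources are made up):

```sparql
PREFIX : <http://example.org/>
INSERT DATA { :John :has_friend :Mary . }
```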

Note that you need to execute this on the ‘update’ endpoint.

If you need to insert multiple triples you’d use
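```sparql
PREFIX : <http://example.org/>
INSERT DATA {
  :John :has_friend :Mary ;
        :has_car    :Tesla .
  :Mary :has_friend :Peter .
}
```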

## Update a counter

Some queries which are trivial in SQL are quite challenging in SPARQL. Updating a counter is one of them:
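```sparql
# hypothetical counter resource: read the old value, bind the new one
PREFIX : <http://example.org/>
DELETE { :thing :counter ?old }
INSERT { :thing :counter ?new }
WHERE {
  :thing :counter ?old .
  BIND( ?old + 1 AS ?new )
}
```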

## Representing results in Python with Pandas

If you use Jupyter or nteract for inline experimentation you can use something like the following to get the result into Pandas. Once in a Pandas frame, the road towards machine learning is a lot easier.
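A sketch, assuming the standard SPARQL 1.1 JSON results format; the commented-out fetch uses the SPARQLWrapper package and a hypothetical local Fuseki endpoint:

```python
import pandas as pd

def bindings_to_df(result_json):
    """Flatten the SPARQL JSON result format into a Pandas DataFrame."""
    cols = result_json['head']['vars']
    rows = [
        {var: binding[var]['value'] for var in cols if var in binding}
        for binding in result_json['results']['bindings']
    ]
    return pd.DataFrame(rows, columns=cols)

# Fetching the JSON is typically done with SPARQLWrapper, e.g.:
# from SPARQLWrapper import SPARQLWrapper, JSON
# sparql = SPARQLWrapper('http://localhost:3030/Test/sparql')  # hypothetical endpoint
# sparql.setQuery('SELECT * WHERE { ?s ?p ?o . } LIMIT 10')
# sparql.setReturnFormat(JSON)
# df = bindings_to_df(sparql.query().convert())
```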

## Using SPARQL in R

You can install the SPARQL package in R via
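the usual CRAN route:

```r
install.packages('SPARQL')
```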

and once present you can use something like this to return semantic data:
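A sketch, with a hypothetical local Fuseki endpoint; the SPARQL function returns a list whose ‘results’ element is a plain data frame:

```r
library(SPARQL)

endpoint <- 'http://localhost:3030/Test/sparql'  # hypothetical endpoint
query <- 'SELECT * WHERE { ?s ?p ?o . } LIMIT 10'

result <- SPARQL(endpoint, query)
df <- result$results  # a regular data frame
head(df)
```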