Linked data

An overview of how dotnetRDF works in-memory and with a semantic store.

This formatter is used below to display things in a more readable fashion.
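The original formatter cell is not shown; a minimal sketch using dotnetRDF's `NTriplesFormatter` (one of several `ITripleFormatter` implementations) could look like:

```csharp
using VDS.RDF.Writing.Formatting;

// NTriplesFormatter renders nodes and triples as plain N-Triples strings,
// which is handy for printing triples and query results in a readable fashion.
var formatter = new NTriplesFormatter();
// e.g. Console.WriteLine(formatter.Format(someTriple));
```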

In-memory graphs

Creating graphs and triples is as easy as you’d think it is:
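As a sketch, the little two-triple graph used throughout this note can be built like this (the `person:`/`relation:` URIs follow the N-Triples sample shown further down):

```csharp
using System;
using VDS.RDF;

// Build a tiny in-memory graph with two triples.
var g = new Graph();
g.NamespaceMap.AddNamespace("person", new Uri("http://www.pandoraintelligence.com/person/"));
g.NamespaceMap.AddNamespace("relation", new Uri("http://www.pandoraintelligence.com/relation/"));

IUriNode swa = g.CreateUriNode("person:Swa");
IUriNode peter = g.CreateUriNode("person:Peter");
IUriNode dave = g.CreateUriNode("person:Dave");
IUriNode worksWith = g.CreateUriNode("relation:works_with");

g.Assert(new Triple(swa, worksWith, peter));
g.Assert(new Triple(swa, worksWith, dave));
```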

There are various serialization formats for graphs and triples, and the package allows you to use pretty much all of them.

For example, the N-Triples format is a simple line-based format where each line is a subject-predicate-object triple terminated by a period:
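Assuming the graph `g` from above, saving it as N-Triples is a one-liner (the file name is arbitrary):

```csharp
using VDS.RDF.Writing;

// Serialize the graph to disk in N-Triples format.
var writer = new NTriplesWriter();
writer.Save(g, "little-graph.nt");
```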

If you open the file you will see something like the following:

<http://www.pandoraintelligence.com/person/Swa> <http://www.pandoraintelligence.com/relation/works_with> <http://www.pandoraintelligence.com/person/Peter> .
<http://www.pandoraintelligence.com/person/Swa> <http://www.pandoraintelligence.com/relation/works_with> <http://www.pandoraintelligence.com/person/Dave> .

The Turtle format (aka *Terse RDF Triple Language*) is a superset of N-Triples and would give something like:

person:Swa relation:works_with person:Dave, person:Peter.

The RDF/XML format is the most verbose one; it serializes the graph as pure XML.

The IGraph interface has, of course, the features you would expect.

To get all the nodes:
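A sketch, assuming the graph `g` from before:

```csharp
// IGraph.Nodes enumerates every node occurring as subject or object.
foreach (INode node in g.Nodes)
{
    Console.WriteLine(node.ToString());
}
```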

To get the links emanating from the Swa-node:
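For example (the Swa URI as in the N-Triples sample above):

```csharp
// All triples whose subject is the Swa node.
IUriNode swa = g.CreateUriNode(new Uri("http://www.pandoraintelligence.com/person/Swa"));
foreach (Triple t in g.GetTriplesWithSubject(swa))
{
    Console.WriteLine(t.ToString());
}
```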

One can use SPARQL on an in-memory graph:
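A sketch, assuming a dotnetRDF version where `Graph.ExecuteQuery` is available (newer versions route through a query processor instead):

```csharp
using VDS.RDF.Query;

// Run a SPARQL query directly against the in-memory graph.
var results = (SparqlResultSet)g.ExecuteQuery(
    "SELECT ?s ?o WHERE { ?s <http://www.pandoraintelligence.com/relation/works_with> ?o }");
foreach (SparqlResult r in results)
{
    Console.WriteLine($"{r["s"]} works with {r["o"]}");
}
```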

The dotnetRDF package also comes with a set of SPARQL extensions, for example calculating the hash of a star-graph:
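The exact extension function used originally is not shown; as an illustration, the Leviathan function library that ships with dotnetRDF provides hash functions under its own namespace:

```csharp
// Leviathan extension functions live under http://www.dotnetrdf.org/leviathan#;
// sha256hash is one of them (hashing the string value of each object here).
var hashQuery = @"
    PREFIX lfn: <http://www.dotnetrdf.org/leviathan#>
    SELECT ?s (lfn:sha256hash(STR(?o)) AS ?hash)
    WHERE { ?s ?p ?o }";
var hashes = (SparqlResultSet)g.ExecuteQuery(hashQuery);
```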

which is quite a strong feature. It can be compared, to some extent, to a LINQ query against a SQL result set.

Using Wikidata

This section shows how one typically uses linked data via LINQ and SPARQL. The whole API is really easy, and the only thing you need to imprint in your mind is that a triple is organized as subject-predicate-object.

⚠ Triple <=> [Subject, Predicate, Object]

So, when using the API to filter out some nodes, you first filter the triples and then use the triple's SPO structure to fetch one of the nodes or the link.

There is a complete data dump in RDF format, but to experiment here we'll use this sample (in N-Triples format), which consists of around 1000 triples.
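Loading the sample into its own graph is straightforward (the file name here is a placeholder):

```csharp
using VDS.RDF.Parsing;

// Load the ~1000-triple Wikidata sample into a separate graph.
var wikiGraph = new Graph();
wikiGraph.LoadFromFile("wikidata-sample.nt", new NTriplesParser());
Console.WriteLine(wikiGraph.Triples.Count);
```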

Looking at the entities you can see that the information is not contained in the entity itself but everything is linked. The neighborhood of a node is its identity.

Most of the nodes are informational nodes attached to an entity. For example, the Q1 node has 297 related nodes.

In these 297 nodes there is a lot of redundancy in the form of localization. There are indeed only three distinct links:

and if we look at the description for example we get the description of the entity in 61 languages:
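These counts can be reproduced with LINQ over the triples; the Q1 URI below assumes the standard Wikidata entity namespace:

```csharp
using System.Linq;

// Q1 is Wikidata's 'universe' entity.
IUriNode q1 = wikiGraph.CreateUriNode(new Uri("http://www.wikidata.org/entity/Q1"));
var q1Triples = wikiGraph.GetTriplesWithSubject(q1).ToList();

// The distinct predicates (the 'three different links').
var predicates = q1Triples.Select(t => t.Predicate).Distinct().ToList();

// Literal objects carry a language tag: one description per language.
var languages = q1Triples
    .Where(t => t.Object is ILiteralNode)
    .Select(t => ((ILiteralNode)t.Object).Language)
    .Distinct()
    .ToList();
```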

How can one link the Wikidata data with our own? Here things differ from the backend story. If you have a backend store you can simply save a triple linking the two nodes. The in-memory approach requires merging the two graphs (i.e. g and wikiGraph). It's possible to add a triple to either graph, but this will not create a link between the two graphs, since they exist completely separately in memory. The optional boolean parameter when merging allows you to keep the two namespaces separate or to unify them.

Note that the dotnetRDF documentation is overall very good.

The following merges the wiki graph with our own little graph, resulting in having (2 + 1000) triples:
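A sketch, assuming `g` and `wikiGraph` from before:

```csharp
// Merge the Wikidata sample into our little graph. The boolean parameter
// (keepOriginalGraphUri) controls how graph URIs/namespaces are treated on merge.
g.Merge(wikiGraph, true);
Console.WriteLine(g.Triples.Count); // 2 + ~1000
```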

Now you can link things together. Here the node ‘Peter’ is linked to the Wikidata entity ‘universe’ with label ‘belongs_to’:
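A sketch; Q1 is assumed to be the Wikidata 'universe' entity, and the `belongs_to` predicate lives in our own namespace:

```csharp
IUriNode peter = g.CreateUriNode(new Uri("http://www.pandoraintelligence.com/person/Peter"));
IUriNode belongsTo = g.CreateUriNode(new Uri("http://www.pandoraintelligence.com/relation/belongs_to"));
IUriNode universe = g.CreateUriNode(new Uri("http://www.wikidata.org/entity/Q1"));
g.Assert(new Triple(peter, belongsTo, universe));
```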

You can see that this info is now present by looking at the nodes linked to Peter:

If you wonder why the triple between Swa and Peter is not there: it’s because that triple looks like [Swa, works_with, Peter] and is hence not part of the triples with subject Peter.

Finally, if you have a backend you can simply save this graph and the new information will be persisted. Everything remains unique up to URI.

With a backend

The rest of this document shows how to deal with graphs and data when there is a linked-data server. All of the servers agree on the various formats (Turtle, RDF/XML…) in use, and they also all agree on the SPARQL query language. So, everything that can be done with dotnetRDF works with all of the servers (Jena, BrightstarDB, Virtuoso…).

⚠ You need a Stardog server for the rest of this document.

If you would use AllegroGraph, you would similarly use

var store = new AllegroGraphConnector("http://localhost:10035", "simple", "admin", "admin");

The createLittleGraph method creates a simple graph. You can save it to the backend simply like this:
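A sketch against a local Stardog instance (5820 is the default port; the database name, credentials, and graph URI are placeholders):

```csharp
using VDS.RDF.Storage;

// Connect to the Stardog backend.
var store = new StardogConnector("http://localhost:5820", "linkedDataDemo", "admin", "admin");

// Build the little graph and give it a URI so it becomes a named graph in the store.
IGraph g = createLittleGraph();
g.BaseUri = new Uri("http://www.pandoraintelligence.com/graphs/little");
store.SaveGraph(g);
```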

Fetching a graph based on its URI is just as simple:
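For example, using the graph URI from the save above:

```csharp
// Load the named graph from the store into a fresh in-memory graph.
var h = new Graph();
store.LoadGraph(h, new Uri("http://www.pandoraintelligence.com/graphs/little"));
Console.WriteLine(h.Triples.Count);
```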

A graph fetched by its URI like this is called a named graph, and you can also fetch it via SPARQL:

To fetch all subject-object couples you can use for example:
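A sketch, scoping the query to the named graph saved earlier:

```csharp
// All subject-object couples in the named graph.
var results = (SparqlResultSet)store.Query(@"
    SELECT ?s ?o
    FROM <http://www.pandoraintelligence.com/graphs/little>
    WHERE { ?s ?p ?o }");
```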

If we now repeat the process explained in the Wikidata section and save the result into the backend:

Note that you can also explore things in the Stardog management console:

![IMAGE](quiver-image-url/8FFCC87DE8471698C600A14757F2FF2A.jpg =828×776)

Now we’ll show that one can create information in the store by saving triples. Assuming that you know the URI of Peter and the universe node (e.g. via a SPARQL query), you can proceed like so:
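A sketch using `UpdateGraph` to push the new triple as an addition to the named graph (URIs assumed known):

```csharp
// Build the new triple in a scratch graph.
var delta = new Graph();
IUriNode peter = delta.CreateUriNode(new Uri("http://www.pandoraintelligence.com/person/Peter"));
IUriNode belongsTo = delta.CreateUriNode(new Uri("http://www.pandoraintelligence.com/relation/belongs_to"));
IUriNode universe = delta.CreateUriNode(new Uri("http://www.wikidata.org/entity/Q1"));
delta.Assert(new Triple(peter, belongsTo, universe));

// Push the addition to the named graph in the store.
store.UpdateGraph(
    new Uri("http://www.pandoraintelligence.com/graphs/little"),
    delta.Triples,   // additions
    null);           // no removals
```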

Save this to the backend and convince yourself that this information is now there, exactly once. That is, you can create a graph in memory and be sure that the nodes in the backend are matched correctly based on their URI. A different URI is a different node.

Inference in-memory

Inference can occur in-memory or in the backend. In both cases the inference happens by means of rules defined through some RDF (or any related format). A simple example is the hierarchy

Vehicle – Car – SportsCar

which can be described in Turtle format like this:

:Vehicle a rdfs:Class .
:Car rdfs:subClassOf :Vehicle .
:SportsCar rdfs:subClassOf :Car .

Below you can see how, without the additional schema info, the SportsCar fails to be recognized as a Car via simple transitive inference:
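A sketch using dotnetRDF's static RDFS reasoner (the `example.org` URIs are placeholders):

```csharp
using VDS.RDF.Parsing;
using VDS.RDF.Query.Inference;

// Data: one individual typed as a SportsCar, no schema yet,
// so 'myCar a Car' is not (yet) in the graph.
var data = new Graph();
StringParser.Parse(data, @"
    @prefix : <http://example.org/> .
    :myCar a :SportsCar .");

// Schema: the Vehicle - Car - SportsCar hierarchy from above.
var schema = new Graph();
StringParser.Parse(schema, @"
    @prefix : <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    :Vehicle a rdfs:Class .
    :Car rdfs:subClassOf :Vehicle .
    :SportsCar rdfs:subClassOf :Car .");

// The reasoner materializes the inferred types into the data graph.
var reasoner = new StaticRdfsReasoner();
reasoner.Initialise(schema);
reasoner.Apply(data);
```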

Inference in the backend

Inference can also happen in the backend on the fly. That is, the inferred insights can be switched on/off when sending a SPARQL query.

To show it in action we’ll use a simple gender inference based on a Facebook profile node with gender info.

![IMAGE](quiver-image-url/EB7ED51664BB77208621EDA3788EEA5D.jpg =582×433)

In the ontology we have the class ‘men’ being a subclass of ‘person’, but we do not assign this class to the individual ‘Swa’. The inference is, however, such that if any person has a Facebook profile and this profile has gender info ‘male’, the person will be inferred to be in the class ‘men’.

![IMAGE](quiver-image-url/253FF154BEBC834A1007B79C8B03E616.jpg =349×231)

The way this is done in RDF/Turtle is not very difficult and is easily done in the Protégé UI:

![IMAGE](quiver-image-url/1686B2EAFA9F1B2314AF450961F7D740.jpg =509×233)

The ontology is in a file called ‘Inference.owl’:

Below we ask via SPARQL the things related to the node ‘Swa’. The inference is switched on/off with a simple boolean parameter when the query is sent:

Custom inference

In the backend there are various inference engines. This is the case for all vendors, and every inference engine has its strengths; it all depends on what you are looking for.

From the point of view of dotnetRDF there are various engines as well, all implementing the IInferenceEngine interface. This interface is really not magical: you get a graph and you can do whatever you like with it.

In the implementation below, the Swa-node is looked for and then adorned with a tag if found:
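A sketch of such an engine; the `tag` predicate and the "found" label are hypothetical choices for illustration:

```csharp
using VDS.RDF.Query.Inference;

// Toy engine: if the Swa node is present, adorn it with a tag triple.
public class SwaTaggingEngine : IInferenceEngine
{
    public void Initialise(IGraph g)
    {
        // No schema or rule set-up needed for this toy engine.
    }

    public void Apply(IGraph g) => Apply(g, g);

    public void Apply(IGraph input, IGraph output)
    {
        INode swa = input.GetUriNode(new Uri("http://www.pandoraintelligence.com/person/Swa"));
        if (swa == null) return; // Swa not in this graph

        INode tag = output.CreateUriNode(new Uri("http://www.pandoraintelligence.com/relation/tag"));
        INode label = output.CreateLiteralNode("found");
        output.Assert(new Triple(swa, tag, label));
    }
}
```

Applying it to a graph `h` fetched from the store is then just `new SwaTaggingEngine().Apply(h)`.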

To use this engine, fetch the named graph we created before and apply the engine:

Now, it’s clear that one does not really need the interface: you can just as well fetch a graph, manipulate it, and save it again.