
Statements/Statings

A summary of the threads 'A triple is not unique' and 'Statements/Reified statements' from the RDF Interest mailing list, November 2000 [1]

Author: Libby Miller <libby.miller@bristol.ac.uk>
Date: 2000-11-27
Latest version: http://ilrt.org/discovery/2000/11/statements/

Abstract

This paper sets out to summarise two lengthy threads from the RDF Interest group mailing list in November 2000. The argument centred on different approaches to modelling RDF statements in RDF (and the corresponding confusion as to what exactly an RDF statement amounts to). It was recognised that when statements are reified, a distinction can be made between the statement itself (represented as a subject, predicate and object) and the stating of the statement, for example who stated it and when. Difficulties arise because of a perception that the identity conditions for a statement (triple) preclude the representation of statings in the RDF model. This is also a topic on the RDF Interest group issues list [2]. Note also that Stefan Decker's summary of a previous discussion on the RDF Interest group list [3] covers similar ground.

Below I provide a brief expression of the problem and three solutions and their variants gathered from these threads in the RDF Interest group mailing list. I have not examined the entire interest group archives, so this is not intended to be a complete expression of the problem, but it is hoped that it will be useful as a summary of the recent discussions.

The discussion also ranged into implementation issues, problems of inference and complex querying, and a discussion of contexts in the sense of Graham Klyne [4]. I briefly set out some issues relating to these below.

Status of this document

This is a draft document, and a summary of the statements made on the RDF Interest list. It should not be taken as a definitive account of the views of anyone on that list. It has not been agreed on or read over by the members of that list.

Contents

  1. Some definitions
  2. An expression of the problem
  3. Some possible resolutions
    1. 'reified statements are statings'
    2. 'statements are resources'
    3. 'change the model'
  4. Implementation issues
  5. Querying, inference and contexts
  6. Conclusions

1. Some definitions

'statement'

A member of the set Statements consisting of the 3-tuple (triple) subject, predicate, object ([5] M&S section 5)

'reified statement'

An object of type rdf:Statement and also a resource representing the reification of a triple ([5] M&S section 5). A resource of type rdf:Statement must have exactly one each of rdf:subject, rdf:predicate and rdf:object properties.

'stating'

It is useful to make the distinction between a statement (a triple) and the instance of someone having stated it, generating a stating. See main discussion below.

'model'

'the RDF model' is the data model behind RDF: it is a way of representing RDF in a syntax-neutral way and is used for evaluating equivalence between RDF expressions.
'a model' is usually a way of grouping statements together in RDF, sometimes referring to the XML document the statements came from.
In addition, people may speak of a reified statement as 'modelling' a statement. Modelling is used in this way to mean that the reified statement represents a statement - see Graham Klyne's post for example.

'context' ('space')

A term currently used in the sense of Klyne's paper [4]. It is a way of grouping statements together, and is modelled as a bag of reified statements. It is also used in logic e.g. R.V. Guha's PhD thesis [6].
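
The definition of 'reified statement' above can be made concrete with a minimal sketch. It is written in Python, which is not used elsewhere in this document; the function name `reify` and the node identifier `_:r1` are hypothetical, chosen only for illustration. It expands a triple into the four triples that describe its reification:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(node, subject, predicate, obj):
    """Return the four triples that describe `node` as a reified statement."""
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", subject),
        (node, RDF + "predicate", predicate),
        (node, RDF + "object", obj),
    ]


# "_:r1" is a made-up identifier for an anonymous node.
triples = reify(
    "_:r1",
    "http://www.w3.org/Home/Lassila",
    "http://description.org/schema/Creator",
    "Ora Lassila",
)
for triple in triples:
    print(triple)
```

Note that the original triple itself is not among the four: the reification describes the statement without asserting it.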



2. An expression of the problem

(see also: [2], [3], Dan Brickley's posts: http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0236.html
http://lists.w3.org/Archives/Public/www-rdf-interest/1999Dec/0068.html
http://lists.w3.org/Archives/Public/www-rdf-interest/1999Dec/0071.html
and contributions from many others... )

Suppose you have two reified statements (in two documents)

(example from Jonas Liljegren)


Model 1: 

  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:a="http://description.org/schema/">
    <rdf:Description>
      <rdf:subject resource="http://www.w3.org/Home/Lassila" />
      <rdf:predicate resource="http://description.org/schema/Creator" />
      <rdf:object>Ora Lassila</rdf:object>
      <rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
      <a:attributedTo>Ralph Swick</a:attributedTo>
      <a:attributedTime>1999-02-22</a:attributedTime>
    </rdf:Description>
  </rdf:RDF>

Model 2: 

  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:a="http://description.org/schema/">
    <rdf:Description>
      <rdf:subject resource="http://www.w3.org/Home/Lassila" />
      <rdf:predicate resource="http://description.org/schema/Creator" />
      <rdf:object>Ora Lassila</rdf:object>
      <rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
      <a:attributedTo>Jonas Liljegren</a:attributedTo>
      <a:attributedTime>2000-11-20</a:attributedTime>
    </rdf:Description>
  </rdf:RDF>

Now suppose that the identity conditions for a reified statement are that the properties and values rdf:subject, rdf:predicate, rdf:object are identical in each case. This would imply that the expression that

Ralph Swick stated that [http://www.w3.org/Home/Lassila http://description.org/schema/Creator Ora Lassila] on 1999-02-22

would get mixed up with the expression that

Jonas Liljegren stated that [http://www.w3.org/Home/Lassila http://description.org/schema/Creator Ora Lassila] on 2000-11-20

giving us


  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:a="http://description.org/schema/">
    <rdf:Description>
      <rdf:subject resource="http://www.w3.org/Home/Lassila" />
      <rdf:predicate resource="http://description.org/schema/Creator" />
      <rdf:object>Ora Lassila</rdf:object>
      <rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
      <a:attributedTo>Jonas Liljegren</a:attributedTo>
      <a:attributedTo>Ralph Swick</a:attributedTo>
      <a:attributedTime>1999-02-22</a:attributedTime>
      <a:attributedTime>2000-11-20</a:attributedTime>
    </rdf:Description>
  </rdf:RDF>

So this means that we are losing the information that there was one stating by Ralph Swick on 1999-02-22 and a different stating by Jonas Liljegren on 2000-11-20. We can think of two separate parts to the statement - the abstract statement

[http://www.w3.org/Home/Lassila http://description.org/schema/Creator Ora Lassila]

and the statings of it by Ralph Swick and Jonas Liljegren. In the example above, the abstract statement and the actual statings are getting mixed up.
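
The loss of information can be reproduced in a few lines. The following is a minimal sketch in Python (not a language used by anyone in the threads), assuming the identity condition described above: two reified nodes are treated as the same node whenever their subject, predicate and object agree. The function name `merge_by_spo` and the data layout are hypothetical.

```python
from collections import defaultdict

A = "http://description.org/schema/"
STATEMENT = (
    "http://www.w3.org/Home/Lassila",
    "http://description.org/schema/Creator",
    "Ora Lassila",
)

# Each reification is the (subject, predicate, object) it describes,
# plus the extra properties attached to the reified node.
model_1 = (STATEMENT, [(A + "attributedTo", "Ralph Swick"),
                       (A + "attributedTime", "1999-02-22")])
model_2 = (STATEMENT, [(A + "attributedTo", "Jonas Liljegren"),
                       (A + "attributedTime", "2000-11-20")])


def merge_by_spo(reifications):
    """Merge reified nodes, treating equal (s, p, o) as the same node."""
    merged = defaultdict(set)
    for spo, properties in reifications:
        merged[spo].update(properties)
    return merged


merged = merge_by_spo([model_1, model_2])
for prop, value in sorted(merged[STATEMENT]):
    print(prop, value)
```

The merged node carries two names and two dates, but nothing records which name belongs with which date.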

Why would we assume that the identity conditions for a reified statement were that the properties and values rdf:subject, rdf:predicate, rdf:object were identical?

One answer is the following:

There is currently no way to mandate the conditions under which anonymous nodes are identical (although schemas could be written which do this - see ?? Shyam Sarkar), so the anonymous node (the reified statement) must be given a URI. This raises the problem of who defines what the URI of a unique statement is (see for example the post by Dan Brickley). A possible solution was suggested by Sergey Melnik.
In turn, this difficulty could be resolved if it were possible to state that two URIs were equivalent, which would mean taking a view on whether a URI represents a resource or an entity (see for example Brian McBride's post).

Another answer is about implementation:

If we are attempting to optimise storage of RDF, and we are ignoring the model that the data came from, we could store the first reified statement as a quadruple: subject, predicate, object, generated identifier. Then when we get to the second reified statement, we see that we already have this statement triple in the database. If we do not generate another identifier for it but instead reuse the existing identifier and triple to hang other properties off, then we will lose some information (see, for example Jonas Liljegren's post ). Note that if we also store a model identifier for each statement (for example where the data came from), this problem could still occur, because within a given model it would be possible for two different people to state the same thing.

Again, this is a question about the underlying model, brought into focus by the inefficiency of storing reified statements as triples. The distinction, as before, is between regarding the subject, predicate and object of the reified statement as referring to some abstract statement, which is unique within the set of statements, and regarding the reified statement as referring to a stating, unique of itself.



3. Some possible resolutions

I describe three main solutions, derived from discussions on the RDF Interest list.

1. 'reified statements are statings'

Jonas Liljegren, Seth Russell(?), Graham Klyne, Pierre-Antoine Champin, Jonathan Borden

On this view, although a reified statement represents a statement, it is only one possible representation of it. There is therefore not necessarily a one-to-one correspondence between a statement and its reification (Graham Klyne).
Statements may or may not be unique (there seems to be a preference for uniqueness, in some sense, maybe within a context (space) or model). Introduction of spaces: Jonathan Borden.

This means that reified statements should be regarded as unique of themselves and as statings. Each stating is unique. If someone else makes a reified statement with the same subject, predicate and object properties, then we cannot regard that stating as being the same as the first. The loss of information that occurs in Liljegren's example would not occur.

However, when a reified statement is given a URI via the ID attribute then this implies that any reified statement with that URI is referring to the same stating.
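
As a rough illustration of this view, here is a minimal Python sketch in which every anonymous reification receives a freshly generated identifier, while a reification that carries a URI always refers to the one stating named by that URI. The class `StatingStore`, the `_:stating` prefix and the shortened property names are all hypothetical; none of the participants proposed this particular code.

```python
import itertools


class StatingStore:
    """Treat every anonymous reification as a distinct stating."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.statings = {}  # node id -> {"spo": ..., "properties": set()}

    def add(self, spo, properties, uri=None):
        # A reification named with a URI refers to that one stating;
        # an anonymous one always gets a freshly generated identifier.
        node = uri if uri is not None else "_:stating%d" % next(self._counter)
        entry = self.statings.setdefault(node, {"spo": spo, "properties": set()})
        entry["properties"].update(properties)
        return node


spo = (
    "http://www.w3.org/Home/Lassila",
    "http://description.org/schema/Creator",
    "Ora Lassila",
)
store = StatingStore()
first = store.add(spo, {("attributedTo", "Ralph Swick"),
                        ("attributedTime", "1999-02-22")})
second = store.add(spo, {("attributedTo", "Jonas Liljegren"),
                         ("attributedTime", "2000-11-20")})
print(first != second)
```

Because the two statings are kept on separate nodes, each name stays paired with its own date.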

problems


variations

note that if one rejects the one-to-one correspondence between statements and their reifications, these variations are interesting but not relevant to the question in hand.

2. 'statements are resources'

Sergey Melnik

Make statements (triples) resources, and make statements unique within the set of statements, defined by their subject, predicate and object values. Generate a unique ID for each triple using a Skolem function, and hang contextual and reification information off this triple, external to the model (Sergey Melnik).

A model implementation of this might be to allow arcs to terminate on arcs ( Seth Russell ).

The reification mechanism is syntactic. The information is preserved. The identifier for the triple is the same whoever calculates it, so that aggregation can occur.
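
The idea of an identifier that is "the same whoever calculates it" can be sketched as follows. This is only an illustration in Python under stated assumptions: the threads summarised here do not specify the Skolem function, so the SHA-1 hash, the `urn:x-triple:` prefix, and the names `triple_id`, `annotate` and `annotations` are stand-ins of my own.

```python
import hashlib


def triple_id(subject, predicate, obj):
    """Deterministic identifier for a triple (a stand-in for a Skolem function)."""
    # Length-prefix each part so that different triples cannot
    # produce the same input string.
    data = "".join("%d:%s" % (len(part), part)
                   for part in (subject, predicate, obj))
    return "urn:x-triple:" + hashlib.sha1(data.encode("utf-8")).hexdigest()


# Contextual information is kept outside the triples themselves,
# keyed by the triple's identifier.
annotations = {}


def annotate(spo, **info):
    annotations.setdefault(triple_id(*spo), []).append(info)


spo = (
    "http://www.w3.org/Home/Lassila",
    "http://description.org/schema/Creator",
    "Ora Lassila",
)
annotate(spo, attributedTo="Ralph Swick", attributedTime="1999-02-22")
annotate(spo, attributedTo="Jonas Liljegren", attributedTime="2000-11-20")

for info in annotations[triple_id(*spo)]:
    print(info["attributedTo"], info["attributedTime"])
```

Both parties arrive at the same key independently, so their annotations aggregate on one triple, and each annotation keeps its own name and date together.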

problems:

3. 'change the model'

Brian McBride

The model is incorrect because it does not distinguish between statements and statings. The statement is abstract ( Brian McBride) and statements are uniquely defined within the set of statements by their subject, predicate and object values ( Brian McBride).

What is really required is something like



  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:a="http://description.org/schema/">
    <rdf:Description ID="S1">
      <rdf:subject resource="http://www.w3.org/Home/Lassila" />
      <rdf:predicate resource="http://description.org/schema/Creator" />
      <rdf:object>Ora Lassila</rdf:object>
      <rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
    </rdf:Description>
    <rdf:Description>
      <a:states resource="#S1" />
      <a:attributedTo>Ralph Swick</a:attributedTo>
      <a:attributedTime>1999-02-22</a:attributedTime>
    </rdf:Description>
  </rdf:RDF>

(example from Jonas Liljegren, based on Brian McBride's suggestion)

Giving the reified statement a URI using ID causes a great deal of controversy, because of the question of who first names the statement. But maybe one could mandate in a schema that subject, predicate and object properties uniquely define an anonymous node of type rdf:Statement, and then hang stating events off it (this would require a schema with cardinality constraints):


  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:a="http://description.org/schema/">
    <rdf:Description>
      <rdf:subject resource="http://www.w3.org/Home/Lassila" />
      <rdf:predicate resource="http://description.org/schema/Creator" />
      <rdf:object>Ora Lassila</rdf:object>
      <rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
      <a:stating>
        <a:StatingEvent>
          <a:attributedTo>Ralph Swick</a:attributedTo>
          <a:attributedTime>1999-02-22</a:attributedTime>
        </a:StatingEvent>
      </a:stating>
    </rdf:Description>
  </rdf:RDF>

problem:



4. Implementation issues

There was a discussion about how to implement reification and contextualization. Jonas Liljegren uses a model identifier to represent whether (or by whom) the statement was reified.

He has a practical solution to the problem of handling the various cases of: different URIs for the same statement, same URI for different statements and the same URI for the same statement: creating a unique key from the model and the URI of the statement. He argued that the resulting statement does have a representation in the model, as a reified statement in a model container.

see
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0340.html
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0354.html
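
The compound key described above (model plus statement URI) can be illustrated with a small Python sketch. It is not Liljegren's implementation: the function `store_statement`, the dictionary layout and the `example.org` URIs are hypothetical, and serve only to show how the three cases fall out.

```python
statements = {}


def store_statement(model, uri, spo):
    """Store a reified statement under the compound key (model, uri)."""
    statements[(model, uri)] = spo
    return (model, uri)


creator = ("http://www.w3.org/Home/Lassila",
           "http://description.org/schema/Creator",
           "Ora Lassila")
other = ("http://example.org/page",
         "http://description.org/schema/Creator",
         "Someone Else")

# Different URIs for the same statement: two separate entries.
store_statement("http://example.org/model1", "#S1", creator)
store_statement("http://example.org/model1", "#S2", creator)

# The same URI for a different statement in another model: a separate entry.
store_statement("http://example.org/model2", "#S1", other)

# The same URI for the same statement in the same model: no new entry.
store_statement("http://example.org/model1", "#S1", creator)

print(len(statements))
```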

Graham Klyne talked about a similar implementation, but using "properties to create the association between statement-resource and context (model)", again hanging them off the triple ID, whether generated or not.

Klyne uses the triple ID and the context as a unique key. This is also the implementation method favoured by Dave Beckett.

see
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0343.html
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0368.html

This discussion continues....



5. Querying, inference and contexts

Problems relating to querying, inference and contexts were raised by various people, including Seth Russell, Jos de Roo, Sergey Melnik, Shyam Sankar, Gabe Beged-Dov.

One difficulty is that querying becomes very cumbersome when reification is used as in the RDF model (e.g. Seth Russell).

For example, here is a query against a database that accepts as 'facts' the triples describing the friends of the person with mailbox libby.miller@bristol.ac.uk:


select ?a, ?b, ?c where 
(mbox ?a libby.miller@bristol.ac.uk)  
(friend ?a ?b) 
(mbox ?b ?c) 


compare with a case where the friends of the person with mailbox libby.miller@bristol.ac.uk are described by reification:


select ?a, ?c, ?d where 
(mbox ?a libby.miller@bristol.ac.uk)  
(?st ?a ?b) 
(rdf:type ?b rdf:Statement) 
(rdf:subject ?b ?a) 
(rdf:object ?b ?c) 
(mbox ?c ?d)   

At least sometimes we really want to be able to query as if statements were not reified, while retaining contextual information. This is where Sergey's proposal would be very useful.
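
To make the simpler of the two query forms concrete, here is a minimal sketch in Python of matching triple patterns against plain, non-reified triples. It is not the query engine used for the examples above; the functions `match` and `query`, the node identifiers and the second mailbox address are hypothetical. Triples are written (predicate, subject, object) to mirror the query syntax.

```python
def match(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` matches `triple`."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep any value it was already bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Return every set of variable bindings satisfying all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            for extended in [match(pattern, triple, bindings)]
            if extended is not None
        ]
    return solutions


# Made-up data for illustration only.
facts = [
    ("mbox", "_:libby", "libby.miller@bristol.ac.uk"),
    ("friend", "_:libby", "_:friend1"),
    ("mbox", "_:friend1", "friend@example.org"),
]

patterns = [
    ("mbox", "?a", "libby.miller@bristol.ac.uk"),
    ("friend", "?a", "?b"),
    ("mbox", "?b", "?c"),
]

for solution in query(patterns, facts):
    print(solution["?a"], solution["?b"], solution["?c"])
```

Three patterns suffice here; the same question asked of reified data needs extra patterns for rdf:type, rdf:subject and rdf:object before the friend's mailbox can be reached.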

Another difficulty is with inferencing: we would like to be able to reason within a context or model, so that we can regard the statements from within the model as 'facts'. This is possible, although cumbersome, if we regard contexts as bags and model them within the RDF model. Again, if we could hang contextual information off statements in a way that is hidden from the RDF model, this would be useful here.

These seem to generally be seen as annoying implementation problems.



6. Conclusions

Conclusions are for the RDF Interest group to decide; however I have a couple of points to make as someone who has attempted to implement RDF storage.

The first is that we need to remember that we can distinguish between the model, a serialization of a model, the triples produced by a parser from a serialization, and the storage of triples.
Personally I have tended to store triples in a very naive way, simply as the triples that I get from the parser. I have come to the conclusion that for my purposes, which include storing data from diverse sources and querying it, I need to think about storing meta-information about reification, contexts and models in a non-naive fashion, so that I can query the triples as if they were statements while still retaining the contextual information about reification and so on, perhaps using something like Sergey's suggestion.

However, this means that I need to have examples of good practice (or preferably a decision made) about the uniqueness conditions for statements and reifications, and about how to serialize contextual information.



References

[1] RDF Interest group mailing list, November 2000
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/

[2] RDF Interest Group - Issue Tracking
http://www.w3.org/2000/03/rdf-tracking/#rdfms-identity-of-statements

[3] Proposed Updates of RDF
http://www-db.stanford.edu/~stefan/updates.html

[4] Contexts for RDF Information Modelling
http://public.research.mimesweeper.com/RDF/RDFContexts.html

[5] Resource Description Framework (RDF) Model and Syntax Specification
http://www.w3.org/TR/REC-rdf-syntax/

[6] Contexts: A Formalization and Some Applications
http://www.guha.com/guha-thesis.ps