Author: Libby Miller <libby.miller@bristol.ac.uk>
Date: 2000-11-27
Latest version: http://ilrt.org/discovery/2000/11/statements/
This paper sets out to summarise two lengthy threads from the RDF Interest group mailing list in November 2000. The argument centred around some different approaches to modelling RDF statements in RDF (and the corresponding confusion as to what exactly an RDF statement amounts to). It was recognised that when statements are reified, a distinction can be made between the statement itself (represented as a subject, predicate and object) and the stating of the statement, for example who stated it and when. Difficulties arise because of a perception that the identity conditions for a statement (triple) preclude the representation of statings in the RDF model. This is also a topic on the RDF Interest group issues list [2]. Note also that Stefan Decker's summary of a previous discussion on the RDF Interest group list [3] covers similar ground.
Below I provide a brief expression of the problem and three solutions and their variants gathered from these threads in the RDF Interest group mailing list. I have not examined the entire interest group archives, so this is not intended to be a complete expression of the problem, but it is hoped that it will be useful as a summary of the recent discussions.
The discussion also ranged into implementation issues, problems of inference and complex querying, and a discussion of contexts in the sense of Graham Klyne [4]. I briefly set out some issues relating to these below.
This is a draft document, and a summary of the statements made on the RDF Interest list. It should not be taken as a definitive account of the views of anyone on that list. It has not been agreed on or read over by the members of that list.
Statement: a member of the set Statements, consisting of the 3-tuple (triple) subject, predicate, object ([5] M&S section 5).
Reified statement: an object of type rdf:Statement, and also a resource representing the reification of a triple ([5] M&S section 5). A resource of type rdf:Statement must have exactly one each of rdf:subject, rdf:predicate and rdf:object properties.
Stating: it is useful to distinguish between a statement (a triple) and the instance of someone having stated it, which generates a stating. See the main discussion below.
Model: 'the RDF model' is the data model behind RDF: it is a way of representing
RDF in a syntax-neutral way and is used for evaluating equivalence between
RDF expressions.
'A model' is usually a way of grouping statements together in
RDF; it sometimes refers to the XML document the statements came from.
In addition, people may speak of a reified statement as 'modelling' a
statement. Modelling is used in this way to mean that the reified statement
represents a statement; see
Graham Klyne's post for example.
Context: a term currently used in the sense of Klyne's paper [4]. It is a way of grouping statements together, and is modelled as a bag of reified statements. It is also used in logic, e.g. in R.V. Guha's PhD thesis [6].
Suppose you have two reified statements (in two documents)
(example from Jonas Liljegren)
Model 1:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:a="http://description.org/schema/">
<rdf:Description>
<rdf:subject resource="http://www.w3.org/Home/Lassila" />
<rdf:predicate resource="http://description.org/schema/Creator" />
<rdf:object>Ora Lassila</rdf:object>
<rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
<a:attributedTo>Ralph Swick</a:attributedTo>
<a:attributedTime>1999-02-22</a:attributedTime>
</rdf:Description>
</rdf:RDF>
Model 2:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:a="http://description.org/schema/">
<rdf:Description>
<rdf:subject resource="http://www.w3.org/Home/Lassila" />
<rdf:predicate resource="http://description.org/schema/Creator" />
<rdf:object>Ora Lassila</rdf:object>
<rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
<a:attributedTo>Jonas Liljegren</a:attributedTo>
<a:attributedTime>2000-11-20</a:attributedTime>
</rdf:Description>
</rdf:RDF>
Now suppose that the identity conditions for a reified statement are that the properties and values rdf:subject, rdf:predicate, rdf:object are identical in each case. This would imply that the expression that
Ralph Swick stated that [http://www.w3.org/Home/Lassila http://description.org/schema/Creator Ora Lassila] on 1999-02-22
would get mixed up with the expression that
Jonas Liljegren stated that [http://www.w3.org/Home/Lassila http://description.org/schema/Creator Ora Lassila] on 2000-11-20
giving us
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:a="http://description.org/schema/">
<rdf:Description>
<rdf:subject resource="http://www.w3.org/Home/Lassila" />
<rdf:predicate resource="http://description.org/schema/Creator" />
<rdf:object>Ora Lassila</rdf:object>
<rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
<a:attributedTo>Jonas Liljegren</a:attributedTo>
<a:attributedTo>Ralph Swick</a:attributedTo>
<a:attributedTime>1999-02-22</a:attributedTime>
<a:attributedTime>2000-11-20</a:attributedTime>
</rdf:Description>
</rdf:RDF>
So we are losing the information that there was one stating by
Ralph Swick on 1999-02-22 and a different stating by Jonas Liljegren on
2000-11-20. We can think of two separate parts to the statement: the abstract
statement
[http://www.w3.org/Home/Lassila http://description.org/schema/Creator Ora Lassila]
and the statings of it by Ralph Swick and Jonas Liljegren. In the example
above, the abstract statement and the actual statings are getting mixed up.
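The loss can be illustrated with a small Python sketch. The dictionary layout and the function name merge_by_triple are hypothetical; the sketch simply applies the identity condition above, treating two reified statements as one resource whenever their subject, predicate and object match, and collects the remaining property values as an unordered set, as RDF does for repeated properties.

```python
# Hypothetical layout: each reified statement is a dict holding its triple
# plus the attribution properties hung off it.
LASSILA = ("http://www.w3.org/Home/Lassila",
           "http://description.org/schema/Creator",
           "Ora Lassila")

ralph = {"triple": LASSILA, "attributedTo": "Ralph Swick", "attributedTime": "1999-02-22"}
jonas = {"triple": LASSILA, "attributedTo": "Jonas Liljegren", "attributedTime": "2000-11-20"}


def merge_by_triple(reifications):
    """Merge reified statements whose subject/predicate/object are identical,
    collecting every other property into an unordered set of values."""
    merged = {}
    for r in reifications:
        node = merged.setdefault(r["triple"], {"attributedTo": set(), "attributedTime": set()})
        node["attributedTo"].add(r["attributedTo"])
        node["attributedTime"].add(r["attributedTime"])
    return merged


merged = merge_by_triple([ralph, jonas])
node = merged[LASSILA]
print(len(merged))                     # 1: one resource instead of two
print(sorted(node["attributedTo"]))    # both names survive...
print(sorted(node["attributedTime"]))  # ...and both dates, but not which goes with which
```

Because both properties become unordered sets on the same node, nothing in the merged data says whether Ralph Swick's stating was on 1999-02-22 or 2000-11-20.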
Why would we assume that the identity conditions for a reified statement were that the properties and values rdf:subject, rdf:predicate, rdf:object were identical?
One answer is the following:
Because there is currently no way to mandate the conditions under which anonymous
nodes are identical (although schemas could be written which do this; see
Shyam Sarkar), the anonymous node (the reified statement) must be given a URI.
This raises the problem of who defines what the URI of a unique statement is
(see, for example, the post by
Dan Brickley).
A possible solution was suggested by
Sergey Melnik.
In turn, this difficulty could be resolved if it were possible to state that two
URIs were equivalent, which would mean taking a view on whether a URI represents a
resource or an entity (see, for example,
Brian McBride's post).
Another answer is about implementation:
If we are attempting to optimise storage of RDF, and we are ignoring the model
that the
data came from, we could store the first reified
statement as a quadruple: subject, predicate, object, generated identifier. Then
when we get to the second reified statement, we see that we already have this
statement triple in the database. If we do not generate another identifier for it
but instead reuse the existing identifier and triple to hang other properties off,
then we will lose some information
(see, for example,
Jonas Liljegren's post).
Note that if we also store a model identifier
for each statement (for example where the data came from), this problem could
still occur, because within a given model it would be possible for two different
people to state the same thing.
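A minimal Python sketch of this storage scheme, assuming a hypothetical QuadStore class in which the reuse of identifiers can be switched on or off. The model identifier is stored as part of the key, showing that two statings of the same triple within one model still collapse when identifiers are reused.

```python
import itertools

LASSILA = ("http://www.w3.org/Home/Lassila",
           "http://description.org/schema/Creator",
           "Ora Lassila")


class QuadStore:
    """Hypothetical store of (subject, predicate, object, identifier) rows,
    with a model identifier recorded as part of each row's key."""

    def __init__(self, reuse_ids):
        # reuse_ids=True: one identifier per (model, triple), i.e. the lossy case.
        self.reuse_ids = reuse_ids
        self.ids = {}    # (model, triple) -> identifier
        self.props = {}  # identifier -> set of (property, value) pairs, unordered as in RDF
        self._counter = itertools.count(1)

    def add_reified(self, model, triple, **props):
        key = (model, triple)
        if self.reuse_ids and key in self.ids:
            sid = self.ids[key]  # hang the new properties off the existing identifier
        else:
            sid = "_:s%d" % next(self._counter)
            self.ids[key] = sid
        self.props.setdefault(sid, set()).update(props.items())
        return sid


def two_statings(reuse_ids):
    """Two people state the same triple within the same model."""
    store = QuadStore(reuse_ids)
    a = store.add_reified("doc1", LASSILA,
                          attributedTo="Ralph Swick", attributedTime="1999-02-22")
    b = store.add_reified("doc1", LASSILA,
                          attributedTo="Jonas Liljegren", attributedTime="2000-11-20")
    return a == b, len(store.props)


print(two_statings(True))   # (True, 1): both statings hang off one identifier
print(two_statings(False))  # (False, 2): each stating keeps its own identifier
```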
Again, this is a question about the underlying model, brought into focus by the
inefficiency of storing reified statements as triples. The distinction, as before,
is between regarding the subject, predicate and object of the reified statement as
referring to some abstract statement, which is unique within the set of
statements, and regarding the reified statement as referring to a stating, which is
unique in itself.
I describe three main solutions, derived from discussions on the RDF Interest list.
On this view, although a reified statement represents a statement, it is only
one possible representation of it. There is therefore not necessarily a one-to-one
correspondence between a statement and its reification
(Graham
Klyne).
Statements may or may not be unique (there seems to be a preference for
uniqueness, in some sense, maybe within a context (space) or model).
Introduction of spaces: Jonathan
Borden.
This means that reified statements should be regarded as unique of themselves and as statings. Each stating is unique. If someone else makes a reified statement with the same subject, predicate and object properties, then we cannot regard that stating as being the same as the first. The loss of information that occurs in Liljegren's example would not occur.
However, when a reified statement is given a URI via the ID attribute then this implies that any reified statement with that URI is referring to the same stating.
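Under this view, a store might treat every anonymous reification as a fresh stating, while a reification named with the ID attribute denotes the same stating wherever that URI appears. The Python sketch below assumes a hypothetical StatingStore class and an illustrative URI, http://example.org/doc#S1.

```python
import itertools

LASSILA = ("http://www.w3.org/Home/Lassila",
           "http://description.org/schema/Creator",
           "Ora Lassila")


class StatingStore:
    """Sketch of the view that every reified statement is a stating (hypothetical API)."""

    def __init__(self):
        self.nodes = {}  # node id -> {"triple": ..., "props": set of (property, value)}
        self._blank = itertools.count(1)

    def add(self, triple, uri=None, **props):
        # An anonymous reification always creates a new stating node; a reification
        # named with the ID attribute (uri) is the same stating wherever it appears.
        node_id = uri if uri is not None else "_:b%d" % next(self._blank)
        node = self.nodes.setdefault(node_id, {"triple": triple, "props": set()})
        if node["triple"] != triple:
            # A resource of type rdf:Statement has exactly one subject, predicate and object.
            raise ValueError("%s already reifies a different triple" % node_id)
        node["props"].update(props.items())
        return node_id


store = StatingStore()
ralph = store.add(LASSILA, attributedTo="Ralph Swick", attributedTime="1999-02-22")
jonas = store.add(LASSILA, attributedTo="Jonas Liljegren", attributedTime="2000-11-20")
named1 = store.add(LASSILA, uri="http://example.org/doc#S1", attributedTo="Ralph Swick")
named2 = store.add(LASSILA, uri="http://example.org/doc#S1")
print(ralph != jonas, named1 == named2, len(store.nodes))  # True True 3
```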
problems
Note that if one rejects the one-to-one correspondence between statements and their reifications, these variations are interesting but not relevant to the question at hand.
Make statements (triples) resources, and make statements unique within the set of statements, defined by their subject, predicate and object values. Generate a unique ID for each triple using a Skolem function, and hang contextual and reification information off this triple, external to the model (Sergey Melnik).
A model implementation of this might be to allow arcs to terminate on arcs (Seth Russell).
The reification mechanism is syntactic. The information is preserved. The identifier for the triple is the same whoever calculates it, so that aggregation can occur.
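A Skolem function here simply means an identifier computed from the triple itself. The Python sketch below assumes one possible scheme, hashing a canonical JSON serialization of subject, predicate, object and a literal/resource flag with SHA-1; the urn:x-stmt: prefix is hypothetical and not part of Melnik's proposal. Each stating is kept as a separate record keyed on that identifier.

```python
import hashlib
import json


def skolem_id(subject, predicate, obj, object_is_literal):
    """Compute a statement identifier from the triple alone (hypothetical scheme)."""
    # JSON gives an unambiguous serialization; the flag keeps the literal "x"
    # distinct from a resource whose URI happens to be "x".
    canonical = json.dumps([subject, predicate, obj, object_is_literal])
    return "urn:x-stmt:" + hashlib.sha1(canonical.encode("utf-8")).hexdigest()


LASSILA = ("http://www.w3.org/Home/Lassila",
           "http://description.org/schema/Creator",
           "Ora Lassila", True)

# Two parties compute the identifier independently and get the same result...
id_at_site_a = skolem_id(*LASSILA)
id_at_site_b = skolem_id(*LASSILA)

# ...so statings recorded at either site aggregate under one key, each kept separate.
statings = {}
statings.setdefault(id_at_site_a, []).append(
    {"attributedTo": "Ralph Swick", "attributedTime": "1999-02-22"})
statings.setdefault(id_at_site_b, []).append(
    {"attributedTo": "Jonas Liljegren", "attributedTime": "2000-11-20"})
print(id_at_site_a == id_at_site_b, len(statings[id_at_site_a]))  # True 2
```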
problems:
The model is incorrect because it does not distinguish between statements and
statings. The statement is abstract (
Brian McBride)
and statements are uniquely defined within the set of statements by their subject,
predicate and object values (
Brian McBride).
What is really required is something like
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:a="http://description.org/schema/">
<rdf:Description ID="S1">
<rdf:subject resource="http://www.w3.org/Home/Lassila" />
<rdf:predicate resource="http://description.org/schema/Creator" />
<rdf:object>Ora Lassila</rdf:object>
<rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
</rdf:Description>
<rdf:Description>
<a:states resource="#S1" />
<a:attributedTo>Ralph Swick</a:attributedTo>
<a:attributedTime>1999-02-22</a:attributedTime>
</rdf:Description>
</rdf:RDF>
(Example from Jonas Liljegren, based on Brian McBride's suggestion.)
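The separation can be sketched in Python with two hypothetical record types: a Statement identified only by its subject, predicate and object, and a Stating that points at a Statement (as a:states does above) and carries its own attribution. Equal statements compare equal however they are obtained, while statings by different people stay distinct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """The abstract statement: identified only by subject, predicate and object."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Stating:
    """One act of stating a Statement, with its own attribution (hypothetical fields)."""
    states: Statement
    attributed_to: str
    attributed_time: str


s1 = Statement("http://www.w3.org/Home/Lassila",
               "http://description.org/schema/Creator", "Ora Lassila")
s1_again = Statement("http://www.w3.org/Home/Lassila",
                     "http://description.org/schema/Creator", "Ora Lassila")

statings = {
    Stating(s1, "Ralph Swick", "1999-02-22"),
    Stating(s1_again, "Jonas Liljegren", "2000-11-20"),
}
statements = {st.states for st in statings}
print(len(statements), len(statings))  # 1 2
```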
Giving the reified statement a URI using ID causes a great deal of controversy, because of the question of who first names the statement. But maybe one could mandate in a schema that subject, predicate and object properties uniquely define an anonymous node of type rdf:Statement, and then hang stating events off it (this would require a schema with cardinality constraints):
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:a="http://description.org/schema/">
<rdf:Description>
<rdf:subject resource="http://www.w3.org/Home/Lassila" />
<rdf:predicate resource="http://description.org/schema/Creator" />
<rdf:object>Ora Lassila</rdf:object>
<rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
<a:stating>
<a:StatingEvent>
<a:attributedTo>Ralph Swick</a:attributedTo>
<a:attributedTime>1999-02-22</a:attributedTime>
</a:StatingEvent>
</a:stating>
</rdf:Description>
</rdf:RDF>
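The effect of such a schema constraint can be sketched as a merge over triples: anonymous rdf:Statement nodes with the same rdf:subject, rdf:predicate and rdf:object are renamed to a single node, while the StatingEvent nodes hung off them are left alone. This Python sketch assumes every statement node carries all three properties, uses document-local blank node labels (_:d1_st and so on) that are purely illustrative, and does not distinguish literal from resource objects.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
A = "http://description.org/schema/"
KEY_PROPS = (RDF + "subject", RDF + "predicate", RDF + "object")


def document(label, who, when):
    """Triples for the anonymous-node example, with document-local blank node labels."""
    st, ev = "_:%s_st" % label, "_:%s_ev" % label
    return [
        (st, RDF + "type", RDF + "Statement"),
        (st, RDF + "subject", "http://www.w3.org/Home/Lassila"),
        (st, RDF + "predicate", A + "Creator"),
        (st, RDF + "object", "Ora Lassila"),
        (st, A + "stating", ev),
        (ev, RDF + "type", A + "StatingEvent"),
        (ev, A + "attributedTo", who),
        (ev, A + "attributedTime", when),
    ]


def merge_statement_nodes(triples):
    """Rename rdf:Statement nodes that share subject, predicate and object to one node."""
    statements = {s for s, p, o in triples if p == RDF + "type" and o == RDF + "Statement"}
    key_parts = {}
    for s, p, o in triples:
        if s in statements and p in KEY_PROPS:
            key_parts.setdefault(s, {})[p] = o
    canonical, rename = {}, {}
    for node, parts in key_parts.items():
        key = tuple(parts.get(k) for k in KEY_PROPS)
        rename[node] = canonical.setdefault(key, node)  # first node seen wins
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}


merged = merge_statement_nodes(document("d1", "Ralph Swick", "1999-02-22") +
                               document("d2", "Jonas Liljegren", "2000-11-20"))
statement_nodes = {s for s, p, o in merged if o == RDF + "Statement"}
stating_events = {o for s, p, o in merged if p == A + "stating"}
print(len(statement_nodes), len(stating_events))  # 1 2
```

Each stating event still carries its own attributedTo and attributedTime, so the pairing lost in the earlier merged example is preserved here.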
problem:
There was a discussion about how to implement reification and contextualization. Jonas Liljegren uses a model identifier to represent whether (or by whom) the statement was reified.
He has a practical solution to the problem of handling the various cases (different URIs for the same statement, the same URI for different statements, and the same URI for the same statement): creating a unique key from the model and the URI of the statement. He argued that the resulting statement does have a representation in the model, as a reified statement in a model container.
see
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0340.html
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0354.html
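A minimal Python sketch of keying statements by model and statement URI, assuming a hypothetical ModelKeyedStore class, illustrative model identifiers (model:1, model:2) and an invented second triple. It walks through the three cases listed above; reusing a URI for a different statement within the same model is treated as an error, which is an assumption of the sketch.

```python
LASSILA = ("http://www.w3.org/Home/Lassila",
           "http://description.org/schema/Creator",
           "Ora Lassila")
OTHER = ("http://example.org/page",
         "http://description.org/schema/Creator",
         "A. N. Other")  # invented triple, for illustration only


class ModelKeyedStore:
    """Sketch: statements keyed by (model identifier, statement URI)."""

    def __init__(self):
        self.statements = {}  # (model, statement URI) -> triple

    def add(self, model, uri, triple):
        key = (model, uri)
        existing = self.statements.get(key)
        if existing is not None and existing != triple:
            raise ValueError("%s already names a different statement in %s" % (uri, model))
        self.statements[key] = triple
        return key


store = ModelKeyedStore()
k1 = store.add("model:1", "#S1", LASSILA)  # first statement
k2 = store.add("model:2", "#S2", LASSILA)  # different URI, same statement
k3 = store.add("model:2", "#S1", OTHER)    # same URI, different statement (other model)
k4 = store.add("model:1", "#S1", LASSILA)  # same URI, same statement
print(len(store.statements), k1 == k4, k1 != k3)  # 3 True True
```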
Graham Klyne talked about a similar
implementation, but using
"properties to
create the association between statement-resource and context (model)",
again hanging them off the triple ID, whether generated or not.
Klyne uses the triple ID and the context as a unique key. This is also the implementation method favoured by Dave Beckett.
see
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0343.html
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0368.html
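Klyne's arrangement can be sketched as two tables: one entry per distinct triple, and a separate set of (triple ID, context) pairs acting as the unique key. The ContextStore class, the generated IDs (t1, t2, ...), the context names and the second triple are all hypothetical.

```python
import itertools

LASSILA = ("http://www.w3.org/Home/Lassila",
           "http://description.org/schema/Creator",
           "Ora Lassila")
OTHER = ("http://example.org/page",
         "http://description.org/schema/Creator",
         "A. N. Other")  # invented triple, for illustration only


class ContextStore:
    """Sketch: each distinct triple gets one generated id; membership of a
    context is recorded separately, keyed on (triple id, context)."""

    def __init__(self):
        self.triples = {}        # triple id -> (subject, predicate, object)
        self._ids = {}           # (subject, predicate, object) -> triple id
        self.in_context = set()  # {(triple id, context)}
        self._counter = itertools.count(1)

    def add(self, triple, context):
        tid = self._ids.get(triple)
        if tid is None:
            tid = "t%d" % next(self._counter)  # generated triple id
            self._ids[triple] = tid
            self.triples[tid] = triple
        self.in_context.add((tid, context))
        return tid

    def statements_in(self, context):
        return sorted(self.triples[tid] for tid, ctx in self.in_context if ctx == context)


store = ContextStore()
store.add(LASSILA, "ctx:ralph")
store.add(LASSILA, "ctx:jonas")
store.add(OTHER, "ctx:jonas")
print(len(store.triples), len(store.in_context))  # 2 3
print(store.statements_in("ctx:ralph"))
```

The triple itself is stored once, so it can be queried as a plain statement, while the context membership stays available for filtering.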
This discussion continues....
Problems relating to querying, inference and contexts were raised by various people, including Seth Russell, Jos de Roo, Sergey Melnik, Shyam Sankar, Gabe Beged-Dov.
One difficulty is that querying becomes very cumbersome when reification is used as in the RDF model (e.g. Seth Russell).
For example, here is a query for the friends of the person with mailbox libby.miller@bristol.ac.uk, against a database that accepts triples as 'facts':
select ?a, ?b, ?c where (mbox ?a libby.miller@bristol.ac.uk) (friend ?a ?b) (mbox ?b ?c)
and here is the same query against a database in which the friend triples are reified:
select ?a, ?c, ?d where (mbox ?a libby.miller@bristol.ac.uk) (rdf:type ?b rdf:Statement) (rdf:subject ?b ?a) (rdf:predicate ?b friend) (rdf:object ?b ?c) (mbox ?c ?d)
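To make the difference concrete, here is a naive conjunctive matcher in Python that evaluates a plain query and a reified query over two tiny databases, one holding the friend triple directly and one holding it as a reified statement. Patterns use the (predicate subject object) order of the queries above, and the reified query constrains rdf:predicate to friend; the blank node labels and the address dan@example.org are invented for the example.

```python
def unify(term, value, binding):
    """Bind a '?variable' to value (or check an existing binding); constants must match."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns against triples."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, triples, candidate)


MAILBOX = "libby.miller@bristol.ac.uk"
facts = [
    ("mbox", "_:libby", MAILBOX),
    ("friend", "_:libby", "_:dan"),
    ("mbox", "_:dan", "dan@example.org"),
]
reified = [
    ("mbox", "_:libby", MAILBOX),
    ("rdf:type", "_:st1", "rdf:Statement"),
    ("rdf:subject", "_:st1", "_:libby"),
    ("rdf:predicate", "_:st1", "friend"),
    ("rdf:object", "_:st1", "_:dan"),
    ("mbox", "_:dan", "dan@example.org"),
]

plain_query = [("mbox", "?a", MAILBOX), ("friend", "?a", "?b"), ("mbox", "?b", "?c")]
reified_query = [("mbox", "?a", MAILBOX), ("rdf:type", "?b", "rdf:Statement"),
                 ("rdf:subject", "?b", "?a"), ("rdf:predicate", "?b", "friend"),
                 ("rdf:object", "?b", "?c"), ("mbox", "?c", "?d")]

plain = [b["?c"] for b in match(plain_query, facts)]
via_reification = [b["?d"] for b in match(reified_query, reified)]
print(plain, via_reification)  # ['dan@example.org'] ['dan@example.org']
```

Both queries find the same answer, but the reified one needs six patterns instead of three for a single friend relationship.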
At least sometimes we really want to be able to query as if statements were not reified, while retaining contextual information. This is where Sergey's proposal would be very useful.
Another difficulty is with inferencing: we would like to be able to reason within a context or model, so that we can regard the statements from within the model as 'facts'. This is possible, although cumbersome, if we regard contexts as bags and model them within the RDF model. Again, if we could hang contextual information off statements in a way that is hidden from the RDF model, this would be useful here.
These seem generally to be regarded as annoying implementation problems.
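One way to picture reasoning within a context: select the reified statements in one context's bag, turn them back into plain triples, and apply rules only to those. In this Python sketch the bag layout, the context names and the symmetric friend rule are all hypothetical examples, not part of any proposal discussed on the list.

```python
def facts_in(context, bags, reified):
    """Return the plain (subject, predicate, object) triples reified in one context's bag."""
    return {reified[node] for node in bags[context]}


def symmetric_closure(facts, predicate):
    """Hypothetical example rule: from (x, predicate, y) infer (y, predicate, x)."""
    return facts | {(o, p, s) for s, p, o in facts if p == predicate}


# Reified statement nodes mapped to the triples they reify (illustrative data).
reified = {
    "_:st1": ("_:libby", "friend", "_:dan"),
    "_:st2": ("_:dan", "friend", "_:jan"),
    "_:st3": ("_:libby", "friend", "_:jan"),
}
# Each context modelled as a bag of reified statements.
bags = {"ctx:doc1": ["_:st1", "_:st2"], "ctx:doc2": ["_:st3"]}

doc1 = symmetric_closure(facts_in("ctx:doc1", bags, reified), "friend")
print(len(doc1), ("_:libby", "friend", "_:jan") in doc1)  # 4 False
```

The statement from ctx:doc2 never enters the inference over ctx:doc1, which is the kind of scoping the paragraph above asks for.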
Conclusions are for the RDF Interest group to decide; however I have a couple of
points to make as someone who has attempted to implement RDF storage.
The first is that we need to remember that we can distinguish between the model,
a serialization of a model, the triples produced by a parser from a serialization,
and the storage of triples.
Personally I have tended to store triples in a very naive way, simply as the
triples that I get from the parser. I have come to the conclusion that for my
purposes, which include storing data from diverse sources and querying it, I
need to think about storing meta information about reification and contexts and
models in a non-naive fashion, so that I can query the triples as if they were
statements, while still retaining the contextual information about reification and
so on, perhaps using something like Sergey's suggestion.
However, this means that I need to have examples of good practice (or preferably a
decision made) about the uniqueness conditions for statements and reifications,
and about how to serialize contextual information.
[1] RDF Interest group mailing list, November 2000
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/
[2] RDF Interest Group - Issue Tracking
http://www.w3.org/2000/03/rdf-tracking/#rdfms-identity-of-statements
[3] Proposed Updates of RDF
http://www-db.stanford.edu/~stefan/updates.html
[4] Contexts for RDF Information Modelling
http://public.research.mimesweeper.com/RDF/RDFContexts.html
[5] Resource Description Framework (RDF) Model and Syntax Specification
http://www.w3.org/TR/REC-rdf-syntax/
[6] Contexts: A Formalization and Some Applications
http://www.guha.com/guha-thesis.ps