Author: Dan Brickley
This version: 2001-01-25
Latest version: http://ilrt.org/discovery/2000/09/metamesh/
This is a quick demonstration of the use of W3C's Resource Description Framework (RDF) for the representation of forward-knowledge gathering patterns in distributed search systems such as ROADS. The intention is to provide a brief document to stimulate discussion about the connections between ROADS and more recent developments under the 'peer to peer' (P2P) banner.
This is a very drafty draft. Feedback welcomed.
Our concern is with characterising the patterns of information-sharing amongst metadata search services.
Distributed search applications such as ROADS are designed to operate in an environment in which networked search services can exchange 'forward knowledge' (see [ROADS-FK] for a more detailed overview) about the contents of other services in the Web. In ROADS version 2.0, database installations learned about remote search services through hand-coded configuration files. This document outlines a technique that could contribute to a more flexible approach, through the use of an RDF representation of the connectivity amongst a mesh or 'web' of cooperating (and perhaps competing) search services. By representing the forward-knowledge structures in such a Web using XML/RDF, we can reason more effectively about the most appropriate query routing strategy for some particular search, since data about the inter-server relationships can be exchanged in a common format.
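As a hedged illustration of the kind of reasoning this enables, the Python sketch below takes a table of which service collects forward knowledge from which, and computes every service that a query sent to one starting point could in principle be routed on to. The example.org service names, the `reachable` function and the assumption that forward-knowledge links may be followed transitively are all hypothetical; ROADS itself is not claimed to work this way.

```python
from collections import deque

def reachable(mesh, start):
    """Return the services reachable from start by following forward-knowledge links."""
    seen, queue = {start}, deque([start])
    while queue:
        server = queue.popleft()
        for exporter in mesh.get(server, ()):
            if exporter not in seen:  # guards against cycles in the mesh
                seen.add(exporter)
                queue.append(exporter)
    seen.discard(start)
    return seen

# Hypothetical mesh: each key collects forward knowledge from the listed services.
mesh = {
    "http://a.example.org/": ["http://b.example.org/"],
    "http://b.example.org/": ["http://c.example.org/", "http://d.example.org/"],
    "http://c.example.org/": ["http://b.example.org/"],
}
targets = reachable(mesh, "http://a.example.org/")
print(sorted(targets))
```

A breadth-first walk is used here only because it copes with cycles, which real meshes may contain when two services collect from each other.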
ROADS makes use of the IETF CIP (Common Indexing Protocol) data format for inter-service knowledge exchange. The CIP data format, a generalisation of the WHOIS++ centroids mechanism, provides a rather low-level characterisation of the textual contents of some database. In effect, CIP/centroids allow a database to expose a list of the unique tokens appearing in each named field of each named record type of some database. The intention behind CIP was to generalise from the WHOIS++ model by abstracting away from the details of the particular kind of search service being consulted. This was only partially achieved, since the default CIP index type provides no namespace mechanism to distinguish a database field called 'title' (as in 'the title of a book') from 'title' (as in 'mr, mrs, ms').
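To make the centroid idea concrete, the following Python sketch computes the unique tokens per field per record type for a small in-memory database. The record layout, the `centroid` function and the whitespace tokenisation are assumptions made for illustration; they are not taken from the CIP or WHOIS++ specifications.

```python
from collections import defaultdict

def centroid(records):
    """Return {record_type: {field: sorted unique tokens}} for a list of records.

    Each record is a hypothetical (record_type, {field: text}) pair.
    """
    index = defaultdict(lambda: defaultdict(set))
    for record_type, fields in records:
        for field, text in fields.items():
            # Naive tokenisation: lower-case and split on whitespace.
            index[record_type][field].update(text.lower().split())
    return {rtype: {f: sorted(tokens) for f, tokens in fields.items()}
            for rtype, fields in index.items()}

records = [
    ("DOCUMENT", {"title": "Working papers in economics"}),
    ("DOCUMENT", {"title": "Economics teaching resources"}),
    ("USER", {"title": "Mr"}),
]
summary = centroid(records)
print(summary)
```

Note that the field name 'title' appears as a bare string under both record types; nothing in the summary says whether the two uses mean the same thing.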
While a case could be made for the use of an XML/RDF index type for this kind of forward knowledge representation, the current discussion does not address such specifics. We are concerned with exploring an XML/RDF representation of knowledge-sharing patterns amongst search services, rather than with the fine-grained detail of the information being shared. Similarly, we are not concerned here with establishing a practice for database characterisation ("collection level descriptions", service description etc.), but with one specific aspect of that problem: representing the connections between such services.
The original application which motivated this note was the visualisation of query-routing meshes, particularly amongst Internet cataloguing services such as those using the ROADS system. The use of RDF is particularly appropriate for this task since the RDF information model is based around the notion of a Web of typed relationships amongst uniquely identified resources. Our application here is focused on the characterisation of the types of relationship that hold between uniquely identified search services. We also expect this data to be intermingled with meta-information from various other sources (e.g. collection description registries, rating services etc.) to more effectively characterise the various search services available. RDF again is a good fit here, since RDF data graphs can be merged through a simple algorithm. It is beyond the scope of this brief note to provide an introduction to RDF; see [Bray2001] or [W3CRDF] for introductory material.
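The merge mentioned above can be sketched as a set union over statements. This is a minimal Python illustration, assuming each graph is held as a set of (subject, predicate, object) tuples and that every node is a URI or a literal; anonymous nodes, which would need renaming before a merge, are ignored. The `merge` function and the tuple representation are assumptions for illustration, and the sample statements are drawn from the examples given later in this note.

```python
MESH = "http://ilrt.org/discovery/2000/09/metamesh/schema1#"
DC = "http://purl.org/dc/elements/1.1/"
SOSIG = "http://www.sosig.ac.uk/"

def merge(*graphs):
    """Merge RDF graphs held as sets of (subject, predicate, object) tuples."""
    merged = set()
    for graph in graphs:
        merged |= graph  # identical statements collapse into one
    return merged

topology = {
    (SOSIG, MESH + "collectsFrom", "http://catalogue.bized.ac.uk/"),
    (SOSIG, DC + "title", "SOSIG, the Social Science Information Gateway"),
}
annotations = {
    (SOSIG, MESH + "qos", "10"),
    (SOSIG, DC + "title", "SOSIG, the Social Science Information Gateway"),
}
combined = merge(topology, annotations)
print(len(combined))
```

Because both sources identify the service by the same URI, their statements attach to the same node without any further reconciliation step.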
For the sake of this discussion, we describe a hypothetical RDF vocabulary that defines some basic concepts: a class of things 'mesh:Server' and a relationship 'mesh:collectsFrom' that represents a forward-knowledge sharing arrangement.
We both acknowledge and side-step the problem of uniquely identifying arbitrary machine-interfaced "search services" by identifying each service via the Web home page of the (socially conceived) service that it represents. Thus this representation does not distinguish between the Z39.50 and WHOIS++ servers run by the Social Science Information Gateway (SOSIG); either or both are covered by an RDF statement that talks in terms of SOSIG as an abstraction, picked out unambiguously through use of the SOSIG home page URL.
A simple example in RDF: Dutchess, Biz/ed, SOSIG and WoPEC.
This description represents a real-world application of ROADS distributed search tools built by the author in 1998. The configuration of the servers has by now changed; it is the purpose of this document to show how we could keep up-to-the-minute records of these configurations using XML/RDF.
<?xml version="1.0"?>
<!-- vocabulary for demo purposes only, see index.html for more. dan -->
<web:RDF xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mesh="http://ilrt.org/discovery/2000/09/metamesh/schema1#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <mesh:Server web:about="http://www.konbib.nl/dutchess/">
    <dc:title>Dutchess, the Dutch National Library internet catalogue</dc:title>
    <mesh:collectsFrom web:resource="http://catalogue.bized.ac.uk/"/>
  </mesh:Server>
  <mesh:Server web:about="http://catalogue.bized.ac.uk/">
    <dc:title>Biz/ed, ILRT's Business and Economics internet catalogue</dc:title>
    <mesh:collectsFrom web:resource="http://www.sosig.ac.uk/"/>
    <mesh:collectsFrom web:resource="http://netec.mcc.ac.uk/WoPEc.html"/>
  </mesh:Server>
  <mesh:Server web:about="http://www.sosig.ac.uk/">
    <dc:title>SOSIG, the Social Science Information Gateway</dc:title>
    <mesh:collectsFrom web:resource="http://catalogue.bized.ac.uk/"/>
  </mesh:Server>
  <mesh:Server web:about="http://netec.mcc.ac.uk/WoPEc.html">
    <dc:title>WoPEc, Working papers in Economics</dc:title>
  </mesh:Server>
</web:RDF>
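A description like the one above can be turned into a list of mesh edges with very little code. The Python sketch below uses the standard library's `xml.etree.ElementTree` to pull out (collector, exporter) pairs. It is not a general RDF parser: it assumes the data uses exactly the abbreviated form shown here (typed `mesh:Server` elements with `about` and `resource` attributes), and the `collects_from_edges` function name is hypothetical. The embedded data is a trimmed copy of the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MESH = "http://ilrt.org/discovery/2000/09/metamesh/schema1#"

DATA = """<web:RDF xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mesh="http://ilrt.org/discovery/2000/09/metamesh/schema1#">
  <mesh:Server web:about="http://catalogue.bized.ac.uk/">
    <mesh:collectsFrom web:resource="http://www.sosig.ac.uk/"/>
    <mesh:collectsFrom web:resource="http://netec.mcc.ac.uk/WoPEc.html"/>
  </mesh:Server>
  <mesh:Server web:about="http://www.sosig.ac.uk/">
    <mesh:collectsFrom web:resource="http://catalogue.bized.ac.uk/"/>
  </mesh:Server>
</web:RDF>"""

def collects_from_edges(rdf_xml):
    """Yield (collector, exporter) URI pairs from mesh:Server descriptions."""
    root = ET.fromstring(rdf_xml)
    for server in root.findall("{%s}Server" % MESH):
        collector = server.get("{%s}about" % RDF)
        for link in server.findall("{%s}collectsFrom" % MESH):
            yield collector, link.get("{%s}resource" % RDF)

edges = list(collects_from_edges(DATA))
for collector, exporter in edges:
    print(collector, "collectsFrom", exporter)
```

The lookups use full namespace URIs rather than the `web:` and `mesh:` prefixes, so the sketch keeps working if a data file picks different prefixes for the same namespaces.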
This can be visualised using the Rudolf-RDFViz tool.
[Figure: GIF rendering of the example mesh]
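One way to produce a picture of this kind is to emit a graph description for a layout tool. The Python sketch below writes Graphviz DOT text from (collector, exporter) pairs. The choice of DOT, the `to_dot` function and the short labels are assumptions for illustration; nothing here describes how Rudolf-RDFViz itself works.

```python
def to_dot(edges, titles=None):
    """Render (collector, exporter) pairs as Graphviz DOT text."""
    titles = titles or {}
    lines = ["digraph mesh {"]
    # One node line per distinct service, labelled with a title where known.
    for uri in sorted({node for edge in edges for node in edge}):
        label = titles.get(uri, uri).replace('"', '\\"')
        lines.append('  "%s" [label="%s"];' % (uri, label))
    # One arc per forward-knowledge relationship.
    for collector, exporter in edges:
        lines.append('  "%s" -> "%s" [label="collectsFrom"];' % (collector, exporter))
    lines.append("}")
    return "\n".join(lines)

edges = [
    ("http://www.konbib.nl/dutchess/", "http://catalogue.bized.ac.uk/"),
    ("http://catalogue.bized.ac.uk/", "http://www.sosig.ac.uk/"),
    ("http://catalogue.bized.ac.uk/", "http://netec.mcc.ac.uk/WoPEc.html"),
    ("http://www.sosig.ac.uk/", "http://catalogue.bized.ac.uk/"),
]
dot = to_dot(edges, {"http://www.sosig.ac.uk/": "SOSIG"})
print(dot)
```

The resulting text could be saved to a file and passed to a DOT-aware layout program to draw the mesh.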

We can merge this basic RDF data with additional claims about
the services described. In this simple example we present a second
chunk of RDF/XML which provides hypothetical mesh:qos
('quality of service') properties for these servers, along with
contact information where available.
<?xml version="1.0"?>
<!-- vocabulary for demo purposes only, see index.html for more. dan -->
<web:RDF xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mesh="http://ilrt.org/discovery/2000/09/metamesh/schema1#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <mesh:Server web:about="http://www.konbib.nl/dutchess/">
    <mesh:qos>10</mesh:qos>
  </mesh:Server>
  <mesh:Server web:about="http://catalogue.bized.ac.uk/">
    <mesh:contactInfo web:resource="mailto:bized-info@bized.ac.uk"/>
    <mesh:qos>10</mesh:qos>
  </mesh:Server>
  <mesh:Server web:about="http://www.sosig.ac.uk/">
    <mesh:contactInfo web:resource="mailto:sosig-info@sosig.ac.uk"/>
    <mesh:qos>10</mesh:qos>
  </mesh:Server>
</web:RDF>
Assume a simple RDF query system based around queries that are in effect RDF data graphs with some nodes and edges marked as unknown (ie. the variables in the query). Here we use the Squish strawman syntax.
SELECT ?x, ?y, ?collector, ?exporter
FROM
http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/server-annot.rdf,
http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/example1.rdf
WHERE
(dc::title ?x ?collector)
(dc::title ?y ?exporter)
(mesh::collectsFrom ?x ?y)
USING mesh for http://ilrt.org/discovery/2000/09/metamesh/schema1#
dc for http://purl.org/dc/elements/1.1/
The results from such a query are essentially tabular (though they can also be considered as an RDF data graph, i.e. the sub-graph of the original data that was implicated in answering the query).
Our example query here returns a table:
(note: column ordering is wrong; @@todo)
| ?exporter | ?y | ?x | ?collector |
| SOSIG, the Social Science Information Gateway | http://www.sosig.ac.uk/ | http://catalogue.bized.ac.uk/ | Biz/ed, ILRT's Business and Economics internet catalogue |
| WoPEc, Working papers in Economics | http://netec.mcc.ac.uk/WoPEc.html | http://catalogue.bized.ac.uk/ | Biz/ed, ILRT's Business and Economics internet catalogue |
| Biz/ed, ILRT's Business and Economics internet catalogue | http://catalogue.bized.ac.uk/ | http://www.sosig.ac.uk/ | SOSIG, the Social Science Information Gateway |
| Biz/ed, ILRT's Business and Economics internet catalogue | http://catalogue.bized.ac.uk/ | http://www.konbib.nl/dutchess/ | Dutchess, the Dutch National Library internet catalogue |
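The way such a query is answered can be sketched as a small backtracking matcher over statements. The Python below is an illustration only, not the Squish implementation: the `match` function, the (predicate, subject, object) pattern tuples mirroring the query syntax above, and the in-memory statement list are all assumptions. The data is transcribed from the first RDF/XML example.

```python
DC = "http://purl.org/dc/elements/1.1/"
MESH = "http://ilrt.org/discovery/2000/09/metamesh/schema1#"
DUTCHESS = "http://www.konbib.nl/dutchess/"
BIZED = "http://catalogue.bized.ac.uk/"
SOSIG = "http://www.sosig.ac.uk/"
WOPEC = "http://netec.mcc.ac.uk/WoPEc.html"

# (subject, predicate, object) statements from the first example.
TRIPLES = [
    (DUTCHESS, DC + "title", "Dutchess, the Dutch National Library internet catalogue"),
    (BIZED, DC + "title", "Biz/ed, ILRT's Business and Economics internet catalogue"),
    (SOSIG, DC + "title", "SOSIG, the Social Science Information Gateway"),
    (WOPEC, DC + "title", "WoPEc, Working papers in Economics"),
    (DUTCHESS, MESH + "collectsFrom", BIZED),
    (BIZED, MESH + "collectsFrom", SOSIG),
    (BIZED, MESH + "collectsFrom", WOPEC),
    (SOSIG, MESH + "collectsFrom", BIZED),
]

def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every (predicate, subject, object) pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    (pred, subj, obj), rest = patterns[0], patterns[1:]
    for s, p, o in triples:
        if p != pred:
            continue
        candidate = dict(binding)
        consistent = True
        for term, value in ((subj, s), (obj, o)):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, triples, candidate)

QUERY = [
    (DC + "title", "?x", "?collector"),
    (DC + "title", "?y", "?exporter"),
    (MESH + "collectsFrom", "?x", "?y"),
]
rows = list(match(QUERY, TRIPLES))
for row in rows:
    print(row["?collector"], "<-", row["?exporter"])
```

Each yielded dictionary corresponds to one row of the result table, with one entry per query variable.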
Now we show how this basic information about the topology of the search mesh can be augmented by other metadata sources. Our example queries use two RDF data files, which are merged to facilitate queries which couldn't be answered by a single data source alone. In this simple case we ask for the Dublin Core title and the (fictitious!) 'quality of service' properties for each service.
SELECT ?x, ?title, ?qos
FROM
http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/server-annot.rdf,
http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/example1.rdf
WHERE
(mesh::qos ?x ?qos)
(dc::title ?x ?title)
USING mesh for http://ilrt.org/discovery/2000/09/metamesh/schema1#
dc for http://purl.org/dc/elements/1.1/
The result table for this query is...
| ?qos | ?title | ?x |
| 10 | SOSIG, the Social Science Information Gateway | http://www.sosig.ac.uk/ |
| 10 | Dutchess, the Dutch National Library internet catalogue | http://www.konbib.nl/dutchess/ |
| 10 | Biz/ed, ILRT's Business and Economics internet catalogue | http://catalogue.bized.ac.uk/ |
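The effect of merging the two sources before querying can be sketched in a few lines of Python. Here each file is represented as a set of (subject, predicate, object) tuples transcribed from the two RDF/XML examples, the merge is a set union, and the query is answered by a join on the shared service URI. The `values` helper is hypothetical and assumes at most one value per property per service, which holds for this data but not for RDF in general.

```python
DC = "http://purl.org/dc/elements/1.1/"
MESH = "http://ilrt.org/discovery/2000/09/metamesh/schema1#"
DUTCHESS = "http://www.konbib.nl/dutchess/"
BIZED = "http://catalogue.bized.ac.uk/"
SOSIG = "http://www.sosig.ac.uk/"
WOPEC = "http://netec.mcc.ac.uk/WoPEc.html"

# Titles come from the topology file, qos values from the annotation file.
example1 = {
    (DUTCHESS, DC + "title", "Dutchess, the Dutch National Library internet catalogue"),
    (BIZED, DC + "title", "Biz/ed, ILRT's Business and Economics internet catalogue"),
    (SOSIG, DC + "title", "SOSIG, the Social Science Information Gateway"),
    (WOPEC, DC + "title", "WoPEc, Working papers in Economics"),
}
server_annot = {
    (DUTCHESS, MESH + "qos", "10"),
    (BIZED, MESH + "qos", "10"),
    (SOSIG, MESH + "qos", "10"),
}

def values(triples, predicate):
    """Map each subject to its value for one predicate (assumes a single value)."""
    return {s: o for s, p, o in triples if p == predicate}

merged = example1 | server_annot
titles = values(merged, DC + "title")
qos = values(merged, MESH + "qos")

# Join on the service URI: only services with both properties appear.
rows = sorted((uri, titles[uri], qos[uri]) for uri in titles.keys() & qos.keys())
for uri, title, quality in rows:
    print(quality, title, uri)
```

WoPEc has a title but no qos statement in the annotation data, so it drops out of the join, which is why the result table has three rows rather than four. Neither source on its own yields any rows at all.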
...@todo: show simple RDF queries against this and a larger example. How does the scale of ROADS-like meshes compare to P2P's more fine grained search problem?
Distributed search may never work but at least we can visualise the problem...
[Bray2001] What is RDF?, Tim Bray (XML.com).
[W3CRDF] W3C RDF home page.
[GNUTP2P] Gnutella: Alive, Well, and Changing Fast, by Kelly Truelove (2001-01-25, openp2p.com).
[ROADS-FK] Cross-Searching Subject Gateways: The Query Routing and Forward Knowledge Approach, John Kirriemuir, Dan Brickley, Susan Welsh, Jon Knight, Martin Hamilton (January 1998, issn:1082-9873).