
Search Mesh Topology and Visualisation

Author: Dan Brickley

This version: 2001-01-25

Latest version: http://ilrt.org/discovery/2000/09/metamesh/

Abstract

This is a quick demonstration of the use of W3C's Resource Description Framework (RDF) for the representation of forward-knowledge gathering patterns in distributed search systems such as ROADS. The intention is to provide a brief document to stimulate discussion about the connections between ROADS and more recent developments under the 'peer to peer' (P2P) banner.

Status

This is a very drafty draft. Feedback welcomed.



Query Routing: representing forward knowledge patterns

Our concern is with characterising the patterns of information-sharing amongst metadata search services.

Introduction

Distributed search applications such as ROADS are designed to operate in an environment in which networked search services can exchange 'forward knowledge' (see [ROADS-FK] for a more detailed overview) about the contents of other services in the Web. In ROADS version 2.0, database installations learned about remote search services through hand-coded configuration files. This document outlines a technique that could contribute to a more flexible approach, through the use of an RDF representation of the connectivity amongst a mesh or 'web' of cooperating (and perhaps competing) search services. By representing the forward-knowledge structures in such a Web using XML/RDF, we can reason more effectively about the most appropriate query routing strategy for some particular search, since data about the inter-server relationships can be exchanged in a common format.

ROADS and forward knowledge data formats

ROADS makes use of the IETF CIP (Common Indexing Protocol) data format for inter-service knowledge exchange. The CIP data format, a generalisation of the WHOIS++ centroids mechanism, provides a rather low-level characterisation of the textual contents of some database. In effect, CIP/centroids allow a database to expose a list of the unique tokens appearing in each named field of each named record type of that database. The intention behind CIP was to generalise from the WHOIS++ model by abstracting away from the details of the particular kind of search service being consulted. This was only partially achieved, since the default CIP index type provides no namespace mechanism to distinguish a database field called 'title' (as in 'the title of a book') from 'title' (as in 'mr, mrs, ms').
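To make the idea of a centroid concrete, here is a minimal sketch that collects the unique tokens per record type and field. The record layout, the `build_centroid` helper and the sample records are all hypothetical, invented for illustration; this is not the CIP wire format, and real tokenisation rules will differ.

```python
from collections import defaultdict


def build_centroid(records):
    """Return {(record_type, field): sorted unique tokens}.

    Each record is a dict with a 'type' key plus free-text fields.
    This layout is an assumption made for the sketch only.
    """
    centroid = defaultdict(set)
    for record in records:
        record_type = record["type"]
        for field, value in record.items():
            if field == "type":
                continue
            # Naive tokenisation: lower-case and split on whitespace.
            for token in value.lower().split():
                centroid[(record_type, field)].add(token)
    return {key: sorted(tokens) for key, tokens in centroid.items()}


# Hypothetical records: 'title' is a work's title in the first two,
# but an honorific in the third.
records = [
    {"type": "DOCUMENT", "title": "Working Papers in Economics"},
    {"type": "DOCUMENT", "title": "Economics of Information"},
    {"type": "USER", "title": "Ms", "name": "Example Person"},
]

centroid = build_centroid(records)
for (record_type, field), tokens in sorted(centroid.items()):
    print(record_type, field, tokens)
```

Both kinds of 'title' end up under the same bare field name. Nothing in the name itself says which meaning is intended, which is the ambiguity a namespace mechanism would remove.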

While a case could be made for the use of an XML/RDF index type for this kind of forward knowledge representation, the current discussion does not address such specifics. We are concerned with exploring an XML/RDF representation of knowledge-sharing patterns amongst search services, rather than with the fine-grained detail of the information being shared. Similarly, we are not concerned here with establishing a practice for database characterisation ("collection level descriptions", service description etc), but with one specific aspect of that problem: representing the connections between such services.

Using RDF

The original application which motivated this note was the visualisation of query-routing meshes, particularly amongst Internet cataloguing services such as those using the ROADS system. The use of RDF is particularly appropriate for this task since the RDF information model is based around the notion of a Web of typed relationships amongst uniquely identified resources. Our application here is focused on the characterisation of the types of relationship that hold between uniquely identified search services. We also expect this data to be intermingled with meta-information from various other sources (eg. collection description registries, rating services etc) to more effectively characterise the various search services available. RDF again is a good fit here, since RDF data graphs can be merged through a simple algorithm. It is beyond the scope of this brief note to provide an introduction to RDF; see [Bray2001] or [W3CRDF] for introductory material.

Representing FK Topology in RDF

For the sake of this discussion, we describe a hypothetical RDF vocabulary that defines some basic concepts: a class of things 'mesh:Server' and a relationship 'mesh:collectsFrom' that represents a forward-knowledge sharing arrangement.

We both acknowledge and side-step the problem of uniquely identifying arbitrary machine-interfaced "search services" by identifying each service via the Web home page of the (socially conceived) service that it represents. Thus we do not in this representation distinguish between the Z39.50 and WHOIS++ servers run by the Social Science Information Gateway (SOSIG); either/both are covered by an RDF statement that talks in terms of SOSIG as an abstraction, but picked out unambiguously through use of the SOSIG home page URL.

Examples

A simple example in RDF: Dutchess, Biz/ed, SOSIG and WoPEC.

This description represents a real-world application of ROADS distributed search tools built by the author in 1998. The configuration of the servers has by now changed; it is the purpose of this document to show how we could keep up-to-the-minute records of these configurations using XML/RDF.

(example1.rdf)

<?xml version="1.0"?>

<!-- vocabulary for demo purposes only, see index.html for more. dan -->

<web:RDF xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mesh="http://ilrt.org/discovery/2000/09/metamesh/schema1#"
         xmlns:dc = "http://purl.org/dc/elements/1.1/">

<mesh:Server web:about="http://www.konbib.nl/dutchess/">
<dc:title>Dutchess, the Dutch National Library internet catalogue</dc:title>
<mesh:collectsFrom web:resource="http://catalogue.bized.ac.uk/"/>
</mesh:Server>

<mesh:Server web:about="http://catalogue.bized.ac.uk/" >
<dc:title>Biz/ed, ILRT's Business and Economics internet catalogue</dc:title>
<mesh:collectsFrom web:resource="http://www.sosig.ac.uk/"/>
<mesh:collectsFrom web:resource="http://netec.mcc.ac.uk/WoPEc.html"/>
</mesh:Server>

<mesh:Server web:about="http://www.sosig.ac.uk/">
<dc:title>SOSIG, the Social Science Information Gateway</dc:title>
<mesh:collectsFrom web:resource="http://catalogue.bized.ac.uk/"/>
</mesh:Server>

<mesh:Server web:about="http://netec.mcc.ac.uk/WoPEc.html">
<dc:title>WoPEc, Working papers in Economics</dc:title>
</mesh:Server>

</web:RDF>
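As a rough illustration of how an application might read such a file, the following sketch pulls the mesh:collectsFrom edges out of an abbreviated copy of the data using only the Python standard library. It handles just the RDF/XML shape used in this example (typed nodes with `about` and `resource` attributes) and is not a general RDF parser; the `collects_from_edges` helper is a name invented here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MESH_NS = "http://ilrt.org/discovery/2000/09/metamesh/schema1#"

# Abbreviated copy of example1.rdf (two of the four servers),
# inlined so that the sketch is self-contained.
EXAMPLE = """<?xml version="1.0"?>
<web:RDF xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mesh="http://ilrt.org/discovery/2000/09/metamesh/schema1#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<mesh:Server web:about="http://www.konbib.nl/dutchess/">
<dc:title>Dutchess, the Dutch National Library internet catalogue</dc:title>
<mesh:collectsFrom web:resource="http://catalogue.bized.ac.uk/"/>
</mesh:Server>
<mesh:Server web:about="http://catalogue.bized.ac.uk/">
<dc:title>Biz/ed, ILRT's Business and Economics internet catalogue</dc:title>
<mesh:collectsFrom web:resource="http://www.sosig.ac.uk/"/>
<mesh:collectsFrom web:resource="http://netec.mcc.ac.uk/WoPEc.html"/>
</mesh:Server>
</web:RDF>
"""


def collects_from_edges(rdf_xml):
    """Return (collector, exporter) URL pairs, in document order."""
    root = ET.fromstring(rdf_xml)
    edges = []
    # ElementTree spells namespaced names as "{namespace-uri}localname".
    for server in root.findall(f"{{{MESH_NS}}}Server"):
        collector = server.get(f"{{{RDF_NS}}}about")
        for link in server.findall(f"{{{MESH_NS}}}collectsFrom"):
            edges.append((collector, link.get(f"{{{RDF_NS}}}resource")))
    return edges


edges = collects_from_edges(EXAMPLE)
for collector, exporter in edges:
    print(collector, "collectsFrom", exporter)
```

The lookup goes by namespace URI rather than by prefix, so it does not matter that the file binds the RDF namespace to the unusual prefix 'web'.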

This can be visualised using the Rudolf-RDFViz tool.

[Figure: example visualisation of the mesh (GIF version)]
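For readers without access to a visualisation tool, one possible way to draw the same mesh is to write the collectsFrom edges out as a Graphviz DOT description and render that separately. This is a sketch under that assumption: we make no claim about how Rudolf-RDFViz works internally, and the `mesh_to_dot` helper and the short labels are invented for illustration.

```python
def escape(text):
    """Escape double quotes so the text can sit inside a DOT string."""
    return text.replace('"', '\\"')


def mesh_to_dot(edges, labels=None):
    """Build a DOT digraph from (collector, exporter) URL pairs."""
    labels = labels or {}
    lines = ["digraph mesh {"]
    # One node statement per distinct server, in a stable order.
    for node in sorted({node for edge in edges for node in edge}):
        label = labels.get(node, node)
        lines.append(f'  "{escape(node)}" [label="{escape(label)}"];')
    # One arc per forward-knowledge arrangement.
    for collector, exporter in edges:
        lines.append(
            f'  "{escape(collector)}" -> "{escape(exporter)}" [label="collectsFrom"];'
        )
    lines.append("}")
    return "\n".join(lines)


# The four collectsFrom statements from example1.rdf.
edges = [
    ("http://www.konbib.nl/dutchess/", "http://catalogue.bized.ac.uk/"),
    ("http://catalogue.bized.ac.uk/", "http://www.sosig.ac.uk/"),
    ("http://catalogue.bized.ac.uk/", "http://netec.mcc.ac.uk/WoPEc.html"),
    ("http://www.sosig.ac.uk/", "http://catalogue.bized.ac.uk/"),
]
labels = {
    "http://www.konbib.nl/dutchess/": "Dutchess",
    "http://catalogue.bized.ac.uk/": "Biz/ed",
    "http://www.sosig.ac.uk/": "SOSIG",
    "http://netec.mcc.ac.uk/WoPEc.html": "WoPEc",
}

dot = mesh_to_dot(edges, labels)
print(dot)
```

The printed text can be saved to a file and passed to a Graphviz layout program to produce an image.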

RDF Aggregation: adding quality of service data

We can merge this basic RDF data with additional claims about the services described. In this simple example we present a second chunk of RDF/XML which provides hypothetical mesh:qos ('quality of service') properties for these servers, along with contact information where available.

(server-annot.rdf)

<?xml version="1.0"?>

<!-- vocabulary for demo purposes only, see index.html for more. dan -->

<web:RDF xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mesh="http://ilrt.org/discovery/2000/09/metamesh/schema1#"
         xmlns:dc = "http://purl.org/dc/elements/1.1/">

<mesh:Server web:about="http://www.konbib.nl/dutchess/">
<mesh:qos>10</mesh:qos> 
</mesh:Server>

<mesh:Server web:about="http://catalogue.bized.ac.uk/" >
<mesh:contactInfo web:resource="mailto:bized-info@bized.ac.uk"/>
<mesh:qos>10</mesh:qos> 
</mesh:Server>

<mesh:Server web:about="http://www.sosig.ac.uk/">
<mesh:contactInfo web:resource="mailto:sosig-info@sosig.ac.uk"/>
<mesh:qos>10</mesh:qos> 
</mesh:Server>

</web:RDF>
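The merge itself can be sketched very simply if each RDF file is treated as a set of (subject, predicate, object) triples: merging is then set union, and statements from different files about the same URL line up automatically. The sketch below hand-copies a few triples about Biz/ed from the two files rather than parsing them, omits the class (type) statements, and uses an invented `describe` helper. Data containing anonymous (unnamed) nodes would need more care than a plain union.

```python
from collections import defaultdict

MESH = "http://ilrt.org/discovery/2000/09/metamesh/schema1#"
DC = "http://purl.org/dc/elements/1.1/"
BIZED = "http://catalogue.bized.ac.uk/"

# Triples copied by hand from example1.rdf (Biz/ed only).
topology = {
    (BIZED, DC + "title", "Biz/ed, ILRT's Business and Economics internet catalogue"),
    (BIZED, MESH + "collectsFrom", "http://www.sosig.ac.uk/"),
    (BIZED, MESH + "collectsFrom", "http://netec.mcc.ac.uk/WoPEc.html"),
}

# Triples copied by hand from server-annot.rdf (Biz/ed only).
annotations = {
    (BIZED, MESH + "contactInfo", "mailto:bized-info@bized.ac.uk"),
    (BIZED, MESH + "qos", "10"),
}

# Merging two graphs of URL-identified resources is plain set union.
merged = topology | annotations


def describe(graph, subject):
    """Group everything the graph says about one subject by property."""
    properties = defaultdict(list)
    for s, p, o in sorted(graph):
        if s == subject:
            properties[p].append(o)
    return dict(properties)


for prop, values in describe(merged, BIZED).items():
    print(prop, values)
```

After the union, a single lookup on the Biz/ed URL returns its title, its collectsFrom links, its contact address and its qos value, even though no one file contained all four.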

RDF Query examples

Assume a simple RDF query system based around queries that are in effect RDF data graphs with some nodes and edges marked as unknown (ie. the variables in the query). Here we use the Squish strawman syntax.

SELECT ?x, ?y, ?collector, ?exporter
FROM
  http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/server-annot.rdf,
  http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/example1.rdf
WHERE 
   (dc::title ?x ?collector)
   (dc::title ?y ?exporter)
   (mesh::collectsFrom ?x ?y) 
USING mesh for http://ilrt.org/discovery/2000/09/metamesh/schema1#
  dc for http://purl.org/dc/elements/1.1/

The results from such a query are essentially tabular (though they can also be considered as an RDF data graph, ie. the sub-graph of the original data that was implicated in answering the query).

Our example query here returns a table:

(note: column ordering is wrong; @@todo)

?exporter | ?y | ?x | ?collector
SOSIG, the Social Science Information Gateway | http://www.sosig.ac.uk/ | http://catalogue.bized.ac.uk/ | Biz/ed, ILRT's Business and Economics internet catalogue
WoPEc, Working papers in Economics | http://netec.mcc.ac.uk/WoPEc.html | http://catalogue.bized.ac.uk/ | Biz/ed, ILRT's Business and Economics internet catalogue
Biz/ed, ILRT's Business and Economics internet catalogue | http://catalogue.bized.ac.uk/ | http://www.sosig.ac.uk/ | SOSIG, the Social Science Information Gateway
Biz/ed, ILRT's Business and Economics internet catalogue | http://catalogue.bized.ac.uk/ | http://www.konbib.nl/dutchess/ | Dutchess, the Dutch National Library internet catalogue
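To show what evaluating such a query involves, here is a minimal sketch of graph-pattern matching over a set of triples. It is not the Squish implementation: the `match` and `query` helpers are invented for illustration, the data is hand-copied from example1.rdf, and patterns are written in (subject, predicate, object) order rather than Squish's (predicate subject object) order.

```python
MESH = "http://ilrt.org/discovery/2000/09/metamesh/schema1#"
DC = "http://purl.org/dc/elements/1.1/"

DUTCHESS = "http://www.konbib.nl/dutchess/"
BIZED = "http://catalogue.bized.ac.uk/"
SOSIG = "http://www.sosig.ac.uk/"
WOPEC = "http://netec.mcc.ac.uk/WoPEc.html"

# Titles and collectsFrom statements copied by hand from example1.rdf.
triples = [
    (DUTCHESS, DC + "title", "Dutchess, the Dutch National Library internet catalogue"),
    (BIZED, DC + "title", "Biz/ed, ILRT's Business and Economics internet catalogue"),
    (SOSIG, DC + "title", "SOSIG, the Social Science Information Gateway"),
    (WOPEC, DC + "title", "WoPEc, Working papers in Economics"),
    (DUTCHESS, MESH + "collectsFrom", BIZED),
    (BIZED, MESH + "collectsFrom", SOSIG),
    (BIZED, MESH + "collectsFrom", WOPEC),
    (SOSIG, MESH + "collectsFrom", BIZED),
]


def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match this triple.
            return None
    return extended


def query(patterns, data):
    """Return every set of variable bindings satisfying all patterns."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bindings in results:
            for triple in data:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


rows = query(
    [
        ("?x", DC + "title", "?collector"),
        ("?y", DC + "title", "?exporter"),
        ("?x", MESH + "collectsFrom", "?y"),
    ],
    triples,
)
for row in rows:
    print(row["?x"], "collects from", row["?y"])
```

Each result is one row of the table: a consistent assignment of values to ?x, ?y, ?collector and ?exporter. This nested-loop approach is the simplest possible strategy and would need indexing to cope with larger meshes.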

Now we show how this basic information about the topology of the search mesh can be augmented by other metadata sources. Our example queries use two RDF data files, which are merged to facilitate queries which couldn't be answered by a single data source alone. In this simple case we ask for the Dublin Core title and the (fictitious!) 'quality of service' properties for each service.

SELECT ?x, ?title, ?qos 
FROM
   http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/server-annot.rdf,  
   http://www.ilrt.bris.ac.uk/discovery/2000/09/metamesh/example1.rdf 
WHERE
  (mesh::qos ?x  ?qos)
  (dc::title ?x ?title)
USING mesh for http://ilrt.org/discovery/2000/09/metamesh/schema1#
        dc for http://purl.org/dc/elements/1.1/

The result table for this query is...

?qos | ?title | ?x
10 | SOSIG, the Social Science Information Gateway | http://www.sosig.ac.uk/
10 | Dutchess, the Dutch National Library internet catalogue | http://www.konbib.nl/dutchess/
10 | Biz/ed, ILRT's Business and Economics internet catalogue | http://catalogue.bized.ac.uk/

Discussion

...@todo: show simple RDF queries against this and a larger example. How does the scale of ROADS-like meshes compare to P2P's more fine grained search problem?

Conclusion

Distributed search may never work but at least we can visualise the problem...

References

[Bray2001] What is RDF?, Tim Bray (XML.com).

[W3CRDF] W3C RDF home page.

[GNUTP2P] Gnutella: Alive, Well, and Changing Fast, by Kelly Truelove (2001-01-25, openp2p.com).

[ROADS-FK] Cross-Searching Subject Gateways: The Query Routing and Forward Knowledge Approach, by John Kirriemuir, Dan Brickley, Susan Welsh, Jon Knight, Martin Hamilton (January 1998, issn:1082-9873).