An inductively created 'thesaurus' for Regard

NA Jacobs <Neil.Jacobs@bristol.ac.uk>
Libby Miller <libby.miller@bristol.ac.uk>

2001-07-19

Some rough notes...

Neil said:
The background to this is that I'm trying to think about subject access to the Regard database of social and economic research. The graphs represent different ways of expressing the extent to which significant words in a part of the database (discipline='economics') co-occur. That is, they show the relative frequency with which any pair of words occurs in the same field.
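As a rough illustration of the co-occurrence measure Neil describes, here is a minimal Python sketch. It assumes that 'relative frequency' means the fraction of fields in which both words appear; the notes do not say how the real analysis was normalised, and the function name and sample words are made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_frequencies(fields):
    """Return {(word_a, word_b): fraction of fields containing both words}."""
    pair_counts = Counter()
    for words in fields:
        # Count each unordered pair once per field, ignoring repeats within a field.
        unique = sorted({w.upper() for w in words})
        pair_counts.update(combinations(unique, 2))
    total = len(fields)
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pair_counts.items()}


# Made-up sample: each inner list holds the significant words from one field.
sample_fields = [
    ["inflation", "wage", "monetary"],
    ["inflation", "wage"],
    ["monetary", "policy"],
    ["inflation", "monetary"],
]
frequencies = cooccurrence_frequencies(sample_fields)
```

Pairs are stored with the two words in alphabetical order, so a lookup such as frequencies[("INFLATION", "WAGE")] does not depend on the order in which the words appeared in a field.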

Ideally, the RDFed result would enable users to select a region or number of words to represent what they're interested in, and this would trigger the appropriate Boolean keyword search on the database. However, I'm a bit hazy as to how this might happen at the moment. I don't really understand how simple associations could be RDFed, since I understand RDF is based on a subject-predicate-object grammar. (By the way, if that's true, it is the extent of my knowledge of RDF.)

Libby suggested that the links between the words could simply be 'isRelatedTo' or something similar. The result is a very simple schema with one class (regard:Node; could be regard:Word or somesuch) and one property (regard:isRelatedTo).

Neil had created some diagrams from the statistical analysis he had done, so Libby transformed a sample .dot file to an RDF version using a Perl script. The Perl script is very simple - one thing to note is that since RDF properties are directional, the script created bi-directional links between the related words. Libby made a quick Inkling/SquishQL demo for the following scenario:
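The conversion step can be sketched roughly as follows. This is not the Perl script itself but a small Python approximation: it assumes the .dot file lists edges one per line in the form `INFLATION -- WAGE;`, emits N-Triples rather than RDF/XML, and leaves out the regard:Node typing and any literal values. The function names are hypothetical; the namespace is the one used in the query in these notes.

```python
import re

# Namespace taken from the query in these notes.
REGARD = "http://ilrt.org/discovery/2001/07/regard#"

# Matches edges such as `INFLATION -- WAGE;` or `"INFLATION" -> "WAGE";`
# (an assumed layout; the real .dot file may differ).
EDGE = re.compile(r'^\s*"?(\w+)"?\s*(?:--|->)\s*"?(\w+)"?')


def dot_to_triples(dot_text):
    """Return (subject, predicate, object) URI triples, two per edge."""
    triples = []
    for line in dot_text.splitlines():
        match = EDGE.match(line)
        if not match:
            continue  # skip graph headers, braces, and attribute lines
        a, b = (REGARD + name for name in match.groups())
        # RDF properties are directional, so state the relation both ways.
        triples.append((a, REGARD + "isRelatedTo", b))
        triples.append((b, REGARD + "isRelatedTo", a))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


sample_dot = """graph regard {
    INFLATION -- WAGE;
    INFLATION -- MONETARY;
}"""
triples = dot_to_triples(sample_dot)
ntriples = to_ntriples(triples)
```

Each edge in the sample produces two triples, one in each direction, which is the same bi-directional linking described above.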

Following a user search on Regard, say for 'Inflation', Regard could do an RDF query on the word relationships in the catalogue, for example:

SELECT ?target 
FROM
http://ilrt.org/discovery/2001/07/regard/regard1.rdf 
WHERE
 (regard::isRelatedTo ?source regard::INFLATION) 
 (rdfs::value ?source ?target)  
USING
 rdfs FOR http://www.w3.org/2000/01/rdf-schema#
 regard FOR http://ilrt.org/discovery/2001/07/regard#
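To make the two query patterns concrete, here is a small Python sketch that evaluates the same join over an in-memory set of triples. It reads each SquishQL clause as (predicate subject object), so the first clause finds every ?source related to INFLATION and the second looks up that node's value. The sample triples, including the HOUSING and POVERTY nodes, are invented for illustration and are not taken from regard1.rdf.

```python
REGARD = "http://ilrt.org/discovery/2001/07/regard#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical sample data as (subject, predicate, object) triples.
triples = {
    (REGARD + "WAGE", REGARD + "isRelatedTo", REGARD + "INFLATION"),
    (REGARD + "MONETARY", REGARD + "isRelatedTo", REGARD + "INFLATION"),
    (REGARD + "HOUSING", REGARD + "isRelatedTo", REGARD + "POVERTY"),
    (REGARD + "WAGE", RDFS + "value", "wage"),
    (REGARD + "MONETARY", RDFS + "value", "monetary"),
    (REGARD + "HOUSING", RDFS + "value", "housing"),
}


def related_words(triples, word_uri):
    """Return the values of all nodes that are related to word_uri."""
    # Pattern 1: (regard::isRelatedTo ?source <word_uri>)
    sources = {
        s for s, p, o in triples
        if p == REGARD + "isRelatedTo" and o == word_uri
    }
    # Pattern 2: (rdfs::value ?source ?target), joined on ?source
    return sorted(
        o for s, p, o in triples
        if p == RDFS + "value" and s in sources
    )


targets = related_words(triples, REGARD + "INFLATION")
```

With this sample data, targets holds the words 'monetary' and 'wage', which are the candidates for the suggestions offered to the user.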

We would return the results of the search but allow the user to filter them using the results of the RDF search of the 'inductive thesaurus', e.g.

Did you mean

wage and inflation?
monetary and inflation?

Regard would then allow the user to filter the results with respect to these keywords.
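A minimal sketch of this suggest-and-filter step, in Python. It assumes search results are plain title strings and that filtering means a Boolean AND over whole words; the function names and sample titles are hypothetical and say nothing about how Regard itself stores or searches records.

```python
import re


def suggestions(search_term, related):
    """Build 'Did you mean' prompts from the related words."""
    return [f"{word} and {search_term.lower()}?" for word in related]


def filter_results(results, keywords):
    """Keep results whose text contains every keyword (a Boolean AND)."""
    wanted = [k.lower() for k in keywords]
    kept = []
    for text in results:
        # Compare whole words, ignoring case and punctuation.
        words = set(re.findall(r"\w+", text.lower()))
        if all(k in words for k in wanted):
            kept.append(text)
    return kept


# Made-up search results for the term 'Inflation'.
results = [
    "Wage bargaining and inflation in the UK",
    "Monetary policy and inflation targets",
    "Inflation expectations of households",
]
prompts = suggestions("Inflation", ["wage", "monetary"])
filtered = filter_results(results, ["wage", "inflation"])
```

Choosing the first prompt would narrow the three sample results down to the one title that mentions both 'wage' and 'inflation'.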

NJ: Other options and speculations included linking the co-word graphs (as representations of Regard's content) with similar maps of the search strings entered (as representations of users' needs). I'm not sure how this could be done, and it remains very much speculation. More realistically perhaps, could the co-word graphs form a basis for some kind of inductively derived graphical or spatial subject index to Regard? I guess this could be merely an image linked to various Boolean searches, but is there a more elegant option?

References

Demo
Regard upper level word-relationships in RDF
schema.rdf
original .dot file
Perl conversion script .dot -> RDF
new .dot file (generated by RDFViz).