Using RDF query to manage metadata vocabularies

libby.miller@bristol.ac.uk, danbri@w3.org

2001-06-28

Contents

  1. Metadata vocabularies
  2. Organisations which produce metadata vocabularies
  3. The problem we are trying to solve
  4. The process lifecycle
  5. Registries aren't enough
  6. Choosing metadata vocabularies
  7. Naming objects on the web
  8. Using RDF to link objects
  9. Databases for metadata vocabularies
  10. Sample RDF queries
  11. Issues and conclusions

This paper argues that RDF can be used as a powerful tool for organising and querying information about metadata vocabularies and the organisations that manage them.

1. Metadata vocabularies

Metadata vocabularies describe how to use metadata for certain objects, for example documents, images, music and real world objects such as books. They describe the properties of objects, such as title, description, format, and length; and also the relationships of objects to one another: a document could have citations and references, for example.
Metadata vocabularies can be described formally or informally, in a machine-readable or human-readable way. As the web expands, metadata vocabularies have become very important. Organisations require ways of describing their documents, or ways of making existing metadata about documents or other objects interoperable.

2. Organisations which produce metadata vocabularies

These can be major metadata initiatives (Dublin Core); individuals on a mailing list (RSS); or companies (CDDB). There are very many initiatives, large and small. Keeping up with even large initiatives is difficult.

3. The problem we are trying to solve

1. There are many metadata vocabularies, often for similar sorts of objects. There is a plethora of organisations producing their own vocabularies. For the end-user, choosing between these vocabularies is extremely difficult.

2. Metadata vocabularies are subject to change and revision over time, as the initiating group or organisation consults and implements, and as circumstances change. Even for experts in the metadata area, keeping up with these changes is time-consuming. For the organisations themselves, managing the process of change is difficult.

The second difficulty is our primary focus in this paper, although the suggested solution may also provide clues for the user about which metadata vocabulary is best for their purpose. Our main purpose, however, is to suggest that the organisations that create metadata vocabularies, whatever their size or composition, could benefit from machine-readable descriptions of the status of the metadata vocabularies.

4. The process lifecycle

Metadata vocabulary producers usually have rules about procedures, particularly about the circumstances under which the organisation approves a vocabulary. Often the vocabulary goes through iterations of change, and within each iteration there is a cycle of proposal, consultation, implementation and approval. This cycle may be more or less formally defined within a vocabulary-producing body, but it will usually be defined somewhere. These procedure documents are usually textual, human-readable descriptions.

5. Registries aren't enough

Databases of vocabularies (sometimes called 'schema registries') have been suggested as a solution to the problem of disseminating metadata vocabularies.
There are a number of initiatives to create databases of schemas. OCLC's Open Metadata Registry, for example, allows free-text search of an RDF database of schemas and then allows browsing from a schema element to the full schema. The SWAG dictionary is notable because it provides both human-readable and machine-readable access to the RDF schemas. The MetaData Server at SUB Göttingen is a system which provides cross-walks between schemas in a searchable interface. OASIS has a database of XML schemas and DTDs, and there are numerous other lists of human-readable and machine-readable schemas of various kinds available.

Like the rest of the web, these registries get out of date. Updating them requires humans to keep up with the enormous number of metadata vocabularies being created. What is needed is a way of automating the process of updating the information about metadata vocabularies.

6. Choosing metadata vocabularies

End-users would like to be able to find the answers to questions like

"which is the best version of Dublin Core for me to use at the moment?"

We can't define 'best' except in the context of the (many) specific organisations' own processes of proposal, development and stability. However, we can talk about what is useful to use by relating metadata vocabulary documents to metadata vocabulary producers and their policies. To do this we need to link the documents to the policies and describe the sort of link that exists between them.

7. Naming objects on the web

In order to describe the relationships between objects on the web it is necessary to give the objects names. These could be direct names (such as the URL of a webpage) or indirect names (e.g. the organisation with a homepage of http://www.w3.org).
In the example of naming the procedures of a metadata vocabulary producing organisation there are several approaches we could take.

The difficulty with all methods of pointing into documents except for the RDF Schema approach is that they depend on the syntactic representation of the document rather than its meaning. XPointer, or pointing at the first paragraph of the second page, say, are examples. In practice this means that if the document is changed, the meaning of the pointers may change as an accidental effect of the edits made to the document.

The RDF Schema case and the case of giving each important part an <a name="..."> element have in common that they continue to point to the object referenced, even if the document is edited and its syntax altered.
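
The difference between the two kinds of pointer can be illustrated with a small Python sketch. The procedures document, its anchor names and the two helper functions are hypothetical, invented here for illustration; the positional lookup stands in for a syntax-dependent pointer and the named lookup for an anchor or RDF Schema identifier.

```python
# Hypothetical procedures document, modelled as (anchor name, text) pairs.
procedures_v1 = [
    ("proposal", "A vocabulary is proposed to the committee."),
    ("recommendation", "The committee approves the vocabulary."),
]

# The same document after an editor inserts a new paragraph in the middle.
procedures_v2 = [
    ("proposal", "A vocabulary is proposed to the committee."),
    ("consultation", "The community comments on the proposal."),
    ("recommendation", "The committee approves the vocabulary."),
]


def resolve_by_position(document, index):
    """Syntactic pointer: 'the paragraph at this position'."""
    return document[index][1]


def resolve_by_name(document, name):
    """Named pointer: 'the paragraph carrying this anchor name'."""
    for anchor, text in document:
        if anchor == name:
            return text
    raise KeyError(name)


# The positional pointer silently changes meaning after the edit...
before = resolve_by_position(procedures_v1, 1)
after = resolve_by_position(procedures_v2, 1)
print(before == after)  # False

# ...while the named pointer still reaches the same paragraph.
print(resolve_by_name(procedures_v1, "recommendation")
      == resolve_by_name(procedures_v2, "recommendation"))  # True
```

The positional pointer does not fail after the edit; it simply resolves to a different paragraph, which is why this kind of breakage is easy to miss.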

8. Using RDF to link objects

RDF is a very flexible way of describing objects and their relationships. It can be used for encoding metadata vocabularies themselves. In this case, it can be used to write a metadata vocabulary which describes the relationship between documents and organisations' policies and document lifecycle points, for example

"this document is at the Recommendation point in the W3C process lifecycle"

We also need to be able to state who says this about the document. Ideally, it should be the organisation itself, as the entity with the best knowledge of its own process. It would be very simple for an organisation to issue a digitally signed press release using a standard metadata vocabulary, describing a change in the position of a document within its own process lifecycle. An example is given below:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://ilrt.org/discovery/2001/02/dcmisw/meta1.rdf#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:wn="http://xmlns.org/wordnet/0.6/"
         xmlns:foaf="http://xmlns.org/foaf/0.1/"
         xmlns:process="http://example.com/process/"
>

<wn:Document dc:title="Example.com recommends example metadata vocabulary">
<dc:description>the standards organisation recommends example
schema for documents</dc:description>
<dc:date>2001-06-29</dc:date>

<!--

this document expresses that example.com says that a certain document
has been accepted under #recommendation of
http://example.com/procedures

-->

<process:states>

<rdf:Description>
<process:target rdf:resource="http://example.com/some-vocab"/>
<process:accordsWith
rdf:resource="http://example.com/procedures#recommendation" />
</rdf:Description>

</process:states>

<dc:creator>
<wn:Agent>
<dc:title>example metadata vocabulary perusing
committee</dc:title>
<foaf:mbox rdf:resource="mailto:example-committee@example.com"
/>
</wn:Agent>
</dc:creator>

</wn:Document>

</rdf:RDF>

9. Databases for metadata vocabularies

Information from these machine-readable 'press releases' can be harvested from the producers of the vocabularies into an RDF database about metadata vocabularies. We can use the maturing RDF tools for storing and querying this information. Some sample queries in the SQL-like RDF query language Squish are provided below, with explanations.
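
As a rough illustration of the harvesting step, the Python sketch below merges the statements from a press release into a simple in-memory store, recording which document each statement came from. It assumes the press release has already been parsed into (subject, predicate, object) triples. The press release URL, the blank node labels and the harvest function are hypothetical, and the process namespace is the one used in the sample queries in section 10. A real RDF database would also need to keep blank nodes from different documents apart, which this sketch does not attempt.

```python
from collections import defaultdict

# Namespace URIs taken from the sample queries; treated here as plain strings.
PROCESS = "http://example.com/metadat_vocab/process/"
DC = "http://purl.org/dc/elements/1.1/"


def harvest(store, source_url, triples):
    """Add the triples from one press release, remembering their source."""
    for triple in triples:
        store[triple].add(source_url)


# Maps each (subject, predicate, object) triple to the set of documents
# that assert it.
store = defaultdict(set)

# Hypothetical, already-parsed press release.
release = [
    ("_:release1", PROCESS + "states", "_:statement1"),
    ("_:statement1", PROCESS + "target", "http://example.com/some-vocab"),
    ("_:statement1", PROCESS + "accordsWith",
     "http://example.com/procedures#recommendation"),
    ("_:release1", DC + "date", "2001-06-29"),
]

harvest(store, "http://example.com/press/2001-06-29.rdf", release)
# Harvesting the same release a second time does not duplicate statements.
harvest(store, "http://example.com/press/2001-06-29.rdf", release)

print(len(store))  # 4
```

Keeping the source of each statement alongside the statement itself is one way to retain the "who says this" information once data from many producers is merged.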

10. Sample RDF queries

Find all the schemas endorsed by the organisation whose mailing list mailbox is mailto:dc-usage@jiscmail.ac.uk

SELECT ?schema
WHERE
(process::states ?x ?y)
(process::target ?y ?schema)
(dc::creator ?x ?agent)
(foaf::mbox ?agent mailto:dc-usage@jiscmail.ac.uk)
USING 
process FOR http://example.com/metadat_vocab/process/ 
dc FOR http://purl.org/dc/elements/1.1/ 
foaf FOR http://xmlns.org/foaf/0.1/
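
To show how a query of this shape can be answered, here is a minimal Python sketch that evaluates the four patterns above against a handful of in-memory triples. It is not the Squish implementation: the sample data, the blank node labels and the functions match and query are assumptions made for illustration. Squish writes each pattern as (predicate subject object), whereas the sketch stores triples as (subject, predicate, object).

```python
PROCESS = "http://example.com/metadat_vocab/process/"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.org/foaf/0.1/"

# Hypothetical harvested data as (subject, predicate, object) triples.
triples = [
    ("_:release", PROCESS + "states", "_:statement"),
    ("_:statement", PROCESS + "target", "http://example.com/some-vocab"),
    ("_:release", DC + "creator", "_:committee"),
    ("_:committee", FOAF + "mbox", "mailto:dc-usage@jiscmail.ac.uk"),
    # A second release from a different agent, which should not match.
    ("_:other", PROCESS + "states", "_:statement2"),
    ("_:statement2", PROCESS + "target", "http://example.com/other-vocab"),
    ("_:other", DC + "creator", "_:someone"),
    ("_:someone", FOAF + "mbox", "mailto:someone@example.com"),
]


def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant that does not match
            return None
    return extended


def query(patterns, data):
    """Return every set of variable bindings satisfying all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in data:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


patterns = [
    ("?x", PROCESS + "states", "?y"),
    ("?y", PROCESS + "target", "?schema"),
    ("?x", DC + "creator", "?agent"),
    ("?agent", FOAF + "mbox", "mailto:dc-usage@jiscmail.ac.uk"),
]

schemas = sorted({solution["?schema"] for solution in query(patterns, triples)})
print(schemas)  # ['http://example.com/some-vocab']
```

Each pattern narrows the set of candidate bindings, so only the schema whose endorsing agent has the requested mailbox survives to the end.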

Find all the schemas about music and the titles of the organisations which have endorsed them.

SELECT ?schema, ?title
WHERE
(process::states ?x ?y)
(process::target ?y ?schema)
(dc::creator ?x ?agent)
(dc::title ?agent ?title)
(dc::subject ?schema music)
USING 
process FOR http://example.com/metadat_vocab/process/ 
dc FOR http://purl.org/dc/elements/1.1/ 
foaf FOR http://xmlns.org/foaf/0.1/

Find all the schemas with a title property, plus the names and contact for endorsing organisations, plus the documents which state what an endorsement means.

SELECT ?schema, ?name, ?contact, ?document 
WHERE
(process::states ?x ?y)
(process::target ?y ?schema)
(process::accordsWith ?y ?document)
(dc::creator ?x ?agent)
(dc::title ?agent ?name)
(foaf::mbox ?agent ?contact)
(process::property ?schema ?property)
(rdfs::label ?property ?test)
(squish::textmatch ?test title)
USING 
squish FOR http://swordfish.rdfweb.org/rdfquery/schema/
process FOR http://example.com/metadat_vocab/process/ 
dc FOR http://purl.org/dc/elements/1.1/ 
foaf FOR http://xmlns.org/foaf/0.1/
rdfs FOR http://www.w3.org/2000/01/rdf-schema#

11. Issues and conclusions

There are various advantages to this approach. One is to the metadata creator, who needs to know which interoperable formats for metadata are endorsed, by whom and when; which may be endorsed soon; and which are just useful schemas that people have created.

Another advantage is to the vocabulary-developing organisations themselves. This technique decouples the identifier for a schema from the status of the schema within the standards process. As long as a schema has an unchanging name, it can progress through an iteration of the vocabulary development process while having suitable records of information attached to it.
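
The decoupling can be sketched in a few lines of Python. The schema name stays fixed while dated status records accumulate against it. The record layout, the #proposal lifecycle point, its date and the current_status function are hypothetical; only the schema URI, the #recommendation point and the 2001-06-29 date come from the example in section 8.

```python
from datetime import date

# The unchanging name of the schema.
SCHEMA = "http://example.com/some-vocab"

# Hypothetical status records: (schema, lifecycle point, date announced).
records = [
    (SCHEMA, "http://example.com/procedures#proposal", date(2001, 1, 15)),
    (SCHEMA, "http://example.com/procedures#recommendation", date(2001, 6, 29)),
]


def current_status(schema, records):
    """Return the most recently announced lifecycle point for a schema."""
    dated = [(when, status) for name, status, when in records if name == schema]
    return max(dated)[1] if dated else None


print(current_status(SCHEMA, records))
# http://example.com/procedures#recommendation
```

Because the status lives in the records rather than in the name, moving the schema to a new lifecycle point means adding a record, not renaming the schema.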

There are various issues with trust. Users have to know and trust the organisations that produce these vocabularies. Information about the process is of little use to a user who comes cold to the area and knows nothing about the organisations.
The position of a metadata vocabulary within a given process may be an indication of whether it would be a good idea to use a vocabulary, but there is no certainty about this conclusion.

Finally, there are many other things one might want to say about a metadata vocabulary: for example, who exactly is using it; what their experiences were; and what mappings are available from this vocabulary to others. RDF can also provide some answers to these types of questions, although this is beyond the scope of this paper. The technique outlined here is a basic methodology that can be used across multiple independent metadata communities, and provides a foundation for additional annotations and commentary to be exchanged in machine-processable form.

Acknowledgements

Thanks to Damian Steer for reading earlier drafts of this paper.

References


W3C process document
http://www.w3.org/Consortium/Process/

EARL: Evaluation and Repair Language
http://www.w3.org/2001/03/earl/

RDF Model and Syntax
http://www.w3.org/TR/REC-rdf-syntax

RDF Schema
http://www.w3.org/TR/rdf-schema

Schema registries

http://www.schemas-forum.org/

Three 'Metadata Watch' reports containing brief textual descriptions of schemas and ontologies in various sectors: http://www.schemas-forum.org/metadata-watch/1.html
http://www.schemas-forum.org/metadata-watch/2.html
http://www.schemas-forum.org/metadata-watch/2.html
Plus a list of categories of schema and links to the relevant parts of these Metadata Watch reports. http://www.schemas-forum.org/registry/registry.html

Knowledge organisation schema registry
http://www.isko.org/ko-schemata.html

Includes a pointer to the NKOS schema for registry information: http://nkos.slis.kent.edu -> http://staff.oclc.org/~vizine/NKOS/Thesaurus_Registry_version3_rev.htm
and the Controlled vocabularies resource guide:
http://www.fit.qut.edu.au/InfoSys/middle/cont_voc.html
and the Unfamiliar Metadata search project: http://www.sims.berkeley.edu/research/projects/metadata/GrantSupported/unfamiliar.html which is a collection of search interfaces to a number of subject-based schema repositories. The results of a keyword search come back as descriptions of individual elements of schemas. This is very comprehensive indeed.
and Beyond Bookmarks: Schemes for Organizing the Web http://www.public.iastate.edu/~CYBERSTACKS/CTW.htm - a collection of subject ordered links to schemas and controlled vocabularies. Very comprehensive.

MetaData Server at SUB Göttingen
http://www2.sub.uni-goettingen.de/index.html

An information gateway and a database of crosswalks, cross cuts and mappings between DC and other schemas (Metaform). The mappings are good clear tables of elements, with textual, human-readable descriptions. Metaform doesn't seem to be searchable though.

SWAG Dictionary
http://webns.net/

Links to 12 schemas, all available in HTML/RDF/N3 formats. A very nice resource. Searchable too.

EOR
http://eor.dublincore.org/
http://wip.dublincore.org:8080/registry/jsp/registry.jsp

Searchable database of schemas; you can also browse them. The results are available in HTML only. In the interface you can switch from viewing an element to a schema, or a schema name to the whole schema.

There's also a new prototype RDF version: http://reg.ukoln.ac.uk/registry/jsp/swatch.jsp