Authors: Dan Brickley, Libby Miller
Date: Nov 2000
Latest version: http://ilrt.org/discovery/2000/11/rss-query/
This document explores some examples based around the idea of extending RSS using RDF-based modularisation, and then querying the resulting data in ways that exploit those extensions.
The examples explored here are based on the RSS 1.0 Specification, as published by the RSS Working Group (RSS-DEV). You are looking at a preliminary draft: in particular, we have not written schemas for the extension vocabulary used, nor polished the example(s).
RSS is often used to expose a structured view of data from web-sites whose content has some richer consistent structure. For example, RSS channels might represent items from a Job-listing service, online auctions, an aggregation of personal Weblog feeds, or descriptions of houses for sale. In these examples we explore ways of extending RSS to expose more of this structure to RSS aggregation and query services.
See jobs-rss.rdf test file.
<?xml version="1.0"?>
<rdf:RDF
  xmlns="http://purl.org/rss/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:wn="http://xmlns.com/wordnet/1.6/"
  xmlns:job="http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#">

  <channel rdf:about="http://ilrt.org/discovery/2000/11/rss-query/jobs-rss.rdf">
    <title>A hypothetical job listings channel</title>
    <link>http://ilrt.org/discovery/2000/11/rss-query/</link>
    <description>
      This example shows RSS used as a lightweight data transport mechanism
    </description>
    <image rdf:resource="http://ilrt.org/discovery/2000/11/rss-query/joblogo.gif"/>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://example.com/job1.html" />
        <rdf:li resource="http://example.com/job2.html" />
      </rdf:Seq>
    </items>
  </channel>

  <image rdf:about="http://ilrt.org/discovery/2000/11/rss-query/joblogo.gif">
    <title>RSS Job listing demo</title>
    <link>http://ilrt.org/discovery/2000/11/rss-query/</link>
    <url>http://ilrt.org/discovery/2000/11/rss-query/joblogo.gif</url>
  </image>

  <item rdf:about="http://example.com/job1.html">
    <title>The title of job1 goes here</title>
    <link>http://example.com/job1.html</link>
    <description>
      (Job1-Job1-Job1...) A simple textual description of the job
      (ie. abstract of the job advert we reference) goes here.
    </description>
    <job:advertises>
      <wn:Job job:title="Job title for job1 goes here"
              job:salary="100000"
              job:currency="USD">
        <job:orgHomepage rdf:resource="http://www.ukoln.ac.uk/"/>
      </wn:Job>
    </job:advertises>
  </item>

  <item rdf:about="http://example.com/job2.html">
    <title>The title of job2 goes here</title>
    <link>http://example.com/job2.html</link>
    <description>
      (Job2-Job2-Job2...) A simple textual description of the job
      (ie. abstract of the job advert we reference) goes here.
    </description>
    <job:advertises>
      <wn:Job job:title="Job title for job2 goes here"
              job:salary="150000"
              job:currency="UKP">
        <job:orgHomepage rdf:resource="http://ilrt.org/"/>
      </wn:Job>
    </job:advertises>
  </item>

</rdf:RDF>
This RDF data structure can be thought of as an XML encoding of an 'edge labelled graph'. As such it can be represented graphically. The visualisation presented here was generated with the RDFViz tool.
Note: The representation of the list ordering is wrong; we show rdf:li instead of rdf:_1, rdf:_2 etc.
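To make the 'edge labelled graph' reading concrete, here is a minimal Python sketch that writes part of the example out by hand as (subject, predicate, object) triples. It is an illustration only, not the output of an RDF parser: the node labels "_:items" and "_:job1" are made-up names for the nodes that have no URI in the XML, the helper arcs_from is hypothetical, and for brevity literals and resources are both held as plain strings. The sequence members use rdf:_1 and rdf:_2, as the note above says they should.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
JOB = "http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#"
WN = "http://xmlns.com/wordnet/1.6/"

CHANNEL = "http://ilrt.org/discovery/2000/11/rss-query/jobs-rss.rdf"
JOB1 = "http://example.com/job1.html"
JOB2 = "http://example.com/job2.html"

# Each triple is one labelled arc: subject node --predicate--> object node.
# "_:items" and "_:job1" are invented labels for nodes without a URI.
triples = [
    (CHANNEL, RDF + "type", RSS + "channel"),
    (CHANNEL, RSS + "items", "_:items"),
    ("_:items", RDF + "type", RDF + "Seq"),
    ("_:items", RDF + "_1", JOB1),
    ("_:items", RDF + "_2", JOB2),
    (JOB1, RDF + "type", RSS + "item"),
    (JOB1, JOB + "advertises", "_:job1"),
    ("_:job1", RDF + "type", WN + "Job"),
    ("_:job1", JOB + "salary", "100000"),
    ("_:job1", JOB + "currency", "USD"),
    ("_:job1", JOB + "orgHomepage", "http://www.ukoln.ac.uk/"),
]


def arcs_from(node):
    """Return the (predicate, object) pairs for arcs leaving `node`."""
    return [(p, o) for s, p, o in triples if s == node]


# List the arcs leaving the anonymous job node advertised by job1.html.
for predicate, obj in arcs_from("_:job1"):
    print(predicate, "->", obj)
```

Holding the data as a flat list of arcs is what makes the querying described later straightforward: a query becomes a set of arc patterns to match against this list.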
Having agreed an RSS extension such as this, how might one use it? While generic RSS 1.0 processors will be able to make some use of this data, we can show more sophisticated uses, such as querying, that exploit the additional data.
Here we show a simple SQL-ish ("Squish") query, couched against a hypothetical database of job descriptions aggregated using the extended RSS outlined above.
Note: @@TODO query docs reference. We've been using variants on this syntax (see also Guha's RDFdb) without a proper writeup. The query syntax used here is designed to resemble the basic structure of SQL: we ask some database for possible values for a selection of variables given some constraining expression. In Squish, this constraining expression can be thought of as a list of RDF statements where some parts of each statement have missing values (this is indicated by '?' variables in place of URIs or string literals). While an RDF/XML representation of this data structure is possible, we use a plain text encoding here for clarity.
SELECT ?item, ?job, ?orghome, ?salary, ?currency
WHERE
  (job::advertises ?item ?job)
  (rdf::type ?job wordnet::Job)
  (job::salary ?job ?salary)
  (job::currency ?job ?currency)
  (job::orgHomepage ?job ?orghome)
USING
  job FOR http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#
  rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
  wordnet FOR http://xmlns.com/wordnet/1.6/
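As a rough illustration of how regular the plain text encoding is, the following Python sketch splits a query of this shape into its selected variables, its list of statement patterns and its prefix table. This is not the authors' parser: the function names parse_squish and expand are hypothetical, and the sketch assumes every query has exactly the SELECT ... WHERE ... USING ... layout shown above.

```python
import re

QUERY = """
SELECT ?item, ?job, ?orghome, ?salary, ?currency
WHERE
  (job::advertises ?item ?job)
  (rdf::type ?job wordnet::Job)
  (job::salary ?job ?salary)
  (job::currency ?job ?currency)
  (job::orgHomepage ?job ?orghome)
USING
  job FOR http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#
  rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
  wordnet FOR http://xmlns.com/wordnet/1.6/
"""


def parse_squish(text):
    """Split a Squish-style query into (variables, clauses, prefixes)."""
    match = re.match(
        r"\s*SELECT\s+(.*?)\s+WHERE\s+(.*?)\s+USING\s+(.*)$", text, re.S
    )
    if match is None:
        raise ValueError("expected SELECT ... WHERE ... USING ...")
    select_part, where_part, using_part = match.groups()
    variables = re.findall(r"\?\w+", select_part)
    # Each clause reads "(predicate subject object)"; commas are optional.
    clauses = [
        tuple(re.split(r"[\s,]+", body.strip()))
        for body in re.findall(r"\(([^)]*)\)", where_part)
    ]
    prefixes = dict(re.findall(r"(\w+)\s+FOR\s+(\S+)", using_part))
    return variables, clauses, prefixes


def expand(term, prefixes):
    """Turn 'prefix::local' into a full URI; leave other terms unchanged."""
    prefix, sep, local = term.partition("::")
    return prefixes[prefix] + local if sep else term


variables, clauses, prefixes = parse_squish(QUERY)
for clause in clauses:
    print([expand(term, prefixes) for term in clause])
```

Each parsed clause is a three-element tuple in which any element starting with '?' is a variable and everything else is a fixed URI or literal.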
The answer to queries like this can be represented as a tabular result set, where columns correspond to the variables in the query ("?salary" etc), and rows correspond to states of affairs represented in the RSS in which the variables match values from the dataset. This is very similar to the ODBC/JDBC model familiar from the relational database world. In addition, the result set can be viewed as another RDF/RSS dataset, ie. the data graph corresponding to all the nodes and arcs implicated in the answering of the query. For further discussion of this see the paper "Enabling Inference" or the more recent (very draft) case study "RDF, SQL and the Semantic Web".
Note: Skip this section if you're happy simply to know that there is an SQL-like query system that allows RSS to be queried in terms of the RDF information model.
For each row in the result set, there will be a concrete value given for each named variable such as "?item" that is specified in the SELECT clause. The variable itself is a placeholder rather than a specific Web resource. Some of the properties of the resources it is 'standing in for' are specified by the constraints in the WHERE clause. In the simplified system presented here, the WHERE clause is implicitly AND'd, though you might imagine adding 'OR', 'NOT' in a more industrial strength version.
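The binding behaviour described above can be sketched in a few lines of Python. This is a deliberately naive evaluator written for illustration, not the implementation behind the authors' query service: the names is_var, match_clause and query are hypothetical, prefixed names are abbreviated to short strings such as "job:salary", "_:j1" and "_:j2" are made-up labels for the job nodes, and statements are stored in (predicate, subject, object) order to mirror the clause layout of the query text.

```python
def is_var(term):
    """Variables are written with a leading '?'."""
    return term.startswith("?")


def match_clause(clause, statement, bindings):
    """Return bindings extended so `clause` matches `statement`, else None."""
    extended = dict(bindings)
    for term, value in zip(clause, statement):
        if is_var(term):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(select, where, statements):
    """Evaluate the WHERE clauses as one implicit AND over `statements`."""
    rows = [{}]  # start with a single, empty set of bindings
    for clause in where:
        next_rows = []
        for row in rows:
            for statement in statements:
                extended = match_clause(clause, statement, row)
                if extended is not None:
                    next_rows.append(extended)
        rows = next_rows
    return [tuple(row[v] for v in select) for row in rows]


# Hand-written statements in (predicate, subject, object) order.
statements = [
    ("job:advertises", "http://example.com/job1.html", "_:j1"),
    ("rdf:type", "_:j1", "wn:Job"),
    ("job:salary", "_:j1", "100000"),
    ("job:currency", "_:j1", "USD"),
    ("job:advertises", "http://example.com/job2.html", "_:j2"),
    ("rdf:type", "_:j2", "wn:Job"),
    ("job:salary", "_:j2", "150000"),
    ("job:currency", "_:j2", "UKP"),
]

where = [
    ("job:advertises", "?item", "?job"),
    ("rdf:type", "?job", "wn:Job"),
    ("job:salary", "?job", "?salary"),
    ("job:currency", "?job", "?currency"),
]

rows = query(["?item", "?salary", "?currency"], where, statements)
for row in rows:
    print(row)
```

Because every clause filters the rows that survived the previous one, a row reaches the result only if all clauses hold for the same set of bindings, which is the implicit AND. A real system would use indexes rather than scanning every statement for every clause.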
Here is another simple query (not using the RSS core vocabulary, though this data could be transported within RSS). First we present the SQL-ish query, followed by a prose translation.
Squish:

SELECT ?x, ?t, ?c, ?o
WHERE
  (dc::title ?x, ?t)
  (dc::creator ?x, ?c)
  (eg::homePage ?c http://purl.org/net/eric/)
  (eg::worksFor ?c, ?o)
USING
  dc FOR http://purl.org/dc/1.1/
  eg FOR http://example.com/vocab/foaf/
Here we're saying...
find me the dc title (we'll call it 't') of any resource (we'll call it 'x') that has a dc creator 'c' with a homepage 'http://purl.org/net/eric/'; oh, and tell me who they work for ('o').
The answer is (just as in the SQL world) a table, with columns corresponding to the things we asked for, ie. 't','x','c','o'. Each row will supply one set of values from the database that match the constraints in the 'WHERE' clause of the query.
Here's a tabular representation of a possible result set from our main example...
| job1.html | job 1 title here | http://www.ukoln.ac.uk/ | 100000 | USD |
| job2.html | job 2 title here | http://ilrt.org/ | 150000 | UKP |
Further work is required on the integration of XML data typing into the system outlined. The example shown here illustrates one strategy for using RSS as a generic data transport for e-commerce related structures. A single data format, RSS, can allow for simple aggregation of human readable 'tables of contents' while also supporting richer data-oriented aggregation and query. By using XML namespaces and RDF, the RSS 1.0 proposal provides a mechanism that supports the graceful extension of the RSS core without the need for tightly-coordinated agreement about extension vocabularies.
In related work, we show an RDF query implementation, including an online demonstration service (xmlhack writeup).