Author: Libby Miller
Latest version: http://ilrt.org/discovery/2001/02/squish/
RDF needs an API-independent query language if its use is to reach the mainstream. This document is not a formal proposal for an RDF query language. Instead, it is an overview of a simple strawman SQL-like query language for RDF, together with an implementation of it. The document also describes some of the uses we have made of the query language, and some of the problems we have come across in using it.
A draft for comments.
Squish is an SQL-ish query language for RDF. Below we describe the syntax of Squish, and then examine some applications of Squish in a web-based environment. The applications examined are Schemarama, the IMesh database and an RDF calendar implementation. Brief descriptions of each show the functionality and limitations of Squish in its Java implementation.
Squish (SQL-ish) is aimed at being a simple graph-navigation query language for RDF, which can be used to test some of the functionality we will require from an RDF query language.
It is based on ideas from "Enabling Inferencing":
"RDF's simple yet powerful data model allows for an equally simple yet powerful query language. The query language is based on a single query mechanism: subgraph matching."
That paper suggested an RDF/XML syntax for the query language. Squish is a syntactic variant on the same idea: queries are expressed as strings, as in SQL. The approach is similar to that used in Guha's rdfDB query language.
The query specifies a linked subgraph expressed as triples of subject, predicate and object, each written in the order (predicate subject object). Unknown values are represented by variables, written as strings starting with a question mark. Known nodes and properties may be given as full URIs, or in an abbreviated form that makes queries easier to write and understand. A SELECT clause, similar to the SQL SELECT command, chooses the variables you are interested in, and there is also limited support for constraints, for example for text matching.
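Subgraph matching itself fits in a few lines. The following is a minimal, hypothetical Python illustration, not the Java implementation described here: triples are held in a plain list, variables are strings starting with ?, and each pattern is matched in turn while carrying forward the variable bindings found so far (a naive nested-loop join). The function names and the example.org data are made up for illustration, and patterns are stored as (subject, predicate, object) rather than in Squish's (predicate subject object) order.

```python
# A minimal sketch of subgraph matching over an in-memory list of triples.
# Patterns are written (subject, predicate, object); Squish itself writes
# each clause as (predicate subject object).


def is_var(term):
    """Squish-style variables are strings starting with '?'."""
    return term.startswith("?")


def extend(bindings, pattern, triple):
    """Return bindings extended so pattern matches triple, or None if it can't."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable that is already bound must bind to the same node again.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def match_subgraph(patterns, triples):
    """Return every set of bindings under which all patterns match the data."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bindings in results:
            for triple in triples:
                extended = extend(bindings, pattern, triple)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


def select(variables, results):
    """Keep only the SELECTed variables, in order, like SQL SELECT."""
    return [tuple(b[v] for v in variables) for b in results]


DC = "http://purl.org/dc/1.1/"
triples = [
    ("http://example.org/doc1", DC + "creator", "Alice"),
    ("http://example.org/doc1", DC + "title", "Notes on RDF"),
    ("http://example.org/doc2", DC + "creator", "Bob"),
    ("http://example.org/doc2", DC + "title", "Calendar ideas"),
]

# Roughly: SELECT ?title WHERE (dc::creator ?doc Alice) (dc::title ?doc ?title)
patterns = [("?doc", DC + "creator", "Alice"), ("?doc", DC + "title", "?title")]
print(select(["?title"], match_subgraph(patterns, triples)))  # prints [('Notes on RDF',)]
```

The known value "Alice" constrains the match directly, while ?doc links the two patterns into one connected subgraph.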
Here's an example query:
SELECT ?title, ?description
FROM http://test1/test, http://test2/test
WHERE (dc::creator ?doc ?sname)
      (dc::title ?doc ?title)
      (dc::description ?doc ?description)
AND ?sname ~ brickley
USING dc FOR http://purl.org/dc/1.1/
Notes:
* variables must start with a ? and contain no spaces
* the arguments to FROM are URLs, which are de-serialized and treated as the basis for the query
* constraints take the form: variable (> | < | = | >= | <= | ~) value
* value should be an integer, except in the case of ~, where it should be a string

Syntactic issues:
* whether to use :: or : as the prefix separator
* support for a default namespace
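The USING and AND parts of a query can be sketched separately from the matching itself. This Python sketch assumes :: as the prefix separator and assumes ~ means a case-insensitive substring match (the notes only say its value must be a string); expand, parse_constraint and satisfies are hypothetical names, not part of the Java implementation.

```python
import operator
import re


def expand(term, prefixes):
    """Expand an abbreviated term such as 'dc::title' using the USING prefixes."""
    if "::" in term:
        prefix, local = term.split("::", 1)
        if prefix in prefixes:
            return prefixes[prefix] + local
    return term  # variables and full URIs pass through unchanged


# Integer comparison operators; '~' is handled separately as text matching.
NUMERIC_OPS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
}

# Two-character operators come first so '>=' is not read as '>' followed by '='.
CONSTRAINT_RE = re.compile(r"\s*(\?[^\s<>=~]+)\s*(>=|<=|>|<|=|~)\s*(.+?)\s*")


def parse_constraint(text):
    """Split a constraint like '?sname ~ brickley' into (variable, op, value)."""
    m = CONSTRAINT_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a constraint: {text!r}")
    return m.group(1), m.group(2), m.group(3)


def satisfies(bindings, constraint):
    """Check one constraint against a set of variable bindings."""
    var, op, value = constraint
    bound = str(bindings[var])
    if op == "~":
        # Assumption: '~' is a case-insensitive substring match.
        return value.lower() in bound.lower()
    # Per the notes, the other operators compare integers.
    return NUMERIC_OPS[op](int(bound), int(value))


prefixes = {"dc": "http://purl.org/dc/1.1/"}  # from: USING dc FOR http://purl.org/dc/1.1/
print(expand("dc::creator", prefixes))  # prints http://purl.org/dc/1.1/creator
print(satisfies({"?sname": "Dan Brickley"}, parse_constraint("?sname ~ brickley")))  # prints True
```

Because constraints compare a variable with a fixed value, they act as a filter over the bindings produced by the subgraph match rather than as part of the graph pattern.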
I have been building applications on top of a Java implementation of Squish, which also uses the JDBC API to access the RDF databases. Three example applications are described: Schemarama (a validation technique for syntactically valid RDF documents, based on Schematron), the IMesh database (a database of people and projects relevant to the IMesh project), and a calendaring application which doesn't yet have a name (Danbri? help me out here...). Each application uses Java Squish, accessed through the JDBC API from a JSP or Java Servlet, with either an in-memory or a PostgreSQL backend as the RDF database. In each case I look at the usefulness and limitations of Squish in this implementation.
Schemarama is a validation technique for specifying the types of subgraphs you want to have connected to a particular set of nodes in an RDF Graph. It is based on Rick Jelliffe's Schematron, an XML schema language which works by finding tree patterns within an XML document.
Schemarama is much simpler than Schematron. It has two main parts, which are represented as two Squish queries. The first is the context: a query over the specified RDF document(s) for the set of nodes whose outgoing arcs you want to test. The second is the test query, another Squish query that expresses the test you wish to make on the nodes picked out by the context query. You can also specify an error message to print if a test node does not validate.
Schemarama could be used where every node of a given type is required to have certain properties, for example in RSS 1.0, where items are required to have a title and a link, and channels a title and a link. (Note that in the RSS case this can also be done with XML validation tools.)
Writing Schemarama with Squish was very simple: you just need to feed the results of the first (context) query into the second (test) query. Effectively this is the simplest case for Squish to handle, because the documents parsed are online. The demo is an X-line JSP and seems to work reasonably well.
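The two-query structure can be sketched in Python. This is a minimal illustration, not the JSP demo: the data is an in-memory list of (subject, predicate, object) triples, a small recursive matcher stands in for Squish, and each context node is fed into the test query as a pre-bound variable. The function name schemarama and the sample feed are hypothetical; the RSS 1.0 namespace and the rdf:type URI are the standard ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS = "http://purl.org/rss/1.0/"


def is_var(term):
    return term.startswith("?")


def extend(bindings, pattern, triple):
    """Return bindings extended so pattern matches triple, or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def match(patterns, triples, bindings=None):
    """Yield every set of bindings that makes all patterns match the triples."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in triples:
        new = extend(bindings, patterns[0], triple)
        if new is not None:
            yield from match(patterns[1:], triples, new)


def schemarama(context, test, node_var, message, triples):
    """Run the context query, then the test query once per context node."""
    errors = []
    nodes = sorted({b[node_var] for b in match(context, triples)})
    for node in nodes:
        # Feed each context result into the test query as a pre-bound variable.
        if next(match(test, triples, {node_var: node}), None) is None:
            errors.append(f"{node}: {message}")
    return errors


feed = [
    ("http://example.org/item1", RDF_TYPE, RSS + "item"),
    ("http://example.org/item1", RSS + "title", "First item"),
    ("http://example.org/item1", RSS + "link", "http://example.org/1"),
    ("http://example.org/item2", RDF_TYPE, RSS + "item"),
    ("http://example.org/item2", RSS + "title", "Second item"),  # no link
]

context = [("?item", RDF_TYPE, RSS + "item")]
test = [("?item", RSS + "title", "?title"), ("?item", RSS + "link", "?link")]
for error in schemarama(context, test, "?item", "items must have a title and a link", feed):
    print(error)  # prints http://example.org/item2: items must have a title and a link
```

Pre-binding the variable is equivalent to substituting the node's URI into the test query text before running it.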
The IMesh database was a project to discover the links between people, organisations and projects in an area (online subject gateways) where the same people from the same organisations collaborate on various projects. To build up this information I started from the IMesh mailing list members and researched their projects and organisations. I was interested in recording the process of generating this metadata, so with Dan Brickley's help I formulated a complex metadata format which recorded the information found as reified RDF statements, and also recorded the time, date and creator of the RDF. The complexity and homogeneity of the resulting data caused particular problems for the Java implementation of Squish with an SQL datastore, which are worth outlining here (more information is available in the paper I wrote about this).
* the homogeneity of the data, combined with the specific implementation, made querying slow
* querying reified statements was cumbersome
* querying bags was awkward
* text-matching problems
* API problems? (accessing the capabilities of the database)
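To see why reified data is cumbersome to query, here is a hedged Python sketch using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The provenance properties (dc:creator and dc:date on the statement), the EX namespace and the sample data are hypothetical stand-ins, not the actual IMesh format.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/1.1/"
EX = "http://example.org/imesh#"  # hypothetical vocabulary


def reify(stmt, s, p, o, creator, date):
    """Describe the statement (s, p, o) as a resource, plus who recorded it and when."""
    return [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, DC + "creator", creator),  # hypothetical provenance
        (stmt, DC + "date", date),        # hypothetical provenance
    ]


def is_var(term):
    return term.startswith("?")


def extend(bindings, pattern, triple):
    """Return bindings extended so pattern matches triple, or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def match(patterns, triples, bindings=None):
    """Yield every set of bindings that makes all patterns match the triples."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in triples:
        new = extend(bindings, patterns[0], triple)
        if new is not None:
            yield from match(patterns[1:], triples, new)


triples = reify(EX + "st1", EX + "alice", EX + "worksOn", EX + "projectX",
                EX + "recorder1", "2001-02-01")

# The direct question needs one pattern, but the direct triple was never asserted.
direct = [("?person", EX + "worksOn", "?project")]

# Asking the same question of the reified data (plus who recorded it) needs four.
reified = [
    ("?st", RDF + "subject", "?person"),
    ("?st", RDF + "predicate", EX + "worksOn"),
    ("?st", RDF + "object", "?project"),
    ("?st", DC + "creator", "?who"),
]

print(list(match(direct, triples)))  # prints []
print(list(match(reified, triples)))
```

A question that needs one pattern over plain triples needs four over reified ones, and the plain pattern finds nothing because reifying a statement does not assert it. If the SQL backend keeps all triples in a single table (an assumption about this implementation), each extra pattern typically costs another self-join.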
I wanted a calendar which could merge RDF data from different sources without syncing it. Here RDF is essentially acting as a common file format, until we add support for deciding when two events are actually the same event, at which point we can do syncing. In this case, although it was comparatively easy to implement merging of calendar data from different sources, there were some limitations of Squish that were annoying (a feature wishlist), and some functional wishes which are not easily implementable in Squish and require some kind of reasoning support (whether hard-coded or otherwise).
The RDF calendar demo takes URLs and a date (as a string) from the CGI parameters, fetches the RDF from the URLs, and queries for events on that date using the following query:
We could easily extend this to query for events pertaining to a particular person:
The JSP then orders the returned events by date and displays them in HTML.
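The merge, query and sort steps can be sketched in Python rather than JSP, under stated assumptions: each source has already been fetched and parsed into a list of triples, events use a hypothetical vocabulary (the CAL namespace with summary and dtstart) holding ISO 8601 date-time strings, and the date filter is a prefix match on the start time. None of these names come from the demo.

```python
CAL = "http://example.org/cal#"  # hypothetical calendar vocabulary


def is_var(term):
    return term.startswith("?")


def extend(bindings, pattern, triple):
    """Return bindings extended so pattern matches triple, or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def match(patterns, triples, bindings=None):
    """Yield every set of bindings that makes all patterns match the triples."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in triples:
        new = extend(bindings, patterns[0], triple)
        if new is not None:
            yield from match(patterns[1:], triples, new)


def events_on(date, *sources):
    """Merge the sources' triples, then list (start, summary) pairs for one date."""
    merged = set().union(*sources)  # merging is just a union of triples
    pattern = [("?e", CAL + "summary", "?summary"), ("?e", CAL + "dtstart", "?start")]
    hits = [b for b in match(pattern, merged) if b["?start"].startswith(date)]
    hits.sort(key=lambda b: b["?start"])  # no ORDER BY, so sort afterwards
    return [(b["?start"], b["?summary"]) for b in hits]


source_a = [
    ("http://a.example/e1", CAL + "summary", "Project meeting"),
    ("http://a.example/e1", CAL + "dtstart", "2001-02-14T14:00"),
]
source_b = [
    ("http://b.example/e7", CAL + "summary", "Breakfast"),
    ("http://b.example/e7", CAL + "dtstart", "2001-02-14T08:30"),
    ("http://b.example/e8", CAL + "summary", "Conference call"),
    ("http://b.example/e8", CAL + "dtstart", "2001-02-15T09:00"),
]

print(events_on("2001-02-14", source_a, source_b))
# prints [('2001-02-14T08:30', 'Breakfast'), ('2001-02-14T14:00', 'Project meeting')]
```

Because the merge is a plain union, the same event published by two sources under different URIs would be listed twice; deciding that they are one event is the syncing problem mentioned above.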
Feature wishlist:
* an 'order by' feature
* a secure way of getting password-protected URL content
* a good way of tracking which URL each piece of content came from (linked with security and privacy issues)
* datatypes
Functional wishes:
* free/busy: you need to say something like "this person is free for a proposed event if no event starts at the same time, no event that started before it is still running when it starts, and no event is scheduled to start before it is scheduled to finish."
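The free/busy rule quantifies over all events ("no event has started before..."), so it asks for the absence of a matching subgraph, and it compares one event's times with another's, whereas the constraints described earlier only compare a variable with a fixed value. A hedged Python sketch of the rule itself, assuming each event has already been extracted from the RDF as a (start, end) pair of datetime values; is_free is a hypothetical name:

```python
from datetime import datetime


def is_free(events, start, end):
    """True if no (ev_start, ev_end) event in events clashes with [start, end)."""
    for ev_start, ev_end in events:
        starts_same_time = ev_start == start
        still_running = ev_start < start < ev_end  # began earlier, not yet finished
        starts_during = start < ev_start < end     # begins before the new one finishes
        if starts_same_time or still_running or starts_during:
            return False
    return True


busy = [(datetime(2001, 2, 14, 9, 0), datetime(2001, 2, 14, 10, 30))]
print(is_free(busy, datetime(2001, 2, 14, 10, 0), datetime(2001, 2, 14, 11, 0)))   # prints False
print(is_free(busy, datetime(2001, 2, 14, 10, 30), datetime(2001, 2, 14, 11, 0)))  # prints True
```

For events of positive length, the three conditions together amount to the usual interval-overlap test: an existing event clashes exactly when it starts before the proposed event ends and ends after the proposed event starts.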
Two categories of problems emerged:
1. Lack of communication between the query client and the database, so that functions of the database could not be used by the query engine. Examples include support for statements, support for text matching, and support for Squish itself (e.g. Guha's rdfDB has a similar query interface).