Harmony final report

libby.miller@bristol.ac.uk

July 2002

Background

The Harmony project was funded in 1999 to investigate multimedia metadata for resource discovery.

From the website: [1]

The potential of digital libraries lies in their ability to store and deliver complex multimedia resources that combine text, image, audio and video components. The relationships between these components are multifaceted, including temporal, spatial, structural and semantic, and any descriptions of a multimedia resource must account for these relationships.

The original project proposal for Harmony presented the need to extend the Dublin Core element set to work better for multi-media applications. The Harmony project's work has been about articulating a model that integrates the representation of media-specific metadata attributes (filesizes, data formats, image/video substructure) alongside other information critical to the effective retrieval of multi-media content.

The Harmony collaboration has led to a metadata representational model (aka ontology, vocabulary, schema) and XML/RDF query implementation (query language and database search system) that supports resource discovery applications over multi-media, multi-vocabulary metadata descriptions.

Partners

Cornell Digital Library Research Group, USA [2]
Distributed Systems Technology Centre, Australia [3]
Institute for Learning and Research Technology, UK [4]

The principal investigators were Dan Brickley of ILRT, Carl Lagoze of Cornell, and Jane Hunter of DSTC.
Associated researchers included Libby Miller of ILRT.
Dan left the ILRT in 2001, but continued to support the project. Libby Miller became the ILRT's principal investigator after May 2001.

Of the three partners, only ILRT was funded by JISC. This led to a mismatch between JISC's expectations of deliverables and JISC's control over the participating institutions funded by other organisations. The ILRT deliverables were specified in more detail in June 2001 to resolve this problem.

Methodology

The methodology of the Harmony project used implementation as a testing and evaluation procedure for the modeling work.

The ABC metadata vocabulary and model [5] has been iteratively developed over the lifetime of the project, incorporating feedback from user communities, most notably the CIMI Consortium [6].

The focus on implementation is apparent from the ILRT's deliverables, which included an implementation of an RDF query language as well as demonstrations illustrating its use [7] (see below).

DSTC have also created a demonstration search [8] using the KWILT XML query language.
Cornell have produced an authoring tool [9] for ABC-related schemas.

Activities and Outputs

The ILRT's deliverables were updated in June 2001 after consultation with JISC, to reflect the technical advances since the start of the project and the particular experience of the project members at ILRT. The deliverables chosen mapped closely to the original workplan, and the details of this are described below.

Activities before May 2001

From [10]:

Activities post May 2001

The results of each new deliverable, F1-F6 (as agreed with JISC July 2001), are described below.

Database and query of CIMI data (F1)

A demonstrator of querying CIMI data in various formats is specified in both the project proposal and in the original milestones plan (a prototype). Both ILRT and DSTC have written demonstrators which allow the querying of CIMI data.

A simple RDF query interface to some of the AMOL CIMI data was written in early 2001 at ILRT [11]. However, this demonstrator used only simple DC descriptions of the data. A second demonstrator, using a more complex ABC data structure, was built to illustrate the flexibility of ABC descriptions and the capability of querying them. There are some example queries on the Harmony sample query page [12].

A final demonstrator [13] was completed in July 2002 and shows querying of a small subset of the AMOL CIMI data in ABC format using the Inkling RDF query engine. This demonstrator provides a test of the capabilities of the query language in comparison to an XML query implementation at DSTC [8].

ABC formal schema (F2)

The aim of the ABC model is to provide a common conceptual model between metadata ontologies from different domains. It draws upon many metadata models, including IFLA, Dublin Core and INDECS. Its aims are two-fold: for communities to use its modeling methodology to improve their own models, and as a primitive ontology for others to connect to and build on top of.

From IFLA FRBR it uses the distinction between work, expression, manifestation and item, which are the aspects of some intellectual work. FRBR also describes those who can create or own such things (persons and corporate bodies) and the things they can be about (concept, object, event and place).

INDECS also dealt with entities, events, persons, roles, agents, inputs, outputs, contributors.

Dublin Core has 15 very useful relations: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage and rights.

DC qualifiers were later added to increase the specificity of the relations it describes. For example a description could be qualified by 'abstract' or 'table of contents'. At the start of the Harmony project the Dublin Core metadata initiative lacked any architecture for community specific extensions, whether these are multimedia, education-related, or concerned with digital content rights management.

The ABC model is an attempt to provide a framework to link these different aspects of metadata description. It uses the notion of an event to link works (objects), and their manifestations (versions) at different times, and thereby describe the actions that occur to them, and the agents involved with them. For example a photography event concerning a manifestation of a work (say a book) might be preceded by a manufacturing event of the manifestation (for example the publication of the book), but is also a creation event (of a photo) in itself, with an agent associated with it. Some sample ABC data sources may be found here [14], and some visual examples can be found in the appendices of a paper on ABC by Carl Lagoze and Jane Hunter [15].
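The photography example above can be sketched as a handful of subject-predicate-object statements. This is only an illustration of the event-centred idea: every `ex:` identifier and property name below is hypothetical and is not taken from the published ABC schema.

```python
# Hypothetical identifiers throughout; these are NOT the published ABC terms.
triples = [
    # The manufacturing event (publication) produces the manifestation.
    ("ex:publication", "rdf:type", "abc:Event"),
    ("ex:publication", "ex:creates", "ex:bookCopy"),
    ("ex:publication", "ex:hasAgent", "ex:publisher"),
    ("ex:bookCopy", "ex:manifestationOf", "ex:book"),
    # The photography event involves the manifestation and creates a photo.
    ("ex:photography", "rdf:type", "abc:Event"),
    ("ex:photography", "ex:involves", "ex:bookCopy"),
    ("ex:photography", "ex:creates", "ex:photo"),
    ("ex:photography", "ex:hasAgent", "ex:photographer"),
    # Ordering between the two events.
    ("ex:publication", "ex:precedes", "ex:photography"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def events_touching(resource):
    """Return the events that either create or involve the resource."""
    linking = ("ex:involves", "ex:creates")
    return sorted({s for s, p, o in triples if o == resource and p in linking})


history = events_touching("ex:bookCopy")
photographers = objects("ex:photography", "ex:hasAgent")
```

Because the events carry the links, the history of the book copy is recovered by asking which events mention it, rather than by reading attributes stored on the copy itself.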

The ABC schema was the main focus of the original milestones plan and work has continued on developing it within the Harmony team. The latest version was version 3, created in discussion with the CIMI Consortium. A paper about this was published in the Journal of Digital Information, Special Issue - selected papers from Dublin Core 2001 Conference (by Carl Lagoze and Jane Hunter) [15].

This version of the ABC model is available as an RDF schema [16] and as an XML Schema [17].

SquishQL releases (F3) version 6 and version 7

The SquishQL implementation is an essential component of the ILRT's Harmony deliverables, before and after the deliverables were revised. A query language and implementation appears in the project proposals, and in the original milestones plan.

A query language for data described in ABC is essential for the methodology used. One of the aims of Harmony was to give guidance to communities beginning to examine and develop descriptive ontologies. An essential component of this advice-giving role was to ensure that the conceptual framework used in ABC could in the future be useful. The strategy here was to show that the ABC model could be used to search complex metadata. Implementation was used as a testing and evaluation procedure for the modeling work.

The problems that the ABC model had to face for multimedia metadata were the same problems as those faced with storing and querying any metadata - for example, versioning, time-related changes, ownership and transfer of ownership and use of objects described, derivative objects such as photographs. For this reason RDF was the natural format to use, being the W3C's standard for metadata. At the start of the Harmony project there were no query implementations for RDF available, and so effort was spent on creating an implementation, Inkling, of a basic RDF query language, SquishQL.

SquishQL is an SQL-like syntax for querying RDF data. It is designed to be human-readable and relatively easy to write queries for. It is extremely flexible, allowing the querying of data of arbitrary complexity that is describable in RDF, which is itself an extremely flexible data description model. It is based on the popular graph matching paradigm for RDF, as described in the paper 'Enabling Inferencing' [18] and as first implemented by R.V. Guha in RDFdb [19]. An illustration of the idea is shown below.

SquishQL extends this model to include syntax for useful extra features such as greater than and less than for numbers, and substring matching for text. A typical query of ABC-type data might look like this:

SELECT ?title, ?type, ?time, ?place, ?name 
FROM
http://ilrt.org/discovery/harmony/oai.rdf 
WHERE 
(web::type ?event abc::Event) 
(abc::context ?event ?context) 
(dc::type ?event ?type) 
(abc::time ?context ?time) 
(abc::place ?context ?place) 
(abc::act ?event ?act) 
(abc::agent ?act ?per) 
(abc::name ?per ?name) 
AND ?name ~ 'lagoze' 
USING web FOR http://www.w3.org/1999/02/22-rdf-syntax-ns# 
abc FOR http://ilrt.org/discovery/harmony/abc-0.1# 
dc FOR http://purl.org/dc/elements/1.1/
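The graph matching idea behind a query like the one above can be sketched in a few lines of Python. This is a minimal illustration and not Inkling's implementation: the sample data and `ex:` identifiers are hypothetical, patterns are written in the query's (predicate, subject, object) order, and the `~` operator is assumed here to be a case-insensitive substring test.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in the query."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all of the patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    (pred, subj, obj), rest = patterns[0], patterns[1:]
    for s, p, o in triples:
        candidate = dict(binding)
        ok = True
        for term, value in ((pred, p), (subj, s), (obj, o)):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Hypothetical data, stored as (subject, predicate, object).
triples = [
    ("ex:event1", "rdf:type", "abc:Event"),
    ("ex:event1", "abc:act", "ex:act1"),
    ("ex:act1", "abc:agent", "ex:person1"),
    ("ex:person1", "abc:name", "Carl Lagoze"),
    ("ex:event2", "rdf:type", "abc:Event"),
    ("ex:event2", "abc:act", "ex:act2"),
    ("ex:act2", "abc:agent", "ex:person2"),
    ("ex:person2", "abc:name", "Jane Hunter"),
]

patterns = [
    ("rdf:type", "?event", "abc:Event"),
    ("abc:act", "?event", "?act"),
    ("abc:agent", "?act", "?per"),
    ("abc:name", "?per", "?name"),
]

# The AND clause is applied as a filter over the matched bindings.
results = [b for b in match(patterns, triples)
           if "lagoze" in b["?name"].lower()]
```

Each pattern narrows the set of candidate bindings, and a variable shared between two patterns (such as `?per`) is what joins them together.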

The flexibility of SquishQL means that any data described in RDF can be queried, which meant that as ABC developed, demonstrators could quickly be implemented to evaluate and test the model.

The ILRT revised deliverables included two releases of Inkling. The latest release (version 0.7) was in late July 2002. The deliverables related to Inkling are described in detail below.

Inkling is Open Source software, with several contributors, including Libby Miller, Dan Brickley and Leigh Dodds. It is part funded by the Harmony project and the IMesh Project [20]. Because of this, specific deliverables were sought as part of the Harmony project, detailed below.

Syntax amendments

Minor syntax ambiguities were removed:

Improve parser

The SquishQL parser was initially built by hand, and parser errors related to whitespace and linebreaks were occurring. JavaCC, a parser generator, was used to create a much more accurate SquishQL parser.

Pass queries to backend database

The Inkling/SquishQL software can now be used with an SQL backend database by rewriting the query into SQL and passing it directly to the database for optimal performance. The initial draft of the rewriter, which was based on one written by Matt Biddulph, was altered for a more efficient database schema. This means that queries are much faster and more efficient.
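One way such a rewrite can work is to turn each triple pattern into one alias of a triples table and each shared variable into a join condition. The sketch below assumes a single hypothetical table `triples(subject, predicate, object)` in SQLite; it does not reproduce Inkling's actual schema or rewriter, and the sample data is invented.

```python
import sqlite3


def to_sql(select_vars, patterns):
    """Rewrite (predicate, subject, object) patterns into one SQL self-join."""
    first_seen = {}  # variable name -> first column that binds it
    conditions, params = [], []
    for i, (pred, subj, obj) in enumerate(patterns):
        alias = f"t{i}"
        for column, term in (("predicate", pred), ("subject", subj), ("object", obj)):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a parameterised equality test.
                conditions.append(f"{ref} = ?")
                params.append(term)
    columns = ", ".join(first_seen[v] for v in select_vars)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:act1", "abc:agent", "ex:person1"),
    ("ex:person1", "abc:name", "Carl Lagoze"),
    ("ex:act2", "abc:agent", "ex:person2"),
    ("ex:person2", "abc:name", "Jane Hunter"),
])

sql, params = to_sql(["?act", "?name"], [
    ("abc:agent", "?act", "?per"),
    ("abc:name", "?per", "?name"),
])
rows = sorted(conn.execute(sql, params).fetchall())
```

Handing the whole join to the database lets its own optimiser and indexes do the matching, which is the likely source of the speed-up described above.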

Improve API

The API has been refactored but still requires further work to improve it in line with our experiences of querying data; for example, although provenance of data is stored in the database, it is not easily accessible via the API.

Detailed documentation

This has been written with respect to detailed usecases and put on the web site for Inkling [21]. Javadoc is also available [22].

Result to maintain Resource/Literal distinction

Resource/literal distinction is now maintained within the query language and implementation.

Test version 6

An extensive range of testcases has been developed in collaboration with other developers. The testcases have also been made available on the web site [23].

SquishQL releases (F3) version 7

Amendments arising from testing

Minor parser changes and bug fixes have been made.

Document features, usage and examples

Usecases have been written up and made available on the website [21].

Squish - formal specification (F4)

A BNF formal syntax has been written for the SquishQL Inkling implementation, and made available on the website, along with sample queries [24].

Calendar Schema (F5)

The aim of this deliverable was to examine the possibility of modeling complex event data in RDF. Events are an essential part of the ABC model. Events are used in the model to link works, and their manifestations or versions at different times, and actions that occur to them, and the agents involved with them. Date and time constructs also appear in the model for dating manifestations of works and the changes that occur to them. Technical issues that occur in calendaring and scheduling occur in the storage and query of any event, in particular datatyping and querying datatypes.
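The datatyping problem can be illustrated with start times written in two lexical forms: the iCalendar basic form and the extended ISO 8601 form. The event titles and values below are hypothetical; the sketch only shows why a query engine needs to compare typed values rather than raw strings.

```python
from datetime import datetime, timezone


def parse_start(value):
    """Parse a UTC date-time in iCalendar basic or ISO 8601 extended form."""
    for fmt in ("%Y%m%dT%H%M%SZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date-time: {value!r}")


# Hypothetical events gathered from sources that use different formats.
events = [
    ("Project meeting", "20020715T090000Z"),
    ("Conference talk", "2002-05-08T14:00:00Z"),
    ("Workshop", "20020301T100000Z"),
]

# Untyped comparison: the strings do not sort chronologically.
by_string = [title for title, start in sorted(events, key=lambda e: e[1])]

# Typed comparison: values are parsed before they are compared.
by_time = [title for title, start in
           sorted(events, key=lambda e: parse_start(e[1]))]
```

The same concern applies to "greater than" and "less than" constraints in a query: they are only meaningful once the engine knows the values are dates rather than text.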

There has been substantial work at ILRT on calendaring and the technical side of events storage and querying, recently focussing on the use of RSS 1.0 and its events module as a simple way of getting organisations to produce simple structured events data [25].

Libby Miller has led the W3C RDF Interest Group's Calendar Taskforce [26] since early 2001.

iCalendar schema in RDF:

An annotated RDF schema for iCalendar was written by Libby Miller and Michael Arick in June 2001 [27].

Some of the issues which have arisen are detailed in a post sent by Libby Miller to the RDF-calendar mailing list [28].

Comparison document - iCalendar, ABC, RSS, Skical and DAML

Libby Miller, Greg Fitzpatrick and Dan Brickley wrote a paper about iCalendar and DAML+OIL for the XMLEurope conference in 2002 [29]. The paper detailed how one might use the DAML+OIL ontology language to specify events data using the iCalendar format, the most commonly used standard for PDA and scheduling applications. The paper discussed issues of merging together data from different sources, and datatyping issues, both crucial to the storage and query of events-based data from different sources.

Calendar demonstrations

Several demonstrations of the use of SquishQL/Inkling with event data are available [30], [31].

Article for XML.com or similar on SquishQL (F6)

The initial target for an article on the query language and implementation developed at the ILRT was the online developer news site XML.com.

However, it was later felt that a more important community in terms of the goals of the project was the RDF community itself [32].

A paper about an application of SquishQL and Inkling to querying complex and diverse institutional data was also submitted to the WWW2002 conference, but was not accepted [33]. However, the work was demonstrated on Developers' Day at the conference [34].

Membership of WebOnt

Libby Miller is a member of the W3C Web Ontology working group, which is chartered to design a web ontology language using DAML+OIL as a starting draft. The resultant language, OWL, may become a significant language for describing ontologies within the web community, and as such, input from projects such as Harmony is useful to its development [35].

Impacts and communities

From [8] the aims of the ABC model are to

One of the interesting results of this work has been that for most organisations, very simple metadata is all that they can or need to produce; the success of Dublin Core, OAI and RSS illustrates this.

ABC allows very careful modeling of objects, including their change over time. However, it is extremely complex to model and is also difficult to understand. For many organisations a part of the model can be useful, but in its entirety it is too detailed. For this reason, in parallel to the detailed modeling of ABC, Harmony members have been working with the Dublin Core, RSS and OAI communities to increase the use of structured, machine-readable data.

Future priorities

Future priorities should include further evangelization of RSS 1.0, RDF Calendaring and Dublin Core, and other Semantic Web technologies and formats, and the creation and support of tools which allow the creation, search and storage of formats like RSS (for example [36]). In particular, RSS 1.0 provides a successful and popular link between the digital libraries, semantic web and syndication communities. A short description follows.

RSS 1.0

RSS 1.0 [37] is a flexible mechanism for syndicating data. It consists of an ordered list of links with titles and descriptions, within a 'channel' which also has a title, description and optional logo associated with it. RSS 0.91 was described using XML; RSS 1.0 uses RDF, a specific set of design conventions for XML.

Using RDF makes RSS 1.0 extensible. One can extend the basic idea of a list of links by adding modules for specific uses, such as Dublin Core. So, for example, a list of webpages could have not just title, description and link, but also Dublin Core elements such as subject, creator, language and date. Essentially RSS 1.0 forms a scaffold of simple, often-used properties of internet objects (link, title, description) and allows you to enhance this simple description with more complex data for specific purposes. This makes objects described in RSS highly interoperable: there are many tools which enable the display of RSS feeds, and even if the specific module used is not understood by the tool, a standard RSS tool will nevertheless be able to display the limited amount of information available in the standard RSS fields: title, description and URL.
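The interoperability point can be sketched with a small RSS 1.0 document that also carries Dublin Core elements. The feed content and `example.org` URLs below are hypothetical; the namespaces are those of RSS 1.0, RDF and Dublin Core. One function reads only the core fields, as a module-unaware tool would, and the other picks up the Dublin Core additions.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# A hypothetical feed with one item that uses the Dublin Core module.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/feed">
    <title>Example channel</title>
    <link>http://example.org/</link>
    <description>A hypothetical feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/page1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/page1">
    <title>First page</title>
    <link>http://example.org/page1</link>
    <description>An example item</description>
    <dc:creator>A. N. Author</dc:creator>
    <dc:language>en</dc:language>
  </item>
</rdf:RDF>
"""


def core_items(xml_text):
    """Read only the core RSS fields, ignoring any module elements."""
    root = ET.fromstring(xml_text)
    return [
        {name: item.findtext(f"{{{RSS}}}{name}")
         for name in ("title", "link", "description")}
        for item in root.findall(f"{{{RSS}}}item")
    ]


def dc_fields(xml_text):
    """Collect the Dublin Core elements attached to each item."""
    root = ET.fromstring(xml_text)
    prefix = f"{{{DC}}}"
    return [
        {child.tag[len(prefix):]: child.text
         for child in item if child.tag.startswith(prefix)}
        for item in root.findall(f"{{{RSS}}}item")
    ]


basic = core_items(FEED)
extra = dc_fields(FEED)
```

A tool that knows nothing about Dublin Core still gets a usable title, link and description from `core_items`, while a richer tool can layer the module data on top.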

ILRT members of the Harmony team have shown how RSS plus its events module can be used at a minimal level to provide structured events data which can be queried and displayed in a calendar-like fashion - for example [25], which uses data created by Martin Poulter from the LTSN Economics Centre [38].

There are many other RSS 1.0 modules available, and creation of modules is straightforward. This means that ontologies or vocabularies (for example for learning objects) can be created in RDF and the data reused, syndicated, aggregated and queried using RSS 1.0 technologies. The next stage of work in this area should therefore be directed towards lightweight tool development which enables the creation, storage and query of structured data.

References

[1] http://metadata.net/harmony/
[2] http://www2.cs.cornell.edu/NCSTRL/CDLRG/cdlrg.htm
[3] http://www.dstc.edu.au/
[4] http://www.ilrt.bris.ac.uk/
[5] http://metadata.net/harmony/ABC/ABC.rdfs
[6] http://www.cimi.org/
[7] http://sw1.ilrt.org/rdfquery/
[8] http://sunspot.dstc.edu.au:9000/cocoon/xmlQry/index.html
[9] http://metadata.net/harmony/constructor/ABC_Constructor.htm
[10] http://ilrt.org/discovery/2001/03/multimeta/
[11] http://sw1.ilrt.org/discovery/2001/03/amol/
[12] http://sw1.ilrt.org/discovery/2001/01/demo/harmony.html
[13] http://sw1.ilrt.org/discovery/2002/07/abc/
[14] http://swordfish.rdfweb.org/people/libby/harmony/rdf/
[15] http://metadata.net/harmony/JODI_Final.pdf
[16] http://metadata.net/harmony/ABC/ABC.rdfs
[17] http://metadata.net/harmony/ABC/ABC.xsd
[18] http://www.w3.org/TandS/QL/QL98/pp/enabling.html
[19] http://www.guha.com/rdfdb/
[20] http://www.imesh.org/toolkit/
[21] http://sw1.ilrt.org/rdfquery/downloads.html
[22] http://sw1.ilrt.org/rdfquery/javadoc/
[23] http://sw1.ilrt.org/rdfquery/tests/
[24] http://sw1.ilrt.org/rdfquery/squish-bnf.html
[25] http://sw1.ilrt.org/discovery/2002/04/rsscal/
[26] http://ilrt.org/discovery/2001/04/calendar/
[27] http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf
[28] http://lists.w3.org/Archives/Public/www-rdf-calendar/2002May/0000.html
[29] http://ilrt.org/discovery/2002/03/skical-daml/
[30] http://sw1.ilrt.org/discovery/2001/07/swws/
[31] http://sw1.ilrt.org/discovery/2001/12/xmlcal/
[32] http://www.ilrt.bris.ac.uk/discovery/2002/05/squish-iscw/
[33] http://ilrt.org/discovery/2001/11/ilrt-rdf-paper/
[34] http://www.ariadne.ac.uk/issue32/www2002/
[35] http://www.w3.org/2001/sw/WebOnt/
[36] http://rssxpress.ukoln.ac.uk/
[37] http://purl.org/rss/1.0/
[38] http://www.economics.ltsn.ac.uk/