Author: Dan Brickley
This version: $Id: README.html,v 1.18 2001/11/14 10:37:29 nr8262 Exp $
This document describes the installation and testing of some Perl tools for creating, querying and aggregating RDF data.
See also: RDF::RDFWeb homepage, STATUS.html
TODO: distinguish utilities from tests (eg. loadtest grew into a utility app)
Summary: look at loadtest, aggtest, harmony.pl, rdfweblet.t to understand how the system works.
As a minimum, you'll need the RDF::RDFWeb perl packages. These should be available (by cvs or tarball) from http://ilrt.org/discovery/2001/06/rdfperl/.
To do anything interesting, you'll probably want an RDF parser too, so you can load up RDF data from online sources. Two parsers are currently supported, Cara and W3C Perllib. Work is in progress on wrapping Redland, which itself encapsulates a number of parsers. See the Cara website for information on the compilation and installation of the Cara system.
Note: these instructions (and some of the code) currently assume a Unix-like environment. The changes needed to run this under Win32 or other systems should be minimal (success has been reported under MacOS X, for example).
As a pure-Perl system, using W3C Perllib is a relatively easy way to get an RDF parser up and running.
W3C Perllib can be installed from CVS:
cvs -d :pserver:anonymous@dev.w3.org:/sources/public login
anonymous
cvs -d :pserver:anonymous@dev.w3.org:/sources/public -z3 checkout perl/modules
You'll need to make sure the Perllib files can be found by the RDF::RDFWeb package. Since all this code is under active development, it may be best to avoid system-wide installation. Instead, the library is set up to look in '../..' for related tools. Assuming Perllib has been checked out alongside the rudolf-perl/ directory, ie. we have 'rudolf-perl/' and 'perl/' in the same directory, a simple symlink will do the job:
ln -s perl/modules/W3C W3C
The following notes show how to create a database on disk, populated with RDF from our samples or your own data.
The sample utility, 'loadtest', exercises the basic APIs for parsing and loading RDF data.
./loadtest --batch=SAMPLES --op=load --parser=http://www.w3.org/Perllib/
or...
./loadtest --batch=SAMPLES --op=load --parser=http://cara.sourceforge.net/
If you omit the --parser=... option, the current default is to use Cara (or fail, if Cara is not installed).
Or, more selectively, we can load a subset of the files:
./loadtest --batch=SAMPLES --op=load --select=moosw
Each of these commands creates a test database on disk under ./tmp/ and populates it with the aggregate of the sample files listed in the file 'SAMPLES' (or the selected subset of them).
Load filtering: use --select=mp3 (or dc, or any other category) to load only those samples that fall into that category, or --filter=mp3 to exclude a category.
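The selection rule described above can be sketched in a few lines of Perl. This is only an illustration, not the code loadtest actually uses: the helper names parse_manifest and choose_samples are hypothetical, and the manifest line format ("path [cat1,cat2]") is assumed from the SAMPLES listing shown later in this document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of loadtest): parse manifest lines of the
# assumed form "path [cat1,cat2]" into { path, cats } records.
sub parse_manifest {
    my (@lines) = @_;
    my @entries;
    foreach my $line (@lines) {
        # Skip anything that does not look like "path [categories]"
        next unless $line =~ /^\s*(\S+)\s+\[([^\]]*)\]\s*$/;
        my ($path, $cats) = ($1, $2);
        push @entries, { path => $path, cats => [ split /\s*,\s*/, $cats ] };
    }
    return @entries;
}

# Hypothetical helper: keep entries tagged with 'select' (if given) and
# drop entries tagged with 'filter' (if given); returns the chosen paths.
sub choose_samples {
    my ($entries, %opt) = @_;
    my @chosen;
    foreach my $e (@$entries) {
        my %has = map { $_ => 1 } @{ $e->{cats} };
        next if defined $opt{select} && !$has{ $opt{select} };
        next if defined $opt{filter} &&  $has{ $opt{filter} };
        push @chosen, $e->{path};
    }
    return @chosen;
}

# A few lines in the style of the SAMPLES manifest
my @manifest = (
    '../samples/data.rdf [mp3,dc]',
    '../samples/dcmi1.rdf [dc,edu]',
    '../samples/moo.rdf [moosw]',
);
my @entries = parse_manifest(@manifest);

print "select=dc:  ", join(' ', choose_samples(\@entries, select => 'dc')),  "\n";
print "filter=mp3: ", join(' ', choose_samples(\@entries, filter => 'mp3')), "\n";
```

With the three manifest lines above, select=dc keeps data.rdf and dcmi1.rdf, while filter=mp3 keeps dcmi1.rdf and moo.rdf.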
Alternatively, RDF data files (including Web URIs) can be used instead of the batch file mechanism. Use something like --data=../samples/data.rdf instead.
Here's how to invoke loadtest with the custom pseudo-prolog parser, running the node-folding load filter:
./loadtest --loadfilter=fold \
  --parser=http://ilrt.org/discovery/2001/04/tripler/ \
  --data=../samples/allfactoids.P \
  --op=load
Note the --parser=... option. If you care about the choice of RDF parser, specify it via its homepage, eg: --parser=http://cara.sourceforge.net/
If you just want to view the triples, not load them, use:
... --op=view instead of --op=load
perl ./aggtest
... should confirm that everything is in working order. Only tests marked 'todo' should fail.
./rdfweblet.t --module=juke
or
./moo.pl
...to run some code that exercises our RDF API.
The RDF API supported by this package is still evolving. We began with a simple single-method query interface, then implemented a version of the Mozilla RDF calls (see nsIRDFDataSource docs at Mozilla for background). In addition, a fairly handy node-centric API now exists, using Perl 'AUTOLOAD' trickery to reflect RDF properties into the Perl object model. For examples, see harmony.pl or memtest.pl.
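For readers unfamiliar with the AUTOLOAD technique mentioned above, here is a minimal sketch of the general idea: a method call that Perl cannot resolve is caught by AUTOLOAD and treated as a property lookup. The SketchNode class, its constructor arguments and the short property names are all invented for illustration; the real node API in this package may look quite different (see harmony.pl or memtest.pl for the actual usage).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class for illustration only; NOT the RDF::RDFWeb node API.
package SketchNode;

our $AUTOLOAD;

sub new {
    my ($class, $uri, $triples) = @_;
    # $triples is an array ref of [subject, property-name, object] entries
    return bless { uri => $uri, triples => $triples }, $class;
}

# Any method Perl cannot find ends up here; we treat its name as a property.
sub AUTOLOAD {
    my ($self) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;               # strip the package qualifier
    return if $name eq 'DESTROY';    # object teardown is not a property
    my @values = map  { $_->[2] }
                 grep { $_->[0] eq $self->{uri} && $_->[1] eq $name }
                 @{ $self->{triples} };
    # All values in list context, the first value (or undef) in scalar context
    return wantarray ? @values : $values[0];
}

package main;

# Made-up data standing in for a triple store
my @triples = (
    [ 'http://example.org/doc', 'title',   'A sample document' ],
    [ 'http://example.org/doc', 'creator', 'Alice' ],
    [ 'http://example.org/doc', 'creator', 'Bob' ],
);
my $node = SketchNode->new('http://example.org/doc', \@triples);

print "Title: ",    scalar($node->title),       "\n";
print "Creators: ", join(', ', $node->creator), "\n";
```

The appeal of this style is that no accessor has to be declared in advance: whatever properties appear in the data become callable methods on the node.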
The W3C Perllib code is used in a couple of ways: as an RDF parser (via RDF::RDFWeb::XRDFDataSource.pm) and as a query engine. The RDF::RDFWeb::SquishAlgae.pm package translates Squish queries into the Algae form used internally by Perllib. The Perllib query system is capable of allowing arbitrary RDF databases to be plugged in; this has yet to be explored fully here. See a proposal for amending the Perllib API to better support this sort of cross-implementation interface.
(More API documentation is needed! Sample code linked to the tests would be best...)
To take a look at RDF query (initially implemented as a wrapper around W3C Perllib), see the ../RDF/RDFWeb/SquishAlgae.pm module, or t/sqtest.pl. The script dcmitest.pl is the beginning of an RDF query testbed for the Dublin Core Architecture work. It uses the sample file samples/dcmiarch1.rdf, and the same query as the online Java testbed.
The following example shows a little Perl script that calls the RDF query system. doalgae() returns an array of hashes, where the keys correspond to the (numbered rather than named) columns in the result set. formatResultSet() is a convenience for debugging. We should probably wrap all this in Perl's DBI interfaces, since the basic structure is very similar to that of an SQL query. See the DC-Architecture testing page for links to the dataset and other sample queries.
#!/bin/perl
#
# tiny.pl - minimal script illustrating use of RDF query API
#
BEGIN {unshift @INC,('../../..','../..','..');}
use RDF::RDFWeb::SquishAlgae;
my $q1=
"SELECT ?x, ?title, ?a, ?moddate, ?createddate, ?name, ?creatormail
FROM http://rdfweb.org/people/danbri/2001/06/dcarch-test/dc3.rdf
WHERE
(dc::title ?x ?title)
(dcq::abstract ?x ?a)
(dcq::modified ?x ?m)
(dcq::created ?x ?cd)
(rdf::value ?m ?moddate)
(rdf::value ?cd ?createddate)
(dc::creator ?x ?cr)
(vcard::FN ?cr ?name)
(vcard::EMAIL ?cr ?creatormail)
USING dcq for http://dublincore.org/2000/03/13/dcq#
rdf for http://www.w3.org/1999/02/22-rdf-syntax-ns#
vcard for http://www.w3.org/2001/vcard-rdf/3.0#
dc for http://purl.org/dc/elements/1.1/
";
# Build the query object, translate the Squish query to Algae, and run it
my $q = RDF::RDFWeb::SquishAlgae->new;
my @results = $q->doalgae('../samples/dcmiarch1.rdf', $q->squish2algae($q1));
print $q->formatResultSet(@results) ."\n\n"; # simple pretty printer
# or to examine the result set in detail...
# (columns 0..6 follow the order of the seven variables in the SELECT clause)
foreach my $hit (@results) {
    printf ( "Resource URI: %s \tTitle: %s \tAbstract: %s
Last-modified date: %s \tCreated_Date: %s
Creator name: %s\tCreator email: %s \n\n",
        $$hit{0}, $$hit{1}, $$hit{2},
        $$hit{3}, $$hit{4},
        $$hit{5}, $$hit{6});
}
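Because the result rows are keyed by column number, it can be handy to relabel them with the variable names from the SELECT clause. The following is a small sketch of that idea; name_columns is a hypothetical helper (not part of RDF::RDFWeb::SquishAlgae), and the rows are made-up stand-ins shaped as the text describes (hash references keyed 0, 1, ...), so the sketch runs without the query engine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn rows keyed by column number into rows keyed
# by name, given the variable names in SELECT order.
sub name_columns {
    my ($names, @rows) = @_;
    my @named;
    foreach my $row (@rows) {
        my %h;
        for my $i (0 .. $#$names) {
            $h{ $names->[$i] } = $row->{$i};
        }
        push @named, \%h;
    }
    return @named;
}

# Stand-in for the rows returned by the query (values invented here)
my @results = (
    { 0 => 'http://example.org/doc', 1 => 'A sample document' },
);

my @named = name_columns([qw(x title)], @results);
print "$named[0]{x}: $named[0]{title}\n";
```

A DBI-style wrapper, as suggested above, would presumably offer something similar through named-column fetches.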
To investigate node-folding (fancier aggregation facilities) see ./loadtest --loadfilter=fold (not working yet).
To create your own RDF-based scripts, copy from aggtest or similar.
These tools can be used alongside systems such as GnuPG, which can provide evidence that data has been passed across the public networks without being interfered with.
The GNU Privacy Guard (GPG) package is available from the GNUPG site; RPM packages are also available.
(This assumes familiarity with GPG or PGP, that you have generated a public/private key pair, etc.)
Signing some RDF:
gpg --detach-sign -a dcmi1.rdf
Verifying some RDF (you'll need the public key of the signer):
gpg --verify dcmi1.rdf.asc dcmi1.rdf
If the RDF hasn't been altered since it was signed, you'll see something like:
[pldab@fireball samples]$ gpg --verify dcmi1.rdf.asc dcmi1.rdf
gpg: Signature made Thu Jun 28 19:00:14 2001 BST using DSA key ID 73228FE4
gpg: Good signature from "Dan Brickley <danbri@w3.org>"
gpg: aka "Dan Brickley <daniel.brickley@bristol.ac.uk>"
If the data has been altered since signing:
[pldab@fireball samples]$ gpg --verify dcmi1.rdf.asc dcmi1.rdf
gpg: Signature made Thu Jun 28 19:00:14 2001 BST using DSA key ID 73228FE4
gpg: BAD signature from "Dan Brickley <danbri@w3.org>"
Note that this technique operates over the XML serialisation of RDF, and not over the data structure it encodes. We are not even treating the data as XML; instead, it is merely processed as a sequence of bytes. A complementary approach can be found in W3C's XML Signature spec, which allows sub-sections of XML documents to be signed. There is also related work on XML encryption.
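The practical consequence of signing bytes rather than triples can be seen with a plain digest. The sketch below uses SHA-256 (from the core Digest::SHA module) purely as a stand-in for the hash step inside a signature; it does not call GPG. The XML fragment is invented, and the two strings differ only in indentation, so they encode the same data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# An invented RDF/XML fragment, for illustration only
my $original = qq{<rdf:Description rdf:about="http://example.org/doc">\n}
             . qq{  <dc:title>Sample</dc:title>\n}
             . qq{</rdf:Description>\n};

# Same content, but the two-space indent is replaced with a tab
(my $reindented = $original) =~ s/^  /\t/m;

my $d1 = sha256_hex($original);
my $d2 = sha256_hex($reindented);

print "original:   $d1\n";
print "reindented: $d2\n";
print "digests ", ($d1 eq $d2 ? "match" : "differ"), "\n";
```

The digests differ, so a detached signature made over the first form would not be expected to verify against the second, even though an RDF parser would produce the same triples from both.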
The example above is run from the command line. For programmatic access, the GnuPG::Interface CPAN module offers a Perl interface to the GPG tools.
perl -MCPAN -e shell
cpan> install GnuPG::Interface
perldoc GnuPG::Interface
See the GnuPG::Interface documentation for example code showing how to encrypt, decrypt etc.
See also the Perl5-GPG interfaces, available from SourceForge.
The above methods provide low-level support for signed and/or encrypted RDF. Using this approach, you can have some assurance that a set of RDF triples was signed (in XML form) by the agency that owns (or stole...) a particular GPG key. Other mechanisms are needed: for example, we want to be able to find (reliably; securely) PGP keys associated with particular individuals or groups. We need a lot of supporting infrastructure before signed metadata can be useful. RDF may prove useful in developing some of this infrastructure; for example, we can represent reciprocal key-signing 'web of trust' graphs in RDF (and sign these, and exchange them...).
Further reading:
Formal tests:
./aggtest
(we should be adding to these...)
./rdfweblet.t will do this eventually
Informal tests:
EXAMPLES
A shell script containing a few tests, for example:
./loadtest --op=view --data=../samples/allfactoids.P \
--parser=http://ilrt.org/discovery/2001/04/tripler/ --loadfilter=fold
SAMPLES
A simple manifest file listing (and categorising) sample data files
nearby. It looks like this:
../samples/data.rdf [mp3,dc]
../samples/data2.rdf [mp3,dc]
../samples/dcmi1.rdf [dc,edu]
../samples/meerkat.swp.rdf [swipe,dc,rss]
../samples/sm-data.rdf [rdfweb,dc]
../samples/xmlhack.rss [rss,dc]
../samples/moo.rdf [moosw]
../samples/verbmoo.rdf [moosw]
STATE
Used (poorly) by the database loader scripts to keep track of the
currently loaded datasources.
TODO
Somewhat rambling textual narrative, relating bugfixes, works in
progress.
loadtest
One of the main programs. 'loadtest' imports RDF data into a
persistent triple store, and can use multiple parsers.
channel.w
data.t
person.w
rdfsview.w
swipe.w
mp3.w
Various Perl script fragments used by rdfweblet.t for rendering RDF.
moo.pl
A sample application, developed as a testbed for API features.
This is basically an RDF reimplementation of the MOO system.
mpinfo.pl
Generates RDF describing an MP3 collection, given a directory name
and base URI. In progress.
rdfweb-todo.txt
Functions extracted from the original Perl RDFWeb application
and that need re-coding using the new API.
rdfweblet.t
An umbrella script that runs various tests based on '--module=xyz'
arg passed in on commandline. Some tests are interactive, the rest
should be integrated into more formal testing.
reloadall.sh
Should be obsolete. This blanks the ./tmp/ databases and reloads a
number of our sample RDF files.
squish.txt
A sample Squish query: to be moved.
test.pl
The perl script that is supposed to run the test harness stuff
tmp
Home for our default (transient) on-disk store.
w3perllib.t
Obsolete. Old Perllib-specific wrapper script.