README for RDF::RDFWeb Perl packages

Author: Dan Brickley
This version: $Id: README.html,v 1.18 2001/11/14 10:37:29 nr8262 Exp $

This document describes the installation and testing of some Perl tools for creating, querying and aggregating RDF data.

See also: RDF::RDFWeb homepage, STATUS.html

TODO: distinguish utilities from tests (e.g. loadtest has grown into a utility application).

Summary: to understand how the system works, look at loadtest, aggtest, harmony.pl and rdfweblet.t.

Getting Started: Download and Installation

  1. Install base Perl packages
  2. Install additional RDF tools (Perllib, Cara, etc.) to provide query and RDF parser support
  3. Experiment...

    1. Basic installation

    At a minimum, you'll need the RDF::RDFWeb Perl packages. These should be available (by CVS or tarball) from http://ilrt.org/discovery/2001/06/rdfperl/.

    Additional packages

    To do anything interesting, you'll probably want an RDF parser too, so you can load RDF data from online sources. Two parsers are currently supported: Cara and W3C Perllib. Work is in progress on wrapping Redland, which itself encapsulates a number of parsers. See the Cara website for information on compiling and installing the Cara system.

    Note: these instructions (and some of the code) currently assume a Unix-like environment. The changes needed to run this under Win32 or other systems should be minimal (success has been reported under MacOS X, for example).

    Installing W3C Perllib (for RDF Parser and RDF query support)

    Because it is a pure-Perl system, W3C Perllib offers a relatively easy way to get an RDF parser up and running.

    W3C Perllib can be installed from CVS:

        cvs -d :pserver:anonymous@dev.w3.org:/sources/public login
        # at the password prompt, enter: anonymous
    
        cvs -d :pserver:anonymous@dev.w3.org:/sources/public -z3 checkout perl/modules 
    

    You'll need to make sure the Perllib files can be found by the RDF::RDFWeb package. Since all this code is under active development, it may be best to avoid a system-wide installation; instead, the library is set up to look in '../..' for related tools. Assuming Perllib has been checked out alongside the rudolf-perl/ directory (i.e. 'rudolf-perl/' and 'perl/' are in the same directory), a simple symlink will do the job:

        ln -s perl/modules/W3C W3C
    

    Start Experimenting...

    The following notes show how to create a database on disk, populated with RDF from our samples or your own data.

    Using 'loadtest'

    The sample utility, 'loadtest', exercises the basic APIs for parsing and loading RDF data.

        ./loadtest --batch=SAMPLES --op=load --parser=http://www.w3.org/Perllib/
    
    or...
    
        ./loadtest --batch=SAMPLES --op=load --parser=http://cara.sourceforge.net/
    
    
        If you omit the --parser=..., the current default is to use Cara (or
        fail, if Cara is not installed).
    
    

    Or, more selectively, load a subset of the files:

         ./loadtest --batch=SAMPLES --op=load --select=moosw
    

    ... will create a test database on disk under ./tmp/ and populate it with the aggregate of the sample files listed in the file 'SAMPLES'.

    Load filtering: use --select=mp3 (or dc, or any other category) to load only those samples that fall into that category, or --filter=mp3 to exclude a category.
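    The manifest format (see SAMPLES in the file tree overview) pairs each data file with a bracketed list of categories, so --select and --filter amount to keeping or dropping entries by tag. The following is a minimal stand-alone sketch of that idea, not the code loadtest actually uses; the helpers parse_manifest and choose_samples are hypothetical, and applying both options together is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse SAMPLES-style lines such as
#   ../samples/data.rdf [mp3,dc]
# into records of the form { file => ..., cats => { mp3 => 1, dc => 1 } }.
sub parse_manifest {
    my @lines = @_;
    my @entries;
    foreach my $line (@lines) {
        next unless $line =~ /^\s*(\S+)\s*\[([^\]]*)\]/;   # skip blank or untagged lines
        my ($file, $catlist) = ($1, $2);
        my %cats = map { $_ => 1 } grep { length } split /\s*,\s*/, $catlist;
        push @entries, { file => $file, cats => \%cats };
    }
    return @entries;
}

# Hypothetical helper: keep entries tagged with $select (if given),
# then drop entries tagged with $filter (if given); return file names.
sub choose_samples {
    my ($entries, $select, $filter) = @_;
    my @chosen = @$entries;
    @chosen = grep { $_->{cats}{$select} } @chosen if defined $select;
    @chosen = grep { !$_->{cats}{$filter} } @chosen if defined $filter;
    return map { $_->{file} } @chosen;
}

# A few lines taken from the SAMPLES manifest.
my @manifest = (
    '../samples/data.rdf [mp3,dc]',
    '../samples/dcmi1.rdf [dc,edu]',
    '../samples/moo.rdf [moosw]',
    '../samples/verbmoo.rdf [moosw]',
);
my @entries = parse_manifest(@manifest);

# Like --select=moosw: prints moo.rdf and verbmoo.rdf
print join("\n", choose_samples(\@entries, 'moosw', undef)), "\n";
# Like --filter=mp3: prints dcmi1.rdf, moo.rdf and verbmoo.rdf
print join("\n", choose_samples(\@entries, undef, 'mp3')), "\n";
```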

    Alternatively, individual RDF data files (including Web URIs) can be loaded instead of using the batch file mechanism, e.g. --data=../samples/data.rdf.

    Here's how to invoke loadtest with the custom pseudo-prolog parser, running the node-folding load filter:

        ./loadtest \
            --parser=http://ilrt.org/discovery/2001/04/tripler/ \
            --data=../samples/allfactoids.P \
            --loadfilter=fold \
            --op=load
    

    Note the --parser=... option. If you care which RDF parser is used, specify it by its homepage URI, e.g. --parser=http://cara.sourceforge.net/

    If you just want to view the triples rather than load them, use:

       
        ...  --op=view instead of --op=load
    

    2. Run the basic tests

        perl ./aggtest
    

    ... should confirm that everything is in working order. Only tests marked 'todo' should fail.

    3. Run a test application

        ./rdfweblet.t --module=juke
    or
        ./moo.pl
    

    ...to run some code that exercises our RDF API.

    RDF APIs

    The RDF API supported by this package is still evolving. We began with a simple single-method query interface, then implemented a version of the Mozilla RDF calls (see nsIRDFDataSource docs at Mozilla for background). In addition, a fairly handy node-centric API now exists, using Perl 'AUTOLOAD' trickery to reflect RDF properties into the Perl object model. For examples, see harmony.pl or memtest.pl.
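    To give a feel for the AUTOLOAD technique, here is a self-contained toy version over an in-memory list of triples. It is not the RDF::RDFWeb API (see harmony.pl or memtest.pl for that); the package name ToyNode, the short property names and the example triples are all hypothetical. Any method that is not defined falls through to AUTOLOAD, which treats the method name as an RDF property and looks up its values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ToyNode;    # hypothetical; not part of RDF::RDFWeb
our $AUTOLOAD;

# Triples are [subject, property, object]; short property names are used
# here for readability (a real store would use full property URIs).
sub new {
    my ($class, $uri, $triples) = @_;
    return bless { uri => $uri, triples => $triples }, $class;
}

# Unknown method calls such as $node->title land here. The method name
# is taken as a property name and matched against this node's triples.
sub AUTOLOAD {
    my $self = shift;
    (my $prop = $AUTOLOAD) =~ s/.*:://;    # strip the package prefix
    return if $prop eq 'DESTROY';          # object destruction is not a property
    my @values = map  { $_->[2] }
                 grep { $_->[0] eq $self->{uri} && $_->[1] eq $prop }
                 @{ $self->{triples} };
    # list context: all values; scalar context: the first (or undef)
    return wantarray ? @values : $values[0];
}

package main;

my @triples = (
    ['http://example.org/doc1', 'title',   'A test document'],
    ['http://example.org/doc1', 'creator', 'Dan'],
    ['http://example.org/doc1', 'subject', 'RDF'],
    ['http://example.org/doc1', 'subject', 'Perl'],
);

my $doc = ToyNode->new('http://example.org/doc1', \@triples);
print "Title: ", scalar($doc->title), "\n";
print "Subjects: ", join(', ', $doc->subject), "\n";
```

    One design caveat with this style: property names that collide with real methods (new here, or UNIVERSAL's can and isa) never reach AUTOLOAD.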

    Perllib interface

    The W3C Perllib code is used in two ways: as an RDF parser (via RDF::RDFWeb::XRDFDataSource.pm) and as a query engine, where the RDF::RDFWeb::SquishAlgae.pm package translates Squish queries into the Algae form used internally by Perllib. The Perllib query system allows arbitrary RDF databases to be plugged in; this has yet to be fully explored here. See a proposal for amending the Perllib API to better support this sort of cross-implementation interface.

    (More API documentation is needed! Sample code linked to the tests would be best...)

    4. RDF Query

    To take a look at RDF query (initially implemented as a wrapper around W3C Perllib), see the ../RDF/RDFWeb/SquishAlgae.pm module, or t/sqtest.pl. The script dcmitest.pl is the beginning of an RDF query testbed for the Dublin Core Architecture work. It uses the sample file samples/dcmiarch1.rdf, and the same query as the online Java testbed.

    The following example shows a little Perl script that calls the RDF query system. doalgae() returns an array of hashes, where the keys correspond to the (numbered rather than named) columns in the result set. formatResultSet() is a convenience for debugging. Since the basic structure is very similar to an SQL query, we should probably wrap all this in Perl's DBI interface. See the DC-Architecture testing page for links to the dataset and other sample queries.

    
    #!/usr/bin/perl
    #
    # tiny.pl - minimal script illustrating use of RDF query API
    #
    use strict;
    use warnings;
    
    # look for RDF::RDFWeb (and W3C Perllib) in nearby directories
    BEGIN { unshift @INC, ('../../..', '../..', '..'); }
    
    use RDF::RDFWeb::SquishAlgae;
    
    my $q1= 
    "SELECT ?x, ?title, ?a, ?moddate, ?createddate, ?name, ?creatormail
    FROM 	http://rdfweb.org/people/danbri/2001/06/dcarch-test/dc3.rdf
    WHERE
    	(dc::title ?x ?title)
    	(dcq::abstract ?x ?a)
    	(dcq::modified ?x ?m)
    	(dcq::created ?x ?cd)
    	(rdf::value ?m ?moddate)
    	(rdf::value ?cd ?createddate)
    	(dc::creator ?x ?cr)
    	(vcard::FN ?cr ?name)
    	(vcard::EMAIL ?cr ?creatormail)
    USING 	dcq for http://dublincore.org/2000/03/13/dcq#
    	rdf for http://www.w3.org/1999/02/22-rdf-syntax-ns#
    	vcard for http://www.w3.org/2001/vcard-rdf/3.0#
    	dc for http://purl.org/dc/elements/1.1/
    ";
     
    my $q = RDF::RDFWeb::SquishAlgae->new;
    my @results = $q->doalgae('../samples/dcmiarch1.rdf', $q->squish2algae($q1));
    print $q->formatResultSet(@results) . "\n\n"; # simple pretty printer
    
    # or to examine the result set in detail...
    # keys are column numbers: 0 is ?x, 1 is ?title, ... 6 is ?creatormail
    
    foreach my $hit (@results) {
      printf("Resource URI: %s\tTitle: %s\tAbstract: %s\n"
           . "\tLast-modified date: %s\tCreated date: %s\n"
           . "\tCreator name: %s\tCreator email: %s\n\n",
             @{$hit}{0 .. 6});
    }
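    Because doalgae() numbers its columns, it can be handy to recover the variable names from the query's SELECT clause, and full property URIs from its USING clause. The sketch below does this with simple regular expressions over the query text. squish_columns, squish_prefixes and expand_qname are hypothetical helpers, not part of SquishAlgae.pm, and the sketch assumes that result columns follow the order of the SELECT list, as the tiny.pl example above does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: variable names in a Squish SELECT clause, in order,
# without the leading '?'.
sub squish_columns {
    my ($query) = @_;
    my ($select) = $query =~ /SELECT\s+(.*?)\s+FROM\b/s or return;
    return map { (my $v = $_) =~ s/^\?//; $v } split /\s*,\s*/, $select;
}

# Hypothetical helper: map each USING prefix to its namespace URI.
sub squish_prefixes {
    my ($query) = @_;
    my ($using) = $query =~ /USING\s+(.*)/s or return {};
    my %ns;
    while ($using =~ /(\w+)\s+for\s+(\S+)/g) {
        $ns{$1} = $2;
    }
    return \%ns;
}

# Expand 'prefix::name' to a full property URI; unknown prefixes are left alone.
sub expand_qname {
    my ($ns, $qname) = @_;
    my ($prefix, $local) = split /::/, $qname, 2;
    return exists $ns->{$prefix} ? $ns->{$prefix} . $local : $qname;
}

my $query = <<'END';
SELECT ?x, ?title, ?name
FROM   http://rdfweb.org/people/danbri/2001/06/dcarch-test/dc3.rdf
WHERE  (dc::title ?x ?title)
       (dc::creator ?x ?cr)
       (vcard::FN ?cr ?name)
USING  dc for http://purl.org/dc/elements/1.1/
       vcard for http://www.w3.org/2001/vcard-rdf/3.0#
END

my @cols = squish_columns($query);
my $ns   = squish_prefixes($query);

# A made-up row shaped like a doalgae() result (numbered keys).
my %row   = (0 => 'http://example.org/doc1', 1 => 'A title', 2 => 'Dan');
my %named = map { $cols[$_] => $row{$_} } 0 .. $#cols;

print "$_ = $named{$_}\n" for @cols;          # x = ..., title = ..., name = Dan
print expand_qname($ns, 'dc::title'), "\n";   # http://purl.org/dc/elements/1.1/title
```

    Named keys like these would also be a natural first step towards the DBI-style wrapper suggested above.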
    
    
    

    5. Node Folding

    To investigate node-folding (a more sophisticated form of aggregation), see ./loadtest --loadfilter=fold (not working yet).

    6. Getting started

    To create your own RDF-based scripts, start by copying aggtest or a similar script.

    Using Digital Signatures with RDF

    These tools can be used alongside systems such as GnuPG, which can provide evidence that data has not been tampered with while passing across public networks.

    The GNU Privacy Guard (GPG) package is available from the GNUPG site; RPM packages are also available.

    Quick overview

    (This assumes familiarity with GPG or PGP, and that you have already generated a public/private key pair, etc.)

    Signing some RDF:
         gpg --detach-sign -a dcmi1.rdf
    
    Verifying some RDF (you'll need the signer's public key):
         gpg --verify dcmi1.rdf.asc dcmi1.rdf      
    

    If the RDF hasn't been altered since it was signed, you'll see something like:

        [pldab@fireball samples]$ gpg --verify dcmi1.rdf.asc dcmi1.rdf
        gpg: Signature made Thu Jun 28 19:00:14 2001 BST using DSA key ID 73228FE4
        gpg: Good signature from "Dan Brickley <danbri@w3.org>"
        gpg:                 aka "Dan Brickley <daniel.brickley@bristol.ac.uk>"
    

    If the data has been altered since signing:

        [pldab@fireball samples]$ gpg --verify dcmi1.rdf.asc dcmi1.rdf
        gpg: Signature made Thu Jun 28 19:00:14 2001 BST using DSA key ID 73228FE4
        gpg: BAD signature from "Dan Brickley <danbri@w3.org>"   
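    In a script, the same check can be made by running gpg --verify and testing its exit status, which is 0 when the signature verifies and non-zero otherwise. A minimal sketch using Perl's built-in system(); the function name verify_rdf is hypothetical, and it assumes gpg is on the PATH and the signer's public key is already in your keyring.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $sig is a good detached signature for $data.
# Assumes gpg is on the PATH and the signer's public key is already in the
# keyring. gpg also exits 0 for a good signature from a key you have not
# certified, so this checks integrity rather than who the signer really is.
sub verify_rdf {
    my ($sig, $data) = @_;
    # The list form of system() runs gpg directly, without a shell, so
    # file names are passed through untouched.
    my $status = system('gpg', '--batch', '--quiet', '--verify', $sig, $data);
    return $status == 0;    # -1 if gpg could not be started, >0 on failure
}

my ($sig, $data) = @ARGV == 2 ? @ARGV : ('dcmi1.rdf.asc', 'dcmi1.rdf');
print verify_rdf($sig, $data) ? "Good signature\n" : "Verification FAILED\n";
```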
    

    Note that this technique operates over the XML serialisation of RDF, not over the data structure it encodes. We are not even treating the data as XML; it is merely processed as a sequence of bytes. A complementary approach can be found in W3C's XML Signature spec, which allows sub-sections of XML documents to be signed. There is also related work on XML encryption.
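    The byte-level point is easy to demonstrate: two RDF/XML documents that encode the same triple but are laid out differently have different digests, so a signature made over one will not verify against the other. This sketch uses the core Digest::SHA module to show the effect; the two RDF/XML snippets are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Two made-up serialisations of the same single triple
# (doc1, dc:title, "Test"): one uses a property attribute,
# the other a property element.
my $rdf_a = <<'END';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc1" dc:title="Test"/>
</rdf:RDF>
END

my $rdf_b = <<'END';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://example.org/doc1">
<dc:title>Test</dc:title>
</rdf:Description>
</rdf:RDF>
END

# Signatures over files are computed from a hash of their bytes,
# so only the byte sequences matter here, not the triples.
my $digest_a = sha256_hex($rdf_a);
my $digest_b = sha256_hex($rdf_b);
print "A: $digest_a\nB: $digest_b\n";
print $digest_a eq $digest_b
    ? "same bytes\n"
    : "different bytes: a signature over A will not verify B\n";
```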

    The example above is run from the command line. For programmatic access, the GnuPG::Interface CPAN module offers a Perl interface to GPG:

    	perl -MCPAN -e shell
    	cpan> install GnuPG::Interface  
    	perldoc GnuPG::Interface     
    
    

    See the GnuPG::Interface documentation for example code showing how to encrypt, decrypt etc.

    See also the Perl5-GPG interfaces, available from SourceForge.

    The above methods provide low-level support for signed and/or encrypted RDF. Using this approach, you can have some assurance that a set of RDF triples was signed (in XML form) by the agency that owns (or stole...) a particular GPG key. Other mechanisms are needed too: for example, we want to be able to find PGP keys associated with particular individuals or groups reliably and securely. A lot of supporting infrastructure is needed before signed metadata can be useful. RDF may prove useful in developing some of this infrastructure; for example, we can represent reciprocal key-signing 'web of trust' graphs in RDF (and sign these, and exchange them...).
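    As a toy illustration of that last idea, key-signing relationships can be held as triples and queried like any other RDF. The 'signed' property and the key:... identifiers below are hypothetical (no particular vocabulary is implied); the sketch finds pairs of keys that have signed each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical triples of the form [signer, 'signed', signee]. A real
# application would use full URIs for keys and an agreed RDF vocabulary.
my @triples = (
    ['key:alice', 'signed', 'key:bob'],
    ['key:bob',   'signed', 'key:alice'],
    ['key:bob',   'signed', 'key:carol'],
    ['key:carol', 'signed', 'key:dave'],
    ['key:dave',  'signed', 'key:carol'],
);

# Index the 'signed' edges: $signed{$signer}{$signee} = 1.
my %signed;
$signed{ $_->[0] }{ $_->[2] } = 1 for grep { $_->[1] eq 'signed' } @triples;

# A pair is reciprocal when each key has signed the other. Requiring the
# first key to sort before the second reports each pair only once.
my @reciprocal;
foreach my $signer (sort keys %signed) {
    foreach my $signee (sort keys %{ $signed{$signer} }) {
        next unless $signer lt $signee;
        # exists() avoids autovivifying an empty entry for $signee
        push @reciprocal, [$signer, $signee]
            if exists $signed{$signee} && $signed{$signee}{$signer};
    }
}

print "$_->[0] <-> $_->[1]\n" for @reciprocal;
```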


    Overview of Filetree

    Formal tests:
    
      ./aggtest
    
    (we should be adding to these...)
    ./rdfweblet.t will do this eventually
    
    Informal tests:
    
    EXAMPLES
        
    A shell script containing a few tests, e.g.:
        ./loadtest --op=view --data=../samples/allfactoids.P \
        --parser=http://ilrt.org/discovery/2001/04/tripler/ --loadfilter=fold 
    
    
    
    SAMPLES
        A simple manifest file listing (and categorising) sample data files
        nearby. It looks like this:
    
        ../samples/data.rdf [mp3,dc]
        ../samples/data2.rdf [mp3,dc]
        ../samples/dcmi1.rdf [dc,edu]
        ../samples/meerkat.swp.rdf [swipe,dc,rss]
        ../samples/sm-data.rdf [rdfweb,dc]
        ../samples/xmlhack.rss [rss,dc]
        ../samples/moo.rdf [moosw]
        ../samples/verbmoo.rdf [moosw]
         
    
    
    STATE
        Used (poorly) by the database loader scripts to keep track of the
        currently loaded datasources.
    
    
    TODO
        Somewhat rambling textual narrative, relating bugfixes, works in
        progress.
    
    loadtest
        One of the main programs. 'loadtest' imports RDF data into a
        persistent triple store, and can use multiple parsers.
    
        
    channel.w
    data.t
    person.w
    rdfsview.w
    swipe.w
    mp3.w
        Various Perl script fragments used by rdfweblet.t for rendering RDF.
    
    moo.pl
        A sample application, developed as a testbed for API features.
        This is basically an RDF reimplementation of the MOO system.
    
    mpinfo.pl 
        Generates RDF describing an MP3 collection, given a directory name
        and base URI. In progress.
    
    rdfweb-todo.txt
        Functions extracted from the original Perl RDFWeb application
        that need re-coding using the new API.
    
    rdfweblet.t
        An umbrella script that runs various tests based on the '--module=xyz'
        argument passed on the command line. Some tests are interactive; the
        rest should be integrated into more formal testing.
    
    reloadall.sh
        Should be obsolete. This blanks the ./tmp/ databases and reloads a
        number of our sample RDF files.
    
    squish.txt
        A sample Squish query: to be moved.
    
    test.pl
        The Perl script that is intended to run the test harness.
    
    tmp
        Home for our default (transient) on-disk store.
    
    w3perllib.t
        Obsolete. Old Perllib-specific wrapper script.