IMeshTk

Subject Gateway Requirements - questionnaire results

Libby Miller
Monica Bonett
May 2001

Background

A detailed questionnaire was sent to the IMesh toolkit mailing list in March 2001 to inform the work of the project. The questionnaire was based on the work done in the ImeshTk: Subject Gateway Review Literature Review and the ImeshTk: Subject Gateway Review plan.

The raw questionnaire is here. It asks mostly technical questions about the operating systems and software used and the reasons why certain configurations were chosen. In addition, it asks about the sorts of features subject gateways would like, such as spellcheckers, metadata editors, cross-searching capability, and so on. It also asks about expectations of future technological developments.

The questionnaire was designed in consultation with subject gateway experts. Debra Hiom, Nicky Ferguson and Dan Brickley reviewed early versions of the questionnaire. A later version was reviewed by Lorcan Dempsey and Debbie Campbell.

The questionnaire was sent to the IMesh list and also to specific gateways. 11 gateways responded; a mix of small and large, established and new. The raw results are available here.

Results summary

A very wide variety of platforms, languages and software components were used (even in this small sample), although most used a variant of Unix as an operating system.

Respondents who used an integrated package for their service all used ROADS. Perhaps as a result of this, Whois++ was the most-used protocol. Several did not use any search protocol other than HTTP or SQL.

Of the rest, a wide variety of homemade, open source and proprietary components were used. These were not connected through documented APIs, but were joined together with custom software, often written in Perl. Zebra and SQL databases were used by some as backend databases.

Almost all respondents found free, open source solutions important. There was an interest in personalization.

Results overview

Operating system and software

The majority used UNIX variants or clones (8) to run custom code (6) or ROADS (3), or both (1). ROADS was the only integrated solution used. (see results)

ROADS users gave various reasons for using it instead of custom code, including that there was a well-established user base for ROADS, and that it was simpler and faster to set up a ROADS installation than to start from scratch. (see results)

Custom solutions were described as hard to maintain and document, and often reliant on the skills of a particular person, but were more flexible. (see results)

Where a custom solution was used, it was mostly run from a backend SQL database (3). (see results)

Components were written in a variety of languages: 4 used Perl, 2 Bash, 2 SQL, 2 Tcl, 3 C, 3 Java, 1 VBScript. All but one used more than one language. (see results)

Protocols

A wide mixture of protocols was supported, and in some cases more than one was supported by the same system. Z39.50 and Whois++ were popular choices: 5 systems used Whois++ and 2 used Z39.50. Some systems did not use a search protocol except for SQL query and API, and 3 had no protocol except for HTTP to present the results to the user. (see results)

APIs

All except one had no answer to this question (3 N/A, 6 did not answer). ADO was used by On-Tyne. (see results)

Free/Open source

Free software was quite important to the majority and very important to four. There were concerns about the technical requirements placed on staff by complex technologies, and support was seen as important. Quality was mentioned in two answers as a guiding principle. (see results)

Open source was explicitly seen as important by six respondents. Various reasons were given: ability to modify (1), reliability and stability (2), lack of lock-in to a supplier (1), and community support (2). (see results)

Document formats

8 used the IAFA templates that ROADS uses. 4 used Dublin Core, 1 GILS and 1 IMS/IEEE LOM in XML. (see results)

Technical expertise

Technical expertise ranged from several individuals to no dedicated technical support. A wide variety of languages and skills were covered; Perl was the most popular language (5). (see results)

Functionality

A wide range of functionality was desired, but there was little consensus apart from the existing basic functionality of a metadata editor, searching and browsing.

The exceptions were harvesting, which was desired by 3 who did not already have the functionality (5 did); classification support tools, which were on 4 wishlists; personalization, on 7; and tools for crosswalks between schemas, on 5.

Future technologies

XML was thought to be an important future technology, as were Dublin Core XHTML, Z39.50, RSS and LDAP. Whois++ was the only technology on which there was consensus that it would not be important in the future. (see results)

Content

All except one of the gateways described only webpages. One gateway described images and video, and another thought it might in the future. (see results)

Sharing content

4 already shared content with other gateways or OPACs; 4 did not (see results). Those that did not saw no significant barriers to doing so, and it was something they would do in the future. (see results)

Subject gateway problems

At one end of the scale, the major problem for one gateway was choosing the right platform and database to run a gateway. At the other were two gateways whose main concern was scalability. The lack of technical support and help was the main issue for two gateways.

Conclusions

There were some difficulties with the approach used with this questionnaire and the associated results, so the results here should be treated with caution. In particular, in attempting to address the needs of the IMesh community, the questionnaire probably was not distributed widely enough. By aiming it primarily at members of the IMesh list, we probably got a result biased towards ROADS users.

The second difficulty was the length of the questionnaire, which in retrospect was too long - people found it off-putting and the response rate was very low (about 11/200).

Some of the questions were ambiguously phrased and so produced ambiguous results (for example, those where Y/N/wishlist responses were required did not say whether a 'no' response precluded a 'wishlist' response or vice versa).

Nevertheless, this survey produced an interesting set of results. It is surprising how diverse the platforms, technologies and interests of this group were.

A few things stick out: