Greenstone tutorial exercise
Open Archives Initiative (OAI) collection
This exercise explores service-level interoperability using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). So that you can do this on a stand-alone computer, we do not actually connect to the external server that acts as the data provider. Instead, we have provided a set of files that take the form of XML records produced by OAI-PMH.
One of Greenstone's documented example collections is sourced over OAI. This exercise takes you through the steps necessary to reconstruct it. You may wish to take a look at the documented example collection OAI demo now to see what this exercise will build.
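To make the shape of such records concrete, here is a minimal Python sketch that reads the Dublin Core fields out of one OAI-PMH record. The namespace URIs are those defined by OAI-PMH 2.0 and Dublin Core; the record itself and the function name dublin_core_fields are made up for illustration and are not taken from the sample files, whose exact layout may differ.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up record for illustration only; not copied from the sample files.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:photo-001</identifier>
    <datestamp>2004-01-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:identifier>photo-001.jpg</dc:identifier>
      <dc:subject>Conference reception</dc:subject>
      <dc:description>Delegates talking in the foyer</dc:description>
    </oai_dc:dc>
  </metadata>
</record>
"""


def dublin_core_fields(record_xml):
    """Return {element name: [values]} for the dc:* elements in one record."""
    root = ET.fromstring(record_xml)
    container = root.find("oai:metadata/oai_dc:dc", NS)
    fields = {}
    if container is None:
        return fields
    for element in container:
        # Tags look like "{namespace}name"; split off the local name.
        namespace, _, local_name = element.tag.rpartition("}")
        if namespace.lstrip("{") == NS["dc"]:
            fields.setdefault(local_name, []).append((element.text or "").strip())
    return fields


if __name__ == "__main__":
    for name, values in dublin_core_fields(SAMPLE_RECORD).items():
        print(name, values)
```

Fields such as identifier, subject and description in a record like this correspond to the ex.dc.Identifier, ex.dc.Subject and ex.dc.Description metadata used later in the exercise.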
- Start a new collection called OAI Service Provider. Fill out the fields with appropriate information.
- In the Gather panel, locate the folder sample_files → oai → sample_small → oai. Drag this folder into the collection and drop it there.
- During the copy operation, a popup window may appear asking whether to add OAIPlugin to the list of plug-ins used in the collection, because the Librarian Interface has not found an existing plug-in that can handle this file type. Press the <Add Plugin> button to include it.
The files for this collection consist of a set of images (in JCDLPICS → srcdocs) and a set of OAI records (in JCDLPICS) which contain metadata for the images.
When files are copied across like this, the Librarian Interface examines each one and uses its filename extension to check whether the collection contains a corresponding plug-in. None of the plug-ins in the list can process the OAI records being copied across (they have the file extension .oai), so the Librarian Interface prompts you to add the appropriate plug-in.
Sometimes there is more than one plug-in that could process a file—for example, the .xml extension is used for many different XML formats. The popup window, therefore, offers a choice of all possible plug-ins that matched. It is normally easy to determine the correct choice. If you wish, you can ignore the prompt (click <Don't Add Plugin>), because plug-ins can be added later, in the Document Plugins section of the Design panel.
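The extension check described above can be pictured with the following Python sketch. It is a simplification under stated assumptions, not the Librarian Interface's actual code: the lookup table, the function name plugins_to_offer and the two .xml plug-in names are hypothetical placeholders. Only the pairing of .oai files with OAIPlugin and of the images with ImagePlugin comes from this exercise.

```python
import os

# Hypothetical lookup table of extensions and the plug-ins that claim them.
PLUGINS_BY_EXTENSION = {
    ".oai": ["OAIPlugin"],
    ".jpg": ["ImagePlugin"],
    # Placeholder names standing in for "several XML-based plug-ins".
    ".xml": ["XMLFormatAPlugin", "XMLFormatBPlugin"],
}


def plugins_to_offer(filename, collection_plugins):
    """Return the plug-ins to suggest for a file, or [] if no prompt is needed."""
    extension = os.path.splitext(filename)[1].lower()
    candidates = PLUGINS_BY_EXTENSION.get(extension, [])
    # No prompt when the collection already has a plug-in for this extension.
    if any(plugin in collection_plugins for plugin in candidates):
        return []
    return candidates


if __name__ == "__main__":
    current = {"ImagePlugin"}
    print(plugins_to_offer("01dla14.oai", current))  # one candidate to add
    print(plugins_to_offer("01dla14.jpg", current))  # already handled
    print(plugins_to_offer("records.xml", current))  # several candidates
```

When more than one candidate comes back, as for the .xml file here, the choice is left to the user, which is the situation the popup window handles.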
- You will need to specify which document the OAI metadata values should be attached to. In the Design panel, select the Document Plugins section, then select the OAIPlugin and click <Configure Plugin...>. Locate the document_field option in the popup window and type ex.dc.Identifier (it may not be available in the drop-down list until after building). Click <OK>.
- You also need to configure the image plug-in. Select the ImagePlugin line in the Document Plugins section and click <Configure Plugin...>. In the resulting popup window locate the screenviewsize option, switch it on, and type the number 300 in the box beside it to create a screen-view image of 300 pixels. Click <OK>.
- Now switch to the Create panel and build and preview the collection.
OAIPlugin will process the OAI records, and assign metadata to the images, which are processed by ImagePlugin.
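The role of the document_field option can be sketched in Python as follows. This is an illustration of the idea only, on the assumption that the value of ex.dc.Identifier ends with the image's filename; it is not how OAIPlugin is implemented, and the function name attach_metadata and the sample values are invented.

```python
import os


def attach_metadata(records, image_paths, document_field="ex.dc.Identifier"):
    """Attach each record's metadata to the image named by document_field.

    records: list of dicts mapping metadata names to values.
    Returns (metadata per image path, records that matched no image).
    """
    by_name = {os.path.basename(path): path for path in image_paths}
    attached = {path: {} for path in image_paths}
    unmatched = []
    for record in records:
        # Assumption: the field value is, or ends with, the image filename.
        target = os.path.basename(record.get(document_field, ""))
        path = by_name.get(target)
        if path is None:
            unmatched.append(record)
        else:
            attached[path].update(record)
    return attached, unmatched


if __name__ == "__main__":
    images = ["JCDLPICS/srcdocs/01dla14.jpg"]
    records = [
        {"ex.dc.Identifier": "01dla14.jpg", "ex.dc.Subject": "Opening session"},
        {"ex.dc.Identifier": "missing.jpg", "ex.dc.Subject": "No such image"},
    ]
    attached, unmatched = attach_metadata(records, images)
    print(attached)
    print(unmatched)
```

Without a field that names the target file, there would be nothing to tie a record to a particular image, which is why the option has to be set before building.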
Like other collections we have built by relying on Greenstone defaults, the end result is passable but can be improved. The next steps refine the collection using the metadata harvested by OAI-PMH into the .oai files.
- In the Browsing Classifiers section of the Design panel, delete the two List classifiers (dc.Title;ex.Title and ex.Source).
- Add an AZCompactList classifier based on ex.dc.Subject metadata. Configure it with Subjects as the buttonname.
- Now add an AZCompactList classifier based on ex.dc.Description metadata. In its configuration panel set mingroup to 2, mincompact to 1, maxcompact to 10 and buttonname to Captions.
Setting mingroup to 2 will mean that two or more documents with the same description will be grouped into a bookshelf; the default mingroup of 1 means that every document will get a bookshelf. mincompact and maxcompact control how many documents are grouped into each section of the horizontal A-Z list. In this case, each group can have as few as one document, and no more than ten.
- In the Search Indexes section of the Design panel, delete all indexes and add a new one based on ex.dc.Description metadata. Set the Display text for the ex.dc.Description index by going to the Search section in the Format panel and changing its label to "descriptions".
Build the collection and preview it.
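The effect of mingroup on the Captions classifier can be illustrated with a short Python sketch. It models only the bookshelf rule described above (a value shared by at least mingroup documents becomes a bookshelf); it does not model mincompact or maxcompact, and it is not Greenstone's own code. The function name group_by_value and the sample values are invented.

```python
from collections import defaultdict


def group_by_value(documents, mingroup=1):
    """Group (doc_id, metadata_value) pairs the way mingroup is described.

    Returns a list of ("bookshelf" | "document", value, [doc_ids]) entries,
    sorted by metadata value.
    """
    by_value = defaultdict(list)
    for doc_id, value in documents:
        by_value[value].append(doc_id)

    entries = []
    for value in sorted(by_value):
        doc_ids = by_value[value]
        if len(doc_ids) >= mingroup:
            # Enough documents share this value: show them as one bookshelf.
            entries.append(("bookshelf", value, doc_ids))
        else:
            # Too few: list each document individually.
            entries.extend(("document", value, [doc_id]) for doc_id in doc_ids)
    return entries


if __name__ == "__main__":
    docs = [("a", "Keynote"), ("b", "Keynote"), ("c", "Poster session")]
    for entry in group_by_value(docs, mingroup=2):
        print(entry)
```

With mingroup set to 1 the same input yields a bookshelf for every value, including the one that holds a single document, which is the behaviour the exercise avoids by choosing 2.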
Tweaking the presentation with format statements
- In the Format panel, select Format Features. First, in the browse format statement, replace the templates for documentNode and classifierNode for VLists with the following (which can be copied from sample_files → oai → format_tweaks → browse_tweak.txt).
<gsf:template match="classifierNode[@classifierStyle = 'VList']">
Also replace the documentNode template of the Search format statement with the documentNode template above.
This format statement customizes the appearance of vertical lists such as the search results and captions lists to show a thumbnail icon followed by Description metadata.
Next, select the display format statement from the Format Features list and add the following to create a custom documentHeading format statement:
By default documentHeading displays the document's ex.Title metadata. In this particular set of OAI exported records, titles are filenames of JPEG images, and the filenames are particularly uninformative (for example, 01dla14). You can see them in the Enrich panel if you select an image in oai → JCDLPICS → srcdocs and check its ex.Source and ex.Title metadata. The above format statement displays ex.dc.Subject metadata instead.
- Still in the display format in the Format Features list, add the following (which can be copied from sample_files → oai → format_tweaks → document_content.txt) to create a custom documentContent format statement:
This format statement alters how the document view is presented. It includes a screen-sized version of the image that hyperlinks back to the original larger version available on the web (unfortunately, the original versions of the images in this sample collection are no longer available on the web). Factual information extracted from the image, such as width, height and type, is also displayed. The hyperlink is constructed with XSLT from the Greenstone metadata that holds the link to the original image.
<td colspan="2" align="center">
<a><xsl:attribute name="href"><gsf:metadata name="ex.dc.OrigURL"/></xsl:attribute>
original <gsf:metadata name="ImageWidth"/>x<gsf:metadata name="ImageHeight"/> <gsf:metadata name="ImageType"/> available
- Format statements are processed by the runtime system, so the collection does not need to be rebuilt for these changes to take effect. Click <Preview Collection> to see the changes.