Greenstone tutorial exercise
Open Archives Initiative (OAI) collection
This exercise explores service-level interoperability using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). So that you can do this on a stand-alone computer, we do not actually connect to the external server that is acting as the data provider. Instead we have provided an appropriate set of files that take the form of XML records produced by OAI-PMH.
One of Greenstone's documented example collections is sourced over OAI. This exercise takes you through the steps necessary to reconstruct it. You may wish to take a look at the documented example collection OAI demo now to see what this exercise will build.
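To make the idea of "XML records produced by OAI-PMH" concrete, here is a minimal Python sketch that reads the Dublin Core fields out of one record. The namespaces are the ones defined by OAI-PMH 2.0 and Dublin Core. The record itself (identifier, subject, description) is invented for illustration, and the exact layout of the sample .oai files may differ.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record: every value below is invented for illustration
# and is not taken from the sample files.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:0001</identifier>
    <datestamp>2002-01-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:identifier>example01.jpg</dc:identifier>
      <dc:subject>Conference photographs</dc:subject>
      <dc:description>Two delegates at the poster session</dc:description>
    </oai_dc:dc>
  </metadata>
</record>
"""


def dublin_core_fields(record_xml):
    """Return a dict mapping each Dublin Core element name to its list of values."""
    root = ET.fromstring(record_xml)
    fields = {}
    for element in root.findall("oai:metadata/oai_dc:dc/*", NS):
        # Tags look like "{namespace}name"; keep only the local name.
        name = element.tag.split("}", 1)[1]
        fields.setdefault(name, []).append((element.text or "").strip())
    return fields


fields = dublin_core_fields(SAMPLE_RECORD)
```

Values are kept in lists because a Dublin Core element may legitimately repeat within one record.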
- Start a new collection called OAI Service Provider. Fill out the fields with appropriate information.
- In the Gather panel, locate the folder sample_files → oai → sample_small → oai. Drag this folder into the collection and drop it there.
- During the copy operation, a popup window may appear asking whether to add OAIPlugin to the list of plug-ins used in the collection, because the Librarian Interface has not found an existing plug-in that can handle this file type. Press the <Add Plugin> button to include it.
The files for this collection consist of a set of images (in JCDLPICS → srcdocs) and a set of OAI records (in JCDLPICS) which contain metadata for the images.
When files are copied across like this, the Librarian Interface studies each one and uses its filename extension to check whether the collection contains a corresponding plug-in. No plug-in in the list is capable of processing the OAI file records that are copied across (they have the file extension .oai), so the Librarian Interface prompts you to add the appropriate plug-in.
Sometimes there is more than one plug-in that could process a file—for example, the .xml extension is used for many different XML formats. The popup window, therefore, offers a choice of all possible plug-ins that matched. It is normally easy to determine the correct choice. If you wish, you can ignore the prompt (click <Don't Add Plugin>), because plug-ins can be added later, in the Document Plugins section of the Design panel.
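The extension check described above can be sketched in Python as follows. This is only an illustration of the idea, not the Librarian Interface's actual code: the registry is hypothetical, the two XML plug-in names are invented, and the .jpg entry for ImagePlugin is an assumption based on the JPEG images in this collection.

```python
import os

# Hypothetical registry of the extensions each plug-in claims to handle.
# OAIPlugin/.oai comes from the text; the XML plug-in names are invented.
PLUGIN_EXTENSIONS = {
    "ImagePlugin": {".jpg"},
    "OAIPlugin": {".oai"},
    "XMLFormatAPlugin": {".xml"},  # hypothetical
    "XMLFormatBPlugin": {".xml"},  # hypothetical
}


def candidate_plugins(filename):
    """Return, sorted by name, every plug-in whose extensions match the file."""
    extension = os.path.splitext(filename)[1].lower()
    return sorted(
        name for name, extensions in PLUGIN_EXTENSIONS.items()
        if extension in extensions
    )


def plugins_to_offer(filename, collection_plugins):
    """Return the plug-ins to offer in the prompt, or [] if none is needed."""
    candidates = candidate_plugins(filename)
    # No prompt when the collection already has a plug-in for this file type.
    if any(name in collection_plugins for name in candidates):
        return []
    return candidates
```

With a collection that contains only ImagePlugin, `plugins_to_offer("record.oai", ["ImagePlugin"])` offers OAIPlugin alone, while a file ending in .xml would produce a choice between both hypothetical XML plug-ins.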
- You will need to specify which document the OAI metadata values should be attached to. In the Design panel, select the Document Plugins section, then select the OAIPlugin and click <Configure Plugin...>. Locate the document_field option in the popup window and type ex.dc.Identifier (it may not be available in the drop-down list until after building). Click <OK>. Finally, you may want to remove the EmbeddedMetadataPlugin to speed up building (since it's not going to extract metadata relevant to this tutorial anyway).
- You also need to configure the image plug-in. Select the ImagePlugin line in the Document Plugins section and click <Configure Plugin...>. In the resulting popup window locate the screenviewsize option, switch it on, and type the number 300 in the box beside it to create a screen-view image of 300 pixels. Click <OK>.
- Now switch to the Create panel and build and preview the collection.
OAIPlugin will process the OAI records, and assign metadata to the images, which are processed by ImagePlugin.
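The role of the document_field option can be pictured as a join between records and image files. The Python sketch below assumes that the value of the chosen field ends in the image's file name. That matching rule, the function name and the data layout are assumptions made for illustration, not a description of OAIPlugin's internals.

```python
import os


def attach_metadata(records, image_files, document_field="dc.Identifier"):
    """Pair each record with the image file named by its document_field value.

    records: list of dicts holding one record's metadata.
    image_files: list of paths to the image files in the collection.
    Returns a dict mapping image path -> record; unmatched records are skipped.
    """
    files_by_name = {os.path.basename(path): path for path in image_files}
    attached = {}
    for record in records:
        # Assumption: the field value ends in the image's file name.
        target = os.path.basename(record.get(document_field, ""))
        if target in files_by_name:
            attached[files_by_name[target]] = record
    return attached
```

A record whose identifier names no file in the collection is simply left unattached in this sketch.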
Like other collections we have built by relying on Greenstone defaults, the end result is passable but can be improved. The next steps refine the collection using the metadata harvested by OAI-PMH into the .oai files.
- In the Browsing Classifiers section of the Design panel, delete the two List classifiers (dc.Title;ex.Title and ex.Source).
- Add an AZCompactList classifier based on ex.dc.Subject metadata. Configure it with Subjects as the buttonname.
- Now add an AZCompactList classifier based on ex.dc.Description metadata. In its configuration panel set mingroup to 2, mincompact to 1, maxcompact to 10 and buttonname to Captions.
Setting mingroup to 2 will mean that two or more documents with the same description will be grouped into a bookshelf; the default mingroup of 1 means that every document will get a bookshelf. mincompact and maxcompact control how many documents are grouped into each section of the horizontal A-Z list. In this case, each group can have as few as one document, and no more than ten.
- In the Search Indexes section of the Design panel, delete all indexes and add a new one based on ex.dc.Description metadata. Set the Display text for the ex.dc.Description index by going to the Search section in the Format panel and changing its label to "_labelDescription_". Using a macro for an index name means that it will display in the correct language (assuming that the macro has been translated). You can check Greenstone → macros → english.dm to see which macros are available.
- Build the collection and preview it.
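The effect of mingroup on the Captions classifier can be illustrated with a short Python sketch. It models only the bookshelf rule stated above (a value shared by at least mingroup documents becomes a bookshelf) and leaves out mincompact and maxcompact. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def group_into_bookshelves(docs, mingroup):
    """Split documents into bookshelves and individually listed documents.

    docs: list of (doc_id, metadata_value) pairs.
    Returns (bookshelves, singles): bookshelves maps a metadata value to the
    ids sharing it; singles lists the ids that did not reach mingroup.
    """
    by_value = defaultdict(list)
    for doc_id, value in docs:
        by_value[value].append(doc_id)

    bookshelves = {
        value: ids for value, ids in by_value.items() if len(ids) >= mingroup
    }
    singles = [
        doc_id
        for ids in by_value.values()
        if len(ids) < mingroup
        for doc_id in ids
    ]
    return bookshelves, singles
```

With mingroup set to 1 every value forms a bookshelf, even when it holds a single document, which is why the exercise raises it to 2.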
Tweaking the presentation with format statements
- In the Format panel, select Format Features. First replace the VList format statement with the following (which can be copied from the file vlist_tweak.txt in the sample_files → oai → format_tweaks folder).
This format statement customizes the appearance of vertical lists such as the search results and captions lists to show a thumbnail icon followed by Description metadata.
Next, select DocumentHeading from the Format Features list and change its format statement to:
The document heading appears above the DETACH and NO HIGHLIGHTING buttons when you get to a document in the collection. By default DocumentHeading displays the document's ex.Title metadata. In this particular set of OAI exported records, titles are filenames of JPEG images, and the filenames are particularly uninformative (for example, 01dla14). You can see them in the Enrich panel if you select an image in oai → JCDLPICS → srcdocs and check its ex.Source and ex.Title metadata. The above format statement displays ex.dc.Subject metadata instead.
- Finally, you will have noticed that where the document itself should appear, you see only the message "This document has no text." To rectify this, select DocumentText in the Choose Feature pull-down list and use the following as its format statement (this text is in doctxt_tweak.txt in the format_tweaks folder mentioned earlier):
<center><table width=_pagewidth_ border=1>
<tr><td colspan=2 align=center>
<tr><td>Caption:</td><td> <i>[ex.dc.Description]</i> <br>
(<a href=[ex.dc.OrigURL]>original [ImageWidth]x[ImageHeight] [ImageType] available</a>)
This format statement alters how the document view is presented. It includes a screen-sized version of the image that hyperlinks back to the original larger version available on the web. (Unfortunately, the original versions of the images in this sample collection are no longer available on the web. If you want the link to lead to the local copy of the full size image, then use [ex.srclink]...[/ex.srclink] in place of <a href=[ex.dc.OrigURL]>...</a>.) Image property information extracted from the image, such as width, height and type, is also displayed as a consequence of using the above format statement.
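As a rough picture of how the bracketed names in a format statement turn into page content, the Python sketch below replaces each [name] with the matching metadata value. It is a simplified model, not the Greenstone runtime: it ignores paired items such as [ex.srclink]...[/ex.srclink] and any conditional syntax, and the metadata values are invented.

```python
import re

# Matches simple placeholders such as [ImageWidth] or [ex.dc.Description].
PLACEHOLDER = re.compile(r"\[([A-Za-z][\w.]*)\]")


def expand(format_statement, metadata):
    """Replace each [name] with its metadata value, or "" if it is missing."""
    def lookup(match):
        return str(metadata.get(match.group(1), ""))
    return PLACEHOLDER.sub(lookup, format_statement)


# Hypothetical metadata for one image; none of these values come from
# the sample collection.
metadata = {
    "ex.dc.OrigURL": "http://example.org/full/example01.jpg",
    "ImageWidth": 1024,
    "ImageHeight": 768,
    "ImageType": "JPEG",
}
line = "(<a href=[ex.dc.OrigURL]>original [ImageWidth]x[ImageHeight] [ImageType] available</a>)"
expanded = expand(line, metadata)
```

Because the substitution happens when the page is generated, changing the format statement changes the output without touching the stored metadata.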
- Format statements are processed by the runtime system, so the collection does not need to be rebuilt for these changes to take effect. Click <Preview Collection> to see the changes.