Greenstone tutorial exercise
This exercise looks at using fielded searching in a collection. Fielded searching is best used for metadata-rich collections. Here we use bibliographic data in MARC format.
- Start a new collection called Papers Bibliography which will contain a collection of example MARC records of the working papers published at the Computer Science Department, Waikato University. Enter the requested information and base it on -- New Collection --.
- In the Gather panel, open the sample_files → marc folder, drag CMSwp-all.marc into the right-hand pane and drop it there. A popup window asks whether you want to add MARCPlugin to the collection to process this file. Click <Add Plugin>, because this plugin will be needed to process the MARC records.
- Now select Browsing Classifiers within the Design panel and remove the default classifier for Source metadata.
- In the Search Indexes section, remove the ex.Source index. In this collection all records are from the same file, so ex.Source metadata, which is set to the filename, is not particularly interesting or useful.
- Switch to the Create panel, build the collection, and preview it. Browse through the titles and view a record or two. Try searching—for example, find items that include graphics.
- Back in the Librarian Interface, go to the Browsing Classifiers section of the Design panel. Select AZCompactList from the Select classifier to add drop-down menu, and click <Add Classifier...>. In the popup window, select dc.Subject and Keywords as the metadata item. Click <OK>.
AZCompactList is like List, except that terms that appear multiple times in the hierarchy are automatically grouped together and a new node, shown as a bookshelf icon, is formed.
Build the collection and preview the result.
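The grouping behaviour described for AZCompactList can be pictured with a small Python sketch. This is only an illustration of the idea, not Greenstone's implementation: the function name compact_list, the record layout and the sample titles are all hypothetical.

```python
from collections import defaultdict

def compact_list(records):
    """Group documents by subject term (illustrative only, not Greenstone code).

    records: list of (title, [subject, ...]) pairs (hypothetical layout).
    A term shared by several documents becomes one "bookshelf" node;
    a term used by a single document is listed as a plain document entry.
    """
    groups = defaultdict(list)
    for title, subjects in records:
        for subject in subjects:
            groups[subject].append(title)

    entries = []
    for subject in sorted(groups, key=str.lower):
        titles = groups[subject]
        if len(titles) > 1:
            entries.append({"bookshelf": subject, "documents": sorted(titles)})
        else:
            entries.append({"document": titles[0], "term": subject})
    return entries

# Made-up sample data, not taken from the MARC file.
sample = [
    ("Paper A", ["graphics", "compression"]),
    ("Paper B", ["graphics"]),
    ("Paper C", ["machine learning"]),
]
grouped = compact_list(sample)
```

In this sample, only "graphics" is shared by more than one paper, so it is the only term that turns into a bookshelf node.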
Using fielded searching
- Now let's look at fielded searching. In the browser, press the form search button below the usual search form. This will present a fielded search form.
- You can specify which search form types are available for a particular collection, and which one is the default, using the SearchTypes format statement. In the Format panel, select Format Features from the left-hand list. Select the SearchTypes format statement from the list of assigned formats, and set the contents to just simpleform. This will make only fielded searching available for this collection.
Search type options include plain, simpleform (for fielded searching) and advancedform (for fielded searching with boolean operations). You can specify any combination of these, separated by a comma. If the plain search type is specified, it will be available in the search area at the top of each page of the collection.
Preview the collection again. Notice that the collection's pages no longer include a query box. (This is because the search form is too big to fit here nicely.) To search, you have to click form search in the navigation bar. Note that text search is no longer offered.
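As a way of checking a SearchTypes value before typing it into the format statement, the rules above can be sketched in Python. This is a hypothetical helper, not part of Greenstone: parse_search_types and the returned dictionary keys are invented for illustration, and only the three option names come from the text.

```python
VALID_SEARCH_TYPES = ("plain", "simpleform", "advancedform")

def parse_search_types(value):
    """Split a comma-separated SearchTypes value and validate each entry.

    Hypothetical helper for illustration; Greenstone does its own parsing.
    """
    types = [part.strip() for part in value.split(",") if part.strip()]
    if not types:
        raise ValueError("at least one search type is required")
    unknown = [t for t in types if t not in VALID_SEARCH_TYPES]
    if unknown:
        raise ValueError("unknown search type(s): " + ", ".join(unknown))
    return {
        "types": types,
        # Per the tutorial text, the query box at the top of each page
        # is available only when the plain search type is specified.
        "query_box_on_pages": "plain" in types,
    }

only_fielded = parse_search_types("simpleform")
everything = parse_search_types("plain, simpleform, advancedform")
```

With just simpleform, query_box_on_pages is false, which matches the missing query box seen when previewing the collection.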
- Look at the search form in the collection. There are two fields that can be searched: text and titles. Add some more fields to search on by going back to the Librarian Interface.
- In the Design panel, go to the Search Indexes section. Add a new index based on dc.Subject and Keywords by clicking <New Index>, selecting dc.Subject and Keywords in the list of metadata, and clicking <Add Index>.
Rebuild the collection and preview the results. Notice the extra field in the in field drop-down menus in the search form. You can do quite complicated queries by searching for words in different fields at the same time.
- To change the text displayed in the drop-down menus of the search form, go to the Search section of the Format panel. Here you can change the display text for the indexes.
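Searching for words in different fields at the same time amounts to requiring every field condition to hold for a record. A minimal Python sketch of that idea follows; it assumes simple whole-word matching, and the function name, field names and sample records are hypothetical rather than Greenstone's own.

```python
def fielded_search(records, query):
    """Return records in which every queried field contains its word.

    records: list of dicts mapping field name -> text (hypothetical layout).
    query:   dict mapping field name -> single search word.
    Matching is case-insensitive on whitespace-separated words, which is
    an assumption of this sketch, not a description of Greenstone's indexer.
    """
    matches = []
    for record in records:
        if all(
            word.lower() in record.get(field, "").lower().split()
            for field, word in query.items()
        ):
            matches.append(record)
    return matches

# Made-up sample records.
papers = [
    {"titles": "Interactive graphics for teaching", "subjects": "graphics education"},
    {"titles": "Text compression methods", "subjects": "compression"},
    {"titles": "Compression of graphics files", "subjects": "compression graphics"},
]
hits = fielded_search(papers, {"titles": "compression", "subjects": "graphics"})
```

Only the third sample record satisfies both conditions, so adding a second field narrows the result compared with searching titles alone.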
Exploding the database
- Go to the Enrich panel and try to see the metadata. It doesn't appear! This is because the metadata is associated with records inside the file, not the file itself.
Metadata file types such as MARC, CDS/ISIS and BibTeX can be imported into Greenstone, but their metadata cannot be viewed in the Librarian Interface. To edit any metadata you need to go back to the program that created the file.
Greenstone provides a way of exploding a metadata database so that each record appears as an individual document, with viewable and editable metadata. This process is irreversible: once this step has been done, the database is deleted and can no longer be used in its original program.
- In the Gather panel, you may notice that the MARC database has a different coloured icon to other files. A metadata database that can be exploded will be displayed with this green icon. Right-click on the file and choose Explode Metadata Database from the menu. A new window opens, containing options for the exploding process. A description of each option can be obtained by hovering the mouse over the option.
If it's not already on, turn on the metadata_set option by checking its box. This option indicates which metadata set to explode the metadata into. The default set is the "Exploded Metadata Set"—a metadata set which initially has no elements in it, but will receive a new element for each metadata field retrieved from the database.
- Click <Explode> to start the exploding process. This may take a short while, depending on the size of the database.
- Once exploding has finished, the MARC database file will have been deleted, and three folders created in its place. These folders contain an empty file for each record in the original database. The metadata for these records can be viewed and edited by switching to the Enrich panel.
- Because the MARC file is no longer present, and the collection contains empty (.nul) files, we need to change the list of plugins. In the Document Plugins section of the Design panel, remove MARCPlugin.
Rebuild and preview the collection. You will notice that the subjects classifier is empty, searching no longer returns any results, and the document display is useless.
Although the titles classifier was built on ex.Title, it still displays the correct titles, even though the Enrich panel shows that the ex.Title values are actually the filenames rather than the titles of the MARC records. This is because the default browse format uses the exp.Title metadata. In the Format Features section of the Format panel, select global in the list of assigned format statements. The format statement looks like:
The above choose-title template, defined in the global format features, is included by the browse format statements.
Since there is no dc.Title metadata and because exp.Title comes before ex.Title, the exploded titles will be displayed.
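The fallback order described here (dc.Title first, then exp.Title, then ex.Title) can be illustrated with a short Python sketch. It mirrors only the ordering stated in the text; the real template may list further elements, and the function name and sample values below are hypothetical.

```python
# Order taken from the text: dc.Title, then exp.Title, then ex.Title.
TITLE_PRIORITY = ("dc.Title", "exp.Title", "ex.Title")

def choose_title(metadata, priority=TITLE_PRIORITY):
    """Return the first non-empty title found in priority order, else None.

    metadata: dict mapping metadata element name -> value (hypothetical layout).
    """
    for name in priority:
        value = metadata.get(name)
        if value:
            return value
    return None

# Made-up exploded record: no dc.Title, and ex.Title holds a filename.
exploded_record = {
    "exp.Title": "An example working paper title",
    "ex.Title": "00000001.nul",
}
shown = choose_title(exploded_record)
```

Because the record has no dc.Title, the exp.Title value is chosen before the filename stored in ex.Title is ever considered.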
Reformatting the collection to use the exploded metadata
The collection previously used extracted (ex.) metadata, but now it uses exploded (exp.) metadata. The subjects classifier and search indexes were built on ex. metadata, which is why they no longer work properly.
There is also no longer any text in the documents. Previously, MARCPlugin stored the raw record as the "text" of each record. Now that the metadata is in the Librarian Interface, there is no longer the concept of a raw record, and so there is no text.
We need to modify the collection design to take note of these changes.
- In the Search Indexes section, change the Title index to use exp.Title: select the Title index in the Assigned Indexes list and click <Edit Index>. Deselect dc.Title and ex.Title in the list of metadata, and select exp.Title. Click <Replace Index>.
- Remove the dc.Subject and Keywords index by selecting it in the Assigned Indexes list and clicking <Remove Index>. Add an index on exp.Subject: click <New Index>, select exp.Subject in the metadata list, and click <Add Index>.
- The text index is no longer any use, so remove that index too.
- To enable combined searching across all indexes at once, click <New Index>, tick the Add combined searching over all assigned indexes (allfields) checkbox, and click <Add Index>. Move this to the top of the list using the <Move Up> button, so that it appears first in the drop-down list. Click <Set Default Index> on the right so that it becomes the default field for searching.
- To explicitly use the exp.Title metadata, in the Browsing Classifiers section, change the dc.Title;Title List to use exp.Title metadata. Double click the dc.Title;Title List in the Assigned Classifiers list, and change the metadata option to use exp.Title. Click <OK>. Do the same thing for the Subject AZCompactList, changing dc.Subject and Keywords to exp.Subject.
Rebuild and preview the collection. The classifiers should be back to normal and searching should now work.
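To picture what the allfields option adds, the sketch below builds one small inverted index per field plus a combined index covering all of them. It is a conceptual Python illustration only: Greenstone's indexers work differently, and build_indexes, lookup and the sample records are hypothetical.

```python
from collections import defaultdict

def build_indexes(records, fields):
    """Build a word -> document-number index per field, plus "allfields".

    records: list of dicts mapping field name -> text (hypothetical layout).
    The "allfields" index simply merges the postings of every listed field.
    """
    indexes = {field: defaultdict(set) for field in fields}
    indexes["allfields"] = defaultdict(set)
    for doc_id, record in enumerate(records):
        for field in fields:
            for word in record.get(field, "").lower().split():
                indexes[field][word].add(doc_id)
                indexes["allfields"][word].add(doc_id)
    return indexes

def lookup(indexes, index_name, word):
    """Return the sorted document numbers containing word in the given index."""
    return sorted(indexes[index_name].get(word.lower(), set()))

# Made-up exploded records.
exploded = [
    {"exp.Title": "Interactive graphics for teaching", "exp.Subject": "education"},
    {"exp.Title": "Text compression methods", "exp.Subject": "compression"},
    {"exp.Title": "Image file formats", "exp.Subject": "graphics compression"},
]
idx = build_indexes(exploded, ["exp.Title", "exp.Subject"])
in_title = lookup(idx, "exp.Title", "graphics")
in_subject = lookup(idx, "exp.Subject", "graphics")
in_any = lookup(idx, "allfields", "graphics")
```

A word that appears only in the title of one record and only in the subject of another is found in both records through the combined index, which is why it makes a convenient default field.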
Switch to the Format Features section of the Format panel to make the following adjustments.
- There is no dc (or ex.dc) metadata for this collection, so in the global format feature's choose-title template, replace the following
- There are no source or thumb icons, so in the documentNode template of the browse format feature remove the occurrences of the following section:
- The ex.Source metadata is set to the nul filename, so remove that from the display. Remove:
Preview the collection. In the titles list, click on one of the documents. This will take you to the document's display page. Exploding the database has left this document display useless. Only the record Title (in this case, the generated filename) is displayed. We will make two changes to improve the document display. First, we will remove the record Title, since it is not useful in this instance. To remove the record Title, we need to override the default documentHeading format statement with one that does not do anything. Go to the display format feature in the Format Features section of the Format panel and add the following just after <gsf:option name="TOC" value="true"/>:
- Next, override the default documentContent behaviour by creating a format statement for this. Still in the display format features, after the documentHeading, add the following format statement (which can be copied from sample_files → marc → format_tweaks → document_content.txt):
Press the <Preview Collection> button to preview the collection and see how the document display has improved.