Greenstone tutorial exercise

Prerequisite: A collection of Word and PDF files
Devised for Greenstone version: 2.70w|3.06
Modified for Greenstone version: 2.87|3.08

Formatting the Word and PDF collection

In this exercise, we play around with the format statements in the Word and PDF collection.

  1. Open the reports collection in the Librarian Interface and go to the Format Features section of the Format panel.

Tidying up the default format statement

  1. In this part of the exercise, we make the format statement simpler without changing the resulting display.

    Greenstone's default format statement is complex because it is designed to produce something reasonable under almost any conditions, and also because for practical reasons it needs to be backwards compatible with legacy collections. For this collection, we don't need all of the complexity.

    Make sure that the VList format statement is selected in the list of formats.

    The default VList format statement looks like the following:

    <td valign="top">[link][icon][/link]</td>
    <td valign="top">[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td>
    <td valign="top">[highlight]
    {Or}{[dc.Title],[exp.Title],[ex.Title],Untitled}
    [/highlight]{If}{[ex.Source],<br><i>([ex.Source])</i>}</td>

    This format statement is the default for every vertical list, such as search results, classifiers and the table of contents of a document.

    {Or}{[ex.thumbicon],[ex.srcicon]} chooses ex.thumbicon metadata if it is present, otherwise ex.srcicon metadata. If neither is present, nothing is displayed. This collection has no ex.thumbicon metadata, so the choice is not needed.

    Replace {Or}{[ex.thumbicon],[ex.srcicon]} (in the second <td> cell) with [ex.srcicon].

    There is no exp.Title metadata either, so remove [exp.Title] from {Or}{[dc.Title],[exp.Title],[ex.Title],Untitled}.

    The resulting format statement looks like the following:

    <td valign="top">[link][icon][/link]</td>
    <td valign="top">[ex.srclink][ex.srcicon][ex./srclink]</td>
    <td valign="top">[highlight]
    {Or}{[dc.Title],[ex.Title],Untitled}
    [/highlight]{If}{[ex.Source],<br><i>([ex.Source])</i>}</td>

    Preview the collection to make sure the display hasn't changed. You shouldn't notice any difference when looking at search results, classifiers, etc.
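To see why these two edits leave the display unchanged, here is a toy Python model of {Or}: it returns the first alternative that yields a non-empty value. This is an illustrative sketch under simplified assumptions (metadata held in a plain dict with one value per field, hypothetical sample values), not how Greenstone itself evaluates format statements.

```python
# Toy model (not Greenstone's implementation) of how {Or}{...} picks
# the first alternative that yields a non-empty value.

def resolve(item, meta):
    """Return the value of one {Or} alternative.

    Items written as [name] are looked up in the metadata dict;
    anything else (e.g. the literal "Untitled") is returned as-is.
    """
    if item.startswith("[") and item.endswith("]"):
        return meta.get(item[1:-1], "")
    return item


def or_choice(items, meta):
    """Mimic {Or}{a,b,c}: the first non-empty alternative, else ""."""
    for item in items:
        value = resolve(item, meta)
        if value:
            return value
    return ""


# Hypothetical metadata for one document: no exp.Title and no ex.thumbicon,
# as in this collection.
doc = {"ex.Title": "Annual report", "ex.srcicon": "<img src='pdf.gif'>"}

before = or_choice(["[dc.Title]", "[exp.Title]", "[ex.Title]", "Untitled"], doc)
after = or_choice(["[dc.Title]", "[ex.Title]", "Untitled"], doc)
icon = or_choice(["[ex.thumbicon]", "[ex.srcicon]"], doc)
print(before, "|", after)  # Annual report | Annual report
print(icon)                # <img src='pdf.gif'>
```

Because the removed alternatives never produce a value in this collection, the first non-empty alternative, and therefore the displayed title and icon, stays the same.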

Linking to the Greenstone version or original version of documents

  1. For collections whose documents undergo a conversion process during importing (e.g. Word, PDF and PowerPoint documents, but not plain text or HTML documents), the original file is stored in the collection along with the converted version. The default VList format statement links to both versions:

    [link][icon][/link] links to the Greenstone HTML version, while [ex.srclink][ex.srcicon][ex./srclink] links to the original.

    Choose SearchVList in Format Features by selecting Search from the Choose Feature drop-down list, and VList from the Affected Component list. Click <Add Format> to add the SearchVList format statement to the list of assigned formats. Experiment with removing either of the two links from the format statement.

    To see the results of your changes, preview the collection and do a search. You are making changes to SearchVList, which means the changes will only apply to search results.

    Storing and displaying the original lets users see the document in its original format, but requires them to have the relevant program installed, and it increases the size of the collection. The Greenstone version can be viewed in any browser, but may not look as nice.

Making bookshelves show how many items they contain

  1. Next, we'll customize the format statement for the Creators list. Classifier bookshelves have only a few pieces of metadata to display: ex.Title and numleafdocs. Whatever metadata the classifier has been built on, the bookshelf label is always stored as ex.Title. This is why a Creator is printed out for each bookshelf even though dc.Creator is not specified in the format statement. [numleafdocs] is only defined for bookshelves, so this metadata can be used in an {If} statement to make bookshelves and documents display differently in the list.

    Make each bookshelf in the Creator classifier show how many entries it contains. In the Format Features section of the Format panel, select the CL2 AZCompactList classifier (which is based on dc.Creator metadata) from the Choose Feature drop-down list, and VList from the Affected Component list. Click the <Add Format> button to add this format to the list of assigned formats. Note that it is added as CL2VList in this list: it is the VList format for the second (CL2) classifier.

    Append the following text to the bottom of the format statement:

    {If}{[numleafdocs],<td><i>([numleafdocs])</i></td>}

    Preview the collection. Click on the Creators list and notice that the bookshelves now display how many documents they contain.

    This revised format statement displays, in parentheses, how many items a bookshelf contains. Since only bookshelves define [numleafdocs], only they display this count. Because the change is made to CL2VList rather than VList, it applies only to the second classifier (Creators).
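The {If} test on [numleafdocs] can be modelled the same way. The sketch below is a simplified Python illustration, not Greenstone code; the node dicts and their values are hypothetical, and it assumes only what the step above states: a bookshelf carries numleafdocs and a document does not.

```python
# Toy model (not Greenstone code) of the appended clause
# {If}{[numleafdocs],<td><i>([numleafdocs])</i></td>}

def numleafdocs_cell(node):
    """Emit the count cell only when numleafdocs is defined and non-empty."""
    count = node.get("numleafdocs", "")
    if count:
        return f"<td><i>({count})</i></td>"
    return ""


# Hypothetical nodes: a bookshelf (its label is stored as ex.Title)
# and a document, which has ex.Source but no numleafdocs.
bookshelf = {"ex.Title": "Smith, J.", "numleafdocs": "3"}
document = {"ex.Title": "Annual report", "ex.Source": "report.pdf"}

for node in (bookshelf, document):
    print(node["ex.Title"], repr(numleafdocs_cell(node)))
```

Only the bookshelf gains the extra cell; the document row is unchanged because its {If} condition is empty.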

Displaying multi-valued metadata

  1. Next we modify the document entries in the Creator classifier to display all authors. Back in Format Features, select the CL2VList format in the list of assigned formats. After {If}{[ex.Source],<br> in the format statement, add [sibling:dc.Creator].

    [ex.Source] is not defined for bookshelves, so it can also be used to differentiate bookshelves from documents.

    The resulting format statement looks like:

    <td valign="top">[link][icon][/link]</td>
    <td valign="top">[ex.srclink][ex.srcicon][ex./srclink]</td>
    <td valign="top">[highlight]
    {Or}{[dc.Title],[ex.Title],Untitled}
    [/highlight]{If}{[ex.Source],<br>[sibling:dc.Creator]
    <i>([ex.Source])</i>}</td>
    {If}{[numleafdocs],<td><i>([numleafdocs])</i></td>}

    This displays the Greenstone link, the link to the original, then the title. For bookshelves, it also displays how many documents the bookshelf contains. For documents, it displays all the authors (creators) and the source document. [sibling:dc.Creator] displays all the dc.Creator metadata for the document, separated by a space (" "), while [dc.Creator] displays only the first author. Preview the Creators list and make sure that all authors are displayed for documents.

  1. You can change the separator between the authors. Modify the format statement, replacing [sibling:dc.Creator] with [sibling(All'<br/>'):dc.Creator]. This separates the authors with <br/> (an HTML line break), so each author appears on its own line. Preview the Creators list.

    If you have done the Enhanced Word document handling exercise, the collection will have both dc.Creator and ex.Creator metadata. To display both, you can use

    [sibling:dc.Creator] [sibling:ex.Creator]

    To display dc.Creator if it is present, otherwise display ex.Creator, use

    {Or}{[sibling:dc.Creator],[sibling:ex.Creator]}
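The difference between [dc.Creator], [sibling:dc.Creator], [sibling(All'<br/>'):dc.Creator] and the {Or} fallback can be sketched in Python by holding each field's values in a list. The helper names (first, sibling, or_choice) and the sample authors are hypothetical; this models the behaviour described in the steps above rather than Greenstone's implementation.

```python
# Toy model (not Greenstone code) of rendering multi-valued metadata.
# Each field maps to a list because a field such as dc.Creator can
# occur more than once on a document. All names are hypothetical.

def first(meta, field):
    """Like [dc.Creator]: only the first value, or "" if absent."""
    values = meta.get(field, [])
    return values[0] if values else ""


def sibling(meta, field, sep=" "):
    """Like [sibling:field] (space-separated) or [sibling(All'sep'):field]."""
    return sep.join(meta.get(field, []))


def or_choice(*alternatives):
    """Like {Or}{a,b}: the first non-empty alternative, else ""."""
    for value in alternatives:
        if value:
            return value
    return ""


doc = {"dc.Creator": ["Smith, J.", "Jones, K."], "ex.Creator": ["jsmith"]}
doc2 = {"ex.Creator": ["jsmith"]}  # no dc.Creator at all

print(first(doc, "dc.Creator"))             # Smith, J.
print(sibling(doc, "dc.Creator"))           # Smith, J. Jones, K.
print(sibling(doc, "dc.Creator", "<br/>"))  # Smith, J.<br/>Jones, K.
# {Or}{[sibling:dc.Creator],[sibling:ex.Creator]} falls back for doc2:
print(or_choice(sibling(doc2, "dc.Creator"), sibling(doc2, "ex.Creator")))  # jsmith
```

Note that the separator goes between values, not after the last one, which is why the final author is not followed by a line break.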

Advanced multi-valued metadata

  1. You may notice that the AZCompactList classifier's configuration dialog has two options after the metadata option: firstvalueonly and allvalues. Manually added metadata can be used to replace or enhance automatically extracted metadata, and these options control exactly which pieces of metadata a document is classified by.

    For example, say we have two documents. Document 1 has four Creators specified (dc.Creator = dcA, dc.Creator = dcB, ex.Creator = exA, ex.Creator = exB), while document 2 has three (ex.Creator = exA, ex.Creator = exB, ex.Creator = exC). The following table shows which metadata values each document is classified by, for the different classifier options:

    AZCompactList options                            Document 1          Document 2
    -metadata dc.Creator,ex.Creator                  dcA, dcB            exA, exB, exC
    -metadata dc.Creator,ex.Creator -firstvalueonly  dcA                 exA
    -metadata dc.Creator,ex.Creator -allvalues       dcA, dcB, exA, exB  exA, exB, exC

  1. We'll now set the firstvalueonly option for the Creators classifier. Switch to the Browsing Classifiers section of the Design panel, select the AZCompactList for dc.Creator metadata in the Assigned Classifiers box and click <Configure Classifier...>. Select the firstvalueonly option.

    Rebuild and preview the collection. Now the Creators list classifies documents based on the first author appearing in the dc.Creator metadata.

    If you set the metadata field of AZCompactList to dc.Creator,ex.Creator in the "A collection of Word and PDF files" exercise, the Creators list will now classify each document by its first dc.Creator value or, if it has no dc.Creator metadata, by its first ex.Creator value.
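The three rows of the AZCompactList table can be reproduced with a small Python model of the classifier options. It is a sketch that encodes the rules as read from that table (the default takes all values of the first listed field that is present, -firstvalueonly takes only the first of those, -allvalues takes every value of every listed field); the function name and mode strings are hypothetical, and this is not Greenstone's AZCompactList code.

```python
# Toy model (not Greenstone's classifier code) of which values a document
# is classified by under the AZCompactList options described above.

def classify_values(meta, fields, mode="default"):
    """Return the metadata values a document would be classified by.

    meta:   dict mapping field name to a list of values
    fields: the -metadata list, e.g. ["dc.Creator", "ex.Creator"]
    mode:   "default", "firstvalueonly" or "allvalues"
    """
    if mode == "allvalues":
        # Every value of every listed field, in field order.
        return [v for f in fields for v in meta.get(f, [])]
    for f in fields:
        values = meta.get(f, [])
        if values:
            # Default: all values of the first field present;
            # firstvalueonly: just the first of those values.
            return values[:1] if mode == "firstvalueonly" else list(values)
    return []


doc1 = {"dc.Creator": ["dcA", "dcB"], "ex.Creator": ["exA", "exB"]}
doc2 = {"ex.Creator": ["exA", "exB", "exC"]}
fields = ["dc.Creator", "ex.Creator"]

for mode in ("default", "firstvalueonly", "allvalues"):
    print(mode, classify_values(doc1, fields, mode), classify_values(doc2, fields, mode))
```

Each printed line corresponds to one row of the table, with Document 1 and Document 2 in the same column order.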


Copyright © 2005-2016 by the New Zealand Digital Library Project at the University of Waikato, New Zealand
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License.”