Greenstone tutorial exercise
Incremental building of a collection
Collections built with the Lucene indexer support incremental addition, updates, and deletion of documents and metadata. By default, the import and build processes delete old index files in the index directory and intermediate files in the archives directory. With incremental building, the import and build process will keep the old files and only process the new or modified ones.
Incremental import can be done with any collection, but incremental modification of the indexes can only be done for collections that use the Lucene indexer.
The first part of this tutorial looks at using The Depositor for incremental building. The Depositor only supports addition of new documents and associated metadata. If you want to delete or modify existing documents and their metadata, you will need to use GLI or command line building.
The Depositor is Greenstone’s runtime support for institutional repositories. It provides the collection building workflow through a web interface. The Depositor only works with the Web library server, not the local library server. Greenstone users belonging to the all-collections-editor user group have access to The Depositor.
Enabling The Depositor
Windows users should first make sure that they are using a Web Server (e.g. Apache) instead of the Local Library Server. The binary installation of Greenstone installs Apache, but the Local Library Server is used by default. To switch to Apache, rename the GSDLHOME → server.exe file to something else. Then restart the Greenstone Server from the Start → Greenstone Server menu and press its Enter Library button. (On unix systems, run ./gs2-server.sh from Greenstone 2's installed location to start up the Greenstone server.)
Note: You might need to set permissions for the GSDLHOME → tmp directory, and for either the GSDLHOME → collect directory or the GSDLHOME → collect → your_accessible_collection directory.
In Greenstone, The Depositor is disabled by default. To enable it, edit the file GSDLHOME → etc → main.cfg. Look for the "depositor" line, and change disabled to enabled. Then save and close the file.
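The main.cfg edit described above can also be scripted. The following is a minimal Python sketch, not part of Greenstone itself. It assumes that the relevant line begins with the keyword depositor and contains the word disabled; the function name enable_line and the sample text are hypothetical, so check your own main.cfg before relying on it.

```python
import re


def enable_line(config_text: str, keyword: str = "depositor") -> str:
    """Return config_text with 'disabled' changed to 'enabled' on every
    line that starts with the given keyword.

    Assumption: the setting lives on a single line that begins with the
    keyword and contains the word 'disabled'.
    """
    result = []
    for line in config_text.splitlines(keepends=True):
        if line.lstrip().startswith(keyword):
            # Only whole-word matches are replaced, so an already
            # 'enabled' line is left unchanged.
            line = re.sub(r"\bdisabled\b", "enabled", line)
        result.append(line)
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical excerpt; the real main.cfg contains many more lines.
    sample = "status enabled\ndepositor disabled\n"
    print(enable_line(sample), end="")
```

To apply it to a real installation, read GSDLHOME → etc → main.cfg into a string, keep a backup copy, and write the returned text back. Passing keyword="status" would cover the admin pages setting mentioned in the next section in the same way.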
Setting a user group
Use of The Depositor involves an authentication step. A user will need a Greenstone account which belongs to an appropriate user group. The all-collections-editor user group gives access to edit any collection, while the ***-collection-editor group gives a user access to edit the *** collection, where *** is the collection's short name (or directory name). By default, the admin account is a member of the all-collections-editor group.
The Greenstone admin pages are used to add new users and modify their group settings. Admin pages may have been enabled when you installed Greenstone. If not, they can be activated by finding the "status" line in the main.cfg file and changing disabled to enabled.
- To access the administration pages, go to your Greenstone home page while the Greenstone server is running and click Administration Page (below the list of collections). To see the list of users, click the list users link on the left under the User management section. You will need to sign in. You can use the admin account, or any other account which has been added to the administrator group. If you didn't set up the admin pages when you installed Greenstone, then a default admin account will be created with password "admin". Please change this immediately.
- Let's modify the groups for the demo user. This user was added for the authentication demonstration collection to allow restricted access to some of the documents. If this user doesn't exist for you, create a new user by clicking on the add a new user link under the User management section on the left. Give it the name "demo" and password "demo". Click submit. Back in the Administration Pages, click the list users link and the new user "demo" should be listed there now.
- We'll give this user access to modify the Demo Lucene collection that we will be using for this tutorial. If you have given the collection the title "Demo Lucene", then its short name is likely to be "demoluce". You can check this in GLI: Open the Demo Lucene collection, go to Format → General, and look for the collection folder item. This tutorial assumes demoluce.
- In the list users page, at the end of each user entry there are two links: edit and delete. Click edit on the demo user account, and you will be shown more detailed information about the demo user. Add demoluce-collection-editor at the end of the groups line, using a comma to separate group entries, so that the groups field now contains:
demo,demoluce-collection-editor
(Note: if your Lucene collection's short name is not demoluce, replace demoluce with the appropriate name in ***-collection-editor.)
- Click submit. Click the Greenstone home link on the left side and return to the Greenstone home page.
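The group naming rule used in the steps above (collection short name followed by -collection-editor, appended to a comma-separated groups field) can be sketched in Python as follows. This is only an illustration of the string format; the function add_editor_group is hypothetical and does not talk to Greenstone, so the resulting value still has to be entered on the edit user page.

```python
def add_editor_group(groups_field: str, collection_short_name: str) -> str:
    """Append '<short name>-collection-editor' to a comma-separated
    groups field, unless that group is already present."""
    group = f"{collection_short_name}-collection-editor"
    # Split on commas and drop empty entries and surrounding spaces.
    groups = [g.strip() for g in groups_field.split(",") if g.strip()]
    if group not in groups:
        groups.append(group)
    return ",".join(groups)


if __name__ == "__main__":
    # The demo user starts out in the 'demo' group only.
    print(add_editor_group("demo", "demoluce"))
```

For a collection with a different short name, pass that name as the second argument instead of demoluce.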
Use the Depositor to do incremental addition
- On the Greenstone library home page, click The Depositor button. You will see a drop-down selection list of all the available collections. Select Demo Lucene from the list and sign in with the demo account.
- The next page asks you to fill in the metadata fields: Title, Organization, Subject, Keyword and Language. These metadata fields are from the Development Library Subset (DLS) metadata set, which is the metadata set used in the Demo Lucene collection. To ensure that the new document is displayed in the classifiers, we will next specify this metadata for the new document.
The default metadata fields that would be displayed here for a new collection are the Title, Creator and Description from the Dublin Core Metadata Set. You can customize which metadata fields are required for items added through The Depositor in the Depositor Metadata section on the Format panel in the Greenstone Librarian Interface.
We are going to deposit this file: sample_files → demo_NewFiles → r9006e.htm. Double click r9006e.htm and have a look at its content. Type the following in the Title field:
Selected guidelines for the management of records and archives: a RAMP reader (r9006e)
(Note: you can copy this and the following metadata values from sample_files → demo_NewFiles → r9006e-metadata.txt.)
In the Organization field, type: UNESCO
In the Subject field, type:
Communication, Information and Documentation|Records and Archives Management Programme (RAMP) of UNESCO, Archive Management
In the Keyword field, type:
manage records and archives
Finally, in the Language field, type: English
- Click the Select File button. Click the Choose File button and select sample_files → demo_NewFiles → new → r9006e.htm. Then click the Confirmation button and check that the document has been uploaded successfully.
- Click the Deposit Item button and wait for the process to finish. You will see the message "Collection built successfully." if the collection has been built successfully, or error messages if something has gone wrong.
- Click View collection to preview the newly built collection and check that the newly added document is displayed correctly. For example, in the organizations classifier you should find a new bookshelf named UNESCO, which contains the new document.
Batch addition with the Depositor
The Depositor also supports batch addition of new documents. This is achieved by zipping up the new documents (together with their metadata files) and depositing the zip file. Please note that the collection must have ZIPPlugin in order to be able to process the uploaded zip file. If it does not, you'll first need to add the ZIPPlugin through the Librarian Interface.
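If you want to prepare your own zip file for batch deposit, the zipping step can be sketched with Python's standard zipfile module. This is a minimal sketch, not a Greenstone tool: the function name zip_folder and the folder names in the usage comment are hypothetical, and keeping each document's folder structure inside the zip is an assumption about what your collection expects.

```python
import zipfile
from pathlib import Path


def zip_folder(source_dir, zip_path):
    """Zip every file under source_dir (documents, images, metadata.xml
    files) into zip_path, keeping paths relative to source_dir.

    Returns the list of archive names that were added.
    Write the zip file outside source_dir so it does not include itself.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the source folder, with
                # forward slashes, as is conventional inside zip files.
                arcname = path.relative_to(source).as_posix()
                archive.write(path, arcname)
                added.append(arcname)
    return added


# Hypothetical usage: zip the folder "my_new_docs" into "my_new_docs.zip".
# zip_folder("my_new_docs", "my_new_docs.zip")
```

The resulting zip file can then be selected in The Depositor in the same way as the new_files.zip sample used in the steps below.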
- Go to Greenstone's home page and click The Depositor button. Select Demo Lucene from the list and log in if asked to do so again.
- Leave the metadata fields blank, because the zip file we are adding contains metadata.xml files which specify these metadata values. Click the Select File button and select sample_files → demo_NewFiles → new_files.zip, which contains two new HTML documents along with their associated images and metadata.xml files. Click Confirmation and then the Deposit Item button.
A major benefit of using The Depositor is that the user can upload documents and metadata remotely, without having to have Greenstone installed at the client end. The Depositor is a tool for remote data input, allowing you to also deposit items to collections built with the MG or MGPP indexers. The difference is that the MG and MGPP indexers need to rebuild the entire index after adding a new item, while the Lucene indexer incrementally adds the new document to the existing index.
- After the building is finished, click View collection to preview the collection. On the collection's home page, it says the collection now contains 14 documents. Check the titles classifier to see that the new documents Above and beyond and Utilization and construction of pit silos have been added successfully.