Greenstone tutorial exercise
Downloading files from the web
The Greenstone Librarian Interface's Download panel allows you to download individual files, parts of websites, and indeed whole websites, from the web.
- Start a new collection called webtudor, and base it on -- New Collection --.
- In a web browser, visit http://englishhistory.net and follow the link to The Tudors. You should be at the URL
This is where we started the downloading process to obtain the files you have been using for the tudor collection. You could do the same thing by copying this URL from the web browser, pasting it into the Download panel, and clicking the <Download> button. However, several megabytes will be downloaded, which might strain your network resources, or your patience! For a faster exercise, we focus on a smaller section of the site.
- Go to the Download panel by clicking its tab. There are five download types listed on the left-hand side. For this exercise, we only use the Web type. Make sure this is selected in the list. Enter this URL
into the Source URL box. There are several other options that govern how the download process proceeds. To see a description of an option, hover the mouse over it and a tooltip will appear. To copy just the citizens section of the website, switch on the Only files below URL option by checking its box and set the Download Depth option to 1. If you don't do this (or if you omit the terminating "/" in the URL), the downloading process will follow links to other areas of the englishhistory.net website and grab those as well. Also switch on the Only files within site option to avoid downloading any items on the site pages that actually come from outside it (such as Google ads).
- If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface. Click the <Configure Proxy...> button. Switch on the Use proxy connection? checkbox. Enter the proxy server address and port number in the Proxy Host: and Proxy Port: boxes. Click <OK>.
- Now click <Download>. If you have set proxy information in Preferences..., a popup will ask for your user name and password. If you're on Windows Vista or later, Windows may show a popup message asking whether you wish to block or unblock the download. In such a case, choose to unblock. With proxy settings turned on, it may take a short while before GLI starts downloading. Once the download has started, a progress bar appears in the lower half of the panel and reports the progress of the download.
More detailed information can be obtained by clicking <View Log>. The process can be paused and restarted as needed, or stopped altogether by clicking <Close>. Downloading can be a lengthy process involving multiple sites, and so Greenstone allows additional downloads to be queued up. When a new URL is pasted into the Source URL box and <Download> is clicked, a new progress bar is appended to those already present in the lower half of the panel. When the currently active download item completes, the next is started automatically.
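To see why the terminating "/" matters for the Only files below URL option, and what Only files within site excludes, the following Python sketch may help. It is only an illustration of how scoping rules of this kind commonly behave, not Greenstone's actual implementation: the function in_scope, its parameters, and the example.org addresses are all hypothetical, and the Download Depth option is not modelled.

```python
from urllib.parse import urlsplit


def in_scope(link, source_url, below_only=True, within_site=True):
    """Return True if `link` would be followed under the two scoping rules.

    Hypothetical helper for illustration; not part of Greenstone.
    """
    src = urlsplit(source_url)
    target = urlsplit(link)

    # "Only files within site": the link must be on the same host.
    if within_site and target.netloc.lower() != src.netloc.lower():
        return False

    if below_only:
        # "Only files below URL": treat everything up to and including the
        # last "/" of the source path as the directory being copied.
        base_dir = src.path[: src.path.rfind("/") + 1] or "/"
        if not (target.path or "/").startswith(base_dir):
            return False

    return True


# Assumed example addresses (not the tutorial's real URLs).
with_slash = "http://example.org/tudor/citizens/"
without_slash = "http://example.org/tudor/citizens"
other_area = "http://example.org/tudor/monarchs/henry.html"

print(in_scope(other_area, with_slash))     # outside /tudor/citizens/
print(in_scope(other_area, without_slash))  # base becomes /tudor/, so in scope
```

With the trailing "/", the base directory is /tudor/citizens/ and links to other areas are rejected. Without it, the last path component is treated as a file name, the base directory becomes /tudor/, and links to neighbouring sections fall back into scope.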
- Downloaded files are stored in a top-level folder called Downloaded Files that appears on the left-hand side of the Gather panel. You may not need all the downloaded files; you choose which ones you want by dragging selected files from this folder into the collection area on the right-hand side, just as we did before when selecting data from the sample_files folder. In this example we will include everything that has been downloaded. Select the englishhistory.net folder within Downloaded Files and drag it across into the collection area.
- Switch to the Create panel to build and preview the collection. It is smaller than the previous collection because we included only the citizens files. However, these now represent the latest versions of the documents.