Tutorial: Building a Text Collection with Zotero
Why are we doing this?
Over the course of the semester you'll be building a corpus (or perhaps multiple corpora) for in-class exercises, and perhaps for your own projects. Having tools and a strategy for managing those corpora is an essential first step. In this course, we'll be using a tool called "Zotero" to manage our text collections.
Zotero is a bibliographic management tool, similar in many respects to EndNote and RefWorks. It is completely open source, and is thus widely used in the digital humanities community. If you've used Mendeley, a commercial reference manager now owned by Elsevier, you'll notice many similarities.
There are three big perks to using Zotero to manage corpora:
- You can sync your materials with a remote server. This allows you to access them from multiple locations, including through a web interface, and means that you have a remote backup. You can create a free account that provides 300 MB of storage space, which is plenty for the purposes of this course. If you want more space for larger projects, you can purchase a storage plan at fairly low cost (e.g. 2 GB for about $1.67/month, roughly $20 a year).
- Zotero is good at automatically generating metadata and scraping content from the internet. When searching scholarly databases, you can often import big batches of bibliographic entries and, in some cases, the full-text files associated with them in just a few clicks.
- Zotero makes it easy to associate content with bibliographic entries. This last point is really important, because it means that Zotero can be used to manage much more than just traditional bibliographic data. You could, for example, use Zotero to manage digitized materials from an archive. Or any number of other things.
Before you start
- Download & install the [Firefox web browser](http://www.mozilla.org/en-US/firefox/new/).
- Download & install Zotero & Paper Machines
- Go to https://www.zotero.org/download/
- Download Zotero Standalone, install it, launch the application, and then close it. That first launch will initialize your database and profile on your computer.
- Using Firefox, go back to the download site; install the Firefox browser extension.
- Restart Firefox.
- Check out the Zotero Quick Start Guide: http://www.zotero.org/support/quick_start_guide
- Register for a new account on http://www.zotero.org
Connecting to your Zotero account
- Open Zotero by clicking on the "Zotero" logo in the bottom right.
- Click the gear icon, and go to "Preferences"
- Click "Sync"
- Enter your username and password, and make sure that all boxes are checked.
- Close the preferences window.
Now, Zotero should automatically sync your library with a remote server as you add material. To manually sync, you can click the green circular arrow at right.
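Syncing also means your library can be read by scripts, not only by the Zotero client. The sketch below is a minimal illustration that assumes the Zotero Web API (version 3) at `api.zotero.org`, a numeric user ID, and a private API key created in your zotero.org account settings. The function names are hypothetical. The code only builds the request and parses a JSON reply, so you can inspect it without sending anything.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.zotero.org"  # Zotero Web API host (assumed v3 endpoints)


def build_items_request(user_id: str, api_key: str, limit: int = 25) -> urllib.request.Request:
    """Build a GET request for the top-level items in a user's synced library."""
    query = urlencode({"format": "json", "limit": limit})
    url = f"{API_BASE}/users/{user_id}/items/top?{query}"
    headers = {
        "Zotero-API-Version": "3",  # ask for version 3 of the Web API
        "Zotero-API-Key": api_key,  # private key created on zotero.org (placeholder)
    }
    return urllib.request.Request(url, headers=headers)


def extract_titles(response_text: str) -> list[str]:
    """Pull item titles out of a JSON response; items without a title are skipped."""
    items = json.loads(response_text)
    return [item["data"]["title"] for item in items if item.get("data", {}).get("title")]
```

To fetch items, pass the request to `urllib.request.urlopen(...)`, read and decode the body, and hand the text to `extract_titles`. Because the network call is kept out of both functions, each piece can be tested on its own.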
Organizing your library
You can sort your library into "collections" and "subcollections":
- Right-click on "My Library" in the browser at left.
- Click "New Collection"
- Name your collection, and click "OK"
- Click on one of your items, and drag it into the collection.
Note
Items can belong to many collections, or none. Removing an item from a collection does not delete it from your library.
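One way to picture this is as a many-to-many relationship. The library owns the items, and each collection is just a set of references to them. The toy Python model below illustrates that idea. It is hypothetical and does not reflect Zotero's actual database schema. The `Library` class and its method names were invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Toy model (not Zotero's real schema): items live in the library,
    and collections only hold references to item keys."""
    items: dict[str, str] = field(default_factory=dict)             # key -> title
    collections: dict[str, set[str]] = field(default_factory=dict)  # name -> item keys

    def add_item(self, key: str, title: str) -> None:
        self.items[key] = title

    def add_to_collection(self, name: str, key: str) -> None:
        # An item may be filed in any number of collections.
        self.collections.setdefault(name, set()).add(key)

    def remove_from_collection(self, name: str, key: str) -> None:
        # Only the reference is dropped; the item stays in the library.
        self.collections.get(name, set()).discard(key)

    def unfiled(self) -> set[str]:
        # Items that belong to no collection at all.
        filed = set().union(*self.collections.values())
        return set(self.items) - filed
```

Filing an item into a second collection only adds another reference. Removing it from one collection leaves the item itself and its other memberships untouched. An item that has been removed from every collection simply becomes unfiled.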
Adding Data
There are a few different ways to add data to your Zotero library.
- You can import bibliographic records from other reference managers,
- You can add records manually, and
- You can add records automatically through your web browser.
If you're interested in using the import feature, we can talk about that later. For now, we'll start by adding records manually.
- Download a PDF of a journal article from your favorite bibliographic database (e.g. PubMed, Google Scholar).
- Click the green and white "plus" icon.
- Click "Journal Article"
- Fill in the appropriate metadata fields for your journal article.
- Link the PDF to the bibliographic entry by dragging and dropping.
- You can also click the "paperclip" icon and select "Attach Stored Copy of File".
- By default, Zotero will make a copy of the file and store it in its own library, so if you delete the original, the copy in your library is not affected. The exception is if you link to the file rather than saving a stored copy: the link points to the original, so deleting the original breaks the attachment.
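You can simulate the difference between a stored copy and a linked file with ordinary file operations. In this sketch, a stored copy is a duplicate placed in a separate storage folder, and a link just remembers the original path. This is an illustration only, not how Zotero is implemented internally. The function names and the folder layout are hypothetical.

```python
import shutil
from pathlib import Path


def attach_stored_copy(original: Path, storage_dir: Path) -> Path:
    """Copy the file into a separate storage folder (like a stored copy)."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(original, storage_dir / original.name))


def attach_link(original: Path) -> Path:
    """Record only the file's location (like a linked file); nothing is copied."""
    return original


def is_available(attachment: Path) -> bool:
    """An attachment is usable only if the file it refers to still exists."""
    return attachment.exists()
```

Suppose you attach a file both ways and then delete the original. The stored copy is still readable, but the link points at a file that no longer exists.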
Import Items from the Web
Now let's try the web importer.
- Navigate to PubMed Central: http://www.ncbi.nlm.nih.gov/pmc/
- Search for something, e.g. "model organism"
- Click on the first result
- Notice the icon in the address bar that looks like a sheet of paper. This means that Zotero thinks that it can extract bibliographic data from this page.
- Click the sheet-of-paper icon
- Notice the dialog in the bottom right, as Zotero imports the record.
- You should now see an entry in your Library!
- Click the arrow next to the new entry to see the resources that Zotero has associated with that entry.
- Note the link to the PubMed website; double-clicking that will take you to the original page on PMC.
- Note also that it found a PDF!
- Double-click the PDF: it should open in your browser.
- You can also right-click, and select "Open in External Viewer"
Importing Multiple Entries
The Zotero web importer can import multiple entries at the same time.
- Go back to your search results on PMC.
- Note the icon in the shape of a folder in the address bar.
- Click on that folder icon.
- Select the search results to import, then click "OK"
- Note the dialog in the lower right.
- You should see those new entries in your library.
Biodiversity Heritage Library
- Go to http://www.biodiversitylibrary.org/
- Search: "origin of species"
- Click first result: "On the origin of species"
- Note that Zotero recognizes the bibliographic entry; go ahead and import it.
- Note that the metadata fields are populated; do you see any problems?
- Click "View book" on the right side of the page.
- You can navigate page images by scrolling, or using the menu in the upper right.
- Click page 23 in the menu at right.
- Click the "Show OCR" tab. This shows plain text that has been automatically extracted from the page.
- We want to attach the full OCR text to our bibliography entry:
- Click: Download Contents > Download Book > Download OCR
- In the menu at the top of your browser window, go to File > Save Page As, and save the file in an easy-to-find spot.
- In Zotero, click on the entry for this document.
- Click the paperclip icon > Attach Stored Copy of File.
- Select the plain text file that you saved.
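Once OCR text files are attached as stored copies, you can gather them into a corpus for later exercises. This sketch assumes Zotero's usual storage layout. Under that layout, each stored attachment sits in its own folder inside a `storage` folder in the Zotero data directory, and the folder is named after an eight-character item key. The data directory's location differs between versions and platforms, so check your own setup and treat any path as an assumption. Recent versions often use `~/Zotero`. The function name `load_text_corpus` is hypothetical.

```python
from pathlib import Path


def load_text_corpus(storage_dir: Path) -> dict[str, str]:
    """Collect every stored .txt attachment under a Zotero-style storage folder.

    Assumes the layout <data directory>/storage/<ITEMKEY>/<file>.txt and returns
    a mapping from "<ITEMKEY>/<file name>" to the file's text.
    """
    corpus: dict[str, str] = {}
    for path in sorted(storage_dir.glob("*/*.txt")):
        # OCR output can contain stray bytes; replace them instead of failing.
        text = path.read_text(encoding="utf-8", errors="replace")
        corpus[f"{path.parent.name}/{path.name}"] = text
    return corpus
```

For example, under the assumed default location, `load_text_corpus(Path.home() / "Zotero" / "storage")` would return one entry per stored `.txt` attachment, keyed by folder and file name. Calling `len(text.split())` on each value then gives a rough word count per document.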
Digital HPS Repository
- Go to http://hpsrepository.asu.edu/
- Click on the links: History of Evolutionary Ecology > Evolutionary Ecology Oral Histories
- Click on the item: [Oral history interview with Stanton A. Cook about his ecological research in California and Oregon. Part 1 of 3.](http://hpsrepository.asu.edu/handle/10776/6096)
- Here we'll use the "import web page" function in Zotero: click the blue page icon in your address bar.
- Note that most metadata fields have been filled in (including the URL).
- You can select a different item type (e.g. audio recording), but you may lose some fields.
- Now we want to add the transcript of this interview to the item in Zotero.
- Right-click on the "View/Open" link for the transcript bitstream, and select "Save Link As".
- In Zotero, click the paperclip icon > Attach Stored Copy of File, and select the transcript file you saved.
- You may not want to add the audio file to your library, since it is large; it is a good candidate for linking to a resource rather than storing it.
- Right-click on the "View/Open" link for the audio bitstream, and select "Copy Link Location".
- Click the paperclip icon > Attach Link to URI.
- Paste in the URL for the audio bitstream.
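Whether a file is "too big" to store depends on your quota. As a rough sketch, the helper below computes what share of the 300 MB free storage a file would use and suggests linking when that share exceeds a cutoff. The 5% cutoff, the decimal megabyte (10^6 bytes) and the function names are assumptions chosen for illustration. They are not Zotero settings.

```python
from pathlib import Path

FREE_QUOTA_BYTES = 300 * 1_000_000  # 300 MB free tier, counting 1 MB as 10**6 bytes (assumption)


def quota_share(size_bytes: int, quota_bytes: int = FREE_QUOTA_BYTES) -> float:
    """Fraction of the sync quota that a stored copy of the file would use."""
    return size_bytes / quota_bytes


def prefer_link(path: Path, max_share: float = 0.05) -> bool:
    """Hypothetical rule of thumb: link rather than store files above max_share of the quota."""
    return quota_share(path.stat().st_size) > max_share
```

With these numbers, a hypothetical 60 MB audio file would take up a fifth of the free quota. For such a file, `prefer_link` would recommend attaching a link rather than a stored copy.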