Tutorial: Identifiers, Names, and Dates in Metadata
Before You Start
Be sure to complete the following tutorials, background modules, and readings first:
- Tutorial: Building a Text Collection with Zotero
- Background: What is the Semantic Web?
- Readings:
  - Sheila A. Bair and Sharon Carlson. 2008. "Where Keywords Fail: Using Metadata to Facilitate Digital Humanities Scholarship."
  - Christina Manzo et al. 2015. "'By the People, For the People': Assessing the Value of Crowdsourced, User-Generated Metadata." Digital Humanities Quarterly 9 (1).
Some of the most innovative advances in digital humanities have involved finding new ways to leverage metadata for discovery and inference. Co-citation analysis and co-authorship network modeling were two of the earliest examples of using metadata to study scientific activity. But what we can do with our metadata depends on the quality of that metadata.
As you should recall from Background: What is the Semantic Web?, RDF (the core technology of the Semantic Web) involves describing relationships among resources. The term "resource" includes documents (e.g. a digitized archival document), but it also refers to other concepts (e.g. people or places) that are related to documents (e.g. as authors, or places of publication). We refer to resources using Uniform Resource Identifiers (URIs). In most cases, a URI looks like a web address.
We'll talk more about URIs later on. For now, the important thing to note is that a URI provides an unambiguous way of referring to a concept. As we build and curate our collections in Zotero, a crucial step is disambiguating the concepts in our metadata: in other words, replacing vague entries with unambiguous references to those concepts.
In this short tutorial, we'll look at disambiguating three important components of our metadata: documents, personal names, and dates.
Referring to Documents
There are several systems for referring to published documents. A familiar system for identifying books is the International Standard Book Number, or ISBN. Each edition of a book is given a unique number. If you know that number, you can look up the corresponding edition in a bibliographic database (e.g. WorldCat, or your local library catalog).
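Because every ISBN ends in a check digit, a quick check can catch typos before you paste an ISBN into Zotero. The sketch below (a hypothetical helper, `is_valid_isbn`, not part of Zotero or any library) applies the standard check-digit rules for both the older 10-digit and the current 13-digit forms; the sample ISBNs are illustrative values, not references to particular books.

```python
def is_valid_isbn(isbn: str) -> bool:
    """Return True if `isbn` is a well-formed ISBN-10 or ISBN-13 with a correct check digit."""
    # Ignore the hyphens and spaces that are often printed inside ISBNs.
    s = isbn.replace("-", "").replace(" ", "").upper()

    if len(s) == 13 and s.isascii() and s.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(s))
        return total % 10 == 0

    if len(s) == 10 and s[:9].isascii() and s[:9].isdigit() and s[9] in "0123456789X":
        # ISBN-10: weights run 10, 9, ..., 1; "X" stands for 10 in the final position.
        values = [int(d) for d in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
        return total % 11 == 0

    return False


print(is_valid_isbn("978-0-306-40615-7"))  # True
print(is_valid_isbn("0-306-40615-2"))      # True
print(is_valid_isbn("978-0-306-40615-8"))  # False (last digit mistyped)
```

A passing check only means the number is well formed; you still need to look it up to confirm it identifies the edition you have.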
A widely recognized system for referring to publications like research articles is the Digital Object Identifier (DOI) system. Most publishers now assign a DOI to each article they publish. If you know an article's DOI, you can get metadata about the article (including its location) by entering the DOI in the search bar at https://www.doi.org/.
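DOIs are also resolved by appending them to https://doi.org/, which gives a stable link you can store in your metadata. The sketch below uses a hypothetical helper, `doi_to_url`, that strips prefixes people often paste along with a DOI and builds that resolver link. The regular expression is an assumption: a simplified approximation of common DOI syntax, not a complete validator.

```python
import re
from urllib.parse import quote

# Rough shape of a DOI: "10.", a numeric registrant code, "/", then a suffix.
# This is a simplified check, not a complete validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people often paste along with a DOI.
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                  "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(raw: str) -> str:
    """Strip common prefixes from a DOI and return its https://doi.org/ resolver URL."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    # Percent-encode characters such as "<" or "#" that are not safe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/:;()")


print(doi_to_url("doi:10.1000/182"))              # https://doi.org/10.1000/182
print(doi_to_url("https://doi.org/10.1000/182"))  # https://doi.org/10.1000/182
```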
In Zotero, you can automatically load metadata about a resource by clicking the magic wand icon ("Add Item by Identifier") and entering an ISBN, DOI, or PubMed ID (PMID).
As you curate your collection, you should make sure that each resource in your collection has some kind of unique identifier. If you can't find anything better, you can use the URL field to specify where you found the document.
For materials that are not online (e.g. a letter that you found in a physical archive), you can use the Archive and Loc. in Archive fields to identify the resource. What you put in "Loc. in Archive" depends very much on the structure of the archive, but in general this should provide an unambiguous way of locating the specific resource that you are describing. For example, you could use the archive's collection identifier, box, bundle, and item numbers (e.g. D1041/5/2/31/5).
Referring to People
Name authorities have been around in the library world for quite a long time. In the United States, the most widely used name authority is the Library of Congress Name Authority File. In Germany, the Deutsche Nationalbibliothek (DNB) provides a name authority (e.g. here is an entry).
One of the problems with systems like the LoC Name Authority File is that they have historically relied on standardized spellings of names. Over time, spelling variations creep in, and the system starts to break down. To remedy this, authorities began using alphanumeric identifiers. In the DNB, for example, people are referenced using identifiers like "185836917".
Another problem is that many of these name authorities (including the LoC and DNB authorities) are national. This means that a single person may have entries in several authorities, and therefore several identifiers. One attempt to remedy this problem is the Virtual International Authority File (VIAF). VIAF aggregates national name authorities and issues its own URIs to refer to the people represented in those authorities. For example, take a look at this entry for Christopher Isherwood: http://viaf.org/viaf/76317308/#Isherwood,_Christopher,_1904-1986. Look for the permalink field: that's the VIAF URI for Christopher Isherwood.
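If you copy a VIAF address from your browser, it may carry extra text after the number, like the "#Isherwood,_Christopher,_1904-1986" fragment above. The sketch below is a hypothetical convenience helper, `viaf_uri`, that trims such an address down to the bare identifier. It assumes the permalink takes the form http://viaf.org/viaf/<number>; the permalink field on the VIAF page remains the authoritative value, so compare against it.

```python
import re

# Assumption: VIAF URIs have the form http://viaf.org/viaf/<number>,
# as in the Isherwood example; anything after the number is dropped.
VIAF_ID = re.compile(r"viaf\.org/viaf/(\d+)")


def viaf_uri(url: str) -> str:
    """Reduce a VIAF page URL to the bare VIAF URI for the person it describes."""
    match = VIAF_ID.search(url)
    if match is None:
        raise ValueError(f"no VIAF identifier found in {url!r}")
    return f"http://viaf.org/viaf/{match.group(1)}"


print(viaf_uri("http://viaf.org/viaf/76317308/#Isherwood,_Christopher,_1904-1986"))
# http://viaf.org/viaf/76317308
```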
Wherever possible, you should use VIAF URIs to refer to people (e.g. authors) in your metadata. In Zotero, rather than replacing the human-readable author names, you can add new Author fields that contain the URIs. Order them consistently (for example, place each URI directly after the name it identifies) so that you can keep track of which URI belongs to which human-readable name. In the end, however, we'll use those URIs (rather than the readable names) to identify people in our corpora.
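To see why URIs work better than names as keys, consider the hypothetical author entries below: three spellings of one person, each paired with the same VIAF URI. The spellings are invented for illustration. Counting distinct names suggests three people; grouping by URI correctly finds one.

```python
from collections import defaultdict

# Hypothetical author entries from several Zotero items:
# (name as it was written, VIAF URI added alongside it).
authors = [
    ("Isherwood, Christopher", "http://viaf.org/viaf/76317308"),
    ("Christopher Isherwood", "http://viaf.org/viaf/76317308"),
    ("Isherwood, C.", "http://viaf.org/viaf/76317308"),
]

# Group every spelling under the URI that identifies the person.
names_by_uri = defaultdict(set)
for name, uri in authors:
    names_by_uri[uri].add(name)

print(len({name for name, _ in authors}))  # 3 (distinct spellings)
print(len(names_by_uri))                   # 1 (distinct people)
```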
A final problem is that these name authorities (including VIAF) are populated from published works, such as books or articles. If you are working with materials that were not published, or that are not well known, you may not find VIAF entries for all of the people related to that work. At ASU, we provide an online authority service called Conceptpower to address that problem.
You can search for concepts on the Conceptpower site. To find the URI for an entry in Conceptpower, click "Details" in the search results; a dialog should open with metadata about the entry, including its URI. Use Conceptpower URIs just like VIAF URIs in your metadata.
If you need to add new records to Conceptpower, let us know and we'll set up an account for you.
Representing Dates
Representing time with URIs would be a bit unwieldy, so we don't. The trick to representing dates and times unambiguously in metadata is to use a standard. A standard describes how dates and times should be written, so that computers and humans can understand them precisely. The most widely used standard for representing time is ISO 8601. For an introduction to ISO 8601, see this brief article. You can ignore the parts related to programming; the important thing is to understand the format itself, so that you can use it to curate your metadata.
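One practical benefit of the YYYY-MM-DD layout is that it is easy for software to read and write. The sketch below uses Python's standard `datetime` module to parse and produce ISO 8601 calendar dates; the dates themselves are arbitrary examples. It also shows a useful property of the format: because the largest unit comes first and every field is zero-padded, sorting the strings alphabetically also sorts them chronologically.

```python
from datetime import date

# Parse an ISO 8601 calendar date (YYYY-MM-DD) into a date object...
d = date.fromisoformat("2015-03-09")
print(d.year, d.month, d.day)  # 2015 3 9

# ...and write a date back out in the same format.
print(date(2008, 1, 4).isoformat())  # 2008-01-04

# Alphabetical order of ISO 8601 strings matches chronological order.
dates = ["2015-11-03", "2008-01-04", "2015-02-28"]
print(sorted(dates))  # ['2008-01-04', '2015-02-28', '2015-11-03']
```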
In your Zotero collections, this will mostly apply to publication dates. Be sure that all of your dates are written in the ISO 8601 format (YYYY-MM-DD). In the example below, notice the characters "y m d" to the right of the Date field: Zotero understands a variety of date formats, including ISO 8601. The "y m d" characters indicate that Zotero has recognized which part is the year, which part is the month, and which part is the day of the month.
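If your collection already contains dates written in other ways, you can convert them in bulk. The sketch below is a hypothetical helper, `to_iso8601`, built on the standard `datetime.strptime`. The list of input formats is an assumption about what you might encounter, so adjust it to your data; in particular, it reads "03/05/1923" as month/day/year. Since ISO 8601 also allows reduced precision, a bare year (YYYY) or year and month (YYYY-MM) is kept as-is rather than padded with an invented day.

```python
import re
from datetime import datetime

# Assumed input formats (extend for your own data). Note that "%m/%d/%Y"
# treats slash dates as month/day/year, which is ambiguous for some sources.
INPUT_FORMATS = ["%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%m/%d/%Y", "%d.%m.%Y"]


def to_iso8601(raw: str) -> str:
    """Convert a date string to ISO 8601, or raise ValueError if no format matches."""
    text = raw.strip()
    # Reduced precision is valid ISO 8601: keep YYYY or YYYY-MM unchanged.
    if re.fullmatch(r"\d{4}(-(0[1-9]|1[0-2]))?", text):
        return text
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date: {raw!r}")


for value in ["March 5, 1923", "5 March 1923", "05.03.1923", "1923-05", "1923"]:
    print(value, "->", to_iso8601(value))
```

Dates that raise `ValueError` are the ones worth checking by hand.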