Introductory Notes
By and large, what has made the very notion of computational humanities thinkable is the digitization of texts. This includes both the conversion of print texts into digital form and the digital production of texts. The vast majority of texts produced in the 21st century are “born digital,” even if they are subsequently circulated in print form.
This course is an entry point into interrogating digital texts. By design, it focuses mostly on method and goes light on theory. We make the minimal assumption that texts are emissions of historical/cultural processes. How we theorize the relationship between texts and those underlying processes depends very much on the discipline in which we work. My hope is that as we grapple with these analytic methods, we can reflect on how to incorporate them into our own respective disciplinary frameworks. Translating qualitative theories from disciplines in the humanities into quantitative theories and models is a major problem area for digital humanities.
What we will focus on in this course are ways of identifying theoretically relevant features in (or around) texts using computational methods, and using those features as the basis for quantitative analysis. Given the limited time available, you will not be an expert in any of these methods at the end of the course. Instead, my goal is to give you a thorough enough introduction to a range of methods that, as you develop your research project, you will know roughly where to start.
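To make "identifying features" a little more concrete, here is a minimal sketch in Python. It is not taken from the course modules: the function name relative_frequencies, the sample sentence, and the choice of feature (how often a few terms occur per 1,000 tokens) are all assumptions made purely for illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def relative_frequencies(text, terms):
    """Return the rate of each term per 1,000 tokens in `text`.

    Hypothetical helper for illustration only. Tokenization is a crude
    lowercase split on runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # An empty text has no tokens, so every rate is zero.
        return {term: 0.0 for term in terms}
    return {term: 1000 * counts[term] / total for term in terms}


# Invented sample text, not drawn from any course dataset.
sample = "The whale surfaced. The crew watched the whale and the sea."
features = relative_frequencies(sample, ["whale", "sea", "ship"])

for term, rate in features.items():
    print(f"{term}: {rate:.1f} per 1,000 tokens")
```

Rates like these are one simple kind of feature: once each text is reduced to numbers, texts can be compared with one another or tracked over time. Which terms are worth counting is a theoretical question that depends on your discipline.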
Programming Required
Format
This course will mix short lectures with hands-on coding exercises. The course is divided into bite-size "modules". For each module, we will start with 15-20 minutes of lecture on the core concepts of the module. We will then (usually) work through some code samples together, with further exposition on Python coding techniques. You will then have time to play with the code, either by tinkering with the parameters or applying it to other datasets.
We will use the SageMath cloud platform for the computational exercises. This provides a standardized, stable environment for running code samples.