This proposal is for a discussion and brainstorming session on the meanings of data in the humanities, and specifically in history. As Miriam Posner has pointed out, most humanities scholars do not think of their work with sources as “extracting features in order to analyze them” but rather see the source material as something to “dive into … like a pool” so as to “understand it from within” (Posner 2015). Yet digitization has produced enormous quantities of text, from government reports to literature, that could be mined and to some extent already are. How, then, do we combine the humanities impulse to understand our sources from within with the possibilities of processing and analyzing them en masse?
The idea of the session is to engage participants in a discussion about the kinds of sources they routinely work with and the ways in which those sources could be thought of as repositories of “data”. As a provisional definition, “data” here means information that can be extracted from the sources and that differs in some way from the understanding achieved by reading them one by one. It might mean topic modeling a set of texts, building a social network from the connections they reveal, or examining the kinds of language they use, among other possibilities. But rather than focusing on the tools, I’d like to focus on how historians and other humanities scholars conceptualize, or might conceptualize, “data” within their sources.