Towards Computational Historiographical Modeling: Corpora and Concepts

SNSF project funding (PI), Running

This project was approved by the SNSF on 2021-09-24 (grant no. 105211_204305); start: 2022-02-01; duration: 48 months.


So far, digital humanities has largely contented itself with borrowing methods from other fields and has developed little methodology of its own. In our Spark pilot project “An Agile Approach Towards Computational Modeling of Historiographical Uncertainty,” we have shown that the almost exclusive focus on methods and tools is a major obstacle to the construction of computational models that could help us obtain new insights into humanities research questions (which are ultimately qualitative “why?” questions), rather than merely automating primarily quantitative processing.

In the proposed project, we therefore want to focus on two issues that we have identified as particularly pressing and that together constitute a critical research gap:

  1. Regardless of the application domain, digital humanities research tends to rely heavily on corpora, i.e., curated collections of texts, images, music, or other types of data. However, the epistemological implications of this reliance have so far been largely ignored. We propose to consider corpora as phenomenotechnical devices (Bachelard 1968), analogous to scientific instruments: on the one hand, corpora are models of the phenomenon under study; on the other hand, the phenomenon is constructed through the corpus. We therefore want to study corpora as models in order to answer questions such as: How do corpora model and produce phenomena? What are the commonalities and differences between different types of corpora? How can corpora-as-models be formally described so that their properties can be taken into account in research that uses them?
  2. Models of complex phenomena generally rely on numerous concepts, e.g., in history, textuality, feudalism, state, or class. Such concepts are effectively references to “submodels,” which serve as building blocks for larger models. Traditionally, these submodels have remained largely implicit and unformalized. This becomes a serious epistemological problem in digital humanities, because these concepts are the foundation for selecting data and building corpora. For example, a corpus of letters is based on the concept of “letter” (as distinct from other kinds of writing), and a data set for comparing some aspect of preliterate and literate societies is based on the concept of “literacy” (as distinct from “illiteracy”). The lack of a formalization of these concepts is currently a major weakness of computational research in the humanities: while the quantitative computational analyses are highly formalized, their qualitative foundations are shaky. Using the concept of “textuality,” which is central to medieval manuscript studies, as a case study, we will investigate concepts as models: How do they function, and how are they used? Are there structural similarities that would allow us to create a metamodel for formalizing concepts?

The project will examine these issues in a historical context, but they are general issues in digital humanities, and we expect the results to be transferable to other contexts. We expect the project to make an important contribution to theory formation and to help advance the digital humanities from project-specific, often ad hoc, solutions to particular problems towards a more general understanding of the issues at stake.


Bachelard, Gaston. 1968. Le nouvel esprit scientifique. 10th ed. Paris: Les Presses universitaires de France. (Original work published 1934.)



Funded by the Swiss National Science Foundation (SNSF), grant no. 105211_204305.