Analysis of handwritten Czech-language manuscripts from the 18th century
The Institute for Slavic Studies at Humboldt-Universität zu Berlin is examining the Czech-language “Rixdorf Sermons”, manuscripts from Bohemian Rixdorf in Berlin that have survived from the 18th century, to find out with the help of computer-assisted analysis how language change and cultural exchange can be traced in them. The three-year interdisciplinary research project, which began in spring 2017, was funded by the Volkswagen Foundation as part of the “Mixed Methods” initiative in the humanities.
“Rixdorf Sermons” – Manuscript Identification and Linguistic Author Recognition
Around 5,000 handwritten pages were created between 1740 and 1830. These documents record the life of a small, originally Czech-speaking community of religious refugees from Bohemia. The underlying story is the flight of around 350 Hussite believers from eastern Bohemia, who found a new home in Berlin-Rixdorf from 1737 onwards. Their records are kept in an archive in Berlin-Neukölln.
By combining the two methodological strands, textual scholarship and image/pattern recognition, the project set out to find and test meaningful features for the following tasks:
- Writer identification in a large number of handwritten texts
- Uncovering text-historical layers through writer differentiation
- Locating and quantifying recurring text pieces
- Identification of text-historically important handwriting features
The task of MusterFabrik Berlin focused on developing methods for analyzing the context-specific structure of image patterns in the digitized manuscripts in order to derive qualitative statements about the authors and the content of the sermons.
For this purpose, artificial intelligence methods were developed and applied to detect recurring image patterns in the digital data, such as context-dependent sequences in the form of letters, words, or phrases, and to record their position-related occurrence in a structured way. From this, a so-called structured network of image patterns was derived, which describes the detected recurring image patterns, how often they occur, and their position-related connections to one another. Based on this, a qualitative analysis of the manuscripts was carried out in cooperation with experts from Humboldt-Universität zu Berlin. In the process, the context-specific relationships that had been determined automatically using machine learning methods were examined to see whether qualitative statements could be derived from them.
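The idea of a structured network of image patterns can be illustrated with a minimal sketch. It is not the project's actual implementation, which is not described here. It assumes that pattern detection has already produced a list of occurrences, and it uses plain same-line adjacency as a stand-in for the position-related connection. The names Occurrence, pattern_id, page, line, x and build_pattern_network are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One detected instance of a recurring image pattern (hypothetical schema)."""
    pattern_id: str  # label of the recurring pattern, e.g. a letter or word shape
    page: int        # page number in the digitized manuscript
    line: int        # text line on that page
    x: float         # horizontal position within the line


def build_pattern_network(occurrences):
    """Return (node_counts, edge_counts) for a simple pattern network.

    node_counts: how often each pattern occurs.
    edge_counts: how often pattern b directly follows pattern a on the same
                 line (an assumed, simplified notion of positional connection).
    """
    node_counts = Counter(o.pattern_id for o in occurrences)
    edge_counts = Counter()

    # Sort into reading order so that neighbours in the list are neighbours on the page.
    ordered = sorted(occurrences, key=lambda o: (o.page, o.line, o.x))
    for left, right in zip(ordered, ordered[1:]):
        # Only link patterns that sit on the same line of the same page.
        if (left.page, left.line) == (right.page, right.line):
            edge_counts[(left.pattern_id, right.pattern_id)] += 1

    return node_counts, edge_counts


# Toy input, invented purely for demonstration.
demo = [
    Occurrence("A", 1, 1, 10.0),
    Occurrence("B", 1, 1, 50.0),
    Occurrence("A", 1, 2, 12.0),
    Occurrence("B", 1, 2, 40.0),
    Occurrence("C", 1, 2, 90.0),
    Occurrence("A", 2, 1, 5.0),
]
nodes, edges = build_pattern_network(demo)
print(dict(nodes))
print(dict(edges))
```

In this toy input, pattern A is followed by pattern B on two different lines, so the edge from A to B carries a weight of 2. Repeated sequences of this kind are the sort of recurring structure that could then be handed to experts for qualitative interpretation.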
For such a structural analysis, MusterFabrik Berlin first identified image features suitable for structural evaluation. Based on these, a module for detecting image patterns was developed. In addition, machine learning methods were developed for position-based linking and context-specific analysis. Finally, an analysis was carried out in collaboration with staff from Humboldt-Universität zu Berlin to derive qualitative statements from the results obtained.
Our project partners: