AI in historical dictionaries and text corpora

Published on 5 December 2025

As part of the Academy project 'Middle High German Dictionary', a meeting on 'AI in historical dictionaries and text corpora' will take place on December 10 at the Academy of Sciences and Literature. The event aims to foster an exchange of ideas on the effective use of AI in research and to promote networking among researchers.

The discussion will focus on the participants' experiences with AI approaches and on opportunities for developing existing approaches further. The researchers will also discuss the challenges of applying AI to historical dictionaries and text corpora. These challenges stem, on the one hand, from the great variety of forms that individual words can take, which is partly a result of regional peculiarities and the lack of standardized spelling, and, on the other hand, from the underrepresentation of the relevant historical language stages in models that use ›the internet‹ as training data. The discussion will also address the following topics:

  • Grouping of lemma occurrences according to similarity
  • Distributional semantics/word embeddings as the basis for today's AI approaches
  • Lemmatization of occurrences in preparation for data-driven dictionary work
  • Semantic annotation and indexing of historical texts and dictionary articles

The discussion will be organized by Patrick D. Brookshire (Digital Academy of the AdWL) and Jonas Richter (Göttingen Academy of Sciences in Lower Saxony).

Participating researchers:

Niels Bohnert (Trier Research Unit of the AdWL)
Dr. Luise Borek (Technical University of Darmstadt & Union of Academies)
Julia Hintersteiner (Paris Lodron University of Salzburg, Austria)
Dr. Nora Ketschik (University of Stuttgart)
Sarah Oberbichler (Leibniz Institute for European History in Mainz)
Ismail Prada Ziegler (University of Bern, Switzerland)
Ute Recker-Hamm (Trier Research Unit of the AdWL)
Jan Schaffert (Göttingen Academy of Sciences in Lower Saxony)
Dr. Tobias Streck (University of Freiburg)
Dr. Stefan Tomasek (Julius Maximilian University of Würzburg)

The long-term project 'Middle High German Dictionary' (MWB) is compiling a dictionary of Middle High German covering the period from 1050 to 1350. It documents the vocabulary and usage of the entire spectrum of German-language texts from this period, including the Nibelungenlied and classical Middle High German epic and lyric poetry, as well as German-language documents, law books, chronicles, non-fiction texts, and works of German-language mysticism. The MWB is an inter-academic project of the Academy of Sciences and Literature in Mainz and the Göttingen Academy of Sciences in Lower Saxony.

Walther von der Vogelweide. ›Große Heidelberger Liederhandschrift - Cod. Pal. germ. 848‹. Middle High German Dictionary. University Library Heidelberg. CC BY-SA 3.0 DE.