Abstract
While Old Irish (c. 600–900 A.D.) is extensively documented, it remains digitally under-
resourced, lacking the range of digital resources available for other older Indo-European
languages (e.g., Latin, see Pellegrini and Passarotti, 2018). We report on the development
of a fully inflected lexicon of Old Irish nouns, provided in both phonemic and orthographic
notation. The work involved a computer-assisted, systematic, and reproducible grapheme-to-
phoneme conversion pipeline and the generation of morphological forms with a finite-state
transducer. The resulting inflected lexicon will better enable computational studies of
Old Irish morphology, support further research into diachronic developments, and serve a
wide range of Natural Language Processing (NLP) applications.
We began by extracting noun lemmata from the Old Irish Würzburg glosses (Kavanagh,
2001) and the Corpus PalaeoHibernicum (CorPH) ‘Old Irish Corpus’ (Stifter et al., 2021). We
then devised a set of rules for orthography-to-phonology conversion, subsequently
implemented using the Python package Epitran (Mortensen, Dalmia, and Littell, 2018). The
resulting transcriptions act as the data input for a finite-state transducer (FST) adapted
from Fransen (2019), allowing us to generate inflected forms of Old Irish nouns. Finally,
we derived orthographic forms (and their variants) by applying conversion rules to the
generated forms.
Old Irish presents considerable challenges for the development of a resource of this
nature, given its opaque and inconsistent orthography, complex phonology, elaborate
system of morphophonological alternations, and intricate patterns of morphological
inflection (Anderson, 2016; Stifter, 2009; Thurneysen, 1946; Pedersen, 1909–1913). We
report on how we dealt with these problems in the development of the inflectional
lexicon. While this study focused on the Old Irish nouns in the Würzburg glosses, we
intend to extend the lexicon by applying this pipeline to further corpora and other parts
of speech. This inflected lexicon makes possible systematic studies in data-driven
morphology and typology (Pellegrini, 2020; Beniamine, Bonami, and Luís, 2021;
Beniamine, 2021), and facilitates future research into diachronic and diatopic variation in
Irish and the development of further NLP applications for the language.
References
Anderson, Cormac (2016). “Consonant colour and vocalism in the history of Irish”. PhD
thesis. Uniwersytet im. Adama Mickiewicza w Poznaniu. URL:
https://hdl.handle.net/10593/14780.
Beniamine, Sacha (2021). “One lexeme, many classes: inflection class systems as lattices”.
In: One-to-Many Relations. Ed. by Berthold Crysmann and Manfred Sailer. Berlin:
Language Science Press.
Beniamine, Sacha, Olivier Bonami, and Ana R. Luís (2021). “The fine implicative structure
of European Portuguese conjugation”. In: Isogloss 7.9, pp. 1–35. DOI:
10.5565/rev/isogloss.109.
Fransen, Theodorus (2019). “Past, present and future: Computational approaches to
mapping historical Irish cognate verb forms”. PhD thesis. Trinity College Dublin,
The University of Dublin. URL: https://github.com/ThFransen84/OIfst.
Kavanagh, Séamus (2001). A Lexicon of the Old Irish Glosses in the Würzburg Manuscript of
the Epistles of St. Paul. Ed. by Dagmar S. Wodtko. Mitteilungen der Prähistorischen
Kommission 45. + 1 CD-ROM. Wien: Verlag der Österreichischen Akademie der
Wissenschaften. DOI: 10.1553/0x0001fb6e.
Mortensen, David R., Siddharth Dalmia, and Patrick Littell (May 2018). “Epitran: Precision
G2P for Many Languages”. In: Proceedings of the Eleventh International Conference
on Language Resources and Evaluation (LREC 2018). Ed. by Nicoletta Calzolari
(Conference chair) et al. Miyazaki, Japan: European Language Resources
Association (ELRA).
Pedersen, Holger (1909–1913). Vergleichende Grammatik der keltischen Sprachen. 2 Vols.
Göttingen: Vandenhoeck & Ruprecht.
Original language | English |
---|---|
Title of host publication | International Congress of Celtic Studies XVII Utrecht |
Pages | 25-26 |
Number of pages | 2 |
Publication status | Published - 2023 |
Event | International Congress of Celtic Studies XVII Utrecht - Utrecht. Duration: 24 Jul 2023 → 28 Jul 2023 |
Keywords
- inflected lexicon
- Old Irish
- grapheme-to-phoneme conversion