Abstract
This paper describes an inflectional lexicon of Old Irish nouns and the tools developed for its creation.
Although Old Irish (c. 600–900 A.D.) is extensively documented, it remains digitally under-resourced.
We develop a morphological description in the form of a fully inflected lexicon of Old Irish nouns,
provided in both phonemic and orthographic notation. This entailed devising a computer-assisted, systematic,
and reproducible grapheme-to-phoneme conversion pipeline and generating inflected forms with
a finite-state transducer. We report on the considerable challenges posed by Old Irish, namely its
morphophonological complexity and its opaque and inconsistent orthography. The inflected lexicon
will better enable computational studies of Old Irish morphology, support further research into diachronic
developments, and serve a wide range of Natural Language Processing (NLP) applications.
Although “Old Irish is the earliest period of Irish –– or of any Celtic language –– for which
the extant record is sufficiently full and varied to permit a full synchronic description” (Stifter, 2009, p.
59), the language still lacks the range of digital resources available for other Indo-European languages (e.g.,
Latin; see Pellegrini and Passarotti, 2018). While a number of independent projects focus on
Old Irish lexicography (Griffith, Stifter, and Toner, 2018), the most comprehensive resource, in terms of both
the contemporary source material included and the level of grammatical annotation, is Corpus
PalaeoHibernicum (CorPH) ‘Old Irish Corpus’ (Stifter et al., 2021). However, despite the richness of its
linguistic annotation, CorPH cannot serve as the basis for a morphological generator without considerable
pre-processing, because its lemmata are spelled inconsistently and because of the way it segments complex
morphological structures.
Old Irish presents many challenges for the development of computational resources. The language has a
complex phonology, an elaborate system of morphophonological alternations, and intricate patterns of
morphological inflection (Anderson, 2016; Stifter, 2009; Thurneysen, 1946; Pedersen, 1909–1913). In
addition, the orthography is neither transparent nor consistent, and orthographic practice varies
considerably (Ó Cróinín, 2001). This complicates the development of a tool for automatic
orthography-to-phonology conversion, as many orthographic sequences can have multiple readings. For
instance, combinations of sonorant and stop are ambiguous: ⟨rc⟩ can represent /rg/ or /rk/, and ⟨rg⟩ can
represent /rg/ or /rɣ/. We resolve such ambiguities by (a) normalising the orthography and (b) some manual
pre-processing.
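
The ambiguity described above can be illustrated with a minimal sketch. It is not the conversion tool used in this work: it only enumerates every phonemic reading that a small lookup table licenses for a spelling. The phonemic readings are those given in the text; the grapheme keys `rc` and `rg`, the table `AMBIGUOUS`, the function `candidate_readings`, and the test strings are assumptions introduced for illustration, and the strings are made up rather than attested forms.

```python
from itertools import product
from typing import Dict, List

# Assumed toy table: grapheme sequence -> candidate phonemic readings.
# The readings come from the text; the grapheme keys are assumptions.
AMBIGUOUS: Dict[str, List[str]] = {
    "rc": ["rk", "rg"],
    "rg": ["rg", "rɣ"],
}


def candidate_readings(word: str) -> List[str]:
    """Return every reading licensed by the toy table.

    Characters not covered by the table are passed through unchanged,
    which is a simplification, not a claim about Old Irish spelling.
    """
    segments: List[List[str]] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in AMBIGUOUS:
            # Ambiguous two-letter sequence: keep all of its readings.
            segments.append(AMBIGUOUS[pair])
            i += 2
        else:
            segments.append([word[i]])
            i += 1
    # Cartesian product over the per-segment alternatives.
    return ["".join(combo) for combo in product(*segments)]


if __name__ == "__main__":
    # "darc" is a made-up string used only to exercise the function.
    print(candidate_readings("darc"))
```

Each additional ambiguous sequence multiplies the number of candidates, which suggests why normalisation and some manual disambiguation are needed before conversion can be deterministic.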
We developed a pipeline for the creation of an inflectional lexicon. We began by extracting noun lemmata
from the Old Irish Würzburg glosses (Kavanagh, 2001) and then devised a set of rules for
orthography-to-phonology conversion, which we implemented using the Python package Epitran (Mortensen,
Dalmia, and Littell, 2018). The resulting transcriptions serve as input to a finite-state transducer (FST)
adapted from Fransen (2019), which generates the inflected forms of Old Irish nouns. Finally, we derived
orthographic forms (and their variants) by applying the conversion rules in the opposite direction. While this
study focused on the Old Irish nouns in the Würzburg glosses, we intend to extend the lexicon by applying this
pipeline to further corpora and other parts of speech.
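
The three stages of the pipeline (orthography to phonemic form, generation of inflected forms, phonemic form back to orthography) can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the implementation described here: plain ordered string rewriting replaces Epitran, a suffix table replaces the finite-state transducer, and only one orthographic form is produced per cell rather than a set of variants. The names `G2P_RULES`, `PARADIGM`, `apply_rules`, and `build_entry` are hypothetical, the paradigm cells and endings are placeholders rather than Old Irish inflection, and the lemma is a made-up string.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]

# Hypothetical ordered rewrite rules (grapheme -> phoneme), digraphs first.
G2P_RULES: List[Rule] = [("ch", "x"), ("th", "θ"), ("c", "k")]

# Placeholder paradigm: cell label -> suffix. NOT Old Irish endings.
PARADIGM: Dict[str, str] = {"CELL_1": "", "CELL_2": "a", "CELL_3": "u"}


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Scan left to right, applying the first rule that matches at each
    position; unmatched characters are copied through unchanged."""
    out: List[str] = []
    i = 0
    while i < len(text):
        for src, tgt in rules:
            if text.startswith(src, i):
                out.append(tgt)
                i += len(src)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def to_phonemic(lemma: str) -> str:
    """Stage 1: orthography -> phonemic transcription."""
    return apply_rules(lemma, G2P_RULES)


def inflect(stem: str) -> Dict[str, str]:
    """Stage 2: a suffix table standing in for the transducer."""
    return {cell: stem + suffix for cell, suffix in PARADIGM.items()}


def to_orthographic(phonemic: str) -> str:
    """Stage 3: apply the same rules in the opposite direction."""
    reverse = [(tgt, src) for src, tgt in G2P_RULES]
    return apply_rules(phonemic, reverse)


def build_entry(lemma: str) -> Dict[str, Tuple[str, str]]:
    """Return cell -> (phonemic form, orthographic form) for one lemma."""
    forms = inflect(to_phonemic(lemma))
    return {cell: (f, to_orthographic(f)) for cell, f in forms.items()}


if __name__ == "__main__":
    # "pachth" is a made-up string, not an attested lemma.
    for cell, pair in build_entry("pachth").items():
        print(cell, pair)
```

Generating forms at the phonemic level and only then mapping back to spelling keeps the morphological rules independent of orthographic variation, which is the design the paragraph above describes.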
This inflected lexicon makes systematic studies in data-driven morphology and typology possible
(Pellegrini, 2020; Beniamine, Bonami, and Luís, 2021; Beniamine, 2021). It will also facilitate future research
into diachronic and diatopic variation in Irish and the development of further NLP applications for the
language. Moreover, the FST created to generate inflected forms provides a
Original language | English |
---|---|
Host publication title | N/A |
Pages | N/A |
Publication status | Published - 2022 |
Event | 25th International Conference on Historical Linguistics (ICHL25) - Oxford (UK). Duration: 1 Aug 2022 → 5 Aug 2022 |
Conference
Conference | 25th International Conference on Historical Linguistics (ICHL25) |
---|---|
City | Oxford (UK) |
Period | 1/8/22 → 5/8/22 |
Keywords
- Old Irish
- inflectional lexicon
- finite-state transducers
- computational morphology
- grapheme-to-phoneme conversion