Developing an inflectional lexicon for Old Irish

Cormac Anderson, Theodorus Fransen, Sacha Beniamine

Research output: Contribution to book; Conference contribution

Abstract

While Old Irish (c. 600–900 A.D.) is extensively documented, it remains digitally under-resourced, lacking the range of digital resources available for other older Indo-European languages (e.g., Latin, see Pellegrini and Passarotti, 2018). We report on the development of a fully inflected lexicon of Old Irish nouns, provided in both phonemic and orthographic notation. This involved building a computer-assisted, systematic, and reproducible grapheme-to-phoneme conversion pipeline and generating morphological forms with a finite-state transducer. The inflected lexicon we develop will better enable computational studies of Old Irish morphology, support further research into diachronic developments, and serve a wide range of Natural Language Processing (NLP) applications. We began by extracting noun lemmata from the Old Irish Würzburg glosses (Kavanagh, 2001) and the Corpus PalaeoHibernicum (CorPH) ‘Old Irish Corpus’ (Stifter et al., 2021). We then devised a set of rules for orthography-to-phonology conversion, which we implemented using the Python package Epitran (Mortensen, Dalmia, and Littell, 2018). The resulting transcriptions serve as input to a finite-state transducer (FST) adapted from Fransen (2019), which generates the inflected forms of Old Irish nouns. Finally, we derived orthographic forms (and their variants) by applying conversion rules to the generated forms. Old Irish presents considerable challenges for the development of a resource of this nature, given its opaque and inconsistent orthography, complex phonology, elaborate system of morphophonological alternations, and intricate patterns of morphological inflection (Anderson, 2016; Stifter, 2009; Thurneysen, 1946; Pedersen, 1909–1913). We report on how we dealt with these problems in the development of the inflectional lexicon.
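The orthography-to-phonology step described above can be illustrated with a minimal sketch of ordered, context-sensitive rewrite rules. This is not the authors' rule set and does not use Epitran's API: the names `TOY_RULES` and `to_phonemic` are hypothetical, and the handful of rules shown is a heavily simplified assumption that ignores consonant colour (palatalisation), diphthongs, and most of the real orthographic ambiguity.

```python
import re
import unicodedata

# Hypothetical toy rules, NOT the authors' rule set. Palatalisation,
# diphthongs and most orthographic ambiguity are deliberately ignored.
# Each rule is (regex, replacement); rules are applied in list order.
TOY_RULES = [
    (r"th", "θ"),                   # digraph <th> as a dental fricative
    (r"ch", "x"),                   # digraph <ch> as a velar fricative
    (r"(?<=[aeiouáéíóú])c", "g"),   # postvocalic <c> read as voiced /g/
    (r"(?<=[aeiouáéíóú])t", "d"),   # postvocalic <t> read as voiced /d/
    (r"c", "k"),                    # any remaining <c> as /k/
    (r"á", "aː"),                   # length marks rewritten last, so the
    (r"é", "eː"),                   # lookbehinds above still see the
    (r"í", "iː"),                   # accented vowel letters
    (r"ó", "oː"),
    (r"ú", "uː"),
]


def to_phonemic(word, rules=TOY_RULES):
    """Apply the ordered rewrite rules to one orthographic form."""
    # Normalise to NFC so accented vowels are single code points.
    form = unicodedata.normalize("NFC", word.lower())
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    for lemma in ("cét", "cath", "ech"):
        print(lemma, "->", to_phonemic(lemma))
```

Rule order carries the logic in a sketch like this: digraphs are rewritten before single letters, and context-sensitive rules run before the default reading of the same letter. A production pipeline would need far more rules and a way to emit several candidate transcriptions where the spelling is ambiguous.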
While this study focused on the Old Irish nouns in the Würzburg glosses, we intend to extend the lexicon by applying this pipeline to further corpora and other parts-of-speech. This inflected lexicon makes possible systematic studies in data-driven morphology and typology (Pellegrini, 2020; Beniamine, Bonami, and Luís, 2021; Beniamine, 2021), and facilitates future research into diachronic and diatopic variation in Irish and the development of further NLP applications for the language.

References

Anderson, Cormac (2016). “Consonant colour and vocalism in the history of Irish”. PhD thesis. Uniwersytet im. Adama Mickiewicza w Poznaniu. URL: https://hdl.handle.net/10593/14780.
Beniamine, Sacha (2021). “One lexeme, many classes: inflection class systems as lattices”. In: One-to-Many Relations. Ed. by Berthold Crysmann and Manfred Sailer. Berlin: Language Science Press.
Beniamine, Sacha, Olivier Bonami, and Ana R. Luís (2021). “The fine implicative structure of European Portuguese conjugation”. In: Isogloss 7.9, pp. 1–35. DOI: https://doi.org/10.5565/rev/isogloss.109.
Fransen, Theodorus (2019). “Past, present and future: Computational approaches to mapping historical Irish cognate verb forms”. PhD thesis. Trinity College Dublin, The University of Dublin. URL: https://github.com/ThFransen84/OIfst.
Kavanagh, Séamus (2001). A Lexicon of the Old Irish Glosses in the Würzburg Manuscript of the Epistles of St. Paul. Ed. by Dagmar S. Wodtko. Mitteilungen der Prähistorischen Kommission 45. + 1 CD-ROM. Wien: Verlag der Österreichischen Akademie der Wissenschaften. DOI: 10.1553/0x0001fb6e.
Mortensen, David R., Siddharth Dalmia, and Patrick Littell (May 2018). “Epitran: Precision G2P for Many Languages”. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Ed. by Nicoletta Calzolari (Conference chair) et al. Miyazaki, Japan: European Language Resources Association (ELRA).
Pedersen, Holger (1909–1913). Vergleichende Grammatik der keltischen Sprachen. 2 Vols. Göttingen: Vandenhoeck & Ruprecht.

Original language: English
Title of host publication: International Congress of Celtic Studies XVII Utrecht
Pages: 25-26
Number of pages: 2
Publication status: Published - 2023
Event: International Congress of Celtic Studies XVII Utrecht - Utrecht
Duration: 24 Jul 2023 to 28 Jul 2023

Conference

Conference: International Congress of Celtic Studies XVII Utrecht
City: Utrecht
Period: 24/7/23 to 28/7/23

Keywords

  • inflected lexicon
  • Old Irish
  • grapheme-to-phoneme conversion
