Morphology Beyond Inflection. Building a Word Formation Based Lexicon for Latin

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The computational linguistics community is increasingly turning its attention to researching and building new derivational morphology resources and tools. This is especially true for modern languages, with resources such as DeriNet, the lexical network for Czech [1], and DErivBase, the derivational lexicon for German [2]. On the Classical languages front, although the lexical resources and NLP tools now available (especially for Latin) are numerous and varied, there has so far been no attempt to create a derivational morphology tool in which lemmas are segmented and analysed into their derivational morphological components so as to establish relationships between them on the basis of word formation, so that, for example, the verbal noun amator can be reconnected to the verb amo through suffixation of -a-tor.

The first steps towards constructing a word-formation-based lexicon for Latin were made by Marco Passarotti and Francesco Mambrini in 2012, when they published a paper proposing a model for the semi-automatic extraction of word formation rules and the subsequent pairing of each lemma with its morphologically simplest (i.e. non-derived) lemma [3]. In this context, the Word Formation Latin project (WFL) has been awarded a Marie Curie individual fellowship to expand on these efforts and create a definitive derivational lexicon for Classical Latin. The lexicon will ultimately be included in LEMLAT, the automatic lemmatiser for Latin (http://www.ilc.cnr.it/lemlat/lemlat/index.html, accessed 21/01/2016; an update is due soon), creating a 360° resource for the study of Latin morphology.

The data is collected and organised in a MySQL relational database according to the following steps:

a) A list of lemmas is automatically extracted from the LEMLAT dataset.
b) The word formation rules (WFRs) are conceived according to the Item-and-Arrangement model, which considers word forms either as simple morphemes (simplex) or as a concatenation of morphemes satisfying the following conditions: 1) both base and affixes are lexical elements, i.e. they are both morphemes (Baudouin’s assumption); 2) they are dualistic, having both form and meaning (Bloomfield’s “sign-base” morpheme theory); 3) they both exist in a lexicon (Bloomfield’s “lexical morpheme” theory) (Passarotti & Mambrini, 2012; Hockett, 1954).

In Passarotti & Mambrini, a list of WFRs was obtained both manually and automatically, then identified and formalised into a table according to their type (prefixal, suffixal, compound and conversion) and according to the category of transformation undergone by the input lexical element (N-to-N, N-to-V, N-to-A etc.). In the first phase of the WFL project, for each WFR we automatically find candidate input and output lemmas with the aid of SQL queries (an output lemma can belong to only one WFR). In phase 2, morphological families are induced from the data. A morphological family is the set of lemmas morphologically derived from one common ancestor lemma: all those lemmas (simple or complex) that share the same base are assigned to the same morphological family.

Finally, the members of each family are automatically linked to each other according to their part of speech, inflectional category, and affixes by means of the WFR assignment. The simple lemma member is assigned the role

[1] Ševčíková, Magda, and Zdeněk Žabokrtský. 2014. “Word-Formation Network for Czech.” In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), 1087–93.
[2] Zeller, Britta D., Jan Šnajder, and Sebastian Padó. 2013. “DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German.” In ACL (1), 1201–11. http://anthology.aclweb.org/P/P13/P13-1118.pdf
[3] Passarotti, Marco, and Francesco Mambrini. 2012. “First Steps towards the Semi-automatic Development of a Word-formation-based Lexicon of Latin.” In Proceedings of LREC 2012, Istanbul, Turkey, 852–859.
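
The two phases described in the abstract (pairing candidate input and output lemmas per WFR through SQL queries, then inducing morphological families) can be illustrated with a minimal sketch. It is not the WFL implementation: it uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server, and the table layout, the toy lexicon, the rule identifiers and the strip/add encoding of suffixal rules are all assumptions made for illustration. Real Latin suffixation involves stem allomorphy that this simple string matching does not capture.

```python
import sqlite3
from collections import defaultdict

# Toy lexicon (hypothetical sample): (lemma, part of speech).
LEMMAS = [("amo", "V"), ("amator", "N"), ("amatorius", "A"),
          ("laudo", "V"), ("laudator", "N"), ("rosa", "N")]

# Hypothetical, simplified suffixal WFRs:
# (rule id, input PoS, output PoS, ending stripped from input, string added).
RULES = [("V-to-N -a-tor", "V", "N", "o", "ator"),
         ("N-to-A -ius", "N", "A", "", "ius")]

# Self-join: an output lemma must equal the input lemma minus the stripped
# ending plus the added string, with the parts of speech the rule requires.
CANDIDATE_QUERY = """
SELECT i.lemma, o.lemma
FROM lemma AS i JOIN lemma AS o
  ON i.pos = :in_pos AND o.pos = :out_pos
 AND i.lemma LIKE '%' || :strip
 AND o.lemma = substr(i.lemma, 1, length(i.lemma) - length(:strip)) || :add
ORDER BY i.lemma
"""


def find_pairs(lemmas, rules):
    """Phase 1: return {output lemma: (input lemma, rule id)}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lemma (lemma TEXT PRIMARY KEY, pos TEXT)")
    con.executemany("INSERT INTO lemma VALUES (?, ?)", lemmas)
    derived = {}
    for rule_id, in_pos, out_pos, strip, add in rules:
        params = {"in_pos": in_pos, "out_pos": out_pos,
                  "strip": strip, "add": add}
        for source, target in con.execute(CANDIDATE_QUERY, params):
            # An output lemma can belong to only one WFR: keep the first match.
            derived.setdefault(target, (source, rule_id))
    con.close()
    return derived


def induce_families(lemmas, derived):
    """Phase 2: group every lemma under its non-derived ancestor lemma."""
    families = defaultdict(list)
    for lemma, _pos in lemmas:
        root = lemma
        while root in derived:  # climb the derivation chain to the simple lemma
            root = derived[root][0]
        families[root].append(lemma)
    return dict(families)
```

With this toy data, find_pairs(LEMMAS, RULES) pairs amator with amo and amatorius with amator, and induce_families then places amo, amator and amatorius in one family headed by amo, while rosa forms a family of its own. Keeping only the first matching rule per output lemma is one simple way to respect the one-WFR-per-output constraint; a real pipeline would presumably involve manual checking of the candidates.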
Original language: English
Title of host publication: Formal Representation and the Digital Humanities
Pages: 97-114
Number of pages: 18
Publication status: Published - 2018
Event: Formal Representation and the Digital Humanities - Verona
Duration: 28 Jun 2016 to 29 Jun 2016

Workshop

Workshop: Formal Representation and the Digital Humanities
City: Verona
Period: 28/6/16 to 29/6/16

Keywords

  • computational linguistics
  • Latin morphology
  • lexicography
  • word formation
