Abstract
The computational linguistics community is increasingly focusing on researching and
building new derivational morphology resources and tools. This is especially true of
tools for modern languages, such as the lexical network for Czech, DeriNet,1 and
the derivational lexicon for German, DErivBase.2
On the Classical languages front, although the lexical resources and NLP tools available
(especially for Latin) are now numerous and varied, there has so far been no attempt to
create a derivational morphology tool in which lemmas are segmented and analysed into their
derivational morphological components, so as to establish relationships between them on the
basis of word formation. In such a tool, for example, the verbal noun amator can be
connected to the verb amo through suffixation with –a-tor. The first steps towards
constructing a word-formation-based lexicon for Latin were made by Marco Passarotti and
Francesco Mambrini in 2012, when they published a paper proposing a model for the
semi-automatic extraction of word formation rules and the subsequent pairing of each lemma
with its morphologically simplest (i.e. non-derived) lemma.3
In this context, the Word Formation Latin project (WFL) has been awarded a Marie Curie
individual fellowship to expand on these efforts and create a definitive derivational lexicon for
Classical Latin. This will ultimately be included in LEMLAT, the automatic lemmatiser for Latin
(http://www.ilc.cnr.it/lemlat/lemlat/index.html, accessed 21/01/2016; an update is due soon),
creating an all-round resource for the study of Latin morphology.
The data is collected and organised in a MySQL relational database according to the following
steps:
a) A list of lemmas is automatically extracted from the LEMLAT dataset.
b) The word formation rules (WFRs) are conceived according to the Item-and-Arrangement
model, which considers word forms either as simple morphemes (simplex) or as a
concatenation of morphemes satisfying the following conditions:
1) Both base and affixes are lexical elements, i.e. they are both morphemes (Baudouin’s
assumption),
2) They are dualistic, having both form and meaning (Bloomfield’s “sign-base” morpheme
theory),
3) They both exist in a lexicon (Bloomfield’s “lexical morpheme” theory) (Passarotti &
Mambrini, 2012; Hockett, 1954).
In Passarotti & Mambrini, a list of WFRs was obtained both manually and automatically, then
identified and formalised into a table, according to their type (prefixal, suffixal, compound and
conversion) and according to the category of transformation undergone by the lexical
element in input (N-to-N, N-to-V, N-to-A etc.).
1 Ševčíková, Magda, and Zdeněk Žabokrtský. 2014. “Word-Formation Network for Czech.” In Proceedings of the 9th
International Conference on Language Resources and Evaluation (LREC 2014), 1087–93.
2 Zeller, Britta D., Jan Šnajder, and Sebastian Padó. 2013. “DErivBase: Inducing and Evaluating a Derivational
Morphology Resource for German.” In ACL (1), 1201–11. http://anthology.aclweb.org/P/P13/P13-1118.pdf
3 Passarotti, Marco, and Francesco Mambrini. 2012. “First Steps towards the Semi-automatic Development of a
word-formation-based Lexicon of Latin.” In Proceedings of LREC 2012, Istanbul, Turkey, 852–59.
In the first phase of the WFL project, for each WFR we automatically find candidate input and
output lemmas by means of SQL queries (an output lemma can belong to only one
WFR).
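As a rough illustration of this first phase, the sketch below pairs candidate input and output lemmas for a single suffixal rule with one SQL query. It is not the project's actual implementation: the `lemma` and `derivation` tables, their columns, the `find_candidates` and `assign` helpers and the toy lemma list are all assumptions made for the example, and Python's built-in SQLite stands in for MySQL so that the snippet is self-contained.

```python
import sqlite3

# Hypothetical schema and toy data: the WFL tables are not described in the
# source, and SQLite is used here in place of MySQL.
SCHEMA = """
CREATE TABLE lemma (form TEXT PRIMARY KEY, base TEXT, pos TEXT);
-- One row per derived lemma: the primary key on output_form enforces
-- that an output lemma belongs to only one WFR.
CREATE TABLE derivation (output_form TEXT PRIMARY KEY, input_form TEXT, wfr TEXT);
INSERT INTO lemma VALUES
  ('amo', 'am', 'V'), ('amator', 'amator', 'N'),
  ('canto', 'cant', 'V'), ('cantator', 'cantator', 'N'),
  ('rosa', 'ros', 'N');
"""


def find_candidates(conn, suffix, pos_in, pos_out):
    """Return (input, output) pairs where output form = input base + suffix."""
    query = """
        SELECT i.form, o.form
        FROM lemma AS i
        JOIN lemma AS o ON o.form = i.base || ?
        WHERE i.pos = ? AND o.pos = ?
        ORDER BY i.form
    """
    return conn.execute(query, (suffix, pos_in, pos_out)).fetchall()


def assign(conn, wfr, pairs):
    """Record each pair under one rule; a second rule for the same output fails."""
    conn.executemany(
        "INSERT INTO derivation VALUES (?, ?, ?)",
        [(out, inp, wfr) for inp, out in pairs],
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
pairs = find_candidates(conn, "ator", "V", "N")
assign(conn, "V-to-N -a-tor", pairs)
print(pairs)  # [('amo', 'amator'), ('canto', 'cantator')]
```

Making the output lemma the primary key of the hypothetical `derivation` table is one simple way to express the constraint that an output lemma can belong to only one WFR: a second assignment for the same output raises `sqlite3.IntegrityError`.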
In phase 2, morphological families are induced from the data. A morphological family is the
set of lemmas morphologically derived from one common ancestor lemma: all those (simple
or complex) lemmas that share the same base are assigned to the same morphological family.
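One possible way to picture this grouping is sketched below, assuming that phase 1 has produced a mapping from each derived lemma to its single input lemma. The `induce_families` function and the toy data are hypothetical, and the sketch ignores compounds, which have more than one input.

```python
from collections import defaultdict


def induce_families(parent_of):
    """Group lemmas by their non-derived ancestor.

    parent_of maps each derived lemma to the lemma it was derived from
    (a hypothetical simplification: one input lemma per output lemma).
    """
    def root(lemma):
        seen = set()
        # Follow derivation links upwards until a non-derived lemma is reached;
        # `seen` guards against accidental cycles in the data.
        while lemma in parent_of and lemma not in seen:
            seen.add(lemma)
            lemma = parent_of[lemma]
        return lemma

    families = defaultdict(set)
    for derived, base in parent_of.items():
        families[root(derived)].update({derived, base})
    return {ancestor: sorted(members) for ancestor, members in families.items()}


# Toy derivation links (derived lemma -> input lemma), for illustration only.
links = {
    "amator": "amo",
    "amatorius": "amator",
    "amabilis": "amo",
    "cantator": "canto",
}
families = induce_families(links)
print(families["amo"])    # ['amabilis', 'amator', 'amatorius', 'amo']
print(families["canto"])  # ['cantator', 'canto']
```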
Finally, the members of each family are automatically linked to each other according to their
part of speech, inflectional category, and affixes by means of the WFR assignment. The simple
lemma member is assigned the role
Original language | English |
---|---|
Title of host publication | Formal Representation and the Digital Humanities |
Pages | 97-114 |
Number of pages | 18 |
Publication status | Published - 2018 |
Event | Formal Representation and the Digital Humanities - Verona. Duration: 28 Jun 2016 → 29 Jun 2016 |
Workshop
Workshop | Formal Representation and the Digital Humanities |
---|---|
City | Verona |
Period | 28/6/16 → 29/6/16 |
Keywords
- computational linguistics
- Latin morphology
- lexicography
- word formation