Description
Despite the proliferation and increasing coverage of linguistic resources for many modern and historical languages, the interoperability issues caused by their different formats severely limit their potential for exploitation and use. To address this challenge, the LiLa: Linking Latin ERC-Consolidator project (2018-2023; https://lila-erc.eu) aims to connect the wealth of linguistic resources developed for Latin so far by building a Linked Data Knowledge Base of the currently available textual and lexical resources (e.g., corpora, lexica, dictionaries, thesauri) and natural language processing (NLP) tools for Latin, i.e. a collection of data sets described using the same vocabulary of knowledge description and linked together (Passarotti et al., 2020). Linked Data is a paradigm according to which data in the Semantic Web (Berners-Lee et al., 2001) are interlinked with other data via connections expressed as triples, which can be interrogated via semantic queries. LiLa makes use of a set of Semantic Web and Linguistic Linked Open Data standards, including ontologies to describe linguistic annotation (OLiA: Chiarcos & Sukhareva, 2015), corpus annotation (NIF: Hellmann et al., 2013) and lexical resources (Lemon: Buitelaar et al., 2011). The LiLa Knowledge Base is highly lexically based, following the assumption that everything in LiLa deals with words: textual resources are made of (occurrences of) words, lexical resources describe properties of words, and NLP tools process words. The core of the LiLa Knowledge Base is a large collection of more than 130,000 Latin lemmas extracted from a number of dictionaries and glossaries for Classical, Late and Medieval Latin. Lemma is the key class of the ontology on which LiLa is built. Interoperability is achieved by linking entries in lexical resources and tokens in corpora to the lemma they share.
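The lemma-based linking described above can be illustrated with a minimal sketch. It is a toy model in plain Python, not the LiLa implementation: the predicate names, prefixes and identifiers (`ex:hasLemma`, `ex:canonicalForm`, `corpusA:token_17`, and so on) are hypothetical placeholders rather than actual LiLa URIs or ontology terms. It only shows how a token in a corpus and an entry in a dictionary become interoperable once both point to the same lemma node.

```python
# Toy triple store: each triple is (subject, predicate, object).
# All identifiers are hypothetical placeholders, not real LiLa URIs.
HAS_LEMMA = "ex:hasLemma"            # corpus token -> lemma (assumed name)
CANONICAL_FORM = "ex:canonicalForm"  # lexical entry -> lemma (assumed name)

triples = {
    ("corpusA:token_17", HAS_LEMMA, "lemma:amo"),
    ("corpusB:token_402", HAS_LEMMA, "lemma:amo"),
    ("corpusB:token_403", HAS_LEMMA, "lemma:rex"),
    ("dictX:entry_amo", CANONICAL_FORM, "lemma:amo"),
    ("dictY:entry_rex", CANONICAL_FORM, "lemma:rex"),
}


def subjects(predicate, obj):
    """Return, sorted, all subjects linked to `obj` via `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def resources_for_lemma(lemma):
    """Collect the corpus tokens and lexical entries that share one lemma."""
    return {
        "tokens": subjects(HAS_LEMMA, lemma),
        "entries": subjects(CANONICAL_FORM, lemma),
    }


# Starting from a single lemma, reach material in independent resources.
linked = resources_for_lemma("lemma:amo")
print(linked)
```

In an actual Linked Data setting the same lookup would be expressed as a semantic query over the published triples; the dictionary returned here merely stands in for that query result.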
Our talk introduces the LiLa Knowledge Base, detailing in particular (a) the structure of its collection of lemmas and (b) the inclusion in LiLa, and the use through LiLa, of a number of linguistic resources, including some for Late and Vulgar Latin, such as a portion of the Computational Historical Semantics corpus (https://www.comphistsem.org/home.html) and the Late Latin Charter Treebank (Korkiakangas & Passarotti, 2011).
Date made available | 13 Sep 2022 |
---|---|
Publisher | ZENODO |