Abstract
Despite its key role in the history of computational linguistics, thanks to the pioneering work by Roberto Busa SJ on the Index
Thomisticus, Latin can still be considered a less-resourced language. Although several Latin texts have been digitized over the last
decades, only a few of them have been linguistically tagged, and most still lack any linguistic tagging. However, while the
less-resourced status affects historical languages in general, over the past few years a number of language resources for Latin and other
historical languages have been started, among which are several treebanks. Presenting the experience of the Index Thomisticus
Treebank project and, particularly, its valency lexicon, this paper reports some general insights about the creation and use of language
resources for less-resourced languages, showing that, although creating a language resource from scratch for a less-resourced language
remains a labor-intensive and time-consuming task, today it is simplified by exploiting the results of previous similar
experiences in language resource development.
Original language | English |
---|---|
Host publication title | 7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-Resourced Languages |
Pages | 27-32 |
Number of pages | 6 |
Publication status | Published - 2010 |
Event | 7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-Resourced Languages - Valletta, Malta. Duration: 23 May 2010 → 23 May 2010 |
Conference
Conference | 7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-Resourced Languages |
---|---|
City | Valletta, Malta |
Period | 23/5/10 → 23/5/10 |
Keywords
- Latin
- Computational linguistics