L.A.S.L.A. and Collatinus: a convergence in lexica

Philippe Verkerk, Yves Ouvrard, Margherita Fantoli, Dominique Longrée

Abstract


The research group L.A.S.L.A. (Laboratoire d’Analyse Statistique des Langues Anciennes, University of Liège, Belgium) began a project of lemmatization and morphosyntactic tagging of Latin texts in 1961. The project is ongoing, with new texts lemmatized each year (see http://web.philo.ulg.ac.be/lasla/). The resulting files contain approximately 2,500,000 words, whose lemmatization and tagging have been verified by a philologist, and have recently been made available to interested scholars. In the early 2000s, Yves Ouvrard developed Collatinus for teaching. Its goal was to generate a complete lexical aid, with a short translation and the morphological analyses of the forms, for any text given to students (see https://outils.biblissima.fr/fr/collatinus/). Although the two projects look very different, they came together a few years ago in the design of a new tool to speed up the lemmatization of Latin texts at L.A.S.L.A. The tool lemmatizes each word in two ways in parallel: by looking the form up among those already analyzed in the L.A.S.L.A. files, and by running Collatinus. A second-order hidden Markov model then disambiguates the candidate analyses, and the result is presented in a text editor for correction by the philologist.
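
The disambiguation step can be illustrated with a minimal sketch of Viterbi decoding over a second-order (trigram) hidden Markov model, in which each tag is conditioned on the two preceding tags. The abstract does not describe the tool's actual implementation, so everything here is an assumption made for illustration: the function name `disambiguate`, the coarse tag labels, the smoothing floor, and all probabilities are hypothetical toy values, not figures from the L.A.S.L.A. files.

```python
import math

START = "<s>"  # hypothetical padding tag for the sentence start


def disambiguate(candidates, trans, floor=1e-6):
    """Pick the most probable tag sequence under a trigram HMM.

    candidates: one dict per token, mapping each candidate tag to an
                emission probability P(word | tag); each dict is
                assumed to be non-empty.
    trans:      dict mapping (tag_i-2, tag_i-1, tag_i) to a transition
                probability; unseen trigrams fall back to `floor`.
    """
    # best[(u, v)] = (log-probability, path) of the best path whose
    # last two tags are u and v.
    best = {(START, START): (0.0, [])}
    for options in candidates:
        new_best = {}
        for (u, v), (score, path) in best.items():
            for tag, p_emit in options.items():
                p = trans.get((u, v, tag), floor) * max(p_emit, floor)
                cand = score + math.log(p)
                key = (v, tag)
                if key not in new_best or cand > new_best[key][0]:
                    new_best[key] = (cand, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy input for "cum amicis venit": "cum" is ambiguous between a
# preposition and a conjunction. All numbers are invented.
toy_candidates = [
    {"PREP": 0.5, "CONJ": 0.5},
    {"NOUN": 1.0},
    {"VERB": 1.0},
]
toy_trans = {
    (START, START, "PREP"): 0.4,
    (START, START, "CONJ"): 0.6,
    (START, "PREP", "NOUN"): 0.9,
    (START, "CONJ", "NOUN"): 0.3,
    ("PREP", "NOUN", "VERB"): 0.5,
    ("CONJ", "NOUN", "VERB"): 0.5,
}

print(disambiguate(toy_candidates, toy_trans))
```

With these toy values the preposition reading of "cum" wins, because the following noun makes the preposition-noun transition more probable than the conjunction-noun one, even though the conjunction is the likelier sentence opener taken alone. In a workflow like the one described, the candidate sets would come from the two lemmatization sources and the chosen sequence would still go to the philologist for correction.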

Keywords


lemmatization; morphosyntactic analysis; disambiguation; probabilistic tagger

DOI: https://doi.org/10.4454/ssl.v58i1.275


Copyright (c) 2020 Studi e Saggi Linguistici

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
eISSN 2281-9142 - ISSN 0085-6827