Vol. 58 No. 1 (2020)
Articles

L.A.S.L.A. and Collatinus: a convergence in lexica

Philippe Verkerk
Université de Lille
Yves Ouvrard
retired professor of “Éducation Nationale”
Margherita Fantoli
Université de Liège
Dominique Longrée
Université de Liège

Published 2020-09-02

Keywords

  • lemmatization
  • morphosyntactic analysis
  • disambiguation
  • probabilistic tagger

Abstract

The research group L.A.S.L.A. (Laboratoire d’Analyse Statistique des Langues Anciennes, University of Liège, Belgium) began a project of lemmatization and morphosyntactic tagging of Latin texts in 1961. The project is ongoing, with new texts lemmatized each year (see http://web.philo.ulg.ac.be/lasla/). The resulting files contain approximately 2,500,000 words, whose lemmatization and tagging have been verified by a philologist, and have recently been made available to interested scholars. In the early 2000s, Yves Ouvrard developed Collatinus for teaching. Its goal was to generate a complete lexical aid, with a short translation and the morphological analyses of the forms, for any text that might be given to students (see https://outils.biblissima.fr/fr/collatinus/). Although the two projects look very different, they came together a few years ago in the design of a new tool to speed up the lemmatization of Latin texts at L.A.S.L.A. The tool lemmatizes each word in two concurrent ways: by looking the form up among those already analyzed in the L.A.S.L.A. files, and by analyzing it with Collatinus. This lemmatization is followed by disambiguation with a second-order hidden Markov model, and the result is presented in a text editor for correction by the philologist.
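
The two-step pipeline described in the abstract (merging candidate analyses from two sources, then disambiguating with a second-order hidden Markov model) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(lemma, tag)` representation, the `UNK` fallback, and the callables `collatinus_analyze`, `trans_logp` and `emit_logp` are all hypothetical stand-ins, and the abstract does not specify the tagset, the data formats, or how the probabilities are estimated. The decoder is a generic Viterbi search over a trigram tag model, where each tag is conditioned on the two preceding tags.

```python
START = "<s>"  # padding tag for the two positions before the first token


def candidates(form, lasla_lexicon, collatinus_analyze):
    """Merge (lemma, tag) analyses from both sources, without duplicates.

    lasla_lexicon: dict mapping a form to analyses already seen in the files
                   (hypothetical in-memory stand-in for the L.A.S.L.A. data).
    collatinus_analyze: hypothetical callable returning analyses for a form.
    """
    merged = list(lasla_lexicon.get(form, []))
    for analysis in collatinus_analyze(form):
        if analysis not in merged:
            merged.append(analysis)
    # Assumed fallback so that every token has at least one candidate.
    return merged or [(form, "UNK")]


def viterbi_trigram(cands, trans_logp, emit_logp):
    """Pick the best candidate per token under a second-order HMM.

    cands: one list of (lemma, tag) candidates per token (each non-empty).
    trans_logp(t2, t1, t): log P(t | t2, t1) for consecutive tags t2, t1, t.
    emit_logp(i, cand): log-probability of token i given the candidate.
    """
    if not cands:
        return []
    # A state is the pair (previous candidate, current candidate).
    scores = {
        (None, c): trans_logp(START, START, c[1]) + emit_logp(0, c)
        for c in cands[0]
    }
    back = []  # back[i - 1][(p, c)] = best candidate at position i - 2
    for i in range(1, len(cands)):
        new_scores, pointers = {}, {}
        for (pp, p), score in scores.items():
            t2 = pp[1] if pp is not None else START
            for c in cands[i]:
                value = score + trans_logp(t2, p[1], c[1]) + emit_logp(i, c)
                if (p, c) not in new_scores or value > new_scores[(p, c)]:
                    new_scores[(p, c)] = value
                    pointers[(p, c)] = pp
        back.append(pointers)
        scores = new_scores
    # Backtrack from the best final state.
    prev, cur = max(scores, key=scores.get)
    path = [cur]
    for pointers in reversed(back):
        path.append(prev)
        prev, cur = pointers[(prev, cur)], prev
    return path[::-1]
```

In an actual workflow, the decoder's output would only be a proposal: as the abstract states, the result is shown in a text editor so that the philologist can correct it.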