handwritten text recognition; image-text model; AI applied to historical texts
Abstract:
[en] This HTR model operates in a multilingual environment (Latin and Old French) and can recognize several Latin script families (mostly Textualis and Cursiva) in documents produced between ca. the 12th and 15th centuries. In evaluation, the model shows an accuracy of 94.1% on the validation set and a CER (character error rate) of about 0.12 to 0.17 on four external unseen datasets. A fine-tuning exercise using 10 ground-truth pages can improve these results to a CER between 0.06 and 0.10.
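For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between the model output and the ground-truth transcription, divided by the length of the ground truth. The sketch below illustrates that common definition only; it is not taken from the model's evaluation code, and the function names `levenshtein` and `cer` are hypothetical. Under this definition, a CER of 0.12 corresponds to roughly 12 character edits per 100 reference characters.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref (classic dynamic-programming formulation)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

Note that evaluation tools differ in how they normalise text (whitespace, abbreviations, Unicode forms) before scoring, so values from this sketch are not guaranteed to match the figures reported above.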
Disciplines:
Arts & humanities: Multidisciplinary, general & others
Author, co-author:
TORRES AGUILAR, Sergio Octavio ; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Humanities (DHUM) > History
Jolivet, Vincent
Document language:
English
Title:
HTR model for Latin and French Medieval Documentary Manuscripts (12th-15th)
Publication/release date:
January 2023
Technical description:
This is a Handwritten Text Recognition (HTR) model trained on a dataset of charters and registers from the late medieval period (12th-15th centuries). The model uses a CNN+RNN+CTC approach backed by kraken.
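In a CNN+RNN+CTC pipeline, the network emits one label per image column (frame), including a special blank label, and the transcription is recovered by collapsing repeated labels and dropping blanks. The sketch below shows this greedy (best-path) CTC decoding step in isolation, as a generic illustration rather than kraken's own implementation; the `ALPHABET` table, the blank index 0, and the function name `ctc_greedy_decode` are assumptions made for the example.

```python
# Hypothetical label table: index 0 is the CTC blank, the rest are characters.
ALPHABET = ["<blank>", "c", "a", "r", "t"]
BLANK = 0


def ctc_greedy_decode(frame_labels: list[int]) -> str:
    """Collapse consecutive repeated labels, then remove blanks.

    frame_labels holds the most probable label index for each frame,
    i.e. the argmax over the network's per-frame output distribution.
    """
    chars = []
    prev = None
    for label in frame_labels:
        # Emit a character only when the label changes and is not blank.
        if label != prev and label != BLANK:
            chars.append(ALPHABET[label])
        prev = label
    return "".join(chars)


# Frames "c c _ a a r _ t t a" collapse to the word "carta".
print(ctc_greedy_decode([1, 1, 0, 2, 2, 3, 0, 4, 4, 2]))
```

The blank label is what allows genuine double letters to survive decoding: two identical labels separated by a blank are kept as two characters, whereas two adjacent identical labels are merged into one.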