handwritten text recognition; image-text model; AI applied to historical texts
Abstract :
[en] This HTR model operates in a multilingual environment (Latin and Old French) and can recognize several Latin script families (mostly Textualis and Cursiva) in documents produced between ca. the 12th and 15th centuries. During evaluation, the model shows an accuracy of 94.1% on the validation set and a CER (character error rate) of about 0.12 to 0.17 on four external unseen datasets. Fine-tuning on 10 ground-truth pages can improve these results to a CER between 0.06 and 0.10.
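For readers unfamiliar with the metric, the sketch below shows how a CER is commonly computed: the Levenshtein (edit) distance between the reference transcription and the model output, divided by the length of the reference. This is an illustration under that common definition, not the evaluation script used for this model; the record does not state the exact normalization, and the function names and sample strings are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: the recognizer misreads the minims in "nomine".
ref = "in nomine domini amen"
hyp = "in nomme domini amen"
print(round(cer(ref, hyp), 3))  # 0.095
```

Under this definition, a CER of 0.12 corresponds to roughly 12 character edits per 100 reference characters.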
Disciplines :
Arts & humanities: Multidisciplinary, general & others
Author, co-author :
TORRES AGUILAR, Sergio Octavio ; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Humanities (DHUM) > History
Jolivet, Vincent
Language :
English
Title :
HTR model for Latin and French Medieval Documentary Manuscripts (12th-15th)
Publication date :
January 2023
Technical description :
This is a Handwritten Text Recognition (HTR) model trained on a dataset of charters and registers from the late medieval period (12th-15th centuries). The model uses a CNN+RNN+CTC approach backed by kraken.
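To illustrate the CTC part of the approach: a CTC-trained network emits one label per image frame, including a special "blank" label, and the transcription is obtained by collapsing repeated labels and then dropping the blanks. The sketch below shows this generic best-path (greedy) decoding step. It is not kraken's implementation; the blank index, the toy alphabet, and the function name are assumptions made for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, alphabet, blank=0):
    """Collapse consecutive repeats, then remove blanks.

    frame_labels: per-frame best label indices (argmax of the network output)
    alphabet:     hypothetical mapping from label index to character
    blank:        index reserved for the CTC blank (assumed to be 0 here)
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy alphabet and frame sequence, for illustration only.
alphabet = {1: "a", 2: "m", 3: "e", 4: "n"}
frames = [1, 1, 0, 2, 2, 2, 0, 3, 0, 4, 4]
print(ctc_greedy_decode(frames, alphabet))  # amen
```

The blank is what allows genuinely doubled letters to survive: the sequence [1, 0, 1] decodes to "aa", whereas [1, 1] collapses to a single "a".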