Paper published in a book (Scientific congresses, symposiums and conference proceedings)
From Digitized Sources to Digital Data, Behind the Scenes of (Critically) Enriching a Digital Heritage Collection
Viola, Lorella; Fiscarelli, Antonio Maria
2020 • In Weber, Andreas; Heerlien, Maarten; Gassó Miracle, Eulàlia et al. (Eds.) Proceedings of the International Conference Collect and Connect: Archives and Collections in a Digital Age
Digital Humanities; Digital Heritage; Enrichment; Artificial Intelligence
Abstract :
[en] Digitally available repositories are becoming not only more and more widespread but also larger and larger. Although there are both digitally-born collections and digitised material, the digital heritage scholar is typically confronted with the latter. This immediately presents new challenges, one of the most urgent being how to find the meaningful elements that are hidden underneath such an unprecedented mass of digital data. One way to respond to this challenge is to contextually enrich the digital material, for example through deep learning. Using the enrichment of the digital heritage collection ChroniclItaly 3.0 [10] as a concrete example, this article discusses the complexities of this process. Specifically, combining statistical and critical evaluation, it describes the gains and losses resulting from the decisions made by the researcher at each step, and it shows how, in the passage from digitised sources to enriched material, most is gained (e.g., preservation, wider and enhanced access, more material) but some is also lost (e.g., original layout and composition, information removed by pre-processing steps). The article concludes that it is only through a critical approach that the digital heritage scholar can successfully meet the interpretive challenges presented by the digital and that the digital heritage sector can fulfil the second most important purpose of digitisation, that is, to enhance access.
Research center :
- Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital History & Historiography (DHI)
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Viola, Lorella ; University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH) > DHARPA
Fiscarelli, Antonio Maria
External co-authors :
no
Language :
English
Title :
From Digitized Sources to Digital Data, Behind the Scenes of (Critically) Enriching a Digital Heritage Collection
Publication date :
2020
Event name :
International Conference Collect and Connect: Archives and Collections in a Digital Age
Event date :
23-24 November 2020
Main work title :
Proceedings of the International Conference Collect and Connect: Archives and Collections in a Digital Age
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information (2017)
Donaldson, C., Gregory, I.N., Taylor, J.E.: Locating the beautiful, picturesque, sublime and majestic: spatially analysing the application of aesthetic terminology in descriptions of the English Lake District. Journal of Historical Geography 56, 43–60 (2017), doi: 10.1016/j.jhg.2017.01.006
Fiorucci, M., Khoroshiltseva, M., Pontil, M., Traviglia, A., Del Bue, A., James, S.: Machine learning for cultural heritage: A survey. Pattern Recognition Letters 133, 102–108 (2020), doi: 10.1016/j.patrec.2020.02.017, http://www.sciencedirect.com/science/article/pii/S0167865520300532
Murrieta-Flores, P., Martins, B.: The geospatial humanities: past, present and future. International Journal of Geographical Information Science 33(12), 2424–2429 (2019), doi: 10.1080/13658816.2019.1645336
Riedl, M., Padó, S.: A named entity recognition shootout for German. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp. 120–125 (2018), doi: 10.18653/v1/P18-2020
Tally Jr, R.T.: Geocritical explorations: Space, place, and mapping in literary and cultural studies. Springer (2011), doi: 10.1057/9780230337930
Viola, L.: ChroniclItaly. A corpus of Italian language newspapers published in the United States between 1898 and 1922. Utrecht University (2018), doi: 10.24416/UU01-T4YMOW
Viola, L.: ChroniclItaly 2.0. A corpus of Italian American newspapers annotated for entities, 1898-1920. Utrecht University (2019), doi: 10.24416/UU01-4MECRO
Viola, L.: ChroniclItaly 3.0. A contextually enriched digital heritage collection of Italian immigrant newspapers published in the USA, 1898-1936 (In press)
Viola, L., Verheul, J.: Mining ethnicity: Discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920. Digital Scholarship in the Humanities 35(4), 921–943 (2019), doi: 10.1093/llc/fqz068
Viola, L., Verheul, J.: Machine learning to geographically enrich understudied sources: A conceptual approach. In: Proceedings of the 12th International Conference on Agents and Artificial Intelligence-Volume 1: ARTIDIGH. pp. 469–475. SCITEPRESS (2020), doi: 10.5220/0009094204690475
Weber, A., Ameryan, M., Wolstencroft, K., Stork, L., Heerlien, M., Schomaker, L.: Towards a digital infrastructure for illustrated handwritten archives. In: Digital Cultural Heritage, pp. 155–166. Springer (2018)