comparison; data visualization; historical newspapers; impresso; scalable reading; semantic enrichment; text reuse; user tasks; Computer Science (miscellaneous); Information Systems; Artificial Intelligence
Abstract :
[en] Text reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse, e.g., to study the later reception of influential texts or to reveal evolving publication practices of historical media. This research is often supported by interactive visualizations that highlight relations and differences between text segments. In this paper, we build on earlier work in this domain. We present impresso Text Reuse at Scale, to our knowledge the first interface that integrates text reuse data with other forms of semantic enrichment to enable a versatile and scalable exploration of intertextual relations in historical newspaper corpora. The Text Reuse at Scale interface was developed as part of the impresso project and combines powerful search and filter operations with close and distant reading perspectives. We integrate text reuse data with enrichments derived from topic modeling, named entity recognition and classification, and language and document type detection, as well as a rich set of newspaper metadata. We report on historical research objectives and common user tasks for the analysis of historical text reuse data, and present the prototype interface together with the results of a user evaluation.
Disciplines :
History
Author, co-author :
DURING, Marten ✱; University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital History and Historiography ; Digital History & Historiography, Luxembourg Centre for Contemporary and Digital History, Esch-sur-Alzette, Luxembourg
Romanello, Matteo; Institute of Archeology and Classical Studies (ASA), University of Lausanne, Lausanne, Switzerland
Beelen, Kaspar; Digital Humanities Research Hub, School of Advanced Study, University of London, London, United Kingdom
GUIDO, Daniele ; University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital Infrastructure ; Digital Research Infrastructure, Luxembourg Centre for Contemporary and Digital History, Esch-sur-Alzette, Luxembourg
Deseure, Brecht; Royal Library of Belgium, Brussels, Belgium
BUNOUT, Estelle ; University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH) > Contemporary History of Luxembourg ; Contemporary History of Luxembourg, Luxembourg Centre for Contemporary and Digital History, Esch-sur-Alzette, Luxembourg
Keck, Jana; German Historical Institute Washington, Washington, DC, United States
APOSTOLOPOULOS, Petros ; University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital History and Historiography ; Digital History & Historiography, Luxembourg Centre for Contemporary and Digital History, Esch-sur-Alzette, Luxembourg
✱ These authors have contributed equally to this work.
External co-authors :
yes
Language :
English
Title :
impresso Text Reuse at Scale. An interface for the exploration of text reuse data in semantically enriched historical newspapers.
U-AGR-7251 - INTER/SNF/22/17498891/IMPRESSO2 (01/09/2023 - 28/02/2027) - DURING Marten
Funders :
SNF - Schweizerischer Nationalfonds zur Förderung der wissenschaftlichen Forschung [CH]
Funding number :
CRSII5_173719
Funding text :
The workshop was funded by the Luxembourg Centre for Contemporary and Digital History (C2DH). This work builds on the research project impresso - Media Monitoring of the Past, funded by the Swiss National Science Foundation (SNSF) under grant ID CRSII5_173719.