Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Machine Learning to Geographically Enrich Understudied Sources: A Conceptual Approach
VIOLA, Lorella; Verheul, Jaap
2020 • In Rocha, Ana; Steels, Luc; van den Herik, Jaap (Eds.) Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ARTIDIGH
[en] This paper discusses the added value of applying machine learning (ML) to contextually enrich digital collections. In this study, we employed ML as a method to geographically enrich historical datasets. Specifically, we used a sequence tagging tool (Riedl and Padó 2018) which implements TensorFlow to perform NER on a corpus of historical immigrant newspapers. Afterwards, the entities were extracted and geocoded. The aim was to prepare large quantities of unstructured data for a conceptual historical analysis of geographical references. The intention was to develop a method that would assist researchers working in spatial humanities, a recently emerged interdisciplinary field focused on geographic and conceptual space. Here we describe the ML methodology and the geocoding phase of the project, focussing on the advantages and challenges of this approach, particularly for humanities scholars. We also argue that, by choosing to use largely neglected sources such as immigrant newspapers (also known as ethnic newspapers), this study contributes to the debate about diversity representation and archival biases in digital practices.
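The extract-then-geocode step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the tagger emits BIO-style labels with a `LOC` class, and it uses a toy in-memory gazetteer in place of whatever geocoding service the project relied on. The names `extract_locations`, `geocode` and `GAZETTEER`, the sample sentence and the coordinates are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, BIO tag); the tag scheme is an assumption


def extract_locations(tagged: List[Token]) -> List[str]:
    """Merge B-LOC / I-LOC runs into full place names."""
    places: List[str] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag == "B-LOC":
            # A new entity starts; flush any entity still open.
            if current:
                places.append(" ".join(current))
            current = [word]
        elif tag == "I-LOC" and current:
            # Continuation of the entity currently being built.
            current.append(word)
        else:
            # Any other tag (or a stray I-LOC) closes the open entity.
            if current:
                places.append(" ".join(current))
            current = []
    if current:
        places.append(" ".join(current))
    return places


# Hypothetical toy gazetteer standing in for a real geocoding service.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "New York": (40.7128, -74.0060),
    "Roma": (41.9028, 12.4964),
}


def geocode(place: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude), or None if the place is unknown."""
    return GAZETTEER.get(place)


# Invented example of tagger output for one sentence.
tagged = [
    ("Arrivato", "O"), ("a", "O"), ("New", "B-LOC"), ("York", "I-LOC"),
    ("da", "O"), ("Roma", "B-LOC"), (",", "O"), ("poi", "O"),
    ("tornato", "O"), ("a", "O"), ("Roma", "B-LOC"), (".", "O"),
]

# Count each place name, then attach coordinates to every distinct name.
counts = Counter(extract_locations(tagged))
geocoded = {place: (geocode(place), n) for place, n in counts.items()}
```

Counting before geocoding means each distinct place name is looked up only once, which matters when the lookup is a rate-limited external service rather than a dictionary.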
Research center :
- Luxembourg Centre for Contemporary and Digital History (C2DH) > Digital History & Historiography (DHI)
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
VIOLA, Lorella ; University of Luxembourg > Luxembourg Centre for Contemporary and Digital History (C2DH)
Verheul, Jaap; Universiteit Utrecht > History and Art History
External co-authors :
yes
Language :
English
Title :
Machine Learning to Geographically Enrich Understudied Sources: A Conceptual Approach
Publication date :
2020
Event name :
12th International Conference on Agents and Artificial Intelligence
Event place :
Valletta, Malta
Event date :
from 22-02-2020 to 24-02-2020
Audience :
International
Main work title :
Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ARTIDIGH
Ardanuy, Mariona Coll. (2017). Entity-Centric Text Mining for Historical Documents. Georg-August-Universität Göttingen, Göttingen.
Ardanuy, Mariona Coll, & Sporleder, C. (2017). Toponym disambiguation in historical documents using semantic and geographic features. Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage - DATeCH2017, 175–180. https://doi.org/10.1145/3078081.3078099
Bodenhamer, D. J., Corrigan, J., & Harris, T. M. (Eds.). (2010). The spatial humanities: GIS and the future of humanities scholarship. Bloomington, Ind.: Indiana Univ. Press.
Bodenhamer, D. J., Corrigan, J., & Harris, T. M. (Eds.). (2015a). Deep maps and spatial narratives. Bloomington: Indiana University Press.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
Canale, L., Lisena, P., & Troncy, R. (2018). A Novel Ensemble Method for Named Entity Recognition and Disambiguation Based on Neural Network. In D. Vrandečić, K. Bontcheva, M. C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou, … E. Simperl (Eds.), The Semantic Web – ISWC 2018 (Vol. 11136, pp. 91–107). https://doi.org/10.1007/978-3-030-00671-6_6
Cresswell, T. (2010). Place: A short introduction (Repr.). Malden, Mass.: Blackwell.
Donaldson, C., Gregory, I. N., & Taylor, J. E. (2017). Locating the beautiful, picturesque, sublime and majestic: Spatially analysing the application of aesthetic terminology in descriptions of the English Lake District. Journal of Historical Geography, 56, 43–60. https://doi.org/10.1016/j.jhg.2017.01.006
Eijnatten, J. V. (2019). Something about the Weather. Using Digital Methods to Mine Geographical Conceptions of Europe in Twentieth-Century Dutch Newspapers. BMGN - Low Countries Historical Review, 134(1), 28–61. https://doi.org/10.18352/bmgn-lchr.10655
Gregory, I. N. (2014). Further Reading: From Historical GIS to Spatial Humanities: An Evolving Literature. In I. N. Gregory & A. Geddes (Eds.), Toward spatial humanities: Historical GIS and spatial history (pp. 186–202). Bloomington, Ind.: Indiana Univ. Press.
Ju, Y., Adams, B., Janowicz, K., Hu, Y., Yan, B., & McKenzie, G. (2016). Things and Strings: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence with Topic Modeling. In E. Blomqvist, P. Ciancarini, F. Poggi, & F. Vitali (Eds.), Knowledge Engineering and Knowledge Management (Vol. 10024, pp. 353–367). https://doi.org/10.1007/978-3-319-49004-5_23
Marrero, M., Urbano, J., Sánchez-Cuadrado, S., Morato, J., & Gómez-Berbís, J. M. (2013). Named Entity Recognition: Fallacies, challenges and opportunities. Computer Standards & Interfaces, 35(5), 482–489. https://doi.org/10.1016/j.csi.2012.09.004
McDonough, K., Moncla, L., & van de Camp, M. (2019). Named entity recognition goes to old regime France: Geographic text analysis for early modern French corpora. International Journal of Geographical Information Science, 33(12), 2498–2522. https://doi.org/10.1080/13658816.2019.1620235
Murrieta-Flores, P., & Martins, B. (2019). The geospatial humanities: Past, present and future. International Journal of Geographical Information Science, 33(12), 2424–2429.
Neudecker, C. (2014, March 3). Named Entity Recognition for digitised newspapers – Europeana Newspapers. Retrieved 10 November 2019, from http://www.europeana-newspapers.eu/named-entity-recognition-for-digitised-newspapers/
Pascual-de-Sans, A. (2004). Sense of place and migration histories Idiotopy and idiotope. Area, 36(4), 348–357. https://doi.org/10.1111/j.0004-0894.2004.00236.
Riedl, M., & Padó, S. (2018). A Named Entity Recognition Shootout for German. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), 120–125. Melbourne, Australia, July 15–20, 2018.
Tally, R. T. (Ed.). (2011). Geocritical explorations: Space, place, and mapping in literary and cultural studies. New York: Palgrave Macmillan.
Taylor, J., Donaldson, C. E., Gregory, I. N., & Butler, J. O. (2018). Mapping Digitally, Mapping Deep: Exploring Digital Literary Geographies. Literary Geographies, 4(1), 10–19.
Viola, L. (2018). ChroniclItaly: A corpus of Italian American newspapers from 1898 to 1920. Utrecht University. Retrieved from https://public.yoda.uu.nl/ilab/UU01/T4YMOW.html
Viola, L. (2019). ChroniclItaly 2.0. A corpus of Italian American newspapers annotated for entities, 1898-1920 (Version 2.0). Retrieved from https://doi.org/10.24416/UU01-4MECRO
Viola, L., De Bruin, J., van Eijden, K., & Verheul, J. (2019). The GeoNewsMiner (GNM): An interactive spatial humanities tool to visualize geographical references in historical newspapers (v1.0.0). Retrieved from https://github.com/lorellav/GeoNewsMiner
White, R. (2010). Spatial History Project. Retrieved 8 November 2019, from https://web.stanford.edu/group/spatialhistory/cgi-bin/site/pub.php?id=29
Withers, C. W. J. (2009). Place and the ‘Spatial Turn’ in Geography and in History. Journal of the History of Ideas, 70(4), 637–658. https://doi.org/10.1353/jhi.0.0054
Won, M., Murrieta-Flores, P., & Martins, B. (2018). Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora. Frontiers in Digital Humanities, 5. https://doi.org/10.3389/fdigh.2018.00002
Yadav, V., & Bethard, S. (2019). A Survey on Recent Advances in Named Entity Recognition from Deep Learning models. ArXiv:1910.11470 [Cs]. Retrieved from http://arxiv.org/abs/1910.11470
Zhang, Z., & Iria, J. (2009). A novel approach to automatic gazetteer generation using Wikipedia. Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources, 1–9. Retrieved from http://dl.acm.org/citation.cfm?id=1699765.1699766