References of "Kozlowski, Diego 50038844"
Peer Reviewed
Automatic Classification of Peer Review Recommendation
Kozlowski, Diego UL; Boothby, Clara; Pei-Ying, Chen et al

Poster (2022, September 08)

Desigualdades interseccionales en la ciencia [Intersectional inequalities in science]
Kozlowski, Diego UL

Speeches/Talks (2022)

Peer Reviewed
Race And Gender Inequalities In Citations And Research Topics In US
Kozlowski, Diego UL; Monroe-White, Thema

Article for general public (2022)

Intersectional inequalities in science
Kozlowski, Diego UL

Scientific Conference (2022, April 28)

Peer Reviewed
Large-scale computational content analysis on magazines targeting men and women: the case of Argentina 2008-2018
Kozlowski, Diego UL; Lozano, Gabriela; Fletcher, Carla M. et al

in Feminist Media Studies (2022)

Differences in the content of magazines aimed specifically at women or men are a means to create and reproduce gender stereotypes. Novel computational tools make it possible to study differences in magazine content taking into account all available articles. In this study, we analyse the case of two Argentinian magazines published by the same publishing house over a decade (2008–2018), advertised by the publishing house as targeting women and men respectively. Using computational tools, we are able to analyse more than 24,000 articles, which would have been an impossible task using manual content analysis methodologies. With Topic Modelling techniques we identify the main themes discussed in the magazines and quantify their different frequency between magazines over time. Then, we perform a word-frequency analysis to validate this methodology and extend the analysis to other subjects. Our results show that topics such as Family, Business and Women as sex objects present an initial bias that tends to disappear over time. Conversely, in Fashion and Science topics, the initial differences are maintained. Also, we identify a considerable increase in the use of words associated with feminism since 2015 and specifically the word abortion in 2018. Furthermore, we develop a website where everyone can perform additional analysis.

Peer Reviewed
Avoiding bias when inferring race using name-based approaches
Kozlowski, Diego UL; Murray, Dakota S.; Bell, Alexis et al

in PLoS ONE (2022), 17(3), 0264270

Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of racial-based systemic inequalities is an important step towards a more equitable research system. However, because of the lack of robust information on authors’ race, few large-scale analyses have been performed on this topic. Algorithmic approaches offer one solution, using known information about authors, such as their names, to infer their perceived race. As with any other algorithm, the process of racial inference can generate biases if it is not carefully considered. The goal of this article is to assess the extent to which algorithmic bias is introduced using different approaches for name-based racial inference. We use information from the U.S. Census and mortgage applications to infer the race of U.S. affiliated authors in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article lays the foundation for more systematic and less-biased investigations into racial disparities in science.

Peer Reviewed
Intersectional Inequalities in Science
Kozlowski, Diego UL; Larivière, Vincent; Sugimoto, Cassidy R. et al

in Proceedings of the National Academy of Sciences of the United States of America (2022), 119(2), 2113067119

The US scientific workforce is primarily composed of White men. Studies have demonstrated the systemic barriers preventing women and other minoritized populations from gaining entry to science; few, however, have taken an intersectional perspective and examined the consequences of these inequalities on scientific knowledge. We provide a large-scale bibliometric analysis of the relationship between intersectional identities, topics, and scientific impact. We find homophily between identities and topic, suggesting a relationship between diversity in the scientific workforce and expansion of the knowledge base. However, topic selection comes at a cost to minoritized individuals for whom we observe both between- and within-topic citation disadvantages. To enhance the robustness of science, research organizations should provide adequate resources to historically underfunded research areas while simultaneously providing access for minoritized individuals into high-prestige networks and topics.

Intersectional Inequalities in Science
Kozlowski, Diego UL

Presentation (2021, December 06)

Metascience: Disrupting the status quo or perpetuating inequities
Kozlowski, Diego UL

Scientific Conference (2021, September 23)

Peer Reviewed
Avoiding bias when inferring race using name-based approaches
Kozlowski, Diego UL; Murray, Dakota S.; Bell, Alexis et al

in 18th INTERNATIONAL CONFERENCE ON SCIENTOMETRICS & INFORMETRICS, 12–15 July 2021, KU Leuven, Belgium (2021, July)

Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of racial-based systemic inequalities is an important step towards a more equitable research system. However, few large-scale analyses have been performed on this topic, mostly because of the lack of robust race-disambiguation algorithms. Identifying author information does not generally include the author’s race. Therefore, an algorithm needs to be employed, using known information about authors, i.e., their names, to infer their perceived race. Nevertheless, as with any other algorithm, the process of racial inference can generate biases if it is not carefully considered. When the research is focused on the understanding of racial-based inequalities, such biases undermine the objectives of the investigation and may perpetuate inequities. The goal of this article is to assess the biases introduced by the different approaches used in name-based racial inference. We use information from the US census and mortgage applications to infer the race of US author names in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race and ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article fills an important research gap that will allow more systematic and unbiased studies on racial disparity in science.

Science Inequalities
Kozlowski, Diego UL

Poster (2021, May 21)

Peer Reviewed
Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation
Kozlowski, Diego UL; Dusdal, Jennifer UL; Pang, Jun UL et al

in Scientometrics (2021)

Over the last century, we have observed steady, exponential growth in scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of the research within a field and between fields based on manual inspection impossible. Automatic techniques to support the process of literature review are required to find the epistemic and social patterns that are embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models to project observations to a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by those techniques. Our results show that using NLP we can encode a semantic space of articles, while with GNNs we are able to build a relational space where the social practices of a research community are also encoded.

Peer Reviewed
Latent Dirichlet Allocation Models for World Trade Analysis
Kozlowski, Diego UL; Semeshenko, Viktoriya; Molinari, Andrea

in PLoS ONE (2021), 16(2), 0245393

International trade is one of the classic areas of study in economics. Nowadays, given the availability of data, the tools used for the analysis can be complemented and enriched with new methodologies and techniques that go beyond the traditional approach. The present paper shows the application of Latent Dirichlet Allocation Models, a well-known technique from the area of Natural Language Processing, to search for latent dimensions in the product space of international trade, and their distribution across countries over time. We apply this technique to a dataset of countries' exports of goods from 1962 to 2016. The findings show that it is possible to generate higher-level classifications of goods based on the empirical evidence, and also make it possible to study the distribution of those classifications within countries. The latter offers interesting insights into countries' trade specialisation.

Machine Learning on Graphs
Kozlowski, Diego UL

Presentation (2020, November 18)

Graphs are a ubiquitous data structure that can be exploited in many different problems. In tasks where observations are not independently drawn from the data generating process, but their codependencies add valuable information, a network analysis might be useful for modelling those relations. In this seminar we will discuss Graph Neural Networks, the deep learning approach for dealing with networks.

Package development in R
Kozlowski, Diego UL

Presentation (2020, October 12)

Peer Reviewed
Presentación del paquete eph [Presentation of the eph package]
Rosati, German; Kozlowski, Diego UL; Shokida, Natsumi Solange UL et al

Scientific Conference (2020, October 09)

Work with data produced by public sources often runs into several problems: one of the most common is the lack of continuity in the publication of databases. In this regard, the Permanent Household Survey (EPH) of the National Institute of Statistics and Censuses (INDEC) in Argentina is an exception. In fact, this survey has published user databases with information since 1974. However, this has been done in a "non-replicable" way: from changes in the formats of its publication (dbase, .txt, .xls, .sav, etc.) to renaming some variables and recoding their categories, which makes them impractical for continuous use and processing. The lack of an API for the dissemination of information produced by INDEC limits information processing capabilities, reducing the users to i) thematic experts with knowledge of how to access sources and ii) media that access information already processed in the form of press releases. This limits the potential value of the enormous work done by the institute, by discouraging its use by users with limited knowledge of the sources, but with data processing capabilities, such as the R community. In turn, certain key indicators presented by the EPH have methodological annexes, but no public implementations that allow the public to make use of the methodology outside the reports prepared by the institute. In this context, the eph package aims to facilitate the work of those users of the Permanent Household Survey (INDEC) who wish to process its data using the R programming language.
The library has the following functionalities: i) a unified syntax for downloading, labelling and building datasets with comparable cross-sectional information; ii) implementation of indicator calculation (poverty) using the official methodology. Some of its functions are:
get_microdata(): Downloads the microdata datasets
organize_panels(): Builds a panel data pool of the continuous EPH surveys
organize_cno(): Classifies occupations according to the CNO 2001
organize_caes(): Classifies economic activities according to CAES Mercosur 1.0 and CAES Mercosur
organize_labels(): Labels the datasets following the latest design
map_agglomerates(): Indicator map by agglomerate
The package also has other datasets that can be useful for working with the EPH: some dictionaries that contain the coding of geographic variables (such as regions or clusters) or the geographic position (centroids) of the clusters where the survey is conducted.

Peer Reviewed
A three-level classification of French tweets in ecological crises
Kozlowski, Diego UL; Lannelongue, Elisa; Saudemont, Frédéric et al

in Information Processing and Management (2020), 57(5)

The possibilities that emerge from micro-blogging generated content for crisis-related situations make automatic crisis management using natural language processing techniques a hot research topic. Our aim here is to contribute to this line of research, focusing for the first time on French tweets related to ecological crises, in order to support the French Civil Security and Crisis Management Department in providing immediate feedback on the expectations of the populations involved in the crisis. We propose a new dataset manually annotated according to three dimensions: relatedness, urgency and intentions to act. We then experiment with binary classification (useful vs. non useful), three-class (non useful vs. urgent vs. non urgent) and multiclass classification (i.e., intention to act categories) relying on traditional feature-based machine learning using both state of the art and new features. We also explore several deep learning models trained with pre-trained word embeddings as well as contextual embeddings. We then investigate three transfer learning strategies to adapt these models to the crisis domain. We finally experiment with multi-input architectures by incorporating different metadata extra-features to the network. Our deep models, evaluated in random sampling, out-of-event and out-of-type configurations, perform very well, outperforming several competitive baselines. Our results constitute the first contribution to the field of crisis management in French social media.

The Networks of Science. Data-driven Understanding of Scientific Production
Kozlowski, Diego UL

Presentation (2020, July)

Peer Reviewed
Improving open data accessibility through package development and community work
Kozlowski, Diego UL; Tiscornia, Pablo; Weksler, Guido et al

Poster (2020, July)
