Article (Scientific journals)
Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws
Martinez-Seis, Bella; Pichardo-Lagunas, Obdulia; KOFF, Harlan et al.
2022, in Data, 7 (91)
Peer reviewed (verified by ORBi)
Documents


Full text
data-07-00091-v3.pdf
Publisher's postprint (447.9 kB)

All documents in ORBilu are protected by a user license.

Details



Keywords:
Mexican legislation; laws; natural language processing; legislative documents
Abstract:
This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology, with its selected algorithms, used to build the semi-structured corpus. The law documents, originally in PDF, were transformed into plain text, unified by deconstructing the structure of each law document, and labeled with natural language processing techniques for part of speech (PoS); entity extraction was also performed. The corpus includes the Mexican constitution and the Mexican laws, collected in PDF format from the official site, repealed before 14 October 2021. The collection has 305 documents: the Mexican constitution, 289 laws, 8 federal codes, 3 regulations, 2 statutes, 1 decree, and 1 ordinance. The semi-structured database transforms the set of laws from PDF format into a digital representation that facilitates computational analysis. The documents were migrated to JSON files that represent their internal hierarchical relations. In addition, basic natural language processing techniques were applied to the laws to identify parts of speech and named entities. The data set is mainly useful for text analysis and data science. It could be used for various legislative analysis tasks, including comprehension, interpretation, translation, classification, accessibility, coherence, and search. Finally, we present some statistics on the identified entities and an example of the corpus's usefulness for environmental laws.
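The abstract describes migrating each law into a JSON file that mirrors the law's internal hierarchy. The following is a minimal sketch, not the paper's actual schema: it assumes a title > chapter > article nesting and uses hypothetical field names (`titles`, `chapters`, `articles`, `text`) to show how such a file could be traversed in Python. The `collection` tally at the end only checks that the document counts given in the abstract add up to 305.

```python
import json

# Hypothetical nesting (title > chapter > article) and field names;
# the paper's actual JSON schema may differ.
law = {
    "name": "Hypothetical example law",
    "titles": [
        {
            "title": "Title I",
            "chapters": [
                {
                    "chapter": "Chapter I",
                    "articles": [
                        {"article": "Article 1", "text": "Placeholder article text."},
                        {"article": "Article 2", "text": "Placeholder article text."},
                    ],
                }
            ],
        }
    ],
}


def iter_articles(doc):
    """Yield (title, chapter, article) labels in document order."""
    for title in doc["titles"]:
        for chapter in title["chapters"]:
            for article in chapter["articles"]:
                yield title["title"], chapter["chapter"], article["article"]


# Document counts by type, as stated in the abstract.
collection = {
    "constitution": 1,
    "laws": 289,
    "federal codes": 8,
    "regulations": 3,
    "statutes": 2,
    "decrees": 1,
    "ordinances": 1,
}

if __name__ == "__main__":
    # Round-trip through JSON text, as a stored file would be read back.
    restored = json.loads(json.dumps(law, ensure_ascii=False))
    for labels in iter_articles(restored):
        print(" > ".join(labels))
    print(sum(collection.values()))  # 305
```

A nested layout like this keeps each article addressable by its position in the hierarchy. Per-article PoS tags or named entities could then be attached as extra fields, though how the published corpus actually stores them is defined in the paper, not here.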
Disciplines:
Computer science
Author, co-author:
Martinez-Seis, Bella
Pichardo-Lagunas, Obdulia
KOFF, Harlan; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Geography and Spatial Planning (DGEO)
Equihua, Miguel
Perez-Maqueo, Octavio
Hernández-Huerta, Arturo
External co-authors:
yes
Document language:
English
Publication date:
June 2022
Journal title:
Data
eISSN:
2306-5729
Publisher:
MDPI AG, Switzerland
Volume:
7
Issue:
91
Focus Area:
Sustainable Development
Research project title:
Integralidad-GAMMA
Funding body:
CONACYT - Consejo Nacional de Ciencia y Tecnología
Available on ORBilu:
since 12 July 2022

Statistics

Number of views
142 (including 0 from Unilu)
Number of downloads
94 (including 0 from Unilu)

Scopus® citations
1
Scopus® citations excluding self-citations
1
OpenCitations
0
OpenAlex citations
2
WoS citations
1