Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws

Martinez-Seis, Bella; Pichardo-Lagunas, Obdulia; KOFF, Harlan; Equihua, Miguel; Perez-Maqueo, Octavio; Hernández-Huerta, Arturo

doi:10.3390/data7070091

Article (Scientific journals)

2022 • In Data, 7 (91)

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/51615

Files

Full Text

data-07-00091-v3.pdf

Publisher postprint (447.9 kB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

Mexican legislation; laws; natural language processing; legislative documents

Abstract :

[en] This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology, including the selected algorithms, used to build the semi-structured corpus. Law documents in PDF format were transformed into plain text, unified by deconstructing the structure of each law document, and labeled with natural language processing techniques for part of speech (PoS); entity extraction was also performed. The corpus includes the Mexican constitution and the Mexican laws that were collected from the official site in PDF format repealed before 14 October 2021. The collection has 305 documents: the Mexican constitution, 289 laws, 8 federal codes, 3 regulations, 2 statutes, 1 decree, and 1 ordinance. The semi-structured database converts the set of laws from PDF format to a digital representation that facilitates computational analysis. The documents were migrated to JSON files to represent their internal hierarchical relations. In addition, basic natural language processing techniques were applied to the laws to identify parts of speech and named entities. The presented data set is mainly useful for text analysis and data science. It could be used for various legislative analysis tasks, including comprehension, interpretation, translation, classification, accessibility, coherence, and search. Finally, we present some statistics on the identified entities and an example of the usefulness of the corpus for environmental laws.
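As a rough illustration of what "JSON files to represent internal hierarchical relations" can look like, the sketch below nests articles inside chapters inside titles and walks that hierarchy. The field names (`titles`, `chapters`, `articles`, `heading`, `number`, `text`), the sample law, and the helper `iter_articles` are all hypothetical and chosen only for illustration; they are not the schema published with the corpus.

```python
import json

# Hypothetical schema: these field names are illustrative assumptions,
# not the structure published with the corpus.
law = {
    "name": "Example Law",
    "type": "law",
    "titles": [
        {
            "heading": "Title One",
            "chapters": [
                {
                    "heading": "Chapter I",
                    "articles": [
                        {"number": 1, "text": "First article text."},
                        {"number": 2, "text": "Second article text."},
                    ],
                }
            ],
        }
    ],
}


def iter_articles(doc):
    """Yield (title heading, chapter heading, article) for every article."""
    for title in doc.get("titles", []):
        for chapter in title.get("chapters", []):
            for article in chapter.get("articles", []):
                yield title["heading"], chapter["heading"], article


# Round-trip through JSON text to show the hierarchy survives serialization.
restored = json.loads(json.dumps(law, ensure_ascii=False))
paths = [(t, c, a["number"]) for t, c, a in iter_articles(restored)]
```

Keeping each article reachable by its title and chapter headings is one way such a layout could support the search and classification tasks the abstract mentions; the actual corpus may organize its levels differently.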

Disciplines :

Computer science

Author, co-author :

Martinez-Seis, Bella

Pichardo-Lagunas, Obdulia

KOFF, Harlan ; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Geography and Spatial Planning (DGEO)

Equihua, Miguel

Perez-Maqueo, Octavio

Hernández-Huerta, Arturo

External co-authors :

yes

Language :

English

Title :

Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws

Publication date :

June 2022

Journal title :

Data

eISSN :

2306-5729

Publisher :

MDPI AG, Switzerland

Volume :

7

Issue :

7

Peer reviewed :

Peer Reviewed verified by ORBi

Focus Area :

Sustainable Development

Additional URL :

https://www.mdpi.com/2306-5729/7/7/91

Name of the research project :

Integralidad-GAMMA

Funders :

CONACYT - Consejo Nacional de Ciencia y Tecnología

Available on ORBilu :

since 12 July 2022

Statistics

Number of views

185 (0 by Unilu)

Number of downloads

129 (0 by Unilu)
