Automated Identification of National Implementations of European Union Directives With Multilingual Information Retrieval Based On Semantic Textual Similarity
Text Similarity; Machine Learning; Transposition; European Law; Concept and Named Entity Recognition
Abstract :
[en] The effective transposition of European Union (EU) directives into Member States is important to achieve the policy goals defined in the Treaties and secondary legislation. National Implementing Measures (NIMs) are the legal texts officially adopted by the Member States to transpose the provisions of an EU directive. The measures undertaken by the Commission to monitor NIMs are time-consuming and expensive, as they resort to manual conformity checking studies and legal analysis. In this thesis, we developed a legal information retrieval system using semantic textual similarity techniques to automatically identify the transposition of EU directives into the national law at a fine-grained provision level. We modeled and developed various text similarity approaches such as lexical, semantic, knowledge-based, embeddings-based and concept-based methods. The text similarity systems utilized both textual features (tokens, N-grams, topic models, word and paragraph embeddings) and semantic knowledge from external knowledge bases (EuroVoc, IATE and Babelfy) to identify transpositions. This thesis work also involved the development of a multilingual corpus of 43 directives and their corresponding NIMs from Ireland (English legislation), Italy (Italian legislation) and Luxembourg (French legislation) to validate the text similarity based information retrieval system. A gold standard mapping (prepared by two legal researchers) between directive articles and NIM provisions was prepared to evaluate the various text similarity models. The results show that the lexical and semantic text similarity techniques were more effective in identifying transpositions as compared to the embeddings-based techniques. We also observed that the unsupervised text similarity techniques had the best performance in case of the Luxembourg Directive-NIM corpus.
We also developed a concept recognition system based on conditional random fields (CRFs) to identify concepts in European directives and national legislation. The results indicate that the concept recognitions system improved over the dictionary lookup program by tagging the concepts which were missed by dictionary lookup. The concept recognition system was extended to develop a concept-based text similarity system using word-sense disambiguation and dictionary concepts. The performance of the concept-based text similarity measure was competitive with the best performing text similarity measure. The labeled corpus of 43 directives and their corresponding NIMs was utilized to develop supervised text similarity systems by using machine learning classifiers. We modeled three machine learning classifiers with different textual features to identify transpositions. The results show that support vector machines (SVMs) with term frequency-inverse document frequency (TF-IDF) features had the best overall performance over the multilingual corpus. Among the unsupervised models, the best performance was achieved by TF-IDF Cosine similarity model with macro average F-score of 0.8817, 0.7771 and 0.6997 for the Luxembourg, Italian and Irish corpus respectively. These results demonstrate that the system was able to identify transpositions in different national jurisdictions with a good performance. Thus, it has the potential to be useful as a support tool for legal practitioners and Commission officials involved in the transposition monitoring process.
Disciplines :
Computer science
Author, co-author :
NANDA, Rohan ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Language :
English
Title :
Automated Identification of National Implementations of European Union Directives With Multilingual Information Retrieval Based On Semantic Textual Similarity