Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Transforming Unstructured Sensitive Information into Structured Knowledge
Blanco Lambruschini, Braulio C.; BRORSSON, Mats
2024. In ICAIF 2024 - 5th ACM International Conference on AI in Finance
Peer reviewed
 

Files


Full Text
3677052.3698602.pdf
Publisher postprint (596.2 kB) Creative Commons License - Attribution
CC BY 4.0
Details



Keywords :
Finance; Information Extraction; LLM; Extraction modeling; Fine tuning; Limited information; Sensitive information; Structured knowledge; Synthetic data; Unstructured data; Artificial Intelligence
Abstract :
[en] Information is crucial in today's context, yet less than 20% of companies utilize their unstructured data due to its complexity. Information Extraction (IE) is vital for effective data use, but current IE models face four major issues. First, they often provide limited information, such as a simple entity-attribute relation. Second, they struggle with multiple languages. Models like GPT, Mistral, and Llama3 show promise but face a third issue: output reliability due to hallucinations. Fourth, there is a challenge in reducing sensitive data leakage after fine-tuning models. This study introduces an enhanced approach for fine-tuning GPT-based models, designed to extract and assess information involving multiple entities and attributes, performing both multientity extraction (MEE) and multirelation extraction (MRE), and presenting results in a JSON format. Our methodology evaluates the impact of using synthetic data for fine-tuning to ensure reliable outcomes. Applied to legal documents from the Luxembourg Business Registers (LBR), our findings show that replacing sensitive data with synthetic data significantly improves the fine-tuning of Llama3-based models, though not for Mistral-based models. Our top models outperform Mistral in various scenarios, requiring only 500 samples for fine-tuning and running efficiently on modest servers. This approach is suitable for multilingual Information Extraction in any domain.
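The two ideas in the abstract that lend themselves to a short illustration are the replacement of sensitive values with synthetic ones before fine-tuning and the JSON output that carries multiple entities and relations. The sketch below is not the authors' implementation. The mapping `SYNTHETIC`, the helper names `pseudonymize` and `parse_extraction`, and the JSON keys `entities`, `relations`, `name`, `source`, and `target` are all assumptions made for illustration.

```python
import json

# Hypothetical mapping from sensitive values to synthetic stand-ins.
SYNTHETIC = {
    "Jean Dupont": "Alex Martin",
    "ACME Holdings S.A.": "Example Company S.A.",
}


def pseudonymize(text, mapping):
    """Replace each sensitive string with its synthetic stand-in."""
    # Replace longer keys first so a short key cannot split a longer one.
    for real in sorted(mapping, key=len, reverse=True):
        text = text.replace(real, mapping[real])
    return text


def parse_extraction(raw):
    """Parse model output against the assumed schema; return None if invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    entities = data.get("entities")
    relations = data.get("relations")
    if not isinstance(entities, list) or not isinstance(relations, list):
        return None
    # Collect the names of all well-formed entities.
    names = {e["name"] for e in entities if isinstance(e, dict) and "name" in e}
    # Reject relations that point at entities missing from the output,
    # a simple guard against one kind of hallucinated result.
    for rel in relations:
        if not isinstance(rel, dict):
            return None
        if rel.get("source") not in names or rel.get("target") not in names:
            return None
    return data
```

Under these assumptions, `pseudonymize` would be applied to each training document before fine-tuning, and `parse_extraction` would be applied to each model response so that malformed or internally inconsistent JSON is discarded rather than stored.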
Disciplines :
Computer science
Author, co-author :
Blanco Lambruschini, Braulio C.; SnT - SEDAN, University of Luxembourg, Luxembourg
BRORSSON, Mats; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SEDAN
External co-authors :
no
Language :
English
Title :
Transforming Unstructured Sensitive Information into Structured Knowledge
Publication date :
14 November 2024
Event name :
Proceedings of the 5th ACM International Conference on AI in Finance
Event place :
Brooklyn, USA
Event date :
14-11-2024 to 17-11-2024
Audience :
International
Main work title :
ICAIF 2024 - 5th ACM International Conference on AI in Finance
Publisher :
Association for Computing Machinery, Inc
ISBN/EAN :
9798400710810
Peer reviewed :
Peer reviewed
FNR Project :
FNR15403349 - SCRiPT - SME Credit Risk Platform, 2020 (01/04/2021-31/03/2024) - Radu State
Funders :
Luxembourg National Research Fund (FNR)
Funding text :
This research was funded in whole or in part by the Luxembourg National Research Fund (FNR), grant reference 15403349. For the purpose of open access, and in fulfilment of the obligations arising from the grant agreement, the author has applied a Creative Commons Attribution 4.0 International (CC BY 4.0) license to any Author Accepted Manuscript version arising from this submission.
Available on ORBilu :
since 21 January 2026
