Abstract:
The introduction of Large Language Models (LLMs) has revolutionized Natural Language Processing (NLP), particularly in domains such as manufacturing, where knowledge sharing and retrieval play crucial roles. Traditionally, manufacturers have relied on extensive databases and manual querying processes, often limited by domain-specific vocabulary and collaboration challenges. LLMs, with their advanced capabilities in document retrieval and summarization, have spurred interest in Retrieval Augmented Generation (RAG) pipelines, particularly for enhancing decision support systems. Existing approaches to knowledge retrieval, such as hybrid methods combining sparse and dense retrieval, face limitations in interpretability and fine-tuning performance when applied to noisy or scarce manufacturing data. To address these gaps, this study introduces SEASONED (SEquentiAl denoiSing cONtrastive EncoDing), a novel LLM-based contrastive representation learning framework that incorporates triplet loss learning and attention heatmaps to improve retriever module performance and interpretability in RAG pipelines. By leveraging both TSDAE and contrastive fine-tuning, SEASONED enables efficient sub-cluster segregation and differentiation between closely related sentences. Experimental evaluation on two datasets, one open-source and one proprietary manufacturing dataset, demonstrates that SEASONED improves document retrieval performance by 28 to 58% in accuracy and by 21 to 62% in Mean Reciprocal Rank (MRR) compared to six state-of-the-art architectures.
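As a rough illustration of the triplet-loss contrastive fine-tuning step the abstract describes, the minimal PyTorch sketch below trains a toy sentence encoder so that an anchor embedding moves closer to a positive and away from a negative. This is only a sketch under stated assumptions: the encoder, token data, margin, and learning rate are hypothetical stand-ins, not SEASONED's actual architecture, TSDAE pre-training stage, or training pipeline.

```python
# Minimal sketch (not the paper's implementation) of triplet-loss
# contrastive fine-tuning for a sentence encoder.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy bag-of-words encoder standing in for an LLM-based sentence encoder."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)  # default mode: mean pooling

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # L2-normalize so distances reflect a cosine-like geometry.
        return nn.functional.normalize(self.emb(token_ids), dim=-1)

encoder = TinyEncoder()
# Triplet loss pulls (anchor, positive) together and pushes the negative
# at least `margin` further away than the positive.
loss_fn = nn.TripletMarginLoss(margin=0.5)
opt = torch.optim.AdamW(encoder.parameters(), lr=1e-4)

# Hypothetical pre-tokenized triplets: (query, relevant doc, hard negative).
anchor = torch.randint(0, 1000, (8, 16))
positive = torch.randint(0, 1000, (8, 16))
negative = torch.randint(0, 1000, (8, 16))

opt.zero_grad()
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
opt.step()
```

In a retriever-training setting, anchors would typically be queries while positives and hard negatives would be relevant and closely related but irrelevant documents, which is what drives the sub-cluster segregation between near-identical sentences that the abstract claims.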
Funding text:
This research was funded in whole, or in part, by the Luxembourg National Research Fund (FNR), grant reference BRIDGES/2023/IS/18435508/ATTAINS.