Article (Scientific journals)
Data Optimization through Compression Methods Using Information Technology
Malyk, Igor V.; Kyrychenko, Yevhen; Gorbatenko, Mykola et al.
2025, in International Journal of Information Technology and Computer Science, 17 (5), p. 84 - 99
Peer reviewed
 

Files


Full Text
IJITCS-V17-N5-7.pdf
Author postprint (1.08 MB)

Details



Keywords :
Compact Data Representation; Compressed Copy of Tabular Data; Data Similarity; Information Technology; Control and Systems Engineering; Information Systems; Computer Science Applications; Computer Networks and Communications; Computational Theory and Mathematics; Artificial Intelligence
Abstract :
[en] Efficient comparison of heterogeneous tabular datasets is difficult when sources are unknown or weakly documented. We address this problem by introducing a unified, type-aware framework that builds compact data representations (CDRs), that is, concise summaries sufficient for downstream analysis, together with a corresponding similarity graph (and tree) over a data corpus. Our novelty is threefold: (i) a principled vocabulary and procedure for constructing CDRs per variable type (factor, time, numeric, string), (ii) a weighted, type-specific similarity metric we call Data Information Structural Similarity (DISS) that aggregates distances across heterogeneous summaries, and (iii) an end-to-end, cloud-scalable realization that supports large corpora. Methodologically, factor variables are summarized by frequency tables; time variables by fixed-bin histograms; numeric variables by moment vectors (up to the fourth order); and string variables by TF–IDF vectors. Pairwise similarities use Hellinger, Wasserstein (p=1), total variation, and L1/L2 distances, with MAE/MAPE for numeric summaries; the DISS score combines these via learned or user-set weights to form an adjacency graph whose minimum-spanning tree yields a similarity tree. In experiments on multi-source CSVs, the approach enables accurate retrieval of closest datasets and robust corpus-level structuring while reducing storage and I/O. This contributes a reproducible pathway from raw tables to a similarity tree, clarifying terminology and providing algorithms that practitioners can deploy at scale.
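The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It covers only two of the four variable types (factor and numeric), and every name in it (`factor_cdr`, `numeric_cdr`, `diss`, `minimum_spanning_tree`, the toy tables, the equal weights) is hypothetical. Treating DISS as a plain weighted sum of a Hellinger distance and an MAE is an assumption made for illustration, not the definition given in the paper.

```python
import math
from collections import Counter

def factor_cdr(values):
    """Frequency table (relative frequencies) summarizing a factor column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def numeric_cdr(values):
    """Moment vector up to order four: mean, variance, skewness, kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:  # constant column: higher standardized moments are undefined
        return [mean, 0.0, 0.0, 0.0]
    return [mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2)

def mae(u, v):
    """Mean absolute error between two equal-length moment vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)

def diss(cdr_a, cdr_b, weights):
    """Assumed aggregation: weighted sum of per-type distances (lower = more similar)."""
    d_factor = hellinger(cdr_a["factor"], cdr_b["factor"])
    d_numeric = mae(cdr_a["numeric"], cdr_b["numeric"])
    return weights["factor"] * d_factor + weights["numeric"] * d_numeric

def minimum_spanning_tree(names, dist):
    """Prim's algorithm on the complete graph over `names`; returns (a, b, d) edges."""
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        d, a, b = min(
            (dist(a, b), a, b) for a in in_tree for b in names if b not in in_tree
        )
        edges.append((a, b, d))
        in_tree.add(b)
    return edges

# Toy corpus: three tables, each with one factor and one numeric column.
tables = {
    "a": {"factor": ["x", "x", "y", "y"], "numeric": [1.0, 2.0, 3.0, 4.0]},
    "b": {"factor": ["x", "x", "y", "y"], "numeric": [1.0, 2.0, 3.0, 4.0]},
    "c": {"factor": ["z", "z", "z", "z"], "numeric": [10.0, 20.0, 30.0, 40.0]},
}
cdrs = {
    name: {"factor": factor_cdr(t["factor"]), "numeric": numeric_cdr(t["numeric"])}
    for name, t in tables.items()
}
weights = {"factor": 0.5, "numeric": 0.5}  # user-set weights (illustrative)
tree = minimum_spanning_tree(
    sorted(cdrs), lambda a, b: diss(cdrs[a], cdrs[b], weights)
)
```

In this toy corpus, tables `a` and `b` have identical summaries, so the first edge of the tree links them at distance zero, and `c` attaches through a much longer edge. Only the summaries are compared, never the raw rows, which is where the storage and I/O savings mentioned in the abstract would come from.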
Disciplines :
Mathematics
Author, co-author :
Malyk, Igor V. ; Department of Mathematical Problems of Control and Cybernetics, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, Ukraine
Kyrychenko, Yevhen ; Department of Mathematical Modeling, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, Ukraine
Gorbatenko, Mykola ; Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
LUKASHIV, Taras ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Clinical and Translational Informatics ; Department of Mathematical Problems of Control and Cybernetics, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, Ukraine
These authors have contributed equally to this work.
External co-authors :
yes
Language :
English
Title :
Data Optimization through Compression Methods Using Information Technology
Publication date :
October 2025
Journal title :
International Journal of Information Technology and Computer Science
ISSN :
2074-9007
eISSN :
2074-9015
Publisher :
Modern Education and Computer Science Press
Volume :
17
Issue :
5
Pages :
84 - 99
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 19 January 2026

Statistics


Number of views
15 (2 by Unilu)
Number of downloads
5 (1 by Unilu)

Scopus citations®
2
Scopus citations® without self-citations
2
OpenCitations
0
OpenAlex citations
0
