Article (Scientific journals)
A large-scale microblog dataset and stock movement prediction based on Supervised Contrastive Learning model
Yang, Song; TANG, Xunzhu
2024, in Neurocomputing, 584, p. 127583
Peer Reviewed verified by ORBi
 

Details



Keywords :
A large-scale microblog dataset; Natural Language Processing; Stock market; Supervised Contrastive Learning; Language processing; Large-scales; Learning methods; Micro-blog; Natural languages; Neural-networks; Textual data; Computer Science Applications; Cognitive Neuroscience; Artificial Intelligence
Abstract :
The integration of Deep Neural Networks (DNN) with Natural Language Processing (NLP) technologies has opened new avenues in financial market prediction, particularly through the use of textual information. This study offers two primary contributions to stock trend prediction: (i) it exploits textual data (news, comments, microblogs) using advanced DNN architectures, enhancing the use of market information; (ii) it significantly improves the accuracy of predicting the direction of stock volatility by integrating textual data with neural network technologies. We also crawled, filtered, and constructed a large-scale microblog dataset, comprising approximately 114,992 microblog texts from 40 Science and Technology Innovation Board (STIB) companies in China during 2021. We conducted a comprehensive analysis using various DNN techniques, including Feedback Neural Networks (FNN), Supervised Contrastive Learning (SCL), Cross Entropy (CE), and Dual Contrastive Learning (DualCL), in conjunction with bag-of-words models and BERT and RoBERTa encoders. Our findings reveal that the SCL method, when combined with microblog data, significantly increases prediction accuracy, particularly during the COVID-19 period. Furthermore, we found that using a cross-stock dataset enhances the accuracy of all prediction methods, and that random allocation of microblog data leads to better results than sequential allocation. We also compared traditional models, such as the CAPM and the three-factor and five-factor models, against neural network-based methods; the results suggest a notable superiority of the SCL method in prediction accuracy. Finally, applying our findings to real-world trading strategies, we demonstrated the practical advantages of the SCL method in trading, evidenced by significant improvements across all performance indicators.
Disciplines :
Computer science
Author, co-author :
Yang, Song; Guizhou University of Finance and Economics, China
TANG, Xunzhu; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
External co-authors :
yes
Language :
English
Title :
A large-scale microblog dataset and stock movement prediction based on Supervised Contrastive Learning model
Publication date :
June 2024
Journal title :
Neurocomputing
ISSN :
0925-2312
eISSN :
1872-8286
Publisher :
Elsevier B.V.
Volume :
584
Pages :
127583
Peer reviewed :
Peer Reviewed verified by ORBi
Funding text :
The authors thank the editors and reviewers for their comments, which led to the improvement of this paper. This work is supported by the Guizhou Province Philosophy and Social Science Foundation of China (Grant No. 22GZQN03).
Available on ORBilu :
since 02 September 2025
