Keywords :
App review; Bug finding; Bug report; Bug similarity; Bug reports; Embeddings; Evolution process; Matchings; Mobile app; Natural languages texts; Software development teams; Software
Abstract :
Software development teams generally welcome any effort to expose bugs in their code base. In this work, we build on the hypothesis that mobile apps from the same category (e.g., two web browser apps) may be affected by similar bugs during their evolution. It is therefore possible to transfer the experience of one historical app to quickly find bugs in its new counterparts, an idea referred to in the literature as collaborative bug finding. Our novelty is to guide the bug finding process by considering that existing bugs have been hinted at in app reviews. Concretely, we design the BugRMSys approach to recommend bug reports for a target app by matching historical bug reports from apps in the same category with user reviews of the target app. We experimentally show that this approach enables us to quickly expose and report dozens of bugs for target apps such as Brave (a web browser app). BugRMSys’s implementation relies on DistilBERT to produce natural language text embeddings. Our pipeline considers the similarity between bug reports and app reviews to identify relevant bugs. To reproduce a bug, we then rely on the app review together with any reproduction steps in the matched historical bug report (from an app in the same category). Overall, after applying BugRMSys to six popular apps, we were able to identify, reproduce, and report 20 new bugs: among these, 9 reports have already been triaged, 6 were confirmed, and 4 have been fixed by the official development teams.
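The matching step summarized above (embed reviews and historical bug reports, then rank reports by similarity) can be illustrated with a minimal Python sketch. It is not BugRMSys's implementation: the toy bag-of-words `toy_embed` is a hypothetical stand-in for DistilBERT so the example runs without model weights, and the names `recommend`, `top_k`, and `threshold` (with their default values) are illustrative assumptions, not parameters reported by the authors. Each review of the target app is scored against every historical bug report from same-category apps by cosine similarity, and the best-scoring reports above the threshold are recommended.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Sparse vector: token -> weight. A real encoder such as DistilBERT would
# instead return a dense vector; cosine similarity is applied the same way.
Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for DistilBERT: bag-of-words term counts."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tok: float(n) for tok, n in Counter(tokens).items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(
    reviews: Sequence[str],
    bug_reports: Sequence[Tuple[str, str]],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 1,
    threshold: float = 0.3,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, List[Tuple[str, float]]]]:
    """For each review of the target app, return up to top_k historical bug
    reports as (id, score) pairs whose similarity reaches the threshold."""
    # Embed every historical bug report once.
    report_vecs = [(rid, embed(text)) for rid, text in bug_reports]
    results = []
    for review in reviews:
        review_vec = embed(review)
        scored = [(rid, cosine(review_vec, vec)) for rid, vec in report_vecs]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
        matches = [(rid, s) for rid, s in scored[:top_k] if s >= threshold]
        results.append((review, matches))
    return results


if __name__ == "__main__":
    # Hypothetical bug reports from a same-category app and reviews of a target app.
    history = [
        ("browser-A#1", "Video playback freezes in fullscreen mode"),
        ("browser-A#2", "Bookmarks are lost after sync"),
    ]
    target_reviews = ["Video freezes when I go fullscreen", "Great app, love it"]
    for review, matches in recommend(target_reviews, history):
        print(review, "->", matches)
```

Running the script matches the first review to `browser-A#1` with a score of about 0.5, while the generic praise review receives no recommendation. Replacing `toy_embed` with a real sentence encoder would also require adapting `cosine` to dense vectors; the ranking logic stays the same.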
Disciplines :
Computer science
Author, co-author :
TANG, Xunzhu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
TIAN, Haoye ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Tegawendé François d'Assise BISSYANDE
KONG, Pingfan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Jacques KLEIN
EZZINI, Saad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Jacques KLEIN ; School of Computing and Communications, Lancaster University, Lancaster, United Kingdom
LIU, Kui ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Huawei, Hangzhou City, China
Xia, Xin; Huawei, Hangzhou City, China
KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Bissyandé, Tegawendé F.; SnT, University of Luxembourg, Luxembourg City, Luxembourg
Funders :
H2020 European Research Council; National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province, China; Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing
Funding text :
This work is supported by the NATURAL project, which has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant No. 949014).