Form filling; Data entry forms; Completeness requirement relaxation; Machine learning; Software data quality; User interfaces
Abstract :
Data entry forms use completeness requirements to specify which fields are required and which are optional
when collecting necessary information from different types of users. However, as software evolves, some required fields may no longer be applicable to certain types of users.
Nevertheless, they may still be incorrectly marked as required in the form; we call such fields obsolete
required fields. Since obsolete required fields usually have “not-null” validation checks before submitting
the form, users have to enter meaningless values in such fields in order to complete the form submission.
These meaningless values threaten the quality of the filled data, and could negatively affect stakeholders
or learning-based tools that use the data. To keep users from entering meaningless values, existing techniques
usually rely on manually written rules to identify the obsolete required fields and relax their completeness
requirements. However, these techniques are ineffective and costly.
In this paper, we propose LACQUER, a learning-based automated approach for relaxing the completeness requirements of data entry forms. LACQUER builds Bayesian network models to automatically
learn the conditions under which users had to fill in meaningless values. To improve its learning ability, LACQUER identifies the cases where a required field is applicable only to a small group of users, and uses
SMOTE, an oversampling technique, to generate more instances for such fields so that dependencies on them can be mined effectively. During a data entry session, LACQUER predicts the completeness requirement
of a target field based on the already filled fields and their conditional dependencies in the trained model.
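The oversampling step mentioned above can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not LACQUER's implementation: the function name `smote_like`, the numeric encoding of form instances as 2-D points, and the parameter values are all assumptions made for illustration. It only shows the core idea of SMOTE, which is to create a synthetic minority instance on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Made-up minority instances (assumed to be numerically encoded already).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority, n_new=5)
```

Form fields are often categorical, so a real pipeline would likely need an encoding step or a SMOTE variant that handles categorical features; the sketch leaves that out.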
Our experimental results show that LACQUER can accurately relax the completeness requirements
of required fields in data entry forms with precision values ranging between 0.76 and 0.90 on different
datasets. LACQUER can prevent users from filling 20% to 64% of meaningless values, with negative
predictive values (i.e., the ability to correctly predict a field as “optional”) between 0.72 and 0.91.
Furthermore, LACQUER is efficient; it takes at most 839 ms to predict the completeness requirement of
an instance.
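For readers less familiar with the two metrics reported above, the following sketch shows how precision and negative predictive value are computed from confusion-matrix counts. It assumes that "required" is the positive class and "optional" the negative class; the function name and the counts are made up for illustration and are not taken from the paper's experiments.

```python
def precision_and_npv(tp, fp, tn, fn):
    """Return (precision, negative predictive value) from confusion counts.
    Assumes 'required' is the positive class and 'optional' the negative one."""
    # Precision: share of fields predicted 'required' that are truly required.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # NPV: share of fields predicted 'optional' that are truly optional.
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    return precision, npv

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
precision, npv = precision_and_npv(tp=80, fp=20, tn=45, fn=5)
```

With these illustrative counts, precision is 80 / 100 and the negative predictive value is 45 / 50.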
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
BELGACEM, Hichem ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
Xiaochen Li; Dalian University of Technology
BIANCULLI, Domenico ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
External co-authors :
yes
Language :
English
Title :
Learning-Based Relaxation of Completeness Requirements for Data Entry Forms
Publication date :
March 2024
Journal title :
ACM Transactions on Software Engineering and Methodology
ISSN :
1049-331X
Publisher :
Association for Computing Machinery (ACM), United States