Learning the Relation between Code Features and Code Transforms with Structured Prediction

Keywords :

Big code; Code transform; Machine learning; Program repair; Code; Computer bugs; Feature transform; Feature extraction; Predictive models; Synthesizer; Software; Computer Science - Software Engineering; Computer Science - Learning; Computer Science - Programming Languages

Abstract :

To effectively guide the exploration of the code transform space for automated code evolution techniques, we present the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields (CRFs). Our approach first learns offline a probabilistic model that captures how certain code transforms are applied to certain AST nodes, and then uses the learned model to predict transforms for new, unseen code snippets. The approach involves a novel representation of both programs and code transforms. Specifically, we introduce a formal framework for defining AST-level code transforms and demonstrate how the CRF model can be designed, learned, and used for prediction accordingly. We instantiate our approach in the context of repair transform prediction for Java programs. Our instantiation contains a set of carefully designed code features, addresses the training data imbalance issue, and includes transform constraints that are specific to code. We conduct a large-scale experimental evaluation on a dataset of bug-fixing commits from real-world Java projects. The results show that, under the popular top-3 evaluation metric, our approach predicts the code transforms with an accuracy ranging from 41% to 53%, depending on the transform. Our model outperforms two baselines based on history probability and neural machine translation (NMT), suggesting the importance of considering code structure in achieving good prediction accuracy. In addition, we implement a proof-of-concept synthesizer that concretizes some repair transforms into final patches. The evaluation of the synthesizer on the Defects4J benchmark confirms the usefulness of the predicted AST-level repair transforms in producing high-quality patches.
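
The idea of jointly labeling AST nodes with transforms and checking a top-3 hit can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the three-node AST, the transform labels, the weights, and the reading of "top-3" as "the ground-truth labeling is among the three highest-scoring joint labelings". Brute-force enumeration stands in for whatever inference procedure the authors actually use; it is only feasible on a toy tree.

```python
import itertools
import math

# Hypothetical transform labels (not the article's actual label set).
LABELS = ["EMPTY", "WRAP_WITH_IF", "UPDATE_OPERATOR"]

# Toy AST: node id -> (node type, parent id or None for the root).
AST = {
    0: ("Method", None),
    1: ("Return", 0),
    2: ("BinaryOperator", 1),
}

# Hypothetical "learned" weights for (node type, label) features.
NODE_WEIGHTS = {
    ("Method", "EMPTY"): 1.5,
    ("Return", "EMPTY"): 0.5,
    ("Return", "WRAP_WITH_IF"): 1.0,
    ("BinaryOperator", "UPDATE_OPERATOR"): 2.0,
}

# Hypothetical weights for (parent label, child label) features.
EDGE_WEIGHTS = {
    ("EMPTY", "UPDATE_OPERATOR"): 0.5,
    ("WRAP_WITH_IF", "UPDATE_OPERATOR"): -1.0,
}


def score(assignment):
    """Unnormalized log-score of one joint labeling of the tree."""
    total = 0.0
    for node, (node_type, parent) in AST.items():
        total += NODE_WEIGHTS.get((node_type, assignment[node]), 0.0)
        if parent is not None:
            pair = (assignment[parent], assignment[node])
            total += EDGE_WEIGHTS.get(pair, 0.0)
    return total


def top_k_assignments(k=3):
    """Return the k best joint labelings with their probabilities."""
    nodes = sorted(AST)
    scored = []
    for labels in itertools.product(LABELS, repeat=len(nodes)):
        scored.append((score(dict(zip(nodes, labels))), labels))
    # Partition function: sum over all labelings (tractable only for toy trees).
    z = sum(math.exp(s) for s, _ in scored)
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(labels, math.exp(s) / z) for s, labels in scored[:k]]


def top_k_hit(gold, k=3):
    """True if the gold labeling is among the k best predictions."""
    return tuple(gold) in [labels for labels, _ in top_k_assignments(k)]
```

Calling `top_k_assignments(3)` ranks whole-tree labelings rather than labeling each node in isolation, which is the point of structured prediction: the edge weights let a parent's transform influence its child's.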

Disciplines :

Computer science

Author, co-author :

Yu, Zhongxing ; Shandong University, Jinan, China

Martinez, Matias ; Universitat Politècnica de Catalunya, Barcelona, Spain

Chen, Zimin ; KTH Royal Institute of Technology, Stockholm, Sweden

BISSYANDE, Tegawendé François d'Assise ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

Monperrus, Martin ; KTH Royal Institute of Technology, Stockholm, Sweden

External co-authors :

yes

Language :

English

Publication date :

July 2023

Journal title :

IEEE Transactions on Software Engineering

ISSN :

0098-5589

eISSN :

1939-3520

Publisher :

Institute of Electrical and Electronics Engineers Inc.

Pages :

3872 - 3900

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://xplorestaging.ieee.org/ielx7/32/10185148/10130317.pdf?arnumber=10130317

Available on ORBilu :

since 10 December 2024
