“GitHub,” 2008, accessed 19 May 2022. [Online]. Available: https://github.com/
M. Allamanis, E. T. Barr, P. Devanbu, and C. Sutton, “A survey of machine learning for big code and naturalness,” ACM Computing Surveys (CSUR), vol. 51, no. 4, p. 81, 2018.
Y. Yang, X. Xia, D. Lo, and J. Grundy, “A survey on deep learning for software engineering,” ACM Computing Surveys, vol. 54, no. 10s, Sep. 2022. [Online]. Available: https://doi.org/10.1145/3505243
U. Alon, S. Brody, O. Levy, and E. Yahav, “code2seq: Generating sequences from structured representations of code,” in International Conference on Learning Representations, 2018.
U. Alon, M. Zilberstein, O. Levy, and E. Yahav, “code2vec: Learning distributed representations of code,” Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, pp. 1–29, 2019.
L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin, “Convolutional neural networks over tree structures for programming language processing,” in Thirtieth AAAI Conference on Artificial Intelligence, 2016.
D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu, M. Tufano, S. K. Deng, C. Clement, D. Drain, N. Sundaresan, J. Yin, D. Jiang, and M. Zhou, “GraphCodeBERT: Pre-training code representations with data flow,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=jLoC4ez43PZ
P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu, M. Yasunaga, R. L. Phillips, I. Gao, T. Lee, E. David, I. Stavness, W. Guo, B. Earnshaw, I. Haque, S. M. Beery, J. Leskovec, A. Kundaje, E. Pierson, S. Levine, C. Finn, and P. Liang, “Wilds: a benchmark of in-the-wild distribution shifts,” in Proceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. Meila and T. Zhang, Eds., vol. 139. PMLR, 18–24 Jul 2021, pp. 5637–5664. [Online]. Available: https://proceedings.mlr.press/v139/koh21a.html
D. Hendrycks and T. Dietterich, “Benchmarking neural network robustness to common corruptions and perturbations,” Proceedings of the International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=HJz6tiCqYm
Y. Li, S. Chen, and W. Yang, “Estimating predictive uncertainty under program data distribution shift,” arXiv preprint arXiv:2107.10989, 2021.
R. Puri, D. S. Kung, G. Janssen, W. Zhang, G. Domeniconi, V. Zolotov, J. Dolby, J. Chen, M. Choudhury, L. Decker, V. Thost, L. Buratti, S. Pujar, S. Ramji, U. Finkler, S. Malaika, and F. Reiss, “CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks,” 2021.
“AtCoder,” 2012, accessed 19 May 2022. [Online]. Available: https://atcoder.jp/
S. Lu, D. Guo, S. Ren, J. Huang, A. Svyatkovskiy, A. Blanco, C. Clement, D. Drain, D. Jiang, D. Tang et al., “CodeXGLUE: A machine learning benchmark dataset for code understanding and generation,” arXiv preprint arXiv:2102.04664, 2021.
P. Nie, J. Zhang, J. J. Li, R. J. Mooney, and M. Gligoric, “Impact of evaluation methodologies on code summarization,” in Annual Meeting of the Association for Computational Linguistics (ACL), 2022, to appear.
S. Luan, D. Yang, C. Barnaby, K. Sen, and S. Chandra, “Aroma: Code recommendation via structural code search,” Proc. ACM Program. Lang., vol. 3, no. OOPSLA, Oct. 2019. [Online]. Available: https://doi.org/10.1145/3360578
Z. Li, X. Ma, C. Xu, C. Cao, J. Xu, and J. Lü, “Boosting operational DNN testing efficiency through conditioning,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2019. New York, NY, USA: Association for Computing Machinery, 2019, pp. 499–509. [Online]. Available: https://doi.org/10.1145/3338906.3338930
Q. Hu, Y. Guo, M. Cordy, X. Xie, L. Ma, M. Papadakis, and Y. Le Traon, “An empirical study on data distribution-aware test selection for deep learning enhancement,” ACM Transactions on Software Engineering and Methodology, Jan. 2022. [Online]. Available: https://doi.org/10.1145/3511598
T. Sharma, M. Kechagia, S. Georgiou, R. Tiwari, and F. Sarro, “A survey on machine learning techniques for source code analysis,” 2021.
B. W. Silverman, Density estimation for statistics and data analysis. Routledge, 2018.
Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” 2020. [Online]. Available: https://openreview.net/forum?id=SyxS0T4tvS
Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou, “CodeBERT: A pre-trained model for programming and natural languages,” in Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics, Nov. 2020, pp. 1536–1547. [Online]. Available: https://aclanthology.org/2020.findings-emnlp.139
S. Lu, D. Guo, S. Ren, J. Huang, A. Svyatkovskiy, A. Blanco, C. B. Clement, D. Drain, D. Jiang, D. Tang, G. Li, L. Zhou, L. Shou, L. Zhou, M. Tufano, M. Gong, M. Zhou, N. Duan, N. Sundaresan, S. K. Deng, S. Fu, and S. Liu, “CodeXGLUE: A machine learning benchmark dataset for code understanding and generation,” CoRR, vol. abs/2102.04664, 2021.
S. Fort, J. Ren, and B. Lakshminarayanan, “Exploring the limits of out-of-distribution detection,” Advances in Neural Information Processing Systems, vol. 34, pp. 7068–7081, 2021.
D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. [Online]. Available: https://openreview.net/forum?id=Hkg4TI9xl
S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” in International Conference on Learning Representations, 2018. [Online]. Available: https://openreview.net/forum?id=H1VGkIxRZ
K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of-distribution samples and adversarial attacks,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, ser. NIPS’18. Red Hook, NY, USA: Curran Associates Inc., 2018, pp. 7167–7177.
D. Hendrycks, M. Mazeika, and T. Dietterich, “Deep anomaly detection with outlier exposure,” in International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=HyxCxhRcY7
L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su, L. Li, Y. Liu et al., “DeepGauge: Multi-granularity testing criteria for deep learning systems,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018, pp. 120–131.
J. Kim, R. Feldt, and S. Yoo, “Guiding deep learning system testing using surprise adequacy,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 2019, pp. 1039–1049.
Y. Li, S. Wang, T. N. Nguyen, and S. Van Nguyen, “Improving bug detection via context-based code representation learning and attention-based neural networks,” Proc. ACM Program. Lang., vol. 3, no. OOPSLA, Oct. 2019. [Online]. Available: https://doi.org/10.1145/3360588