Code search is an essential task in software development. Developers often search the internet and other code databases for source code snippets that ease their development effort. Code search techniques also help in learning to program, since novice programmers and students can quickly retrieve (hopefully good) examples already used in actual software projects. Given how frequently developers search for code, interest in the topic is growing in the research community, which has proposed many code search tools and techniques to improve the search experience. These tools and techniques leverage several different ideas and claim better code search performance. However, it remains challenging to present a comprehensive view of the field, because existing studies generally explore narrow and limited subsets of the components involved. This study aims to devise a grounded approach to understanding the procedure for code search and to build an operational taxonomy capturing the critical facets of code search techniques. Additionally, we investigate the evaluation methods, benchmarks, and datasets used in the field of code search.
Disciplines :
Computer science
Author, co-author :
Kim, Kisub ; Singapore Management University, Singapore, Singapore
European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme; Fonds National de la Recherche (FNR), Luxembourg; National Research Foundation, Singapore, under its Industry Alignment Fund–Pre-positioning (IAF-PP) Funding Initiative; National Research Foundation of Korea (NRF) grant funded by the Korea government; National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province, China; Kyungpook National University Research Fund, 2020
Funding (details) :
This work was supported by the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (grant agreement 949014); and Fonds National de la Recherche (FNR), Luxembourg, under FNR-AFR PhD/11623818 and the National Research Foundation, Singapore, under its Industry Alignment Fund–Pre-positioning (IAF-PP) Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore. This work was also supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A5A1021944 and 2021R1I1A3048013); the National Natural Science Foundation of China (62172214); the Natural Science Foundation of Jiangsu Province, China (BK20210279). Additionally, the research was supported by Kyungpook National University Research Fund, 2020.
Sushil Bajracharya, Joel Ossher, and Cristina Lopes. 2010. Searching API usage examples in code repositories with sourcerer API search. In Proceedings of the ICSE Workshop on Search-Driven Development: Users, Infrastructure, Tools and Evaluation. 5-8.
Lei Ai, Zhiqiu Huang, Weiwei Li, Yu Zhou, and Yaoshen Yu. 2019. Sensory: Leveraging code statement sequence information for code snippets recommendation. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference, Vol. 1. IEEE, Los Alamitos, CA, 27-36.
S. Akbar and A. Kak. 2019. SCOR: Source code retrieval with semantics and order. In Proceedings of the 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR'19). 1-12.
M. Akhin, N. Tillmann, M. Fähndrich, J. de Halleux, and M. Moskal. 2012. Search by example in TouchDevelop: Code search made easy. In Proceedings of the 2012 4th International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. IEEE, Los Alamitos, CA, 5-8.
Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Computing Surveys 51, 4 (2018), 1-37.
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2017. Learning to represent programs with graphs. arXiv:1711.00740 (2017).
Ambient Software Evolution Group. 2022. IJaDataset 2.0. Retrieved June 25, 2023 from https://onedrive.live.com/?authkey=%21AKDB2aMepVDO8as&id=8BFCB70AA333DB15%21260605&cid=8BFCB70AA333DB15&parId=root&parQt=sharedby&o=OneUp.
Ambient Software Evolution Group. 2022. BigCloneBench. Retrieved June 26, 2023 from https://github.com/clonebench/BigCloneBench.
A. Arwan, S. Rochimah, and R. J. Akbar. 2015. Source code retrieval on StackOverflow using LDA. In Proceedings of the 2015 3rd International Conference on Information and Communication Technology (ICoICT'15). 295-299.
M. H. Asyrofi, F. Thung, D. Lo, and L. Jiang. 2020. AUSearch: Accurate API usage search in GitHub repositories with type resolution. In Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering (SANER'20). 637-641.
Ricardo Baeza-Yates and Berthier Ribeiro-Neto. 1999. Modern Information Retrieval, Vol. 463. ACM, New York, NY.
Sushil Bajracharya, Trung Ngo, Erik Linstead, Yimeng Dou, Paul Rigor, Pierre Baldi, and Cristina Lopes. 2006. Sourcerer: A search engine for open source code supporting structure-based search. In Companion to the 21st ACM SIGPLAN Symposium on Object-Oriented Programming Systems, Languages, and Applications. ACM, New York, NY, 681-682.
Sushil Bajracharya, Joel Ossher, and Cristina Lopes. 2014. Sourcerer: An infrastructure for large-scale collection and analysis of open-source code. Science of Computer Programming 79, Suppl. C (Jan. 2014), 241-259.
Sushil Krishna Bajracharya and Cristina Videira Lopes. 2010. Analyzing and mining a code search engine usage log. Empirical Software Engineering 17, 4-5 (Sept. 2010), 424-466.
Sushil Krishna Bajracharya and Cristina Videira Lopes. 2012. Analyzing and mining a code search engine usage log. Empirical Software Engineering 17, 4 (2012), 424-466.
S. Baltes, R. Kiefer, and S. Diehl. 2017. Attribution required: Stack Overflow code snippets in GitHub projects. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C'17).
E. A. Barbosa, A. Garcia, and M. Mezini. 2012. Heuristic strategies for recommendation of exception handling code. In Proceedings of the 2012 26th Brazilian Symposium on Software Engineering. 171-180.
E. A. Barbosa, A. Garcia, and M. Mezini. 2012. A recommendation system for exception handling code. In Proceedings of the 2012 5th International Workshop on Exception Handling (WEH'12). 52-54.
Ohad Barzilay, Christoph Treude, and Alexey Zagalsky. 2013. Facilitating crowd sourced software engineering via stack overflow. In Finding Source Code on the Web for Remix and Reuse. Springer, New York, NY, 289-308.
S. Bazrafshan, R. Koschke, and N. Gode. 2011. Approximate code search in program histories. In Proceedings of the 2011 18th Working Conference on Reverse Engineering. 109-118.
Kent Beck. 2003. Test-Driven Development: By Example. Addison-Wesley Professional.
Farnaz Behrang, Steven P. Reiss, and Alessandro Orso. 2018. GUIFetch: Supporting app design and development through GUI search. In Proceedings of the 5th International Conference on Mobile Software Engineering and Systems. ACM, New York, NY, 236-246.
Sumit Bhatia, Suppawong Tuarob, Prasenjit Mitra, and C. Lee Giles. 2011. An algorithm search engine for software developers. In Proceedings of the 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. ACM, New York, NY, 13-16.
T. F. Bissyande, F. Thung, D. Lo, Lingxiao Jiang, and L. Reveillere. 2013. Orion: A software project search engine with integrated diverse software artifacts. In Proceedings of the 2013 18th International Conference on Engineering of Complex Computer Systems (ICECCS'13). 242-245.
Bitbucket. 2022. Home Page. Retrieved June 26, 2023 from https://bitbucket.org.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (March 2003), 993-1022.
Alessandro Bozzon, Marco Brambilla, and Piero Fraternali. 2010. Searching repositories of web application models. In Web Engineering. Springer, Berlin, Germany, 1-15.
Andrew Bragdon, Steven P. Reiss, Robert Zeleznik, Suman Karumuri, William Cheung, Joshua Kaplan, Christopher Coleman, Ferdi Adeputra, and Joseph J. LaViola Jr. 2010. Code bubbles: Rethinking the user interface paradigm of integrated development environments. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, Vol. 1. 455-464.
Andrew Bragdon, Robert Zeleznik, Steven P. Reiss, Suman Karumuri, William Cheung, Joshua Kaplan, Christopher Coleman, Ferdi Adeputra, and Joseph J. LaViola Jr. 2010. Code bubbles: A working set-based interface for code understanding and maintenance. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2503-2512.
Joel Brandt, Philip J. Guo, Joel Lewenstein, Mira Dontcheva, and Scott R. Klemmer. 2009. Two studies of opportunistic programming: Interleaving web foraging, learning, and writing code. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, 1589-1598.
John R. Brown and R. H. Hoffman. 1972. Evaluating the effectiveness of software verification: Practical experience with an automated tool. In Proceedings of Fall Joint Computer Conference, Part I. 181-190.
Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from examples to improve code completion systems. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering. 213-222.
Jose Cambronero, Hongyu Li, Seohyun Kim, Koushik Sen, and Satish Chandra. 2019. When deep learning met code search. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, New York, NY, 964-974.
Brock Angus Campbell and Christoph Treude. 2017. NLP2Code: Code snippet content assist via natural language tasks. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME'17). 628-632.
Claudio Carpineto and Giovanni Romano. 2012. A survey of automatic query expansion in information retrieval. ACM Computing Surveys 44, 1 (2012), Article 1, 50 pages.
Wing-Kwan Chan, Hong Cheng, and David Lo. 2012. Searching connected API subgraph via text phrases. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. ACM, New York, NY, 1-11.
Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Cho, and Hee Beng Kuan Tan. 2016. Bingo: Cross-architecture cross-os binary search. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 678-689.
Shaunak Chatterjee, Sudeep Juvekar, and Koushik Sen. 2009. SNIFF: A search engine for Java using free-form queries. In Fundamental Approaches to Software Engineering. Springer, Berlin, Germany, 385-400.
Chi Chen, Xin Peng, Jun Sun, Zhenchang Xing, Xin Wang, Yifan Zhao, Hairui Zhang, and Wenyun Zhao. 2019. Generative API usage code recommendation with parameter concretization. Science China Information Sciences 62, 9 (2019), 192103.
Chi Chen, Xin Peng, Zhenchang Xing, Jun Sun, Xin Wang, Yifan Zhao, and Wenyun Zhao. 2020. Holistic combination of structural and textual code information for context based API recommendation. arXiv:2010.07514 [cs] (2020). https://arxiv.org/abs/2010.07514v1.
Hao Chen, Shi Ying, Jin Liu, and Wei Wang. 2004. SE4SC: A specific search engine for software components. In Proceedings of the 2004 4th International Conference on Computer and Information Technology (CIT'04). IEEE, Los Alamitos, CA, 863-868.
Qingying Chen and Minghui Zhou. 2018. A neural framework for retrieval and summarization of source code. In Proceedings of the 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE'18). 826-831.
Zhengzhao Chen, Renhe Jiang, Zejun Zhang, Yu Pei, Minxue Pan, Tian Zhang, and Xuandong Li. 2020. Enhancing example-based code search with functional semantics. Journal of Systems and Software 165 (2020), 110568.
Tabnine. 2022. Home Page. Retrieved April 1, 2022 from https://www.codota.com/.
Collin McMillan. 2011. Finding relevant functions in millions of lines of code. In Proceedings of the 33rd International Conference on Software Engineering. ACM, New York, NY, 1170-1172.
Megan Conklin. 2007. Project entity matching across FLOSS repositories. In Open Source Development, Adoption and Innovation (IFIP-The International Federation for Information Processing). Springer, Boston, MA, 45-57.
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning 20, 3 (1995), 273-297.
B. Dagenais and M. P. Robillard. 2012. Recovering traceability links between an API and its learning resources. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE'12). 47-57.
Yaniv David and Eran Yahav. 2014. Tracelet-based code search in executables. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, NY, 349-360.
Janet E. Davidson and Robert J. Sternberg (Eds.). 2003. The Psychology of Problem Solving. Cambridge University Press.
Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems. 337-340.
Marcelo de Rezende Martins and Marco Aurélio Gerosa. 2020. CoNCRA: A convolutional neural networks code retrieval approach. In Proceedings of the 34th Brazilian Symposium on Software Engineering. ACM, New York, NY, 526-531.
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (Sept. 1990), 391-407.
U. Dekel and J. D. Herbsleb. 2009. Improving API documentation usability with knowledge pushing. In Proceedings of the 2009 IEEE 31st International Conference on Software Engineering. 320-330.
Serge Demeyer, Sander Tichelaar, and Stéphane Ducasse. 2001. FAMIX 2.1-The FAMOOS Information Exchange Model. Technical Report. University of Bern.
Luca Di Grazia and Michael Pradel. 2023. Code search: A survey of techniques for finding code. ACM Computing Surveys 55, 11 (2023), 1-31.
T. Diamantopoulos, K. Thomopoulos, and A. Symeonidis. 2016. QualBoa: Reusability-aware recommendations of source code components. In Proceedings of the 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR'16). 488-491.
Bogdan Dit, Meghan Revelle, Malcom Gethers, and Denys Poshyvanyk. 2013. Feature location in source code: A taxonomy and survey. Journal of Software: Evolution and Process 25, 1 (2013), 53-95.
Donzhen Wen, Liang Yang, Yingying Zhang, Yuan Lin, Kan Xu, and Hongfei Lin. 2020. Multi-level semantic representation model for code search. In Proceedings of the Joint Conference of the Information Retrieval Communities in Europe (CIRCLE'20).
Horatiu Dumitru, Marek Gibiec, Negar Hariri, Jane Cleland-Huang, Bamshad Mobasher, Carlos Castro-Herrera, and Mehdi Mirakhorli. 2011. On-demand feature recommendations derived from mining public product descriptions. In Proceedings of the 33rd International Conference on Software Engineering. 181-190.
Frederico A. Durão, Taciana A. Vanderlei, Eduardo S. Almeida, and Silvio R. de L. Meira. 2008. Applying a semantic layer in a source code search tool. In Proceedings of the 2008 ACM Symposium on Applied Computing. ACM, New York, NY, 1151-1157.
Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. 2015. Boa: Ultra-large-scale software repository and source-code mining. ACM Transactions on Software Engineering and Methodology 25, 1 (2015), Article 7, 34 pages.
Françoise Détienne and Frank Bott. 2001. Software Design-Cognitive Aspects. Springer-Verlag.
Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 8 (2006), 861-874.
Gerhard Fischer, Scott Henninger, and David Redmiles. 1991. Cognitive tools for locating and comprehending software objects for reuse. In Proceedings of the 13th International Conference on Software Engineering. 318-328.
Denis Foo Kune and Yongdae Kim. 2010. Timing attacks on pin input devices. In Proceedings of the 17th ACM Conference on Computer and Communications Security. 678-680.
W. B. Frakes and B. A. Nejmeh. 1986. Software reuse through information retrieval. ACM SIGIR Forum 21, 1 (1986), 30-36.
Yuji Fujiwara, Norihiro Yoshida, Eunjong Choi, and Katsuro Inoue. 2019. Code-to-code search based on deep neural network and code mutation. In Proceedings of the 2019 IEEE 13th International Workshop on Software Clones (IWSC'19). 1-7.
George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais. 1987. The vocabulary problem in human-system communication. Communications of the ACM 30, 11 (1987), 964-971.
Mark Gabel and Zhendong Su. 2010. A study of the uniqueness of source code. In Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 147-156.
Joel Galenson, Philip Reames, Rastislav Bodik, Björn Hartmann, and Koushik Sen. 2014. CodeHint: Dynamic and interactive synthesis of code snippets. In Proceedings of the 36th International Conference on Software Engineering. ACM, New York, NY, 653-663.
Rosalva E. Gallardo-Valencia and Susan Elliott Sim. 2009. Internet-scale code search. In Proceedings of the 2009 ICSE Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. IEEE, Los Alamitos, CA, 49-52.
Gitlab. 2022. Kernel.org Git Repositories. Retrieved January 26, 2023 from https://git.kernel.org.
Google. 2022. Home Page. Retrieved June 26, 2023 from https://www.google.com.
Google Code Jam. 2022. Home Page. Retrieved April 1, 2022 from https://developers.googleblog.com/2023/05/celebrate-googles-coding-competitions.html.
Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, and Andy Zaidman. 2014. Lean GHTorrent: GitHub data on demand. In Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, New York, NY.
Mark Grechanik, Chen Fu, Qing Xie, Collin McMillan, Denys Poshyvanyk, and Chad Cumby. 2010. EXEMPLAR: EXEcutable exaMPLes ARchive. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, Vol. 2. ACM, New York, NY, 259-262.
M. Grechanik, C. Fu, Q. Xie, C. McMillan, D. Poshyvanyk, and C. Cumby. 2010. A search engine for finding highly relevant applications. In Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering, Vol. 1. 475-484.
Wenchao Gu, Zongjie Li, Cuiyun Gao, Chaozheng Wang, Hongyu Zhang, Zenglin Xu, and Michael R. Lyu. 2020. CRaDLe: Deep code retrieval based on semantic dependency learning. arXiv:2012.01028 (2020).
X. Gu, H. Zhang, and S. Kim. 2018. Deep code search. In Proceedings of the 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE'18). 933-944.
X. Gu, H. Zhang, and S. Kim. 2019. CodeKernel: A graph kernel based approach to the selection of API usage examples. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE'19). 590-601.
Florian S. Gysin. 2010. Improved social trustability of code search results. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, Vol. 2. ACM, New York, NY, 513-514.
Florian S. Gysin and Adrian Kuhn. 2010. A trustability metric for code search based on developer karma. In Proceedings of 2010 ICSE Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. ACM, New York, NY, 41-44.
Sonia Haiduc, Gabriele Bavota, Andrian Marcus, Rocco Oliveto, Andrea De Lucia, and Tim Menzies. 2013. Automatic query reformulations for text retrieval in software engineering. In Proceedings of the 2013 International Conference on Software Engineering. IEEE, Los Alamitos, CA, 842-851.
Sonia Haiduc, Giuseppe De Rosa, Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, and Andrian Marcus. 2013. Query quality prediction and reformulation for source code search: The Refoqus tool. In Proceedings of the 2013 International Conference on Software Engineering. IEEE, Los Alamitos, CA, 1307-1310.
Rajarshi Haldar, Lingfei Wu, Jinjun Xiong, and Julia Hockenmaier. 2020. A multi-perspective architecture for semantic code search. arXiv:2005.06980 [cs] (2020).
Vincent J. Hellendoorn and Premkumar Devanbu. 2017. Are deep neural networks the best choice for modeling source code? In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 763-773.
S. Henninger. 1994. Using iterative refinement to find reusable software. IEEE Software 11, 5 (Sept. 1994), 48-59.
Geert Heyman and Tom Van Cutsem. 2020. Neural code search revisited: Enhancing code snippet retrieval through natural language intent. arXiv:2008.12193 (2020).
Emily Hill. 2010. Integrating Natural Language and Program Structure Information to Improve Software Search and Exploration. University of Delaware.
Emily Hill, Lori Pollock, and K. Vijay-Shanker. 2009. Automatically capturing source code context of NL-queries for software maintenance and reuse. In Proceedings of the 31st International Conference on Software Engineering. IEEE, Los Alamitos, CA, 232-242.
Emily Hill, Lori Pollock, and K. Vijay-Shanker. 2011. Improving source code search with natural language phrasal representations of method signatures. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering. IEEE, Los Alamitos, CA, 524-527.
Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet. 2014. NL-based query refinement and contextualized code search results: A user study. In Proceedings of 2014 Software Evolution Week: IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE'14). 34-43.
Reid Holmes. 2009. Do developers search for source code examples using multiple facts? In Proceedings of the 2009 ICSE Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. IEEE, Los Alamitos, CA, 13-16.
Reid Holmes and Gail C. Murphy. 2005. Using structural context to recommend source code examples. In Proceedings of the 27th International Conference on Software Engineering. ACM, New York, NY, 117-125.
Reid Holmes, Robert J. Walker, and Gail C. Murphy. 2005. Strathcona example recommendation tool. In Proceedings of the 10th European Software Engineering Conference Held Jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, New York, NY, 237-240.
R. Holmes, R. J. Walker, and G. C. Murphy. 2006. Approximate structural context matching: An approach to recommend relevant examples. IEEE Transactions on Software Engineering 32, 12 (2006), 952-970.
Adrian Holovaty and Jacob Kaplan-Moss. 2009. The Definitive Guide to DJANGO: Web Development Done Right. Apress.
James Howison, Megan Squire, and Kevin Crowston. 2008. FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering 1 (Sept. 2008), 17-26.
Sheng-Kuei Hsu and Shi-Jen Lin. 2011. A block-structured model for source code retrieval. In Intelligent Information and Database Systems, Ngoc Thanh Nguyen, Chong-Gun Kim, and Adam Janiak (Eds.). Springer, 161-170.
Sheng-Kuei Hsu and Shi-Jen Lin. 2011. A block-structured model for source code retrieval. In Intelligent Information and Database Systems. Springer, Berlin, Germany, 161-170.
Gang Hu, Min Peng, Yihan Zhang, Qianqian Xie, Wang Gao, and Mengting Yuan. 2020. Unsupervised software repositories mining and its application to code search. Software: Practice and Experience 50, 3 (2020), 299-322.
Gang Hu, Min Peng, Yihan Zhang, Qianqian Xie, and Mengting Yuan. 2020. Neural joint attention code search over structure embeddings for software Q&A sites. Journal of Systems and Software 170 (2020), 110773.
Q. Huang, A. Qiu, M. Zhong, and Y. Wang. 2020. A code-description representation learning model based on attention. In Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering (SANER'20). 447-455.
Qing Huang, Xudong Wang, Yangrui Yang, Hongyan Wan, Rui Wang, and Guoqing Wu. 2017. SnippetGen: Enhancing the code search via intent predicting. In Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering. 307-312.
Qing Huang and Guoqing Wu. 2019. Enhance code search via reformulating queries with evolving contexts. Automated Software Engineering 26, 4 (2019), 705-732.
Qing Huang and Huaiguang Wu. 2019. QE-integrating framework based on GitHub knowledge and SVM ranking. Science China Information Sciences 62, 5 (2019), 52102.
Qing Huang, Yang Yang, and Ming Cheng. 2019. Deep learning the semantics of change sequences for query expansion. Software: Practice and Experience 49, 11 (2019), 1600-1617.
Qing Huang, Yangrui Yang, Xue Zhan, Hongyan Wan, and Guoqing Wu. 2018. Query expansion based on statistical learning from code changes. Software: Practice and Experience 48, 7 (2018), 1333-1351.
Y. Huang, Q. Kong, N. Jia, X. Chen, and Z. Zheng. 2019. Recommending differentiated code to support smart contract update. In Proceedings of the 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC'19). 260-270.
O. Hummel, W. Janjic, and C. Atkinson. 2008. Code conjurer: Pulling reusable software out of thin air. IEEE Software 25, 5 (2008), 45-52.
Hamel Husain. 2018. Towards natural language semantic code search. GitHub Blog. Retrieved June 26, 2023 from https://github.blog/2018-09-18-towards-natural-language-semantic-code-search/.
Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv:1909.09436 (2019).
M. M. Islam and R. Iqbal. 2020. SoCeR: A new source code recommendation technique for code reuse. In Proceedings of the 2020 IEEE 44th Annual Computers, Software, and Applications Conference. 1552-1557.
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2073-2083.
Werner Janjic, Dietmar Stoll, Philipp Bostan, and Colin Atkinson. 2009. Lowering the barrier to reuse through test-driven search. In Proceedings of the 2009 ICSE Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. IEEE, Los Alamitos, CA, 21-24.
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4 (2002), 422-446.
H. Jiang, L. Nie, Z. Sun, Z. Ren, W. Kong, T. Zhang, and X. Luo. 2016. ROSF: Leveraging information retrieval and supervised learning for recommending code snippets. IEEE Transactions on Services Computing 12, 1 (2016), 34-46.
Lingxiao Jiang and Zhendong Su. 2009. Automatic mining of functionally equivalent code fragments via random testing. In Proceedings of the 18th International Symposium on Software Testing and Analysis. ACM, New York, NY, 81-92.
Renhe Jiang, Zhengzhao Chen, Zejun Zhang, Yu Pei, Minxue Pan, and Tian Zhang. 2018. Semantics-based code search using input/output examples. In Proceedings of the 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM'18). 92-102.
Huan Jin and Lei Xiong. 2019. A query expansion method based on evolving source code. Wuhan University Journal of Natural Sciences 24, 5 (2019), 391-399.
Toshihiro Kamiya. 2021. CCFinderX: An interactive code clone analysis environment. In Code Clone Analysis. Springer, 31-44.
Vineeth Kashyap, David Bingham Brown, Ben Liblit, David Melski, and Thomas Reps. 2017. Source Forager: A search engine for similar source code. arXiv:1706.02769 (2017).
Amandeep Kaur and Gaurav Dhiman. 2019. A review on search-based tools and techniques to identify bad code smells in object-oriented systems. In Harmony Search and Nature Inspired Optimization Algorithms: Theory and Applications. Advances in Intelligent Systems and Computing, Vol. 741. Springer, 909-921.
Iman Keivanloo, Juergen Rilling, and Philippe Charland. 2011. SeClone-A hybrid approach to Internet-scale real-time code clone search. In Proceedings of the 2011 IEEE 19th International Conference on Program Comprehension. IEEE, Los Alamitos, CA, 223-224.
Iman Keivanloo, Juergen Rilling, and Ying Zou. 2014. Spotting working code examples. In Proceedings of the 36th International Conference on Software Engineering. ACM, New York, NY, 664-675.
I. Keivanloo, L. Roostapour, P. Schugerl, and J. Rilling. 2010. SE-CodeSearch: A scalable semantic Web-based source code search infrastructure. In Proceedings of the 2010 IEEE International Conference on Software Maintenance. 1-5.
Marcus Kessel and Colin Atkinson. 2018. Integrating reuse into the rapid, continuous software engineering cycle through test-driven search. In Proceedings of the 2018 IEEE/ACM 4th International Workshop on Rapid Continuous Software Engineering (RCoSE'18). 8-11.
W. M. Khoo, A. Mycroft, and R. Anderson. 2013. Rendezvous: A search engine for binary code. In Proceedings of the 2013 10th Working Conference on Mining Software Repositories (MSR'13). 329-338.
Heejung Kim, Yungbum Jung, Sunghun Kim, and Kwankeun Yi. 2011. MeCC: Memory comparison-based clone detector. In Proceedings of the 2011 33rd International Conference on Software Engineering (ICSE'11). 301-310.
Jinhan Kim, Sanghoon Lee, Seung-Won Hwang, and Sunghun Kim. 2010. Towards an intelligent code search engine. In Proceedings of the 24th Conference on Artificial Intelligence.
Kisub Kim, Dongsun Kim, Tegawendé F. Bissyandé, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCOY-A code-to-code search engine. In Proceedings of the 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE'18). 946-957.
Krugle. 2022. Home Page. Retrieved June 26, 2023 from http://krugle.com.
Daniel E. Krutz and Emad Shihab. 2013. CCCD: Concolic code clone detection. In Proceedings of the 2013 20th Working Conference on Reverse Engineering (WCRE'13). 489-490.
Frederick Wilfrid Lancaster and Emily Gallup. 1973. Information Retrieval On-Line. Technical Report. Melville Publishing Company.
Otávio Augusto Lazzarini Lemos, Sushil Bajracharya, Joel Ossher, Paulo Cesar Masiero, and Cristina Lopes. 2009. Applying test-driven code search to the reuse of auxiliary functionality. In Proceedings of the 2009 ACM Symposium on Applied Computing. ACM, New York, NY, 476-482.
Otávio Augusto Lazzarini Lemos, Sushil Bajracharya, Joel Ossher, Paulo Cesar Masiero, and Cristina Lopes. 2011. A test-driven approach to code search and its application to the reuse of auxiliary functionality. Information and Software Technology 53, 4 (2011), 294-306.
Otávio Augusto Lazzarini Lemos, Sushil Krishna Bajracharya, and Joel Ossher. 2007. CodeGenie: A tool for test-driven source code search. In Companion to the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications Companion. ACM, New York, NY, 917-918.
Shin-Jie Lee, Xavier Lin, Wu-Chen Su, and Hsi-Min Chen. 2018. A comment-driven approach to API usage patterns discovery and search. Journal of Internet Technology 19, 5 (2018), 1587-1601.
Otávio Augusto Lazzarini Lemos, Sushil Krishna Bajracharya, Joel Ossher, Ricardo Santos Morla, Paulo Cesar Masiero, Pierre Baldi, and Cristina Videira Lopes. 2007. CodeGenie: Using test-cases to search and reuse source code. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering. ACM, New York, NY, 525-526.
Otávio A. L. Lemos, Adriano C. de Paula, Felipe C. Zanichelli, and Cristina V. Lopes. 2014. Thesaurus-based automatic query expansion for interface-driven code search. In Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, New York, NY, 212-221.
Otávio Augusto Lazzarini Lemos, Adriano Carvalho de Paula, Gustavo Konishi, Joel Ossher, Sushil Bajracharya, and Cristina Lopes. 2013. Using thesaurus-based tag clouds to improve test-driven code search. In Proceedings of the 2013 VII Brazilian Symposium on Software Components, Architectures, and Reuse. 99-108.
O. A. L. Lemos, A. C. de Paula, H. Sajnani, and C. V. Lopes. 2015. Can the use of types and query expansion help improve large-scale code search? In Proceedings of the IEEE 15th International Working Conference on Source Code Analysis and Manipulation. 41-50.
Hongyu Li, Seohyun Kim, and Satish Chandra. 2019. Neural code search evaluation dataset. arXiv:1908.09804 (2019).
R. Li, G. Hu, and M. Peng. 2020. Hierarchical embedding for code search in software Q&A sites. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN'20). 1-10.
Shengying Li. 2004. A Survey on Tools for Binary Code Analysis. Stony Brook University, 37-52.
Sihan Li, Xusheng Xiao, Blake Bassett, Tao Xie, and Nikolai Tillmann. 2016. Measuring code behavioral similarity for programming and software engineering education. In Proceedings of the 38th International Conference on Software Engineering Companion. ACM, New York, NY, 501-510.
W. Li, H. Qin, S. Yan, B. Shen, and Y. Chen. 2020. Learning code-query interaction for enhancing code searches. In Proceedings of the 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME'20). 115-126.
Wei Li, Shuhan Yan, Beijun Shen, and Yuting Chen. 2019. Reinforcement learning of code search sessions. In Proceedings of the 2019 26th Asia-Pacific Software Engineering Conference (APSEC'19). 458-465.
Xuan Li, Zerui Wang, Qianxiang Wang, Shoumeng Yan, Tao Xie, and Hong Mei. 2016. Relationship-aware code search for JavaScript frameworks. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, New York, NY, 690-701.
Yang Li, Suhang Wang, Quan Pan, Haiyun Peng, Tao Yang, and Erik Cambria. 2019. Learning binary codes with neural collaborative filtering for efficient recommendation systems. Knowledge-Based Systems 172 (2019), 64-75.
Zhixing Li, Tao Wang, Yang Zhang, Yun Zhan, and Gang Yin. 2016. Query reformulation by leveraging crowd wisdom for scenario-based software search. In Proceedings of the 8th Asia-Pacific Symposium on Internetware. ACM, New York, NY, 36-44.
Chunyang Ling, Zeqi Lin, Yanzhen Zou, and Bing Xie. 2020. Adaptive deep code search. In Proceedings of the 28th International Conference on Program Comprehension. 48-59.
Xiang Ling, Lingfei Wu, Saizhuo Wang, Gaoning Pan, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, and Shouling Ji. 2020. Deep graph matching and searching for semantic code retrieval. arXiv:2010.12908 (2020).
Linus Torvalds and Junio C. Hamano. 2022. Home Page. Retrieved April 1, 2022 from https://git-scm.com/.
Chao Liu, Cuiyun Gao, Xin Xia, David Lo, John Grundy, and Xiaohu Yang. 2020. On the replicability and reproducibility of deep learning in software engineering. arXiv preprint arXiv:2006.14244 (2020).
Chao Liu, Xin Xia, David Lo, Cuiyun Gao, Xiaohu Yang, and John Grundy. 2020. Opportunities and challenges in code search tools. arXiv:2011.02297 [cs] (Nov. 2020).
Chao Liu, Xin Xia, David Lo, Zhiwei Liu, Ahmed E. Hassan, and Shanping Li. 2020. Simplifying deep-learning-based model for code search. arXiv:2005.14373 (2020).
Jason Liu, Seohyun Kim, Vijayaraghavan Murali, Swarat Chaudhuri, and Satish Chandra. 2019. Neural query expansion for code search. In Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. ACM, New York, NY, 29-37.
Kui Liu, Anil Koyuncu, Kisub Kim, Dongsun Kim, and Tegawendé F. Bissyandé. 2018. LSRepair: Live search of fix ingredients for automated program repair. In Proceedings of the 2018 25th Asia-Pacific Software Engineering Conference (APSEC'18). IEEE, Los Alamitos, CA, 658-662.
Wenjian Liu, Xin Peng, Zhenchang Xing, Junyi Li, Bing Xie, and Wenyun Zhao. 2018. Supporting exploratory code search with differencing and visualization. In Proceedings of the 2018 IEEE 25th International Conference on Software Analysis, Evolution, and Reengineering. 300-310.
Angela Lozano, Andy Kellens, and Kim Mens. 2011. Mendel: Source code recommendation based on a genetic metaphor. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering. IEEE, Los Alamitos, CA, 384-387.
Jinting Lu, Ying Wei, Xiaobing Sun, Bin Li, Wanzhi Wen, and Cheng Zhou. 2018. Interactive query reformulation for source-code search with word relations. IEEE Access 6 (2018), 75660-75668.
Meili Lu, X. Sun, S. Wang, D. Lo, and Yucong Duan. 2015. Query expansion via WordNet for effective code search. In Proceedings of the 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER'15). 545-549.
Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, and Satish Chandra. 2019. Aroma: Code recommendation via structural code search. Proceedings of the ACM on Programming Languages 3 (2019), Article 152, 28 pages.
George F. Luger, P. Johnson, C. Sterm, Jean E. Newman, and Ronald Yeo. 1994. Cognitive Science: The Science of Intelligent Systems. Academic Press.
S. K. Lukins, N. A. Kraft, and L. H. Etzkorn. 2008. Source code retrieval for bug localization using latent Dirichlet allocation. In Proceedings of the 2008 15th Working Conference on Reverse Engineering. 155-164.
Fei Lv, Hongyu Zhang, Jian-Guang Lou, Shaowei Wang, Dongmei Zhang, and Jianjun Zhao. 2015. CodeHow: Effective code search based on API understanding and extended Boolean model. In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering. 260-270.
Y. Malheiros, A. Moraes, C. Trindade, and S. Meira. 2012. A source code recommender system to support newcomers. In Proceedings of the 2012 IEEE 36th Annual Computer Software and Applications Conference. 19-24.
David Mandelin, Lin Xu, Rastislav Bodík, and Doug Kimelman. 2005. Jungloid mining: Helping to navigate the API jungle. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, NY, 48-61.
Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Cambridge University Press.
L. W. Mar, Y. Wu, and H. C. Jiau. 2011. Recommending proper API code examples for documentation purpose. In Proceedings of the 2011 18th Asia-Pacific Software Engineering Conference. 331-338.
Gary Marchionini. 2006. Exploratory search: From finding to understanding. Communications of the ACM 49, 4 (2006), 41-46.
L. Martie and A. van der Hoek. 2013. Toward social-technical code search. In Proceedings of the 2013 6th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE'13). 101-104.
Lee Martie, André van der Hoek, and Thomas Kwak. 2017. Understanding the impact of support for iteration on code search. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, New York, NY, 774-785.
L. Martie, T. D. LaToza, and A. van der Hoek. 2015. CodeExchange: Supporting reformulation of Internet-scale code queries in context (T). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE'15). 24-35.
Lee Martie and André van der Hoek. 2015. Sameness: An experiment in code search. In Proceedings of the 12th Working Conference on Mining Software Repositories. IEEE, Los Alamitos, CA, 76-87.
Max Howell. 2022. Homebrew. Retrieved June 26, 2023 from https://brew.sh.
C. McMillan, M. Grechanik, D. Poshyvanyk, Chen Fu, and Qing Xie. 2012. Exemplar: A source code search engine for finding highly relevant applications. IEEE Transactions on Software Engineering 38, 5 (2012), 1069-1087.
Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Qing Xie, and Chen Fu. 2011. Portfolio: Finding relevant functions and their usage. In Proceedings of the 33rd International Conference on Software Engineering. ACM, New York, NY, 111-120.
C. McMillan, N. Hariri, D. Poshyvanyk, J. Cleland-Huang, and B. Mobasher. 2012. Recommending source code for use in rapid software prototypes. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE'12). 848-858.
Collin McMillan, Denys Poshyvanyk, and Mark Grechanik. 2010. Recommending source code examples via API call usages and documentation. In Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering. ACM, New York, NY, 21-25.
Collin McMillan, Denys Poshyvanyk, Mark Grechanik, Qing Xie, and Chen Fu. 2013. Portfolio: Searching for relevant functions and their usages in millions of lines of code. ACM Transactions on Software Engineering and Methodology 22, 4 (2013), Article 37, 30 pages.
Microsoft. 2022. Microsoft Bing. Retrieved June 26, 2023 from https://www.bing.com.
Alon Mishne, Sharon Shoham, and Eran Yahav. 2012. Typestate-based semantic code search over partial programs. In Proceedings of the ACM International Conference on Object-Oriented Programming Systems Languages and Applications. ACM, New York, NY, 997-1016.
A. Mockus. 2009. Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In Proceedings of the 2009 6th International Working Conference on Mining Software Repositories. 11-20.
Laura Moreno, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, and Andrian Marcus. 2015. How can I use this method? In Proceedings of the 37th International Conference on Software Engineering, Vol. 1. IEEE, Los Alamitos, CA, 880-890.
Ibrahim Jameel Mujhid, Joanna C. S. Santos, Raghuram Gopalakrishnan, and Mehdi Mirakhorli. 2017. A search engine for finding and reusing architecturally significant code. Journal of Systems and Software 130 (Aug. 2017), 81-93.
Rohan Mukherjee, Swarat Chaudhuri, and Chris Jermaine. 2020. Searching a database of source codes using contextualized code search. arXiv:2001.03277 (2020).
Naoya Murakami and Hidehiko Masuhara. 2012. Optimizing a search-based code recommendation system. In Proceedings of the 3rd International Workshop on Recommendation Systems for Software Engineering. IEEE, Los Alamitos, CA, 68-72.
Naoya Murakami, Hidehiko Masuhara, and Tomoyuki Aotani. 2014. Code recommendation based on a degree-of-interest model. In Proceedings of the 4th International Workshop on Recommendation Systems for Software Engineering. ACM, New York, NY, 28-29.
Takuma Murakami, Zhenjiang Hu, Shingo Nishioka, Akihiko Takano, and Masato Takeichi. 2004. An algebraic interface for GETA search engine. In Proceedings of the Program and Programming Language Workshop.
Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen, Lily Mast, Eli Rademacher, Tien N. Nguyen, and Danny Dig. 2016. API code recommendation using statistical learning from fine-grained changes. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, New York, NY, 511-522.
Anh Tuan Nguyen, Tung Thanh Nguyen, Jafar Al-Kofahi, Hung Viet Nguyen, and Tien N. Nguyen. 2011. A topic-based approach for narrowing the search space of buggy files from a bug report. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE'11). IEEE, Los Alamitos, CA, 263-272.
T. Nguyen, P. Vu, and T. Nguyen. 2019. Personalized code recommendation. In Proceedings of the 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME'19). 313-317.
T. Nguyen, P. Vu, and T. Nguyen. 2019. Recommending exception handling code. In Proceedings of the 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME'19). 390-393.
Tam The Nguyen, Phong Minh Vu, and Tung Thanh Nguyen. 2019. Code search on bytecode for mobile APP development. In Proceedings of the 2019 ACM Southeast Conference. ACM, New York, NY, 253-256.
Tam The Nguyen, Phong Minh Vu, and Tung Thanh Nguyen. 2019. Recommendation of exception handling code in mobile APP development. arXiv:1908.06567 (2019).
T. Van Nguyen, A. T. Nguyen, H. D. Phan, T. D. Nguyen, and T. N. Nguyen. 2017. Combining Word2Vec with revised vector space model for better code retrieval. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C'17). 183-185.
L. Nie, H. Jiang, Z. Ren, Z. Sun, and X. Li. 2016. Query expansion based on crowd knowledge for code search. IEEE Transactions on Services Computing 9 (Sept. 2016), 771-783.
H. Niu. 2015. Improving Code Search Using Learning-to-Rank and Query Reformulation Techniques. Master's thesis. Queen's University, Canada.
Haoran Niu, Iman Keivanloo, and Ying Zou. 2017. Learning to rank code examples for code search engines. Empirical Software Engineering 22 (2017), 1-33.
OpenHub. 2022. Synopsys/Black Duck Open Hub. Retrieved June 26, 2023 from https://www.openhub.net.
Oracle. 2022. Java Platform, Standard Edition 7, API Specification. Retrieved June 26, 2023 from https://docs.oracle.com/javase/7/docs/api/.
Ali Ouni, Raula Gaikovina Kula, Marouane Kessentini, Takashi Ishio, Daniel M. German, and Katsuro Inoue. 2017. Search-based software library recommendation using multi-objective optimization. Information and Software Technology 83, Suppl. C (2017), 55-75.
Yoann Padioleau, Julia Lawall, René Rydhof Hansen, and Gilles Muller. 2008. Documenting and automating collateral evolutions in Linux device drivers. ACM SIGOPS Operating Systems Review 42, 4 (2008), 247-260.
P. Pathak, M. Gordon, and Weiguo Fan. 2000. Effective information retrieval using genetic algorithms based matching functions adaptation. In Proceedings of the 33rd Annual Hawaii International Conference on System Sciences.
Raphael Pham, Yauheni Stoliar, and Kurt Schneider. 2015. Automatically recommending test code examples to inexperienced developers. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, New York, NY, 890-893.
Nina Phan, Peter Bailey, and Ross Wilkinson. 2007. Understanding the relationship of information need specificity to search query length. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 709-710.
Iasonas Polakis, Georgios Kontaxis, Spiros Antonatos, Eleni Gessiou, Thanasis Petsas, and Evangelos P. Markatos. 2010. Using social networks to harvest email addresses. In Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society. 11-20.
Varot Premtoon, James Koppel, and Armando Solar-Lezama. 2020. Semantic code search via equational reasoning. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, NY, 1066-1082.
Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, et al. 2021. Project CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks. arXiv preprint arXiv:2105.12655 (2021).
M. Raghothaman, Y. Wei, and Y. Hamadi. 2016. SWIM: Synthesizing what I mean-Code search and idiomatic snippet synthesis. In Proceedings of the 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE'16). 357-367.
C. Ragkhitwetsagul. 2016. Measuring code similarity in large-scaled code corpora. In Proceedings of the 2016 International Conference on Software Maintenance and Evolution (ICSME'16). 626-630.
Mohammad Masudur Rahman and Chanchal Roy. 2018. Effective reformulation of query for code search using crowdsourced knowledge and extra-large data analytics. In Proceedings of the 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME'18). 473-484.
Mohammad Masudur Rahman and Chanchal Roy. 2018. NLP2API: Query reformulation for code search using crowdsourced knowledge and extra-large data analytics. In Proceedings of the 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME'18). 714-714.
M. M. Rahman and C. K. Roy. 2014. On the use of context in recommending exception handling code examples. In Proceedings of the 2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation. 285-294.
Mohammad Masudur Rahman and Chanchal K. Roy. 2021. A systematic literature review of automated query reformulations in source code search. arXiv preprint arXiv:2108.09646 (2021).
Mohammad M. Rahman, Chanchal K. Roy, and David Lo. 2019. Automatic query reformulation for code search using crowdsourced knowledge. Empirical Software Engineering 24, 4 (2019), 1869-1924.
M. M. Rahman, C. K. Roy, and D. Lo. 2017. RACK: Code search in the IDE using crowdsourced knowledge. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C'17). 51-54.
Karthik Ram. 2013. Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine 8 (Feb. 2013), 7.
Steven P. Reiss. 2009. Semantics-based code search. In Proceedings of the 31st International Conference on Software Engineering. IEEE, Los Alamitos, CA, 243-253.
S. P. Reiss. 2009. Semantics-based code search demonstration proposal. In Proceedings of the 2009 IEEE International Conference on Software Maintenance. 385-386.
S. P. Reiss. 2009. Specifying what to search for. In Proceedings of the Tools and Evaluation 2009 ICSE Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. 41-44.
S. P. Reiss. 2013. Integrating S6 code search and Code Bubbles. In Proceedings of the 2013 3rd International Workshop on Developing Tools as Plug-Ins (TOPI'13). 25-30.
Steven P. Reiss, Yun Miao, and Qi Xin. 2018. Seeking the user interface. Automated Software Engineering 25, 1 (2018), 157-193.
Leiming Ren, Shinmin Shan, Kai Wang, and Kun Xue. 2020. CSDA: A novel attention-based LSTM approach for code search. Journal of Physics: Conference Series 1544 (2020), 012056.
Romain Robbes and Michele Lanza. 2010. Improving code completion with program history. Automated Software Engineering 17, 2 (2010), 181-212.
M. Roldan-Vega, G. Mallet, E. Hill, and J. A. Fails. 2013. CONQUER: A tool for NL-based query refinement and contextualizing code search results. In Proceedings of the 2013 IEEE International Conference on Software Maintenance. 512-515.
Chanchal Kumar Roy and James R. Cordy. 2007. A survey on software clone detection research. Queen's School of Computing TR 541, 115 (2007), 64-68.
Julia Rubin and Marsha Chechik. 2013. A survey of feature location techniques. In Domain Engineering. Springer, 29-58.
Saksham Sachdev, Hongyu Li, Sifei Luan, Seohyun Kim, Koushik Sen, and Satish Chandra. 2018. Retrieval on source code: A neural code search. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. ACM, New York, NY, 31-41.
Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How developers search for code: A case study. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, New York, NY, 191-201.
Tobias Sager, Abraham Bernstein, Martin Pinzger, and Christoph Kiefer. 2006. Detecting similar Java classes using tree algorithms. In Proceedings of the 2006 International Workshop on Mining Software Repositories. ACM, New York, NY, 65-71.
Naiyana Sahavechaphan and Kajal Claypool. 2006. XSnippet: Mining for sample code. ACM SIGPLAN Notices 41, 10 (2006), 413-430.
Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K. Roy, and Cristina V. Lopes. 2016. SourcererCC: Scaling code clone detection to big-code. In Proceedings of the 38th International Conference on Software Engineering. ACM, New York, NY, 1157-1168.
Gerard Salton, Anita Wong, and Chung-Shu Yang. 1975. A vector space model for automatic indexing. Communications of the ACM 18, 11 (1975), 613-620.
Huascar Sanchez. 2013. SNIPR: Complementing code search with code retargeting capabilities. In Proceedings of the 2013 International Conference on Software Engineering. IEEE, Los Alamitos, CA, 1423-1426.
Abdus Satter, M. G. Muntaqeem, Nadia Nahar, and Kazi Sakib. 2017. Retrieving self-executable and functionally correct code to improve source code search. In Proceedings of the 2017 24th Asia-Pacific Software Engineering Conference (APSEC'17). 749-750.
A. Satter and K. Sakib. 2016. A search log mining based query expansion technique to improve effectiveness in code search. In Proceedings of the 2016 19th International Conference on Computer and Information Technology (ICCIT'16). 586-591.
Abdus Satter and Kazi Sakib. A similarity-based method retrieval technique to improve effectiveness in code search. In Companion to the 1st International Conference on the Art, Science, and Engineering of Programming. ACM, New York, NY, 1-3.
Max Eric Henry Schumacher, Kim Tuyen Le, and Artur Andrzejak. 2020. Improving code recommendations by combining neural and classical machine learning approaches. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. ACM, New York, NY, 476-482.
Searchcode. 2022. Home Page. Retrieved June 26, 2023 from https://searchcode.com.
Shailesh Kumar Shivakumar. 2021. A survey and taxonomy of intent-based code search. International Journal of Software Innovation 9, 1 (2021), 69-110.
Jianhang Shuai, Ling Xu, Chao Liu, Meng Yan, Xin Xia, and Yan Lei. 2020. Improving code search with co-attentive representation learning. In Proceedings of the 28th International Conference on Program Comprehension. ACM, New York, NY, 196-207.
Susan Elliott Sim, Megha Agarwala, and Medha Umarji. 2013. A controlled experiment on the process used by developers during Internet-scale code search. In Finding Source Code on the Web for Remix and Reuse. Springer, New York, NY, 53-77.
S. E. Sim, C. L. A. Clarke, and R. C. Holt. 1998. Archetypal source code searches: A survey of software developers and maintainers. In Proceedings of the 1998 6th International Workshop on Program Comprehension. 180-187.
Susan Elliott Sim, Medha Umarji, Sukanya Ratanotayanon, and Cristina V. Lopes. 2011. How well do search engines support code retrieval on the Web? ACM Transactions on Software Engineering and Methodology 21, 1 (2011), Article 4, 25 pages.
Herbert A. Simon and Allen Newell. 1971. Human problem solving: The state of the theory in 1970. American Psychologist 26, 2 (1971), 145-159.
Renuka Sindhgatta. 2006. Using an information retrieval system to retrieve source code samples. In Proceedings of the 28th International Conference on Software Engineering. ACM, New York, NY, 905-908.
Janice Singer, Timothy Lethbridge, Norman Vinson, and Nicolas Anquetil. 1997. An examination of software engineering work practices. In Proceedings of the 1997 Conference of the Centre for Advanced Studies on Collaborative Research. 21.
Raphael Sirres, Tegawendé F. Bissyandé, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, and Yves Le Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering 23, 5 (2018), 2622-2654.
Bunyamin Sisman and Avinash C. Kak. 2013. Assisting code search with automatic query reformulation for bug localization. In Proceedings of the 2013 10th Working Conference on Mining Software Repositories. 309-318.
Aishwarya Sivaraman, Tianyi Zhang, Guy Van den Broeck, and Miryung Kim. 2019. Active inductive logic programming for code search. In Proceedings of the 41st International Conference on Software Engineering. IEEE, Los Alamitos, CA, 292-303.
SourceForge. 2022. Home Page. Retrieved June 26, 2023 from https://sourceforge.net.
Stack Overflow. 2022. Home Page. Retrieved June 26, 2023 from http://stackoverflow.com.
Jamie Starke, Chris Luce, and Jonathan Sillito. 2009. Working with search results. In Proceedings of the 2009 ICSE Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. IEEE, Los Alamitos, CA, 53-56.
Kathryn Stolee and Sebastian Elbaum. 2012. Solving the Search for Suitable Code: An Initial Implementation. CSE Technical Reports. University of Nebraska-Lincoln.
Kathryn T. Stolee. 2012. Finding suitable programs: Semantic search with incomplete and lightweight specifications. In Proceedings of the 34th International Conference on Software Engineering. IEEE, Los Alamitos, CA, 1571-1574.
Kathryn T. Stolee and Sebastian Elbaum. 2012. Toward semantic search via SMT solver. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. ACM, New York, NY, Article 25, 4 pages.
Kathryn T. Stolee, Sebastian Elbaum, and Daniel Dobos. 2014. Solving the search for source code. ACM Transactions on Software Engineering and Methodology 23, 3 (June 2014), Article 26, 45 pages.
Kathryn T. Stolee, Sebastian Elbaum, and Matthew B. Dwyer. 2016. Code search with input/output queries. Journal of Systems and Software 116 (June 2016), 35-48.
J. Stylos and B. A. Myers. 2006. Mica: A web-search tool for finding API components and examples. In Proceedings of Visual Languages and Human-Centric Computing (VL/HCC '06). 195-202.
Fang-Hsiang Su, Jonathan Bell, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser, and Tony Jebara. 2016. Code relatives: Detecting similarly behaving software. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, New York, NY, 702-714.
Siddharth Subramanian, Laura Inozemtseva, and Reid Holmes. 2014. Live API documentation. In Proceedings of the 36th International Conference on Software Engineering. ACM, New York, NY, 643-652.
Zhensu Sun, Yan Liu, Chen Yang, and Yu Qian. 2020. PSCS: A path-based neural model for semantic code search. arXiv:2008.03042 (2020).
S. Surisetty. 2014. Behavior-based code search. In Proceedings of the 2014 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'14). 197-198.
J. Svajlenko, J. F. Islam, I. Keivanloo, C. K. Roy, and M. M. Mia. 2014. Towards a big data curated benchmark of interproject code clones. In Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution. 476-480.
J. Svajlenko and C. K. Roy. 2015. Evaluating clone detection tools with BigCloneBench. In Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME'15). 131-140.
Peter P. Swire. 2005. A theory of disclosure for security and competitive reasons: Open source, proprietary software, and government systems. Houston Law Review 42 (2005), 1333.
Watanabe Takuya and Hidehiko Masuhara. 2011. A spontaneous code recommendation tool based on associative search. In Proceedings of the 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation. ACM, New York, NY, 17-20.
The Apache Software Foundation. 2022. asf-Revision 1910613. Retrieved June 26, 2023 from http://svn.apache.org/repos/asf/httpd/httpd/.
Suresh Thummalapenta and Tao Xie. 2007. Parseweb: A programmer assistant for reusing open source code on the web. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering. ACM, New York, NY, 204-213.
Sander Tichelaar. 1999. FAMIX Java Language Plug-in 1.0. Technical Report. University of Bern.
Gabriel Valiente. 2002. Algorithms on Trees and Graphs. Springer Science & Business Media.
C. Van Rijsbergen. 1979. Information retrieval: Theory and practice. In Proceedings of the Joint IBM/University of Newcastle upon Tyne Seminar on Data Base Systems, Vol. 79.
Venkatesh Vinayakarao. 2015. Spotting familiar code snippet structures for program comprehension. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, New York, NY, 1054-1056.
Venkatesh Vinayakarao, Anita Sarma, Rahul Purandare, Shuktika Jain, and Saumya Jain. 2017. ANNE: Improving source code search using entity retrieval approach. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining. ACM, New York, NY, 211-220.
Yao Wan, Jingdong Shu, Yulei Sui, Guandong Xu, Zhou Zhao, Jian Wu, and Philip Yu. 2019. Multi-modal attention network learning for semantic source code retrieval. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE'19). 13-25.
Shaowei Wang, David Lo, and Lingxiao Jiang. 2016. AutoQuery: Automatic construction of dependency queries for code search. Automated Software Engineering 23, 3 (2016), 393-425.
S. Wang, D. Lo, and L. Jiang. 2011. Code search via topic-enriched dependence graph matching. In Proceedings of the 2011 18th Working Conference on Reverse Engineering. 119-123.
Shaowei Wang, David Lo, and Lingxiao Jiang. 2014. Active code search: Incorporating user feedback to improve code search relevance. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering. ACM, New York, NY.
Wenhua Wang, Yuqun Zhang, Zhengran Zeng, and Guandong Xu. 2020. TranS^3: A transformer-based framework for unifying code summarization and code search. arXiv:2003.03238 (2020).
Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, and Jeffrey Xu Yu. 2010. Matching dependence-related queries in the system dependence graph. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE'10). ACM, New York, NY, 457.
Yuepeng Wang, Yu Feng, Ruben Martins, Arati Kaushik, Isil Dillig, and Steven P. Reiss. 2016. Hunter: Next-generation code reuse for Java. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, New York, NY, 1028-1032.
GitHub. 2022. Home Page. Retrieved June 26, 2023 from https://www.github.com.
Apache. 2022. Apache Lucene. Retrieved June 26, 2023 from https://lucene.apache.org.
Markus Weimer, Alexandros Karatzoglou, and Marcel Bruch. 2009. Maximum margin matrix factorization for code recommendation. In Proceedings of the 3rd ACM Conference on Recommender Systems. ACM, New York, NY, 309-312.
Doug Wightman, Zi Ye, Joel Brandt, and Roel Vertegaal. 2012. SnipMatch: Using source code context to enhance snippet retrieval and parameterization. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology. ACM, New York, NY, 219-228.
Claes Wohlin. 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. 1-10.
Huaiguang Wu and Yang Yang. 2019. Code search based on alteration intent. IEEE Access 7 (2019), 56796-56802.
Xin Xia, Lingfeng Bao, David Lo, Pavneet Singh Kochhar, Ahmed E. Hassan, and Zhenchang Xing. 2017. What do developers search for on the web? Empirical Software Engineering 22, 6 (Dec. 2017), 3149-3185.
Tao Xie and Jian Pei. 2006. MAPO: Mining API usages from open source repositories. In Proceedings of the 2006 International Workshop on Mining Software Repositories. ACM, New York, NY, 54-57.
Y. Xie, T. Lin, and H. Xu. 2019. User interface code retrieval: A novel visual-representation-aware approach. IEEE Access 7 (2019), 162756-162767.
Jinxi Xu and W. Bruce Croft. 1996. Query expansion using local and global document analysis. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 4-11.
Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. 2017. Neural network-based graph embedding for cross-platform binary code similarity detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, New York, NY, 363-376.
Yinxing Xue, Zhengzi Xu, Mahinthan Chandramohan, and Yang Liu. 2019. Accurate and scalable cross-architecture cross-OS binary code search with emulation. IEEE Transactions on Software Engineering 45, 11 (2019), 1125-1149.
Shuhan Yan, Hang Yu, Yuting Chen, Beijun Shen, and Lingxiao Jiang. 2020. Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries. In Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering. 344-354.
Jinqiu Yang and Lin Tan. 2012. Inferring semantically related words from software context. In Proceedings of the 2012 9th IEEE Working Conference on Mining Software Repositories. 161-170.
Jinqiu Yang and Lin Tan. 2013. SWordNet: Inferring semantically related words from software context. Empirical Software Engineering 19, 6 (2013), 1856-1886.
Yangrui Yang and Qing Huang. 2017. IECS: Intent-enforced code search via extended Boolean model. Journal of Intelligent & Fuzzy Systems 33 (Jan. 2017), 2565-2576.
Ziyu Yao, Jayavardhan Reddy Peddamail, and Huan Sun. 2019. CoaCor: Code annotation for code retrieval with reinforcement learning. In Proceedings of the 2019 World Wide Web Conference. ACM, New York, NY, 2203-2214.
Ziyu Yao, Daniel S. Weld, Wei-Peng Chen, and Huan Sun. 2018. StaQC: A systematically mined question-code dataset from Stack Overflow. In Proceedings of the 2018 World Wide Web Conference. 1693-1703.
Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, and Shikun Zhang. 2020. Leveraging code generation to improve code retrieval and summarization via dual learning. In Proceedings of The Web Conference 2020. ACM, New York, NY, 2309-2319.
Xin Ye, Hui Shen, Xiao Ma, Razvan Bunescu, and Chang Liu. 2016. From word embeddings to document similarities for improved information retrieval in software engineering. In Proceedings of the 38th International Conference on Software Engineering. ACM, New York, NY, 404-415.
Hang Yin, Zhiyu Sun, Yanchun Sun, and Wenpin Jiao. 2019. A question-driven source code recommendation service based on Stack Overflow. In Proceedings of the 2019 IEEE World Congress on Services (SERVICES'19). 358-359.
Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, and Graham Neubig. 2018. Learning to mine aligned code and natural language pairs from Stack Overflow. In Proceedings of the 2018 IEEE/ACM 15th International Conference on Mining Software Repositories. IEEE, Los Alamitos, CA, 476-486.
Alexey Zagalsky, Ohad Barzilay, and Amiram Yehudai. 2012. Example Overflow: Using social media for code recommendation. In Proceedings of the 2012 3rd International Workshop on Recommendation Systems for Software Engineering (RSSE'12). 38-42.
Amy Moormann Zaremski and Jeannette M. Wing. 1995. Signature matching: A tool for using software libraries. ACM Transactions on Software Engineering and Methodology 4, 2 (1995), 146-170.
Feng Zhang, Haoran Niu, Iman Keivanloo, and Ying Zou. 2018. Expanding queries for code search using semantically related API class-names. IEEE Transactions on Software Engineering 44, 11 (2018), 1070-1082.
Jie Zhao and Huan Sun. 2020. Adversarial training for code retrieval with question-description relevance regularization. arXiv:2010.09803 (2020).
Shufan Zhou, Beijun Shen, and Hao Zhong. 2019. Lancer: Your code tell me what you need. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE'19). 1202-1205.
S. Zhou, H. Zhong, and B. Shen. 2018. SLAMPA: Recommending code snippets with statistical language model. In Proceedings of the 2018 25th Asia-Pacific Software Engineering Conference (APSEC'18). 79-88.
Qun Zou and Changquan Zhang. 2020. Query expansion via learning change sequences. International Journal of Knowledge-Based and Intelligent Engineering Systems 24, 2 (2020), 95-105.
Yanzhen Zou, Chunyang Ling, Zeqi Lin, and Bing Xie. 2018. Graph embedding based code search in software project. In Proceedings of the 10th Asia-Pacific Symposium on Internetware. ACM, New York, NY, 1-10.