Canada Research Chairs Program Fonds de recherche du Québec Natural Sciences and Engineering Research Council of Canada Canadian Institute for Advanced Research
Asaduzzaman, M., Mashiyat, A.S., Roy, C.K., Schneider, K.A., Answering questions about unanswered questions of stack overflow. 2013 10th Working Conference on Mining Software Repositories, MSR, 2013, IEEE, 97–100.
Association, I.S., et al. Standard glossary of software engineering terminology. IEEE Std, 1990, 610–612.
Bakker, M., Chadwick, M., Sheahan, H., Tessler, M., Campbell-Gillingham, L., Balaguer, J., McAleese, N., Glaese, A., Aslanides, J., Botvinick, M., et al. Fine-tuning language models to find agreement among humans with diverse preferences. Adv. Neural Inf. Process. Syst. 35 (2022), 38176–38189.
Barua, A., Thomas, S.W., Hassan, A.E., What are developers talking about? An analysis of topics and trends in stack overflow. Empir. Softw. Eng. 19 (2014), 619–654.
Blanco, G., Pérez-López, R., Fdez-Riverola, F., Lourenço, A.M.G., Understanding the social evolution of the java community in stack overflow: A 10-year study of developer interactions. Future Gener. Comput. Syst. 105 (2020), 446–454.
Burtch, G., Lee, D., Chen, Z., The consequences of generative ai for ugc and online community engagement. 2023 Available at SSRN 4521754.
Calefato, F., Lanubile, F., Marasciulo, M.C., Novielli, N., Mining successful answers in stack overflow. 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, 2015, IEEE, 430–433.
Carlini, N., Ippolito, D., Jagielski, M., Lee, K., Tramer, F., Zhang, C., Quantifying memorization across neural language models. 2022 arXiv preprint arXiv:2202.07646.
Dakhel, A.M., Majdinasab, V., Nikanjam, A., Khomh, F., Desmarais, M.C., Jiang, Z.M.J., Github copilot ai pair programmer: Asset or liability?. J. Syst. Softw., 203, 2023, 111734.
Decan, A., Mens, T., Constantinou, E., 2018. On the impact of security vulnerabilities in the npm package dependency network. In: Proceedings of the 15th International Conference on Mining Software Repositories. pp. 181–191.
del Rio-Chanona, M., Laurentsyeva, N., Wachs, J., Are large language models a threat to digital public goods? Evidence from activity on stack overflow. 2023 arXiv preprint arXiv:2307.07367.
Delile, Z., Radel, S., Godinez, J., Engstrom, G., Brucker, T., Young, K., Ghanavati, S., Evaluating privacy questions from stack overflow: Can chatgpt compete?. 2023 arXiv preprint arXiv:2306.11174.
Dias, K., Borba, P., Barreto, M., Understanding predictive factors for merge conflicts. Inf. Softw. Technol., 121, 2020, 106256.
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., et al. Codebert: A pre-trained model for programming and natural languages. 2020 arXiv preprint arXiv:2002.08155.
Galappaththi, A., Nadi, S., Treude, C., 2022. Does this apply to me? an empirical study of technical context in stack overflow. In: Proceedings of the 19th International Conference on Mining Software Repositories. pp. 23–34.
GitHub, Github copilot: Your ai pair programmer. 2024 URL https://github.com/features/copilot/.
Goodrich, B., Rao, V., Liu, P.J., Saleh, M., 2019. Assessing the factual accuracy of generated text. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 166–175.
Hämäläinen, P., Tavast, M., Kunnari, A., 2023. Evaluating large language models in generating synthetic hci research data: a case study. In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. pp. 1–19.
Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., Wang, H., Large language models for software engineering: A systematic literature review. 2023 arXiv preprint arXiv:2308.10620.
Johnson, J., Lubo, S., Yedla, N., Aponte, J., Sharif, B., An empirical study assessing source code readability in comprehension. 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME, 2019, IEEE, 513–523.
Kabir, S., Udo-Imeh, D.N., Kou, B., Zhang, T., Who answers it better? An in-depth analysis of chatgpt and stack overflow answers to software engineering questions. 2023 arXiv preprint arXiv:2308.02312.
Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R., McHardy, R., Challenges and applications of large language models. 2023 arXiv preprint arXiv:2307.10169.
Kashefi, A., Mukerji, T., Chatgpt for programming numerical methods. J. Mach. Learn. Model. Comput., 4(2), 2023.
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., et al. Chatgpt for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ., 103, 2023, 102274.
Krippendorff, K., Computing Krippendorff's alpha-reliability. 2011 URL https://repository.upenn.edu/handle/20.500.14332/2089.
Lahitani, A.R., Permanasari, A.E., Setiawan, N.A., Cosine similarity to determine similarity measure: Study case in online essay assessment. 2016 4th International Conference on Cyber and IT Service Management, 2016, IEEE, 1–6.
Lee, P., Bubeck, S., Petro, J., Benefits, limits, and risks of gpt-4 as an ai chatbot for medicine. N. Engl. J. Med. 388:13 (2023), 1233–1239.
Li, R., Allal, L.B., Zi, Y., Muennighoff, N., Kocetkov, D., Mou, C., Marone, M., Akiki, C., Li, J., Chim, J., et al. Starcoder: may the source be with you!. 2023 arXiv preprint arXiv:2305.06161.
Li, J., Li, D., Savarese, S., Hoi, S., Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. 2023 arXiv preprint arXiv:2301.12597.
Liang, J.T., Badea, C., Bird, C., DeLine, R., Ford, D., Forsgren, N., Zimmermann, T., 2024. Can gpt-4 replicate empirical software engineering research?. In: Proceedings of the ACM on Software Engineering 1. FSE, pp. 1330–1353.
Liu, J., Tang, X., Li, L., Chen, P., Liu, Y., Which is a better programming assistant? A comparative study between chatgpt and stack overflow. 2023 arXiv preprint arXiv:2308.13851.
Lyu, M.R., Software reliability engineering: A roadmap. Future of Software Engineering, FOSE’07, 2007, IEEE, 153–170.
Mann, H.B., Whitney, D.R., On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18:1 (1947), 50–60, 10.1214/aoms/1177730491.
Marsicano, G., Pereira, D.V., da Silva, F.Q., França, C., Team maturity in software engineering teams. 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM, 2017, IEEE, 235–240.
Nasehi, S.M., Sillito, J., Maurer, F., Burns, C., What makes a good code example?: A study of programming q & a in stackoverflow. 2012 28th IEEE International Conference on Software Maintenance, ICSM, 2012, IEEE, 25–34.
Oishwee, S.J., Stakhanova, N., Codabux, Z., 2024. Large language model vs. stack overflow in addressing android permission related challenges. In: Proceedings of the 21st International Conference on Mining Software Repositories. pp. 373–383.
Oliveira, D., Bruno, R., Madeiral, F., Castor, F., Evaluating code readability and legibility: An examination of human-centric studies. 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME, 2020, IEEE, 348–359.
Orosz, G., Stack overflow is dead, almost, the pragmatic engineer blog. 2025 URL https://blog.pragmaticengineer.com/stack-overflow-is-almost-dead.
Ozkaya, I., Application of large language models to software engineering tasks: Opportunities, risks, and implications. IEEE Softw. 40:3 (2023), 4–8.
Pinto, G., Cardoso-Pereira, I., Monteiro, D., Lucena, D., Souza, A., Gama, K., 2023. Large language models for education: Grading open-ended questions using chatgpt. In: Proceedings of the XXXVII Brazilian Symposium on Software Engineering. pp. 293–302.
Ragkhitwetsagul, C., Krinke, J., Paixao, M., Bianco, G., Oliveto, R., Toxic code snippets on stack overflow. IEEE Trans. Softw. Eng. 47:3 (2019), 560–581.
Ray, P.P., Chatgpt: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet Things Cyber-Phys. Syst., 2023.
Robillard, M.P., DeLine, R., A field study of api learning obstacles. Empir. Softw. Eng. 16 (2011), 703–732.
Rubei, R., Di Sipio, C., Nguyen, P.T., Di Rocco, J., Di Ruscio, D., Postfinder: Mining stack overflow posts to support software developers. Inf. Softw. Technol., 127, 2020, 106367.
Sadasivan, V.S., Kumar, A., Balasubramanian, S., Wang, W., Feizi, S., Can ai-generated text be reliably detected?. 2023 arXiv preprint arXiv:2303.11156.
Salton, G., Buckley, C., Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24:5 (1988), 513–523.
Shapiro, S.S., Wilk, M.B., An analysis of variance test for normality (complete samples). Biometrika 52:3/4 (1965), 591–611.
Squire, M., Should we move to stack overflow? Measuring the utility of social media for developer support. 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2, 2015, IEEE, 219–228.
StackOverflow, Temporary policy: Generative ai (e.g. chatgpt) is banned. 2023 URL https://meta.stackoverflow.com/questions/421831/temporary-policy-generative-ai-e-g-chatgpt-is-banned.
Strobelt, H., Webson, A., Sanh, V., Hoover, B., Beyer, J., Pfister, H., Rush, A.M., Interactive and visual prompt engineering for ad-hoc task adaptation with large language models. IEEE Trans. Vis. Comput. Graphics 29:1 (2022), 1146–1156.
Surameery, N.M.S., Shakor, M.Y., Use chat gpt to solve programming bugs. Int. J. Inf. Technol. Comput. Eng. (IJITC) 2455-5290, 3(01), 2023, 17–22.
Syam, G., Lal, S., Chen, T., Empirical study of the evolution of python questions on stack overflow. e-Inform. Softw. Eng. J., 17(1), 2023.
Tamburri, D.A., Kruchten, P., Lago, P., van Vliet, H., What is social debt in software engineering?. 2013 6th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE, 2013, IEEE, 93–96.
Tang, R., Chuang, Y.-N., Hu, X., The science of detecting llm-generated texts. 2023 arXiv preprint arXiv:2303.07205.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. 2023 arXiv preprint arXiv:2307.09288.
Uddin, G., Baysal, O., Guerrouj, L., Khomh, F., Understanding how and why developers seek and analyze api-related opinions. IEEE Trans. Softw. Eng. 47:4 (2019), 694–735.
Verdi, M., Sami, A., Akhondali, J., Khomh, F., Uddin, G., Motlagh, A.K., An empirical study of c++ vulnerabilities in crowd-sourced code examples. IEEE Trans. Softw. Eng. 48:5 (2020), 1497–1514.
Wagner, S., Barón, M.M., Falessi, D., Baltes, S., Towards evaluation guidelines for empirical studies involving llms. 2024 arXiv preprint arXiv:2411.07668.
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C., A prompt pattern catalog to enhance prompt engineering with chatgpt. 2023 arXiv preprint arXiv:2302.11382.
Widjojo, P., Treude, C., Addressing compiler errors: Stack overflow or large language models?. 2023 arXiv preprint arXiv:2307.10793.
Xia, X., Bao, L., Lo, D., Kochhar, P.S., Hassan, A.E., Xing, Z., What do developers search for on the web?. Empir. Softw. Eng. 22 (2017), 3149–3185.
Xu, F.F., Alon, U., Neubig, G., Hellendoorn, V.J., 2022. A systematic evaluation of large language models of code. In: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming. pp. 1–10.
Xue, J., Wang, L., Zheng, J., Li, Y., Tan, Y., Can chatgpt kill user-generated q & a platforms?. 2023 Available at SSRN 4448938.
Yazdaninia, M., Lo, D., Sami, A., Characterization and prediction of questions without accepted answers on stack overflow. 2021 IEEE/ACM 29th International Conference on Program Comprehension, ICPC, 2021, IEEE, 59–70.
Yli-Huumo, J., Maglyas, A., Smolander, K., How do software development teams manage technical debt?–An empirical study. J. Syst. Softw. 120 (2016), 195–218.
Zhang, T., Upadhyaya, G., Reinhardt, A., Rajan, H., Kim, M., 2018. Are code examples on an online q & a forum reliable? A study of api misuse on stack overflow. In: Proceedings of the 40th International Conference on Software Engineering. pp. 886–896.
Zheng, Z., Ning, K., Chen, J., Wang, Y., Chen, W., Guo, L., Wang, W., Towards an understanding of large language models in software engineering tasks. 2023 arXiv preprint arXiv:2308.11396.