Article (Scientific journals)
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models
TAMBON, Florian; Nikanjam, Amin; Zid, Cyrine et al.
2025, in ACM Transactions on Software Engineering and Methodology

Files


Full Text: 2407.21227v3.pdf (author preprint, 1.48 MB)

Details



Keywords :
Computer Science - Software Engineering; Computer Science - Artificial Intelligence
Abstract :
[en] Large Language Models (LLMs) excel in code-related tasks such as code generation, but benchmark evaluations often overlook task characteristics such as difficulty. Moreover, benchmarks are usually built from tasks described with a single prompt, even though prompt formulation has a profound impact on the outcome. This paper introduces TaskEval, a generalist framework that uses diverse prompts and Item Response Theory (IRT) to efficiently assess LLMs' capabilities and benchmark task characteristics, improving the understanding of their performance. Using two code generation benchmarks, HumanEval+ and ClassEval, as well as 8 code generation LLMs, we show that TaskEval is capable of characterising the properties of tasks. Using topic analysis, we identify and analyse 17 and 21 topics of tasks within the two benchmarks, respectively. We also cross-analyse tasks' characteristics against the programming constructs (e.g., variable assignment, conditions) used by LLMs, highlighting patterns related to task difficulty. Finally, we compare the difficulty assessment of tasks by human annotators and by LLMs. Orthogonal to current benchmarking evaluation efforts, TaskEval can assist researchers and practitioners in building better assessments of LLMs. The tasks' characteristics can be used to identify shortcomings within existing benchmarks or to improve the evaluation of LLMs.
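Note: the record above does not include implementation details. Purely as an illustration of the Item Response Theory step mentioned in the abstract, the sketch below fits a standard two-parameter logistic (2PL) model to a toy LLM-vs-task pass/fail matrix and reads off per-task difficulty estimates. The 2PL form, the toy data, and all names (responses, neg_log_likelihood) are assumptions for illustration only, not the authors' code or parameterisation.

# Minimal 2PL IRT sketch (hypothetical, toy data): responses[j, i] = 1 if LLM j solved task i.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
responses = (rng.random((8, 20)) > 0.4).astype(float)  # 8 models x 20 tasks, made-up outcomes
n_models, n_tasks = responses.shape

def neg_log_likelihood(params):
    # Unpack per-model abilities (theta), per-task discriminations (a) and difficulties (b).
    theta = params[:n_models]
    a = params[n_models:n_models + n_tasks]
    b = params[n_models + n_tasks:]
    logits = a[None, :] * (theta[:, None] - b[None, :])
    p = 1.0 / (1.0 + np.exp(-logits))
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

x0 = np.concatenate([np.zeros(n_models), np.ones(n_tasks), np.zeros(n_tasks)])
fit = minimize(neg_log_likelihood, x0, method="L-BFGS-B")
difficulty = fit.x[n_models + n_tasks:]  # higher b means a harder task under this toy model
print(np.round(difficulty, 2))

In this toy setup, tasks that few (simulated) models solve receive higher difficulty estimates, which is the kind of per-task characterisation the abstract describes.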
Disciplines :
Computer science
Author, co-author :
TAMBON, Florian ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Nikanjam, Amin ;  Huawei Distributed Scheduling and Data Engine Lab, Canada
Zid, Cyrine ;  Polytechnique Montreal, Canada
Khomh, Foutse ;  Polytechnique Montreal, Canada
Antoniol, Giuliano ;  Polytechnique Montreal, Canada
External co-authors :
yes
Language :
English
Title :
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models
Publication date :
2025
Journal title :
ACM Transactions on Software Engineering and Methodology
ISSN :
1049-331X
Publisher :
Association for Computing Machinery (ACM)
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBilu :
since 08 January 2026
