OptLLM: Optimal Assignment of Queries to Large Language Models

Liu, Yueyue; Zhang, Hongyu; Miao, Yuantian; LE, Van Hoang; Li, Zhiqiang

doi:10.1109/ICWS62655.2024.00098

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2024 • In Chang, Rong N. (Ed.) Proceedings - 2024 IEEE International Conference on Web Services, ICWS 2024

Peer reviewed

Permalink
https://hdl.handle.net/10993/67453

Details

Keywords :

Cost-performance Tradeoff; Large Language Models; Multi-objective Optimization; Performance Prediction; Query Assignment; Cost performance; Language model; Large language model; Multi-objectives optimization; Optimal assignment; Performance; Performance tradeoff; Artificial Intelligence; Computer Networks and Communications; Computer Science Applications; Information Systems; Information Systems and Management

Abstract :

Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressing the cost-effective query allocation problem for LLMs. Given a set of input queries and candidate LLMs, our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences, including options for maximizing accuracy and minimizing cost. OptLLM predicts the performance of candidate LLMs on each query using a multi-label classification model with uncertainty estimation and then iteratively generates a set of non-dominated solutions by destroying and reconstructing the current solution. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing. Our experimental results demonstrate that OptLLM reduces costs by 2.40% to 49.18% while achieving the same accuracy as the best LLM. Compared to other multi-objective optimization algorithms, OptLLM improves accuracy by 2.94% to 69.05% at the same cost or saves costs by 8.79% to 95.87% while maintaining the highest attainable accuracy.
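
The notion of a non-dominated (Pareto-optimal) query assignment mentioned in the abstract can be illustrated with a small sketch. This is not the OptLLM algorithm: it enumerates every assignment exhaustively instead of using the paper's destruction and reconstruction search, and it takes the per-query success probabilities as given rather than predicting them with a classifier. The names `prob`, `cost`, `evaluate`, `dominates`, and `pareto_front`, and all numbers, are hypothetical.

```python
from itertools import product


def evaluate(assignment, prob, cost):
    """Return (expected_accuracy, total_cost) for one assignment.

    assignment[i] is the index of the model chosen for query i.
    prob[i][m] is an assumed probability that model m answers query i correctly.
    cost[i][m] is an assumed price of sending query i to model m.
    """
    n = len(assignment)
    accuracy = sum(prob[i][m] for i, m in enumerate(assignment)) / n
    total_cost = sum(cost[i][m] for i, m in enumerate(assignment))
    return accuracy, total_cost


def dominates(a, b):
    """True if a = (accuracy, cost) is no worse than b on both objectives
    and strictly better on at least one (higher accuracy, lower cost)."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical toy instance: 2 queries, 2 candidate models (0 = cheap, 1 = expensive).
prob = [[0.5, 1.0], [1.0, 1.0]]
cost = [[1.0, 4.0], [1.0, 4.0]]

# Exhaustive enumeration is only feasible for tiny instances like this one.
candidates = {a: evaluate(a, prob, cost) for a in product(range(2), repeat=2)}
front = pareto_front(list(candidates.values()))
print(sorted(front))
```

In this toy instance two assignments survive: sending both queries to the cheap model, and sending only the first query to the expensive model. A user would then pick from such a front according to budget or accuracy preference.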

Disciplines :

Computer science

Author, co-author :

Liu, Yueyue; The University of Newcastle, School of Information and Physical Sciences, Newcastle, Australia

Zhang, Hongyu; Chongqing University, School of Big Data and Software Engineering, Chongqing, China

Miao, Yuantian; The University of Newcastle, School of Information and Physical Sciences, Newcastle, Australia

LE, Van Hoang ; University of Newcastle, Australia

Li, Zhiqiang; Shaanxi Normal University, School of Computer Science, China

External co-authors :

yes

Language :

English

Title :

OptLLM: Optimal Assignment of Queries to Large Language Models

Publication date :

2024

Event name :

2024 IEEE International Conference on Web Services (ICWS)

Event place :

Hybrid, Shenzhen, China

Event date :

07-07-2024 to 13-07-2024

Main work title :

Proceedings - 2024 IEEE International Conference on Web Services, ICWS 2024

Editor :

Chang, Rong N.

Publisher :

Institute of Electrical and Electronics Engineers Inc.

ISBN/EAN :

9798350368550

Additional URL :

http://xplorestaging.ieee.org/ielx8/10707332/10707376/10707591.pdf?arnumber=10707591

Available on ORBilu :

since 26 January 2026
