Keywords :
distributed resource optimization; energy harvesting; Industrial Internet of Things; meta-reinforcement learning; UAV swarms; layer distribution; autonomous aerial vehicles; reliability; resource management; data communication; surveillance; optimization; real-time systems; energy consumption
Abstract :
[en] The integration of unmanned aerial vehicles (UAVs) into the Industrial Internet of Things (IIoT) for smart city applications has been gaining significant attention. UAV swarms are increasingly employed to monitor ground-based IIoT devices in smart cities, offering valuable support to situation-aware IoT applications such as surveillance, traffic management, and emergency response. A key requirement in these applications is minimizing data-processing latency, particularly for time-sensitive tasks like image classification of IIoT device data. Due to resource limitations, UAVs often rely on online task offloading to remote machines, but this can be inefficient because of unstable connections, constrained resources, and high latency. Distributed inference over swarms of collaborative UAVs presents a promising alternative: tasks are partitioned among UAVs according to their available resources, enabling more efficient, collaborative processing. However, distributing IIoT inference raises challenges in ensuring reliable, low-latency data transmission while respecting practical UAV constraints. To address these issues, we formulate convolutional neural network (CNN) layer distribution and UAV trajectory planning (LDTP) as an optimization problem that jointly improves latency, reliability, and resource usage. Given the complexity of solving LDTP for online requests, we propose a real-time, lightweight solution based on multi-agent meta-reinforcement learning. Our approach is evaluated on CNN models and benchmarked against state-of-the-art conventional reinforcement learning algorithms. Extensive simulations show that our model outperforms competing methods by around 29% in latency and around 23% in transmission power, while achieving latency within around 9% of the traditional LDTP optimization solution.
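To make the distributed-inference idea concrete, below is a minimal, purely illustrative sketch of splitting CNN layers across a pipeline of UAVs in proportion to each UAV's compute rate. The latency model (per-layer FLOPs divided by UAV compute rate, plus a fixed per-hop transmission delay) and all names and numbers are assumptions for illustration, not the paper's LDTP formulation or its meta-reinforcement-learning solver.

```python
# Illustrative sketch only: greedy, resource-proportional partitioning of
# CNN layers across a UAV pipeline, in the spirit of the distributed
# inference described in the abstract.

def partition_layers(layer_flops, uav_rates, hop_delay=0.05):
    """Assign contiguous blocks of layers to UAVs in pipeline order.

    layer_flops : per-layer compute cost (e.g., GFLOPs).
    uav_rates   : per-UAV compute rate (e.g., GFLOPs/s).
    hop_delay   : assumed UAV-to-UAV transfer delay per hand-off (s).
    Returns (assignment, latency): assignment[k] lists the layer indices
    run on UAV k; latency is the estimated end-to-end inference time.
    """
    total, rate_sum = sum(layer_flops), sum(uav_rates)
    assignment = [[] for _ in uav_rates]
    uav, used = 0, 0.0
    budget = total * uav_rates[0] / rate_sum   # FLOP share for UAV 0
    for idx, cost in enumerate(layer_flops):
        # Hand off to the next UAV once its proportional share is spent,
        # keeping the last UAV available for all remaining layers.
        if used + cost > budget and assignment[uav] and uav < len(uav_rates) - 1:
            uav += 1
            budget = total * uav_rates[uav] / rate_sum
            used = 0.0
        assignment[uav].append(idx)
        used += cost
    compute = [sum(layer_flops[i] for i in blk) / uav_rates[k]
               for k, blk in enumerate(assignment) if blk]
    return assignment, sum(compute) + hop_delay * (len(compute) - 1)

if __name__ == "__main__":
    flops = [1.2, 0.8, 2.0, 1.5, 0.5]   # hypothetical per-layer GFLOPs
    rates = [3.0, 2.0, 1.0]             # hypothetical per-UAV GFLOPs/s
    blocks, latency = partition_layers(flops, rates)
    print(blocks, round(latency, 3))    # -> [[0, 1], [2], [3, 4]] 3.767
```

A full LDTP solution would additionally optimize UAV trajectories and transmission power and learn the assignment policy with multi-agent meta-reinforcement learning; the greedy split above only illustrates the kind of resource-proportional layer assignment such a policy must produce.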
Scopus® citations (without self-citations): 4