Keywords:
Task-oriented communications; semantic communications; data quantization
Abstract:
Applications for inter-machine communications are on the rise. Whether for autonomous driving vehicles or the Internet of Everything, machines are more connected than ever to improve their performance in fulfilling a given task. While in traditional communications the goal has often been to reconstruct the underlying message, under the emerging task-oriented paradigm the goal of communication is to enable the receiving end to make more informed decisions or more precise estimates/computations. Motivated by these recent developments, in this paper we perform an indirect design of the communications in a multi-agent system (MAS) in which agents cooperate to maximize the average sum of discounted one-stage rewards of a collaborative task. Because communications between the agents are bit-budgeted, each agent must represent its local observations efficiently and communicate an abstracted version of them to improve the collaborative task performance. We first show that this problem can be approximated as a form of data-quantization problem, which we call task-oriented data compression (TODC). We then introduce the state-aggregation for information compression algorithm (SAIC) to solve the formulated TODC problem, and show that SAIC achieves near-optimal performance in terms of the achieved sum of discounted rewards. The proposed algorithm is applied to a geometric consensus problem and its performance is compared with several benchmarks. Numerical experiments confirm the promise of this indirect design approach for task-oriented multi-agent communications.
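For intuition, the following is a minimal Python sketch of the state-aggregation idea behind this kind of task-oriented compression, under illustrative assumptions: a tabular Q-function already learned on the task, and a bit budget B that caps the number of distinct messages at 2^B. Observations whose action-values are similar are merged into one codeword, so the receiving agent can still act near-optimally. All names here (Q, B, aggregate) are hypothetical, and this k-means-style grouping is only a stand-in for the paper's actual SAIC procedure.

```python
import numpy as np

# Illustrative sketch (not the paper's exact SAIC algorithm): quantize
# observations by aggregating those whose learned action-values are close,
# so a B-bit message preserves most of the task-relevant information.
rng = np.random.default_rng(0)

n_obs, n_actions = 32, 4
Q = rng.random((n_obs, n_actions))  # stand-in for a Q-table learned on the task

B = 2                  # bit budget per transmitted message
n_codewords = 2 ** B   # distinct messages the channel can carry

def aggregate(Q, n_codewords, n_iters=50):
    """k-means-style aggregation in Q-value space: merging observations
    with similar action-values should lose little task performance."""
    centroids = Q[rng.choice(len(Q), size=n_codewords, replace=False)].copy()
    for _ in range(n_iters):
        # assign each observation to the nearest centroid
        dists = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean Q-vector of its cluster
        for k in range(n_codewords):
            if np.any(labels == k):
                centroids[k] = Q[labels == k].mean(axis=0)
    return labels

codebook = aggregate(Q, n_codewords)  # observation index -> B-bit message
print("observation-to-message mapping:", codebook)
```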
Disciplines:
Electrical & electronics engineering
Author, co-author:
MOSTAANI, Arsham ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SigCom
VU, Thang Xuan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SigCom
CHATZINOTAS, Symeon ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SigCom
OTTERSTEN, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors:
No
Document language:
English
Title:
Task-Oriented Data Compression for Multi-Agent Communications Over Bit-Budgeted Channels
Publication date:
October 6, 2022
Journal title:
IEEE Open Journal of the Communications Society
eISSN:
2644-125X
Publisher:
Institute of Electrical and Electronics Engineers (IEEE), New York, NY, United States