[en] We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, optimal planning in these models is provably intractable. By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP. Relying on communication therefore significantly reduces the complexity of planning. In the real world, however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that (1) allows communication to be delayed by one or more time steps, and (2) explicitly considers future probabilities of successful communication. For our model, we discuss how to efficiently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments.
Disciplines:
Computer science
Identifiers:
UNILU:UL-ARTICLE-2011-704
Author, co-author:
Spaan, Matthijs T. J.
Oliehoek, Frans A.
VLASSIS, Nikos ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Document language:
English
Title:
Multiagent Planning under Uncertainty with Stochastic Communication Delays
Publication date:
2008
Event name:
Eighteenth International Conference on Automated Planning and Scheduling (ICAPS 2008)
Event date:
2008
Main work title:
Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling (ICAPS 2008)