Q-value functions for decentralized POMDPs

Oliehoek, Frans A.; VLASSIS, Nikos

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2007 • In Proc Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems

Peer reviewed

Permalink
https://hdl.handle.net/10993/11032

Details

Abstract :

Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-)optimal Q-value function is computed recursively by dynamic programming, and a policy is then extracted from this value function. In this paper we study whether similar Q-value functions can be defined in decentralized POMDP models (Dec-POMDPs), what the cost of computing such value functions is, and how policies can be extracted from them. Using the framework of Bayesian games, we argue that searching for the optimal Q-value function may be as costly as exhaustive policy search. We then analyze various approximate Q-value functions that allow efficient computation. Finally, we describe a family of algorithms for extracting policies from such Q-value functions.
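
As a rough single-agent illustration of the first sentence of the abstract, the sketch below computes an optimal Q-value function for a tiny finite-horizon MDP by backward dynamic programming and then extracts a greedy policy from it. The two-state, two-action model, its transition and reward numbers, and the names `compute_q` and `extract_policy` are all hypothetical and chosen only for illustration; this is not the Dec-POMDP procedure studied in the paper.

```python
# Hypothetical toy MDP (not taken from the paper): 2 states, 2 actions.
STATES = [0, 1]
ACTIONS = [0, 1]

# T[s][a] is a list of (next_state, probability) pairs.
T = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}

# R[s][a] is the immediate reward for taking action a in state s.
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}


def compute_q(horizon):
    """Return q[t][s][a] for t = 0..horizon-1 using backward recursion."""
    q = [None] * horizon
    # Value after the last stage is zero for every state.
    next_v = {s: 0.0 for s in STATES}
    for t in reversed(range(horizon)):
        q[t] = {
            s: {
                a: R[s][a] + sum(p * next_v[s2] for s2, p in T[s][a])
                for a in ACTIONS
            }
            for s in STATES
        }
        # The optimal value at stage t is the best Q-value in each state.
        next_v = {s: max(q[t][s].values()) for s in STATES}
    return q


def extract_policy(q):
    """Pick, for every stage and state, an action that maximizes Q."""
    return [
        {s: max(ACTIONS, key=lambda a: q_t[s][a]) for s in STATES}
        for q_t in q
    ]


if __name__ == "__main__":
    q_values = compute_q(horizon=2)
    print(extract_policy(q_values))
```

In the decentralized setting the abstract describes, each agent acts on its own observations, so this per-state maximization does not carry over directly; the sketch only shows the single-agent baseline the paper starts from.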

Disciplines :

Computer science

Identifiers :

UNILU:UL-ARTICLE-2011-710

Author, co-author :

Oliehoek, Frans A.

VLASSIS, Nikos ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)

Language :

English

Title :

Q-value functions for decentralized POMDPs

Publication date :

2007

Event name :

Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems

Event date :

2007

Main work title :

Proc Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems

Pages :

833-840

Peer reviewed :

Peer reviewed

Available on ORBilu :

since 17 November 2013
