Browse ORBi

- What it is and what it isn't
- Green Road / Gold Road?
- Ready to Publish. Now What?
- How can I support the OA movement?
- Where can I learn more?

ORBi

Optimal and approximate Q-value functions for decentralized POMDPs ; ; Vlassis, Nikos in Journal of Artificial Intelligence Research (2008), 32 Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and ... [more ▼] Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem. [less ▲] Detailed reference viewed: 64 (0 UL)Exploiting locality of interaction in factored Dec-POMDPs ; ; Vlassis, Nikos et al in Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (2008) Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We ... [more ▼] Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but only for Dec-POMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored Dec-POMDPs for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude. [less ▲] Detailed reference viewed: 23 (0 UL)Multiagent Planning under Uncertainty with Stochastic Communication Delays ; ; Vlassis, Nikos in 338 Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling (ICAPS 2008) (2008) We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, in these models optimal ... [more ▼] We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, in these models optimal planning is provably intractable. By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP. As such, relying on communication significantly reduces the complexity of planning. In the real world however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that: (1) allows that communication might be delayed by one or more time steps, and (2) explicitly considers future probabilities of successful communication. For our model, we discuss how to efficiently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments. [less ▲] Detailed reference viewed: 42 (0 UL)Q-value functions for decentralized POMDPs ; Vlassis, Nikos in Proc Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (2007) Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-) optimal Q-value function is computed in a recursive manner by dynamic programming, and ... [more ▼] Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-) optimal Q-value function is computed in a recursive manner by dynamic programming, and then a policy is extracted from this value function. In this paper we study whether similar Q-value functions can be defined in decentralized POMDP models (Dec-POMDPs), what the cost of computing such value functions is, and how policies can be extracted from such value functions. Using the framework of Bayesian games, we argue that searching for the optimal Q-value function may be as costly as exhaustive policy search. Then we analyze various approximate Q-value functions that allow efficient computation. Finally, we describe a family of algorithms for extracting policies from such Q-value functions. [less ▲] Detailed reference viewed: 62 (0 UL)The Parallel Nash Memory for Asymmetric Games ; ; Vlassis, Nikos in Proc. Genetic and Evolutionary Computation Conference, Seattle, WA (2006) Coevolutionary algorithms search for test cases as part of the search process. The resulting adaptive evaluation function takes away the need to define a fixed evaluation function, but may also be ... [more ▼] Coevolutionary algorithms search for test cases as part of the search process. The resulting adaptive evaluation function takes away the need to define a fixed evaluation function, but may also be unstable and thereby prevent reliable progress. Recent work in coevolution has therefore focused on algorithms that guarantee progress with respect to a given solution concept. The Nash Memory archive guarantees monotonicity with respect to the game-theoretic solution concept of the Nash equilibrium, but is limited to symmetric games. We present an extension of the Nash Memory that guarantees monotonicity for asymmetric games. The Parallel Nash Memory is demonstrated in experiments, and its performance on general sum games is discussed. [less ▲] Detailed reference viewed: 71 (0 UL) |
||