Browse ORBi

- What it is and what it isn't
- Green Road / Gold Road?
- Ready to Publish. Now What?
- How can I support the OA movement?
- Where can I learn more?

ORBi

Q-value functions for decentralized POMDPs ; Vlassis, Nikos in Proc Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (2007) Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-) optimal Q-value function is computed in a recursive manner by dynamic programming, and ... [more ▼] Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-) optimal Q-value function is computed in a recursive manner by dynamic programming, and then a policy is extracted from this value function. In this paper we study whether similar Q-value functions can be defined in decentralized POMDP models (Dec-POMDPs), what the cost of computing such value functions is, and how policies can be extracted from such value functions. Using the framework of Bayesian games, we argue that searching for the optimal Q-value function may be as costly as exhaustive policy search. Then we analyze various approximate Q-value functions that allow efficient computation. Finally, we describe a family of algorithms for extracting policies from such Q-value functions. [less ▲] Detailed reference viewed: 59 (0 UL)The Parallel Nash Memory for Asymmetric Games ; ; Vlassis, Nikos in Proc. Genetic and Evolutionary Computation Conference, Seattle, WA (2006) Coevolutionary algorithms search for test cases as part of the search process. The resulting adaptive evaluation function takes away the need to define a fixed evaluation function, but may also be ... [more ▼] Coevolutionary algorithms search for test cases as part of the search process. The resulting adaptive evaluation function takes away the need to define a fixed evaluation function, but may also be unstable and thereby prevent reliable progress. Recent work in coevolution has therefore focused on algorithms that guarantee progress with respect to a given solution concept. The Nash Memory archive guarantees monotonicity with respect to the game-theoretic solution concept of the Nash equilibrium, but is limited to symmetric games. We present an extension of the Nash Memory that guarantees monotonicity for asymmetric games. The Parallel Nash Memory is demonstrated in experiments, and its performance on general sum games is discussed. [less ▲] Detailed reference viewed: 66 (0 UL)Using the max-plus algorithm for multiagent decision making in coordination graphs ; Vlassis, Nikos in Proc. RoboCup Int. Symposium, Osaka, Japan (2006) Coordination graphs offer a tractable framework for cooperative multiagent decision making by decomposing the global payoff function into a sum of local terms. Each agent can in principle select an ... [more ▼] Coordination graphs offer a tractable framework for cooperative multiagent decision making by decomposing the global payoff function into a sum of local terms. Each agent can in principle select an optimal individual action based on a variable elimination algorithm performed on this graph. This results in optimal behavior for the group, but its worst-case time complexity is exponential in the number of agents, and it can be slow in densely connected graphs. Moreover, variable elimination is not appropriate for real-time systems as it requires that the complete algorithm terminates before a solution can be reported. In this paper, we investigate the max-plus algorithm, an instance of the belief propagation algorithm in Bayesian networks, as an approximate alternative to variable elimination. In this method the agents exchange appropriate payoff messages over the coordination graph, and based on these messages compute their individual actions. We provide empirical evidence that this method converges to the optimal solution for tree-structured graphs (as shown by theory), and that it finds near optimal solutions in graphs with cycles, while being much faster than variable elimination. [less ▲] Detailed reference viewed: 58 (1 UL)An analytic solution to discrete Bayesian reinforcement learning ; Vlassis, Nikos ; et al in Proc Int. Conf. on Machine Learning, Pittsburgh, USA (2006) Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving ... [more ▼] Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving this goal because the amount of exploration required is often too costly and/or too time consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization. [less ▲] Detailed reference viewed: 67 (0 UL)Accelerated EM-based clustering of large data sets ; ; Vlassis, Nikos in Data Mining & Knowledge Discovery (2006), 13(3), 291-307 Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive ... [more ▼] Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that are at least linear in the number of data points, (2) ensures convergence by strictly increasing a lower bound on the data log-likelihood in each learning step, and (3) allows ample freedom in the design of other accelerated variants. We also derive a similar accelerated algorithm for greedy mixture learning, where very satisfactory results are obtained. The core idea is to define a lower bound on the data log-likelihood based on a grouping of data points. The bound is maximized by computing in turn (i) optimal assignments of groups of data points to the mixture components, and (ii) optimal re-estimation of the model parameters based on average sufficient statistics computed over groups of data points. The proposed method naturally generalizes to mixtures of other members of the exponential family. Experimental results show the potential of the proposed method over other state-of-the-art acceleration techniques. [less ▲] Detailed reference viewed: 66 (0 UL)Gaussian fields for semi-supervised regression and correspondence learning ; Vlassis, Nikos in Pattern Recognition (2006), 39(10), 1864-1875 Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. In this paper we show how the GF framework can be used for semi-supervised ... [more ▼] Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. In this paper we show how the GF framework can be used for semi-supervised regression on high-dimensional data. We propose an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Furthermore, we show how a recent generalization of the LLE algorithm for correspondence learning can be cast into the GF framework, which obviates the need to choose a representation dimensionality. [less ▲] Detailed reference viewed: 72 (0 UL)Point-Based Value Iteration for Continuous POMDPs ; Vlassis, Nikos ; et al in Journal of Machine Learning Research (2006), 7 We propose a novel approach to optimize Partially Observable Markov Decisions Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete ... [more ▼] We propose a novel approach to optimize Partially Observable Markov Decisions Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems such as, for instance, robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend to deal with continuous action and observation sets by designing effective sampling approaches. [less ▲] Detailed reference viewed: 21 (0 UL)Collaborative multiagent reinforcement learning by payoff propagation ; Vlassis, Nikos in Journal of Machine Learning Research (2006), 7 In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of ... [more ▼] In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, unitedly called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection, result in an approach that scales only linearly in the problem size. We provide experimental evidence experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences. [less ▲] Detailed reference viewed: 45 (0 UL)Improving Approximate Value Iteration Using Memories and Predictive State Representations ; ; Vlassis, Nikos in Proc. National Conf. on Artificial Intelligence (2006) Detailed reference viewed: 22 (0 UL)Decentralized planning under uncertainty for teams of communicating agents ; ; Vlassis, Nikos in Proc. Int. Joint Conf. on Autonomous Agents and Multiagent Systems, Hakodate, Japan (2006) Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and partially observable ... [more ▼] Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and partially observable environment. Unfortunately, computing optimal plans in a DEC-POMDP has been shown to be intractable (NEXP-complete), and approximate algorithms for specific subclasses have been proposed. Many of these algorithms rely on an (approximate) solution of the centralized planning problem (i.e., treating the whole team as a single agent). We take a more decentralized approach, in which each agent only reasons over its own local state and some uncontrollable state features, which are shared by all team members. In contrast to other approaches, we model communication as an integral part of the agent's reasoning, in which the meaning of a message is directly encoded in the policy of the communicating agent. We explore iterative methods for approximately solving such models, and we conclude with some encouraging preliminary experimental results. [less ▲] Detailed reference viewed: 50 (0 UL)Planning with Continuous Actions in Partially Observable Environments ; Vlassis, Nikos in Proc. IEEE Int. Conf. on Robotics and Automation (2005) We present a simple randomized POMDP al gorithm for planning with continuous actions in partially observable environments. Our algorithm operates on a set of reachable belief points, sampled by letting ... [more ▼] We present a simple randomized POMDP al gorithm for planning with continuous actions in partially observable environments. Our algorithm operates on a set of reachable belief points, sampled by letting the robot interact randomly with the environment. We perform value iteration steps, ensuring that in each step the value of all sampled belief points is improved. The idea here is that by sampling actions from a continuous action space we can quickly improve the value of all belief points in the set. We demonstrate the viability of our algorithm on two sets of experiments: one involving an active localization task and one concerning robot navigation in a perceptually aliased of fice environment. [less ▲] Detailed reference viewed: 57 (0 UL)Robot planning in partially observable continuous domains ; ; Vlassis, Nikos in Proc. Robotics: Science and Systems (2005) We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces. Mainstream POMDP research focuses on the discrete case ... [more ▼] We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces. Mainstream POMDP research focuses on the discrete case and this complicates its application to, e.g., robotic problems that are naturally modeled using continuous state spaces. The main difficulty in defining a (belief-based) POMDP in a continuous state space is that expected values over states must be defined using integrals that, in general, cannot be computed in closed from. In this paper, we first show that the optimal finite-horizon value function over the continuous infinite-dimensional POMDP belief space is piecewise linear and convex, and is defined by a finite set of supporting α-functions that are analogous to the α-vectors (hyperplanes) defining the value function of a discrete-state POMDP. Second, we show that, for a fairly general class of POMDP models in which all functions of interest are modeled by Gaussian mixtures, all belief updates and value iteration backups can be carried out analytically and exact. A crucial difference with respect to the α-vectors of the discrete case is that, in the continuous case, the α-functions will typically grow in complexity (e.g., in the number of components) in each value iteration. Finally, we demonstrate PERSEUS, our previously proposed randomized point-based value iteration algorithm, in a simple robot planning problem with a continuous domain, where encouraging results are observed. [less ▲] Detailed reference viewed: 102 (0 UL)Non-communicative multi-robot coordination in dynamic environments ; ; Vlassis, Nikos in Robotics & Autonomous Systems (2005), 50(2-3), 99-114 Within a group of cooperating agents the decision making of an individual agent depends on the actions of the other agents. In dynamic environments, these dependencies will change rapidly as a result of ... [more ▼] Within a group of cooperating agents the decision making of an individual agent depends on the actions of the other agents. In dynamic environments, these dependencies will change rapidly as a result of the continuously changing state. Via a context-specific decomposition of the problem into smaller subproblems, coordination graphs offer scalable solutions to the problem of multiagent decision making. In this work, we apply coordination graphs to a continuous (robotic) domain by assigning roles to the agents and then coordinating the different roles. Moreover, we demonstrate that, with some additional assumptions, an agent can predict the actions of the other agents, rendering communication superfluous. We have successfully implemented the proposed method into our UvA Trilearn simulated robot soccer team which won the RoboCup-2003 World Championship in Padova, Italy. (C) 2004 Elsevier B.V. All rights reserved. [less ▲] Detailed reference viewed: 54 (0 UL)Perseus: Randomized point-based value iteration for POMDPs ; Vlassis, Nikos in Journal of Artificial Intelligence Research (2005), 24 Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy ... [more ▼] Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. [less ▲] Detailed reference viewed: 72 (0 UL)Self-organizing mixture models ; Vlassis, Nikos ; in Neurocomputing (2005), 63 We present an expectation-maximization (EM) algorithm that yields topology preserving maps of data based on probabilistic mixture models. Our approach is applicable to any mixture model for which we have ... [more ▼] We present an expectation-maximization (EM) algorithm that yields topology preserving maps of data based on probabilistic mixture models. Our approach is applicable to any mixture model for which we have a normal EM algorithm. Compared to other mixture model approaches to self-organizing maps (SOMs), the function our algorithm maximizes has a clear interpretation: it sums data log-likelihood and a penalty term that enforces self-organization. Our approach allows principled handling of missing data and learning of mixtures of SOMs. We present example applications illustrating our approach for continuous, discrete, and mixed discrete and continuous data. (C) 2004 Elsevier B.V. All rights reserved. [less ▲] Detailed reference viewed: 66 (0 UL)Newscast EM ; Vlassis, Nikos in Advances in Neural Information Processing Systems 17 (2005) We propose a gossip-based distributed algorithm for Gaussian mixture learning, Newscast EM. The algorithm operates on network topologies where each node observes a local quantity and can communicate with ... [more ▼] We propose a gossip-based distributed algorithm for Gaussian mixture learning, Newscast EM. The algorithm operates on network topologies where each node observes a local quantity and can communicate with other nodes in an arbitrary point-to-point fashion. The main difference between Newscast EM and the standard EM algorithm is that the M-step in our case is implemented in a decentralized manner: (random) pairs of nodes repeatedly exchange their local parameter estimates and combine them by (weighted) averaging. We provide theoretical evidence and demonstrate experimentally that, under this protocol, nodes converge exponentially fast to the correct estimates in each M-step of the EM algorithm. [less ▲] Detailed reference viewed: 61 (0 UL)Utile coordination: Learning interdependencies among cooperative agents ; ; et al in EEE Symp. on Computational Intelligence and Games, Colchester, Essex (2005) We describe Utile Coordination, an algorithm that allows a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected ... [more ▼] We describe Utile Coordination, an algorithm that allows a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only necessary coordination is modeled. We apply our method within the framework of coordination graphs in which value rules represent the coordination dependencies between the agents for a specific context. The algorithm is first applied on a small illustrative problem, and next on a large predator-prey problem in which two predators have to capture a single prey. [less ▲] Detailed reference viewed: 151 (0 UL)Gossip-based greedy Gaussian mixture learning Vlassis, Nikos ; ; in Lecture Notes in Computer Science (2005) It has been recently demonstrated that the classical EM algorithm for learning Gaussian mixture models can be successfully implemented in a decentralized manner by resorting to gossip-based randomized ... [more ▼] It has been recently demonstrated that the classical EM algorithm for learning Gaussian mixture models can be successfully implemented in a decentralized manner by resorting to gossip-based randomized distributed protocols. In this paper we describe a gossip-based implementation of an alternative algorithm for learning Gaussian mixtures in which components are added to the mixture one after another. Our new Greedy Gossip-based Gaussian mixture learning algorithm uses gossip-based parallel search, starting from multiple initial guesses, for finding good components to add to the mixture in each component allocation step. It can be executed on massive networks of small computing devices, converging to a solution exponentially faster than its centralized version, while reaching the same quality of generated models. [less ▲] Detailed reference viewed: 72 (0 UL)A point-based POMDP algorithm for robot planning ; Vlassis, Nikos in Proc. IEEE Int. Conf. on Robotics and Automation, New Orleans, Louisiana (2004) We present an approximate POMDP solution method for robot planning in partially observable environments. Our algorithm belongs to the family of point-based value iteration solution techniques for POMDPs ... [more ▼] We present an approximate POMDP solution method for robot planning in partially observable environments. Our algorithm belongs to the family of point-based value iteration solution techniques for POMDPs, in which planning is performed only on a sampled set of reachable belief points. We describe a simple, randomized procedure that performs value update steps that strictly improve the value of all belief points in each step. We demonstrate our algorithm on a robotic delivery task in an office environment and on several benchmark problems, for which we compute solutions that are very competitive to those of state-ofthe -art methods in terms of speed and solution quality. [less ▲] Detailed reference viewed: 83 (0 UL)Anytime Algorithms for Multiagent Decision Making Using Coordination Graphs Vlassis, Nikos ; ; in Proceedings of the International Conference on Systems, Man and Cybernetics (2004) Coordinationgraphsprovideatractableframe- work for cooperative multiagent decision making by decom- posing the global payoff function into a sum of local terms. In this paper we review some distributed ... [more ▼] Coordinationgraphsprovideatractableframe- work for cooperative multiagent decision making by decom- posing the global payoff function into a sum of local terms. In this paper we review some distributed algorithms for ac- tion selection in a coordination graph and discuss their pros and cons. For real-time decision making we emphasize the need for anytime algorithms for action selection: these are algorithms that improve the quality of the solution over time. We describe variable elimination, coordinate ascent, and the max-plus algorithm, the latter being an instance of the be- lief propagation algorithm in Bayesian networks. We discuss some interesting open problems related to the use of the max- plus algorithm in real-time multiagent decision making. [less ▲] Detailed reference viewed: 70 (0 UL) |
||