ORBi

Fastcore: An algorithm for fast reconstruction of context-specific metabolic network models
Vlassis, Nikos ; Pacheco, Maria ; Sauter, Thomas
In Proc. 8th BeNeLux Bioinformatics Conference (2013)

Stochastic POMDP controllers: How easy to optimize?
Vlassis, Nikos ; ;
Scientific Conference (2012)

It was recently shown that computing an optimal stochastic controller in a discounted infinite-horizon partially observable Markov decision process is an NP-hard problem. The reduction (from the independent-set problem) involves designing an MDP with special state-action rewards. In this note, we show that the case of state-only-dependent rewards is also NP-hard.

Bayesian Reinforcement Learning
Vlassis, Nikos ; ; et al.
In Wiering, Marco; van Otterlo, Martijn (Eds.), Reinforcement Learning: State of the Art (2012)

This chapter surveys recent lines of work that use Bayesian techniques for reinforcement learning. In Bayesian learning, uncertainty is expressed by a prior distribution over unknown parameters, and learning is achieved by computing a posterior distribution based on the data observed. Hence, Bayesian reinforcement learning distinguishes itself from other forms of reinforcement learning by explicitly maintaining a distribution over various quantities such as the parameters of the model, the value function, the policy, or its gradient.
This yields several benefits: (a) domain knowledge can be naturally encoded in the prior distribution to speed up learning; (b) the exploration/exploitation tradeoff can be naturally optimized; and (c) notions of risk can be naturally taken into account to obtain robust policies.

From Systems Biology to Systems Biomedicine
Antony, Paul ; Balling, Rudi ; Vlassis, Nikos
In Current Opinion in Biotechnology (2012), 23(4), 604-8

Systems Biology is about combining theory, technology, and targeted experiments in a way that drives not only data accumulation but knowledge as well. The challenge in Systems Biomedicine is to furthermore translate mechanistic insights in biological systems to clinical application, with the central aim of improving patients' quality of life. The challenge is to find theoretically well-chosen models for the contextually correct and intelligible representation of multiscale biological systems. In this review, we discuss the current state of Systems Biology, highlight the emergence of Systems Biomedicine, and highlight some of the topics and views that we think are important for the efficient application of Systems Theory in Biomedicine.

On the computational complexity of stochastic controller optimization in POMDPs
Vlassis, Nikos ; ;
In ACM Transactions on Computation Theory (2012), 4(4), 1-9

We show that the problem of finding an optimal stochastic 'blind' controller in a Markov decision process is an NP-hard problem.
The corresponding decision problem is NP-hard, in PSPACE, and SQRT-SUM-hard; hence, placing it in NP would imply a breakthrough in long-standing open problems in computer science. Our optimization result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is convex and admits efficient global solutions.

On the computational complexity of stochastic controller optimization in POMDPs
Vlassis, Nikos ; ;
E-print/Working paper (2011)

We show that the problem of finding an optimal stochastic 'blind' controller in a Markov decision process is an NP-hard problem. The corresponding decision problem is NP-hard, in PSPACE, and SQRT-SUM-hard; hence, placing it in NP would imply a breakthrough in long-standing open problems in computer science. Our optimization result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is solvable to arbitrary accuracy in polynomial time via semidefinite or second-order cone programming.

3D visualization of archaeological uncertainty
; ; et al.
In Proc. ACM Symposium on Applied Perception (2010)

By uncertainty, we mean an archaeological expert's level of confidence in an interpretation deriving from gathered evidence.
Archaeologists and computer scientists have urged caution in the use of 3D for archaeological reconstructions, because the availability of other possible hypotheses is not always acknowledged. This poster presents a 3D visualization system for archaeological uncertainty.

Model-free reinforcement learning as mixture learning
Vlassis, Nikos ;
In Proceedings of the 26th International Conference on Machine Learning (2009)

We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm like Sarsa(1) that can be applied in both MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging results.

Learning model-free robot control by a Monte Carlo EM algorithm
Vlassis, Nikos ; ; et al.
In Autonomous Robots (2009), 27(2), 123-130

We address the problem of learning robot control by model-free reinforcement learning (RL).
We adopt the probabilistic model for model-free RL of Vlassis and Toussaint (Proceedings of the International Conference on Machine Learning, Montreal, Canada, 2009), and we propose a Monte Carlo EM algorithm (MCEM) for control learning that searches directly in the space of controller parameters using information obtained from randomly generated robot trajectories. MCEM is related to, and generalizes, the PoWER algorithm of Kober and Peters (Proceedings of Neural Information Processing Systems, 2009). In the finite-horizon case MCEM reduces precisely to PoWER, but MCEM can also handle the discounted infinite-horizon case. An interesting result is that the infinite-horizon case can be viewed as a 'randomized' version of the finite-horizon case, in the sense that the length of each sampled trajectory is a random draw from an appropriately constructed geometric distribution. We provide preliminary experiments demonstrating the effects of fixed (PoWER) vs. randomized (MCEM) horizon length on two simulated and one real robot control task.

Model-based Bayesian reinforcement learning in partially observable domains
; Vlassis, Nikos
In Proc. Int. Symp. on Artificial Intelligence and Mathematics (2008)

Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs
; ; et al.
In Proceedings of the 19th European Conference on Machine Learning (2008)

Since traffic jams are ubiquitous in the modern world, optimizing the behavior of traffic lights for efficient traffic flow is a critically important goal.
Though most current traffic lights use simple heuristic protocols, more efficient controllers can be discovered automatically via multiagent reinforcement learning, where each agent controls a single traffic light. However, in previous work on this approach, agents select only locally optimal actions without coordinating their behavior. This paper extends this approach to include explicit coordination between neighboring traffic lights. Coordination is achieved using the max-plus algorithm, which estimates the optimal joint action by sending locally optimized messages among connected agents. This paper presents the first application of max-plus to a large-scale problem and thus verifies its efficacy in realistic settings. It also provides empirical evidence that max-plus performs well on cyclic graphs, though it has been proven to converge only for tree-structured graphs. Furthermore, it provides a new understanding of the properties a traffic network must have for such coordination to be beneficial, and shows that max-plus outperforms previous methods on networks that possess those properties.

Multiagent Planning under Uncertainty with Stochastic Communication Delays
; ; Vlassis, Nikos
In Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling (ICAPS 2008)

We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, in these models optimal planning is provably intractable.
By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP. As such, relying on communication significantly reduces the complexity of planning. In the real world, however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that (1) allows communication to be delayed by one or more time steps, and (2) explicitly considers future probabilities of successful communication. For our model, we discuss how to efficiently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments.

The cross-entropy method for policy search in decentralized POMDPs
; ; Vlassis, Nikos
In Informatica (2008), 32(4), 341-357

Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence.
Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.

Exploiting locality of interaction in factored Dec-POMDPs
; ; Vlassis, Nikos et al.
In Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (2008)

Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but only for Dec-POMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored Dec-POMDPs, for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude.
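The cross-entropy loop summarized in the Informatica (2008) entry above (sample pure policies from a parametrized stochastic policy, evaluate them, refit the distribution to the best samples, repeat) can be sketched generically. Everything below is an illustrative stand-in, not the paper's Dec-POMDP machinery: policies map a handful of observations to actions, the sampling distribution is a per-observation categorical, and the evaluation function is a toy score.

```python
import numpy as np

def ce_policy_search(evaluate, n_obs, n_actions, n_samples=200,
                     elite_frac=0.1, alpha=0.7, n_iters=30, seed=0):
    """Generic cross-entropy loop: sample pure policies (one action per
    observation) from a per-observation categorical distribution, keep
    the highest-scoring fraction, and move the distribution toward the
    elites' empirical action frequencies."""
    rng = np.random.default_rng(seed)
    probs = np.full((n_obs, n_actions), 1.0 / n_actions)  # sampling policy
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Sample pure (deterministic) policies, one row per sample.
        pols = np.stack([
            np.array([rng.choice(n_actions, p=probs[o]) for o in range(n_obs)])
            for _ in range(n_samples)])
        scores = np.array([evaluate(p) for p in pols])
        elites = pols[np.argsort(scores)[-n_elite:]]
        freq = np.zeros_like(probs)
        for p in elites:
            freq[np.arange(n_obs), p] += 1.0 / n_elite  # elite frequencies
        probs = alpha * freq + (1 - alpha) * probs      # smoothed update
    return probs.argmax(axis=1)

# Toy stand-in for policy evaluation: reward 1 per observation whose action
# matches a fixed target policy (NOT a real Dec-POMDP policy evaluation).
target = np.array([2, 0, 1, 3])
best = ce_policy_search(lambda p: int((p == target).sum()), n_obs=4, n_actions=4)
```

The smoothed update (`alpha` below 1) keeps every action's probability positive, which is what lets the loop keep exploring before the distribution concentrates.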
Optimal and approximate Q-value functions for decentralized POMDPs
; ; Vlassis, Nikos
In Journal of Artificial Intelligence Research (2008), 32

Decision-theoretic planning is a popular approach to sequential decision-making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.

Distributed Decision Making for Robot Teams
Vlassis, Nikos
In Proc. 1st Int. Symp. on Intelligent and Distributed Computing (2007)

Q-value Heuristics for Approximate Solutions of Dec-POMDPs
; Vlassis, Nikos
In Proc. AAAI Spring Symp. on Game Theoretic and Decision Theoretic Agents (2007)

A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence
Vlassis, Nikos
Book published by Morgan & Claypool (2007)

Multiagent systems is an expanding field that blends classical fields like game theory and decentralized control with modern fields like computer science and machine learning. This monograph provides a concise introduction to the subject, covering the theoretical foundations as well as more recent developments in a coherent and readable manner. The text is centered on the concept of an agent as decision maker. Chapter 1 is a short introduction to the field of multiagent systems. Chapter 2 covers the basic theory of single-agent decision making under uncertainty. Chapter 3 is a brief introduction to game theory, explaining classical concepts like Nash equilibrium. Chapter 4 deals with the fundamental problem of coordinating a team of collaborative agents. Chapter 5 studies the problem of multiagent reasoning and decision making under partial observability. Chapter 6 focuses on the design of protocols that are stable against manipulations by self-interested agents. Chapter 7 provides a short introduction to the rapidly expanding field of multiagent reinforcement learning. The material can be used for teaching a half-semester course on multiagent systems covering, roughly, one chapter per lecture.
A spatially constrained generative model and an EM algorithm for image segmentation
; Vlassis, Nikos ;
In IEEE Transactions on Neural Networks (2007), 18(3), 798-808

In this paper, we present a novel spatially constrained generative model and an expectation-maximization (EM) algorithm for model-based image segmentation. The generative model assumes that the unobserved class labels of neighboring pixels in the image are generated by prior distributions with similar parameters, where similarity is defined by entropic quantities relating to the neighboring priors. In order to estimate model parameters from observations, we derive a spatially constrained EM algorithm that iteratively maximizes a lower bound on the data log-likelihood, where the penalty term is data-dependent. Our algorithm is very easy to implement and is similar to the standard EM algorithm for Gaussian mixtures, with the main difference that the label posteriors are "smoothed" over pixels between each E- and M-step by a standard image filter. Experiments on synthetic and real images show that our algorithm achieves competitive segmentation results compared to other Markov-based methods, and is in general faster.

Q-value functions for decentralized POMDPs
; Vlassis, Nikos
In Proc. Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems (2007)

Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-)optimal Q-value function is computed in a recursive manner by dynamic programming, and then a policy is extracted from this value function. In this paper we study whether similar Q-value functions can be defined in decentralized POMDP models (Dec-POMDPs), what the cost of computing such value functions is, and how policies can be extracted from such value functions. Using the framework of Bayesian games, we argue that searching for the optimal Q-value function may be as costly as exhaustive policy search. Then we analyze various approximate Q-value functions that allow efficient computation. Finally, we describe a family of algorithms for extracting policies from such Q-value functions.
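The IEEE Transactions on Neural Networks (2007) entry above describes its algorithm as standard Gaussian-mixture EM, except that the label posteriors are smoothed over pixels by an image filter between the E- and M-steps. A minimal sketch of that smoothing idea on raw pixel intensities is given below; the 3x3 box filter, quantile initialization, toy two-region image, and all constants are illustrative assumptions, not the paper's exact entropic-penalty formulation.

```python
import numpy as np

def box_smooth(post):
    """3x3 box filter applied to each class-posterior map (edge padding)."""
    h, w, _ = post.shape
    p = np.pad(post, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(post)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0

def spatial_em(img, k, n_iters=50):
    """EM for a k-component Gaussian mixture over pixel intensities, with
    the class posteriors smoothed by an image filter between the E- and
    M-steps (the smoothing idea from the abstract, not its exact model)."""
    mu = np.quantile(img, (np.arange(k) + 0.5) / k)  # spread-out init
    var = np.full(k, img.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: per-pixel class posteriors under the current Gaussians.
        d = img[..., None] - mu
        logp = -0.5 * d ** 2 / var - 0.5 * np.log(2 * np.pi * var) + np.log(pi)
        post = np.exp(logp - logp.max(axis=-1, keepdims=True))
        post /= post.sum(axis=-1, keepdims=True)
        # Smooth the posterior maps over neighboring pixels, renormalize.
        post = box_smooth(post)
        post /= post.sum(axis=-1, keepdims=True)
        # M-step: standard mixture updates from the smoothed posteriors.
        nk = post.sum(axis=(0, 1)) + 1e-12
        mu = (post * img[..., None]).sum(axis=(0, 1)) / nk
        var = (post * (img[..., None] - mu) ** 2).sum(axis=(0, 1)) / nk + 1e-6
        pi = nk / nk.sum()
    return post.argmax(axis=-1)

# Toy image: two noisy constant regions (top half vs. bottom half).
rng = np.random.default_rng(1)
img = np.concatenate([rng.normal(0.0, 0.5, (8, 16)),
                      rng.normal(5.0, 0.5, (8, 16))])
labels = spatial_em(img, k=2)
```

Swapping `box_smooth` for any other image filter changes only the spatial coupling; with no smoothing at all the loop reduces to plain Gaussian-mixture EM, which matches the comparison the abstract draws.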