We present a simple randomized POMDP algorithm for planning with continuous actions in partially observable environments. Our algorithm operates on a set of reachable belief points, sampled by letting the robot interact randomly with the environment. We perform value iteration steps, ensuring that in each step the value of all sampled belief points is improved. The idea is that by sampling actions from a continuous action space we can quickly improve the value of all belief points in the set. We demonstrate the viability of our algorithm in two sets of experiments: one involving an active localization task and one concerning robot navigation in a perceptually aliased office environment.
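To make the idea concrete, here is a minimal sketch of one randomized value-iteration stage over a fixed belief set, in the spirit described above. The helper names `sample_action` (draws an action from the continuous action space) and `backup` (the point-based backup for one belief/action pair) are hypothetical placeholders, not interfaces from the paper; this is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def randomized_vi_stage(beliefs, alpha_vectors, sample_action, backup,
                        n_action_samples=10):
    """One randomized value-iteration stage over a fixed belief set.

    beliefs       : list of belief vectors (np.ndarray over states)
    alpha_vectors : current value function, a list of alpha vectors
    sample_action : callable, draws an action from the continuous space
    backup        : callable, backup(b, a, alpha_vectors) -> alpha vector
    """
    def value(b, vectors):
        # Point-based value of belief b under a set of alpha vectors.
        return max(np.dot(alpha, b) for alpha in vectors)

    new_vectors = []
    not_improved = list(beliefs)
    while not_improved:
        # Pick a belief uniformly at random among those not yet improved.
        b = not_improved[np.random.randint(len(not_improved))]
        # Sample a handful of actions from the continuous action space
        # and back up the belief with each of them.
        candidates = [backup(b, sample_action(), alpha_vectors)
                      for _ in range(n_action_samples)]
        best = max(candidates, key=lambda alpha: np.dot(alpha, b))
        if np.dot(best, b) >= value(b, alpha_vectors):
            new_vectors.append(best)  # the sampled backup improved b
        else:
            # Otherwise keep b's best old vector so its value cannot drop.
            new_vectors.append(max(alpha_vectors,
                                   key=lambda alpha: np.dot(alpha, b)))
        # Any belief whose value is already matched or improved by the
        # new vectors needs no further backup this stage.
        not_improved = [bb for bb in not_improved
                        if value(bb, new_vectors) < value(bb, alpha_vectors)]
    return new_vectors
```

Because a single new alpha vector typically improves many beliefs at once, each stage usually terminates after backing up only a small fraction of the belief set, which is what makes the randomized scheme fast.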
Disciplines:
Computer science
Identifiers:
UNILU:UL-ARTICLE-2011-724
Author, co-author:
Spaan, M. T. J.
VLASSIS, Nikos; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Document language:
English
Title:
Planning with Continuous Actions in Partially Observable Environments