[en] We present an approximate POMDP solution method for robot planning in partially observable environments. Our algorithm belongs to the family of point-based value iteration techniques for POMDPs, in which planning is performed only on a sampled set of reachable belief points. We describe a simple, randomized procedure that performs value update steps which strictly improve the value of all belief points in each step. We demonstrate our algorithm on a robotic delivery task in an office environment and on several benchmark problems, for which we compute solutions that are highly competitive with those of state-of-the-art methods in terms of both speed and solution quality.
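To make the randomized value-update stage concrete, the sketch below (a minimal illustration, not the authors' implementation) shows one such stage in Python for a discrete POMDP. The array layout (T, Z, R), the helper names point_backup and perseus_stage, and the initialization conventions are all assumptions for illustration; the stage repeatedly backs up a randomly chosen belief whose value has not yet improved, so each stage improves (or at least preserves) the value of every sampled belief point.

import numpy as np

def point_backup(b, V, T, Z, R, gamma):
    # Standard point-based backup of belief b against alpha-vector set V.
    # Assumed layout: T[a][s, s'] = P(s'|s,a), Z[a][s', o] = P(o|s',a),
    # R[a][s] = immediate reward for action a in state s.
    best_alpha, best_val = None, -np.inf
    for a in range(len(T)):
        g_a = R[a].astype(float)
        for o in range(Z[a].shape[1]):
            # Back-project each alpha-vector through (a, o) and keep the one
            # best at b: g_{a,o}(s) = sum_{s'} Z[a][s',o] T[a][s,s'] alpha(s').
            g_ao = [T[a] @ (Z[a][:, o] * alpha) for alpha in V]
            g_a = g_a + gamma * max(g_ao, key=lambda g: b @ g)
        if b @ g_a > best_val:
            best_alpha, best_val = g_a, b @ g_a
    return best_alpha

def perseus_stage(B, V, T, Z, R, gamma, rng):
    # One randomized update stage over the sampled belief set B.
    # V is a non-empty list of alpha-vectors (e.g. a single low-value vector).
    old_vals = [max(b @ alpha for alpha in V) for b in B]
    todo = list(range(len(B)))               # beliefs not yet improved
    V_new = []
    while todo:
        i = todo[rng.integers(len(todo))]    # random unimproved belief
        alpha = point_backup(B[i], V, T, Z, R, gamma)
        if B[i] @ alpha < old_vals[i]:
            # Backup did not help this point: keep its best old vector,
            # so no belief's value decreases within a stage.
            alpha = max(V, key=lambda v: B[i] @ v)
        V_new.append(alpha)
        # Beliefs already improved by the growing new set need no backup.
        todo = [j for j in todo
                if max(B[j] @ v for v in V_new) < old_vals[j]]
    return V_new

Repeating such stages, e.g. perseus_stage(B, V, T, Z, R, 0.95, np.random.default_rng(0)), until the values at the sampled beliefs stabilize yields the approximate value function.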
Disciplines:
Computer science
Identifiers:
UNILU:UL-ARTICLE-2011-732
Author, co-author:
Spaan, Matthijs T. J.
Vlassis, Nikos; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
Document language:
English
Title:
A point-based POMDP algorithm for robot planning
Publication date:
2004
Event name:
IEEE Int. Conf. on Robotics and Automation, New Orleans, Louisiana
Event date:
2004
Main work title:
Proc. IEEE Int. Conf. on Robotics and Automation, New Orleans, Louisiana