Aberdeen, D., and Baxter, J. 2002. Scaling internal-state policy-gradient methods for POMDPs. In ICML, 3-10.
Castro, P. S., and Precup, D. 2007. Using linear programming for Bayesian exploration in Markov decision processes. In IJCAI, 2437-2442.
Dearden, R.; Friedman, N.; and Andre, D. 1999. Model based Bayesian exploration. In UAI, 150-159.
Duff, M. 2002. Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes. Ph.D. Dissertation, University of Massachusetts Amherst.
Heckerman, D. 1999. A tutorial on learning with Bayesian networks. In Jordan, M., ed., Learning in Graphical Models. Cambridge, MA: MIT Press.
Kohl, N., and Stone, P. 2004. Policy gradient reinforcement learning for fast quadrupedal locomotion. In ICRA.
Jaulmes, R.; Pineau, J.; and Precup, D. 2005. Active learning in partially observable Markov decision processes. In ECML, 601-608.
Kearns, M.; Mansour, Y.; and Ng, A. 2002. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49:193-208.
Meuleau, N.; Peshkin, L.; Kim, K.-E.; and Kaelbling, L. P. 1999. Learning finite-state controllers for partially observable environments. In UAI, 427-436.
Ng, A. Y., and Jordan, M. I. 2000. PEGASUS: a policy search method for large MDPs and POMDPs. In UAI, 406-415.
Ng, A.; Kim, H. J.; Jordan, M.; and Sastry, S. 2003. Autonomous helicopter flight via reinforcement learning. In NIPS.
Ng, A.; Parr, R.; and Koller, D. 2000. Policy search via density estimation. In NIPS, 1022-1028.
Porta, J. M.; Vlassis, N. A.; Spaan, M. T. J.; and Poupart, P. 2006. Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research 7:2329-2367.
Poupart, P.; Vlassis, N.; Hoey, J.; and Regan, K. 2006. An analytic solution to discrete Bayesian reinforcement learning. In ICML, 697-704.
Smallwood, R. D., and Sondik, E. J. 1973. The optimal control of partially observable Markov processes over a finite horizon. Operations Research 21:1071-1088.
Strens, M. 2000. A Bayesian framework for reinforcement learning. In ICML, 943-950.
Tesauro, G. J. 1995. Temporal difference learning and TD-Gammon. Communications of the ACM 38:58-68.
Wang, T.; Lizotte, D.; Bowling, M.; and Schuurmans, D. 2005. Bayesian sparse sampling for on-line reward optimization. In ICML, 956-963.