[en] Partially observable Markov decision process ; stochastic controller ; bilinear program ; computational complexity ; Motzkin-Straus theorem ; sum-of-square-roots problem ; matrix fractional program ; nonlinear optimization

[en] We show that the problem of finding an optimal stochastic 'blind' controller in a Markov decision process is an NP-hard problem. The corresponding decision problem is NP-hard, in PSPACE, and SQRT-SUM-hard, hence placing it in NP would imply a breakthrough in long-standing open problems in computer science. Our optimization result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is is convex and admits efficient global solutions.

Luxembourg Centre for Systems Biomedicine (LCSB): Machine Learning (Vlassis Group)