References of "Palm, G"
Peer Reviewed
Robust Exploration/Exploitation Trade-Offs in Safety-Critical Applications
Tokic, M.; Ertle, P.; Palm, G. et al.

in 8th IFAC Int. Symposium on Fault Detection, Supervision and Safety for Technical Processes, Mexico City 29-31 August 2012 (2012)

With regard to future service robots, unsafe exceptional circumstances that are hard to foresee can occur in complex systems. In this paper, the assumption of having no knowledge about the environment is investigated using reinforcement learning as an option for learning behavior by trial and error. In such a scenario, action-selection decisions are made based on future reward predictions for minimizing costs in reaching a goal. It is shown that the selection of safety-critical actions leading to highly negative costs from the environment is directly related to the exploration/exploitation dilemma in temporal-difference learning. For this, several exploration policies are investigated with regard to worst- and best-case performance in a dynamic environment. Our results show that, in contrast to established exploration policies like epsilon-Greedy and Softmax, the recently proposed VDBE-Softmax policy seems to be more appropriate for such applications because its exploration parameter is robust in unexpected situations.
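
The three exploration policies named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption: the function names, the parameters (`epsilon`, `temperature`, `sigma`, `delta`), and in particular the VDBE update rule, which follows the commonly cited form of value-difference based exploration and may differ from what the paper actually uses.

```python
import math
import random


def greedy(q_values):
    """Index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions (max subtracted for numerical stability)."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature, rng=random):
    """Sample an action in proportion to its Boltzmann weight."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def vdbe_update(epsilon, value_diff, sigma, delta):
    """Assumed VDBE rule: move a state's epsilon toward a squashed |value difference|.

    Large changes in the value estimate push epsilon up (explore more),
    small changes let it decay (exploit more).
    """
    x = math.exp(-abs(value_diff) / sigma)
    f = (1.0 - x) / (1.0 + x)
    return delta * f + (1.0 - delta) * epsilon


def vdbe_softmax_action(q_values, epsilon, temperature, rng=random):
    """Assumed VDBE-Softmax: explore via softmax with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return softmax_action(q_values, temperature, rng)
    return greedy(q_values)
```

In this sketch the exploration rate is not a fixed hyperparameter but is adapted per state from observed value changes, which is one plausible reading of the robustness claim in the abstract.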
