In this paper we apply deep reinforcement learning techniques to a multicopter to learn a stable hovering task in
an environment with continuous state and action spaces. We present a framework based on OpenAI Gym, Gazebo, and the RotorS MAV simulator, which we used to successfully train different agents to perform various tasks. The deep reinforcement learning method used for training is Trust Region Policy Optimization (TRPO), a model-free, on-policy, actor-critic algorithm. Two neural networks are used as nonlinear function approximators. Our experiments show that this learning approach achieves successful results and facilitates the process of controller design.
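
To make the notion of a hovering task with continuous states and actions concrete, the following is a minimal sketch of a Gym-style environment, written in plain Python so it runs without Gazebo or RotorS. It is not the framework presented in this paper: the class `ToyHoverEnv`, the 1-D point-mass dynamics, the reward, the termination rule, and the hand-tuned `pd_policy` are all hypothetical and chosen only for illustration. In a learning setup, the policy passed to `run_episode` would be the actor network rather than a fixed controller.

```python
import random


class ToyHoverEnv:
    """Hypothetical 1-D altitude-hold task with a Gym-style reset/step interface.

    Illustrative assumption only: the dynamics, reward, and limits below are
    not taken from the paper's Gazebo/RotorS setup.
    """

    def __init__(self, target=1.0, dt=0.02, max_steps=200, seed=0):
        self.target = target        # desired altitude [m]
        self.dt = dt                # integration step [s]
        self.max_steps = max_steps  # episode length limit
        self.gravity = 9.81         # [m/s^2]
        self.max_accel = 20.0       # assumed thrust acceleration limit [m/s^2]
        self.rng = random.Random(seed)
        self.state = None
        self.steps = 0

    def reset(self):
        # Start near the target altitude with zero vertical velocity.
        z = self.target + self.rng.uniform(-0.2, 0.2)
        self.state = (z, 0.0)
        self.steps = 0
        return self.state

    def step(self, action):
        # Continuous action: commanded thrust acceleration, clipped to limits.
        thrust = min(max(float(action), 0.0), self.max_accel)
        z, vz = self.state
        vz += (thrust - self.gravity) * self.dt
        z += vz * self.dt
        self.state = (z, vz)
        self.steps += 1
        error = z - self.target
        # Assumed reward: penalize altitude error and vertical speed.
        reward = -(error ** 2) - 0.1 * vz ** 2
        done = abs(error) > 1.0 or self.steps >= self.max_steps
        return self.state, reward, done, {}


def pd_policy(state, target=1.0, kp=20.0, kd=8.0, gravity=9.81):
    """Hand-tuned PD controller standing in for a learned policy."""
    z, vz = state
    return gravity + kp * (target - z) - kd * vz


def run_episode(env, policy):
    """Roll out one episode; return the total reward and the final state."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done, _ = env.step(policy(state))
        total += reward
    return total, state
```

Calling `run_episode(ToyHoverEnv(), pd_policy)` rolls out one episode and returns the accumulated reward together with the final `(altitude, vertical velocity)` pair. Swapping `pd_policy` for a stochastic neural-network policy is the point at which an algorithm such as TRPO would enter.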