Unmanned aerial vehicles (UAVs), commonly known as drones, are among the most versatile robotic platforms owing to their high mobility and low-cost design, and they have consequently been used in numerous civil applications. These robots can generally complete autonomous or semi-autonomous missions without human intervention: their autopilot system performs complex calculations on sensor observations to control attitude and speed, and to plan and track a trajectory through a possibly unknown environment. However, to enable higher degrees of autonomy, the perception system is paramount, since it extracts the knowledge that allows the robot to interact with the external world.
This thesis therefore aims to solve the core perception challenges of an autonomous surveillance application carried out by an aerial robot in an outdoor urban environment. We address a simplified use case: patrolling missions that monitor a confined area around buildings assumed to be under access restriction. From this application context, we identify the main research questions. On the one hand, the drone has to locate itself in a controlled navigation environment, keep track of its pose while flying, and understand the geometrical structure of the 3D scene around it. On the other hand, the surveillance mission entails detecting and localising people in the monitored area. We develop several methods to address these questions. Furthermore, because we constrain the UAV's sensor array to a monocular RGB camera, we approach these problems with computer vision algorithms.
First, we train a neural network with an unsupervised learning paradigm to predict the drone's ego-motion and the geometrical structure of the scene. We then introduce a novel algorithm that integrates a model-free epipolar method to correct, online, the rotational drift of the trajectory estimated by the trained pose network. Second, we employ an efficient Convolutional Neural Network (CNN) architecture to regress the UAV's global metric pose directly from a single colour image.
Moreover, we investigate how dynamic objects in the camera's field of view affect the localisation performance of such an approach. We then discuss the implementation of an object detection network and derive the equations for finding the 3D position of detected people in a reconstructed environment. Next, we describe the theory behind structure-from-motion and use it to build a 3D model from a dataset recorded with a drone at the University of Luxembourg's Belval campus.
We perform multiple experiments to validate our proposed algorithms and evaluate them against other state-of-the-art methods. The results show that our methods perform better on several metrics. Our analysis also identifies the limitations and highlights the benefits of the adopted strategies compared to other approaches. Finally, the introduced dataset provides an additional tool for benchmarking perception algorithms and for developing future applications.