Cazzato, Dario, in Journal of Imaging (2020), 6(8), 78

The spread of Unmanned Aerial Vehicles (UAVs) in the last decade has revolutionized many application fields. The most investigated research topics focus on increasing autonomy during operational campaigns, environmental monitoring, surveillance, mapping, and labeling. To achieve such complex goals, a high-level module is exploited to build semantic knowledge by leveraging the outputs of a low-level module that takes data acquired from multiple sensors and extracts information concerning what is sensed. All in all, object detection is undoubtedly the most important low-level task, and the sensors most employed to accomplish it are by far RGB cameras, due to their cost, dimensions, and the wide literature on RGB-based object detection. This survey presents recent advancements in 2D object detection for the case of UAVs, focusing on the differences, strategies, and trade-offs between the generic problem of object detection and the adaptation of such solutions to UAV operations. Moreover, a new taxonomy is proposed that considers different height intervals and is driven by the methodological approaches introduced by works in the state of the art, rather than by hardware, physical, and/or technological constraints.

Cazzato, Dario, in Sensors (2020), 20(13), 3739

The automatic detection of eye positions, their temporal consistency, and their mapping into a line of sight in the real world (to find where a person is looking) is reported in the scientific literature as gaze tracking. This has become a very hot topic in the field of computer vision during the last decades, with a surprising and continuously growing number of application fields. A very long journey has been made since the first pioneering works, and this continuous search for more accurate solutions has been further boosted in the last decade, when deep neural networks revolutionized the whole machine learning area, and gaze tracking as well. In this arena, it is increasingly useful to find guidance in survey/review articles that collect the most relevant works and lay out the pros and cons of existing techniques, also by introducing a precise taxonomy. This kind of manuscript allows researchers and practitioners to choose the best way to move towards their application or scientific goals. The literature contains holistic and specifically technological survey documents (even if not up to date), but, unfortunately, there is no overview discussing how the great advancements in computer vision have impacted gaze tracking.
Thus, this work represents an attempt to fill this gap, also introducing a wider point of view that leads to a new taxonomy (extending the consolidated ones) by considering gaze tracking as a more exhaustive task that aims at estimating the gaze target from different perspectives: from the eye of the beholder (first-person view), from an external camera framing the beholder's eyes, from a third-person view looking at the scene in which the beholder is placed, and from an external view independent from the beholder.

Cimarelli, Claudio, in 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (2019, November 25)

Robot self-localization is essential for operating autonomously in open environments. When cameras are the main source of information for retrieving the pose, numerous challenges are posed by the presence of dynamic objects, due to occlusions and continuous changes in appearance. Recent research on global localization methods has focused on using a single (or multiple) Convolutional Neural Network (CNN) to estimate the 6 Degrees of Freedom (6-DoF) pose directly from a monocular camera image. In contrast with classical approaches using engineered feature detectors, CNNs are usually more robust to environmental changes in light and to occlusions in outdoor scenarios. This paper is an attempt to empirically demonstrate the ability of CNNs to ignore dynamic elements, such as pedestrians or cars, through learning. For this purpose, we pre-process a dataset for pose localization with an object segmentation network, masking potentially moving objects. We then compare the pose regression CNN trained and/or tested on the set of masked images with one using the original images. Experimental results show that the performance of the two training approaches is similar, with a slight reduction of the error when hiding occluding objects from the views.

Cazzato, Dario, in IECON 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society (2019, October)

The reliability of aircraft inspection is of paramount importance to flight safety. Continuing airworthiness of aircraft structures is largely based upon the visual detection of small defects by trained inspection personnel, an expensive, critical, and time-consuming task. To this aim, Unmanned Aerial Vehicles (UAVs) can be used for autonomous inspections, as long as it is possible to localize the target while flying around it and correct the position. This work proposes a solution to detect the airplane pose with respect to the UAV's position while flying autonomously around the airframe at close range for visual inspection tasks. The system works by processing images coming from an RGB camera mounted on board, comparing incoming frames with a database of natural landmarks whose positions on the airframe surface are known. The solution has been tested in real UAV flight scenarios, showing its effectiveness in localizing the pose with high precision. The advantages of the proposed method are of industrial interest, since it removes many constraints that are present in state-of-the-art solutions.
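To make the landmark-based localization step concrete, the following minimal sketch matches an incoming frame against a database of landmark descriptors whose 3D positions on the airframe are known, then recovers the relative pose with PnP. The ORB features, function name, and data layout are illustrative assumptions, not the paper's exact implementation.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=2000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def estimate_airframe_pose(frame_gray, db_descriptors, db_points3d, K):
    """frame_gray: incoming camera frame; db_descriptors: descriptors of
    landmark keypoints whose 3D positions on the airframe surface
    (db_points3d, Nx3) are known; K: 3x3 camera intrinsic matrix."""
    kps, desc = orb.detectAndCompute(frame_gray, None)
    if desc is None:
        return None
    matches = matcher.match(desc, db_descriptors)
    if len(matches) < 6:
        return None
    img_pts = np.float32([kps[m.queryIdx].pt for m in matches])
    obj_pts = np.float32([db_points3d[m.trainIdx] for m in matches])
    # Robustly estimate the camera pose relative to the airframe landmarks.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None
```

The returned rotation/translation would then feed the UAV's position controller to keep the desired standoff from the airframe.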
Cimarelli, Claudio, in International Conference on Computer Analysis of Images and Patterns (2019, August 22)

Precise and robust localization is of fundamental importance for robots required to carry out autonomous tasks. Above all, in the case of Unmanned Aerial Vehicles (UAVs), efficiency and reliability are critical aspects in developing localization solutions, due to limited computational capabilities, payload, and power constraints. In this work, we leverage novel research in efficient deep neural architectures for the problem of 6 Degrees of Freedom (6-DoF) pose estimation from single RGB camera images. In particular, we introduce an efficient neural network that jointly regresses the position and orientation of the camera with respect to the navigation environment. Experimental results show that the proposed network achieves results similar to the most popular state-of-the-art methods while being smaller and having lower latency, which are fundamental aspects for real-time robotics applications.

Cazzato, Dario, in Proceedings of the 2019 3rd International Conference on Artificial Intelligence and Virtual Reality (2019, July)

The ability of robots to imitate human movements has been an active research topic since the dawn of robotics. Obtaining a realistic imitation is essential to the perceived quality of human-robot interaction, but it is still a challenge due to the lack of an effective mapping between human movements and the degrees of freedom of robotic systems. Although high-level programming interfaces, software, and simulation tools have simplified robot programming, there is still a strong gap between robot control and natural user interfaces. In this paper, a system to reproduce on a robot the head movements of a user in the field of view of a consumer camera is presented. The system recognizes the presence of a user and estimates the head pose in real time by using a deep neural network, in order to extract head orientation angles and command the robot's head movements accordingly, obtaining a realistic imitation. At the same time, the system represents a natural user interface for controlling the Aldebaran NAO and Pepper humanoid robots with head movements, with applications in human-robot interaction.
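As a rough illustration of the final step of such a pipeline, the sketch below forwards estimated head yaw/pitch angles to a NAO/Pepper head through the NAOqi Python SDK. The robot address and the safety clamps are assumptions, and the deep head-pose estimator itself is omitted for brevity.

```python
# Requires the NAOqi Python SDK (Python 2.7 in the official distribution).
from naoqi import ALProxy

ROBOT_IP, ROBOT_PORT = "192.168.1.10", 9559  # hypothetical robot address

motion = ALProxy("ALMotion", ROBOT_IP, ROBOT_PORT)
motion.setStiffnesses("Head", 1.0)

# Conservative safety clamps (radians); the real joint limits differ
# slightly between NAO and Pepper.
YAW_LIMIT, PITCH_LIMIT = 2.0, 0.5

def imitate_head(yaw, pitch, speed=0.2):
    """Map the user's estimated head yaw/pitch (radians) onto the robot head."""
    yaw = max(-YAW_LIMIT, min(YAW_LIMIT, yaw))
    pitch = max(-PITCH_LIMIT, min(PITCH_LIMIT, pitch))
    motion.setAngles(["HeadYaw", "HeadPitch"], [yaw, pitch], speed)
```

Calling imitate_head once per processed camera frame yields the continuous imitation behavior described in the abstract.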
Sanchez Lopez, Jose Luis, in 2019 International Conference on Unmanned Aircraft Systems (ICUAS) (2019, June)

In this work, we present a semantic situation awareness system for multirotor aerial robots, based on 2D LIDAR measurements, targeting the understanding of the environment and assuming that a precise robot localization is available as an input to our algorithm. Our proposed situation awareness system calculates a semantic map of the objects in the environment as a list of circles represented by their radius, and by the position and velocity of their center in world coordinates. Our proposed algorithm includes three main parts. First, the LIDAR measurements are preprocessed and an object segmentation clusters the candidate objects present in the environment. Secondly, a Convolutional Neural Network (CNN), designed and trained using an artificially generated dataset, computes the radius and the position of the center of individual circles in sensor coordinates. Finally, an indirect Extended Kalman Filter (EKF) provides the estimate of the semantic map in world coordinates, including the velocity of the centers of the circles. We have quantitatively and qualitatively evaluated the performance of our proposed situation awareness system by means of Software-In-The-Loop simulations using VRep, with one and multiple static and moving cylindrical objects in the scene, obtaining results that support our proposed algorithm. In addition, we have demonstrated that our proposed algorithm is capable of handling real environments, thanks to real laboratory experiments with non-cylindrical static (i.e., a barrel) and moving (i.e., a person) objects.

Cazzato, Dario, in Signal Processing (2019), 154

The advent of low-cost depth cameras has led to the development of several reconstruction methods that work well with rigid objects, but tend to fail when used to manually scan a standing person. Specific methods for body scanning have been proposed, but they have ad-hoc requirements that make them unsuitable in a wide range of applications: they require rotation platforms, multiple sensors, or an a priori template model. Scanning a person with a hand-held low-cost depth camera is still a challenging unsolved problem. This work proposes a novel solution to easily scan standing persons by combining depth information with fiducial markers, without using a template model. In our approach, a set of markers placed on the ground is used to improve camera tracking through a novel algorithm that fuses depth information with the known locations of the markers. The proposed method analyzes the video sequence and automatically divides it into fragments that are employed to build partial overlapping scans of the subject. Then, a registration step (both rigid and non-rigid) is applied to create a final mesh of the scanned subject. The proposed method has been compared with the state-of-the-art KinectFusion [1], ElasticFusion [2], ORB-SLAM [3, 4], and BundleFusion [5] methods, exhibiting superior performance.
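One building block of such marker-aided tracking (detecting the ground markers and recovering the camera pose relative to them) can be sketched with OpenCV's classic ArUco API from opencv-contrib (this API changed in recent OpenCV releases). The dictionary, marker size, and intrinsics below are assumptions for illustration, and the paper's fusion with depth is not reproduced.

```python
import cv2
import numpy as np

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
MARKER_SIDE = 0.15  # marker side length in meters (assumed)

# Assumed intrinsics of the depth camera's RGB stream.
K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion

def camera_pose_from_markers(gray):
    """Return the pose (R, t) of the first detected ground marker in camera
    coordinates; a full system would fuse all markers and the depth data."""
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is None:
        return None
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_SIDE, K, dist)
    R, _ = cv2.Rodrigues(rvecs[0])
    return R, tvecs[0].reshape(3)
```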
Cazzato, Dario, in AIVR 2019: Proceedings of the 2019 3rd International Conference on Artificial Intelligence and Virtual Reality (2019)

Cazzato, Dario, in Proceedings of the IEEE International Conference on Computer Vision Workshops (2019)

Cazzato, Dario, in Proceedings of the 2nd International Conference on Applications of Intelligent Systems (2019)

Cazzato, Dario, in Open Computer Science (2019), 9(1), 145–159

Cazzato, Dario, in Paladyn. Journal of Behavioral Robotics (2018)

Automatic gaze estimation that does not rely on commercial and expensive eye tracking hardware can enable several applications in the fields of human-computer interaction (HCI) and human behavior analysis. It is therefore not surprising that several related techniques and methods have been investigated in recent years. However, very few camera-based systems proposed in the literature are both real-time and robust. In this work, we propose a real-time gaze estimation system that does not need person-dependent calibration, can deal with illumination changes and head pose variations, and can work with a wide range of distances from the camera. Our solution is based on a 3-D appearance-based method that processes the images from a built-in laptop camera. Real-time performance is obtained by combining head pose information with geometrical eye features to train a machine learning algorithm. Our method has been validated on a dataset of images of users in natural environments, and shows promising results. The possibility of a real-time implementation, combined with the good quality of gaze tracking, makes this system suitable for various HCI applications.
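The core idea (combining head pose information with geometrical eye features to train a machine learning regressor) can be illustrated with a minimal sketch. The feature layout, the random forest choice, and the .npy files are hypothetical stand-ins, not the paper's actual design.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Each row of X concatenates head pose angles (yaw, pitch, roll) with a few
# geometrical eye features (e.g., pupil-center offsets within the eye
# region); y holds the corresponding 2D gaze points. Both files are
# hypothetical pre-extracted data.
X = np.load("features.npy")
y = np.load("gaze_points.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A random forest stands in for the (unspecified) machine learning algorithm.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

errors = np.linalg.norm(model.predict(X_te) - y_te, axis=1)
print("mean gaze error:", errors.mean())
```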
Cazzato, Dario, in Journal of Imaging (2018)

Recent improvements in the field of assistive technologies have led to innovative solutions aiming at increasing the capabilities of people with disabilities, helping them in daily activities with applications that span from cognitive impairments to developmental disabilities. In particular, in the case of Autism Spectrum Disorder (ASD), the need to obtain active feedback in order to subsequently extract meaningful data becomes of fundamental importance. In this work, a study on the possibility of understanding visual exploration in children with ASD is presented. In order to obtain an automatic evaluation, an algorithm for free gaze estimation (i.e., without constraints, additional hardware, infrared (IR) light sources, or other intrusive methods) is employed. Furthermore, no initial calibration is required. The method allows the user to freely rotate the head in the field of view of the sensor, and it is insensitive to the presence of eyeglasses, hats, or particular hairstyles. These relaxations of the constraints make the technique particularly suitable in the critical context of autism, where the child is certainly not inclined to wear invasive devices or to collaborate during calibration procedures. The evaluation of children's gaze trajectories through the proposed solution is presented for the purpose of an Early Start Denver Model (ESDM) program built on the child's spontaneous interests and game choices, delivered in a natural setting.
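To give an idea of how gaze trajectories can be turned into analyzable events, the sketch below implements a generic dispersion-threshold (I-DT) fixation detector. This is a standard textbook technique shown purely for illustration, not the algorithm used in the paper, and the thresholds are arbitrary.

```python
import numpy as np

def detect_fixations(points, timestamps, max_dispersion=30.0, min_duration=0.10):
    """points: (N, 2) array of gaze samples (e.g., pixels); timestamps in
    seconds. Returns a list of (centroid, t_start, t_end) fixations."""
    fixations = []
    i, n = 0, len(points)
    while i < n:
        j = i
        # Grow the window while its dispersion (width + height of the
        # samples' bounding box) stays below the threshold.
        while j + 1 < n:
            w = points[i:j + 2]
            dispersion = ((w[:, 0].max() - w[:, 0].min())
                          + (w[:, 1].max() - w[:, 1].min()))
            if dispersion > max_dispersion:
                break
            j += 1
        if timestamps[j] - timestamps[i] >= min_duration:
            fixations.append((points[i:j + 1].mean(axis=0),
                              timestamps[i], timestamps[j]))
            i = j + 1
        else:
            i += 1
    return fixations
```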