Mohamed Ali, Mohamed Adel, in IEEE Conference on Computer Vision and Pattern Recognition (2022).
Pose estimation enables vision-based systems to refer to their environment, supporting activities ranging from scene navigation to object manipulation. However, end-to-end approaches, which have achieved state-of-the-art performance in many perception tasks, are still unable to compete with 3D geometry-based methods in pose estimation. Indeed, absolute pose regression has been shown to be more closely related to image retrieval than to 3D structure. Our assumption is that the statistical features learned by classical convolutional neural networks do not carry enough geometric information to solve this task reliably. This paper studies the use of deep equivariant features for end-to-end pose regression. We further propose a translation- and rotation-equivariant convolutional neural network whose architecture directly induces representations of camera motions in the feature space. In the context of absolute pose regression, this geometric property allows for implicitly augmenting the training data under a whole group of image-plane-preserving transformations. Directly learning equivariant features therefore efficiently compensates for learning intermediate representations that are indirectly equivariant yet data-intensive. Extensive experimental validation demonstrates that our lightweight model outperforms existing ones on standard datasets.
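The equivariance property described in this abstract can be illustrated in its simplest form: an ordinary convolution is already translation-equivariant, meaning that shifting the input and then convolving gives the same result as convolving and then shifting. The sketch below is a plain NumPy illustration of that identity on a 1-D signal with circular boundaries, not a reproduction of the paper's rotation-equivariant architecture; all names are ours.

```python
import numpy as np

def conv1d_periodic(x, k):
    """Circular 1-D correlation of signal x with kernel k."""
    n = len(x)
    return np.array([sum(x[(i + j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

def shift(x, s):
    """Cyclic shift of x by s positions."""
    return np.roll(x, s)

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5])
k = np.array([0.25, 0.5, 0.25])

# Equivariance: convolving a shifted input equals shifting the convolved output.
lhs = conv1d_periodic(shift(x, 2), k)
rhs = shift(conv1d_periodic(x, k), 2)
assert np.allclose(lhs, rhs)
```

The paper's contribution is to extend this built-in behavior to a larger group of image-plane-preserving camera motions, so that the feature space transforms predictably under them.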
Mohamed Ali, Mohamed Adel, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (2021, October).

Oyedotun, Oyebade, in Neurocomputing (2021).

Garcia Sanchez, Albert, in Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops (2021, June).
Being capable of estimating the pose of uncooperative objects in space has been proposed as a key asset for enabling safe close-proximity operations such as space rendezvous, in-orbit servicing, and active debris removal. Usual approaches for pose estimation involve classical computer-vision-based solutions or the application of Deep Learning (DL) techniques. This work explores a novel DL-based methodology, using Convolutional Neural Networks (CNNs), for estimating the pose of uncooperative spacecraft. Contrary to other approaches, the proposed CNN directly regresses poses without needing any prior 3D information. Moreover, bounding boxes of the spacecraft in the image are predicted in a simple yet efficient manner. The performed experiments show how this work competes with the state of the art in uncooperative spacecraft pose estimation, including works which require 3D information as well as works which predict bounding boxes through sophisticated CNNs.
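Pose-regression pipelines such as the one in the spacecraft entry above are commonly scored by the geodesic angle between predicted and ground-truth orientations. A minimal sketch of that standard metric for unit quaternions follows; the function names and test values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def quat_normalize(q):
    """Scale a quaternion to unit norm."""
    return q / np.linalg.norm(q)

def rotation_error_deg(q_pred, q_true):
    """Geodesic angle (degrees) between two unit quaternions.
    The absolute value handles the double cover: q and -q are the same rotation."""
    d = abs(float(np.dot(quat_normalize(q_pred), quat_normalize(q_true))))
    d = min(d, 1.0)  # guard against rounding slightly above 1
    return np.degrees(2.0 * np.arccos(d))

q_gt = np.array([1.0, 0.0, 0.0, 0.0])                      # identity rotation
q_est = quat_normalize(np.array([0.99, 0.01, 0.0, 0.0]))   # small perturbation
err = rotation_error_deg(q_est, q_gt)
assert err < 2.0  # a small perturbation yields an error of only a degree or so
```

Reporting this angle alongside a translation error is the usual way such direct-regression methods are compared against geometry-based baselines.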
Mohamed Ali, Mohamed Adel, in 2021 IEEE International Conference on Image Processing (ICIP) (2021).

; Mohamed Ali, Mohamed Adel, in European Conference on Space Debris (2021), 8(1).

Baptista, Renato, in International Conference on Pattern Recognition (ICPR) Workshop on 3D Human Understanding, Milan, 10-15 January 2021 (2020).
In this paper, we propose 3DBodyTex.Pose, a dataset that addresses the task of 3D human pose estimation in the wild. Generalization to in-the-wild images remains limited due to the lack of adequate datasets. Existing ones are usually collected in controlled indoor environments where motion capture systems are used to obtain the 3D ground-truth annotations of humans. 3DBodyTex.Pose offers high-quality and rich data containing 405 different real subjects in various clothing and poses, and 81k image samples with ground-truth 2D and 3D pose annotations. These images are generated from 200 viewpoints, among which 70 are challenging extreme viewpoints. This data was created starting from high-resolution textured 3D body scans and by incorporating various realistic backgrounds. Retraining a state-of-the-art 3D pose estimation approach using data augmented with 3DBodyTex.Pose showed promising improvement in the overall performance, and a noticeable decrease in the per-joint position error when testing on challenging viewpoints. 3DBodyTex.Pose is expected to offer the research community new possibilities for generalizing 3D pose estimation from monocular in-the-wild images.
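Generating ground-truth 2D annotations from textured 3D scans, as the 3DBodyTex.Pose entry above describes for its 200 viewpoints, amounts to projecting each 3D joint through a synthetic camera. A minimal pinhole-projection sketch follows; the function signature, focal length, and principal point are illustrative assumptions, not the dataset's actual camera model.

```python
import numpy as np

def project_points(X_world, R, t, f, c):
    """Project Nx3 world points to pixel coordinates with a pinhole camera.
    R (3x3) and t (3,) map world to camera frame; f is the focal length in
    pixels and c the principal point."""
    X_cam = X_world @ R.T + t           # world -> camera frame
    x = X_cam[:, :2] / X_cam[:, 2:3]    # perspective division by depth
    return f * x + c                    # scale to pixel coordinates

# A joint 2 m straight ahead of an axis-aligned camera lands on the principal point.
R = np.eye(3)
t = np.zeros(3)
joints = np.array([[0.0, 0.0, 2.0]])
uv = project_points(joints, R, t, f=1000.0, c=np.array([320.0, 240.0]))
assert np.allclose(uv, [[320.0, 240.0]])
```

Repeating this projection for every synthetic viewpoint yields consistent 2D and 3D annotations from a single scanned pose, which is what makes scan-based datasets cheap to annotate compared with motion-capture setups.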
; Baptista, Renato, in 6th Annual Conference on Computational Science & Computational Intelligence, Las Vegas, 5-7 December 2019 (2019, December).
This work presents a new view-invariant action recognition system that is able to classify human actions using a single RGB camera, including challenging camera viewpoints. Understanding actions from different viewpoints remains an extremely challenging problem due to depth ambiguities, occlusion, and a large variety of appearances and scenes. Moreover, using only the information from the 2D perspective gives different interpretations of the same action seen from different viewpoints. Our system operates in two subsequent stages. The first stage estimates the 2D human pose using a convolutional neural network. In the next stage, the 2D human poses are lifted to 3D human poses using a temporal convolutional neural network that enforces temporal coherence over the estimated 3D poses. The estimated 3D poses from different viewpoints are then aligned to the same camera reference frame. Finally, we propose to use a temporal convolutional network-based classifier for cross-view action recognition. Our results show that we can achieve state-of-the-art view-invariant action recognition accuracy, even for challenging viewpoints, by using only RGB videos, without pre-training on synthetic or motion capture data.

Al Ismaeil, Kassem, Doctoral thesis (2015).
Sensing using 3D technologies has seen a revolution in the past years, with cost-effective depth sensors today being part of accessible consumer electronics. Their ability to directly capture depth videos in real time has opened tremendous possibilities for multiple applications in computer vision. These sensors, however, have major shortcomings due to their high noise contamination, including missing and jagged measurements, and their low spatial resolutions. In order to extract detailed 3D features from this type of data, a dedicated data enhancement is required. We propose a generic depth multi-frame super-resolution framework that addresses the limitations of state-of-the-art depth enhancement approaches. The proposed framework does not need any additional hardware or coupling with different modalities. It is based on a new data model that uses densely upsampled low-resolution observations. This results in a robust median initial estimation, further refined by a deblurring operation using a bilateral total variation as the regularization term. The upsampling operation ensures a systematic improvement in the registration accuracy. This is explored in different scenarios based on the motions involved in the depth video. For the general and most challenging case of objects deforming non-rigidly in full 3D, we propose a recursive dynamic multi-frame super-resolution algorithm where the relative local 3D motions between consecutive frames are directly accounted for. We rely on the assumption that these 3D motions can be decoupled into lateral motions and radial displacements. This allows us to perform a simple local per-pixel tracking where both depth measurements and deformations are optimized. Compared to alternative approaches, the results show a clear improvement in reconstruction accuracy and in robustness to noise, to relatively large non-rigid deformations, and to topological changes.
Moreover, the proposed approach, implemented on a CPU, is shown to be computationally efficient and to work in real time.

Al Ismaeil, Kassem, in IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'15) (Best paper award) (2015, June 12).
This paper proposes to enhance low-resolution dynamic depth videos containing freely, non-rigidly moving objects with a new dynamic multi-frame super-resolution algorithm. Existing methods are either limited to rigid objects or restricted to global lateral motions, discarding radial displacements. We address these shortcomings by accounting for non-rigid displacements in 3D. In addition to 2D optical flow, we estimate the depth displacement and simultaneously correct the depth measurement by Kalman filtering. This concept is incorporated efficiently in a multi-frame super-resolution framework. It is formulated in a recursive manner that ensures an efficient deployment in real time. Results show the overall improved performance of the proposed method compared to alternative approaches, specifically in handling relatively large 3D motions. Test examples range from a full moving human body to a highly dynamic facial video with varying expressions.

Aouada, Djamila, in 11th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP'15) (2015, March).
All existing methods for the statistical analysis of super-resolution approaches have stopped at the variance term, not accounting for the bias in the mean square error. In this paper, we give an original derivation of the bias term. We propose to use a patch-based method inspired by the work of (Chatterjee and Milanfar, 2009). Our approach, however, is completely new, as we derive a new affine bias model dedicated to the multi-frame super-resolution framework. We apply the proposed statistical performance analysis to the Upsampling for Precise Super-Resolution (UP-SR) algorithm. This algorithm was shown experimentally to be a good solution for enhancing the resolution of depth sequences in both cases of global and local motions. Its performance is herein analyzed theoretically in terms of its approximated mean square error, using the proposed derivation of the bias. This analysis is validated experimentally on simulated static and dynamic depth sequences with a known ground truth. This provides an insightful understanding of the effects of the noise variance, the number of observed low-resolution frames, and the super-resolution factor on the final and intermediate performance of UP-SR. Our conclusion is that increasing the number of frames should improve the performance, while the error is increased due to local motions and to the upsampling which is part of UP-SR.

Aouada, Djamila, in 11th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS'14) (2014).
We address the limitation of low-resolution depth cameras in the context of face recognition.
Considering a face as a surface in 3-D, we reformulate the recently proposed Upsampling for Precise Super-Resolution algorithm as a new approach on three-dimensional points. This reformulation allows an efficient implementation and leads to a largely enhanced 3-D face reconstruction. Moreover, combined with a dedicated face detection and representation pipeline, the proposed method provides an improved face recognition system using low-resolution depth cameras. We show experimentally that this system increases the face recognition rate compared to directly using the low-resolution raw data.

Afzal, Hassan, in Second International Conference on 3D Vision (2014).
In this work we propose KinectDeform, an algorithm which targets enhanced 3D reconstruction of scenes containing non-rigidly deforming objects. It provides an innovation to the existing class of algorithms, which either target scenes with rigid objects only, allow for very limited non-rigid deformations, or use pre-computed templates to track them. KinectDeform combines a fast non-rigid scene tracking algorithm, based on an octree data representation and hierarchical voxel associations, with a recursive data filtering mechanism. We analyze its performance on both real and simulated data and show improved results in terms of smooth and feature-preserving 3D reconstructions with reduced noise.

Al Ismaeil, Kassem, in 20th International Conference on Image Processing (2013, September).
We enhance the resolution of depth videos acquired with low-resolution time-of-flight cameras.
To that end, we propose a new dedicated dynamic super-resolution method capable of accurately super-resolving a depth sequence containing one or multiple moving objects, without strong constraints on their shape or motion, thus clearly outperforming existing super-resolution techniques, which perform poorly on depth data and are either restricted to global motions or imprecise because of an implicit estimation of motion. Our proposed approach is based on a new data model that leads to a robust registration of all depth frames after a dense upsampling. The texture-less nature of depth images allows us to robustly handle sequences with multiple moving objects, as confirmed by our experiments.

Al Ismaeil, Kassem, in International Symposium on Image and Signal Processing and Analysis, ISPA (2013).
A critical step in multi-frame super-resolution is the registration of frames based on their motion. We improve the performance of current state-of-the-art super-resolution techniques by proposing a more robust and accurate registration as early as the initialization stage of the high-resolution estimate. Indeed, we solve the limitations on scale and motion inherent to the classical Shift & Add approach by upsampling the low-resolution frames up to the super-resolution factor prior to estimating motion or to median filtering. This is followed by an appropriate selective optimization, leading to an enhanced Shift & Add.
Quantitative and qualitative evaluations have been conducted at two levels: the initial estimation and the final optimized super-resolution. Results show that the proposed algorithm outperforms existing state-of-the-art methods.

Al Ismaeil, Kassem, in Computer Analysis of Images and Patterns, 15th International Conference, CAIP 2013, York, UK, August 27-29, 2013, Proceedings, Part II (2013).

Al Ismaeil, Kassem, in 8th International Symposium on Image and Signal Processing and Analysis (2013).

Al Ismaeil, Kassem, in Pattern Recognition (ICPR), 2012 21st International Conference on (2012).
The well-known bilateral filter is used to smooth noisy images while keeping their edges.
This filter is commonly used with Gaussian kernel functions without real justification. The choice of the kernel functions has a major effect on the filter's behavior. We propose to use exponential kernels with L1 distances instead of Gaussian ones. We derive Stein's Unbiased Risk Estimate to find the optimal parameters of the new filter and compare its performance with the conventional one. We show that this new choice of kernels has a comparable smoothing effect but yields sharper edges due to the faster, smoothly decaying kernels.

; Al Ismaeil, Kassem, in Computer Vision – ECCV 2012, 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI (2012).
This paper deals with the problem of self-calibrating a moving camera with constant parameters. We propose a new set of quartic trivariate polynomial equations in the unknown coordinates of the plane at infinity, derived under the no-skew assumption. Our new equations make it possible to further enforce the constancy of the principal point across all images while retrieving the plane at infinity. Six such polynomials, four of which are independent, are obtained for each triplet of images. The proposed equations can be solved along with the so-called modulus constraints and improve the performance of existing methods.
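The ICPR 2012 entry above replaces the usual Gaussian kernels of the bilateral filter with exponential kernels on L1 distances. A minimal 1-D sketch of that variant is given below; the parameter names and values are illustrative assumptions, and the SURE-based parameter selection from the paper is not reproduced here.

```python
import numpy as np

def bilateral_1d(signal, half_width, lam_s, lam_r):
    """Bilateral filter on a 1-D signal with exponential (Laplacian) kernels
    exp(-|d| / lambda) on both the spatial and the range (intensity) distances."""
    n = len(signal)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        idx = np.arange(lo, hi)
        # Spatial weight decays with |position difference|, range weight with
        # |intensity difference|; both use L1 distances in exponential kernels.
        w = (np.exp(-np.abs(idx - i) / lam_s)
             * np.exp(-np.abs(signal[idx] - signal[i]) / lam_r))
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out

# A noisy step edge: the plateaus are smoothed while the edge stays sharp,
# because cross-edge samples receive a negligible range weight.
rng = np.random.default_rng(0)
step = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * rng.standard_normal(100)
smoothed = bilateral_1d(step, half_width=5, lam_s=2.0, lam_r=0.1)
assert abs(smoothed[75] - smoothed[25]) > 0.8  # the edge is preserved
```

The qualitative behavior matches the entry's claim: the exponential kernels decay faster near zero than Gaussians, so edges are weighted down more sharply while smooth regions are still averaged.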