This study investigates the inability of two popular data-splitting techniques, the train/test split and k-fold cross-validation, which are used to create training and validation data sets, to achieve sufficient generality for supervised deep learning (DL) methods. This failure is mainly caused by their limited ability to create new data. The bootstrap, by contrast, is a computer-based statistical resampling method that has been used efficiently to estimate the distribution of a sample estimator and to assess a model without knowledge of the population. This paper couples cross-validation and the bootstrap to combine their respective advantages as data generation strategies and to achieve better generalization of a DL model. Its contributions are: (i) developing an algorithm for better selection of training and validation data sets, (ii) exploring the potential of the bootstrap for drawing statistical inference on the necessary performance metrics (e.g., mean square error), and (iii) introducing a method that can assess and improve the efficiency of a DL model. The proposed method is applied to semantic segmentation and is demonstrated with a DL-based classification algorithm, PointNet, on aerial laser scanning point cloud data.
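The general idea of pairing k-fold cross-validation with the bootstrap can be illustrated with a minimal sketch. It is not the kCV-B algorithm from the paper, whose details are not given in this record. It assumes a toy least-squares line in place of a DL model, synthetic data, and a plain percentile bootstrap over out-of-fold squared errors; all function names, parameter values, and the data are hypothetical.

```python
import numpy as np


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)


def out_of_fold_sq_errors(X, y, k, rng):
    """Fit a least-squares line on k-1 folds and score the held-out fold.

    The linear fit is a stand-in (assumption) for training a DL model.
    Returns one squared error per sample, each computed while that
    sample was held out of training.
    """
    n = len(y)
    sq_err = np.empty(n)
    for val_idx in kfold_indices(n, k, rng):
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        # Design matrix with an intercept column.
        A_train = np.column_stack([X[train_mask], np.ones(train_mask.sum())])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        A_val = np.column_stack([X[val_idx], np.ones(len(val_idx))])
        sq_err[val_idx] = (A_val @ coef - y[val_idx]) ** 2
    return sq_err


def bootstrap_mse_ci(sq_err, n_boot, alpha, rng):
    """Percentile bootstrap interval for the mean square error (MSE)."""
    n = len(sq_err)
    # Each row is one bootstrap resample of the n errors, drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_mse = sq_err[idx].mean(axis=1)
    lo, hi = np.quantile(boot_mse, [alpha / 2, 1 - alpha / 2])
    return sq_err.mean(), lo, hi


# Synthetic data (assumption): a noisy line y = 2x + 1.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * X + 1.0 + rng.normal(0.0, 0.1, size=200)

sq_err = out_of_fold_sq_errors(X, y, k=5, rng=rng)
mse, lo, hi = bootstrap_mse_ci(sq_err, n_boot=1000, alpha=0.05, rng=rng)
print(f"cross-validated MSE = {mse:.4f}, 95% bootstrap interval = [{lo:.4f}, {hi:.4f}]")
```

In this sketch, cross-validation supplies an out-of-sample error for every observation, and the bootstrap then resamples those errors to attach an interval to the MSE without assuming a population distribution.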
Research center :
ULHPC - University of Luxembourg: High Performance Computing
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Nurunnabi, Abdul Awal Md ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE)
Teferle, Felix Norman ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE)
Laefer, Debra; Center for Urban Science and Progress ; Department of Civil and Urban Engineering > New York University, USA
Remondino, Fabio; 3D Optical Metrology (3DOM) unit > Bruno Kessler Foundation (FBK), Trento, Italy
Karas, Ismail Rakip; Department of Computer Engineering > Karabuk University, Karabuk, Turkey
Li, Jonathan; Geography and Environmental Management > University of Waterloo, Canada
External co-authors :
yes
Language :
English
Title :
kCV-B: Bootstrap with Cross-Validation for Deep Learning Model Development, Assessment and Selection
Publication date :
October 2022
Event name :
The 7th International Conference on Smart City Applications
Event place :
Castelo Branco, Portugal
Event date :
from 19-10-2022 to 21-10-2022
Audience :
International
Main work title :
kCV-B: Bootstrap with Cross-Validation for Deep Learning Model Development, Assessment and Selection