References of "Guo, Yuejun 50039982"
Peer Reviewed
An Empirical Study on Data Distribution-Aware Test Selection for Deep Learning Enhancement
Hu, Qiang UL; Guo, Yuejun UL; Cordy, Maxime UL et al.

in ACM Transactions on Software Engineering and Methodology (2022)

Similar to traditional software that is constantly under evolution, deep neural networks (DNNs) need to evolve with the rapid growth of test data for continuous enhancement, e.g., adapting to distribution shift in a new deployment environment. However, it is labor-intensive to manually label all the collected test data. Test selection solves this problem by strategically choosing a small set to label. By retraining with the selected set, DNNs can achieve competitive accuracy. Unfortunately, existing selection metrics suffer from three main limitations: 1) using different retraining processes; 2) ignoring data distribution shifts; 3) being insufficiently evaluated. To fill this gap, we first conduct a systematic empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then, based on our findings, we propose a novel distribution-aware test (DAT) selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. No existing selection metric performs best across all data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to 5 times and 30.09% in accuracy improvement for model enhancement on simulated and in-the-wild distribution shift scenarios, respectively.

Peer Reviewed
Towards Exploring the Limitations of Active Learning: An Empirical Study
Hu, Qiang UL; Guo, Yuejun UL; Cordy, Maxime UL et al.

in The 36th IEEE/ACM International Conference on Automated Software Engineering (2021)

Deep neural networks (DNNs) are being increasingly deployed as integral parts of software systems. However, due to the complex interconnections among hidden layers and the large number of hyperparameters, DNNs must be trained on a large number of labeled inputs, which calls for extensive human effort in collecting and labeling data. To alleviate this growing demand, a surge of state-of-the-art studies has proposed different metrics to select a small yet informative dataset for model training. These works have demonstrated that DNN models can achieve competitive performance using a carefully selected small set of data. However, the literature lacks a proper investigation of the limitations of data selection metrics, which is crucial for applying them in practice. In this paper, we fill this gap and conduct an extensive empirical study to explore the limits of selection metrics. Our study involves 15 selection metrics evaluated over 5 datasets (2 image classification tasks and 3 text classification tasks), 10 DNN architectures, and 20 labeling budgets (ratio of training data being labeled). Our findings reveal that, while selection metrics are usually effective in producing accurate models, they may induce a loss of model robustness (against adversarial examples) and resilience to compression. Overall, we demonstrate the existence of a trade-off between labeling effort and different model qualities. This paves the way for future research in devising selection metrics that consider multiple quality criteria.

Peer Reviewed
A scalable method to construct compact road networks from GPS trajectories
Guo, Yuejun UL; Bardera, Anton; Fort, Marta et al.

in International Journal of Geographical Information Science (2020), 0(0), 1-37

The automatic generation of road networks from GPS tracks is a challenging problem that has received considerable attention in recent years. Although dozens of methods have been proposed, current techniques suffer from two main shortcomings: the quality of the produced road networks is still far from that of manually produced ones, and the methods are slow, which prevents them from scaling to large inputs. In this paper, we present a fast four-step density-based approach to construct a road network from a set of trajectories. A key aspect of our method is the use of an improved version of the Slide method to adjust trajectories to build a more compact density surface. The resulting network has comparable or better quality than that of state-of-the-art methods and is simpler (it includes fewer nodes and edges). Furthermore, we propose a split-and-merge strategy that splits the data domain into smaller regions that can be processed independently, making the method scalable to large inputs. The performance of our method is evaluated with extensive experiments on urban and hiking data.
