Abstract:
The rapid evolution of machine learning model training has outpaced the development of corresponding measurement and testing tools, leading to two significant challenges. First, developers of deep learning frameworks struggle to identify operators that fail to meet precision criteria, as these issues may manifest in only a few data points. Second, model trainers lack effective methods to estimate the precision loss caused by operators during training. To address these issues, we introduce a Pythonic framework inspired by the layer definitions common in deep learning networks. Our framework includes two new layers designed to aid measurement and testing: the Fuzz Layer and the Check Layer. The Fuzz Layer introduces minor perturbations to the tensor inputs of any deterministic layer under test (LUT). The Check Layer then measures precision by analyzing the difference between the outputs before and after the perturbation. This approach estimates a lower bound of the maximal relative error and alerts developers or trainers to potential bugs if the difference exceeds a predefined tolerance. Check Layers can also be used independently to conduct precision tests for specific layers, verifying the precision of operators at runtime. Despite the additional memory and time requirements, this runtime testing ensures that the original model is trained properly. We demonstrate the utility of our framework, FuMi, through two experiments. First, we tested 21 torch operators across 9 popular machine learning models implemented in PyTorch for various tasks, and found that the conv2d and linear operators often fail to meet precision requirements. Second, to showcase the generalizability of our framework, we tested the ATTENTION operator. By comparing different implementations of state-of-the-art ATTENTION operators, we found that the maximum relative error of the ATTENTION operator is not less than 1%, which is 13 times larger than that measured by Predoo (a unit testing tool). This framework provides a robust solution for identifying precision issues in deep learning models, ensuring reliable and accurate model training.
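To illustrate the Fuzz/Check mechanism summarized above, the sketch below shows one way such layers could be expressed in PyTorch. It is a minimal illustration only: the class names FuzzLayer and CheckLayer, the perturbation scale epsilon, and the tolerance threshold are assumptions for exposition, not the paper's actual FuMi implementation.

import torch
import torch.nn as nn

class FuzzLayer(nn.Module):
    """Illustrative sketch: add a small relative perturbation to the input of the layer under test."""
    def __init__(self, epsilon=1e-6):  # epsilon is an assumed, illustrative perturbation scale
        super().__init__()
        self.epsilon = epsilon

    def forward(self, x):
        noise = torch.randn_like(x) * self.epsilon * x.abs()
        return x, x + noise  # keep both the clean and the perturbed copy

class CheckLayer(nn.Module):
    """Illustrative sketch: compare the LUT's outputs on clean vs. perturbed inputs."""
    def __init__(self, tolerance=1e-3):  # tolerance is an assumed, illustrative threshold
        super().__init__()
        self.tolerance = tolerance

    def forward(self, y_clean, y_fuzzed):
        denom = y_clean.abs().clamp_min(torch.finfo(y_clean.dtype).tiny)
        rel_err = ((y_fuzzed - y_clean).abs() / denom).max()
        if rel_err > self.tolerance:
            print(f"precision warning: max relative error {rel_err.item():.3e} "
                  f"exceeds tolerance {self.tolerance:.1e}")
        return rel_err

# Usage sketch: wrap a deterministic layer under test (LUT), e.g. nn.Linear.
lut = nn.Linear(64, 64)
fuzz, check = FuzzLayer(), CheckLayer()
x = torch.randn(8, 64)
x_clean, x_fuzzed = fuzz(x)
rel_err = check(lut(x_clean), lut(x_fuzzed))  # a lower bound on the maximal relative error

Under these assumptions, the measured rel_err serves only as a lower bound on the operator's maximal relative error, which is the quantity the framework reports.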