Convergence Analysis of Decentralized ASGD
Dalle Lucca Tosi, Mauro UL; Theobald, Martin UL

E-print/Working paper (2023)

Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, optimizing large models via SGD is still a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) is always faster than that of mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to becoming a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which requires neither partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of O(σ ε⁻²) + O(Q S_avg ε⁻³ᐟ²) + O(S_avg ε⁻¹) for the convergence rate of DASGD, where S_avg is the average staleness between models, Q is a constant that bounds the norm of the gradients, and ε is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be O(σ ε⁻²) + O(√(Ŝ_avg Ŝ_max) ε⁻¹), with Ŝ_avg and Ŝ_max representing loose versions of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.
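
The notion of staleness in these bounds can be made concrete with a toy simulation. The sketch below is only an illustration under our own assumptions. It is not the paper's algorithm or proof setting. It uses:

- a quadratic objective f(x) = ½‖x‖², which is L-smooth with L = 1;
- Gaussian gradient noise and a fixed stepsize;
- a hypothetical `delay` parameter that models how many steps a broadcast gradient needs to reach the other workers.

Staleness is measured simply as the number of global steps between computing a gradient and applying it.

```python
import random

def dasgd_simulation(num_workers=4, dim=3, steps=400, lr=0.05, delay=3,
                     noise=0.1, seed=0):
    """Toy decentralized asynchronous SGD on f(x) = 0.5 * ||x||^2.

    At each step one randomly chosen worker computes a noisy gradient on its
    own local model, applies it immediately, and broadcasts it to all peers.
    Peers apply it `delay` steps later (hypothetical network delay).
    Returns the final local models and the average observed staleness.
    """
    rng = random.Random(seed)
    models = [[1.0] * dim for _ in range(num_workers)]
    in_flight = []   # entries: (arrival_step, receiver, gradient, sent_step)
    staleness = []

    for t in range(steps):
        # Deliver every broadcast gradient whose arrival step has come.
        pending = []
        for arrival, receiver, grad, sent in in_flight:
            if arrival <= t:
                models[receiver] = [x - lr * g for x, g in zip(models[receiver], grad)]
                staleness.append(t - sent)
            else:
                pending.append((arrival, receiver, grad, sent))
        in_flight = pending

        # The gradient of 0.5 * ||x||^2 is x; add Gaussian noise to make it stochastic.
        w = rng.randrange(num_workers)
        grad = [x + rng.gauss(0.0, noise) for x in models[w]]
        models[w] = [x - lr * g for x, g in zip(models[w], grad)]
        staleness.append(0)  # the local update is applied without delay
        for peer in range(num_workers):
            if peer != w:
                in_flight.append((t + delay, peer, grad, t))

    return models, sum(staleness) / len(staleness)

if __name__ == "__main__":
    final_models, avg_staleness = dasgd_simulation()
    print("average staleness:", round(avg_staleness, 3))
    print("worker 0 model:", [round(x, 3) for x in final_models[0]])
```

With these settings, every peer update is applied exactly `delay` steps late and every local update is applied immediately. The reported average staleness therefore lies between 0 and `delay`, while all local models still approach the minimizer at the origin.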

OPTWIN: Drift identification with optimal sub-windows
Dalle Lucca Tosi, Mauro UL; Theobald, Martin UL

E-print/Working paper (2023)

Online Learning (OL) is a field of research that is increasingly gaining attention both in academia and industry. One of the main challenges of OL is the inherent presence of concept drifts, which are commonly defined as unforeseeable changes in the statistical properties of an incoming data stream over time. Detecting concept drifts typically involves analyzing the error rates produced by an underlying OL algorithm to determine whether a concept drift occurred, so that the OL algorithm can adapt accordingly. Current concept-drift detectors perform very well in terms of false negative rates, but they still tend to exhibit high false positive rates. This may impact the performance of the learner and result in an undue amount of computational resources spent on retraining a model that actually still performs within its expected range. In this paper, we propose OPTWIN, our “OPTimal WINdow” concept drift detector. OPTWIN uses a sliding window of events over an incoming data stream to track the errors of an OL algorithm. The novelty of OPTWIN is to consider both the means and the variances of the error rates produced by a learner in order to split the sliding window into two provably optimal sub-windows, such that the split occurs at the earliest event at which a statistically significant difference according to either the t-test or the F-test occurred. We evaluated OPTWIN in the MOA framework, using ADWIN, DDM, EDDM, STEPD, and ECDD as baselines over 7 synthetic and real-world datasets, and in the presence of both sudden and gradual concept drifts. In our experiments, we show that OPTWIN achieves a statistically significantly higher F1-score than the baselines while maintaining a lower detection delay and saving up to 21% of the time spent on retraining the models.
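
The split criterion above can be illustrated with a simplified detector. This is not OPTWIN itself, which derives provably optimal sub-window sizes. The differences are:

- The sketch feeds error values one at a time and scans every split point of the current sliding window.
- It compares the two sub-windows with Welch's t statistic for the means and a variance ratio (F statistic).
- It tests against hypothetical fixed thresholds `t_crit` and `f_crit` instead of proper quantiles of the t- and F-distributions.

```python
import math
from statistics import mean, variance

def split_is_significant(hist, new, t_crit, f_crit):
    """Welch t statistic on the means and F statistic (variance ratio) on the
    variances, both compared against hypothetical fixed critical values."""
    m1, m2 = mean(hist), mean(new)
    v1, v2 = variance(hist), variance(new)
    se = math.sqrt(v1 / len(hist) + v2 / len(new))
    t_stat = abs(m1 - m2) / se if se > 0 else (math.inf if m1 != m2 else 0.0)
    lo, hi = min(v1, v2), max(v1, v2)
    f_stat = hi / lo if lo > 0 else (math.inf if hi > 0 else 1.0)
    return t_stat > t_crit or f_stat > f_crit

def detect_drift(stream, max_window=100, min_size=5, t_crit=3.0, f_crit=4.0):
    """Feed error values one by one and return (position, split) for the first
    event at which some split of the sliding window is significant.

    `split` is the absolute index of the earliest significant split point,
    or the function returns None if no drift is detected."""
    for t in range(len(stream)):
        start = max(0, t + 1 - max_window)
        window = stream[start:t + 1]
        # Both sub-windows need at least `min_size` events.
        for i in range(min_size, len(window) - min_size + 1):
            if split_is_significant(window[:i], window[i:], t_crit, f_crit):
                return t, start + i
    return None

if __name__ == "__main__":
    stable = [0.2, 0.3] * 20            # alternating error rates, no drift
    drifted = stable + [0.7, 0.8] * 20  # sudden jump in the error rate
    print(detect_drift(stable), detect_drift(drifted))
```

In this toy stream, the variance test already fires at the first drifted event. That is the kind of behavior that motivates considering variances and not only means.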

Peer Reviewed
Efficient Hessian-based DNN Optimization via Chain-Rule Approximation
Temperoni, Alessandro UL; Dalle Lucca Tosi, Mauro UL; Theobald, Martin UL

in Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD) (2023)

Learning models that are not use-case specific has been shown to be a challenging task in Deep Learning (DL). Hyperparameter tuning requires long training sessions that have to be restarted any time the network or the dataset changes and are not affordable for most stakeholders in industry and research. Many attempts have been made to justify and understand the source of the use-case specificity that distinguishes DL problems. To date, second-order optimization methods have been partially shown to be effective in some cases but have not been sufficiently investigated in the context of learning and optimization. In this work, we present a chain rule for the efficient approximation of the Hessian matrix (i.e., the second-order derivatives) of the weights across the layers of a Deep Neural Network (DNN). We show the application of our approach to weight optimization during DNN training, as we believe that this step particularly suffers from the enormous variety of optimizers provided by state-of-the-art libraries such as Keras and PyTorch. We demonstrate, both theoretically and empirically, the improved accuracy of our approximation technique and that the Hessian is a useful diagnostic tool that helps to optimize training more rigorously. Our preliminary experiments demonstrate the efficiency as well as the improved convergence of our approach, both of which are crucial aspects of DNN training.
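
For reference, the exact second-order chain rule from which a layer-wise Hessian approximation can start is shown below. It applies to a composition y = f(g(x)) with g: ℝⁿ → ℝᵐ, Jacobian J_g(x) ∈ ℝ^{m×n}, and f: ℝᵐ → ℝ. This is the standard identity, not the paper's approximation. A well-known simplification, the (generalized) Gauss-Newton approximation, keeps only the first term.

```latex
\nabla_x^2 (f \circ g)(x)
  = J_g(x)^{\top} \, \nabla^2 f\big(g(x)\big) \, J_g(x)
  + \sum_{k=1}^{m} \frac{\partial f}{\partial z_k}\Big|_{z = g(x)} \, \nabla^2 g_k(x)
```

Applied recursively through the layers of a network, g plays the role of one layer and f the role of the remaining layers plus the loss.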

TensAIR: Online Learning from Data Streams via Asynchronous Iterative Routing
Dalle Lucca Tosi, Mauro UL; Ellampallil Venugopal, Vinu UL; Theobald, Martin UL

E-print/Working paper (2023)

Online learning (OL) from data streams is an emerging area of research that encompasses numerous challenges from stream processing, machine learning, and networking. Recent extensions of stream-processing platforms, such as Apache Kafka and Flink, already provide basic support for training neural networks in a stream-processing pipeline. However, these extensions are not scalable and flexible enough for many real-world use-cases, since they do not integrate the neural-network libraries as first-class citizens into their architectures. In this paper, we present TensAIR, which provides an end-to-end dataflow engine for OL from data streams via a protocol to which we refer as asynchronous iterative routing. TensAIR supports the common dataflow operators, such as Map, Reduce, and Join, and has been augmented by the data-parallel OL functions train and predict. These belong to the new Model operator, in which an initial TensorFlow model (either freshly initialized or pre-trained) is replicated among multiple decentralized worker nodes. Our decentralized architecture allows TensAIR to efficiently shard incoming data batches across the distributed model replicas, which in turn trigger the model updates via asynchronous stochastic gradient descent. We empirically demonstrate that TensAIR achieves a nearly linear scale-out in terms of (1) the number of worker nodes deployed in the network, and (2) the throughput at which the data batches arrive at the dataflow operators. We exemplify the versatility of TensAIR by investigating both sparse (Word2Vec) and dense (CIFAR-10) use-cases, for which we are able to demonstrate very significant performance improvements in comparison to Kafka, Flink, and Horovod. We also demonstrate the magnitude of these improvements by showing that real-time concept-drift adaptation of a sentiment analysis model trained over a Twitter stream is possible.
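
The sharding and train path of a Model operator can be sketched as a single-process simulation. This is a hypothetical illustration, not TensAIR's API. The names `ModelReplica`, `mailbox`, and `route_batch` are ours, and a one-parameter linear model with squared loss stands in for the TensorFlow model. Incoming batches are routed round-robin to replicas. Each replica trains on its batch, applies its own gradient immediately, and sends the gradient to its peers. The peers apply it before they train on their next batch.

```python
from collections import deque

class ModelReplica:
    """One replica of a toy linear model y = w * x (hypothetical stand-in
    for a TensorFlow model held by a worker node)."""

    def __init__(self, rank, lr=0.05):
        self.rank = rank
        self.w = 0.0
        self.lr = lr
        self.mailbox = deque()  # gradients received from peer replicas

    def apply_pending(self):
        # Apply every gradient that peers have sent so far.
        while self.mailbox:
            self.w -= self.lr * self.mailbox.popleft()

    def train(self, batch, replicas):
        self.apply_pending()
        # Gradient of the mean of 0.5 * (w*x - y)^2 with respect to w.
        grad = sum((self.w * x - y) * x for x, y in batch) / len(batch)
        self.w -= self.lr * grad
        for peer in replicas:
            if peer is not self:
                peer.mailbox.append(grad)  # broadcast to the other replicas

    def predict(self, x):
        return self.w * x

def route_batch(batch_id, replicas):
    """Shard incoming batches across replicas in round-robin order."""
    return replicas[batch_id % len(replicas)]

def run_stream(num_replicas=3, num_batches=300):
    replicas = [ModelReplica(rank) for rank in range(num_replicas)]
    # Toy stream whose true relation is y = 2x.
    stream = [[(x, 2.0 * x) for x in (0.5, 1.0, 1.5)] for _ in range(num_batches)]
    for batch_id, batch in enumerate(stream):
        route_batch(batch_id, replicas).train(batch, replicas)
    for r in replicas:
        r.apply_pending()
    return replicas

if __name__ == "__main__":
    print([round(r.predict(1.0), 3) for r in run_stream()])
```

This simulation runs sequentially, so every replica drains its mailbox before training and no gradient is actually stale here. In a distributed deployment, gradients would arrive with network-dependent delays, which is where asynchronous SGD and its staleness come into play.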

Peer Reviewed
Convergence time analysis of Asynchronous Distributed Artificial Neural Networks
Dalle Lucca Tosi, Mauro UL; Ellampallil Venugopal, Vinu; Theobald, Martin UL

in 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD) (2022)

Artificial Neural Networks (ANNs) have drawn attention from academia and industry for their ability to represent and solve complex problems. Researchers are studying how to distribute their computation to reduce their training time. However, the most common approaches in this direction are synchronous, leaving computational resources underutilized. Asynchronous training does not have this drawback but is impacted by stale gradient updates, which have not yet been extensively researched. Considering this, we experimentally investigate how stale gradients affect the convergence time and loss value of an ANN. In particular, we analyze an asynchronous distributed implementation of a Word2Vec model and find that the impact of staleness is negligible and can be ignored, given the computational speedup achieved by allowing staleness.

Peer Reviewed
Understanding the evolution of a scientific field by clustering and visualizing knowledge graphs
Dalle Lucca Tosi, Mauro UL; dos Reis, Julio Cesar

in Journal of Information Science (2022), 48(1), 71–89

Tracking the evolution of a scientific field is an arduous process, yet it allows researchers to understand trends in areas of science and predict how they may evolve. Nowadays, most of the automated mechanisms developed to assist researchers in this process consider only article metadata, not the content of the articles, when identifying changes in a field's structure. These methods are ill-suited to help researchers study the concepts that compose an area and its evolution. In this article, we propose a method to track the evolution of a scientific field at the concept level. Our method structures a scientific field using two knowledge graphs, representing distinct periods of the studied field. Then, it clusters them and identifies corresponding clusters between the knowledge graphs, representing the same sub-areas in distinct time periods. Our solution enables comparing the corresponding clusters to track their evolution. We apply our method in two case studies concerning the Artificial Intelligence and Biotechnology fields. Findings indicate that our implemented software tool is well suited to assessing the evolution of these fields. Our analyses revealed evolution both in broader sub-areas of a scientific field, such as the growth of the "Convolutional Neural Network" area from 2006, and in more specific ones, such as the decrease in research using mice to study BRAF-mutation lung cancer from 2018. This work contributes a web application with interactive user interfaces to assist researchers in representing, analyzing, and tracking the evolution of scientific fields at the concept level.
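
One plausible way to pair clusters from the two period graphs is to compare their concept sets. The sketch below assumes each cluster is given as a set of concept labels. It matches each earlier cluster with the later cluster of highest Jaccard similarity, provided that similarity reaches a hypothetical `min_sim` threshold. It then lists the concepts that entered or left a matched sub-area. The paper's actual correspondence criterion may differ, and the concept sets in the usage example are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets."""
    return len(a & b) / len(a | b) if a or b else 0.0

def match_clusters(old_clusters, new_clusters, min_sim=0.2):
    """Pair each cluster of the earlier knowledge graph with the most
    similar cluster of the later one (by Jaccard similarity)."""
    matches = {}
    for old_name, old_concepts in old_clusters.items():
        best_name, best_sim = None, 0.0
        for new_name, new_concepts in new_clusters.items():
            sim = jaccard(old_concepts, new_concepts)
            if sim > best_sim:
                best_name, best_sim = new_name, sim
        if best_sim >= min_sim:
            matches[old_name] = (best_name, best_sim)
    return matches

def evolution(old_concepts, new_concepts):
    """Concepts that appeared in, or disappeared from, a matched sub-area."""
    return {"added": sorted(new_concepts - old_concepts),
            "removed": sorted(old_concepts - new_concepts)}

if __name__ == "__main__":
    # Made-up toy clusters for two periods.
    old = {"A": {"neural network", "backpropagation", "perceptron"},
           "B": {"gene", "protein", "mouse model"}}
    new = {"X": {"neural network", "backpropagation", "convolutional neural network"},
           "Y": {"gene", "protein", "crispr"}}
    for old_name, (new_name, sim) in match_clusters(old, new).items():
        print(old_name, "->", new_name, sim, evolution(old[old_name], new[new_name]))
```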

Peer Reviewed
SciKGraph: A knowledge graph approach to structure a scientific field
Dalle Lucca Tosi, Mauro UL; dos Reis, Julio Cesar

in Journal of Informetrics (2021), 15(1), 101109

Understanding the structure of a scientific domain and extracting specific information from it is laborious. The large amount of manual effort required to this end indicates that the way knowledge has been structured and visualized until the present day should be improved in software tools. Nowadays, scientific domains are organized based on citation networks or bag-of-words techniques, disregarding the intrinsic semantics of concepts presented in literature documents. We propose a novel approach to structure scientific fields, which uses semantic analysis of natural language texts to construct knowledge graphs. Then, our approach clusters the knowledge graphs into their main topics and automatically extracts information such as the most relevant concepts in each topic and the concepts that overlap between topics. We evaluate the proposed model on two datasets from distinct areas. The results achieve up to 84% accuracy in the task of document classification without using annotated data to segment topics from a set of input documents. Our solution identifies coherent keyphrases and key concepts for the dataset used. The SciKGraph framework contributes by structuring knowledge that might aid researchers in the study of their areas, reducing the effort and time devoted to groundwork.
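
Once the knowledge graph has been clustered, extracting the most relevant concepts per topic and the concepts shared between topics can be sketched as follows. The sketch assumes that edges carry co-occurrence weights and that clusters may overlap, given as sets of concepts. It uses weighted degree inside a cluster as a hypothetical relevance score; SciKGraph's own ranking may differ.

```python
from collections import defaultdict

def top_concepts(edges, cluster, k=3):
    """Rank a cluster's concepts by weighted degree inside the cluster.

    edges: dict mapping (concept_a, concept_b) -> co-occurrence weight.
    cluster: set of concepts belonging to one topic.
    Ties are broken alphabetically.
    """
    score = defaultdict(float)
    for (a, b), w in edges.items():
        if a in cluster and b in cluster:  # only count intra-cluster edges
            score[a] += w
            score[b] += w
    ranked = sorted(cluster, key=lambda c: (-score[c], c))
    return ranked[:k]

def overlapping_concepts(clusters):
    """Map each concept that belongs to more than one topic to those topics."""
    seen = defaultdict(set)
    for name, members in clusters.items():
        for concept in members:
            seen[concept].add(name)
    return {c: sorted(names) for c, names in seen.items() if len(names) > 1}

if __name__ == "__main__":
    edges = {("graph", "node"): 3, ("graph", "edge"): 2, ("node", "edge"): 1,
             ("graph", "cluster"): 1, ("cluster", "topic"): 2}
    clusters = {"graphs": {"graph", "node", "edge"},
                "clustering": {"graph", "cluster", "topic"}}
    print(top_concepts(edges, clusters["graphs"], k=2))
    print(overlapping_concepts(clusters))
```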

Peer Reviewed
Keyphrase extraction from single textual documents based on semantically defined background knowledge and co-occurrence graphs
Dalle Lucca Tosi, Mauro UL; Reis, Julio Cesar Dos

in International Journal of Metadata, Semantics and Ontologies (2021), 15(2), 121–132

Keyphrase extraction is a fundamental and challenging task that aims to automatically extract a set of keyphrases from textual documents. Keyphrases help publishers index documents and help readers identify the most relevant ones. They are short phrases composed of one or more terms that best represent a textual document and its main topics. In this article, we extend our research on C-Rank, an unsupervised approach that automatically extracts keyphrases from single documents. C-Rank uses a concept-linking approach that links the concepts a single document shares with an external background knowledge base. Our approach uses those concepts as candidate keyphrases, which are modeled in a co-occurrence graph. On this basis, keyphrases are extracted relying on heuristics and their centrality in the graph. We advance our study of C-Rank by evaluating it with different concept-linking approaches: Babelfy and DBpedia Spotlight. The evaluation was performed on five gold-standard datasets composed of distinct types of data: academic articles, academic abstracts, and news articles. An experimental comparison with other existing unsupervised approaches indicates that C-Rank achieves state-of-the-art results in extracting keyphrases from scientific documents.
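
The graph-based ranking step can be sketched with a co-occurrence graph whose vertices are the linked concepts. The sketch makes three assumptions:

- Concept linking (e.g., with Babelfy or DBpedia Spotlight) has already produced a list of candidate concepts for each sentence.
- Concepts that occur in the same sentence are connected.
- Plain degree centrality stands in for the combination of heuristics and centrality that C-Rank actually uses.

```python
from itertools import combinations

def rank_candidates(sentences, top_k=3):
    """sentences: list of lists of linked concepts, one list per sentence.

    Builds an undirected co-occurrence graph (concepts that appear in the
    same sentence are connected) and ranks concepts by degree centrality,
    breaking ties alphabetically.
    """
    graph = {}
    for concepts in sentences:
        unique = sorted(set(concepts))
        for c in unique:
            graph.setdefault(c, set())  # every linked concept is a vertex
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    n = len(graph)
    centrality = {c: len(nb) / (n - 1) if n > 1 else 0.0 for c, nb in graph.items()}
    ranked = sorted(centrality, key=lambda c: (-centrality[c], c))
    return ranked[:top_k]

if __name__ == "__main__":
    sentences = [["keyphrase extraction", "concept linking", "babelfy"],
                 ["concept linking", "knowledge base"],
                 ["keyphrase extraction", "concept linking"],
                 ["graph"]]
    print(rank_candidates(sentences))
```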

Peer Reviewed
C-rank: a concept linking approach to unsupervised keyphrase extraction
Dalle Lucca Tosi, Mauro UL; Reis, Julio Cesar Dos

in Research Conference on Metadata and Semantics Research (2019)

Keyphrase extraction is the task of identifying a set of phrases that best represent a natural language document. It is a fundamental and challenging task that helps publishers index and recommend relevant documents to readers. In this article, we introduce C-Rank, a novel unsupervised approach to automatically extract keyphrases from single documents by using concept linking. Our method uses Babelfy to identify candidate keyphrases, which are weighted based on heuristics and their centrality inside a co-occurrence graph where keyphrases appear as vertices. It improves on the results obtained by graph-based techniques without requiring training data or background data provided by users. Evaluations are performed on the SemEval and INSPEC datasets, producing results competitive with state-of-the-art tools. Furthermore, C-Rank generates intermediate structures with semantically annotated data that can be used to analyze larger textual compendiums, which might improve domain understanding and enrich textual representation methods.
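
Keyphrase extraction results on datasets such as SemEval and INSPEC are commonly reported as precision, recall, and F1 of the top-k extracted keyphrases against a gold-standard set. The sketch below uses exact, case-insensitive matching. This is an assumed, simplified protocol (benchmarks often also apply stemming), not necessarily the one used in the paper.

```python
def prf_at_k(predicted, gold, k):
    """Precision, recall and F1 of the top-k predicted keyphrases,
    using exact case-insensitive matching against the gold set."""
    top = [p.lower() for p in predicted[:k]]
    gold_set = {g.lower() for g in gold}
    hits = len(set(top) & gold_set)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    predicted = ["Concept Linking", "keyphrase extraction", "graph", "babelfy"]
    gold = ["keyphrase extraction", "concept linking", "co-occurrence graph"]
    print(prf_at_k(predicted, gold, k=3))
```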

Hybrid model for word prediction using naive bayes and latent information
Goulart, Henrique X.; Dalle Lucca Tosi, Mauro UL; Goncalves, Daniel et al

E-print/Working paper (2018)

Historically, the Natural Language Processing area has received much attention from many researchers. One of the main motivations behind this interest is the word prediction problem: given a set of words in a sentence, recommend the next word. In the literature, this problem is solved by methods based on syntactic or semantic analysis. On its own, neither kind of analysis achieves practical results for end-user applications. For instance, Latent Semantic Analysis can handle semantic features of text, but cannot suggest words considering syntactic rules [1]. On the other hand, there are models that treat both aspects together and achieve state-of-the-art results, e.g., Deep Learning. These models can demand high computational effort, which can make them infeasible for certain types of applications. With advances in technology and mathematical models, it is possible to develop faster and more accurate systems. This work proposes a hybrid word suggestion model, based on Naive Bayes and Latent Semantic Analysis, that considers the neighboring words around unfilled gaps. Results show that this model achieves 44.2% accuracy on the MSR Sentence Completion Challenge.
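
The Naive Bayes half of such a model can be sketched as follows. Each candidate word w for a gap is scored by log P(w) + Σ log P(n | w) over the neighboring words n, with add-one smoothing. The window size of 2, the smoothing choice, and the toy corpus are our assumptions. The Latent Semantic Analysis component of the hybrid model is omitted.

```python
import math
from collections import Counter, defaultdict

def train_counts(sentences, window=2):
    """Count word frequencies and, for each word, the words that appear
    within `window` positions of it."""
    word_counts = Counter()
    context_counts = defaultdict(Counter)
    vocab = set()
    for sent in sentences:
        vocab.update(sent)
        for i, w in enumerate(sent):
            word_counts[w] += 1
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    context_counts[w][sent[j]] += 1
    return word_counts, context_counts, vocab

def fill_gap(left, right, candidates, model, window=2):
    """Pick the candidate maximizing log P(w) + sum of log P(neighbor | w),
    with add-one (Laplace) smoothing over the vocabulary."""
    word_counts, context_counts, vocab = model
    total = sum(word_counts.values())
    neighbors = left[-window:] + right[:window]

    def score(w):
        s = math.log((word_counts[w] + 1) / (total + len(vocab)))
        ctx_total = sum(context_counts[w].values())
        for n in neighbors:
            s += math.log((context_counts[w][n] + 1) / (ctx_total + len(vocab)))
        return s

    return max(candidates, key=score)

if __name__ == "__main__":
    corpus = [["the", "cat", "drinks", "milk"],
              ["the", "dog", "drinks", "water"],
              ["the", "cat", "chases", "the", "mouse"],
              ["a", "dog", "chases", "a", "ball"]]
    model = train_counts(corpus)
    print(fill_gap(["the"], ["drinks", "milk"], ["cat", "dog"], model))
```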
