References of "Hammerschmidt, Christian 50001936"
Full Text
Peer Reviewed
Working with Deep Generative Models and Tabular Data Imputation
Camino, Ramiro Daniel UL; Hammerschmidt, Christian UL; State, Radu UL

Scientific Conference (2020, July 17)

Datasets with missing values are very common in industry applications. Missing data typically have a negative impact on machine learning models. With the rise of generative models in deep learning, recent studies proposed solutions to the problem of imputing missing values based on various deep generative models. Previous experiments with Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) showed promising results in this domain. Initially, these results focused on imputation in image data, e.g. filling missing patches in images. Recent proposals addressed missing values in tabular data. For these data, the case for deep generative models seems to be less clear. In the process of providing a fair comparison of proposed methods, we uncover several issues when assessing the status quo: the use of under-specified and ambiguous dataset names, the large range of parameters and hyper-parameters to tune for each method, and the use of different metrics and evaluation methods.

Full Text
Peer Reviewed
Time Series Modeling of Market Price in Real-Time Bidding
Du, Manxing UL; Hammerschmidt, Christian UL; Varisteas, Georgios UL et al

in 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2019, April)

Real-Time Bidding (RTB) is one of the most popular online advertisement selling mechanisms. Modeling the highly dynamic bidding environment is crucial for making good bids. Market prices of auctions fluctuate heavily within short time spans. State-of-the-art methods neglect the temporal dependencies of bidders’ behaviors. In this paper, the bid requests are aggregated by time and the mean market price per aggregated segment is modeled as a time series. We show that the Long Short-Term Memory (LSTM) neural network outperforms the state-of-the-art univariate time series models by capturing the nonlinear temporal dependencies in the market price. We further improve the predictive performance by adding a summary of exogenous features from bid requests.
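
The aggregation step mentioned in this abstract (grouping bid requests by time and taking the mean market price per segment) could look roughly like the sketch below. It is only an illustration, not the authors' code: the function name `mean_price_per_segment`, the fixed-width segments, and the sample bid log are assumptions.

```python
from collections import defaultdict


def mean_price_per_segment(bids, segment_seconds):
    """Group (timestamp, market_price) pairs into fixed-width time segments
    and return (segment_start, mean_price) pairs ordered by time.

    Segments without any bid are skipped (an assumption of this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, price in bids:
        segment = int(timestamp // segment_seconds)
        sums[segment] += price
        counts[segment] += 1
    return [
        (segment * segment_seconds, sums[segment] / counts[segment])
        for segment in sorted(sums)
    ]


# Hypothetical bid log: (seconds since start, market price).
bids = [(0, 10.0), (30, 20.0), (65, 30.0), (200, 50.0)]
series = mean_price_per_segment(bids, segment_seconds=60)
```

The resulting `series` is the univariate time series that a forecasting model (an LSTM in the paper) would then be trained on.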

Full Text
Peer Reviewed
An Experimental Analysis of Fraud Detection Methods in Enterprise Telecommunication Data using Unsupervised Outlier Ensembles
Kaiafas, Georgios UL; Hammerschmidt, Christian UL; Lagraa, Sofiane UL et al

in Kaiafas, Georgios; Hammerschmidt, Christian; State, Radu (Eds.) 16th IFIP/IEEE Symposium on Integrated Network and Service Management (IM 2019) (2019)

Full Text
Peer Reviewed
Generating Multi-Categorical Samples with Generative Adversarial Networks
Camino, Ramiro Daniel UL; Hammerschmidt, Christian UL; State, Radu UL

Scientific Conference (2018, July)

We propose a method to train generative adversarial networks on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers taking into account the structure of the data. We evaluate the performance of our architecture on datasets with different sparsity, number of features, ranges of categorical values, and dependencies among the features. Our proposed architecture and method outperform existing models.
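
To make the idea of "multiple (Gumbel) softmax output layers" concrete, the following sketch applies a separate Gumbel-softmax to each categorical variable's slice of a flat output vector. It uses only the standard library and is not the paper's architecture: the function names, the temperature value, and the example logits are assumptions chosen for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return one relaxed one-hot sample for a single categorical variable."""
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0); rng.random() lies in [0, 1)
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_categorical(logits, sizes, temperature=0.5, seed=0):
    """Split a flat logit vector into one block per categorical variable
    and apply an independent Gumbel-softmax to each block."""
    if sum(sizes) != len(logits):
        raise ValueError("sizes must add up to the number of logits")
    rng = random.Random(seed)
    sample, start = [], 0
    for size in sizes:
        sample.append(gumbel_softmax(logits[start:start + size], temperature, rng))
        start += size
    return sample


# Hypothetical generator output for two variables with 2 and 3 categories.
logits = [0.2, -0.1, 1.0, 0.0, -1.0]
sample = sample_multi_categorical(logits, sizes=[2, 3])
```

Each block of `sample` is a probability vector that sums to one, so every categorical variable gets its own near-one-hot output rather than sharing a single softmax across all features.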

Learning Finite Automata via Flexible State-Merging and Applications in Networking
Hammerschmidt, Christian UL

Doctoral thesis (2017)

Being able to model behavior described by a linear sequence of observations (such as log files) goes a long way towards better understanding the underlying processes. This improved understanding can be very helpful in a number of activities, ranging from software (reverse) engineering to network traffic analysis. The developments in this thesis were driven by specific goals in predicting (human) behaviors captured by a software appliance observing network traffic and user requests to specific resources. Its final contributions have exceeded the original goals of the project in two important ways: I present (1) a flexible learning algorithm for finite automata accompanied by theoretical underpinning and its implementation, a contribution towards better learning algorithms, and (2) applications of the algorithm to use-cases in computer networking and beyond. The central algorithm considered in the thesis is a blue-fringe state-merging automaton learning algorithm, conducting a greedy search over feasible solutions. Its key components are a heuristic to search for consistent merges and an evaluation metric to assess the quality of a merge by assigning scores to merges. I generalize this framework by making the heuristic components explicitly parametric. While state-merging algorithms were originally defined for probabilistic and non-probabilistic finite state machines and later used to derive algorithms for more extended models such as real-time automata, the work presented here extends the scope of the algorithms to a wide range of ad-hoc defined models and enables the user to implement modifications to the heuristic search process. These modifications help to account for domain knowledge and richer semantics of models with a regular language core.
I provide an implementation and a Python interface of the flexible state-merging framework, including stream/online and interactive variants of the algorithm based on a C++ implementation of the blue-fringe greedy search algorithm called DFASAT. The algorithm and the framework encompass and improve upon state-of-the-art approaches. The application problems considered in this thesis can be seen as classical classification and anomaly detection tasks in machine learning. The application domain is network traffic analysis with a focus on network security. I discuss the problematic properties of data from computer networks and address how using automaton models can help mitigate them. I then use the flexible state-merging approach for host profiling. I show how to efficiently learn finite state automata as behavioral profiles. These profiles can serve as digital fingerprints and help to identify malicious traffic such as botnet traffic. Moreover, I show how communication profiles can be used for sequence clustering on NetFlow data to distinguish different behaviors over time.

Peer Reviewed
Reliable Machine Learning for Networking: Key Concerns and Approaches
Hammerschmidt, Christian UL; Garcia, Sebastian; Verwer, Sicco et al

Poster (2017, October)

Machine learning has become one of the go-to methods for solving problems in the field of networking. This development is driven by data availability in large-scale networks and the commodification of machine learning frameworks. While this makes it easier for researchers to implement and deploy machine learning solutions on networks quickly, there are a number of vital factors to account for when using machine learning as an approach to a problem in networking and when translating test performance to real network deployments. This paper, rather than presenting a particular technical result, discusses the necessary considerations to obtain good results when using machine learning to analyze network-related data.

Full Text
Peer Reviewed
flexfringe: A Passive Automaton Learning Package
Verwer, Sicco E.; Hammerschmidt, Christian UL

in Software Maintenance and Evolution (ICSME), 2017 IEEE International Conference on (2017, September)

Full Text
Peer Reviewed
Human in the Loop: Interactive Passive Automata Learning via Evidence-Driven State-Merging Algorithms
Hammerschmidt, Christian UL; State, Radu UL; Verwer, Sicco

Poster (2017, August)

We present an interactive version of an evidence-driven state-merging (EDSM) algorithm for learning variants of finite state automata. Learning these automata often amounts to recovering or reverse engineering the model generating the data despite noisy, incomplete, or imperfectly sampled data sources rather than optimizing a purely numeric target function. Domain expertise and human knowledge about the target domain can guide this process, and are typically captured in parameter settings. Often, domain expertise is subconscious and not expressed explicitly. Directly interacting with the learning algorithm makes it easier to utilize this knowledge effectively.

Full Text
Peer Reviewed
BotGM: Unsupervised Graph Mining to Detect Botnets in Traffic Flows
Lagraa, Sofiane UL; François, Jérôme; Lahmadi, Abdelkader et al

in CSNet 2017 Conference Proceedings (2017)

Botnets are one of the most dangerous and serious cybersecurity threats since they are a major vector of large-scale attack campaigns such as phishing, distributed denial-of-service (DDoS) attacks, trojans, spam, etc. A large body of research exists on botnet detection, but recent security incidents show that several challenges remain to be addressed, such as the ability to develop detectors which can cope with new types of botnets. In this paper, we propose BotGM, a new approach to detect botnet activities based on behavioral analysis of network traffic flows. BotGM identifies network traffic behavior using graph-based mining techniques to detect botnet behaviors and models the dependencies among flows in order to trace back the root causes. We applied BotGM to a large, publicly available dataset of botnet network flows, where it detects various botnet behaviors with high accuracy without any prior knowledge of them.

Full Text
Peer Reviewed
Interpreting Finite Automata for Sequential Data
Hammerschmidt, Christian UL; Verwer, S.; Lin, Q. et al

in Interpretable Machine Learning for Complex Systems: NIPS 2016 workshop proceedings (2016)

Full Text
Peer Reviewed
Efficient Learning of Communication Profiles from IP Flow Records
Hammerschmidt, Christian UL; Marchal, Samuel; Pellegrino, Gaetano et al

Poster (2016, November)

The task of network traffic monitoring has evolved drastically with the ever-increasing amount of data flowing in large-scale networks. The automated analysis of this tremendous source of information often comes with using simpler models on aggregated data (e.g. IP flow records) due to time and space constraints. Stream learning techniques are a step towards utilizing IP flow records more effectively. We propose a method to collect a limited yet relevant amount of data in order to learn a class of complex models, finite state machines, in real-time. These machines are used as communication profiles to fingerprint, identify or classify hosts and services and offer high detection rates while requiring less training data and thus being faster to compute than simple models.

Full Text
Peer Reviewed
Behavioral Clustering of Non-Stationary IP Flow Record Data
Hammerschmidt, Christian UL; Marchal, Samuel; State, Radu UL et al

Poster (2016, October)

Full Text
Peer Reviewed
Learning Deterministic Finite Automata from Infinite Alphabets
Pellegrino, Gaetano; Hammerschmidt, Christian UL; Lin, Qin et al

Scientific Conference (2016, October)

We propose an algorithm to learn automata over alphabets that are infinite, or at least too large to enumerate. We apply it to define a generic model intended for regression, with transitions constrained by intervals over the alphabet. The algorithm is based on the Red & Blue framework for learning from an input sample. We show two small case studies where the alphabets are respectively the natural and real numbers, and show how nice properties of automata models like interpretability and graphical representation transfer to regression where typical models are hard to interpret.
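
As a rough picture of what "transitions constrained by intervals over the alphabet" can mean, the sketch below runs a hand-built automaton whose transitions are guarded by numeric intervals instead of individual symbols. It does not implement the learning algorithm from the paper: the class `IntervalAutomaton`, the half-open interval convention, and the example states are assumptions for illustration only.

```python
class IntervalAutomaton:
    """Deterministic automaton whose transitions are guarded by half-open
    numeric intervals [low, high) rather than by single symbols."""

    def __init__(self, start):
        self.start = start
        self.transitions = {}  # state -> list of (low, high, target)

    def add_transition(self, state, low, high, target):
        self.transitions.setdefault(state, []).append((low, high, target))

    def step(self, state, symbol):
        """Follow the transition whose interval contains the symbol, if any."""
        for low, high, target in self.transitions.get(state, []):
            if low <= symbol < high:
                return target
        return None

    def run(self, sequence):
        """Return the state reached after reading the sequence, or None
        if some symbol falls outside every outgoing interval."""
        state = self.start
        for symbol in sequence:
            state = self.step(state, symbol)
            if state is None:
                return None
        return state


# Hypothetical two-regime model over real-valued observations.
automaton = IntervalAutomaton(start="q0")
for source in ("q0", "low", "high"):
    automaton.add_transition(source, 0.0, 10.0, "low")
    automaton.add_transition(source, 10.0, 100.0, "high")

final_state = automaton.run([3.5, 42.0, 7.25])
```

Because each guard covers a whole interval, a finite set of transitions can handle an alphabet such as the real numbers, which could never be enumerated symbol by symbol.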

Full Text
Peer Reviewed
Flexible State-Merging for learning (P)DFAs in Python
Hammerschmidt, Christian UL; Loos, Benjamin Laurent UL; Verwer, Sicco et al

Scientific Conference (2016, October)

We present a Python package for learning (non-)probabilistic deterministic finite state automata and provide heuristics in the red-blue framework. As our package is built along the API of the popular scikit-learn package, it is easy to use and new learning methods are easy to add. It provides PDFA learning as an additional tool for sequence prediction or classification to data scientists, without the need to understand the algorithm itself but rather the limitations of PDFA as a model. With applications of automata learning in diverse fields such as network traffic analysis, software engineering and biology, a stratified package opens opportunities for practitioners.

Full Text
Peer Reviewed
Short-term Time Series Forecasting with Regression Automata
Lin, Qin; Hammerschmidt, Christian UL; Pellegrino, Gaetano et al

Poster (2016)

We present regression automata (RA), a novel type of syntactic model for time series forecasting. Building on top of conventional state-merging algorithms for identifying automata, RA use numeric data in addition to symbolic values and make predictions based on this data in a regression fashion. We apply our model to the problem of hourly wind speed and wind power forecasting. Our results show that RA outperform other state-of-the-art approaches for predicting both wind speed and power generation. In both cases, short-term predictions are used for resource allocation and infrastructure load balancing. For those critical tasks, the ability to inspect and interpret the generative model RA provide is an additional benefit.
