References of "Theobald, Martin 50026247"
Full Text
Asynchronous Stream Data Processing using a Light-Weight and High-Performance Dataflow Engine
Ellampallil Venugopal, Vinu UL; Theobald, Martin UL

Presentation (2020, December 11)

Processing high-throughput data-streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. This paper introduces a new distributed stream data processing engine (DSPE), called “Asynchronous Iterative Routing” or simply AIR, which implements a light-weight, dynamic sharding protocol. AIR expedites a direct and asynchronous communication among all the worker nodes via multiple Message Passing Interface (MPI) communication channels and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR scales out to clusters consisting of up to 8 nodes and 224 cores, performing much better than existing DSPEs, and it performs up to 15 times better than Spark and Flink in terms of sustainable-throughput.

Detailed reference viewed: 26 (2 UL)
27th International Symposium on Temporal Representation and Reasoning, TIME 2020, September 23-25, 2020, Bozen-Bolzano, Italy
Muñoz-Velasco, Emilio; Ozaki, Ana; Theobald, Martin UL

Book published by Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Detailed reference viewed: 36 (5 UL)
Full Text
Peer Reviewed
AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Ellampallil Venugopal, Vinu UL; Theobald, Martin UL; Chaychi, Samira UL et al

in AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing (2020, September 01)

Distributed Stream Processing Engines (DSPEs) are currently among the most rapidly emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++ using the Message Passing Interface (MPI) and pthreads for multithreading, and is directly deployed on top of a common HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding protocol (referred to as “Asynchronous Iterative Routing”), which facilitates direct and asynchronous communication among all worker nodes and thereby completely avoids any additional communication overhead with a dedicated master node. With its unique design, AIR fills the gap between the prevalent scale-out (but Java-based) architectures like Apache Spark and Flink, on the one hand, and recent scale-up (and C++ based) prototypes such as StreamBox and PiCo, on the other hand. Our experiments over various benchmark settings confirm that AIR performs as well as the best scale-up SPEs on a single-node setup, while it outperforms existing scale-out DSPEs in terms of processing latency and sustainable throughput by a factor of up to 15 in a distributed setting.

Detailed reference viewed: 40 (5 UL)
Full Text
Peer Reviewed
Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback
Wu, Yan; Chen, Jinchuan; Haxhidauti, Plarent et al

in Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback (2020, March 12)

Domain-oriented knowledge bases (KBs) such as DBpedia and YAGO are largely constructed by applying a set of predefined extraction rules to the semi-structured contents of Wikipedia articles. Although both of these large-scale KBs achieve very high average precision values (above 95% for YAGO3), subtle mistakes in a few of the underlying extraction rules may still cause a substantial number of systematic extraction mistakes for specific relations. For example, by applying the same regular expressions to extract person names of both Asian and Western nationality, YAGO erroneously swaps most of the family and given names of Asian person entities. For traditional rule-learning approaches based on Inductive Logic Programming (ILP), it is very difficult to detect these systematic extraction mistakes, since they usually occur only in a relatively small subdomain of the relations’ arguments. In this paper, we thus propose a guided form of ILP, coined “GILP”, that iteratively asks for small amounts of user feedback over a given KB to learn a set of data-cleaning rules that (1) best match the feedback and (2) also generalize to a larger portion of facts in the KB. We propose both algorithms and respective metrics to automatically assess the quality of the learned rules with respect to the user feedback.

Detailed reference viewed: 20 (1 UL)
Full Text
Peer Reviewed
Benchmarking Synchronous and Asynchronous Stream Processing Systems
Ellampallil Venugopal, Vinu UL; Theobald, Martin UL

in Ellampallil Venugopal, Vinu; Theobald, Martin (Eds.) Benchmarking Synchronous and Asynchronous Stream Processing Systems (2020, January 02)

Processing high-throughput data-streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While there has been tremendous progress in distributed stream processing systems in the past few years, the high-throughput and low-latency (a.k.a. high sustainable-throughput) requirement of modern applications is pushing the limits of traditional data processing infrastructures. To understand the upper bound of the maximum sustainable throughput that is possible for a given node configuration, we have designed multiple hard-coded multi-threaded processes (called ad-hoc dataflows) in C++ using the Message Passing Interface (MPI) and Pthread libraries. Our preliminary results show that our ad-hoc design is on average 5.2 times better than Flink and 9.3 times better than Spark.

Detailed reference viewed: 39 (6 UL)
Scalable Uncertainty Management - 13th International Conference (SUM 2019), Compiegne, France, December 16-18, 2019, Proceedings
Ben Amor, Nahla; Quost, Benjamin; Theobald, Martin UL

Book published by Springer (2019)

Detailed reference viewed: 34 (3 UL)
Full Text
Peer Reviewed
Outer and Anti Joins in Temporal-Probabilistic Databases
Papaioannou, Katerina; Theobald, Martin UL; Böhlen, Michael H.

in 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019 (2019, October 16)

Detailed reference viewed: 62 (2 UL)
Full Text
Lineage-Aware Temporal Windows: Supporting Set Operations in Temporal-Probabilistic Databases
Papaioannou, Katerina; Theobald, Martin UL; Böhlen, Michael H.

in CoRR (2019), abs/1910.00474

Detailed reference viewed: 20 (2 UL)
Full Text
Peer Reviewed
Anytime Approximation in Probabilistic Databases via Scaled Dissociations
Van den Heuvel, Maarten; Ivanov, Peter; Gatterbauer, Wolfgang et al

in Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019 (2019, June 22)

Detailed reference viewed: 34 (1 UL)
Full Text
Peer Reviewed
Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases
Papaioannou, Katerina; Theobald, Martin UL; Böhlen, Michael H.

in CoRR (2019), abs/1902.04379

Detailed reference viewed: 21 (1 UL)
Indexing for Graph Query Evaluation
Fletcher; Theobald, Martin UL

in Sakr, Sharif; Zomaya, Albert Y. (Eds.) Encyclopedia of Big Data Technologies (2019)

Detailed reference viewed: 56 (3 UL)
Full Text
Peer Reviewed
Interactive feature selection for efficient customer recognition in contact centers: Dealing with common names
Saberi, Morteza; Theobald, Martin UL; Hussain, Omar Khadeer et al

in Expert Systems with Applications (2018), 113

Detailed reference viewed: 163 (7 UL)
Full Text
Peer Reviewed
A General Framework for Anytime Approximation in Probabilistic Databases
Van den Heuvel, Maarten; Geerts, Floris; Theobald, Martin UL et al

in CoRR (2018), abs/1806.10078

Detailed reference viewed: 62 (5 UL)
Full Text
Peer Reviewed
Supporting Set Operations in Temporal-Probabilistic Databases
Papaioannou, Katerina; Theobald, Martin UL; Böhlen, Michael

in Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018 (2018, April 16)

Detailed reference viewed: 107 (7 UL)
Full Text
Proceedings - 2017 ILIAS Distinguished Lectures
Bouvry, Pascal UL; Bisdorff, Raymond; Schommer, Christoph UL et al

Report (2018)

The Proceedings summarizes the 12 lectures that took place within the ILIAS Distinguished Lecture series 2017. It contains a brief abstract of each talk as well as some additional information about each speaker.

Detailed reference viewed: 393 (46 UL)
GCAI 2017: 3rd Global Conference on Artificial Intelligence, Miami, FL, USA, 18-22 October 2017
Benzmüller, Christoph UL; Lisetti, Christine; Theobald, Martin UL

Book published by EPiC Series in Computing, EasyChair (2017)

Detailed reference viewed: 122 (16 UL)
Full Text
Peer Reviewed
J-REED: Joint Relation Extraction and Entity Disambiguation
Nguyen, Dat Ba; Theobald, Martin UL; Weikum, Gerhard

in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06 - 10, 2017 (2017)

Detailed reference viewed: 135 (20 UL)