Due to their promise of delivering real-time network insights, today's streaming
analytics platforms are increasingly being used in communications networks,
where the impact of the insights goes beyond sentiment and trend analysis to
include real-time detection of security attacks and prediction of network
state (i.e., whether the network is transitioning towards an outage). Current
streaming analytics platforms operate under the assumption that arriving
traffic is on the order of kilobytes produced at very high frequencies.
However, communications networks, especially telecommunication networks,
challenge this assumption because some of the arriving traffic in these
networks is on the order of gigabytes, but produced at medium to low velocities.
Furthermore, these large datasets may need to be ingested in their entirety
to render network insights in real time. Our interest is to subject
today's streaming analytics platforms, constructed from state-of-the-art
software components (Kafka, Spark, HDFS, ElasticSearch), to traffic densities
observed in such communications networks. We find that filtering such large
datasets is best done at a common upstream point instead of being pushed to, and
repeated in, downstream components. To demonstrate the advantages of such an
approach, we modify Apache Kafka to perform limited native data
transformation and filtering, relieving the downstream Spark application of
this work. Our approach outperforms four prevalent analytics pipeline
architectures with negligible overhead compared to standard Kafka.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Services and Data management research group (SEDAN) ; Nokia Bell Labs
Disciplines :
Computer science
Author, co-author :
Falk, Eric ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Gurbani, Vijay K. ; Nokia Bell Labs
State, Radu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors :
yes
Language :
English
Title :
Query-able Kafka: An agile data analytics pipeline for mobile wireless networks
Publication date :
August 2017
Event name :
43rd International Conference on Very Large Data Bases
Event date :
from 28-08-2017 to 01-09-2017
Journal title :
Proceedings of the 43rd International Conference on Very Large Data Bases 2017
Special issue title :
Proceedings of the 43rd International Conference on Very Large Data Bases 2017