Reference : Query-able Kafka: An agile data analytics pipeline for mobile wireless networks
Scientific congresses, symposiums and conference proceedings : Paper published in a journal
Engineering, computing & technology : Computer science
Computational Sciences
http://hdl.handle.net/10993/32831
English
Falk, Eric [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)]
Gurbani, Vijay K. [Nokia Bell Labs]
State, Radu [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)]
Aug-2017
Proceedings of the 43rd International Conference on Very Large Data Bases 2017
10
1646-1657
Yes
No
43rd International Conference on Very Large Data Bases
from 28-08-2017 to 01-09-2017
[en] Due to their promise of delivering real-time network insights, today's streaming
analytics platforms are increasingly being used in communications networks,
where the impact of the insights goes beyond sentiment and trend analysis to
include real-time detection of security attacks and prediction of network
state (i.e., whether the network is transitioning towards an outage). Current
streaming analytics platforms operate under the assumption that arriving
traffic is on the order of kilobytes produced at very high frequencies.
However, communications networks, especially telecommunication networks,
challenge this assumption because some of the arriving traffic in these
networks is on the order of gigabytes, but produced at medium to low velocities.
Furthermore, these large datasets may need to be ingested in their entirety
to render network insights in real time. Our interest is to subject
today's streaming analytics platforms, constructed from state-of-the-art
software components (Kafka, Spark, HDFS, ElasticSearch), to traffic densities
observed in such communications networks. We find that filtering on such large
datasets is best done at a common upstream point instead of being pushed to, and
repeated in, downstream components. To demonstrate the advantages of such an
approach, we modify Apache Kafka to perform limited native data
transformation and filtering, relieving the downstream Spark application of
this work. Our approach outperforms four prevalent analytics pipeline
architectures with negligible overhead compared to standard Kafka.
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Services and Data management research group (SEDAN) ; Nokia Bell Labs
Researchers ; Professionals ; Students ; General public ; Others

