Feature location benchmark for extractive software product line adoption research using realistic and synthetic Eclipse variants
; ; Papadakis, Mike et al. in Information and Software Technology (2018)

Mutant Quality Indicators
Papadakis, Mike ; Titcheu Chekam, Thierry ; Le Traon, Yves in 13th International Workshop on Mutation Analysis (MUTATION'18) (2018)

A training-resistant anomaly detection system
Muller, Steve ; ; et al. in Computers & Security (2018), 76
Modern network intrusion detection systems rely on machine learning techniques to detect traffic anomalies and thus intruders. However, the ability to learn the network behaviour in real time comes at a cost: malicious software can interfere with the learning process and teach the intrusion detection system to accept dangerous traffic. This paper presents an intrusion detection system (IDS) that is able to detect common network attacks including, but not limited to, denial-of-service, botnets, intrusions, and network scans. With the help of the proposed example IDS, we show to what extent the training attack (and more sophisticated variants of it) has an impact on machine learning based detection schemes, and how it can be detected. © 2018 Elsevier Ltd

Mining Fix Patterns for FindBugs Violations
Liu, Kui ; ; Bissyande, Tegawendé François D Assise et al. in IEEE Transactions on Software Engineering (2018)
Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the adoption of these tools is hindered by their high false positive rates. If the false positive rate is too high, developers may get acclimated to violation reports from these tools, causing concrete and severe bugs to be overlooked. Fortunately, some violations are actually addressed and resolved by developers. We claim that those violations that are recurrently fixed are likely to be true positives, and an automated approach can learn to repair similar unseen violations. However, there is a lack of (1) a systematic way to investigate the distributions of existing violations and fixed ones in the wild, which can provide insights into prioritizing violations for developers, and (2) an effective way to mine code and fix patterns, which can help developers easily understand the reasons leading to violations and how to fix them. In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair.

PROFICIENT: Productivity Tool for Semantic Interoperability in an Open IoT Ecosystem
Kolbe, Niklas ; Robert, Jérémy ; et al. in Proceedings of the 14th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (2017, November 07)
The Internet of Things (IoT) is promising to open up opportunities for businesses to offer new services to uncover untapped needs. However, before taking advantage of such opportunities, there are still challenges ahead, one of which is the development of strategies to abstract from the heterogeneity of APIs that shape today's IoT. It is becoming increasingly complex for developers and smart connected objects to efficiently discover, parse, aggregate and process data from disparate information systems, as different protocols, data models, and serializations for APIs exist on the market. Standards play an indisputable role in reducing such complexity, but will not solve all problems related to interoperability. For example, there will remain a permanent need to help and guide data/service providers to efficiently describe the data/services they would like to expose to the IoT. This paper presents PROFICIENT, a productivity tool that fulfills this need, which is showcased and evaluated considering recent open messaging standards and a smart parking scenario.

On Locating Malicious Code in Piggybacked Android Apps
Li, Li ; Li, Daoyuan ; Bissyande, Tegawendé François D Assise et al. in Journal of Computer Science & Technology (2017)
To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware.
There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy@5 of 83.6% for packages that are triggered through method invocations and an accuracy@5 of 82.2% for packages that are triggered independently.

Towards a Plug-and-Play and Holistic Data Mining Framework for Understanding and Facilitating Operations in Smart Buildings
Li, Daoyuan ; Bissyande, Tegawendé François D Assise ; Klein, Jacques et al. Report (2017)
Nowadays, a significant portion of the total energy consumption is attributed to the buildings sector. In order to save energy and protect the environment, energy consumption in buildings must be more efficient. At the same time, buildings should offer the same (if not more) comfort to their occupants. Consequently, modern buildings have been equipped with various sensors and actuators and interconnected control systems to meet occupants’ requirements. Unfortunately, so far, Building Automation Systems data have not been well exploited due to technical and cost limitations.
Yet, it can be exceptionally beneficial to take full advantage of the data flowing inside buildings in order to diagnose issues, explore solutions and improve occupant-building interactions. This paper presents a plug-and-play and holistic data mining framework named PHoliData for smart buildings to collect, store, visualize and mine useful information and domain knowledge from data in smart buildings. PHoliData allows non-technical experts to easily explore and understand their buildings with minimum IT support. An architecture of this framework has been introduced and a prototype has been implemented and tested against real-world settings. Discussions with industry experts have suggested the system to be extremely helpful for understanding buildings, since it can provide hints about energy efficiency improvements. Finally, extensive experiments have demonstrated the feasibility of such a framework in practice and its advantage and potential for building operators.

Raising Time Awareness in Model-Driven Engineering
; Hartmann, Thomas ; Mouline, Ludovic et al. in 2017 ACM/IEEE 20th International Conference on Model Driven Engineering Languages and Systems (2017, September)
The conviction that big data analytics is a key for the success of modern businesses is growing deeper, and the mobilisation of companies into adopting it becomes increasingly important. Big data integration projects enable companies to capture their relevant data, to efficiently store it, turn it into domain knowledge, and finally monetize it.
In this context, historical data, also called temporal data, is becoming increasingly available and delivers means to analyse the history of applications, discover temporal patterns, and predict future trends. Despite the fact that most data that today’s applications are dealing with is inherently temporal, current approaches, methodologies, and environments for developing these applications don’t provide sufficient support for handling time. We envision that Model-Driven Engineering (MDE) would be an appropriate ecosystem for a seamless and orthogonal integration of time into domain modelling and processing. In this paper, we investigate the state-of-the-art in MDE techniques and tools in order to identify the missing bricks for raising time-awareness in MDE and outline research directions in this emerging domain.

Knowledge-based Consistency Index for Fuzzy Pairwise Comparison Matrices
Kubler, Sylvain ; ; et al. in Knowledge-based Consistency Index for Fuzzy Pairwise Comparison Matrices (2017, July 10)
Fuzzy AHP is today one of the most used Multiple Criteria Decision-Making (MCDM) techniques. The main argument to introduce fuzzy set theory within AHP lies in its ability to handle uncertainty and vagueness arising from decision makers (when performing pairwise comparisons between a set of criteria/alternatives). As humans usually reason with granular information rather than precise information, such pairwise comparisons may contain some degree of inconsistency that needs to be properly tackled to guarantee the relevance of the result/ranking. Over the last decades, several consistency indexes designed for fuzzy pairwise comparison matrices (FPCMs) were proposed, as will be discussed in this article.
However, for some decision theory specialists, it appears that most of these indexes fail to be properly “axiomatically” founded, thus leading to misleading results. To overcome this, a new index, referred to as KCI (Knowledge-based Consistency Index), is introduced in this paper, and later compared with an existing index that is axiomatically well founded. The comparison results show that (i) both indexes perform similarly from a consistency measurement perspective, but (ii) KCI significantly reduces the computation time, which can save experts’ time in some MCDM problems.

Analyzing Complex Data in Motion at Scale with Temporal Graphs
Hartmann, Thomas ; Fouquet, François ; Jimenez, Matthieu et al. in Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering (2017, July)
Modern analytics solutions succeed in understanding and predicting phenomena in a large diversity of software systems, from social networks to Internet-of-Things platforms. This success challenges analytics algorithms to deal with more and more complex data, which can be structured as graphs and evolve over time. However, the underlying data storage systems that support large-scale data analytics, such as time-series or graph databases, fail to accommodate both dimensions, which limits the integration of more advanced analysis taking into account the history of complex graphs, for example. This paper therefore introduces a formal and practical definition of temporal graphs. Temporal graphs provide a compact representation of time-evolving graphs that can be used to analyze complex data in motion.
In particular, we demonstrate with our open-source implementation, named GREYCAT, that the performance of temporal graphs allows analytics solutions to deal with rapidly evolving large-scale graphs.

Enriching a Situation Awareness Framework for IoT with Knowledge Base and Reasoning Components
Kolbe, Niklas ; ; Kubler, Sylvain et al. in Modeling and Using Context (2017, July)
The importance of system-level context- and situation awareness increases with the growth of the Internet of Things (IoT). This paper proposes an integrated approach to situation awareness by providing a semantically rich situation model together with reliable situation inference based on Context Spaces Theory (CST) and Situation Theory (ST). The paper discusses benefits of integrating the proposed situation awareness framework with knowledge base and efficient reasoning techniques taking into account uncertainty and incomplete knowledge about situations. The paper discusses advantages and impact of proposed context adaptation in dynamic IoT environments. Practical issues of two-way mapping between IoT messaging standards and CST are also discussed.

Towards Semantic Interoperability in an Open IoT Ecosystem for Connected Vehicle Services
Kolbe, Niklas ; Kubler, Sylvain ; Robert, Jérémy et al. in 2017 IEEE Global Internet of Things Summit (GIoTS) Proceedings (2017, July)
A present challenge in today’s Internet of Things (IoT) ecosystem is to enable interoperability across heterogeneous systems and service providers. Restricted access to data sources and services limits the capabilities of a smart city to improve social, environmental and economic aspects. Interoperability in the IoT is concerned with both messaging interfaces and semantic understanding of heterogeneous data. In this paper, the first building blocks of an emerging open IoT ecosystem developed at the EU level are presented. Semantic web technologies are applied to the existing messaging components to support and improve semantic interoperability. The approach is demonstrated with a proof-of-concept for connected vehicle services in a smart city setting.

Impact of Tool Support in Patch Construction
Koyuncu, Anil ; Bissyande, Tegawendé François D Assise ; Kim, Dongsun et al. Scientific Conference (2017, July)
In this work, we investigate the practice of patch construction in Linux kernel development, focusing on the differences between three patching processes: (1) patches crafted entirely manually to fix bugs, (2) those that are derived from warnings of bug detection tools, and (3) those that are automatically generated based on fix patterns. With this study, we provide the research community with concrete insights on the practice of patching as well as how the development community is currently embracing research and commercial patching tools to improve productivity in repair. The result of our study shows that tool-supported patches are increasingly adopted by the developer community while manually-written patches are accepted more quickly.
Patch application tools enable developers to remain committed to contributing patches to the code base. Our findings also show that, in actual development processes, patches generally implement several change operations spread over the code, even for patches fixing warnings by bug detection tools. Finally, this study has shown that there is an opportunity to directly leverage the output of bug detection tools to readily generate patches that are appropriate for fixing the problem, and that are consistent with manually-written patches.

The Next Evolution of MDE: A Seamless Integration of Machine Learning into Domain Modeling
Hartmann, Thomas ; ; Fouquet, François et al. in Software & Systems Modeling (2017)
Machine learning algorithms are designed to resolve unknown behaviors by extracting commonalities over massive datasets. Unfortunately, learning such global behaviors can be inaccurate and slow for systems composed of heterogeneous elements, which behave very differently, as is the case, for instance, for cyber-physical systems and Internet of Things applications. Instead, to make smart decisions, such systems have to continuously refine the behavior on a per-element basis and compose these small learning units together. However, combining and composing learned behaviors from different elements is challenging and requires domain knowledge. Therefore, there is a need to structure and combine the learned behaviors and domain knowledge together in a flexible way. In this paper we propose to weave machine learning into domain modeling.
More specifically, we suggest decomposing machine learning into reusable, chainable, and independently computable small learning units, which we refer to as micro learning units. These micro learning units are modeled together with and at the same level as the domain data. We show, based on a smart grid case study, that our approach can be significantly more accurate than learning a global behavior, while the performance is fast enough to be used for live learning.

An Empirical Study on Mutation, Statement and Branch Coverage Fault Revelation that Avoids the Unreliable Clean Program Assumption
Titcheu Chekam, Thierry ; Papadakis, Mike ; Le Traon, Yves et al. in International Conference on Software Engineering (ICSE 2017) (2017, May 28)
Many studies suggest using coverage concepts, such as branch coverage, as the starting point of testing, while others suggest them as the most prominent test quality indicator. Yet the relationship between coverage and fault-revelation remains unknown, yielding uncertainty and controversy. Most previous studies rely on the Clean Program Assumption, that a test suite will obtain similar coverage for both faulty and fixed (‘clean’) program versions. This assumption may appear intuitive, especially for bugs that denote small semantic deviations. However, we present evidence that the Clean Program Assumption does not always hold, thereby raising a critical threat to the validity of previous results. We then conducted a study using a robust experimental methodology that avoids this threat to validity, from which our primary finding is that strong mutation testing has the highest fault revelation of four widely-used criteria.
Our findings also revealed that fault revelation starts to increase significantly only once relatively high levels of coverage are attained.

Euphony: Harmonious Unification of Cacophonous Anti-Virus Vendor Labels for Android Malware
Hurier, Médéric ; ; et al. in MSR 2017 (2017, May 21)
Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the proliferation of collective repositories sharing the latest specimens. Having access to a large number of samples opens new research directions aiming at efficiently vetting apps. However, automatically inferring a reference ground-truth from those repositories is not straightforward and can inadvertently lead to unforeseen misconceptions. On the one hand, samples are often mis-labeled as different parties use distinct naming schemes for the same sample. On the other hand, samples are frequently mis-classified due to conceptual errors made during labeling processes. In this paper, we analyze the associations between all labels given by different vendors and we propose a system called EUPHONY to systematically unify common samples into family groups. The key novelty of our approach is that no a-priori knowledge of malware families is needed. We evaluate our approach using reference datasets and more than 0.4 million additional samples outside of these datasets. Results show that EUPHONY provides competitive performance against the state-of-the-art.

Understanding Android App Piggybacking
Li, Li ; Li, Daoyuan ; Bissyande, Tegawendé François D Assise et al. Poster (2017, May)
The Android packaging model offers adequate opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a large user base. Despite the fact that the literature has already presented a number of tools to detect piggybacked apps, a comprehensive investigation of the piggybacking processes is still lacking. To fill this gap, in this work, we collect a large set of benign/piggybacked app pairs that can be taken as benchmark apps for further investigation. We manually look into these benchmark pairs to understand the characteristics of piggybacking apps, and eventually we report 20 interesting findings. We expect these findings to initiate new research directions such as practical and scalable piggybacked app detection, explainable malware detection, and malicious code location.

The Multi-Generation Repackaging Hypothesis
Li, Li ; Bissyande, Tegawendé François D Assise ; Bartel, Alexandre et al. Poster (2017, May)
App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps.
Unfortunately, although most research involves pairwise similarity comparison to distinguish repackaged apps from their “original” counterparts, no work has considered the threat to validity of not being able to discover the true original apps. We provide in this paper preliminary insights from an investigation into the Multi-Generation Repackaging Hypothesis: is the original in a repackaging process the outcome of a previous repackaging process? Leveraging the Androzoo dataset of over 5 million Android apps, we validate this hypothesis in the wild, calling upon the community to take this threat into account in new solutions for repackaged app detection.

Automatically Locating Malicious Packages in Piggybacked Android Apps
Li, Li ; Li, Daoyuan ; Bissyande, Tegawendé François D Assise et al. in Abstract book of the 4th IEEE/ACM International Conference on Mobile Software Engineering and Systems (MobileSoft 2017) (2017, May)
To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth set of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy of 83.6% in verifying the top five reported items.

Sensing by Proxy in Buildings with Agglomerative Clustering of Indoor Temperature Movements
Li, Daoyuan ; Bissyande, Tegawendé François D Assise ; Klein, Jacques et al. in The 32nd ACM Symposium on Applied Computing (SAC 2017) (2017, April)
As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more concerned with their privacy in buildings, explicit sensing techniques can lead to discomfort and resistance from occupants. In this paper, we adapt a sensing by proxy paradigm that monitors building status and coarse occupant activities through agglomerative clustering of indoor temperature movements. Through extensive experimentation on 86 classrooms, offices and labs in a five-story school building in western Europe, we prove that indoor temperature movements can be leveraged to infer latent information about indoor environments, especially about rooms' relative physical locations and rough types of occupant activities. Our results provide evidence for a cost-effective approach to extending commercial building control systems and gaining extra relevant intelligence from such systems.