References of "Bissyande, Tegawendé François D Assise 50000802"
     in
Bookmark and Share    
Full Text
Peer Reviewed
See detailFaCoY - A Code-to-Code Search Engine
Kim, Kisub UL; Kim, Dongsun UL; Bissyande, Tegawendé François D Assise UL et al

in International Conference on Software Engineering (ICSE 2018) (2018, May 27)

Code search is an unavoidable activity in software development. Various approaches and techniques have been explored in the literature to support code search tasks. Most of these approaches focus on ... [more ▼]

Code search is an unavoidable activity in software development. Various approaches and techniques have been explored in the literature to support code search tasks. Most of these approaches focus on serving user queries provided as natural language free-form input. However, there exists a wide range of use-case scenarios where a code-to-code approach would be most beneficial. For example, research directions in code transplantation, code diversity, patch recommendation can leverage a code-to-code search engine to find essential ingredients for their techniques. In this paper, we propose FaCoY, a novel approach for statically finding code fragments which may be semantically similar to user input code. FaCoY implements a query alternation strategy: instead of directly matching code query tokens with code in the search space, FaCoY first attempts to identify other tokens which may also be relevant in implementing the functional behavior of the input code. With various experiments, we show that (1) FaCoY is more effective than online code-to-code search engines; (2) FaCoY can detect more semantic code clones (i.e., Type-4) in BigCloneBench than the state-of-theart; (3) FaCoY, while static, can detect code fragments which are indeed similar with respect to runtime execution behavior; and (4) FaCoY can be useful in code/patch recommendation. [less ▲]

Detailed reference viewed: 212 (20 UL)
Full Text
Peer Reviewed
See detailTowards Estimating and Predicting User Perception on Software Product Variants
Martinez, Jabier; Sottet, Jean-Sebastien; Garcia-Frey, Alfonso et al

in 17th International Conference on Software Reuse (ICSR) (2018, May)

Detailed reference viewed: 94 (3 UL)
Full Text
Peer Reviewed
See detailCharacterising Deprecated Android APIs
Li, Li; Gao, Jun UL; Bissyande, Tegawendé François D Assise UL et al

in 15th International Conference on Mining Software Repositories (MSR 2018) (2018, May)

Detailed reference viewed: 187 (9 UL)
Full Text
Peer Reviewed
See detailExtracting Statistical Graph Features for Accurate and Efficient Time Series Classification
Li, Daoyuan UL; Lin, Jessica; Bissyande, Tegawendé François D Assise UL et al

in 21st International Conference on Extending Database Technology (2018, March)

This paper presents a multiscale visibility graph representation for time series as well as feature extraction methods for time series classification (TSC). Unlike traditional TSC approaches that seek to ... [more ▼]

This paper presents a multiscale visibility graph representation for time series as well as feature extraction methods for time series classification (TSC). Unlike traditional TSC approaches that seek to find global similarities in time series databases (eg., Nearest Neighbor with Dynamic Time Warping distance) or methods specializing in locating local patterns/subsequences (eg., shapelets), we extract solely statistical features from graphs that are generated from time series. Specifically, we augment time series by means of their multiscale approximations, which are further transformed into a set of visibility graphs. After extracting probability distributions of small motifs, density, assortativity, etc., these features are used for building highly accurate classification models using generic classifiers (eg., Support Vector Machine and eXtreme Gradient Boosting). Thanks to the way how we transform time series into graphs and extract features from them, we are able to capture both global and local features from time series. Based on extensive experiments on a large number of open datasets and comparison with five state-of-the-art TSC algorithms, our approach is shown to be both accurate and efficient: it is more accurate than Learning Shapelets and at the same time faster than Fast Shapelets. [less ▲]

Detailed reference viewed: 712 (13 UL)
Full Text
Peer Reviewed
See detailAugmenting and Structuring User Queries to Support Efficient Free-Form Code Search
Sirres, Raphael; Bissyande, Tegawendé François D Assise UL; Kim, Dongsun et al

in Empirical Software Engineering (2018), 90

Detailed reference viewed: 115 (6 UL)
Full Text
Peer Reviewed
See detailPredicting the Fault Revelation Utility of Mutants
Titcheu Chekam, Thierry UL; Papadakis, Mike UL; Bissyande, Tegawendé François D Assise UL et al

in 40th International Conference on Software Engineering, Gothenburg, Sweden, May 27 - 3 June 2018 (2018)

Detailed reference viewed: 270 (22 UL)
Full Text
Peer Reviewed
See detailMining Fix Patterns for FindBugs Violations
Liu, Kui UL; Kim, Dongsun; Bissyande, Tegawendé François D Assise UL et al

in IEEE Transactions on Software Engineering (2018)

Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the ... [more ▼]

Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the adoption of these tools is hindered by their high false positive rates. If the false positive rate is too high, developers may get acclimated to violation reports from these tools, causing concrete and severe bugs being overlooked. Fortunately, some violations are actually addressed and resolved by developers. We claim that those violations that are recurrently fixed are likely to be true positives, and an automated approach can learn to repair similar unseen violations. However, there is lack of a systematic way to investigate the distributions on existing violations and fixed ones in the wild, that can provide insights into prioritizing violations for developers, and an effective way to mine code and fix patterns which can help developers easily understand the reasons of leading violations and how to fix them. In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair. [less ▲]

Detailed reference viewed: 146 (7 UL)
Full Text
Peer Reviewed
See detailFeature location benchmark for extractive software product line adoption research using realistic and synthetic Eclipse variants
Martinez, Jabier; Ziadi, Tewfik; Papadakis, Mike UL et al

in Information and Software Technology (2018)

Detailed reference viewed: 149 (5 UL)
Full Text
Peer Reviewed
See detailOn Locating Malicious Code in Piggybacked Android Apps
Li, Li UL; Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL et al

in Journal of Computer Science and Technology (2017)

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to ... [more ▼]

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy@5 of 83.6% for such packages that are triggered through method invocations and an accuracy@5 of 82.2% for such packages that are triggered independently. [less ▲]

Detailed reference viewed: 200 (10 UL)
Full Text
See detailTowards a Plug-and-Play and Holistic Data Mining Framework for Understanding and Facilitating Operations in Smart Buildings
Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL; Klein, Jacques UL et al

Report (2017)

Nowadays, a significant portion of the total energy consumption is attributed to the buildings sector. In order to save energy and protect the environment, energy consumption in buildings must be more ... [more ▼]

Nowadays, a significant portion of the total energy consumption is attributed to the buildings sector. In order to save energy and protect the environment, energy consumption in buildings must be more efficient. At the same time, buildings should offer the same (if not more) comfort to their occupants. Consequently, modern buildings have been equipped with various sensors and actuators and interconnected control systems to meet occupants’ requirements. Unfortunately, so far, Building Automation Systems data have not been well-exploited due to technical and cost limitations. Yet, it can be exceptionally beneficial to take full advantage of the data flowing inside buildings in order to diagnose issues, explore solutions and improve occupant-building interactions. This paper presents a plug-and-play and holistic data mining framework named PHoliData for smart buildings to collect, store, visualize and mine useful information and domain knowledge from data in smart buildings. PHoliData allows non technical experts to easily explore and understand their buildings with minimum IT support. An architecture of this framework has been introduced and a prototype has been implemented and tested against real-world settings. Discussions with industry experts have suggested the system to be extremely helpful for understanding buildings, since it can provide hints about energy efficiency improvements. Finally, extensive experiments have demonstrated the feasibility of such a framework in practice and its advantage and potential for buildings operators. [less ▲]

Detailed reference viewed: 142 (7 UL)
Full Text
Peer Reviewed
See detailImpact of Tool Support in Patch Construction
Koyuncu, Anil UL; Bissyande, Tegawendé François D Assise UL; Kim, Dongsun UL et al

Scientific Conference (2017, July)

In this work, we investigate the practice of patch construction in the Linux kernel development, focusing on the differences between three patching processes: (1) patches crafted entirely manually to fix ... [more ▼]

In this work, we investigate the practice of patch construction in the Linux kernel development, focusing on the differences between three patching processes: (1) patches crafted entirely manually to fix bugs, (2) those that are derived from warnings of bug detection tools, and (3) those that are automatically generated based on fix patterns. With this study, we provide to the research community concrete insights on the practice of patching as well as how the development community is currently embracing research and commercial patching tools to improve productivity in repair. The result of our study shows that tool-supported patches are increasingly adopted by the developer community while manually-written patches are accepted more quickly. Patch application tools enable developers to remain committed to contributing patches to the code base. Our findings also include that, in actual development processes, patches generally implement several change operations spread over the code, even for patches fixing warnings by bug detection tools. Finally, this study has shown that there is an opportunity to directly leverage the output of bug detection tools to readily generate patches that are appropriate for fixing the problem, and that are consistent with manually-written patches. [less ▲]

Detailed reference viewed: 224 (20 UL)
Full Text
Peer Reviewed
See detailEuphony: Harmonious Unification of Cacophonous Anti-Virus Vendor Labels for Android Malware
Hurier, Médéric UL; Suarez-Tangil, Guillermo; Dash, Santanu Kumar et al

in MSR 2017 (2017, May 21)

Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the ... [more ▼]

Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the proliferation of collective repositories sharing the latest specimens. Having access to a large number of samples opens new research directions aiming at efficiently vetting apps. However, automatically inferring a reference ground-truth from those repositories is not straightforward and can inadvertently lead to unforeseen misconceptions. On the one hand, samples are often mis-labeled as different parties use distinct naming schemes for the same sample. On the other hand, samples are frequently mis-classified due to conceptual errors made during labeling processes. In this paper, we analyze the associations between all labels given by different vendors and we propose a system called EUPHONY to systematically unify common samples into family groups. The key novelty of our approach is that no a-priori knowledge on malware families is needed. We evaluate our approach using reference datasets and more than 0.4 million additional samples outside of these datasets. Results show that EUPHONY provides competitive performance against the state-of-the-art. [less ▲]

Detailed reference viewed: 334 (26 UL)
Full Text
Peer Reviewed
See detailUnderstanding Android App Piggybacking
Li, Li UL; Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL et al

Poster (2017, May)

The Android packaging model offers adequate opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a ... [more ▼]

The Android packaging model offers adequate opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a large user base. Despite the fact that the literature has already presented a number of tools to detect piggybacked apps, there is still lacking a comprehensive investigation on the piggybacking processes. To fill this gap, in this work, we collect a large set of benign/piggybacked app pairs that can be taken as benchmark apps for further investigation. We manually look into these benchmark pairs for understanding the characteristics of piggybacking apps and eventually we report 20 interesting findings. We expect these findings to initiate new research directions such as practical and scalable piggybacked app detection, explainable malware detection, and malicious code location. [less ▲]

Detailed reference viewed: 267 (11 UL)
Full Text
Peer Reviewed
See detailAutomatically Locating Malicious Packages in Piggybacked Android Apps
Li, Li UL; Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL et al

in Abstract book of the 4th IEEE/ACM International Conference on Mobile Software Engineering and Systems (MobileSoft 2017) (2017, May)

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to ... [more ▼]

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth set of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy of 83.6% in verifying the top five reported items. [less ▲]

Detailed reference viewed: 307 (23 UL)
Full Text
Peer Reviewed
See detailThe Multi-Generation Repackaging Hypothesis
Li, Li UL; Bissyande, Tegawendé François D Assise UL; Bartel, Alexandre UL et al

Poster (2017, May)

App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps. Unfortunately ... [more ▼]

App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps. Unfortunately, although most research involves pairwise similarity comparison to distinguish repackaged apps from their “original” counterparts, no work has considered the threat to validity of not being able to discover the true original apps. We provide in this paper preliminary insights of an investigation into the Multi-Generation Repackaging Hypothesis: is the original in a repackaging process the outcome of a previous repackaging process? Leveraging the Androzoo dataset of over 5 million Android apps, we validate this hypothesis in the wild, calling upon the community to take this threat into account in new solutions for repackaged app detection. [less ▲]

Detailed reference viewed: 279 (10 UL)
Full Text
Peer Reviewed
See detailSensing by Proxy in Buildings with Agglomerative Clustering of Indoor Temperature Movements
Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL; Klein, Jacques UL et al

in The 32nd ACM Symposium on Applied Computing (SAC 2017) (2017, April)

As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more ... [more ▼]

As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more concerned with their privacy in buildings, explicit sensing techniques can lead to uncomfortableness and resistance from occupants. In this paper, we adapt a sensing by proxy paradigm that monitors building status and coarse occupant activities through agglomerative clustering of indoor temperature movements. Through extensive experimentation on 86 classrooms, offices and labs in a five-story school building in western Europe, we prove that indoor temperature movements can be leveraged to infer latent information about indoor environments, especially about rooms' relative physical locations and rough type of occupant activities. Our results evidence a cost-effective approach to extending commercial building control systems and gaining extra relevant intelligence from such systems. [less ▲]

Detailed reference viewed: 241 (19 UL)
Full Text
Peer Reviewed
See detailUnderstanding Android App Piggybacking: A Systematic Study of Malicious Code Grafting
Li, Li UL; Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL et al

in IEEE Transactions on Information Forensics and Security (2017)

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has ... [more ▼]

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has produced approaches and tools to identify piggybacked apps, the literature lacks a comprehensive investigation into such phenomenon. We fill this gap by 1) systematically building a large set of piggybacked and benign apps pairs, which we release to the community, 2) empirically studying the characteristics of malicious piggybacked apps in comparison with their benign counterparts, and 3) providing insights on piggybacking processes. Among several findings providing insights, analysis techniques should build upon to improve the overall detection and classification accuracy of piggybacked apps, we show that piggybacking operations not only concern app code but also extensively manipulates app resource files, largely contradicting common beliefs. We also find that piggybacking is done with little sophistication, in many cases automatically, and often via library code. [less ▲]

Detailed reference viewed: 324 (28 UL)
Full Text
See detailAugmenting and Structuring User Queries to Support Efficient Free-Form Code Search
Sirres, Raphael; Bissyande, Tegawendé François D Assise UL; Kim, Dongsun UL et al

Report (2017)

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this ... [more ▼]

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present Code voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and StackOverflow Q\&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users complete solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for StackOverflow questions. [less ▲]

Detailed reference viewed: 296 (31 UL)
Full Text
Peer Reviewed
See detailComprehending Malicious Android Apps By Mining Topic-Specific Data Flow Signatures
Yang, Xinli; Lo, David; Li, Li UL et al

in Information and Software Technology (2017)

Context: State-of-the-art works on automated detection of Android malware have leveraged app descriptions to spot anomalies w.r.t the functionality implemented, or have used data flow information as a ... [more ▼]

Context: State-of-the-art works on automated detection of Android malware have leveraged app descriptions to spot anomalies w.r.t the functionality implemented, or have used data flow information as a feature to discriminate malicious from benign apps. Although these works have yielded promising performance, we hypothesize that these performances can be improved by a better understanding of malicious behavior. Objective: To characterize malicious apps, we take into account both information on app descriptions, which are indicative of apps’ topics, and information on sensitive data flow, which can be relevant to discriminate malware from benign apps. Method: In this paper, we propose a topic-specific approach to malware comprehension based on app descriptions and data-flow information. First, we use an advanced topic model, adaptive LDA with GA, to cluster apps according to their descriptions. Then, we use information gain ratio of sensitive data flow information to build so-called “topic-specific data flow signatures”. Results: We conduct an empirical study on 3691 benign and 1612 malicious apps. We group them into 118 topics and generate topic-specific data flow signature. We verify the effectiveness of the topic-specific data flow signatures by comparing them with the overall data flow signature. In addition, we perform a deeper analysis on 25 representative topic-specific signatures and yield several implications. Conclusion: Topic-specific data flow signatures are efficient in highlighting the malicious behavior, and thus can help in characterizing malware. [less ▲]

Detailed reference viewed: 246 (13 UL)
Full Text
Peer Reviewed
See detailStatic Analysis of Android Apps: A Systematic Literature Review
Li, Li UL; Bissyande, Tegawendé François D Assise UL; Papadakis, Mike UL et al

in Information and Software Technology (2017)

Context: Static analysis exploits techniques that parse program source code or bytecode, often traversing program paths to check some program properties. Static analysis approaches have been proposed for ... [more ▼]

Context: Static analysis exploits techniques that parse program source code or bytecode, often traversing program paths to check some program properties. Static analysis approaches have been proposed for different tasks, including for assessing the security of Android apps, detecting app clones, automating test cases generation, or for uncovering non-functional issues related to performance or energy. The literature thus has proposed a large body of works, each of which attempts to tackle one or more of the several challenges that program analysers face when dealing with Android apps. Objective: We aim to provide a clear view of the state-of-the-art works that statically analyse Android apps, from which we highlight the trends of static analysis approaches, pinpoint where the focus has been put, and enumerate the key aspects where future researches are still needed. Method: We have performed a systematic literature review (SLR) which involves studying 124 research papers published in software engineering, programming languages and security venues in the last 5 years (January 2011 - December 2015). This review is performed mainly in five dimensions: problems targeted by the approach, fundamental techniques used by authors, static analysis sensitivities considered, android characteristics taken into account and the scale of evaluation performed. Results: Our in-depth examination has led to several key findings: 1) Static analysis is largely performed to uncover security and privacy issues; 2) The Soot framework and the Jimple intermediate representation are the most adopted basic support tool and format, respectively; 3) Taint analysis remains the most applied technique in research approaches; 4) Most approaches support several analysis sensitivities, but very few approaches consider path-sensitivity; 5) There is no single work that has been proposed to tackle all challenges of static analysis that are related to Android programming; and 6) Only a small portion of state-of-the-art works have made their artefacts publicly available. Conclusion: The research community is still facing a number of challenges for building approaches that are aware altogether of implicit-Flows, dynamic code loading features, reflective calls, native code and multi-threading, in order to implement sound and highly precise static analyzers. [less ▲]

Detailed reference viewed: 402 (13 UL)