Results 1-20 of 150.
((uid:50002182))

Bookmark and Share    
Full Text
Peer Reviewed
See detailThe Multi-Generation Repackaging Hypothesis
Li, Li UL; Bissyande, Tegawendé François D Assise UL; Bartel, Alexandre UL et al

Poster (2017, May)

App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps. Unfortunately ... [more ▼]

App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps. Unfortunately, although most research involves pairwise similarity comparison to distinguish repackaged apps from their “original” counterparts, no work has considered the threat to validity of not being able to discover the true original apps. We provide in this paper preliminary insights of an investigation into the Multi-Generation Repackaging Hypothesis: is the original in a repackaging process the outcome of a previous repackaging process? Leveraging the Androzoo dataset of over 5 million Android apps, we validate this hypothesis in the wild, calling upon the community to take this threat into account in new solutions for repackaged app detection. [less ▲]

Detailed reference viewed: 25 (1 UL)
Full Text
Peer Reviewed
See detailUnderstanding Android App Piggybacking
Li, Li UL; Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL et al

Poster (2017, May)

The Android packaging model offers adequate opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a ... [more ▼]

The Android packaging model offers adequate opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a large user base. Despite the fact that the literature has already presented a number of tools to detect piggybacked apps, there is still lacking a comprehensive investigation on the piggybacking processes. To fill this gap, in this work, we collect a large set of benign/piggybacked app pairs that can be taken as benchmark apps for further investigation. We manually look into these benchmark pairs for understanding the characteristics of piggybacking apps and eventually we report 20 interesting findings. We expect these findings to initiate new research directions such as practical and scalable piggybacked app detection, explainable malware detection, and malicious code location. [less ▲]

Detailed reference viewed: 30 (4 UL)
Full Text
Peer Reviewed
See detailAutomatically Locating Malicious Packages in Piggybacked Android Apps
Li, Li UL; Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL et al

in 4th IEEE/ACM International Conference on Mobile Software Engineering and Systems (2017, May)

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to ... [more ▼]

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth set of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy of 83.6% in verifying the top five reported items. [less ▲]

Detailed reference viewed: 53 (2 UL)
Full Text
Peer Reviewed
See detailSensing by Proxy in Buildings with Agglomerative Clustering of Indoor Temperature Movements
Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL; Klein, Jacques UL et al

in The 32nd ACM Symposium on Applied Computing (SAC 2017) (2017, April)

As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more ... [more ▼]

As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more concerned with their privacy in buildings, explicit sensing techniques can lead to uncomfortableness and resistance from occupants. In this paper, we adapt a sensing by proxy paradigm that monitors building status and coarse occupant activities through agglomerative clustering of indoor temperature movements. Through extensive experimentation on 86 classrooms, offices and labs in a five-story school building in western Europe, we prove that indoor temperature movements can be leveraged to infer latent information about indoor environments, especially about rooms' relative physical locations and rough type of occupant activities. Our results evidence a cost-effective approach to extending commercial building control systems and gaining extra relevant intelligence from such systems. [less ▲]

Detailed reference viewed: 39 (11 UL)
Full Text
Peer Reviewed
See detailUnderstanding Android App Piggybacking: A Systematic Study of Malicious Code Grafting
Li, Li UL; Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL et al

in IEEE Transactions on Information Forensics & Security (2017)

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has ... [more ▼]

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has produced approaches and tools to identify piggybacked apps, the literature lacks a comprehensive investigation into such phenomenon. We fill this gap by 1) systematically building a large set of piggybacked and benign apps pairs, which we release to the community, 2) empirically studying the characteristics of malicious piggybacked apps in comparison with their benign counterparts, and 3) providing insights on piggybacking processes. Among several findings providing insights, analysis techniques should build upon to improve the overall detection and classification accuracy of piggybacked apps, we show that piggybacking operations not only concern app code but also extensively manipulates app resource files, largely contradicting common beliefs. We also find that piggybacking is done with little sophistication, in many cases automatically, and often via library code. [less ▲]

Detailed reference viewed: 89 (12 UL)
Full Text
Peer Reviewed
See detailAssessing and Improving the Mutation Testing Practice of PIT
Laurent, Thomas; Papadakis, Mike UL; Kintis, Marinos UL et al

in 10th IEEE International Conference on Software Testing, Verification and Validation (2017)

Detailed reference viewed: 16 (2 UL)
Full Text
Peer Reviewed
See detailAn Empirical Analysis of Vulnerabilities in OpenSSL and the Linux Kernel
Jimenez, Matthieu UL; Papadakis, Mike UL; Le Traon, Yves UL

in 2016 Asia-Pacific Software Engineering Conference (APSEC) (2016, December)

Vulnerabilities are one of the main concerns faced by practitioners when working with security critical applications. Unfortunately, developers and security teams, even experienced ones, fail to identify ... [more ▼]

Vulnerabilities are one of the main concerns faced by practitioners when working with security critical applications. Unfortunately, developers and security teams, even experienced ones, fail to identify many of them with severe consequences. Vulnerabilities are hard to discover since they appear in various forms, caused by many different issues and their identification requires an attacker’s mindset. In this paper, we aim at increasing the understanding of vulnerabilities by investigating their characteristics on two major open-source software systems, i.e., the Linux kernel and OpenSSL. In particular, we seek to analyse and build a profile for vulnerable code, which can ultimately help researchers in building automated approaches like vulnerability prediction models. Thus, we examine the location, criticality and category of vulnerable code along with its relation with software metrics. To do so, we collect more than 2,200 vulnerable files accounting for 863 vulnerabilities and compute more than 35 software metrics. Our results indicate that while 9 Common Weakness Enumeration (CWE) types of vulnerabilities are prevalent, only 3 of them are critical in OpenSSL and 2 of them in the Linux kernel. They also indicate that different types of vulnerabilities have different characteristics, i.e., metric profiles, and that vulnerabilities of the same type have different profiles in the two projects we examined. We also found that the file structure of the projects can provide useful information related to the vulnerabilities. Overall, our results demonstrate the need for making project specific approaches that focus on specific types of vulnerabilities. [less ▲]

Detailed reference viewed: 96 (12 UL)
Full Text
Peer Reviewed
See detailVulnerability Prediction Models: A case study on the Linux Kernel
Jimenez, Matthieu UL; Papadakis, Mike UL; Le Traon, Yves UL

in 16th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2016, Raleigh, US, October 2-3, 2016 (2016, October)

To assist the vulnerability identification process, researchers proposed prediction models that highlight (for inspection) the most likely to be vulnerable parts of a system. In this paper we aim at ... [more ▼]

To assist the vulnerability identification process, researchers proposed prediction models that highlight (for inspection) the most likely to be vulnerable parts of a system. In this paper we aim at making a reliable replication and comparison of the main vulnerability prediction models. Thus, we seek for determining their effectiveness, i.e., their ability to distinguish between vulnerable and non-vulnerable components, in the context of the Linux Kernel, under different scenarios. To achieve the above-mentioned aims, we mined vulnerabilities reported in the National Vulnerability Database and created a large dataset with all vulnerable components of Linux from 2005 to 2016. Based on this, we then built and evaluated the prediction models. We observe that an approach based on the header files included and on function calls performs best when aiming at future vulnerabilities, while text mining is the best technique when aiming at random instances. We also found that models based on code metrics perform poorly. We show that in the context of the Linux kernel, vulnerability prediction models can be superior to random selection and relatively precise. Thus, we conclude that practitioners have a valuable tool for prioritizing their security inspection efforts. [less ▲]

Detailed reference viewed: 221 (16 UL)
Full Text
Peer Reviewed
See detailDSCo-NG: A Practical Language Modeling Approach for Time Series Classification
Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL; Klein, Jacques UL et al

in The 15th International Symposium on Intelligent Data Analysis (2016, October)

The abundance of time series data in various domains and their high dimensionality characteristic are challenging for harvesting useful information from them. To tackle storage and processing challenges ... [more ▼]

The abundance of time series data in various domains and their high dimensionality characteristic are challenging for harvesting useful information from them. To tackle storage and processing challenges, compression-based techniques have been proposed. Our previous work, Domain Series Corpus (DSCo), compresses time series into symbolic strings and takes advantage of language modeling techniques to extract from the training set knowledge about different classes. However, this approach was flawed in practice due to its excessive memory usage and the need for a priori knowledge about the dataset. In this paper we propose DSCo-NG, which reduces DSCo’s complexity and offers an efficient (linear time complexity and low memory footprint), accurate (performance comparable to approaches working on uncompressed data) and generic (so that it can be applied to various domains) approach for time series classification. Our confidence is backed with extensive experimental evaluation against publicly accessible datasets, which also offers insights on when DSCo-NG can be a better choice than others. [less ▲]

Detailed reference viewed: 79 (11 UL)
Full Text
Peer Reviewed
See detailAccessing Inaccessible Android APIs: An Empirical Study
Li, Li UL; Bissyande, Tegawendé François D Assise UL; Le Traon, Yves UL et al

in The 32nd International Conference on Software Maintenance and Evolution (ICSME) (2016, October)

As Android becomes a de-facto choice of development platform for mobile apps, developers extensively leverage its accompanying Software Development Kit to quickly build their apps. This SDK comes with a ... [more ▼]

As Android becomes a de-facto choice of development platform for mobile apps, developers extensively leverage its accompanying Software Development Kit to quickly build their apps. This SDK comes with a set of APIs which developers may find limited in comparison to what system apps can do or what framework developers are preparing to harness capabilities of new generation devices. Thus, developers may attempt to explore in advance the normally “inaccessible” APIs for building unique API-based functionality in their app. The Android programming model is unique in its kind. Inaccessible APIs, which however are used by developers, constitute yet another specificity of Android development, and is worth investigating to understand what they are, how they evolve over time, and who uses them. To that end, in this work, we empirically investigate 17 important releases of the Android framework source code base, and we find that inaccessible APIs are commonly implemented in the Android framework, which are further neither forward nor backward compatible. Moreover, a small set of inaccessible APIs can eventually become publicly accessible, while most of them are removed during the evolution, resulting in risks for such apps that have leveraged inaccessible APIs. Finally, we show that inaccessible APIs are indeed accessed by third-party apps, and the official Google Play store has tolerated the proliferation of apps leveraging inaccessible API methods. [less ▲]

Detailed reference viewed: 133 (8 UL)
Full Text
Peer Reviewed
See detailA state-of the-art survey & testbed of Fuzzy AHP (FAHP) applications
Kubler, Sylvain UL; Robert, Jérémy UL; William, Derigent et al

in Expert Systems with Applications (2016), 65

As a practical popular methodology for dealing with fuzziness and uncertainty in Multiple Criteria Decision-Making (MCDM), Fuzzy AHP (FAHP) has been applied to a wide range of applications. As of the time ... [more ▼]

As a practical popular methodology for dealing with fuzziness and uncertainty in Multiple Criteria Decision-Making (MCDM), Fuzzy AHP (FAHP) has been applied to a wide range of applications. As of the time of writing there is no state of the art survey of FAHP, we carry out a literature review of 190 application papers (i.e., applied research papers), published between 2004 and 2016, by classifying them on the basis of the area of application, the identified theme, the year of publication, and so forth. The identified themes and application areas have been chosen based upon the latest state-of-the-art survey of AHP conducted by Vaidya and Kumar (2006). To help readers extract quick and meaningful information, the reviewed papers are summarized in various tabular formats and charts. Unlike previous literature surveys, results and findings are made available through an online (and free) testbed, which can serve as a ready reference for those who wish to apply, modify or extend FAHP in various applications areas. This online testbed makes also available one or more fuzzy pairwise comparison matrices (FPCMs) from all the reviewed papers (255 matrices in total). In terms of results and findings, this survey shows that: (i) FAHP is used primarily in the Manufacturing, Industry and Government sectors; (ii) Asia is the torchbearer in this field, where FAHP is mostly applied in the theme areas of Selection and Evaluation; (iii) a significant amount of research papers (43% of the reviewed literature) combine FAHP with other tools, particularly with TOPSIS, QFD and ANP (AHP’s variant); (iv) Chang’s extent analysis method, which is used for FPCMs’ weight derivation in FAHP, is still the most popular method in spite of a number of criticisms in recent years (considered in 57% of the reviewed literature). [less ▲]

Detailed reference viewed: 51 (17 UL)
Full Text
Peer Reviewed
See detailCloud Providers Viability: How to Address it from an IT and Legal Perspective?
Bartolini, Cesare UL; El Kateb, Donia UL; Le Traon, Yves UL et al

in Altmann, Jörn; Silaghi, Gheorghe Cosmin; Rana, Omer F. (Eds.) Economics of Grids, Clouds, Systems, and Services (2016)

A major part of the commercial Internet is moving towards a cloud paradigm. This phenomenon has a drastic impact on the organizational structures of enterprises and introduces new challenges that must be ... [more ▼]

A major part of the commercial Internet is moving towards a cloud paradigm. This phenomenon has a drastic impact on the organizational structures of enterprises and introduces new challenges that must be properly addressed to avoid major setbacks. One such challenge is that of cloud provider viability, that is, the reasonable certainty that the Cloud Service Provider (CSP) will not go out of business, either by filing for bankruptcy or by simply shutting down operations, thus leaving its customers stranded without an infrastructure and, depending on the type of cloud service used, even without their applications or data. This article attempts to address the issue of cloud provider viability, proposing some ways of mitigating the problem both from a technical and from a legal perspective. [less ▲]

Detailed reference viewed: 29 (1 UL)
Full Text
Peer Reviewed
See detailTime Series Classification with Discrete Wavelet Transformed Data: Insights from an Empirical Study
Li, Daoyuan UL; Bissyande, Tegawendé François D Assise UL; Klein, Jacques UL et al

in The 28th International Conference on Software Engineering and Knowledge Engineering (SEKE 2016) (2016, July)

Time series mining has become essential for extracting knowledge from the abundant data that flows out from many application domains. To overcome storage and processing challenges in time series mining ... [more ▼]

Time series mining has become essential for extracting knowledge from the abundant data that flows out from many application domains. To overcome storage and processing challenges in time series mining, compression techniques are being used. In this paper, we investigate the loss/gain of performance of time series classification approaches when fed with lossy-compressed data. This empirical study is essential for reassuring practitioners, but also for providing more insights on how compression techniques can even be effective in reducing noise in time series data. From a knowledge engineering perspective, we show that time series may be compressed by 90% using discrete wavelet transforms and still achieve remarkable classification ac- curacy, and that residual details left by popular wavelet compression techniques can sometimes even help achieve higher classification accuracy than the raw time series data, as they better capture essential local features. [less ▲]

Detailed reference viewed: 162 (21 UL)
Full Text
Peer Reviewed
See detailDSCo: A Language Modeling Approach for Time Series Classification
Li, Daoyuan UL; Li, Li UL; Bissyande, Tegawendé François D Assise UL et al

in 12th International Conference on Machine Learning and Data Mining (MLDM 2016) (2016, July)

Time series data are abundant in various domains and are often characterized as large in size and high in dimensionality, leading to storage and processing challenges. Symbolic representation of time ... [more ▼]

Time series data are abundant in various domains and are often characterized as large in size and high in dimensionality, leading to storage and processing challenges. Symbolic representation of time series – which transforms numeric time series data into texts – is a promising technique to address these challenges. However, these techniques are essentially lossy compression functions and information are partially lost during transformation. To that end, we bring up a novel approach named Domain Series Corpus (DSCo), which builds per-class language models from the symbolized texts. To classify unlabeled samples, we compute the fitness of each symbolized sample against all per-class models and choose the class represented by the model with the best fitness score. Our work innovatively takes advantage of mature techniques from both time series mining and NLP communities. Through extensive experiments on an open dataset archive, we demonstrate that it performs similarly to approaches working with original uncompressed numeric data. [less ▲]

Detailed reference viewed: 187 (23 UL)
Full Text
Peer Reviewed
See detailOpen Data Portal Quality Comparison using AHP
Kubler, Sylvain UL; Robert, Jérémy UL; Le Traon, Yves UL et al

in Proceedings of the 17th International Digital Government Research Conference on Digital Government Research (2016, June 07)

During recent years, more and more Open Data becomes available and used as part of the Open Data movement. However, there are reported issues with the quality of the metadata in data portals and the data ... [more ▼]

During recent years, more and more Open Data becomes available and used as part of the Open Data movement. However, there are reported issues with the quality of the metadata in data portals and the data itself. This is a seri- ous risk that could disrupt the Open Data project, as well as e-government initiatives since the data quality needs to be managed to guarantee the reliability of e-government to the public. First quality assessment frameworks emerge to eval- uate the quality for a given dataset or portal along various dimensions (e.g., information completeness). Nonetheless, a common problem with such frameworks is to provide mean- ingful ranking mechanisms that are able to integrate sev- eral quality dimensions and user preferences (e.g., a portal provider is likely to have different quality preferences than a portal consumer). To address this multi-criteria decision making problem, our research work applies AHP (Analytic Hierarchy Process), which compares 146 active Open Data portals across 44 countries, powered by the CKAN software. [less ▲]

Detailed reference viewed: 27 (0 UL)
Full Text
See detailWatch out for This Commit! A Study of Influential Software Changes
Li, Daoyuan UL; Li, Li UL; Kim, Dongsun UL et al

Report (2016)

One single code change can significantly influence a wide range of software systems and their users. For example, 1) adding a new feature can spread defects in several modules, while 2) changing an API ... [more ▼]

One single code change can significantly influence a wide range of software systems and their users. For example, 1) adding a new feature can spread defects in several modules, while 2) changing an API method can improve the performance of all client programs. Developers often may not clearly know whether their or others’ changes are influential at commit time. Rather, it turns out to be influential after affecting many aspects of a system later. This paper investigates influential software changes and proposes an approach to identify them early, i.e., immediately when they are applied. We first conduct a post- mortem analysis to discover existing influential changes by using intuitions such as isolated changes and changes referred by other changes in 10 open source projects. Then we re-categorize all identified changes through an open-card sorting process. Subsequently, we conduct a survey with 89 developers to confirm our influential change categories. Finally, from our ground truth we extract features, including metrics such as the complexity of changes, terms in commit logs and file centrality in co-change graphs, to build ma- chine learning classifiers. The experiment results show that our prediction model achieves overall with random samples 86.8% precision, 74% recall and 80.4% F-measure respectively. [less ▲]

Detailed reference viewed: 76 (17 UL)
Full Text
Peer Reviewed
See detailAndroZoo: Collecting Millions of Android Apps for the Research Community
Allix, Kevin UL; Bissyande, Tegawendé François D Assise UL; Klein, Jacques UL et al

in Proceedings of the 13th International Workshop on Mining Software Repositories (2016, May)

We present a growing collection of Android Applications collected from several sources, including the official Google Play app market. Our dataset, AndroZoo, currently contains more than three million ... [more ▼]

We present a growing collection of Android Applications collected from several sources, including the official Google Play app market. Our dataset, AndroZoo, currently contains more than three million apps, each of which has been analysed by tens of different AntiVirus products to know which applications are detected as Malware. We provide this dataset to contribute to ongoing research efforts, as well as to enable new potential research topics on Android Apps. By releasing our dataset to the research community, we also aim at encouraging our fellow researchers to engage in reproducible experiments. [less ▲]

Detailed reference viewed: 156 (16 UL)
Full Text
See detailStatic Analysis of Android Apps: A Systematic Literature Review
Li, Li UL; Bissyande, Tegawendé François D Assise UL; Papadakis, Mike UL et al

Report (2016)

Context: Static analysis approaches have been proposed to assess the security of Android apps, by searching for known vulnerabilities or actual malicious code. The literature thus has proposed a large ... [more ▼]

Context: Static analysis approaches have been proposed to assess the security of Android apps, by searching for known vulnerabilities or actual malicious code. The literature thus has proposed a large body of works, each of which attempts to tackle one or more of the several challenges that program analyzers face when dealing with Android apps. Objective: We aim to provide a clear view of the state-of-the-art works that statically analyze Android apps, from which we highlight the trends of static analysis approaches, pinpoint where the focus has been put and enumerate the key aspects where future researches are still needed. Method: We have performed a systematic literature review which involves studying around 90 research papers published in software engineering, programming languages and security venues. This review is performed mainly in five dimensions: problems targeted by the approach, fundamental techniques used by authors, static analysis sensitivities considered, android characteristics taken into account and the scale of evaluation performed. Results: Our in-depth examination have led to several key findings: 1) Static analysis is largely performed to uncover security and privacy issues; 2) The Soot framework and the Jimple intermediate representation are the most adopted basic support tool and format, respectively; 3) Taint analysis remains the most applied technique in research approaches; 4) Most approaches support several analysis sensitivities, but very few approaches consider path-sensitivity; 5) There is no single work that has been proposed to tackle all challenges of static analysis that are related to Android programming; and 6) Only a small portion of state-of-the-art works have made their artifacts publicly available. Conclusion: The research community is still facing a number of challenges for building approaches that are aware altogether of implicit-Flows, dynamic code loading features, reflective calls, native code and multi-threading, in order to implement sound and highly precise static analyzers. [less ▲]

Detailed reference viewed: 537 (19 UL)
Full Text
Peer Reviewed
See detailNear Real-Time Electric Load Approximation in Low Voltage Cables of Smart Grids with Models@run.time
Hartmann, Thomas UL; Moawad, Assaad UL; Fouquet, François UL et al

in 31st Annual ACM Symposium on Applied Computing (SAC'16) (2016, April)

Micro-generations and future grid usages, such as charging of electric cars, raises major challenges to monitor the electric load in low-voltage cables. Due to the highly interconnected nature, real-time ... [more ▼]

Micro-generations and future grid usages, such as charging of electric cars, raises major challenges to monitor the electric load in low-voltage cables. Due to the highly interconnected nature, real-time measurements are problematic, both economically and technically. This entails an overload risk in electricity networks when cables must be disconnected for maintenance reasons or are accidentally damaged. Therefore, it is of great interest for electricity grid providers to anticipate the load in networks and quicker detect failures. However, computing the electric load in cables requires computational intensive power flow calculations and live consumption measurements. Today’s view of the grid is usually based on on-field documentation of cables, fuses, and measurements by technicians and therefore often outdated. Thus, the electric load is usually only simulated in case of major topology variations. However, live measurements of smart meters provide new opportunities. In this paper we present a novel approach for a near real-time electric load approximation by deriving in live the current electric topology and cable loads from smart meter data. We leverage the models@run.time paradigm to combine live measurements with topology characteristics of the grid. Our approach enables to approximate the load in cables, not only for the current grid topology, but also to simulate topology changes for maintenance purposes. We showed that this allows a near real-time approximation while remaining very accurate (average deviation of 1.89% compared to offline power-flow calculation tools). Developed with a grid operator, this approach will be integrated in a monitoring and warning system and as an embeddable solution for on-field simulation. [less ▲]

Detailed reference viewed: 67 (4 UL)