Li, Li. In Abstract book of the 4th IEEE/ACM International Conference on Mobile Software Engineering and Systems (MobileSoft 2017) (2017, May)
To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth set of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy of 83.6% in verifying the top five reported items.

Li, Li. Poster (2017, May)
App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps. Unfortunately, although most research involves pairwise similarity comparison to distinguish repackaged apps from their “original” counterparts, no work has considered the threat to validity of not being able to discover the true original apps.
In this paper, we provide preliminary insights from an investigation into the Multi-Generation Repackaging Hypothesis: is the original in a repackaging process the outcome of a previous repackaging process? Leveraging the Androzoo dataset of over 5 million Android apps, we validate this hypothesis in the wild, calling upon the community to take this threat into account in new solutions for repackaged app detection.

Li, Li. Poster (2017, May)
The Android packaging model offers ample opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a large user base. Although the literature has already presented a number of tools to detect piggybacked apps, a comprehensive investigation of the piggybacking process is still lacking. To fill this gap, in this work, we collect a large set of benign/piggybacked app pairs that can be taken as benchmark apps for further investigation. We manually inspect these benchmark pairs to understand the characteristics of piggybacked apps, and we report 20 interesting findings. We expect these findings to initiate new research directions such as practical and scalable piggybacked app detection, explainable malware detection, and malicious code location.

Li, Daoyuan. In The 32nd ACM Symposium on Applied Computing (SAC 2017) (2017, April)
As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more concerned with their privacy in buildings, explicit sensing techniques can lead to discomfort and resistance from occupants. In this paper, we adapt a sensing-by-proxy paradigm that monitors building status and coarse occupant activities through agglomerative clustering of indoor temperature movements. Through extensive experimentation on 86 classrooms, offices and labs in a five-story school building in western Europe, we show that indoor temperature movements can be leveraged to infer latent information about indoor environments, especially about rooms' relative physical locations and the rough type of occupant activities. Our results provide evidence for a cost-effective approach to extending commercial building control systems and gaining extra relevant intelligence from such systems.

Mouline, Ludovic. In Mouline, Ludovic; Hartmann, Thomas; Fouquet, François, et al. (Eds.), Programming '17 Companion to the first International Conference on the Art, Science and Engineering of Programming (2017, April)
Smart systems are characterised by their ability to analyse measured data live and to react to changes according to expert rules. Therefore, such systems exploit appropriate data models together with actions, triggered by domain-related conditions. The challenge at hand is that smart systems usually need to process thousands of updates to detect which rules need to be triggered, often even on restricted hardware like a Raspberry Pi.
Although various approaches have been investigated to efficiently check conditions on data models, they either assume that the model fits into main memory or rely on high-latency persistent storage systems that severely damage the reactivity of smart systems. To tackle this challenge, we propose a novel composition process, which weaves executable rules into a data model with lazy loading abilities. We quantitatively show, on a smart building case study, that our approach can handle, at low latency, big sets of rules on top of large-scale data models on restricted hardware.

Kintis, Marinos. In IEEE Transactions on Software Engineering (2017)

Li, Li. In IEEE Transactions on Information Forensics and Security (2017)
The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has produced approaches and tools to identify piggybacked apps, the literature lacks a comprehensive investigation into this phenomenon. We fill this gap by 1) systematically building a large set of piggybacked and benign app pairs, which we release to the community, 2) empirically studying the characteristics of malicious piggybacked apps in comparison with their benign counterparts, and 3) providing insights on piggybacking processes.
Among several findings providing insights that analysis techniques should build upon to improve the overall detection and classification accuracy of piggybacked apps, we show that piggybacking operations not only concern app code but also extensively manipulate app resource files, largely contradicting common beliefs. We also find that piggybacking is done with little sophistication, in many cases automatically, and often via library code.

Li, Li. In Information and Software Technology (2017)
Context: Static analysis exploits techniques that parse program source code or bytecode, often traversing program paths to check some program properties. Static analysis approaches have been proposed for different tasks, including assessing the security of Android apps, detecting app clones, automating test case generation, and uncovering non-functional issues related to performance or energy. The literature has thus proposed a large body of work, each piece of which attempts to tackle one or more of the several challenges that program analysers face when dealing with Android apps. Objective: We aim to provide a clear view of the state-of-the-art works that statically analyse Android apps, from which we highlight the trends of static analysis approaches, pinpoint where the focus has been put, and enumerate the key aspects where future research is still needed. Method: We have performed a systematic literature review (SLR) which involves studying 124 research papers published in software engineering, programming languages and security venues in the last 5 years (January 2011 - December 2015).
This review is performed mainly along five dimensions: problems targeted by the approach, fundamental techniques used by authors, static analysis sensitivities considered, Android characteristics taken into account, and the scale of evaluation performed. Results: Our in-depth examination has led to several key findings: 1) Static analysis is largely performed to uncover security and privacy issues; 2) The Soot framework and the Jimple intermediate representation are the most adopted basic support tool and format, respectively; 3) Taint analysis remains the most applied technique in research approaches; 4) Most approaches support several analysis sensitivities, but very few approaches consider path-sensitivity; 5) There is no single work that has been proposed to tackle all challenges of static analysis that are related to Android programming; and 6) Only a small portion of state-of-the-art works have made their artefacts publicly available. Conclusion: The research community still faces a number of challenges in building approaches that are simultaneously aware of implicit flows, dynamic code loading features, reflective calls, native code and multi-threading, in order to implement sound and highly precise static analyzers.

; Papadakis, Mike. In 10th IEEE International Conference on Software Testing, Verification and Validation (2017)

Jimenez, Matthieu. E-print/Working paper (2017)

; Bissyande, Tegawendé François D Assise. Report (2017)
Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query.
This vocabulary mismatch problem can make code search inefficient. In this paper, we present Code voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and StackOverflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for StackOverflow questions.

Kubler, Sylvain. In Government Information Quarterly (2017)

Muller, Steve. In Computers and Security (2017), 64
Quantitative risk assessment provides a holistic view of risk in an organisation, which is, however, often biased by the fact that risk shared by several assets is encoded multiple times in a risk analysis. An apparent solution to this issue is to take all dependencies between assets into consideration when building a risk model.
However, existing approaches rarely support cyclic dependencies, although assets that mutually rely on each other are encountered in many organisations, notably in critical infrastructures. To the best of our knowledge, no author has provided a provably efficient algorithm (in terms of execution time) for computing the risk in such an organisation, notwithstanding that some heuristics exist. This paper introduces the dependency-aware root cause (DARC) model, which is able to compute the risk resulting from a collection of root causes using a polynomial-time randomised algorithm, and concludes with a discussion on real-time risk monitoring, which DARC supports by design. © 2016 Elsevier Ltd

Jimenez, Matthieu. In 2016 Asia-Pacific Software Engineering Conference (APSEC) (2016, December)
Vulnerabilities are one of the main concerns faced by practitioners when working with security-critical applications. Unfortunately, developers and security teams, even experienced ones, fail to identify many of them, with severe consequences. Vulnerabilities are hard to discover since they appear in various forms, are caused by many different issues, and require an attacker’s mindset to identify. In this paper, we aim at increasing the understanding of vulnerabilities by investigating their characteristics on two major open-source software systems, i.e., the Linux kernel and OpenSSL. In particular, we seek to analyse and build a profile for vulnerable code, which can ultimately help researchers in building automated approaches like vulnerability prediction models. Thus, we examine the location, criticality and category of vulnerable code along with its relation to software metrics.
To do so, we collect more than 2,200 vulnerable files accounting for 863 vulnerabilities and compute more than 35 software metrics. Our results indicate that while 9 Common Weakness Enumeration (CWE) types of vulnerabilities are prevalent, only 3 of them are critical in OpenSSL and 2 of them in the Linux kernel. They also indicate that different types of vulnerabilities have different characteristics, i.e., metric profiles, and that vulnerabilities of the same type have different profiles in the two projects we examined. We also found that the file structure of the projects can provide useful information related to the vulnerabilities. Overall, our results demonstrate the need for project-specific approaches that focus on specific types of vulnerabilities.

Li, Daoyuan. In The 15th International Symposium on Intelligent Data Analysis (2016, October)
The abundance of time series data in various domains and their high dimensionality make it challenging to harvest useful information from them. To tackle storage and processing challenges, compression-based techniques have been proposed. Our previous work, Domain Series Corpus (DSCo), compresses time series into symbolic strings and takes advantage of language modeling techniques to extract from the training set knowledge about different classes. However, this approach was flawed in practice due to its excessive memory usage and the need for a priori knowledge about the dataset.
In this paper we propose DSCo-NG, which reduces DSCo’s complexity and offers an efficient (linear time complexity and low memory footprint), accurate (performance comparable to approaches working on uncompressed data) and generic (so that it can be applied to various domains) approach for time series classification. Our confidence is backed by extensive experimental evaluation against publicly accessible datasets, which also offers insights on when DSCo-NG can be a better choice than others.

Kubler, Sylvain. In Expert Systems with Applications (2016), 65
As a practical and popular methodology for dealing with fuzziness and uncertainty in Multiple Criteria Decision-Making (MCDM), Fuzzy AHP (FAHP) has been applied to a wide range of applications. As there is no state-of-the-art survey of FAHP at the time of writing, we carry out a literature review of 190 application papers (i.e., applied research papers), published between 2004 and 2016, by classifying them on the basis of the area of application, the identified theme, the year of publication, and so forth. The identified themes and application areas have been chosen based upon the latest state-of-the-art survey of AHP conducted by Vaidya and Kumar (2006). To help readers extract quick and meaningful information, the reviewed papers are summarized in various tabular formats and charts. Unlike previous literature surveys, results and findings are made available through an online (and free) testbed, which can serve as a ready reference for those who wish to apply, modify or extend FAHP in various application areas.
This online testbed also makes available one or more fuzzy pairwise comparison matrices (FPCMs) from all the reviewed papers (255 matrices in total). In terms of results and findings, this survey shows that: (i) FAHP is used primarily in the Manufacturing, Industry and Government sectors; (ii) Asia is the torchbearer in this field, where FAHP is mostly applied in the theme areas of Selection and Evaluation; (iii) a significant number of research papers (43% of the reviewed literature) combine FAHP with other tools, particularly with TOPSIS, QFD and ANP (AHP’s variant); (iv) Chang’s extent analysis method, which is used for FPCMs’ weight derivation in FAHP, is still the most popular method in spite of a number of criticisms in recent years (considered in 57% of the reviewed literature).

Li, Li. In The 32nd International Conference on Software Maintenance and Evolution (ICSME) (2016, October)
As Android becomes a de-facto choice of development platform for mobile apps, developers extensively leverage its accompanying Software Development Kit to quickly build their apps. This SDK comes with a set of APIs which developers may find limited in comparison to what system apps can do or what framework developers are preparing to harness capabilities of new generation devices. Thus, developers may attempt to explore in advance the normally “inaccessible” APIs for building unique API-based functionality in their app. The Android programming model is unique in its kind. Inaccessible APIs, which are nevertheless used by developers, constitute yet another specificity of Android development, and are worth investigating to understand what they are, how they evolve over time, and who uses them.
To that end, in this work, we empirically investigate 17 important releases of the Android framework source code base, and we find that inaccessible APIs are commonly implemented in the Android framework and are, moreover, neither forward nor backward compatible. Moreover, a small set of inaccessible APIs can eventually become publicly accessible, while most of them are removed during the evolution, resulting in risks for apps that have leveraged inaccessible APIs. Finally, we show that inaccessible APIs are indeed accessed by third-party apps, and the official Google Play store has tolerated the proliferation of apps leveraging inaccessible API methods.

Jimenez, Matthieu. In 16th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2016, Raleigh, US, October 2-3, 2016 (2016, October)
To assist the vulnerability identification process, researchers proposed prediction models that highlight (for inspection) the parts of a system most likely to be vulnerable. In this paper we aim at making a reliable replication and comparison of the main vulnerability prediction models. Thus, we seek to determine their effectiveness, i.e., their ability to distinguish between vulnerable and non-vulnerable components, in the context of the Linux Kernel, under different scenarios. To achieve the above-mentioned aims, we mined vulnerabilities reported in the National Vulnerability Database and created a large dataset with all vulnerable components of Linux from 2005 to 2016. Based on this, we then built and evaluated the prediction models.
We observe that an approach based on the header files included and on function calls performs best when aiming at future vulnerabilities, while text mining is the best technique when aiming at random instances. We also found that models based on code metrics perform poorly. We show that in the context of the Linux kernel, vulnerability prediction models can be superior to random selection and relatively precise. Thus, we conclude that practitioners have a valuable tool for prioritizing their security inspection efforts.

Manukyan, Anush. In Proceedings of 21st IEEE International Conference on Emerging Technologies and Factory Automation ETFA 2016 (2016, September 06)
Unmanned Aerial Vehicles are currently investigated as an important sub-domain of robotics, a fast growing and truly multidisciplinary research field. UAVs are increasingly deployed in real-world settings for missions in dangerous environments or in environments which are challenging to access. Combined with autonomous flying capabilities, many new possibilities, but also challenges, open up. To overcome the challenge of early identification of degradation, machine learning based on flight features is a promising direction. Existing approaches build classifiers that consider their features to be correlated. This prevents a fine-grained detection of degradation for the different hardware components. This work presents an approach where the data is considered uncorrelated and, using machine learning techniques, allows the precise identification of UAV damage.

Li, Daoyuan. In International Journal of Software Engineering and Knowledge Engineering (2016), 26(9&10), 1361-1377