Today’s Android ecosystem is a growing universe of a few billion devices, hundreds of millions of users, and millions of applications targeting a wide range of activities where sensitive information is collected and processed. The security of Android apps is thus of utmost importance and needs to be addressed carefully. In the last decade, several studies have investigated Android applications from a security point of view, focusing on the detection of vulnerabilities or on the appropriate usage of cryptography APIs. However, because the Android framework iterates rapidly, new issues keep emerging while some old issues may still be undetected. As a result, security studies on Android apps have never stopped.
Meanwhile, Android applications, just like other software, are developed following an iterative process: they are updated regularly to fix bugs or introduce new features. In practice, to release a new version of an application, developers need to provide a brand new installation package, known as an apk file. Each apk file therefore stands for one version of a specific application, and the evolution of an application can be obtained by collecting all of its apks. Nevertheless, collecting these apk files is not straightforward because Android markets such as Google Play do not preserve the history of apk files. Instead, only the latest version of an app, i.e., the most recent apk, is provided. This challenges studies that focus on the evolution of Android applications. Yet history allows us to learn from past mistakes, which is why evolutionary studies can benefit both developers and users in many ways, for example by revealing trends that support security issue prediction or policy evaluation, and by uncovering the root causes of vulnerabilities so that they can be prevented.
In this dissertation, by leveraging AndroZoo, a popular Android application dataset made available to researchers, the versioned lineages of Android apps are re-constructed. Several security-relevant aspects of Android applications are then investigated from an evolutionary perspective. Our study begins with a wide-ranging investigation in which we take a close look at the evolution of several vulnerabilities of Android applications. We then focus on the vulnerabilities related to crypto-APIs and present our attempt to learn crypto-API usage from the crowd, i.e., by mining crypto-API usage rules from app lineages. Finally, we narrow the scope further to a new security breach that we identified. We elaborate on the mechanism of the breach and investigate its evolution patterns. The detailed contributions include:
Re-construction of app lineages: Android developers update their apps by providing new apk files, which are the installation packages, and these apks have to be published via the relevant markets. Nevertheless, mainstream Android application markets, including the official market Google Play, provide applications as a fleeting data stream in which only the latest version of an application is available. This is one of the main difficulties in re-constructing the lineage of Android applications. Moreover, building a large-scale app lineage dataset requires not only the collection of millions of apk files but also a considerable amount of computing capacity for feature extraction and matching. In this dissertation, we take advantage of the AndroZoo dataset and the High Performance Computing (HPC) clusters of the University of Luxembourg to re-construct the first large-scale app lineage dataset and publicly share it with the community. Furthermore, we conduct an initial study on the lineage dataset that investigates the evolution of Android app complexity by leveraging six well-established complexity metrics.
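As a rough illustration of the lineage re-construction described above, the following Python sketch groups apk records by package name and orders each group by version code. The record fields (`sha256`, `package`, `version_code`), the grouping key, and the `min_versions` threshold are assumptions made for illustration only; the feature extraction and matching actually used in the dissertation may differ.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ApkRecord(NamedTuple):
    """Hypothetical metadata extracted from one apk file."""
    sha256: str
    package: str
    version_code: int


def build_lineages(records: List[ApkRecord],
                   min_versions: int = 2) -> Dict[str, List[ApkRecord]]:
    """Group apks by package name and sort each group by version code."""
    groups: Dict[str, Dict[int, ApkRecord]] = defaultdict(dict)
    for rec in records:
        # Keep one apk per (package, version_code); later duplicates are ignored.
        groups[rec.package].setdefault(rec.version_code, rec)

    lineages: Dict[str, List[ApkRecord]] = {}
    for package, by_version in groups.items():
        # A lineage needs at least `min_versions` distinct versions.
        if len(by_version) >= min_versions:
            lineages[package] = [by_version[v] for v in sorted(by_version)]
    return lineages


# Toy input: three versions of one app and a single-version app.
records = [
    ApkRecord("c3", "com.example.app", 3),
    ApkRecord("a1", "com.example.app", 1),
    ApkRecord("b2", "com.example.app", 2),
    ApkRecord("d4", "com.example.single", 7),
]
lineages = build_lineages(records)
```

In this sketch, an app with a single known version is dropped because it carries no evolution information.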
Understanding the evolution of Android app vulnerabilities: The community still lacks comprehensive studies exploring how vulnerabilities have evolved over time and how they evolve within a single app across developer updates. In this dissertation, we fill this gap by leveraging the re-constructed app lineages. We apply state-of-the-art vulnerability-finding tools and systematically investigate the reports produced by each tool. In particular, we study which types of vulnerabilities are found, how they are introduced in the app code, where they are located, and whether they foreshadow malware. We provide insights based on the quantitative data reported by the tools, and we further discuss potential false positives. Our findings and study artifacts constitute tangible knowledge for the community.
Mining crypto-API usage rules by analyzing app updates: Android app developers recurrently use crypto-APIs to provide data security to app users. Unfortunately, misusing these APIs creates only an illusion of security and can even expose apps to systematic attacks. It is thus necessary to provide developers with a statically-enforceable list of crypto-API usage rules. On the one hand, such rules cannot be written manually, as the process does not scale to all available APIs. On the other hand, a classical mining approach based on typical usage patterns is not suitable for Android, given that a large share of usages contain mistakes. In this dissertation, building on the assumption that “developers update API usage instances to fix misuses”, we propose to mine the app lineage dataset to infer API usage rules. Ultimately, our investigations yield negative results for this assumption. In fact, it appears that updates that fix misuses may be unintentional: subsequent updates quickly re-introduce the same misuse patterns.
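The mining idea and the negative result can be illustrated with a small Python sketch. It assumes that a lineage has already been reduced to the sequence of cipher transformation strings used at one call site across successive versions; this representation, the toy `is_misuse` rule, and the function names are hypothetical and are not the dissertation's actual tooling. The code finds updates that turn a misuse into a correct usage and reports whether a later version re-introduces a misuse.

```python
from typing import List, Tuple


def is_misuse(transformation: str) -> bool:
    """Toy rule (assumption): DES or ECB mode counts as a misuse."""
    parts = transformation.upper().split("/")
    return parts[0] == "DES" or (len(parts) > 1 and parts[1] == "ECB")


def fixing_updates(history: List[str]) -> List[Tuple[int, bool]]:
    """Return (version index, reverted later?) for each misuse-to-correct update."""
    result = []
    for i in range(1, len(history)):
        if is_misuse(history[i - 1]) and not is_misuse(history[i]):
            # The fix is "reverted" if any later version misuses the API again.
            reverted = any(is_misuse(t) for t in history[i + 1:])
            result.append((i, reverted))
    return result


# Toy lineage: the fix in version 1 is undone in version 2.
history = [
    "AES/ECB/PKCS5Padding",
    "AES/GCM/NoPadding",
    "AES/ECB/PKCS5Padding",
    "AES/GCM/NoPadding",
]
updates = fixing_updates(history)
```

A high share of reverted fixes across many lineages would be consistent with the observation that fixing updates are often unintentional.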
Direct inter-app code invocation in Android apps and its evolution: The Android ecosystem offers different facilities to enable communication among app components and across apps, so that rich services can be composed through functionality reuse. At the heart of this system is the Inter-Component Communication (ICC) scheme, which has been widely studied in the literature. Less known in the community is another powerful mechanism that allows for direct inter-app code invocation, which opens up different reuse scenarios, both legitimate and malicious. In this dissertation, we expose the general workflow of this mechanism, which, beyond ICC, enables app developers to access and invoke functionalities (entire Java classes, methods, or object fields) implemented in other apps using official Android APIs. We experimentally showcase how this reuse mechanism can be leveraged to “plagiarize” supposedly protected functionalities. For example, we were able to leverage this mechanism to bypass the security guards that a popular video broadcaster has put in place to prevent access to its video database from outside its own app. We further contribute a static analysis toolkit, named DICIDer, for detecting direct inter-app code invocations in apps. Finally, we conduct an empirical analysis of the usage prevalence and evolution of this reuse mechanism.