Publications generated thanks to the UL HPC Platform

Cross-Evaluation of Surface Meteorological Data and GNSS-derived Water Vapor with Re-analysis Information for South Georgia Island, South Atlantic Ocean
Erkihune, Eshetu Nega UL; Teferle, Felix Norman UL; Hunegnaw, Addisu UL et al

Poster (2020, December 11)

As one of the most important components of the global hydrologic cycle, atmospheric water vapor shows significant variability in both space and time over a large range of scales. This variability results from the interactions of many different factors, including topography and the presence of specific atmospheric processes. One of the key regions affecting global climatic variations is the sub-Antarctic zone over the Southern Ocean, with its Antarctic Circumpolar Current and the associated Antarctic Convergence. There, in this cold and maritime region, lies South Georgia Island, with its weather and climate largely affected by both the dominating ocean currents and the strong eastward-blowing winds in this zone. While the island forms an important outpost for various surface observations in this largely under-sampled and extremely remote region, it also forms a barrier for these winds due to its high topography, which, in turn, leads to various local meteorological phenomena, such as foehn winds. Surface meteorological data have been available for several stations near King Edward Point (KEP) on South Georgia for much of the 20th century. Since 2013 and 2014, Global Navigation Satellite System (GNSS) data have been available at five locations around the periphery of the island, and during a few months in 2016 radiosonde data were also collected at KEP. This study investigates the consistency between the different surface meteorological data sets, such as temperature, pressure and wind direction/speed, collected at KEP and at a nearby GNSS station on Brown Mountain (BMT), for which we also compare the precipitable water vapor (PWV) estimates. A cross-evaluation of these data sets with model values from the ERA-Interim re-analyses is carried out to further investigate the performance of both instruments and models. Overall, our preliminary results show high consistency between the surface meteorological observations and the re-analysis model values. It was our main objective to investigate the homogeneity and accuracy of the BMT observation time series through cross-evaluation with the series of the official WMO station at KEP. Air temperature and pressure at both sites from observation and model data are strongly correlated at hourly intervals, reaching correlation coefficients in the range of 0.966 to 0.968 for the former data set. The temperature difference time series shows seasonal variations but no obvious steps. The pressure difference time series is flat, also indicating no discontinuities. A cross-evaluation of the wind observations shows the distinct directional pattern expected at KEP, a station in a valley through which the winds are funneled. For BMT, the wind observations confirm the main wind directions but also show that the station is exposed from all directions. The observations of temperature, pressure, humidity and GNSS-derived PWV clearly show the signatures of the frequent foehn events.
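
A rough sketch of the hourly cross-evaluation described above is given below; the file names (kep_met.csv, bmt_met.csv) and column names ("time", "temperature") are hypothetical stand-ins for the KEP and BMT records, not the actual data format used in the study.

```python
import pandas as pd

# Hypothetical CSV files with timestamped surface observations from the two sites.
kep = pd.read_csv("kep_met.csv", parse_dates=["time"], index_col="time")
bmt = pd.read_csv("bmt_met.csv", parse_dates=["time"], index_col="time")

# Resample both records to hourly means and align them on a common time axis.
hourly = pd.concat(
    {"kep": kep["temperature"].resample("1H").mean(),
     "bmt": bmt["temperature"].resample("1H").mean()},
    axis=1,
).dropna()

# Pearson correlation at hourly intervals, and the difference series used to
# screen for steps or discontinuities between the two records.
print("hourly correlation:", hourly["kep"].corr(hourly["bmt"]))
difference = hourly["kep"] - hourly["bmt"]
print(difference.describe())
```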

Peer Reviewed
Optimised biomolecular extraction for metagenomic analysis of microbial biofilms from high-mountain streams
Busi, Susheel Bhanu UL; Pramateftaki, Paraskevi; Brandani, Jade et al

in PeerJ (2020)

Glacier-fed streams (GFS) are harsh ecosystems dominated by microbial life organized in benthic biofilms, yet the biodiversity and ecosystem functions provided by these communities remain under-appreciated. To better understand the microbial processes and communities contributing to GFS ecosystems, it is necessary to leverage high-throughput sequencing. Low biomass and high inorganic particle load in GFS sediment samples may affect nucleic acid extraction efficiency when using extraction methods tailored to other extreme environments such as deep-sea sediments. Here, we benchmarked the utility and efficacy of four extraction protocols, including an up-scaled phenol-chloroform protocol. We found that established protocols for comparable sample types consistently failed to yield sufficient high-quality DNA, delineating the extreme character of GFS. The methods differed in the success of downstream applications such as library preparation and sequencing. An adapted phenol-chloroform-based extraction method resulted in higher yields and better recovered the expected taxonomic profile and abundance of reconstructed genomes when compared to commercially available methods. Affordable and straightforward, this method consistently recapitulated the abundance and genomes of a mock community, including eukaryotes. Moreover, by increasing the amount of input sediment, the protocol is readily adjustable to the microbial load of the processed samples without compromising protocol efficiency. Our study provides a first systematic and extensive analysis of the different options for extraction of nucleic acids from glacier-fed streams for high-throughput sequencing applications, which may be applied to other extreme environments.

PRACE Best Practice Guide 2020: Modern Processors
Saastad, O. W.; Kapanova, K.; Markov, S. et al

Report (2020)

This Best Practice Guide (BPG) extends the previously developed series of BPGs by providing an update on new technologies and systems, to further support the European High Performance Computing (HPC) user community in achieving high performance with their large-scale applications. It covers existing systems and aims to provide support for scientists to port, build and run their applications on these systems. While some benchmarking is part of this guide, the results provided are mainly an illustration of the characteristics of the different systems; they should not be used to compare the systems presented, nor for system procurement considerations. Procurement and benchmarking are well covered by other PRACE work packages and are outside the scope of this BPG. This BPG has grown to be a hybrid of a field guide and a textbook. The system and processor coverage provides relevant technical information for users who need a deeper knowledge of the system in order to fully utilise the hardware, while the field guide approach provides hints and starting points for porting and building scientific software. For this, a range of compilers, libraries, debuggers, performance analysis tools, etc. are covered. While recommendations for compilers, libraries and flags are given, we acknowledge that there is no magic bullet, as all codes are different; unfortunately, there is often no way around a trial-and-error approach. Some in-depth documentation of the covered processors is provided. This includes some background on the inner workings of the processors considered: the number of threads each core can handle, how these threads are implemented, and how these threads (instruction streams) are scheduled onto the different execution units within the core. In addition, this guide describes how vector units of different lengths (256 bit, 512 bit, or, in the case of SVE, variable and generally unknown until execution time) are implemented. As most HPC work up to now has been done in 64-bit floating point, the emphasis is on this data type, especially for vector operations. In addition to the processor execution units, memory in its many levels of hierarchy is important; the different implementations of Non-Uniform Memory Access (NUMA) are also covered in this BPG. The guide gives a description of the hardware for a selection of relevant processors currently deployed in some PRACE HPC systems, including ARM64 (Huawei/HiSilicon and Marvell) and x86-64 (AMD and Intel). It provides information on the programming models and development environment, as well as on porting programs, and it includes sections on strategies for analyzing and improving the performance of applications. While this guide does not provide an update on all recent processors, some of the previous BPG releases cover other processor architectures not discussed here (e.g. the Power architecture) and should be considered a starting point for such work. This guide also aims to increase user awareness of the energy and power consumption of individual applications by providing some analysis of the usefulness of maximum CPU frequency scaling depending on the type of application considered (e.g. CPU-bound, memory-bound, etc.).

Peer Reviewed
Trace-Checking Signal-based Temporal Properties: A Model-Driven Approach
Boufaied, Chaima UL; Menghi, Claudio UL; Bianculli, Domenico UL et al

in Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE ’20) (2020, September)

Signal-based temporal properties (SBTPs) characterize the behavior of a system when its inputs and outputs are signals over time; they are very common for the requirements specification of cyber-physical systems. Although there exist several specification languages for expressing SBTPs, such languages either do not easily allow the specification of important types of properties (such as spike or oscillatory behaviors), or are not supported by (efficient) trace-checking procedures. In this paper, we propose SB-TemPsy, a novel model-driven trace-checking approach for SBTPs. SB-TemPsy provides (i) SB-TemPsy-DSL, a domain-specific language that allows the specification of SBTPs covering the most frequent requirement types in cyber-physical systems, and (ii) SB-TemPsy-Check, an efficient, model-driven trace-checking procedure. This procedure reduces the problem of checking an SB-TemPsy-DSL property over an execution trace to the problem of evaluating an Object Constraint Language constraint on a model of the execution trace. We evaluated our contributions by assessing the expressiveness of SB-TemPsy-DSL and the applicability of SB-TemPsy-Check using a representative industrial case study in the satellite domain. SB-TemPsy-DSL could express 97% of the requirements of our case study and SB-TemPsy-Check yielded a trace-checking verdict in 87% of the cases, with an average checking time of 48.7 s. From a practical standpoint and compared to state-of-the-art alternatives, our approach strikes a better trade-off between expressiveness and performance as it supports a large set of property types that can be checked, in most cases, within practical time limits.
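
To make the idea of checking a signal-based temporal property over an execution trace concrete, here is a minimal, self-contained sketch (not the SB-TemPsy-DSL or its OCL-based procedure) that checks a simple threshold-response property on a uniformly sampled trace.

```python
import numpy as np

def check_response(t, x, y, x_threshold, y_level, deadline):
    """Whenever x(t) exceeds x_threshold, y must exceed y_level within `deadline`
    seconds. Returns True if the property holds, or the first violating time."""
    for i, ti in enumerate(t):
        if x[i] > x_threshold:
            window = (t >= ti) & (t <= ti + deadline)
            if not np.any(y[window] > y_level):
                return ti
    return True

# Toy trace: x is a step command, y a first-order response to it.
t = np.linspace(0.0, 10.0, 1001)
x = np.where(t > 2.0, 1.0, 0.0)
y = np.where(t > 2.0, 1.0 - np.exp(-(t - 2.0) / 0.5), 0.0)
print(check_response(t, x, y, x_threshold=0.5, y_level=0.9, deadline=2.0))
```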

Peer Reviewed
A rare loss-of-function variant of ADAM17 is associated with late-onset familial Alzheimer disease
Hartl, Daniela; May, Patrick UL; Gu, Wei UL et al

in Molecular Psychiatry (2020), 25(3), 629-639

Common variants of about 20 genes contributing to Alzheimer disease (AD) risk have so far been identified through genome-wide association studies (GWAS). However, there is still a large proportion of heritability that might be explained by rare but functionally important variants. One of the genes identified so far with rare AD-causing variants is ADAM10. Using whole-genome sequencing, we now identified a single rare nonsynonymous variant (SNV) rs142946965 [p.R215I] in ADAM17 co-segregating with an autosomal-dominant pattern of late-onset AD in one family. Subsequent genotyping and analysis of available whole-exome sequencing data of additional case/control samples from Germany, the UK and the USA identified five variant carriers among AD patients only. The mutation inhibits pro-protein cleavage and the formation of the active enzyme, thus leading to loss of function of the ADAM17 α-secretase. Further, we identified a strong negative correlation between ADAM17 and APP gene expression in human brain and present in vitro evidence that ADAM17 negatively controls the expression of APP. As a consequence, the p.R215I mutation of ADAM17 leads to elevated Aβ formation in vitro. Together, our data support a causative association of the identified ADAM17 variant with the pathogenesis of AD.

Digraph3 Python Software Collection
Bisdorff, Raymond UL

Software (2020)

Python3 resources for implementing decision aid algorithms in the context of a bipolar-valued outranking approach. These computing resources are useful in the field of Algorithmic Decision Theory and, more specifically, in outranking-based Multiple Criteria Decision Aid (MCDA). They provide the practical tools for a Master course on Algorithmic Decision Theory at the University of Luxembourg.
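
As a generic illustration of the bipolar-valued outranking idea behind the collection (this is not the Digraph3 API itself; the alternatives, criteria and weights below are invented), a simple concordance-based outranking credibility in the bipolar range [-1, 1] can be computed as follows.

```python
# Hypothetical performance table: alternatives evaluated on criteria to be
# maximised, each criterion carrying a significance weight.
criteria = {"cost": 3, "quality": 2, "delivery": 1}
performance = {
    "a1": {"cost": 70, "quality": 80, "delivery": 60},
    "a2": {"cost": 85, "quality": 60, "delivery": 75},
    "a3": {"cost": 60, "quality": 90, "delivery": 50},
}

def outranking_credibility(a, b):
    """Bipolar-valued concordance of 'a outranks b': each criterion votes +w, 0
    or -w, and the weighted sum is normalised to [-1, 1] (no veto handling)."""
    total = sum(criteria.values())
    score = 0
    for g, w in criteria.items():
        if performance[a][g] > performance[b][g]:
            score += w
        elif performance[a][g] < performance[b][g]:
            score -= w
        # ties contribute 0, the indeterminate value of the bipolar scale
    return score / total

for a in performance:
    for b in performance:
        if a != b:
            print(f"r({a} outranks {b}) = {outranking_credibility(a, b):+.2f}")
```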

Algorithmic Decision Theory: Lecture notes and presentation slides
Bisdorff, Raymond UL

Learning material (2020)

The objective of this course is to introduce students to Algorithmic Decision Theory (ADT), a new interdisciplinary field at the intersection of decision theory, discrete mathematics, theoretical computer science and artificial intelligence. ADT proposes new ideas, approaches and tools for supporting decision-making processes in the presence of massive databases, combinatorial structures, partial and/or uncertain information, and distributed, possibly inter-operating, decision makers. Such settings arise in many real-world decision-making problems, such as humanitarian logistics, epidemiology, risk assessment and management, e-government, electronic commerce, and the implementation of recommender systems.

Peer Reviewed
Amyloid Evolution: Antiparallel Replaced by Parallel
Hakami Zanjani, Ali Asghar UL; Reynolds, Nicholas; Zhang, Afang et al

in Biophysical Journal (2020), 118

Several atomic structures have now been found for micrometer-scale amyloid fibrils or elongated microcrystals using a range of methods, including NMR, electron microscopy, and X-ray crystallography, with parallel beta-sheet appearing as the most common secondary structure. The etiology of amyloid disease, however, indicates nanometer-scale assemblies of only tens of peptides as significant agents of cytotoxicity and contagion. By combining solution X-ray with molecular dynamics, we show that antiparallel structure dominates at the first stages of aggregation for a specific set of peptides, being replaced by parallel structure only at large length scales. This divergence in structure between small and large amyloid aggregates should inform the future design of molecular therapeutics against nucleation or intercellular transmission of amyloid. Calculations and an overview of the literature argue that antiparallel order should be the first appearance of structure in many or most amyloid aggregation processes, regardless of the endpoint. Exceptions to this finding should exist, depending inevitably on the sequence and on solution conditions.

Peer Reviewed
Performance Analysis of Distributed and Scalable Deep Learning
Mahon, S.; Varrette, Sébastien UL; Plugaru, Valentin UL et al

in 20th IEEE/ACM Intl. Symp. on Cluster, Cloud and Internet Computing (CCGrid'20) (2020, May)

With renewed global interest in Artificial Intelligence (AI) methods, the past decade has seen a myriad of new programming models and tools that enable better and faster Machine Learning (ML). More recently, a subset of ML known as Deep Learning (DL) has raised increased interest due to its inherent ability to tackle novel cognitive computing applications efficiently. DL allows computational models that are composed of multiple processing layers to learn, in an automated way, representations of data with multiple levels of abstraction, and can deliver higher predictive accuracy when trained on larger data sets. Based on Artificial Neural Networks (ANN), DL is now at the core of state-of-the-art voice recognition systems (which enable easy control over, e.g., Internet-of-Things (IoT) smart home appliances), self-driving car engines, and online recommendation systems. The ecosystem of DL frameworks is evolving fast, as are the DL architectures that have been shown to perform well on specialized tasks and to exploit GPU accelerators. For this reason, frequent performance evaluation of the DL ecosystem is required, especially since the advent of novel distributed training frameworks such as Horovod, which allow for scalable training across multiple computing resources. In this paper, the scalability evaluation of the reference DL frameworks (TensorFlow, Keras, MXNet, and PyTorch) is performed over up-to-date High Performance Computing (HPC) resources to compare the efficiency of different implementations across several hardware architectures (CPU and GPU). Experimental results demonstrate that the DistributedDataParallel feature of the PyTorch library seems to be the most efficient approach for distributing the training process across many devices, reaching a throughput speedup of 10.11 when using 12 NVIDIA Tesla V100 GPUs to train ResNet44 on the CIFAR10 dataset.
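
For readers who want to try the PyTorch data-parallel path evaluated in the paper, the sketch below uses the standard torch.distributed and DistributedDataParallel API; the tiny model and random tensors are stand-ins for ResNet44 and CIFAR10, and the script layout is an assumption rather than the paper's benchmark code.

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group(backend="nccl")            # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model and random data instead of ResNet44 / CIFAR10.
    model = torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(3 * 32 * 32, 10)).cuda()
    model = DDP(model, device_ids=[local_rank])

    data = TensorDataset(torch.randn(4096, 3, 32, 32),
                         torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(data)                  # shards data across ranks
    loader = DataLoader(data, batch_size=128, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)                        # reshuffle each epoch
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()             # DDP all-reduces gradients
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```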

Peer Reviewed
Bi-allelic GAD1 variants cause a neonatal onset syndromic developmental and epileptic encephalopathy
Chatron, Nicolas; Becker, Felicitas; Morsy, Herba et al

in Brain: a Journal of Neurology (2020)

Developmental and epileptic encephalopathies are a heterogeneous group of early-onset epilepsy syndromes dramatically impairing neurodevelopment. Modern genomic technologies have revealed a number of monogenic origins and opened the door to therapeutic hopes. Here we describe a new syndromic developmental and epileptic encephalopathy caused by bi-allelic loss-of-function variants in GAD1, presented by eleven patients from six independent consanguineous families. Seizure onset occurred in the first two months of life in all patients. All 10 patients for whom an early disease history was available presented seizure onset in the first month of life, mainly consisting of epileptic spasms or myoclonic seizures. Early electroencephalography showed a suppression-burst or burst-attenuation pattern, or hypsarrhythmia if recorded only in the post-neonatal period. Eight patients had joint contractures and/or pes equinovarus. Seven patients presented a cleft palate and two also had an omphalocele, reproducing the phenotype of the knockout Gad1-/- mouse model. Four patients died before four years of age. GAD1 encodes the glutamate decarboxylase enzyme GAD67, a critical actor of γ-aminobutyric acid (GABA) metabolism, as it catalyzes the decarboxylation of glutamic acid to form GABA. Our findings evoke a novel syndrome related to GAD67 deficiency, characterized by the unique association of developmental and epileptic encephalopathy, cleft palate, joint contractures and/or omphalocele.

Peer Reviewed
First-principles modeling of chemistry in mixed solvents: Where to go from here?
Maldonado, Alex; Basdogan, Yasemin; Berryman, Josh UL et al

in Journal of Chemical Physics (2020), 152

Mixed solvents (i.e., binary or higher order mixtures of ionic or nonionic liquids) play crucial roles in chemical syntheses, separations, and electrochemical devices because they can be tuned for specific reactions and applications. Apart from fully explicit solvation treatments that can be difficult to parameterize or computationally expensive, there is currently no well-established first-principles regimen for reliably modeling atomic-scale chemistry in mixed solvent environments. We offer our perspective on how this process could be achieved in the near future as mixed solvent systems become more explored using theoretical and computational chemistry. We first outline what makes mixed solvent systems far more complex compared to single-component solvents. An overview of current and promising techniques for modeling mixed solvent environments is provided. We focus on so-called hybrid solvation treatments such as the conductor-like screening model for real solvents and the reference interaction site model, which are far less computationally demanding than explicit simulations. We also propose that cluster-continuum approaches rooted in physically rigorous quasi-chemical theory provide a robust, yet practical, route for studying chemical processes in mixed solvents.

Peer Reviewed
Gene family information facilitates variant interpretation and identification of disease-associated genes in neurodevelopmental disorders
Lal, Dennis; May, Patrick UL; Perez-Palma, Eduardo et al

in Genome Medicine (2020), 12(28),

Background: Classifying the pathogenicity of missense variants represents a major challenge in clinical practice during the diagnosis of rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs. Methods: Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments. 10,068 NDD patients and 2,078 controls were statistically evaluated for de novo variant burden in gene families. Results: We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families, including 98 de novo variant-carrying genes in NDD patients, of which 28 represent novel candidate genes for NDD that are brain-expressed and under evolutionary constraint. Conclusion: This study represents the first method to incorporate gene family information into a statistical framework to interpret variant data for NDDs and to discover novel NDD-associated genes.
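
The paper's enrichment framework is not reproduced here, but the core statistical ingredient of a de novo burden test can be sketched as a one-sided Poisson test; the counts below are invented for illustration only.

```python
from scipy.stats import poisson

# Hypothetical counts for one gene family: de novo variants observed in patients
# versus the expectation derived from per-gene mutation rates and cohort size.
observed_de_novos = 14
expected_de_novos = 4.2        # sum of per-gene expectations over the family

# One-sided Poisson test for enrichment: P[X >= observed] under the expectation.
p_value = poisson.sf(observed_de_novos - 1, expected_de_novos)
print(f"enrichment p-value: {p_value:.2e}")
```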

Peer Reviewed
Excess of singleton loss-of-function variants in Parkinson's disease contributes to genetic risk.
Bobbili, Dheeraj Reddy; Banda, Peter UL; Krüger, Rejko UL et al

in Journal of Medical Genetics (2020)

Background Parkinson’s disease (PD) is a neurodegenerative disorder with a complex genetic architecture. Besides rare mutations in high-risk genes related to monogenic familial forms of PD, multiple variants associated with sporadic PD were discovered via association studies. Methods We studied the whole-exome sequencing data of 340 PD cases and 146 ethnically matched controls from the Parkinson’s Progression Markers Initiative (PPMI) and performed burden analysis for different rare variant classes. Disease prediction models were built based on clinical, non-clinical and genetic features, including both common and rare variants, using two machine learning methods. Results We observed a significant exome-wide burden of singleton loss-of-function variants (corrected p=0.037). Overall, no exome-wide burden of rare amino-acid-changing variants was detected. Finally, we built a disease prediction model combining singleton loss-of-function variants, a polygenic risk score based on common variants, and family history of PD as features, and reached an area under the curve of 0.703 (95% CI 0.698 to 0.708). By incorporating a rare variant feature, our model improved on the state-of-the-art classification model for the PPMI dataset, which reached an area under the curve of 0.639 based on common variants alone. Conclusion The main findings of this study are the contribution of singleton loss-of-function variants to the complex genetics of PD and the fact that disease risk prediction models combining singleton and common variants can improve on models built solely on common variants.
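
A minimal sketch of such a combined prediction model is shown below, with simulated features standing in for the singleton loss-of-function burden, polygenic risk score and family history used in the paper; because the data are random, the resulting AUC is not meaningful and the paper's actual models are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 486                                    # roughly the PPMI sample size analysed

# Simulated feature matrix and labels, for illustration only.
X = np.column_stack([
    rng.poisson(1.0, n),                   # singleton loss-of-function burden
    rng.normal(0.0, 1.0, n),               # polygenic risk score
    rng.integers(0, 2, n),                 # family history of PD (0/1)
])
y = rng.integers(0, 2, n)                  # case/control labels

model = LogisticRegression(max_iter=1000)  # stand-in for the paper's classifiers
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.3f} +/- {auc.std():.3f}")
```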

Digraph3 Documentation: Advanced topics
Bisdorff, Raymond UL

Learning material (2020)

In this part of the Digraph3 documentation, we provide insight into the computational enhancements one may obtain when working in a bipolar-valued epistemic logical framework, such as easily coping with missing data and uncertain criterion significance weights, computing valued ordinal correlations between bipolar-valued outranking digraphs, solving bipolar-valued Berge kernel equation systems, and testing the stability of outranking statements when only ordinal criteria significance weights are available.

Optimized Collision Search for STARK-Friendly Hash Challenge Candidates
Udovenko, Aleksei UL

E-print/Working paper (2020)

In this note, we report several solutions to the STARK-Friendly Hash Challenge: a competition with the goal of finding collisions for several hash functions designed specifically for zero-knowledge proofs (ZKP) and multiparty computations (MPC). We managed to find collisions for three instances of 91-bit hash functions. The method used is the classic parallel collision search with distinguished points from van Oorschot and Wiener (1994). As this is a general attack on hash functions, it does not exhibit any particular weakness of the chosen hash functions. The crucial part is to optimize the implementations to make the attack cost realistic, and we describe several arithmetic tricks.
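
A toy, single-threaded rendering of parallel collision search with distinguished points is sketched below; a truncated SHA-256 stands in for the 91-bit challenge hash functions, and the parallelisation and arithmetic optimisations that make the real attack practical are omitted.

```python
import hashlib

W = 32                     # toy hash width in bits (the challenge functions are 91-bit)
DP_BITS = 8                # a point is "distinguished" if its DP_BITS low bits are zero

def toy_hash(x: int) -> int:
    digest = hashlib.sha256(x.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:W // 8], "little")

def is_distinguished(x: int) -> bool:
    return x & ((1 << DP_BITS) - 1) == 0

def walk(start: int):
    """Iterate the hash from `start` until a distinguished point; return (dp, length)."""
    x, steps = start, 0
    while not is_distinguished(x):
        x, steps = toy_hash(x), steps + 1
    return x, steps

def locate_collision(start_a, len_a, start_b, len_b):
    """Given two trails ending at the same distinguished point, find the collision."""
    x, y = start_a, start_b
    for _ in range(len_a - len_b):         # advance the longer trail so both are
        x = toy_hash(x)                    # the same distance from the merge point
    for _ in range(len_b - len_a):
        y = toy_hash(y)
    while toy_hash(x) != toy_hash(y):      # step in lockstep until the images meet
        x, y = toy_hash(x), toy_hash(y)
    return x, y

def find_collision():
    seen = {}                              # distinguished point -> (start, length)
    seed = 1
    while True:
        dp, length = walk(seed)
        if dp in seen:
            x, y = locate_collision(seed, length, *seen[dp])
            if x != y:                     # ignore trails that are prefixes of others
                return x, y
        else:
            seen[dp] = (seed, length)
        seed += 1

x, y = find_collision()
assert x != y and toy_hash(x) == toy_hash(y)
print(f"collision: {x} and {y} both hash to {toy_hash(x)}")
```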

Peer Reviewed
Evolution of Conformation, Nanomechanics, and Infrared Nanospectroscopy of Single Amyloid Fibrils Converting into Microcrystals
Adamcik, Jozef; Ruggeri, Francesco Simone; Berryman, Josh UL et al

in Advanced Science (2020)

Nanomechanical properties of amyloid fibrils and nanocrystals depend on their secondary and quaternary structure, and the geometry of intermolecular hydrogen bonds. Advanced imaging methods based on atomic force microscopy (AFM) have unravelled the morphological and mechanical heterogeneity of amyloids; however, a full understanding has been hampered by the limited resolution of conventional spectroscopic methods. Here, it is shown that single-molecule nanomechanical mapping and infrared nanospectroscopy (AFM-IR), in combination with atomistic modelling, enable unravelling at the single-aggregate scale of the morphological, nanomechanical, chemical, and structural transition from amyloid fibrils to amyloid microcrystals in the hexapeptides ILQINS, IFQINS, and TFQINS. Different morphologies have different Young's moduli, within 2–6 GPa, with amyloid fibrils exhibiting lower Young's moduli compared to amyloid microcrystals. The origins of this stiffening are unravelled and related to the increased content of intermolecular β-sheet and the increased length scale of cooperativity following the transition from twisted fibril to flat nanocrystal. Increased stiffness in Young's moduli is correlated with an increased density of intermolecular hydrogen bonding and parallel beta-sheet structure, which energetically stabilize crystals over the other polymorphs. These results offer additional evidence for the position of amyloid crystals in the minimum of the protein folding and aggregation landscape.

Peer Reviewed
Mining Assumptions for Software Components using Machine Learning
Gaaloul, Khouloud UL; Menghi, Claudio UL; Nejati, Shiva UL et al

in Proceedings of the The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) (2020)

Software verification approaches aim to check a software component under analysis for all possible environments. In reality, however, components are expected to operate within a larger system and are required to satisfy their requirements only when their inputs are constrained by environment assumptions. In this paper, we propose EPIcuRus, an approach to automatically synthesize environment assumptions for a component under analysis (i.e., conditions on the component inputs under which the component is guaranteed to satisfy its requirements). EPIcuRus combines search-based testing, machine learning and model checking. The core of EPIcuRus is a decision tree algorithm that infers environment assumptions from a set of test results including test cases and their verdicts. The test cases are generated using search-based testing, and the assumptions inferred by decision trees are validated through model checking. In order to improve the efficiency and effectiveness of the assumption generation process, we propose a novel test case generation technique, namely Important Features Boundary Test (IFBT), that guides the test generation based on the feedback produced by machine learning. We evaluated EPIcuRus by assessing its effectiveness in computing assumptions on a set of study subjects that includes 18 requirements of four industrial models. We show that, for each of the 18 requirements, EPIcuRus was able to compute an assumption ensuring the satisfaction of that requirement, and further, ≈78% of these assumptions were computed within one hour.
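
The core loop of mining an assumption from test verdicts with a decision tree can be sketched as follows; the component, its requirement and the input domain are invented, random sampling replaces the search-based (IFBT) test generation, and the model-checking validation step is only indicated in a comment.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

def requirement_satisfied(u1, u2):
    """Stand-in for executing a test case on the component and checking its
    requirement; this made-up component only behaves correctly when u1 <= 5."""
    return u1 <= 5.0

# 1. Generate test cases over the component's input domain and collect verdicts.
inputs = rng.uniform(0.0, 10.0, size=(500, 2))
verdicts = np.array([requirement_satisfied(u1, u2) for u1, u2 in inputs])

# 2. Infer a candidate environment assumption from the test results.
tree = DecisionTreeClassifier(max_depth=2).fit(inputs, verdicts)
print(export_text(tree, feature_names=["u1", "u2"]))

# 3. The branches leading to passing leaves (here roughly "u1 <= 5") form the
#    mined assumption, which would then be validated with a model checker.
```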

Peer Reviewed
Approximation-Refinement Testing of Compute-Intensive Cyber-Physical Models: An Approach Based on System Identification
Menghi, Claudio UL; Nejati, Shiva UL; Briand, Lionel UL et al

in Proceedings of the 42nd International Conference on Software Engineering (2020)

Black-box testing has been extensively applied to test models of Cyber-Physical Systems (CPS), since these models are often not amenable to static and symbolic testing and verification. Black-box testing, however, requires executing the model under test for a large number of candidate test inputs. This poses a challenge for a large and practically important category of CPS models, known as compute-intensive CPS (CI-CPS) models, where a single simulation may take hours to complete. We propose a novel approach, namely ARIsTEO, to enable effective and efficient testing of CI-CPS models. Our approach embeds black-box testing into an iterative approximation-refinement loop. At the start, some sampled inputs and outputs of the CI-CPS model under test are used to generate a surrogate model that is faster to execute and can be subjected to black-box testing. Any failure-revealing test identified for the surrogate model is checked on the original model. If spurious, the test results are used to refine the surrogate model, which is then tested again. Otherwise, the test reveals a valid failure. We evaluated ARIsTEO by comparing it with S-Taliro, an open-source and industry-strength tool for testing CPS models. Our results, obtained based on five publicly available CPS models, show that, on average, ARIsTEO is able to find 24% more requirements violations than S-Taliro and is 31% faster than S-Taliro in finding those violations. We further assessed the effectiveness and efficiency of ARIsTEO on a large industrial case study from the satellite domain. In contrast to S-Taliro, ARIsTEO successfully tested two different versions of this model and could identify three requirements violations, requiring four hours, on average, for each violation.
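
A toy rendering of the approximation-refinement loop is sketched below: a polynomial fit stands in for the system-identification surrogate, random search stands in for the falsification tool, and the "expensive" model and its requirement are invented for illustration.

```python
import numpy as np

def expensive_model(u):
    """Stand-in for a compute-intensive simulation; returns the output peak."""
    return 1.0 + 0.8 * np.sin(u) + 0.08 * u ** 2

LIMIT = 3.0                                # toy requirement: peak must stay below 3.0

# 1. Fit a cheap surrogate from a handful of samples of the real model.
u_train = np.linspace(0.0, 6.0, 8)
y_train = np.array([expensive_model(u) for u in u_train])
coeffs = np.polyfit(u_train, y_train, deg=3)

rng = np.random.default_rng(0)
for iteration in range(20):
    # 2. Black-box test the surrogate (random search instead of a falsification tool).
    candidates = rng.uniform(0.0, 6.0, 200)
    suspects = candidates[np.polyval(coeffs, candidates) > LIMIT]
    if suspects.size == 0:
        continue
    u_star = suspects[0]
    # 3. Check the candidate failure on the original, expensive model.
    if expensive_model(u_star) > LIMIT:
        print(f"real violation found at u = {u_star:.3f}")
        break
    # 4. Spurious failure: refine the surrogate with the new sample and retry.
    u_train = np.append(u_train, u_star)
    y_train = np.append(y_train, expensive_model(u_star))
    coeffs = np.polyfit(u_train, y_train, deg=3)
```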

Markov Chain Monte Carlo and the Application to Geodetic Time Series Analysis
Olivares Pulido, German UL; Teferle, Felix Norman UL; Hunegnaw, Addisu UL

in Montillet, Jean-Philippe; Bos, Machiel (Eds.) Geodetic Time Series Analysis in Earth Sciences (2020)

The time evolution of geophysical phenomena can be characterised by stochastic time series. The stochastic nature of the signal stems from the geophysical phenomena involved and any noise, which may be due to, e.g., unmodelled effects or measurement errors. Until the 1990s, it was usually assumed that white noise could fully characterise this noise. However, this was demonstrated not to be the case, and it was proven that this assumption leads to underestimated uncertainties of the geophysical parameters inferred from the geodetic time series. Therefore, in order to fully quantify all the uncertainties as robustly as possible, it is imperative to estimate not only the deterministic but also the stochastic parameters of the time series. In this regard, the Markov Chain Monte Carlo (MCMC) method can provide a sample of the distribution function of all parameters, including those describing the noise, e.g., spectral index and amplitudes. After presenting the MCMC method and its implementation in our MCMC software, we apply it to synthetic and real time series and perform a cross-evaluation using Maximum Likelihood Estimation (MLE) as implemented in the CATS software. Several examples of how the MCMC method performs as a parameter estimation method for geodetic time series are given in this chapter. These include applications to GPS position time series, superconducting gravity time series and monthly mean sea level (MSL) records, which all show very different stochastic properties. The impact of the estimated parameter uncertainties on subsequently derived products is briefly demonstrated for the case of plate motion models. Finally, the MCMC results for weekly downsampled versions of the benchmark synthetic GNSS time series provided in Chapter 2 are presented separately in an appendix.
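
As a minimal illustration of MCMC-based parameter estimation for a time series (not the chapter's software), the sketch below runs a Metropolis sampler for a linear rate and a white-noise amplitude on a synthetic weekly series; real geodetic series additionally require coloured-noise parameters such as the spectral index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "position" series: linear trend plus white noise, weekly sampling.
t = np.arange(0.0, 10.0, 7.0 / 365.25)          # time in years
true_rate, true_sigma = 3.0, 2.0                # mm/yr, mm
y = true_rate * t + rng.normal(0.0, true_sigma, t.size)

def log_posterior(params):
    rate, log_sigma = params
    sigma = np.exp(log_sigma)
    resid = y - rate * t
    # Gaussian log-likelihood with flat priors on rate and log(sigma).
    return -0.5 * np.sum((resid / sigma) ** 2) - y.size * log_sigma

def metropolis(n_samples=20000, step=(0.05, 0.05)):
    chain = np.empty((n_samples, 2))
    current = np.array([0.0, 0.0])
    current_lp = log_posterior(current)
    for i in range(n_samples):
        proposal = current + rng.normal(0.0, step, 2)
        proposal_lp = log_posterior(proposal)
        if np.log(rng.uniform()) < proposal_lp - current_lp:  # accept/reject
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

chain = metropolis()[5000:]                     # discard burn-in
print(f"rate  = {chain[:, 0].mean():.2f} +/- {chain[:, 0].std():.2f} mm/yr")
print(f"sigma = {np.exp(chain[:, 1]).mean():.2f} mm")
```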
