References of "Zeller, Andreas"
Peer Reviewed
Inputs from Hell: Learning Input Distributions for Grammar-Based Test Generation
Soremekun, Ezekiel UL; Pavese, Esteban; Havrikov, Nikolas et al

in IEEE Transactions on Software Engineering (in press)

Grammars can serve as producers for structured test inputs that are syntactically correct by construction. A probabilistic grammar assigns probabilities to individual productions, thus controlling the distribution of input elements. Using the grammars as input parsers, we show how to learn input distributions from input samples, allowing us to create inputs that are similar to the sample; by inverting the probabilities, we can create inputs that are dissimilar to the sample. This allows for three test generation strategies: 1) “Common inputs” – by learning from common inputs, we can create inputs that are similar to the sample; this is useful for regression testing. 2) “Uncommon inputs” – learning from common inputs and inverting probabilities yields inputs that are strongly dissimilar to the sample; this is useful for completing a test suite with “inputs from hell” that test uncommon features, yet are syntactically valid. 3) “Failure-inducing inputs” – learning from inputs that caused failures in the past gives us inputs that share similar features and thus also have a high chance of triggering bugs; this is useful for testing the completeness of fixes. Our evaluation on three common input formats (JSON, JavaScript, CSS) shows the effectiveness of these approaches. Results show that “common inputs” reproduced 96% of the methods induced by the samples. In contrast, for almost all subjects (95%), the “uncommon inputs” covered significantly different methods from the samples. Learning from failure-inducing samples reproduced all exceptions (100%) triggered by the failure-inducing samples and discovered new exceptions not found in any of the samples learned from.
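
The idea of learning and inverting production probabilities can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy grammar, the function names, and the inversion scheme (weighting each alternative by 1/p and renormalizing) are assumptions made for this example, and the usage counts are assumed to come from parsing sample inputs with the grammar.

```python
import random

# Toy grammar (hypothetical): each nonterminal maps to its alternative expansions.
GRAMMAR = {
    "<value>": [["<digit>"], ["[", "<value>", "]"]],
    "<digit>": [["0"], ["1"], ["2"]],
}

def learn(counts):
    """Turn per-alternative usage counts into probabilities per nonterminal.

    Assumes every count is positive (e.g. after smoothing)."""
    return {nt: [c / sum(cs) for c in cs] for nt, cs in counts.items()}

def invert(probs):
    """Make rare alternatives likely: weight each by 1/p, then renormalize.

    This particular inversion scheme is an assumption of the sketch."""
    inverted = {}
    for nt, ps in probs.items():
        weights = [1.0 / p for p in ps]
        total = sum(weights)
        inverted[nt] = [w / total for w in weights]
    return inverted

def generate(symbol, probs, rng, depth=0, max_depth=8):
    """Expand `symbol` into a string, choosing alternatives by probability."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        expansion = min(alternatives, key=len)  # force termination
    else:
        expansion = rng.choices(alternatives, weights=probs[symbol])[0]
    return "".join(generate(s, probs, rng, depth + 1, max_depth) for s in expansion)

# Hypothetical counts: samples mostly use flat values and the digit "0".
common = learn({"<value>": [9, 1], "<digit>": [8, 1, 1]})
uncommon = invert(common)
rng = random.Random(0)
common_input = generate("<value>", common, rng)
uncommon_input = generate("<value>", uncommon, rng)
```

With two alternatives, this inversion simply swaps the probabilities, so nested values become the likely case under `uncommon`.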

Peer Reviewed
When does my program do this? learning circumstances of software behavior
Kampmann, Alexander; Havrikov, Nikolas; Soremekun, Ezekiel UL et al

in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2020, November 08)

We introduce Alhazen, an approach that automatically determines the circumstances under which a particular program behavior, such as a failure, takes place. Alhazen starts with a run that exhibits this behavior and automatically determines input features associated with the behavior in question: (1) We use a grammar to parse the input into individual elements. (2) We determine features from the elements such as existence, length, or numerical values. (3) We use a decision tree learner to observe and learn which input features are associated with the behavior in question. (4) We use the grammar to generate additional inputs to further strengthen or refute hypotheses as learned associations. (5) By repeating steps 2 to 4, we obtain a theory that explains and predicts the given behavior. In our evaluation using inputs for find, grep, NetHack, and a JavaScript transpiler, the theories produced by Alhazen predict and produce failures with high accuracy and allow developers to focus on a small set of input features: “grep fails whenever the --fixed-strings option is used in conjunction with an empty search string.”
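
Steps (2) and (3) above can be sketched as follows. This is a toy illustration under stated assumptions, not Alhazen itself: the feature names, the input representation, the hand-written tree learner, and the observations (produced by a made-up oracle, not by running grep) are all hypothetical, and the input generation of step (4) is left out.

```python
from collections import Counter
from itertools import product

FEATURE_NAMES = ["exists(--fixed-strings)", "exists(--ignore-case)", "len(pattern)==0"]

def features(inp):
    """Step 2: derive boolean features from an already parsed input."""
    return {
        "exists(--fixed-strings)": "--fixed-strings" in inp["options"],
        "exists(--ignore-case)": "--ignore-case" in inp["options"],
        "len(pattern)==0": len(inp["pattern"]) == 0,
    }

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def learn_tree(rows, labels, names):
    """Step 3: greedy decision tree; a node is (feature, yes_subtree, no_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not names:
        return majority

    def cost(name):
        yes = [l for r, l in zip(rows, labels) if r[name]]
        no = [l for r, l in zip(rows, labels) if not r[name]]
        if not yes or not no:
            return float("inf")  # feature does not split the data
        return (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)

    best = min(names, key=cost)
    if cost(best) == float("inf"):
        return majority
    rest = [n for n in names if n != best]
    yes = [(r, l) for r, l in zip(rows, labels) if r[best]]
    no = [(r, l) for r, l in zip(rows, labels) if not r[best]]
    return (best,
            learn_tree([r for r, _ in yes], [l for _, l in yes], rest),
            learn_tree([r for r, _ in no], [l for _, l in no], rest))

def predict(tree, row):
    while isinstance(tree, tuple):
        name, yes, no = tree
        tree = yes if row[name] else no
    return tree

# Hypothetical observations from a toy oracle (not real grep runs).
OBSERVATIONS = []
for fixed, icase, pattern in product([True, False], [True, False], ["", "foo"]):
    options = [o for o, on in (("--fixed-strings", fixed), ("--ignore-case", icase)) if on]
    OBSERVATIONS.append(({"options": options, "pattern": pattern},
                         fixed and pattern == ""))

rows = [features(inp) for inp, _ in OBSERVATIONS]
labels = [failed for _, failed in OBSERVATIONS]
tree = learn_tree(rows, labels, FEATURE_NAMES)
```

On this toy data the learned tree tests `exists(--fixed-strings)` first and `len(pattern)==0` second, which mirrors the kind of theory quoted in the abstract.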

Peer Reviewed
Abstracting Failure-Inducing Inputs
Gopinath, Rahul; Kampmann, Alexander; Havrikov, Nikolas et al

in Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (2020, July 18)

A program fails. Under which circumstances does the failure occur? Starting with a single failure-inducing input ("The input ((4)) fails") and an input grammar, the DDSET algorithm uses systematic tests to automatically generalize the input to an abstract failure-inducing input that contains both (concrete) terminal symbols and (abstract) nonterminal symbols from the grammar - for instance, "((⟨expr⟩))", which represents any expression ⟨expr⟩ in double parentheses. Such an abstract failure-inducing input can be used (1) as a debugging diagnostic, characterizing the circumstances under which a failure occurs ("The error occurs whenever an expression is enclosed in double parentheses"); (2) as a producer of additional failure-inducing tests to help design and validate fixes and repair candidates ("The inputs ((1)), ((3 * 4)), and many more also fail"). In its evaluation on real-world bugs in JavaScript, Clojure, Lua, and UNIX command line utilities, DDSET’s abstract failure-inducing inputs provided to-the-point diagnostics and precise producers for further failure-inducing inputs.
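
The generalization step can be illustrated with a heavily simplified sketch. It is not the DDSET algorithm as published: the toy grammar, the `fails` predicate standing in for the program under test, and the strategy of checking each subtree independently against a bounded enumeration of replacements are all assumptions made for this example.

```python
from itertools import product

# Toy grammar (hypothetical); terminals are plain strings.
GRAMMAR = {
    "<expr>": [["(", "<expr>", ")"], ["<digit>"]],
    "<digit>": [["1"], ["4"]],
}

# Derivation tree of the failing input "((4))": nodes are (symbol, children).
TREE = ("<expr>", ["(", ("<expr>", ["(", ("<expr>", [("<digit>", ["4"])]), ")"]), ")"])

def fails(text):
    """Stand-in for the program under test: double parentheses trigger the bug."""
    return "((" in text

def expansions(symbol, depth):
    """All strings derivable from `symbol` within `depth` expansion levels."""
    if symbol not in GRAMMAR:
        return [symbol]
    if depth == 0:
        return []
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expansions(s, depth - 1) for s in alternative]
        results.extend("".join(combo) for combo in product(*parts))
    return results

def text(node):
    """Concrete string spelled out by a (sub)tree."""
    if isinstance(node, str):
        return node
    return "".join(text(child) for child in node[1])

def abstract(node, render, depth=4):
    """Replace a subtree by its nonterminal if every tested replacement still fails.

    `render(s)` yields the full input with this subtree replaced by string `s`."""
    if isinstance(node, str):
        return node
    symbol, children = node
    if all(fails(render(candidate)) for candidate in expansions(symbol, depth)):
        return symbol  # the concrete content here does not matter
    out = []
    for i, child in enumerate(children):
        prefix = "".join(text(c) for c in children[:i])
        suffix = "".join(text(c) for c in children[i + 1:])
        out.append(abstract(child, lambda s, p=prefix, x=suffix: render(p + s + x), depth))
    return "".join(out)

pattern = abstract(TREE, lambda s: s)  # "((<expr>))"
```

Only the innermost expression can be replaced freely without losing the failure, so the sketch keeps both pairs of parentheses concrete and abstracts the rest.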

Peer Reviewed
Debugging Inputs
Kirschner, Lukas; Soremekun, Ezekiel UL; Zeller, Andreas

in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (2020, June)

When a program fails to process an input, it need not be the program code that is at fault. It can also be that the input data is faulty, for instance as a result of data corruption. To get the data processed, one then has to debug the input data, that is, (1) identify which parts of the input data prevent processing, and (2) recover as much of the (valuable) input data as possible. In this paper, we present a general-purpose algorithm called ddmax that addresses these problems automatically. Through experiments, ddmax maximizes the subset of the input that can still be processed by the program, thus recovering and repairing as much data as possible; the difference between the original failing input and the “maximized” passing input includes all input fragments that could not be processed. To the best of our knowledge, ddmax is the first approach that fixes faults in the input data without requiring program analysis. In our evaluation, ddmax repaired about 69% of input files and recovered about 78% of data within one minute per input.
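
A simplified sketch of the maximizing idea follows. It works in the spirit of the description above but should not be read as the published ddmax algorithm: the chunking strategy, the function names, and the `passes` predicate (a stand-in parser that accepts only digits) are assumptions for illustration. The sketch also assumes that the empty input passes and the full input fails.

```python
def build(data, keep):
    """Concatenate the characters of `data` whose indices are in `keep`."""
    return "".join(data[i] for i in sorted(keep))

def ddmax_sketch(data, passes):
    """Grow a passing subset of `data` by testing ever smaller chunks."""
    everything = set(range(len(data)))
    keep = set()  # indices known to pass together
    n = 2         # number of chunks the remaining indices are split into
    while True:
        remaining = sorted(everything - keep)
        if not remaining:
            break
        n = min(n, len(remaining))
        size = (len(remaining) + n - 1) // n
        chunks = [remaining[k:k + size] for k in range(0, len(remaining), size)]
        progressed = False
        # Try the full input minus one chunk (largest possible gain).
        for chunk in chunks:
            candidate = everything - set(chunk)
            if candidate != keep and passes(build(data, candidate)):
                keep, n, progressed = candidate, 2, True
                break
        # Otherwise try adding a single chunk to what already passes.
        if not progressed:
            for chunk in chunks:
                candidate = keep | set(chunk)
                if passes(build(data, candidate)):
                    keep, n, progressed = candidate, max(n - 1, 2), True
                    break
        if progressed:
            continue
        if n >= len(remaining):
            break  # finest granularity reached, nothing more can be added
        n = min(2 * n, len(remaining))
    return build(data, keep)

def passes(text):
    """Stand-in for the program under test: accepts digit-only inputs."""
    return all(ch.isdigit() for ch in text)

repaired = ddmax_sketch("12a45b7", passes)  # "12457"
```

The characters missing from the result (`a` and `b` here) correspond to the fragments that could not be processed.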

Peer Reviewed
Burden of cardiovascular disease across 29 countries and GPs’ decision to treat hypertension in oldest-old
Streit, Sven; Gussekloo, Jacobijn; Burman, Robert A. et al

in Scandinavian Journal of Primary Health Care (2018)

Objectives: We previously found large variations in general practitioner (GP) hypertension treatment probability in oldest-old (>80 years) between countries. We wanted to explore whether differences in country-specific cardiovascular disease (CVD) burden and life expectancy could explain the differences. Design: This is a survey study using case-vignettes of oldest-old patients with different comorbidities and blood pressure levels. An ecological multilevel model analysis was performed. Setting: GP respondents from European General Practice Research Network (EGPRN) countries, Brazil and New Zealand. Subjects: This study included 2543 GPs from 29 countries. Main outcome measures: GP treatment probability to start or not start antihypertensive treatment based on responses to case-vignettes; either low (<50% started treatment) or high (≥50% started treatment). CVD burden is defined as ratio of disability-adjusted life years (DALYs) lost due to ischemic heart disease and/or stroke and total DALYs lost per country; life expectancy at age 60 and prevalence of oldest-old per country. Results: Of 1947 GPs (76%) responding to all vignettes, 787 (40%) scored high treatment probability and 1160 (60%) scored low. GPs in high CVD burden countries had higher odds of treatment probability (OR 3.70; 95% confidence interval (CI) 3.00–4.57); in countries with low life expectancy at 60, CVD was associated with high treatment probability (OR 2.18, 95% CI 1.12–4.25); but not in countries with high life expectancy (OR 1.06, 95% CI 0.56–1.98). Conclusions: GPs’ choice to treat/not treat hypertension in oldest-old was explained by differences in country-specific health characteristics. GPs in countries with high CVD

Peer Reviewed
Variation in GP decisions on antihypertensive treatment in oldest-old and frail individuals across 29 countries
Streit, Sven; Verschoor, Marjolein; Rodondi, Nicolas et al

in BMC Geriatrics (2017)

Background: In oldest-old patients (>80), few trials showed efficacy of treating hypertension and they included mostly the healthiest elderly. The resulting lack of knowledge has led to inconsistent guidelines, mainly based on systolic blood pressure (SBP), cardiovascular disease (CVD) but not on frailty despite the high prevalence in oldest-old. This may lead to variation in how General Practitioners (GPs) treat hypertension. Our aim was to investigate treatment variation of GPs in oldest-old across countries and to identify the role of frailty in that decision. Methods: Using a survey, we compared treatment decisions in cases of oldest-old varying in SBP, CVD, and frailty. GPs were asked if they would start antihypertensive treatment in each case. In 2016, we invited GPs in Europe, Brazil, Israel, and New Zealand. We compared the percentage of cases that would be treated per country. A logistic mixed-effects model was used to derive odds ratio (OR) for frailty with 95% confidence intervals (CI), adjusted for SBP, CVD, and GP characteristics (sex, location and prevalence of oldest-old per GP office, and years of experience). The mixed-effects model was used to account for the multiple assessments per GP. Results: The 29 countries yielded 2543 participating GPs: 52% were female, 51% located in a city, 71% reported a high prevalence of oldest-old in their offices, and 38% had >20 years of experience. Across countries, considerable variation was found in the decision to start antihypertensive treatment in the oldest-old ranging from 34 to 88%. In 24/29 (83%) countries, frailty was associated with GPs’ decision not to start treatment even after adjustment for SBP, CVD, and GP characteristics (OR 0.53, 95%CI 0.48–0.59; ORs per country 0.11–1.78). Conclusions: Across countries, we found considerable variation in starting antihypertensive medication in oldest-old. The frail oldest-old had an odds ratio of 0.53 of receiving antihypertensive treatment. Future hypertension trials should also include frail patients to acquire evidence on the efficacy of antihypertensive treatment in oldest-old patients with frailty, with the aim to get evidence-based data for clinical decision-making.

Peer Reviewed
Generating Unit Tests with Structured System Interactions
Havrikov, Nikolas; Gambi, Alessio; Zeller, Andreas et al

in IEEE/ACM International Workshop on Automation of Software Test (AST) (2017)

There is a large body of work in the literature about automatic unit test generation, and many successful results have been reported so far. However, current approaches target library classes, but not full applications. A major obstacle for testing full applications is that they interact with the environment. For example, they establish connections to remote servers. Thoroughly testing such applications requires tests that completely control the interactions between the application and its environment. Recent techniques based on mocking enable the generation of tests which include environment interactions; however, generating the right type of interactions is still an open problem. In this paper, we describe a novel approach which addresses this problem by enhancing search-based testing with complex test data generation. Experiments on an artificial system show that the proposed approach can generate effective unit tests. Compared with current techniques based on mocking, we generate more robust unit tests which achieve higher coverage and are, arguably, easier to read and understand.

Peer Reviewed
Search-based Security Testing of Web Applications
Thome, Julian UL; Gorla, Alessandra; Zeller, Andreas

in SBST 2014 Proceedings of the 7th International Workshop on Search-Based Software Testing (2014)
