Variation Factors in the Design and Analysis of Replicated Controlled Experiments - Three (Dis)similar Studies on Inspections versus Unit Testing
Authors
Summary, in English
Background. In formal experiments on software engineering, the number of factors that may impact an outcome is very high. Some factors are controlled and change by design, while others are either unforeseen or due to chance.
Aims. This paper aims to explore how context factors change in a series of formal experiments and to identify implications for experimentation and replication practices to enable learning from experimentation.
Method. We analyze three experiments on code inspections and structural unit testing. The first two experiments use the same experimental design and instrumentation (replication), while the third, conducted by different researchers, replaces the programs and adapts defect detection methods accordingly (reproduction). Experimental procedures and location also differ between the experiments.
Results. Contrary to expectations, there are significant differences between the original experiment and the replication, as well as compared to the reproduction. Some of the differences are due to factors other than the ones designed to vary between experiments, indicating the sensitivity to context factors in software engineering experimentation.
Conclusions. In aggregate, the analysis indicates that reducing the complexity of software engineering experiments should be considered by researchers who want to obtain reliable and repeatable empirical measures.
Department(s)
Publication year
2014
Language
English
Pages
1781-1808
Publication/Journal/Series
Empirical Software Engineering
Volume
19
Issue
6
Full text
- Available as PDF - 400 kB
Document type
Journal article
Publisher
Springer
Subject
- Computer Science
Keywords
- formal experiments
- replication
- reproduction
- experiment design
- code inspection
- unit testing
Status
Published
ISBN/ISSN/Other
- ISSN: 1573-7616