Recommendations for analysing and meta-analysing small sample size software engineering experiments
Title | Recommendations for analysing and meta-analysing small sample size software engineering experiments |
---|---|
Author(s) | B.A. Kitchenham and L. Madeyski |
Details | Empir Software Eng 29, 137 (2024) |
Abstract | **Context** Software engineering (SE) experiments often have small sample sizes. This can result in data sets with non-normal characteristics, which poses problems because standard parametric meta-analysis, using the standardized mean difference (StdMD) effect size, assumes normally distributed sample data. Small sample sizes and non-normal data set characteristics can also lead to unreliable estimates of parametric effect sizes. Meta-analysis is even more complicated if experiments use complex experimental designs, such as two-group and four-group cross-over designs, which are popular in SE experiments. **Objective** Our objective was to develop a validated and robust meta-analysis method that can help to address the problems of small sample sizes and complex experimental designs without relying upon data samples being normally distributed. **Method** To illustrate the challenges, we used real SE data sets. We built upon previous research and developed a robust meta-analysis method able to deal with challenges typical for SE experiments. We validated our method via simulations comparing StdMD with two robust alternatives: the probability of superiority (p̂) and Cliff's d. **Results** We confirmed that many SE data sets are small and that small experiments run the risk of exhibiting non-normal properties, which can cause problems for analysing families of experiments. For simulations of individual experiments and meta-analyses of families of experiments, p̂ and Cliff's d consistently outperformed StdMD in terms of negligible small sample bias. They also had better power for log-normal and Laplace samples, although lower power for normal and gamma samples. Tests based on p̂ always had better or equal power than tests based on Cliff's d, and across all but one simulation condition, Type 1 error rates were less biased. **Conclusions** Using p̂ is a low-risk option for analysing and meta-analysing data from small sample-size SE randomized experiments. Parametric methods are only preferable if you have prior knowledge of the data distribution. |
DOI | https://doi.org/10.1007/s10664-024-10504-1 |
BibTex | @article{Kitchenham2024, abstract = {Software engineering (SE) experiments often have small sample sizes. This can result in data sets with non-normal characteristics, which poses problems as standard parametric meta-analysis, using the standardized mean difference (StdMD) effect size, assumes normally distributed sample data. Small sample sizes and non-normal data set characteristics can also lead to unreliable estimates of parametric effect sizes. Meta-analysis is even more complicated if experiments use complex experimental designs, such as two-group and four-group cross-over designs, which are popular in SE experiments.}, author = {Kitchenham, Barbara and Madeyski, Lech}, date = {2024/08/17}, doi = {10.1007/s10664-024-10504-1}, isbn = {1573-7616}, journal = {Empirical Software Engineering}, number = {6}, pages = {137}, title = {Recommendations for analysing and meta-analysing small sample size software engineering experiments}, url = {https://doi.org/10.1007/s10664-024-10504-1}, volume = {29}, year = {2024}} |
Topics | Meta-analysis, Effect Size, Software Engineering |
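
To make the two robust effect sizes discussed in the abstract concrete, here is a minimal Python sketch (not the authors' code) of how the probability of superiority (p̂) and Cliff's d can be estimated for two small independent samples, together with a Brunner-Munzel test of p̂ = 0.5 via `scipy.stats.brunnermunzel`. The sample sizes, log-normal data, and function names are illustrative assumptions, not taken from the paper; the identity Cliff's d = 2·p̂ − 1 is a standard relationship between the two measures.

```python
# Illustrative sketch of the robust effect sizes compared with StdMD in the paper:
# the probability of superiority (p-hat) and Cliff's d for two independent samples.
# Not the authors' analysis code; sample data and names are hypothetical.
import numpy as np
from scipy import stats


def probability_of_superiority(x, y):
    """p-hat = P(X > Y) + 0.5 * P(X = Y), estimated over all (x, y) pairs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diffs = x[:, None] - y[None, :]          # all pairwise differences
    return (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / diffs.size


def cliffs_d(x, y):
    """Cliff's d = P(X > Y) - P(X < Y); equals 2 * p-hat - 1."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diffs = x[:, None] - y[None, :]
    return (np.sum(diffs > 0) - np.sum(diffs < 0)) / diffs.size


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Small, skewed (log-normal) samples, of the kind that causes problems for StdMD.
    treatment = rng.lognormal(mean=0.5, sigma=1.0, size=12)
    control = rng.lognormal(mean=0.0, sigma=1.0, size=12)

    phat = probability_of_superiority(treatment, control)
    d = cliffs_d(treatment, control)
    bm = stats.brunnermunzel(treatment, control)  # non-parametric test related to p-hat

    print(f"p-hat = {phat:.3f}, Cliff's d = {d:.3f} (= 2*p-hat - 1)")
    print(f"Brunner-Munzel statistic = {bm.statistic:.3f}, p = {bm.pvalue:.3f}")
```

Because both estimators depend only on the ranks of pairwise comparisons, they are unaffected by the skewness of the log-normal samples, which is the property that motivates the paper's recommendation of p̂ for small SE experiments.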