Is Cliff’s δ more robust to kurtosis than robust Cohen’s d?

Duguay, Kit
As null hypothesis significance testing (NHST) has been criticized as the sole arbiter of a study’s worth, measuring effect sizes has taken greater prominence in psychological research. Three effect size measures (Cohen’s d, robust Cohen’s dr, and Cliff’s δ) can be used to compare two groups of observations (e.g., the difference in weight between boys and girls). Cohen’s d (Cohen, 1997) is arguably the most widely employed, but its accuracy and robustness depend on certain data assumptions (e.g., that scores such as weight follow a normal distribution). Violations of normality are in fact to be expected in many research settings (Lyon, 2004). Robust Cohen’s dr and Cliff’s (1993) δ have been proposed as alternatives that are robust to violations of those assumptions. Cliff’s δ is an ordinal measure and as such is robust to violations of normality, while robust Cohen’s dr is a modification of Cohen’s d that trims and winsorizes the data to normalize it (Algina et al., 2005). However, no single study has compared the robustness of all three effect size measures in one Monte Carlo experiment. This study analyzed data sets generated by a Monte Carlo simulation, a computer-based experiment with a 4 (sample sizes) × 4 (effect sizes) × 2 (distributions: normal and mixed-normal) design, giving 32 conditions with 1,000 replications each (32,000 simulated data sets in total). The results show that Cliff’s δ is more robust to violations of normality than Cohen’s d, and that robust Cohen’s dr is more robust to violations of normality than Cliff’s δ. However, despite the greater robustness of Cohen’s dr, Cliff’s δ has use cases independent of Cohen’s d, and the high level of data trimming suggested by Algina et al. (2005) may not be epistemologically sound as the default option for measuring effect size.
effect sizes, Cohen’s d, Cliff’s δ, robust statistics
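To make the three measures in the abstract concrete, the following is a minimal Python sketch, not the code used in the study. It assumes the 20% trimming and 0.642 rescaling constant commonly attributed to Algina et al. (2005) for robust Cohen’s dr. It also assumes a mixed-normal distribution in which 10% of scores come from a normal distribution with a standard deviation of 10. The abstract does not state the mixture parameters, so `eps` and `wide_sd`, like all function names here, are illustrative assumptions.

```python
import math
import random
from statistics import mean, variance


def pooled_sd(var_x, var_y, nx, ny):
    """Pooled standard deviation from two sample variances."""
    return math.sqrt(((nx - 1) * var_x + (ny - 1) * var_y) / (nx + ny - 2))


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample SD."""
    return (mean(x) - mean(y)) / pooled_sd(variance(x), variance(y), len(x), len(y))


def cliffs_delta(x, y):
    """P(x > y) - P(x < y), estimated over all pairs of observations."""
    greater = sum(xi > yj for xi in x for yj in y)
    less = sum(xi < yj for xi in x for yj in y)
    return (greater - less) / (len(x) * len(y))


def trimmed_mean_and_winsorized_var(x, prop):
    """Trim `prop` of the scores from each tail; winsorize the same scores for the variance."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(prop * n)  # number of scores affected in each tail
    trimmed = xs[g:n - g]
    winsorized = [xs[g]] * g + trimmed + [xs[n - g - 1]] * g
    return mean(trimmed), variance(winsorized)


def robust_d(x, y, prop=0.2, scale=0.642):
    """Robust d: trimmed-mean difference over pooled winsorized SD.

    Assumption: prop=0.2 and scale=0.642 follow the usual description of
    Algina et al. (2005); the abstract itself does not give these values.
    """
    tx, wx = trimmed_mean_and_winsorized_var(x, prop)
    ty, wy = trimmed_mean_and_winsorized_var(y, prop)
    return scale * (tx - ty) / pooled_sd(wx, wy, len(x), len(y))


def mixed_normal(rng, n, shift=0.0, eps=0.1, wide_sd=10.0):
    """Contaminated normal: each score has probability eps of SD wide_sd (assumed values)."""
    return [shift + rng.gauss(0.0, wide_sd if rng.random() < eps else 1.0)
            for _ in range(n)]


def one_replication(rng, n, shift):
    """One simulated two-group data set, scored with all three measures."""
    x = mixed_normal(rng, n, shift=shift)
    y = mixed_normal(rng, n)
    return cohens_d(x, y), cliffs_delta(x, y), robust_d(x, y)
```

Calling `one_replication(random.Random(1), 40, 0.5)` returns one (d, δ, dr) triple. Repeating the call 1,000 times for each combination of sample size, effect size, and distribution would mirror the structure of the 32-condition design described above, under the stated assumptions.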