Analytical Robustness

Aczel et al. (2026)

Aczel, B., Szaszi, B., Clelland, H. T., Kovacs, M., Holzmeister, F., Ravenzwaaij, D. van, et al. (2026). Investigating the analytical robustness of the social and behavioural sciences. Nature 652, 135–142. doi: 10.1038/s41586-025-09844-9

Analytical Robustness (2)

Fig. 2

An Example Data Set

Table 1: Data provided by Andrew Gelman (Gelman and Hill, 2007).
      mom_iq  kid_score
1      121.1         65
2       89.4         98
3      115.4         85
4       99.4         83
5       92.7        115
6..429
430     84.9         94
431     93.0         76
432     94.9         50
433     96.9         88
434     91.3         70

The Research Question

Is there a relation between the mom's IQ and the kid's test score?

What would be the best method to answer this question?

The Answer Is:

It depends…

  • …on what you REALLY want to find out
  • …on your (alternative) hypothesis
  • …on the intended audience
  • …on the expected effect size
  • …on the possible sample size
  • …on the analyst’s knowledge & tendencies

The Methods Toolbox

One Grouping Variable

Location

Test the scores of kids with low-IQ moms against the scores of kids with high-IQ moms

  • t-test (power loss if sample sizes and variances are both unequal)
  • Welch’s test
  • One-way ANOVA (power loss if sample sizes are unequal)

Distribution

Test the scores of kids with low-IQ moms against the scores of kids with high-IQ moms

  • Mann-Whitney U
    a.k.a. Wilcoxon rank sum test

Continuous Data

Dependence

  • Correlation
  • Regression

Categorical Data

Independence

Test whether the mom's IQ group (high vs low) is independent of the kid's score group (high vs low)

  • Pearson’s \(\chi^2\)-test

Relation Between Mom’s IQ and Child’s Test Score?

Location Difference Between Low and High IQ

  • ‘Make’ a new variable, high_iq, set to 1 for moms with IQ >= 100, 0 otherwise
  • Assess your test’s assumptions (distribution, variance, sample sizes!)

    Welch Two Sample t-test

data:  kid_score by high_iq
t = -9, df = 431, p-value <2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -19.3 -12.2
sample estimates:
mean in group 0 mean in group 1 
           79.7            95.5 
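
The two steps above (dichotomize, then test) can be sketched in R roughly as follows. The data frame here is simulated and only mimics the column names `mom_iq` and `kid_score`; it is not the original data set, so the numbers will differ from the output shown above.

```r
# Simulated stand-in for the kids data (an assumption, NOT the original data)
set.seed(1)
n <- 434
kids <- data.frame(mom_iq = rnorm(n, mean = 100, sd = 15))
kids$kid_score <- 87 + 0.6 * (kids$mom_iq - 100) + rnorm(n, sd = 18)

# 'Make' the grouping variable: 1 for moms with IQ >= 100, 0 otherwise
kids$high_iq <- as.integer(kids$mom_iq >= 100)

# Assess the assumptions: group sizes and group variances
table(kids$high_iq)
tapply(kids$kid_score, kids$high_iq, var)

# t.test() runs Welch's test by default (var.equal = FALSE)
welch <- t.test(kid_score ~ high_iq, data = kids)
welch
```

Setting `var.equal = TRUE` would give Student's t-test instead; with unequal group sizes and variances, Welch's version is usually the safer default.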

Distribution Difference between Low and High IQ


    Wilcoxon rank sum test with continuity correction

data:  kid_score by high_iq
W = 12716, p-value = 4e-16
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -19 -12
sample estimates:
difference in location 
                   -15 

Always plot the distributions of both groups: the Mann-Whitney test can reject the null hypothesis when the two distributions differ in shape or spread even though their medians are identical, so a significant result does not by itself indicate a difference in medians!
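
A small simulation may make this warning concrete. The two groups below are hypothetical: both are drawn from distributions with median 0, but one is right-skewed and the other is its mirror image. This is a sketch under those assumptions, not an analysis of the kids data.

```r
set.seed(42)
n <- 1000

# Both population medians are 0 (the median of an Exp(1) variable is log(2))
x <- rexp(n) - log(2)      # right-skewed group
y <- -(rexp(n) - log(2))   # left-skewed mirror image

# Sample medians are both close to 0
c(median_x = median(x), median_y = median(y))

# The Mann-Whitney / Wilcoxon rank sum test still rejects
mw <- wilcox.test(x, y)
mw$p.value

# Plot both groups before interpreting the test (only in interactive sessions)
if (interactive()) {
  boxplot(list(x = x, y = y), main = "Same median, different shape")
}
```

The test responds to the probability that a value from one group exceeds a value from the other, which differs from 0.5 here although the medians agree.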

Dependence

Correlation

  • Continuous data
  • How are mom_iq and kid_score associated?
  • Pearson’s product-moment correlation coefficient \(r\)
# Pearson correlation between mom's IQ and kid's test score
corr_coeff <- cor(
  kids$mom_iq,
  kids$kid_score,
  method = "pearson"
)
0.448

Dependence

Regression

  • Continuous data
  • Centered (or standardized) \(X\) variable
  • Can mom_iq explain/predict kid_score?

  • \(Y_i = f(X_i, \beta) + \varepsilon_i\)
  • Ordinary least squares (OLS) regression
  • \(f(X_i, \beta) = \beta_0 + \beta_1 X_i\)

    (Intercept) standard_mom_iq 
          86.80            0.61 
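
A minimal sketch of such a fit with `lm()`, again on simulated data that only borrows the column names (an assumption; the coefficients will not match the ones above). The variable name `centered_mom_iq` is hypothetical. The sketch also checks two general properties of simple OLS: with a centered predictor the intercept equals the mean of the outcome, and the slope equals \(r \cdot s_Y / s_X\).

```r
# Simulated stand-in for the kids data (an assumption, NOT the original data)
set.seed(1)
n <- 434
kids <- data.frame(mom_iq = rnorm(n, mean = 100, sd = 15))
kids$kid_score <- 87 + 0.6 * (kids$mom_iq - 100) + rnorm(n, sd = 18)

# Center the predictor so the intercept is the expected score at average IQ
kids$centered_mom_iq <- kids$mom_iq - mean(kids$mom_iq)

fit <- lm(kid_score ~ centered_mom_iq, data = kids)
coef(fit)

# Intercept equals the sample mean of the outcome
intercept_ok <- all.equal(unname(coef(fit)[1]), mean(kids$kid_score))

# Slope equals r * sd(y) / sd(x); centering does not change it
r <- cor(kids$mom_iq, kids$kid_score)
slope_ok <- all.equal(unname(coef(fit)[2]),
                      r * sd(kids$kid_score) / sd(kids$mom_iq))

c(intercept_ok = isTRUE(intercept_ok), slope_ok = isTRUE(slope_ok))
```

Standardizing the predictor (dividing by its standard deviation as well) would rescale the slope to "points per standard deviation of IQ" but leave the intercept unchanged.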

Independence

By dichotomizing both mom_iq and kid_score we can test independence:

       high_score
high_iq   0   1
      0 138 101
      1  48 147

    Pearson's Chi-squared test with Yates' continuity correction

data:  tabyl(kids, high_iq, high_score)
X-squared = 47, df = 1, p-value = 8e-12
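
The dichotomize-and-test step can be sketched with base R's `table()` and `chisq.test()` (the output above uses `tabyl()` instead). The data are simulated, and the cut-off for `high_score` is an assumption, since the slides do not state it; a median split is used here purely for illustration.

```r
# Simulated stand-in for the kids data (an assumption, NOT the original data)
set.seed(1)
n <- 434
kids <- data.frame(mom_iq = rnorm(n, mean = 100, sd = 15))
kids$kid_score <- 87 + 0.6 * (kids$mom_iq - 100) + rnorm(n, sd = 18)

# Dichotomize both variables
kids$high_iq <- as.integer(kids$mom_iq >= 100)
# Hypothetical cut-off: median split of the test score
kids$high_score <- as.integer(kids$kid_score >= median(kids$kid_score))

# 2 x 2 contingency table
tab <- table(high_iq = kids$high_iq, high_score = kids$high_score)
tab

# For 2 x 2 tables chisq.test() applies Yates' continuity correction by default
chi <- chisq.test(tab)
chi

# Expected counts under independence; small values would argue for an exact test
chi$expected
```

Dichotomizing discards information, so this test will generally have less power than the correlation or regression on the continuous variables.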

Summary

id  Test                                                          (adjusted) Cohen's d  SE
1   Welch Two Sample t-test                                       0.834                 0.102
2   Wilcoxon rank sum test with continuity correction             0.546                 NA
3   Correlation                                                   1.003                 0.002
4   OLS regression                                                1.003                 0.000
5   Pearson's Chi-squared test with Yates' continuity correction  2.307                 NA

Calculation of (adjusted) Cohen’s d after Borenstein (2009)
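
As an illustration, one standard conversion from a correlation coefficient to Cohen's d is \(d = 2r / \sqrt{1 - r^2}\). Whether the table above was computed with exactly this formula is an assumption, although it gives roughly 1.00 for the rounded \(r = 0.448\), in line with the correlation row. The function names below are hypothetical.

```r
# Convert a correlation coefficient r to Cohen's d
r_to_d <- function(r) {
  2 * r / sqrt(1 - r^2)
}

# Inverse conversion, assuming equal group sizes
d_to_r <- function(d) {
  d / sqrt(d^2 + 4)
}

d <- r_to_d(0.448)
d          # roughly 1.00
d_to_r(d)  # recovers 0.448
```

The conversions for the other rows (rank sum test, chi-squared test) use different formulas and are not shown here.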

Key Takeaways

Given that different methods yield different results:

  • What is the research question, the hypothesis?
  • Consider power aspects
  • Who is the audience?
  • When reading a paper: ask yourself the same questions!
  • Multiverse analyses as good practice
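
A very small multiverse-style sketch: the same question is analysed under several defensible choices (here, three hypothetical IQ cut-offs and two tests) and all results are reported side by side. The data are simulated and the cut-offs are illustrative assumptions, not recommendations.

```r
# Simulated stand-in for the kids data (an assumption, NOT the original data)
set.seed(1)
n <- 434
kids <- data.frame(mom_iq = rnorm(n, mean = 100, sd = 15))
kids$kid_score <- 87 + 0.6 * (kids$mom_iq - 100) + rnorm(n, sd = 18)

# Analytical choices to vary (hypothetical cut-offs)
cutoffs <- c(90, 100, 110)

# Run every combination of cut-off and test, collect the p-values
results <- do.call(rbind, lapply(cutoffs, function(cut) {
  high_iq <- as.integer(kids$mom_iq >= cut)
  data.frame(
    cutoff   = cut,
    n_high   = sum(high_iq),
    welch_p  = t.test(kids$kid_score ~ high_iq)$p.value,
    wilcox_p = wilcox.test(kids$kid_score ~ high_iq)$p.value
  )
}))
results
```

A full multiverse analysis would also vary the outcome coding, covariates, and exclusion rules, and would report effect sizes rather than p-values alone.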

References

Aczel, B., Szaszi, B., Clelland, H. T., Kovacs, M., Holzmeister, F., Ravenzwaaij, D. van, et al. (2026). Investigating the analytical robustness of the social and behavioural sciences. Nature 652, 135–142. doi: 10.1038/s41586-025-09844-9
Borenstein, M. (2009). Introduction to meta-analysis. Chichester, UK: John Wiley & Sons.
Gelman, A., and Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models, 1st Edn, eds. R. M. Alvarez, N. L. Beck, and L. L. Wu. Cambridge University Press.