Post-Hoc Tests Explained: Complete Guide with Examples

Introduction to Post-Hoc Tests

Post-hoc tests are statistical procedures used after an ANOVA (Analysis of Variance) to determine which specific groups differ from each other when the overall ANOVA indicates significant differences exist. While ANOVA tells us that at least one group is different, post-hoc tests pinpoint exactly where those differences lie.

Key Concept: Post-hoc tests address the problem of multiple comparisons that arises when comparing multiple group means simultaneously. Without proper correction, the chance of making a Type I error (false positive) increases dramatically.

In this comprehensive guide, we'll explore different types of post-hoc tests, when to use each one, how to interpret results, and common pitfalls to avoid in statistical analysis.

What Are Post-Hoc Tests?

Post-hoc tests (Latin for "after this") are follow-up tests conducted after an ANOVA reveals a statistically significant result. They help researchers identify which specific group means are significantly different from each other.

The Multiple Comparisons Problem

When comparing k groups, the number of possible pairwise comparisons is:

C(k, 2) = k(k-1)/2

For example, with 5 groups, you have 10 possible comparisons. If you test each at α = 0.05 and treat the tests as independent, the family-wise error rate becomes approximately:

1 - (1 - 0.05)¹⁰ ≈ 0.40

That is roughly a 40% chance of at least one Type I error, eight times the nominal 5% level.
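To see how quickly the problem grows, the two formulas above can be sketched in a few lines of Python. The function names are our own, and the calculation assumes the comparisons are independent, which is only an approximation for pairwise tests that share groups.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of pairwise comparisons among k groups: C(k, 2) = k(k-1)/2."""
    return comb(k, 2)


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one Type I error across m independent tests."""
    return 1 - (1 - alpha) ** m


# Print groups, comparisons, and the uncorrected family-wise error rate
for k in (3, 4, 5, 6):
    m = n_pairwise(k)
    print(k, m, round(familywise_error_rate(0.05, m), 3))
```

With 5 groups this reproduces the 10 comparisons and the roughly 0.40 error rate shown above.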

Example Scenario:

A researcher tests 4 different teaching methods (A, B, C, D) on student performance. ANOVA shows significant differences (p < 0.05). Post-hoc tests would answer:

Is method A significantly better than B?
Is method C significantly better than D?
Which method performs best overall?

With Post-Hoc Tests

Controls family-wise error rate
Identifies specific group differences
Provides confidence intervals

Without Post-Hoc Tests

High risk of Type I errors if uncorrected pairwise tests are run
Only knows "some difference exists"
No specific group comparisons

When to Use Post-Hoc Tests

Post-hoc tests should only be used under specific conditions and after certain statistical procedures:

After Significant ANOVA

Use post-hoc tests only when your ANOVA shows a statistically significant result (p < α). If ANOVA is not significant, post-hoc tests are not justified.

Exception: Planned comparisons can be conducted regardless of ANOVA results.

Three or More Groups

Post-hoc tests are designed for situations with three or more groups. With only two groups, a simple t-test suffices.

Note: Some post-hoc tests can handle complex comparisons beyond simple pairwise tests.

Exploratory Research

Ideal for exploratory studies where you don't have specific hypotheses about which groups will differ.

Alternative: For confirmatory research with specific hypotheses, use planned comparisons instead.

Equal Variance Assumption

Most post-hoc tests assume equal variances between groups. If this assumption is violated, use tests like Games-Howell that don't require equal variances.

Check: Use Levene's test to verify equal variances.

Tukey's Honestly Significant Difference (HSD) Test

Tukey's HSD is one of the most commonly used post-hoc tests. It compares all possible pairs of means while controlling the family-wise error rate.

Tukey's HSD Formula

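Tukey's HSD is a critical difference rather than a test statistic: two group means differ significantly when the absolute difference between them exceeds the HSD value, calculated as: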

HSD = q_α,k,df × √(MS_error/n)

Where:

q is the studentized range statistic
α is the significance level
k is the number of groups
df is the degrees of freedom for error
MS_error is the mean square error from ANOVA
n is the sample size per group (for unequal sample sizes, the Tukey-Kramer variant uses √((MS_error/2) × (1/n_i + 1/n_j)) for each pair of groups i and j)

Example Interpretation:

Suppose we have 4 groups with means: A=10, B=12, C=15, D=18. Tukey's HSD might show:

A vs B: p = 0.25 (not significant)
A vs C: p = 0.02 (significant)
A vs D: p = 0.001 (significant)
B vs C: p = 0.08 (not significant)
B vs D: p = 0.01 (significant)
C vs D: p = 0.15 (not significant)
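As a rough illustration of the formula, the sketch below applies it to the four example means. The mean square error (8.0) and group size (6) are hypothetical values chosen for the example, and the critical value q ≈ 3.96 is taken from a standard studentized range table for α = 0.05, k = 4, df = 20; check it against your own table or software.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd_threshold(q_crit: float, ms_error: float, n: int) -> float:
    """HSD = q * sqrt(MS_error / n), for equal group sizes."""
    return q_crit * sqrt(ms_error / n)


means = {"A": 10, "B": 12, "C": 15, "D": 18}  # example means from the text
Q_CRIT = 3.96      # tabled q for alpha = 0.05, k = 4, df = 20
MS_ERROR = 8.0     # hypothetical mean square error
N_PER_GROUP = 6    # hypothetical; gives df = 4 * (6 - 1) = 20

hsd = tukey_hsd_threshold(Q_CRIT, MS_ERROR, N_PER_GROUP)

# A pair is significant when its absolute mean difference exceeds HSD
significant_pairs = [
    (a, b)
    for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > hsd
]

print(round(hsd, 2))
print(significant_pairs)
```

Under these assumed inputs, the pairs flagged are A vs C, A vs D, and B vs D, the same pattern as the hypothetical p-values listed above.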

Advantages

Controls family-wise error rate
Easy to interpret
Works well with equal sample sizes
Provides confidence intervals

Limitations

Assumes equal variances
Less powerful with unequal sample sizes
Conservative with many groups
Only for pairwise comparisons

Bonferroni Correction

The Bonferroni correction is a simple but conservative method for controlling the family-wise error rate in multiple comparisons.

Bonferroni Adjustment

The Bonferroni correction adjusts the significance level by dividing it by the number of comparisons:

α_adjusted = α / m

Where:

α is the original significance level (usually 0.05)
m is the number of comparisons being made

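Alternatively, you can adjust the p-values and compare them to the original α (capping the result at 1, since a p-value cannot exceed 1):

p_adjusted = min(1, p × m)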

Example Calculation:

If you're making 5 comparisons with α = 0.05:

α_adjusted = 0.05 / 5 = 0.01
Each comparison must have p < 0.01 to be significant

If you have a p-value of 0.03 for one comparison:

p_adjusted = 0.03 × 5 = 0.15
This would not be significant after correction
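The same arithmetic can be sketched in Python. The function names are our own, and apart from the 0.03 from the example, the raw p-values are made up for illustration.

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / m


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# 0.03 comes from the worked example; the other values are hypothetical
raw_p = [0.03, 0.004, 0.20, 0.012, 0.45]
adjusted = bonferroni_adjust(raw_p)

# Comparing adjusted p-values to alpha is equivalent to
# comparing raw p-values to alpha / m
significant = [p_adj < 0.05 for p_adj in adjusted]

print(bonferroni_alpha(0.05, len(raw_p)))
print([round(p, 3) for p in adjusted])
print(significant)
```

Either route gives the same decisions: testing adjusted p-values against 0.05 flags exactly the comparisons whose raw p-values fall below 0.01.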

When to Use Bonferroni

Small number of comparisons
Unequal sample sizes
Non-orthogonal comparisons

When to Avoid Bonferroni

Many comparisons
Highly correlated tests
When power is important

Scheffé's Method

Scheffé's method is a very conservative post-hoc test that allows for any possible contrast between group means, not just pairwise comparisons.

Scheffé's Test Statistic

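The critical value for Scheffé's method is:

F_Scheffé = (k-1) × F_α,k-1,df

Where:

k is the number of groups
F_α,k-1,df is the critical F-value at significance level α with k-1 and df degrees of freedom (the same critical value used for the overall ANOVA)
df is the degrees of freedom for error

A contrast is significant if its F-ratio (the contrast sum of squares divided by MS_error) exceeds F_Scheffé.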

Example Application:

Suppose you have groups A, B, C, D and want to test if the average of A and B differs from the average of C and D. Scheffé's method allows this complex comparison.

This flexibility comes at the cost of reduced power for simple pairwise comparisons compared to Tukey's HSD.
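Here is a minimal sketch of that A-and-B versus C-and-D contrast. The group means, group size (6), and mean square error (8.0) are hypothetical, and the critical value F ≈ 3.10 is taken from a standard F table for α = 0.05 with 3 and 20 degrees of freedom; verify it against your own table or software.

```python
def contrast_f_ratio(means, coeffs, ns, ms_error):
    """F-ratio for a contrast: (estimate^2 / sum(c_i^2 / n_i)) / MS_error."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    ss_contrast = estimate ** 2 / sum(c ** 2 / n for c, n in zip(coeffs, ns))
    return ss_contrast / ms_error


def scheffe_critical(k: int, f_crit: float) -> float:
    """Scheffé critical value: (k - 1) times the ANOVA critical F."""
    return (k - 1) * f_crit


group_means = [10.0, 12.0, 15.0, 18.0]   # hypothetical means for A, B, C, D
group_sizes = [6, 6, 6, 6]               # hypothetical, so df = 20
coefficients = [0.5, 0.5, -0.5, -0.5]    # mean of A and B minus mean of C and D
MS_ERROR = 8.0                           # hypothetical mean square error
F_CRIT = 3.10                            # tabled F for alpha = 0.05, df = (3, 20)

f_contrast = contrast_f_ratio(group_means, coefficients, group_sizes, MS_ERROR)
f_scheffe = scheffe_critical(len(group_means), F_CRIT)
is_significant = f_contrast > f_scheffe

print(round(f_contrast, 2), round(f_scheffe, 2), is_significant)
```

Because the threshold is multiplied by k-1, a contrast has to clear a much higher bar than the overall ANOVA did, which is where the method's conservatism comes from.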

Advantages

Allows any contrast, not just pairwise
Very conservative Type I error control
Robust to data snooping
Works with unequal sample sizes

Limitations

Low power for pairwise comparisons
Overly conservative for many applications
Complex calculations
Not ideal for simple group comparisons

Dunnett's Test

Dunnett's test is designed specifically for comparing multiple treatment groups to a single control group, which is common in experimental research.

Dunnett's Test Formula

The test statistic for Dunnett's test has the same form as a t-statistic, but it is compared against a special critical value from Dunnett's distribution:

t_Dunnett = (X̄_treatment - X̄_control) / √(MS_error × (1/n_t + 1/n_c))

Where:

X̄_treatment is the mean of a treatment group
X̄_control is the mean of the control group
MS_error is the mean square error from ANOVA
n_t and n_c are the sample sizes of the treatment and control groups

Example Scenario:

A pharmaceutical company tests 3 new drugs against a placebo. Dunnett's test would compare:

Drug A vs Placebo
Drug B vs Placebo
Drug C vs Placebo

But it would not compare Drug A vs Drug B directly.
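The statistic itself is easy to compute by hand. The sketch below uses hypothetical means, group sizes, and mean square error for the drug scenario; it stops short of a significance decision because the critical value has to come from a Dunnett table or statistical software.

```python
from math import sqrt


def dunnett_t(mean_treatment, mean_control, ms_error, n_treatment, n_control):
    """t = (treatment mean - control mean) / sqrt(MS_error * (1/n_t + 1/n_c))."""
    standard_error = sqrt(ms_error * (1 / n_treatment + 1 / n_control))
    return (mean_treatment - mean_control) / standard_error


# All numbers below are hypothetical
control_mean, control_n = 10.0, 6          # placebo group
treatments = {                             # name: (mean, sample size)
    "Drug A": (12.0, 6),
    "Drug B": (15.0, 6),
    "Drug C": (18.0, 6),
}
MS_ERROR = 8.0

# One statistic per treatment; treatments are never compared to each other
t_stats = {
    name: dunnett_t(mean, control_mean, MS_ERROR, n, control_n)
    for name, (mean, n) in treatments.items()
}

for name, t in t_stats.items():
    print(f"{name} vs Placebo: t = {t:.3f}")
```

Each statistic is then compared with the Dunnett critical value for the chosen α, the number of treatment groups, and the error degrees of freedom, not with the ordinary t critical value.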

Post-Hoc Test Comparison

  Test          Best For                        Type I Error Control   Power      Complex Comparisons
  Tukey's HSD   All pairwise comparisons        Good                   High       No
  Bonferroni    Few comparisons                 Very good              Low        Yes
  Scheffé       Complex contrasts               Excellent              Very low   Yes
  Dunnett       Comparisons vs. control group   Good                   High       No
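The recommendations in this guide can be condensed into a rough rule of thumb. The function below is only a sketch: its name, parameters, and especially the order in which the conditions are checked are our own assumptions, not a standard algorithm, and real decisions should also weigh sample sizes and the research question.

```python
def suggest_post_hoc_test(
    n_groups: int,
    equal_variances: bool = True,
    control_group: bool = False,
    complex_contrasts: bool = False,
    few_selected_comparisons: bool = False,
) -> str:
    """Heuristic test suggestion based on the guidance in this article."""
    if n_groups < 3:
        return "t-test (no post-hoc test needed)"
    if not equal_variances:
        return "Games-Howell"           # does not require equal variances
    if control_group:
        return "Dunnett's test"         # each treatment vs one control
    if complex_contrasts:
        return "Scheffé's method"       # any contrast, not just pairs
    if few_selected_comparisons:
        return "Bonferroni correction"  # small number of comparisons
    return "Tukey's HSD"                # all pairwise comparisons


print(suggest_post_hoc_test(4))
print(suggest_post_hoc_test(4, equal_variances=False))
print(suggest_post_hoc_test(4, control_group=True))
```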

Interpreting Post-Hoc Test Results

Proper interpretation of post-hoc test results is crucial for drawing valid conclusions from your research.

Reading Output

Most statistical software provides output similar to this example from Tukey's HSD:

  Comparison    Difference   Lower CI   Upper CI   p-value
  A - B         2.34         -0.56      5.24       0.145
  A - C         5.67         2.77       8.57       0.001*
  A - D         8.91         6.01       11.81      0.000*
  B - C         3.33         0.43       6.23       0.021*
  B - D         6.57         3.67       9.47       0.000*
  C - D         3.24         0.34       6.14       0.028*

Interpretation: Comparisons with significant differences (p < 0.05) are marked with *. Every pair differs significantly except A and B, whose confidence interval (-0.56 to 5.24) includes zero.
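The confidence interval and the adjusted p-value tell the same story: a comparison is significant at α = 0.05 exactly when its 95% interval excludes zero. The short sketch below checks this on three rows copied from the output above; the helper name is our own.

```python
def ci_excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


# (comparison, difference, lower CI, upper CI, adjusted p-value)
rows = [
    ("A - B", 2.34, -0.56, 5.24, 0.145),
    ("A - C", 5.67, 2.77, 8.57, 0.001),
    ("C - D", 3.24, 0.34, 6.14, 0.028),
]

for label, diff, lower, upper, p in rows:
    by_interval = ci_excludes_zero(lower, upper)
    by_p_value = p < 0.05
    print(label, by_interval, by_p_value)
```

Reporting the interval alongside the p-value also shows how large the difference plausibly is, which the p-value alone does not.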

Key Interpretation Points

Focus on p-values and confidence intervals
Consider effect sizes, not just significance
Report adjusted p-values, not raw ones
Note which specific groups differ
Consider practical significance

Common Presentation Formats

Summary tables with significance indicators
Compact letter displays (A, B, C grouping)
Graphical representations (error bars)
Matrix formats showing all comparisons

Compact Letter Display Example:

Groups that share a letter are not significantly different:

Group A: 10.2 (a)
Group B: 12.5 (a,b)
Group C: 15.8 (b,c)
Group D: 18.3 (c)

Interpretation: A and B don't differ (both have 'a'), B and C don't differ (both have 'b'), but A and C do differ (no shared letter).
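The reading rule is mechanical enough to express in code: two groups differ significantly exactly when their letter sets have nothing in common. A small sketch using the display above (the variable and function names are our own):

```python
from itertools import combinations

# Letters assigned to each group in the compact letter display
letters = {
    "A": {"a"},
    "B": {"a", "b"},
    "C": {"b", "c"},
    "D": {"c"},
}


def differ_significantly(group_1: str, group_2: str, display: dict) -> bool:
    """Groups differ when they share no letter."""
    return display[group_1].isdisjoint(display[group_2])


differing_pairs = [
    (g1, g2)
    for g1, g2 in combinations(letters, 2)
    if differ_significantly(g1, g2, letters)
]

print(differing_pairs)
```

For this display the differing pairs are A and C, A and D, and B and D; every other pair shares at least one letter.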

Common Mistakes with Post-Hoc Tests

Avoid these common pitfalls when using post-hoc tests in your statistical analysis:

Using When ANOVA Not Significant

Mistake: Conducting post-hoc tests when the overall ANOVA is not significant.

Solution: Only use post-hoc tests after a significant ANOVA result, unless you have specific planned comparisons.

Ignoring Assumptions

Mistake: Using tests that assume equal variances when variances are unequal.

Solution: Check homogeneity of variances with Levene's test and use appropriate tests like Games-Howell if violated.

Data Snooping

Mistake: Conducting multiple tests without correction after seeing the data.

Solution: Plan your comparisons in advance or use appropriate post-hoc corrections.

Misinterpreting p-values

Mistake: Interpreting post-hoc p-values as if they were from individual t-tests.

Solution: Remember that post-hoc p-values are adjusted for multiple comparisons.

Best Practices Checklist

✓ Only use post-hoc tests after significant ANOVA
✓ Check assumptions (normality, equal variances)
✓ Choose the appropriate test for your research question
✓ Report adjusted p-values, not raw p-values
✓ Consider effect sizes alongside significance
✓ Use confidence intervals to show magnitude of differences
✓ Be transparent about all comparisons made

Software Implementation

Most statistical software packages provide built-in functions for conducting post-hoc tests. Here are examples for common platforms:

R Implementation

  # Tukey's HSD test (group must be a factor, not a numeric code)
  mydata$group <- factor(mydata$group)
  model <- aov(response ~ group, data = mydata)
  tukey.result <- TukeyHSD(model)
  print(tukey.result)

  # Bonferroni-adjusted pairwise t-tests
  pairwise.t.test(mydata$response, mydata$group, p.adjust.method = "bonferroni")

Python Implementation

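  # Using statsmodels; mydata is assumed to be a pandas DataFrame with a
  # numeric 'response' column and a 'group' column of group labels
  from statsmodels.stats.multicomp import pairwise_tukeyhsd

  tukey = pairwise_tukeyhsd(endog=mydata['response'],
                            groups=mydata['group'],
                            alpha=0.05)
  print(tukey.summary())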

SPSS Implementation

  * One-way ANOVA with Tukey's HSD post-hoc comparisons.
  ONEWAY response BY group
    /POSTHOC=TUKEY ALPHA(0.05).

  * Or using the menus:
  * Analyze > Compare Means > One-Way ANOVA > Post Hoc > Tukey.

Excel Implementation

  ' Using Data Analysis ToolPak:
  ' 1. Data > Data Analysis > Anova: Single Factor
  ' 2. Check "Labels in First Row"
  ' 3. Set Alpha value

  ' Note: Excel doesn't have built-in post-hoc tests
  ' Consider using Real Statistics Resource Pack add-in

Practice: Interpret This Output

Given this Tukey HSD output for 4 groups (A, B, C, D), which groups differ significantly at α = 0.05?

  Comparison   p-value
  A - B        0.043
  A - C        0.215
  A - D        0.001
  B - C        0.087
  B - D        0.032
  C - D        0.521

Solution:

Significant differences (p < 0.05) exist between:

Groups A and B (p = 0.043)
Groups A and D (p = 0.001)
Groups B and D (p = 0.032)

No significant differences between A-C, B-C, or C-D.
