Introduction to Post-Hoc Tests
Post-hoc tests are statistical procedures used after an ANOVA (Analysis of Variance) to determine which specific groups differ from each other when the overall ANOVA indicates significant differences exist. While ANOVA tells us that at least one group is different, post-hoc tests pinpoint exactly where those differences lie.
Key Concept: Post-hoc tests address the problem of multiple comparisons that arises when comparing multiple group means simultaneously. Without proper correction, the chance of making a Type I error (false positive) increases dramatically.
In this comprehensive guide, we'll explore different types of post-hoc tests, when to use each one, how to interpret results, and common pitfalls to avoid in statistical analysis.
What Are Post-Hoc Tests?
Post-hoc tests (Latin for "after this") are follow-up tests conducted after an ANOVA reveals a statistically significant result. They help researchers identify which specific group means are significantly different from each other.
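When comparing k groups, the number of possible pairwise comparisons is:
$$m = \binom{k}{2} = \frac{k(k-1)}{2}$$
For example, with 5 groups, you have 10 possible comparisons. If you test each at α = 0.05 and treat the tests as independent, the family-wise error rate becomes:
$$\text{FWER} = 1 - (1 - \alpha)^{m} = 1 - (0.95)^{10} \approx 0.40$$
This means there is roughly a 40% chance of at least one Type I error.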
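The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the comparisons are independent, which real pairwise tests are not, so the result is an approximation; the function names are illustrative only.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairs among k groups: k(k-1)/2."""
    return comb(k, 2)


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one Type I error across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


m = n_pairwise(5)                      # 10 comparisons for 5 groups
fwer = familywise_error_rate(0.05, m)  # about 0.40
print(m, round(fwer, 3))
```

Trying other values of k shows how quickly the uncorrected error rate grows as groups are added.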
Example Scenario:
A researcher tests 4 different teaching methods (A, B, C, D) on student performance. ANOVA shows significant differences (p < 0.05). Post-hoc tests would answer:
- Is method A significantly better than B?
- Is method C significantly better than D?
- Which method performs best overall?
With Post-Hoc Tests
- Controls family-wise error rate
- Identifies specific group differences
- Provides confidence intervals
Without Post-Hoc Tests
- High risk of Type I errors
- Only knows "some difference exists"
- No specific group comparisons
Put theory into practice by solving ANOVA-based problems on the anova-calculator.
When to Use Post-Hoc Tests
Post-hoc tests should only be used under specific conditions and after certain statistical procedures:
After Significant ANOVA
Use post-hoc tests only when your ANOVA shows a statistically significant result (p < α). If ANOVA is not significant, post-hoc tests are not justified.
Exception: Planned comparisons can be conducted regardless of ANOVA results.
Three or More Groups
Post-hoc tests are designed for situations with three or more groups. With only two groups, a simple t-test suffices.
Note: Some post-hoc tests can handle complex comparisons beyond simple pairwise tests.
Exploratory Research
Ideal for exploratory studies where you don't have specific hypotheses about which groups will differ.
Alternative: For confirmatory research with specific hypotheses, use planned comparisons instead.
Equal Variance Assumption
Most post-hoc tests assume equal variances between groups. If this assumption is violated, use tests like Games-Howell that don't require equal variances.
Check: Use Levene's test to verify equal variances.
Tukey's Honestly Significant Difference (HSD) Test
Tukey's HSD is one of the most commonly used post-hoc tests. It compares all possible pairs of means while controlling the family-wise error rate.
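The critical difference for Tukey's HSD is calculated as:
$$\text{HSD} = q_{\alpha,k,df}\,\sqrt{\frac{MS_{\text{error}}}{n}}$$
Two group means differ significantly when the absolute difference between them exceeds HSD.
Where:
- q is the critical value of the studentized range statistic for the given α, k, and df
- α is the significance level
- k is the number of groups
- df is the degrees of freedom for error
- MS_error is the mean square error from ANOVA
- n is the sample size per group (for equal sample sizes)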
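As a rough illustration of the HSD formula, the sketch below compares every pair of means against the critical difference. The group means are the ones used in this section's example; the values `MS_ERROR = 9.0` and `N_PER_GROUP = 6` are hypothetical, and `Q_CRIT = 3.96` is an approximate table value of the studentized range for α = 0.05, k = 4, df = 20 (SciPy's `scipy.stats.studentized_range` can supply it instead of a table).

```python
from itertools import combinations
from math import sqrt


def tukey_hsd_threshold(q_crit: float, ms_error: float, n_per_group: int) -> float:
    """Critical difference: HSD = q * sqrt(MS_error / n), equal group sizes."""
    return q_crit * sqrt(ms_error / n_per_group)


def significant_pairs(means: dict, hsd: float) -> list:
    """Pairs whose absolute mean difference exceeds the HSD threshold."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > hsd]


means = {"A": 10, "B": 12, "C": 15, "D": 18}
Q_CRIT = 3.96      # approximate table value, alpha=0.05, k=4, df=20
MS_ERROR = 9.0     # hypothetical mean square error
N_PER_GROUP = 6    # hypothetical equal group size (4 groups, df = 24 - 4 = 20)

hsd = tukey_hsd_threshold(Q_CRIT, MS_ERROR, N_PER_GROUP)
pairs = significant_pairs(means, hsd)
print(round(hsd, 2), pairs)
```

With these assumed inputs the threshold is about 4.85, so only pairs whose means are more than 4.85 apart are flagged.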
Example Interpretation:
Suppose we have 4 groups with means: A=10, B=12, C=15, D=18. Tukey's HSD might show:
- A vs B: p = 0.25 (not significant)
- A vs C: p = 0.02 (significant)
- A vs D: p = 0.001 (significant)
- B vs C: p = 0.08 (not significant)
- B vs D: p = 0.01 (significant)
- C vs D: p = 0.15 (not significant)
Advantages
- Controls family-wise error rate
- Easy to interpret
- Works well with equal sample sizes
- Provides confidence intervals
Limitations
- Assumes equal variances
- Less powerful with unequal sample sizes
- Conservative with many groups
- Only for pairwise comparisons
Bonferroni Correction
The Bonferroni correction is a simple but conservative method for controlling the family-wise error rate in multiple comparisons.
The Bonferroni correction adjusts the significance level by dividing it by the number of comparisons:
$$\alpha_{\text{adjusted}} = \frac{\alpha}{m}$$
Where:
- α is the original significance level (usually 0.05)
- m is the number of comparisons being made
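Alternatively, you can adjust the p-values by multiplying each one by the number of comparisons, capping the result at 1:
$$p_{\text{adjusted}} = \min(p \times m,\ 1)$$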
Example Calculation:
If you're making 5 comparisons with α = 0.05:
- α_adjusted = 0.05 / 5 = 0.01
- Each comparison must have p < 0.01 to be significant
If you have a p-value of 0.03 for one comparison:
- p_adjusted = 0.03 × 5 = 0.15
- This would not be significant after correction
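The two equivalent forms of the correction can be sketched in plain Python as follows. The function names are illustrative, and the cap at 1.0 reflects the usual convention that an adjusted p-value cannot exceed 1.

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance level: alpha / m."""
    return alpha / m


def bonferroni_adjust(p_values: list, m: int = None) -> list:
    """Adjusted p-values: p * m, capped at 1.0.

    m defaults to the number of p-values supplied.
    """
    m = len(p_values) if m is None else m
    return [min(p * m, 1.0) for p in p_values]


alpha_adj = bonferroni_alpha(0.05, 5)        # 0.01
p_adj = bonferroni_adjust([0.03], m=5)[0]    # 0.15
print(alpha_adj, round(p_adj, 2), p_adj < 0.05)
```

Comparing the raw p-value to the adjusted α, or the adjusted p-value to the original α, leads to the same decision.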
When to Use Bonferroni
- Small number of comparisons
- Unequal sample sizes
- Non-orthogonal comparisons
When to Avoid Bonferroni
- Many comparisons
- Highly correlated tests
- When power is important
Scheffé's Method
Scheffé's method is a very conservative post-hoc test that allows for any possible contrast between group means, not just pairwise comparisons.
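The critical value for Scheffé's method is:
$$F_{\text{Scheffé}} = (k - 1)\,F_{\alpha,\,k-1,\,df}$$
Where:
- k is the number of groups
- F_{α,k-1,df} is the critical F-value at level α with k-1 and df degrees of freedom (the same critical value used for the overall ANOVA)
- df is the degrees of freedom for error
A comparison is significant if its F-ratio exceeds F_Scheffé.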
Example Application:
Suppose you have groups A, B, C, D and want to test if the average of A and B differs from the average of C and D. Scheffé's method allows this complex comparison.
This flexibility comes at the cost of reduced power for simple pairwise comparisons compared to Tukey's HSD.
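The A-and-B versus C-and-D comparison could be sketched as below. The contrast F-ratio used here is the standard one for equal group sizes, (Σ cᵢ x̄ᵢ)² / (MS_error · Σ cᵢ² / n). The means are borrowed from the Tukey example, while `MS_ERROR`, `N_PER_GROUP`, and the tabulated `F_CRIT = 3.10` (approximately F at α = 0.05 with 3 and 20 degrees of freedom) are assumptions made for illustration.

```python
def contrast_f(means, coeffs, ms_error, n_per_group):
    """F-ratio for a linear contrast with equal group sizes."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    variance = ms_error * sum(c * c / n_per_group for c in coeffs)
    return estimate ** 2 / variance


def scheffe_critical(k, f_crit):
    """Scheffe critical value: (k - 1) * F_{alpha, k-1, df}."""
    return (k - 1) * f_crit


means = [10, 12, 15, 18]             # groups A, B, C, D
coeffs = [0.5, 0.5, -0.5, -0.5]      # mean(A, B) minus mean(C, D); sums to zero
MS_ERROR = 9.0                       # hypothetical mean square error
N_PER_GROUP = 6                      # hypothetical equal group size
F_CRIT = 3.10                        # approximate table value, alpha=0.05, df=(3, 20)

f_ratio = contrast_f(means, coeffs, MS_ERROR, N_PER_GROUP)
critical = scheffe_critical(len(means), F_CRIT)
print(round(f_ratio, 2), round(critical, 2), f_ratio > critical)
```

Any other set of coefficients that sums to zero can be tested against the same critical value, which is what makes the method safe for contrasts chosen after looking at the data.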
Advantages
- Allows any contrast, not just pairwise
- Very conservative Type I error control
- Robust to data snooping
- Works with unequal sample sizes
Limitations
- Low power for pairwise comparisons
- Overly conservative for many applications
- Complex calculations
- Not ideal for simple group comparisons
Dunnett's Test
Dunnett's test is designed specifically for comparing multiple treatment groups to a single control group, which is common in experimental research.
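The test statistic for Dunnett's test is similar to a t-test, but it is compared against a special critical value from Dunnett's distribution:
$$t = \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{\sqrt{MS_{\text{error}}\left(\frac{1}{n_t} + \frac{1}{n_c}\right)}}$$
Where:
- X̄_treatment is the mean of a treatment group
- X̄_control is the mean of the control group
- MS_error is the mean square error from ANOVA
- n_t and n_c are the sample sizes of the treatment and control groups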
Example Scenario:
A pharmaceutical company tests 3 new drugs against a placebo. Dunnett's test would compare:
- Drug A vs Placebo
- Drug B vs Placebo
- Drug C vs Placebo
But it would not compare Drug A vs Drug B directly.
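A minimal sketch of the statistic for the drug scenario is shown below. All numbers (group means, `MS_ERROR`, and the sample sizes) are hypothetical. The code only computes the t-type statistics; each one would still need to be compared against a Dunnett critical value from a table or from software (recent SciPy releases provide `scipy.stats.dunnett` for the full test).

```python
from math import sqrt


def dunnett_statistic(mean_t, mean_c, ms_error, n_t, n_c):
    """(treatment mean - control mean) / sqrt(MS_error * (1/n_t + 1/n_c))."""
    return (mean_t - mean_c) / sqrt(ms_error * (1.0 / n_t + 1.0 / n_c))


# Hypothetical summary data: one control group and three treatments
control_mean = 10.0
treatment_means = {"Drug A": 12.0, "Drug B": 15.0, "Drug C": 18.0}
MS_ERROR = 9.0   # hypothetical mean square error from the ANOVA
N = 6            # hypothetical size of every group

# Only treatment-vs-control comparisons are formed, never drug vs drug
stats = {name: dunnett_statistic(mean, control_mean, MS_ERROR, N, N)
         for name, mean in treatment_means.items()}
for name, t in stats.items():
    print(f"{name} vs Placebo: t = {t:.3f}")
```

Because only k-1 comparisons are made instead of all k(k-1)/2 pairs, the critical value is smaller than the one an all-pairs procedure would require.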
Post-Hoc Test Comparison
| Test | Best For | Type I Error Control | Power | Complex Comparisons |
|---|---|---|---|---|
| Tukey's HSD | All pairwise comparisons | Good | High | No |
| Bonferroni | Few comparisons | Very good | Low | Yes |
| Scheffé | Complex contrasts | Excellent | Very low | Yes |
| Dunnett | vs control group | Good | High | No |
Interpreting Post-Hoc Test Results
Proper interpretation of post-hoc test results is crucial for drawing valid conclusions from your research.
Most statistical software provides output similar to this example from Tukey's HSD:
Interpretation: Groups with significant differences (p < 0.05) are marked with *. A and C differ significantly, as do A and D, B and C, etc.
Key Interpretation Points
- Focus on p-values and confidence intervals
- Consider effect sizes, not just significance
- Report adjusted p-values, not raw ones
- Note which specific groups differ
- Consider practical significance
Common Presentation Formats
- Summary tables with significance indicators
- Compact letter displays (A, B, C grouping)
- Graphical representations (error bars)
- Matrix formats showing all comparisons
Compact Letter Display Example:
Groups that share a letter are not significantly different:
- Group A: 10.2 (a)
- Group B: 12.5 (a,b)
- Group C: 15.8 (b,c)
- Group D: 18.3 (c)
Interpretation: A and B don't differ (both have 'a'), B and C don't differ (both have 'b'), but A and C do differ (no shared letter).
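Reading a compact letter display reduces to a set operation: two groups differ only when their letter sets have nothing in common. The sketch below encodes the example display by hand (it does not derive the letters from test results) and lists the pairs that differ.

```python
from itertools import combinations

# Letters taken from the example display above
letters = {
    "A": {"a"},
    "B": {"a", "b"},
    "C": {"b", "c"},
    "D": {"c"},
}


def groups_differ(g1: str, g2: str, display: dict) -> bool:
    """True when the two groups share no letter, i.e. they differ significantly."""
    return display[g1].isdisjoint(display[g2])


differing = [(a, b) for a, b in combinations(letters, 2)
             if groups_differ(a, b, letters)]
print(differing)
```

For this display the differing pairs are A and C, A and D, and B and D.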
Common Mistakes with Post-Hoc Tests
Avoid these common pitfalls when using post-hoc tests in your statistical analysis:
Using When ANOVA Not Significant
Mistake: Conducting post-hoc tests when the overall ANOVA is not significant.
Solution: Only use post-hoc tests after a significant ANOVA result, unless you have specific planned comparisons.
Ignoring Assumptions
Mistake: Using tests that assume equal variances when variances are unequal.
Solution: Check homogeneity of variances with Levene's test and use appropriate tests like Games-Howell if violated.
Data Snooping
Mistake: Conducting multiple tests without correction after seeing the data.
Solution: Plan your comparisons in advance or use appropriate post-hoc corrections.
Misinterpreting p-values
Mistake: Interpreting post-hoc p-values as if they were from individual t-tests.
Solution: Remember that post-hoc p-values are adjusted for multiple comparisons.
- ✓ Only use post-hoc tests after significant ANOVA
- ✓ Check assumptions (normality, equal variances)
- ✓ Choose the appropriate test for your research question
- ✓ Report adjusted p-values, not raw p-values
- ✓ Consider effect sizes alongside significance
- ✓ Use confidence intervals to show magnitude of differences
- ✓ Be transparent about all comparisons made
Software Implementation
Most statistical software packages provide built-in functions for conducting post-hoc tests. Here are examples for common platforms:
R Implementation
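# Fit a one-way ANOVA, then run Tukey's HSD on the fitted model
# (mydata is assumed to have a numeric 'response' column and a 'group' column)
model <- aov(response ~ group, data = mydata)
tukey.result <- TukeyHSD(model)
print(tukey.result)
# Pairwise t-tests with Bonferroni-adjusted p-values
pairwise.t.test(mydata$response, mydata$group, p.adjust.method = "bonferroni")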
Python Implementation
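# mydata is assumed to be a pandas DataFrame with 'response' and 'group' columns
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Tukey's HSD for all pairwise group comparisons
tukey = pairwise_tukeyhsd(endog=mydata['response'],
                          groups=mydata['group'],
                          alpha=0.05)
print(tukey.summary())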
SPSS Implementation
ONEWAY response BY group
/POSTHOC=TUKEY ALPHA(0.05).
* Or using the menus:
* Analyze > Compare Means > One-Way ANOVA > Post Hoc > Tukey
Excel Implementation
' 1. Data > Data Analysis > Anova: Single Factor
' 2. Check "Labels in First Row"
' 3. Set Alpha value
' Note: Excel doesn't have built-in post-hoc tests
' Consider using Real Statistics Resource Pack add-in
Practice: Interpret This Output
A - B 0.043
A - C 0.215
A - D 0.001
B - C 0.087
B - D 0.032
C - D 0.521
Solution:
Significant differences (p < 0.05) exist between:
- Groups A and B (p = 0.043)
- Groups A and D (p = 0.001)
- Groups B and D (p = 0.032)
No significant differences between A-C, B-C, or C-D.
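The same reading of the output can be automated. This sketch assumes each line has the form "group - group p-value", exactly as in the practice output above, and that the listed values are already adjusted for multiple comparisons.

```python
PRACTICE_OUTPUT = """\
A - B 0.043
A - C 0.215
A - D 0.001
B - C 0.087
B - D 0.032
C - D 0.521
"""


def significant_comparisons(text: str, alpha: float = 0.05) -> list:
    """Return (group1, group2, p) for every line with p below alpha."""
    found = []
    for line in text.strip().splitlines():
        g1, _dash, g2, p = line.split()
        if float(p) < alpha:
            found.append((g1, g2, float(p)))
    return found


result = significant_comparisons(PRACTICE_OUTPUT)
for g1, g2, p in result:
    print(f"Groups {g1} and {g2} differ (p = {p})")
```

Lowering `alpha` to 0.01 leaves only the A versus D comparison.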