Post Hoc Power Calculator (for Two-Sample T-test)
Calculate the observed statistical power of your study given the observed effect size, sample size, and significance level. This calculator assumes a two-sample independent t-test design with equal group sizes.
In the realm of statistical analysis, "power" refers to the probability that a study will detect an effect when there actually is one. It's a crucial concept for designing robust research. However, how and when power is calculated makes a significant difference in its utility and interpretation. This article delves into the concept of post hoc power calculation, often called observed or retrospective power, and why its use is widely debated and often discouraged.
What is Post Hoc Power Calculation?
Post hoc power calculation involves estimating the statistical power of a study *after* the data has been collected and analyzed. Unlike a priori power analysis, which is performed before a study begins to determine the necessary sample size, post hoc power uses the observed effect size, the study's sample size, and the chosen significance level (alpha) to calculate power. The idea is to understand the power of a completed study, particularly when a non-significant result is obtained.
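To make the inputs concrete, here is a minimal sketch of a post hoc power calculation for a two-sided, two-sample t-test with equal group sizes. It uses a normal approximation rather than the exact noncentral t distribution, so the numbers will differ slightly from a dedicated power package; the function name and parameters are illustrative.

```python
from statistics import NormalDist

def posthoc_power(d_obs: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate post hoc power for a two-sided, two-sample t-test
    with equal group sizes (normal approximation).

    d_obs       -- observed Cohen's d from the completed study
    n_per_group -- number of subjects in each group
    alpha       -- two-sided significance level
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    ncp = d_obs * (n_per_group / 2) ** 0.5   # approximate noncentrality
    # Probability the test statistic lands beyond either critical boundary
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)
```

For example, an observed d of 0.5 with 64 subjects per group yields a power of roughly 0.80 under this approximation, matching the textbook pairing of those values.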
The Common Misconception and Why It's Problematic
While seemingly intuitive, the practice of post hoc power calculation is fraught with methodological and interpretative issues. Many statisticians and researchers strongly advise against its routine use, especially for interpreting the results of a single study.
A Tautological Exercise
One of the primary criticisms of post hoc power is its tautological nature. When you calculate power using the observed effect size from your study, the resulting power estimate is essentially a re-expression of your p-value:
- If your study yields a statistically significant result (p < α), the observed power will be high (above roughly 50%).
- If your study yields a non-significant result (p ≥ α), the observed power will be low (below roughly 50%); when p falls exactly at α, the observed power is approximately 50%.
This means that post hoc power provides no new information beyond what the p-value already tells you about the statistical significance of your findings. It merely restates whether your study found an effect large enough, given its sample size, to cross the significance threshold.
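The tautology can be shown directly: under a normal approximation, observed power can be computed from the p-value alone, with no reference to the data. This is a sketch of that one-to-one mapping (the exact t-based version differs only slightly; the function name is illustrative).

```python
from statistics import NormalDist

def observed_power_from_p(p: float, alpha: float = 0.05) -> float:
    """Observed power implied by a two-sided p-value alone
    (normal approximation): no inputs beyond p and alpha are needed."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p / 2)         # |test statistic| recovered from p
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    return (1 - z.cdf(z_crit - z_obs)) + z.cdf(-z_crit - z_obs)
```

A p-value exactly at α = 0.05 maps to an observed power of almost exactly 50%, a very small p-value maps to high observed power, and a large p-value maps to low observed power, illustrating that the power estimate is just the p-value restated.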
Misleading Interpretations
Relying on post hoc power can lead to misleading conclusions. For instance, if a study reports a non-significant result and a low observed power, it's tempting to conclude that the study was "underpowered." While this might be true, the low observed power in this context doesn't inform us about the *true* state of the world. It simply reflects that the observed effect was not large enough to be statistically significant with the given sample size. And if the true effect size is actually zero, the probability of a significant result equals α by definition, so no power calculation (post hoc or a priori) can rescue the study.
Furthermore, the observed effect size itself is a random variable and an estimate of the true effect. It's subject to sampling variability, especially in small samples. Using a variable and potentially inaccurate estimate of the effect size to calculate power makes the power estimate equally variable and potentially inaccurate.
The "Observed Effect Size" Problem
The core issue lies in using the observed effect size. An observed effect size from a single study is merely an estimate of the true population effect size. If the true effect size is different from the observed one (which is almost always the case), then the post hoc power calculation based on the observed effect size will be biased.
When is Post Hoc Power *Not* Recommended?
It is generally not recommended to use post hoc power to:
- Justify a non-significant result by claiming the study was underpowered.
- Interpret the results of a single study, as it adds no new inferential information.
- Make decisions about the validity or conclusiveness of a study's findings.
What are the Alternatives and Better Practices?
Instead of relying on post hoc power, researchers should focus on more informative and scientifically sound practices:
A Priori Power Analysis
This is the gold standard. Conducted *before* data collection, a priori power analysis determines the sample size needed to detect a clinically meaningful or theoretically important effect size with a desired level of power (e.g., 80%) and a specified alpha level. This proactive approach ensures that the study is adequately designed to answer its research question.
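The a priori calculation above can be sketched as follows: given a target effect size, desired power, and alpha, solve for the per-group sample size. This uses the same normal approximation as standard textbook formulas, so it slightly understates the sample size relative to the exact t-based calculation (often by one subject per group); the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group sample size for a two-sided, two-sample
    t-test with equal group sizes (normal approximation).

    d     -- smallest effect size (Cohen's d) worth detecting
    power -- desired power, e.g. 0.80
    alpha -- two-sided significance level
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)
```

For a medium effect (d = 0.5) at 80% power and α = 0.05, this gives 63 subjects per group; the exact t-based answer is 64, illustrating why such calculations must happen *before* recruitment.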
Sensitivity Analysis
If an a priori power analysis was not conducted or if you want to understand the capabilities of a completed study, sensitivity analysis is a better alternative. This involves calculating the minimum effect size that a study could have detected with adequate power, given its actual sample size and alpha level. It helps to understand what effect sizes the study was *capable* of finding, rather than retrospectively calculating power for an observed effect that might just be noise.
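A sensitivity analysis inverts the same formula: fix the sample size, power, and alpha, and solve for the smallest detectable effect. A minimal sketch under the same normal approximation (the function name is illustrative):

```python
from statistics import NormalDist

def minimum_detectable_d(n_per_group: int, power: float = 0.80,
                         alpha: float = 0.05) -> float:
    """Smallest Cohen's d a two-sided, two-sample t-test with equal
    group sizes could detect at the given power (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for desired power
    return (z_alpha + z_beta) * (2 / n_per_group) ** 0.5
```

A completed study with 64 subjects per group, for example, had 80% power only for effects of roughly d = 0.5 or larger. That statement depends only on the design, not on the noisy observed effect, which is what makes it more informative than post hoc power.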
Confidence Intervals
Focusing on confidence intervals (CIs) around observed effect sizes provides a more informative picture than p-values or post hoc power. CIs give a range of plausible values for the true population effect size. If a CI is wide, it suggests imprecision in the estimate, regardless of the p-value. If it includes zero, it indicates that a true effect of zero is plausible.
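As a sketch of this practice, here is an approximate confidence interval for the difference in group means, using a normal critical value and an unpooled standard error (a Welch-style t interval would be slightly wider for small samples; the function name and inputs are illustrative):

```python
from statistics import NormalDist

def diff_ci(mean1: float, mean2: float, sd1: float, sd2: float,
            n1: int, n2: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate CI for the difference in two group means
    (normal approximation, unpooled standard error)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)       # e.g. 1.96 for 95%
    diff = mean1 - mean2
    se = (sd1**2 / n1 + sd2**2 / n2) ** 0.5          # standard error of the difference
    return diff - z * se, diff + z * se
```

For instance, two groups of 30 with means 10.2 and 9.5 and standard deviations of 2.0 give a 95% CI of roughly (-0.3, 1.7): the interval includes zero (so the result is non-significant) but also includes substantial positive effects, conveying the imprecision that a post hoc power number obscures.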
For Meta-Analysis and Review (with Caution)
In some limited contexts, such as when planning a meta-analysis or reviewing existing literature, researchers might use published effect sizes and sample sizes to estimate power. However, even in these scenarios, it's crucial to understand the limitations and interpret such calculations with extreme caution, recognizing the inherent biases of observed effect sizes.
Conclusion
While the calculation of post hoc power is straightforward, its interpretation is fraught with peril. It offers little to no additional insight beyond the p-value and can lead to misinterpretations of research findings. For robust and meaningful research, prioritize a priori power analysis, sensitivity analysis, and the careful interpretation of confidence intervals. These practices empower researchers to design better studies and draw more accurate conclusions about the phenomena they investigate.