Here are some examples of real-world statistical analyses that don’t use p-values and significance testing
Your results from a hypothesis test are statistically significant. Bravo! But are these results important? Not necessarily. Statistical significance does not necessarily mean the results are practically significant in real-world statistical analysis.
In this blog post, I will list a few examples of real-world statistical analyses that do not use p-values and significance testing. Before that, let us understand what exactly the p-value means.
Statswork is one of the country’s leaders in providing Data Analysis Services and Statistical Consulting Services. Contact Statswork to avail of our services.
P-value
Many data scientists have faced the scenario above when talking about the p-value, haven’t they?
Well, sometimes even the simplest definition of the p-value tends to be complicated. Technically, a p-value is the probability of obtaining an effect at least as extreme as the one in the sample data, assuming the null hypothesis is true. A hypothesis testing procedure uses the sample to assess whether the assumed null hypothesis is plausible for the population. You reject the null hypothesis only if the observed results would be unlikely under that assumption. Strictly speaking, results are statistically significant when the strength of the evidence in the sample passes the defined significance level (alpha), that is, when the p-value is smaller than alpha.
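To make the definition concrete, here is a minimal sketch of a two-sided one-sample z-test in Python. The function name `two_sided_p_value` and all the numbers (a null mean of 50, a sample mean of 52, a known standard deviation of 8, and 64 observations) are assumptions made up for illustration; they do not come from any real study.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """P-value of a two-sided one-sample z-test (sigma is assumed known)."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability, if the null hypothesis is true, of a z-score
    # at least as extreme as the one observed in the sample.
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # significance level chosen before looking at the data
p = two_sided_p_value(sample_mean=52.0, null_mean=50.0, sigma=8.0, n=64)
print(f"p-value = {p:.4f}; significant at alpha = {alpha}: {p < alpha}")
```

The comparison `p < alpha` is the whole decision rule. Nothing in it says how large or how important the effect is.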
Often, the p-value is used to determine statistical significance in hypothesis tests such as chi-square tests, t-tests, ANOVA, and tests of regression coefficients, among many others. It might also seem logical that p-values and statistical significance relate to importance. However, there are situations where they are not useful in the practical world. Here is a list of situations where significance testing and the p-value fail and lead to impractical results.
1.
Suppose Mr. X is evaluating a training program by comparing the test scores of participants to those of people who study on their own. Further, he decides that the difference between these groups must be at least 5 points to represent a practically meaningful effect size. The results from the study show a statistically significant difference, with participants scoring on average 3 points higher on a 100-point test. While these results are statistically significant, the 3-point difference is less than the 5-point threshold. Thus, the study provides evidence that the effect exists, but it is too small to be meaningful in the real world. The conclusion is that the time and money that participants spend on this training program are not worth an average improvement of only 3 points.
2.
Let us consider the mean pizza delivery time example. Once the data has been collected, our calculation finds that the mean delivery time is longer by 10 minutes, with a p-value of 0.03. That is, if the null hypothesis were true, there would be only a 3% chance of observing a delay of 10 minutes or more. The p-value of 0.03 is less than the threshold of 0.05, and hence we conclude the result is statistically significant. But this result looks impractical, because we believe that the mean delivery time of the pizza is always 30 minutes or less. In this situation, we have to weigh the result of the analysis against our prior belief that the delivery time is at most 30 minutes, which is a valid null hypothesis. In addition, customer reviews may show that late deliveries do happen. From this, one may even decide not to buy any pizza from that particular shop. Therefore, in my opinion, a conclusion based on the p-value alone is impractical in a situation like this.
3.
If the sample mean varies from sample to sample, then the p-value will also vary, and a conclusion based on a single p-value can be wrong. See Dance of the p Values by Geoff Cumming to understand how much the p-value changes from one sample to the next.
4.
There is always an interesting hypothesis to help understand how the p-value fails in real-life situations. Cohen (1994) critiqued the use of significance tests under the title “The earth is round (p < .05)”, whereas Amrhein et al. (2017) discussed significance thresholds under the title “The earth is flat (p > 0.05)”.
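The training-program example in item 1 can be sketched with a few lines of Python. The 3-point difference and the 5-point threshold come from the example; the standard deviation of 10 points and the 200 people per group are assumptions added purely for illustration, as is the helper name `mean_difference_summary`.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_summary(diff, sd, n_per_group, confidence=0.95):
    """P-value and confidence interval for a difference in means.

    Uses a two-sample z-test and assumes both groups share the same
    standard deviation and the same size (illustrative simplification).
    """
    se = sd * sqrt(2 / n_per_group)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p_value, (diff - z_crit * se, diff + z_crit * se)


practical_threshold = 5.0  # minimum meaningful difference, from the example
p_value, (low, high) = mean_difference_summary(diff=3.0, sd=10.0, n_per_group=200)

statistically_significant = p_value < 0.05
practically_meaningful = low >= practical_threshold

print(f"p-value = {p_value:.4f}, 95% CI = ({low:.2f}, {high:.2f})")
print(f"statistically significant: {statistically_significant}")
print(f"practically meaningful:    {practically_meaningful}")
```

Under these assumed numbers the p-value is well below 0.05, yet the whole confidence interval sits below the 5-point threshold. Reporting the interval next to the threshold answers the practical question that the p-value alone cannot.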
In closing, a statistically significant p-value indicates that the sample provides sufficient evidence to conclude that the effect exists in the population. However, the question always arises: is the p-value a practically valid measure? The choice of test statistic, the sample size, and the framing of the null hypothesis all really matter in arriving at any statistical conclusion.
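The point about samples can be tried out with a small simulation in the spirit of item 3: repeat the same experiment many times and watch the p-value change. This is only a sketch. The group means, the standard deviation, the sample size, and the function name `replicate_p_values` are all assumptions, and the test uses a normal approximation rather than an exact t-test.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def replicate_p_values(n_replications=25, n_per_group=32,
                       true_diff=5.0, sd=10.0, seed=1):
    """Run the same two-group experiment repeatedly and collect p-values."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_values = []
    for _ in range(n_replications):
        control = [rng.gauss(50.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(50.0 + true_diff, sd) for _ in range(n_per_group)]
        se = sqrt(stdev(control) ** 2 / n_per_group
                  + stdev(treated) ** 2 / n_per_group)
        z = (mean(treated) - mean(control)) / se
        # Normal approximation to the two-sample test
        p_values.append(2 * (1 - NormalDist().cdf(abs(z))))
    return p_values


p_values = replicate_p_values()
n_significant = sum(p < 0.05 for p in p_values)
print(f"smallest p = {min(p_values):.4f}, largest p = {max(p_values):.4f}")
print(f"{n_significant} of {len(p_values)} replications reach p < 0.05")
```

The true effect is identical in every replication, so any spread in the printed p-values comes from sampling variation alone. Changing `n_per_group` shows how strongly the sample size drives the result.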
References:
1. Amrhein V, Korner-Nievergelt F, Roth T. 2017. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ 5:e3544. https://doi.org/10.7717/peerj.3544
2. Cohen J. 1994. The earth is round (p < .05). American Psychologist 49:997-1003.
3. Cumming G. Dance of the p Values (video). https://youtu.be/ez4DgdurRPg