On p values

2024-09-30 Mon

Rick Gilmore

Overview

Announcements

Today

Mind your p’s

  • Read

Denworth (2019)

Goal

  • Detect differences that matter: ones that are meaningful or significant

Fisher (1992) suggests that researchers might consider a p value of 0.05 as a handy guide: “It is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not.”

Denworth (2019)

Monroe (n.d.)

https://mediaproxy.salon.com/width/1200/https://media2.salon.com/2017/04/holygrail.jpg

Ronald Wasserstein, the ASA’s executive director, puts it this way: “Statistical significance is supposed to be like a right swipe on Tinder. It indicates just a certain level of interest…”

Denworth (2019)

…But unfortunately, that’s not what statistical significance has become. People say, ‘I’ve got 0.05, I’m good.’ The science stops.”

Denworth (2019)

Wasserstein & Lazar (2016)

Vocabulary

  • p-value: Probability of observing data at least as extreme as yours, assuming some baseline model is true (e.g., data are normally distributed with some specified mean and standard deviation).
  • null hypothesis: The assumption that the observed data don’t differ from your baseline comparison.
  • rejecting the null (hypothesis): Concluding that the observed data differ from the comparison.
  • alpha (\(\alpha\)): The p-value threshold we set in advance as our criterion for deciding when to reject the null hypothesis.
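
A minimal sketch of how these four terms fit together, assuming a normal baseline with a known mean and standard deviation (a z-test). The function name `z_test_p_value`, the scores, and the baseline values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(sample, baseline_mean, baseline_sd):
    """Two-sided p-value for the sample mean under a normal baseline (the null)."""
    n = len(sample)
    standard_error = baseline_sd / sqrt(n)
    z = (mean(sample) - baseline_mean) / standard_error
    # Probability of a z at least this far from 0, in either direction,
    # if the null hypothesis were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # criterion chosen before looking at the data

# Hypothetical scores; the baseline (mean 100, SD 15) is an assumption.
scores = [104, 110, 98, 115, 107, 112, 101, 109, 111]
p = z_test_p_value(scores, baseline_mean=100, baseline_sd=15)
reject_null = p < ALPHA

print(f"p = {p:.3f}, reject the null: {reject_null}")
```

With these made-up scores the p value lands above 0.05, so we would not reject the null, even though the sample mean is higher than the baseline mean.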

Discuss Denworth (2019) figure

Denworth (2019)

Denworth (2019)

Simulating p-values

https://rogilmore.shinyapps.io/PSYCH490-2023-APES/
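
For readers without access to the app, here is a rough stand-in using only the Python standard library. It is not the app's code. It assumes each simulated study draws from a normal distribution with SD 1 and tests the sample mean against a null mean of 0. The names `simulate_p_values` and `share_significant` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_p_values(n_studies, n_per_study, true_mean, seed=0):
    """Run many simulated studies and return one two-sided p-value per study."""
    rng = random.Random(seed)
    null_mean, sd = 0.0, 1.0  # assumed baseline for every study
    p_values = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n_per_study)]
        z = (mean(sample) - null_mean) / (sd / sqrt(n_per_study))
        p_values.append(2 * (1 - NormalDist().cdf(abs(z))))
    return p_values


def share_significant(p_values, alpha=0.05):
    """Fraction of studies whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# No real effect: roughly alpha of the studies are "significant" by chance.
null_ps = simulate_p_values(n_studies=2000, n_per_study=30, true_mean=0.0)
# A real effect of half an SD: many, but not all, studies detect it.
effect_ps = simulate_p_values(n_studies=2000, n_per_study=30, true_mean=0.5)

print("share significant, no effect:  ", share_significant(null_ps))
print("share significant, real effect:", share_significant(effect_ps))
```

Try changing `n_per_study` or `true_mean` and watch how the share of significant studies moves.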

ASA Statement (Wasserstein & Lazar, 2016)

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

Wasserstein & Lazar (2016)

  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
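
The point that a p-value does not measure the size of an effect can be made concrete with a little arithmetic. This is a sketch under the same z-test assumption (known SD, effect expressed in SD units). The function name `p_for_effect` and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def p_for_effect(effect_in_sd, n):
    """Two-sided p-value when the observed mean sits effect_in_sd SDs from the null."""
    z = effect_in_sd * sqrt(n)  # the same effect yields a larger z as n grows
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A trivially small effect measured in a huge sample...
tiny_effect_big_n = p_for_effect(effect_in_sd=0.02, n=40_000)
# ...versus a large effect measured in a very small sample.
large_effect_small_n = p_for_effect(effect_in_sd=0.8, n=4)

print(f"tiny effect, n = 40000: p = {tiny_effect_big_n:.5f}")
print(f"large effect, n = 4:    p = {large_effect_small_n:.3f}")
```

The tiny effect comes out "significant" and the large one does not, which is why effect sizes need to be reported alongside p values.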

Other (better?) approaches

These include methods that emphasize estimation over testing, such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence, such as likelihood ratios or Bayes Factors; and other approaches such as decision-theoretic modeling and false discovery rates.

Wasserstein & Lazar (2016)

All these measures and approaches rely on further assumptions, but they may more directly address the size of an effect (and its associated uncertainty) or whether the hypothesis is correct.

Wasserstein & Lazar (2016)

Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean.

Wasserstein & Lazar (2016)

No single index should substitute for scientific reasoning.

Wasserstein & Lazar (2016)

  • confidence interval: range of values, computed from the data, that would contain the true value X% of the time if we did the study many times
  • Change alpha (\(\alpha\)) (Benjamin et al., 2018; Lakens et al., 2017) or eliminate it (Amrhein & Greenland, 2018)
  • Do the reference distributions (for the null) really fit the data?
  • statistical ‘significance’ \(\neq\) real-world importance
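
The "if we did the study many times" idea behind a confidence interval can be checked by simulation. This is a minimal sketch, assuming normally distributed data with a known SD. The function name `ci_coverage` and all parameter values are made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def ci_coverage(n_studies=2000, n=25, true_mean=10.0, sd=2.0, level=0.95, seed=1):
    """Share of simulated studies whose interval contains the true mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z_crit * sd / sqrt(n)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = mean(sample)
        # Does this study's interval capture the true mean?
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / n_studies


coverage = ci_coverage()
print(f"share of 95% intervals containing the true mean: {coverage:.3f}")
```

Any single interval either contains the true mean or it doesn't; the 95% describes how often the procedure succeeds across repeated studies.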

There is a solution to every problem: simple, quick, and wrong.

For every problem there is a solution that is simple, neat—and wrong.

Every complex problem has a solution which is simple, direct, plausible—and wrong.

There’s always an easy solution to every human problem—neat, plausible and wrong.

“There is always a well-known solution to every human Problem—Neat, plausible, and wrong” (n.d.)

Explanations exist; they have existed for all time; there is always a well-known solution to every human problem—neat, plausible, and wrong.

Mencken (2020)

Tampio (2022)

Next time

Fraud & misconduct

Resources

References

Amrhein, V., & Greenland, S. (2018). Remove, rather than redefine, statistical significance. Nature Human Behaviour, 2(1), 4. https://doi.org/10.1038/s41562-017-0224-0
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10. https://doi.org/10.1038/s41562-017-0189-z
Bhattacharjee, Y. (2013). The mind of a con man. The New York Times. Retrieved from https://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html
Carpenter, S. (2012). Harvard psychology researcher committed fraud, US investigation concludes. Science, 6. Retrieved from https://www.science.org/content/article/harvard-psychology-researcher-committed-fraud-us-investigation-concludes
Denworth, L. (2019, October). The significant problem of P values. https://www.scientificamerican.com/article/the-significant-problem-of-p-values/.
Fisher, R. A. (1992). Statistical methods for research workers. In Springer series in statistics (pp. 66–70). New York, NY: Springer New York. https://doi.org/10.1007/978-1-4612-4380-9_6
Lakens, D., Adolfi, F., Albers, C., Anvari, F., Apps, M., Argamon, S., … Zwaan, R. (2017). Justify your alpha. https://doi.org/10.17605/OSF.IO/9S3Y6
Levelt, W. J. M., Drenth, P. J. D., & Noort, E. (2012). Flawed science: The fraudulent research practices of social psychologist Diederik Stapel. Retrieved from https://pure.mpg.de/rest/items/item_1569964/component/file_1569966/content
Mencken, H. L. (2020, February). Prejudices: First, second, & third series. https://www.loa.org/books/331-prejudices-first-second-amp-third-series/.
Monroe, R. (n.d.). P-values. https://xkcd.com/1478/.
Ritchie, S. (2020). Science fictions: Exposing fraud, bias, negligence and hype in science (1st ed.). Penguin Random House. Retrieved from https://www.amazon.com/Science-Fictions/dp/1847925669
Tampio, N. (2022, March). Scepticism is a way of life that allows democracy to flourish. https://aeon.co/essays/scepticism-is-a-way-of-life-that-allows-democracy-to-flourish; Aeon Magazine.
There is always a well-known solution to every human Problem—Neat, plausible, and wrong. (n.d.). https://quoteinvestigator.com/2016/07/17/solution/.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108