A replication crisis (or not)

2024-09-16 Mon

Rick Gilmore

Prelude

(Simon & Garfunkel, 2019)

Overview

Announcements

Last time…

  • Our (partial) replication of Kardash & Edwards (2012) via Survey-02
  • Our predictions:
    • Norms vs. counternorms?
    • Should do vs. actually do?
  • Are ratings continuous or not?
  • Tables and figures: Which is better?

Today

A replication crisis (or not)

A replication crisis (or not)

What proportion of findings in the published scientific literature (in the fields you care about) are actually true?

  • 100%
  • 90%
  • 70%
  • 50%
  • 30%

How do we define what “actually true” means?

Is there a reproducibility crisis in science?

Figure 1: (Baker, 2016)

(Gilmore, Hillary, Lazar, & Wham, 2023)

Figure 2: (Baker, 2016)

Figure 3: (Baker, 2016)

Figure 4: (Baker, 2016)

Final Project idea

These questions could form the basis of a final project where a student or students re-run the survey with a different sample.

What do the terms mean?

Replication refers to testing the reliability of a prior finding with different data…

(Nosek et al., 2022)

Robustness refers to testing the reliability of a prior finding using the same data and a different analysis strategy…

(Nosek et al., 2022)

Reproducibility refers to testing the reliability of a prior finding using the same data and the same analysis strategy…

(Nosek et al., 2022)

In principle, all reported evidence should be reproducible. If someone applies the same analysis to the same data, the same result should occur…

(Nosek et al., 2022)

Reproducibility tests can fail for two reasons. A process reproducibility failure occurs when the original analysis cannot be repeated because of the unavailability of data, code, information needed to recreate the code, or necessary software or tools…

(Nosek et al., 2022)

…An outcome reproducibility failure occurs when the reanalysis obtains a different result than the one reported originally. This can occur because of an error in either the original or the reproduction study.

(Nosek et al., 2022)

Methods reproducibility

  • Enough details about materials & methods recorded (& reported)
  • Same results with same materials & methods

(Goodman, Fanelli, & Ioannidis, 2016)

Figure 5: If you got hit by a bus, how many other people could pick up where you left off?

Questions to consider

Do the Survey-01 and Survey-02 documents demonstrate methods reproducibility? Why or why not?

More discussion on Friday.

Results reproducibility

  • Same results from an independent study

(Goodman et al., 2016)

Questions to consider

Is Survey-02 an independent study relative to Kardash & Edwards (2012)? Are our results similar to theirs?

Deep dive on Friday.

Inferential reproducibility

  • Same inferences from one or more studies or reanalyses

(Goodman et al., 2016)

Questions to consider

How many studies or re-analyses are needed before we decide an effect is real?

If \(n \gg 1\), how should we think about novel findings?

Reproducibility in psychological science

Replication failure: The “Lady Macbeth Effect”

Replication failure: Priming effect

Artner et al. (2021)

We investigated the reproducibility of the major statistical conclusions drawn in 46 articles published in 2012 in three APA journals. After having identified 232 key statistical claims, we tried to reproduce, for each claim, the test statistic, its degrees of freedom, and the corresponding p value…

(Artner et al., 2021)

…starting from the raw data that were provided by the authors and closely following the Method section in the article…

(Artner et al., 2021)

Out of the 232 claims, we were able to successfully reproduce 163 (70%), 18 of which only by deviating from the article’s analytical description. Thirteen (7%) of the 185 claims deemed significant by the authors are no longer so…

(Artner et al., 2021)

Question to consider

Is Artner et al. (2021) an example of what Nosek et al. (2022) call replication, robustness, or reproducibility?

Questions to consider

If Artner et al. (2021) had to “deviate from the article’s analytical description” for 18 claims, what does that mean?

Should we reduce the “successfully reproduced” claims to \((163 - 18)/232 = 62.5\%\)?
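
The arithmetic behind this question, as a minimal Python sketch (the counts are those quoted above from Artner et al., 2021):

    # Counts reported by Artner et al. (2021)
    total_claims = 232
    reproduced = 163    # includes 18 reproduced only by deviating from the article
    deviating = 18

    print(f"{reproduced / total_claims:.1%}")               # 70.3%, the reported rate
    print(f"{(reproduced - deviating) / total_claims:.1%}") # 62.5%, the stricter rate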

…The reproduction successes were often the result of cumbersome and time-consuming trial-and-error work, suggesting that APA style reporting in conjunction with raw data makes numerical verification at least hard, if not impossible.

(Artner et al., 2021)

Question to consider

What aspects of APA style do you think undermine reproduction attempts?

Evaluating reproducibility across psychological science

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z

Whitt, C. M., Miranda, J. F. & Tullett, A. M. (2022). History of replication failures in psychology. In W. O’Donohue, A. Masuda & S. Lilienfeld (Eds.), Avoiding Questionable Research Practices in Applied Psychology (pp. 73–97). Springer International Publishing. https://doi.org/10.1007/978-3-031-04968-2_4

Framework for Open and Reproducible Research Training (forrt.org)

Note

This could be a topic for a final project.

Is psychology harder than physics?

Discussion of (Begley & Ellis, 2012)

Reproducibility in pre-clinical cancer biology

Note

The Center for Open Science (COS) conducted a Reproducibility Project: Cancer Biology study. Dr. Errington visited Penn State in August 2023 and talked about the eye-opening results (Errington, Mathur, et al., 2021; Errington, Denis, Perfito, Iorns, & Nosek, 2021).

This could be the focal point of a final project.

Background

The scientific community assumes that the claims in a preclinical study can be taken at face value — that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.

(Begley & Ellis, 2012)

Over the past decade, before pursuing a particular line of research, scientists (including C.G.B.) in the haematology and oncology department at the biotechnology firm Amgen in Thousand Oaks, California, tried to confirm published findings related to that work. Fifty-three papers were deemed ‘landmark’ studies (see ‘Reproducibility of research findings’).

(Begley & Ellis, 2012)

…It was acknowledged from the outset that some of the data might not hold up, because papers were deliberately selected that described something completely new, such as fresh approaches to targeting cancers or alternative clinical uses for existing therapeutics.

(Begley & Ellis, 2012)

Nevertheless, scientific findings were confirmed in only 6 (11%) cases. Even knowing the limitations of preclinical research, this was a shocking result.

(Begley & Ellis, 2012)

Table 1: (Begley & Ellis, 2012)
Journal impact factor | \(n\) articles | Mean citations (range), non-reproduced articles | Mean citations (range), reproduced articles
>20                   | 21             | 248 [3, 800]                                    | 231 [82, 519]
5–19                  | 32             | 168 [6, 1,909]                                  | 13 [3, 24]

Findings

  • Findings of only 6 of 53 published papers (11%) could be reproduced (see the sketch after this list)
  • Original authors often could not reproduce their own work
  • An earlier paper (Prinz, Schlange, & Asadullah, 2011), titled “Believe it or not: How much can we rely on published data on potential drug targets?”, had also found a low rate of reproducibility
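
As a rough illustration (added here, not part of Begley & Ellis’s analysis) of how uncertain a confirmation rate based on 53 papers is, one can attach a Wilson 95% confidence interval to the 6/53 rate:

    import math

    # Begley & Ellis (2012): 6 of 53 landmark findings confirmed
    confirmed, n = 6, 53
    p_hat = confirmed / n  # ~0.113

    # Wilson 95% confidence interval for a proportion
    z = 1.96
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    print(f"{p_hat:.1%}, 95% CI [{center - half:.1%}, {center + half:.1%}]")
    # -> 11.3%, 95% CI [5.3%, 22.6%]

Even the upper end of that interval stays below 25%, consistent with the authors’ description of the result as shocking.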

Figure from Prinz et al. (2011)

We received input from 23 scientists (heads of laboratories) and collected data from 67 projects, most of them (47) from the field of oncology. This analysis revealed that only in ∼20–25% of the projects were the relevant published data completely in line with our in-house findings

Prinz et al. (2011)

  • Published papers that cannot be reproduced are nonetheless cited hundreds or thousands of times
  • The cost of irreproducible research is estimated in the billions of dollars per year (Freedman, Cockburn, & Simcoe, 2015).

An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone.

(Freedman et al., 2015)
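
A back-of-envelope check: the quote’s two numbers imply total annual U.S. preclinical research spending of roughly US$56B (US$28B divided by a 50% prevalence). A minimal sketch of that arithmetic:

    # Numbers from the Freedman et al. (2015) quote above
    irreproducible_spend = 28e9       # US$ per year, not reproducible
    irreproducible_prevalence = 0.50  # lower bound ("exceeds 50%")

    # Implied total annual U.S. preclinical research spending
    total_spend = irreproducible_spend / irreproducible_prevalence
    print(f"~US${total_spend / 1e9:.0f}B per year")  # ~US$56B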

Freedman et al. (2015), Figure 2

  • Information about U.S. Research & Development (R&D) expenditures from the Congressional Research Service.
  • Note that business accounts for two to three times (or more) the government’s share of R&D expenditures.

Cover from Harris (2017)

Questions to ponder

  • Why do Begley & Ellis (2012) focus on a journal’s impact factor?
  • Why do Begley & Ellis (2012) focus on citations to reproduced vs. non-reproduced articles?
  • Why should non-scientists care?
  • Why should scientists (and students) in other fields (not cancer biology) care?

Next time

Replication failure: The “Lady Macbeth Effect”

Resources

Talk by Begley (CrossFit, 2019)

(CrossFit, 2019)

“What I’m alleging is that the reviewers, the editors of the so-called top-tier journals, grant review committees, promotion committees, and the scientific community repeatedly tolerate poor-quality science.”

C. Glenn Begley

Note

Watching the talk by Begley is not required, but you might get inspired and decide to focus your final project on this topic.

Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Kline Struhl, M., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D. & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719–748. https://doi.org/10.1146/annurev-psych-020821-114157

Peng, R. D. & Hicks, S. C. (2021). Reproducible research: A retrospective. Annual Review of Public Health, 42, 79–93. https://doi.org/10.1146/annurev-publhealth-012420-105110

Note

Reading and writing a commentary on either of these articles might be a good final project.

References

Artner, R., Verliefde, T., Steegen, S., Gomes, S., Traets, F., Tuerlinckx, F., & Vanpaemel, W. (2021). The reproducibility of statistical results in psychological research: An investigation using unpublished raw data. Psychological Methods, 26(5), 527–546. https://doi.org/10.1037/met0000365
Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature News, 533(7604), 452. https://doi.org/10.1038/533452a
Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531–533. https://doi.org/10.1038/483531a
CrossFit. (2019, July). Dr. Glenn Begley: Perverse incentives promote scientific laziness, exaggeration, and desperation. YouTube. Retrieved from https://www.youtube.com/watch?v=YJADzllTM9w
Earp, B. D., Everett, J. A. C., Madva, E. N., & Hamlin, J. K. (2014). Out, damned spot: Can the “Macbeth effect” be replicated? Basic and Applied Social Psychology, 36(1), 91–98. https://doi.org/10.1080/01973533.2013.856792
Errington, T. M., Denis, A., Perfito, N., Iorns, E., & Nosek, B. A. (2021). Challenges for assessing replicability in preclinical cancer biology. eLife, 10, e67995. https://doi.org/10.7554/eLife.67995
Errington, T. M., Mathur, M., Soderberg, C. K., Denis, A., Perfito, N., Iorns, E., & Nosek, B. A. (2021). Investigating the replicability of preclinical cancer biology. eLife, 10, e71601. https://doi.org/10.7554/eLife.71601
Freedman, L. P., Cockburn, I. M., & Simcoe, T. S. (2015). The economics of reproducibility in preclinical research. PLoS Biology, 13(6), e1002165. https://doi.org/10.1371/journal.pbio.1002165
Gilmore, R. O., Hillary, F., Lazar, N., & Wham, B. (2023). Penn State open science survey. Retrieved from https://penn-state-open-science.github.io/survey-fall-2022/index.html
Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341), 341ps12. https://doi.org/10.1126/scitranslmed.aaf5027
Harris, R. (2017). Rigor mortis: How sloppy science creates worthless cures, crushes hope, and wastes billions (1st ed.). Basic Books.
Kardash, C. M., & Edwards, O. V. (2012). Thinking and behaving like scientists: Perceptions of undergraduate science interns and their faculty mentors. Instructional Science, 40(6), 875–899. https://doi.org/10.1007/s11251-011-9195-0
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., … Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719–748. https://doi.org/10.1146/annurev-psych-020821-114157
Prinz, F., Schlange, T., & Asadullah, K. (2011). Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews. Drug Discovery, 10(9), 712. https://doi.org/10.1038/nrd3439-c1
Ritchie, S. (2020). Science fictions: Exposing fraud, bias, negligence and hype in science (1st ed.). Penguin Random House. Retrieved from https://www.amazon.com/Science-Fictions/dp/1847925669
Simon & Garfunkel. (2019, May). The 59th Street Bridge Song (Feelin’ Groovy) (live at Carnegie Hall, New York, NY, July 1970). YouTube. Retrieved from https://www.youtube.com/watch?v=_QwxTXGSLWQ
Zhong, C.-B., & Liljenquist, K. (2006). Washing away your sins: Threatened morality and physical cleansing. Science, 313(5792), 1451–1452. https://doi.org/10.1126/science.1130726