File drawer effect
Roadmap
- Take-home messages from p-hacking exercise
- Discussion about the file drawer effect
- Work session on final project proposals, due Thursday, March 2.
On p-hacking
- What did we learn?
- Does the economy do better or worse under Republican or Democratic control?
- Does the answer depend on which measures (of political control, of economic performance) are used?
- When does it matter if someone reporting on an issue engaged in p-hacking?
File drawer effect
Related to the notion of ‘negative data’.
Simulating the file drawer effect
- How many times out of \(n\) experiments do we get…
- a significant result
- a non-significant result
Simulation app: https://rogilmore.shinyapps.io/PSYCH490-2023-APES/
- Pick small effect size \(d=0.30\)
- There is a true effect, \(B > A\)
- Pick samples of \(n=30\) (see the sketch below)
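A minimal sketch of what the app simulates, assuming two independent groups compared with a one-sided Welch t-test (the app's exact procedure may differ):

# Repeat the experiment many times and count significant vs.
# non-significant results when a true effect (d = 0.30) exists.
set.seed(490)
d <- 0.30            # true effect size, B > A
n <- 30              # per-group sample size
n_experiments <- 1000

p_values <- replicate(n_experiments, {
  A <- rnorm(n, mean = 0, sd = 1)
  B <- rnorm(n, mean = d, sd = 1)
  t.test(B, A, alternative = "greater")$p.value
})

mean(p_values < .05)   # proportion significant (the "published" studies)
mean(p_values >= .05)  # proportion non-significant (the file drawer)

With these settings most runs come out non-significant, so a literature built only from the significant runs badly misrepresents the evidence.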
Discuss Rosenthal (1979)
Abstract
For any given research area, one cannot tell how many studies have been conducted but never reported. The extreme view of the “file drawer problem” is that journals are filled with the 5% of the studies that show Type I errors, while the file drawers are filled with the 95% of the studies that show nonsignificant results. Quantitative procedures for computing the tolerance for filed and future null results are reported and illustrated, and the implications are discussed.
The extreme view of this problem, the “file drawer problem,” is that the journals are filled with the 5% of the studies that show Type I errors, while the file drawers back at the lab are filled with the 95% of the studies that show nonsignificant (e.g., p > .05) results.
Rosenthal’s illustration relies on the standard normal deviate or \(Z\), which is the value on a standard normal distribution (with mean \(\mu=0\) and standard deviation \(\sigma=1\)) that one would need to observe for a given p value.
library(ggplot2)

n <- 1000
p_val <- 0.05
df <- tibble::tibble(x = rnorm(n, mean = 0, sd = 1))
qt_05 <- qt(p_val, n, lower.tail = FALSE)  # t quantile; with n = 1000 df this approximates the normal Z
ggplot(df) +
  aes(x) +
  geom_histogram(bins = 20) +
  geom_vline(xintercept = qt_05) +
  ggtitle(paste0("Z=", format(qt_05, digits = 3, nsmall = 2),
                 " for p=", format(p_val, digits = 3, nsmall = 2),
                 " and n=", n))
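For large \(n\), qt(p_val, n, lower.tail = FALSE) converges on qnorm(p_val, lower.tail = FALSE), which for \(p = .05\) is the familiar one-tailed critical value \(Z \approx 1.645\).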
Findings
If the overall level of significance of the research review will be brought down to the level of just significant by the addition of just a few more null results, the finding is not resistant to the file drawer threat.
There is both a sobering and a cheering lesson to be learned from careful study of Equation 3. The sobering lesson is that small numbers of studies that are not very significant, even when their combined p is significant, may well be misleading in that only a few studies filed away could change the combined significant result to a nonsignificant one…The cheering lesson is that when the number of studies available grows large or the mean directional Z grows large, the file drawer hypothesis as a plausible rival hypothesis can be safely ruled out.
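Equation 3 rearranges into Rosenthal's "fail-safe N": the number \(X\) of filed-away studies averaging \(Z = 0\) needed to pull the combined (Stouffer) \(Z\) below the one-tailed .05 criterion, \(X = (\sum Z_i)^2 / 2.706 - k\), where \(2.706 = 1.645^2\). A quick sketch (illustrative values, not Rosenthal's code):

# Fail-safe N: how many mean-zero null results in file drawers would
# pull a combined Stouffer Z below 1.645 (one-tailed p = .05)?
failsafe_n <- function(z_scores, alpha = 0.05) {
  z_crit <- qnorm(alpha, lower.tail = FALSE)  # 1.645
  sum(z_scores)^2 / z_crit^2 - length(z_scores)
}

failsafe_n(c(1.70, 1.75, 1.80))  # ~7: a few filed nulls overturn 3 weak studies
failsafe_n(rep(2.5, 20))         # ~904: many strong studies are safe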
Discuss Franco et al. (2014)
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502–1505. https://doi.org/10.1126/science.1255484
Abstract
We studied publication bias in the social sciences by analyzing a known population of conducted studies—221 in total—in which there is a full accounting of what is published and unpublished. We leveraged Time-sharing Experiments in the Social Sciences (TESS), a National Science Foundation–sponsored program in which researchers propose survey-based experiments to be run on representative samples of American adults. Because TESS proposals undergo rigorous peer review, the studies in the sample all exceed a substantial quality threshold. Strong results are 40 percentage points more likely to be published than are null results and 60 percentage points more likely to be written up. We provide direct evidence of publication bias and identify the stage of research production at which publication bias occurs: Authors do not write up and submit null findings. – Franco et al. (2014)
Solutions
How can the social science community combat publication bias of this sort? On the basis of communications with the authors of many experiments that resulted in null findings, we found that some researchers anticipate the rejection of such papers but also that many of them simply lose interest in “unsuccessful” projects. These findings show that a vital part of developing institutional solutions to improve scientific transparency would be to understand better the motivations of researchers who choose to pursue projects as a function of results.
Few null findings ever make it to the review process. Hence, proposed solutions such as two-stage review (the first stage for the design and the second for the results), pre-analysis plans (41), and requirements to preregister studies (16) should be complemented by incentives not to bury statistically insignificant results in file drawers. Creating high-status publication outlets for these studies could provide such incentives. The movement toward open-access journals may provide space for such articles. Further, the pre-analysis plans and registries themselves will increase researcher access to null results. Alternatively, funding agencies could impose costs on investigators who do not write up the results of funded studies. Last, resources should be deployed for replications of published studies if they are unrepresentative of conducted studies and more likely to report large effects.
Replication notes
- Paper was behind the standard Science paywall.
- Data & code shared on Zenodo https://doi.org/10.5281/zenodo.11300
- Let’s try to run it…
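A guess at the commands behind the output below; the actual script in the Zenodo archive may differ, and the data-file name here is hypothetical:

tess <- read.csv("filedrawer.csv")   # hypothetical file name
names(tess)                          # lists the 41 variables shown below

# The chi-squared output refers to an object named `bounds`; one
# plausible construction is a 2x2 table of result strength by
# publication status:
bounds <- table(tess$strong, tess$pub)
chisq.test(bounds)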
## [1] "id" "author1field"
## [3] "author2field" "author3field"
## [5] "discipline" "discipline2"
## [7] "POL" "PSY"
## [9] "SOC" "OTHER"
## [11] "year" "fieldingperiod1stday"
## [13] "age_months" "age_day"
## [15] "age_year" "timetopublish"
## [17] "iv_coder1" "iv_coder2"
## [19] "disagreement" "IV_all"
## [21] "IV" "strong"
## [23] "mixed" "null"
## [25] "DV" "DV_all"
## [27] "DV_tri" "DV_book_separate"
## [29] "DV_book_unpub" "DV_book_nontop"
## [31] "DV_book_top" "journal"
## [33] "journal_field" "insample"
## [35] "pubyear" "pub"
## [37] "anyresults" "written"
## [39] "max_h_current" "max_pub_attime"
## [41] "why_excluded"
##
## Pearson's Chi-squared test with Yates' continuity
## correction
##
## data: bounds
## X-squared = 70.503, df = 1, p-value < 2.2e-16
- So, we can regenerate one of the figures (S1, p. 6) in the Supplemental Material.
More could be done with these data. This could be a final project for someone.
Next time…
- Negligence
- Work session on final project proposals, due Thursday, March 2.