File drawer effect
Roadmap
- Take-home messages from p-hacking exercise
- Discussion about the file drawer effect
- Work session on final project proposals, due Thursday, March 2.
On p-hacking
- What did we learn?
- Does the economy do better or worse under Republican or Democratic control?
- Does the answer depend on which measures (of political control, of economic performance) are used?
- When does it matter if someone reporting on an issue engaged in p-hacking?
File drawer effect
Related to the notion of ‘negative data’.
Simulating the file drawer effect
- How many times out of \(n\) experiments do we get…
- a significant result
- a non-significant result
Simulation app: https://rogilmore.shinyapps.io/PSYCH490-2023-APES/
- Pick small effect size \(d=0.30\)
- There is a true effect, \(B > A\)
- Pick samples of \(n=30\) (see the sketch below)
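A minimal sketch of what the app simulates, assuming two independent groups compared with a one-sided Welch t-test (the app's exact procedure may differ):

# Repeat the experiment many times and count significant vs.
# non-significant results when a true effect (d = 0.30) exists.
set.seed(490)
d <- 0.30            # true effect size, B > A
n <- 30              # per-group sample size
n_experiments <- 1000

p_values <- replicate(n_experiments, {
  A <- rnorm(n, mean = 0, sd = 1)
  B <- rnorm(n, mean = d, sd = 1)
  t.test(B, A, alternative = "greater")$p.value
})

mean(p_values < .05)   # proportion significant (the "published" studies)
mean(p_values >= .05)  # proportion non-significant (the file drawer)

With these settings most runs come out non-significant, so a literature built only from the significant runs badly misrepresents the evidence.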
Discuss Rosenthal (1979)
Abstract
For any given research area, one cannot tell how many studies have been conducted but never reported. The extreme view of the “file drawer problem” is that journals are filled with the 5% of the studies that show Type I errors, while the file drawers are filled with the 95% of the studies that show nonsignificant results. Quantitative procedures for computing the tolerance for filed and future null results are reported and illustrated, and the implications are discussed.
The extreme view of this problem, the “file drawer problem,” is that the journals are filled with the 5% of the studies that show Type I errors, while the file drawers back at the lab are filled with the 95% of the studies that show nonsignificant (e.g., p > .05) results.
Rosenthal’s illustration relies on the standard normal deviate or \(Z\), which is the value on a standard normal distribution (with mean \(\mu=0\) and standard deviation \(\sigma=1\)) that one would need to observe for a given p value.
library(ggplot2)

n <- 1000
p_val <- 0.05
df <- tibble::tibble(x = rnorm(n, mean = 0, sd = 1))
qt_05 <- qt(p_val, n, lower.tail = FALSE)  # t quantile; with n = 1000 df this approximates the normal Z
ggplot(df) +
  aes(x) +
  geom_histogram(bins = 20) +
  geom_vline(xintercept = qt_05) +
  ggtitle(paste0("Z=", format(qt_05, digits = 3, nsmall = 2),
                 " for p=", format(p_val, digits = 3, nsmall = 2),
                 " and n=", n))
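For large \(n\), qt(p_val, n, lower.tail = FALSE) converges on qnorm(p_val, lower.tail = FALSE), which for \(p = .05\) is the familiar one-tailed critical value \(Z \approx 1.645\).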
Findings
If the overall level of significance of the research review will be brought down to the level of just significant by the addition of just a few more null results, the finding is not resistant to the file drawer threat.
There is both a sobering and a cheering lesson to be learned from careful study of Equation 3. The sobering lesson is that small numbers of studies that are not very significant, even when their combined p is significant, may well be misleading in that only a few studies filed away could change the combined significant result to a nonsignificant one…The cheering lesson is that when the number of studies available grows large or the mean directional Z grows large, the file drawer hypothesis as a plausible rival hypothesis can be safely ruled out.
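Equation 3 rearranges into Rosenthal's "fail-safe N": the number \(X\) of filed-away studies averaging \(Z = 0\) needed to pull the combined (Stouffer) \(Z\) below the one-tailed .05 criterion, \(X = (\sum Z_i)^2 / 2.706 - k\), where \(2.706 = 1.645^2\). A quick sketch (illustrative values, not Rosenthal's code):

# Fail-safe N: how many mean-zero null results in file drawers would
# pull a combined Stouffer Z below 1.645 (one-tailed p = .05)?
failsafe_n <- function(z_scores, alpha = 0.05) {
  z_crit <- qnorm(alpha, lower.tail = FALSE)  # 1.645
  sum(z_scores)^2 / z_crit^2 - length(z_scores)
}

failsafe_n(c(1.70, 1.75, 1.80))  # ~7: a few filed nulls overturn 3 weak studies
failsafe_n(rep(2.5, 20))         # ~904: many strong studies are safe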
Discuss Franco et al. (2014)
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502–1505. https://doi.org/10.1126/science.1255484
Abstract
We studied publication bias in the social sciences by analyzing a known population of conducted studies—221 in total—in which there is a full accounting of what is published and unpublished. We leveraged Time-sharing Experiments in the Social Sciences (TESS), a National Science Foundation–sponsored program in which researchers propose survey-based experiments to be run on representative samples of American adults. Because TESS proposals undergo rigorous peer review, the studies in the sample all exceed a substantial quality threshold. Strong results are 40 percentage points more likely to be published than are null results and 60 percentage points more likely to be written up. We provide direct evidence of publication bias and identify the stage of research production at which publication bias occurs: Authors do not write up and submit null findings. – Franco et al. (2014)
Solutions
How can the social science community combat publication bias of this sort? On the basis of communications with the authors of many experiments that resulted in null findings, we found that some researchers anticipate the rejection of such papers but also that many of them simply lose interest in “unsuccessful” projects. These findings show that a vital part of developing institutional solutions to improve scientific transparency would be to understand better the motivations of researchers who choose to pursue projects as a function of results.
Few null findings ever make it to the review process. Hence, proposed solutions such as two-stage review (the first stage for the design and the second for the results), pre-analysis plans (41), and requirements to preregister studies (16) should be complemented by incentives not to bury statistically insignificant results in file drawers. Creating high-status publication outlets for these studies could provide such incentives. The movement toward open-access journals may provide space for such articles. Further, the pre-analysis plans and registries themselves will increase researcher access to null results. Alternatively, funding agencies could impose costs on investigators who do not write up the results of funded studies. Last, resources should be deployed for replications of published studies if they are unrepresentative of conducted studies and more likely to report large effects.
Replication notes
- Paper was behind the standard Science paywall.
- Data & code shared on Zenodo https://doi.org/10.5281/zenodo.11300
- Let’s try to run it…
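A guess at the commands behind the output below; the actual script in the Zenodo archive may differ, and the data-file name here is hypothetical:

tess <- read.csv("filedrawer.csv")   # hypothetical file name
names(tess)                          # lists the 41 variables shown below

# The chi-squared output refers to an object named `bounds`; one
# plausible construction is a 2x2 table of result strength by
# publication status:
bounds <- table(tess$strong, tess$pub)
chisq.test(bounds)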
## [1] "id" "author1field"
## [3] "author2field" "author3field"
## [5] "discipline" "discipline2"
## [7] "POL" "PSY"
## [9] "SOC" "OTHER"
## [11] "year" "fieldingperiod1stday"
## [13] "age_months" "age_day"
## [15] "age_year" "timetopublish"
## [17] "iv_coder1" "iv_coder2"
## [19] "disagreement" "IV_all"
## [21] "IV" "strong"
## [23] "mixed" "null"
## [25] "DV" "DV_all"
## [27] "DV_tri" "DV_book_separate"
## [29] "DV_book_unpub" "DV_book_nontop"
## [31] "DV_book_top" "journal"
## [33] "journal_field" "insample"
## [35] "pubyear" "pub"
## [37] "anyresults" "written"
## [39] "max_h_current" "max_pub_attime"
## [41] "why_excluded"
##
## Pearson's Chi-squared test with Yates' continuity
## correction
##
## data: bounds
## X-squared = 70.503, df = 1, p-value < 2.2e-16
- So, we can regenerate one of the figures (S1, p. 6) in the Supplemental Material.
More could be done with these data. This could be a final project for someone.
Next time…
- Negligence
- Work session on final project proposals, due Thursday, March 2.