2020-04-07 08:33:43

Preliminaries

Check-in

Announcements

  • Agenda for next week?

Today’s topics

  • Where to share?
  • Challenges to sharing
  • Your open science portfolio
  • Funder policies

Where to share?

Considerations for storing

  • Why are you doing this?
    • For yourself
    • For colleagues
    • For funders/journals
    • Accelerate discovery

  • What are you planning to share/store?
    • Data
    • Code
    • Materials (displays, test items)

  • What permissions do you have?
    • From participants
    • From colleagues/collaborators
    • From IRB/ethics boards

Considerations for using/reusing

  • What information do I want?
  • How do I find it?
  • How do I access it?

Inter-university Consortium for Political and Social Research (ICPSR)

  • Data repository
  • More than 50 years of storing and sharing data
  • Based at U Michigan
  • Dominant player in sociology, demography, political science

Data Dryad

  • Focus on data linked to publications
  • Strong presence in biological sciences, but not domain-specific
  • http://datadryad.org/

Dataverse

  • Domain general repository + software for repositories
  • Institute for Quantitative Social Sciences at Harvard
  • http://dataverse.org/

Databrary

Open Science Framework

PSU’s ScholarSphere

National Database for Autism Research (NDAR)

  • Now part of the NIMH Data Archive (NDA)
  • Data aggregation, secondary analyses
  • Details about measures, individual participants
  • https://ndar.nih.gov/

TalkBank/CHILDES

WordBank

Other options

  • Personal/lab website
  • GitHub

Data.World

Publishing data

Challenges to sharing

Why share? (Meyer, 2018)

  • Journals and funders may require it
  • Some questions are too important not to share
  • “Openness and transparency are (or should be) universal values that reflect scientific ideals” (Gilmore, Cole, Verma, Aken, & Worthman, 2020)
  • Bolster credibility

Preparing to share

  • Don’t promise to destroy data
  • Don’t promise NOT to share data
  • Don’t promise that research analyses of the collected data will be limited to certain topics

  • DO get consent to retain and share data
  • DO incorporate data-retention and -sharing clauses into IRB templates
  • DO be thoughtful when considering risks of re-identification

  • DO consider working with a data repository
  • DO be thoughtful when selecting a data repository

Challenges…

  • Risks to researchers
  • Respecting diversity
  • Balancing benefits with costs

Your open science portfolio

Should do

  • ORCID
  • non-PSU email for professional use

Might do

Funder policies

NSF

“Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.”

https://www.nsf.gov/bfa/dias/policy/dmp.jsp

National Institutes of Health (NIH)

National Institute on Aging (NIA)

NIA expects that de-identified ADNI data will be made available to the general scientific community within a very short timeframe. ADNI recommends full, open access to all de-identified ADNI imaging and clinical data for individuals who register with the ADNI, agree to the conditions in the "ADNI Data Use Agreement", and undergo limited screening.

National Institute of Child Health and Human Development (NICHD)

Regardless of the amount requested, investigators are expected to include a brief 1-paragraph description of how final research data will be shared, or explain why data-sharing is not possible. Applicants are encouraged to discuss data-sharing plans with their NIH program contact.

National Institute on Drug Abuse (NIDA)

The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers to expedite the translation of research results into knowledge, products and procedures to improve human health. Data sets for CTN protocols will be available after (1) the primary paper has been accepted for publication, or (2) the data is locked for more than 18 months, whichever comes first.

National Institutes of Health (NIH)

NIH expects investigators seeking more than $500K in direct support in any given year to submit a data sharing plan with their application or to indicate why data sharing is not possible.

It is NIH policy that the results and accomplishments of the activities that it funds should be made available to the public. PIs and funding recipient institutions are expected to make the results and accomplishments of their activities available to the research community and to the public at large.

Autism-related research funded by NIH

NIMH

“The National Institute of Mental Health (NIMH) has established an informatics infrastructure to enable the sharing and use of data collected from human subjects in clinical research by the entire research community. Researchers funded by NIMH are strongly encouraged to deposit data from human subjects into this infrastructure. In addition, non-NIMH funded researchers with related data are welcome to deposit their data.”

https://grants.nih.gov/grants/guide/notice-files/NOT-MH-15-012.html

Resources

Software

This talk was produced on 2020-04-07 in RStudio using R Markdown. The code and materials used to generate the slides may be found at https://github.com/psu-psychology/psy-525-reproducible-research-2020. Information about the R session that produced the slides is as follows:

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets 
## [6] methods   base     
## 
## other attached packages:
## [1] forcats_0.5.0   stringr_1.4.0   dplyr_0.8.5    
## [4] purrr_0.3.3     readr_1.3.1     tidyr_1.0.2    
## [7] tibble_3.0.0    ggplot2_3.3.0   tidyverse_1.3.0
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.0.0 xfun_0.12        haven_2.2.0     
##  [4] lattice_0.20-40  colorspace_1.4-1 vctrs_0.2.4     
##  [7] generics_0.0.2   htmltools_0.4.0  yaml_2.2.1      
## [10] rlang_0.4.5      pillar_1.4.3     withr_2.1.2     
## [13] glue_1.3.2       DBI_1.1.0        dbplyr_1.4.2    
## [16] modelr_0.1.6     readxl_1.3.1     lifecycle_0.2.0 
## [19] munsell_0.5.0    gtable_0.3.0     cellranger_1.1.0
## [22] rvest_0.3.5      evaluate_0.14    knitr_1.28      
## [25] fansi_0.4.1      broom_0.5.5      Rcpp_1.0.4      
## [28] backports_1.1.5  scales_1.1.0     jsonlite_1.6.1  
## [31] fs_1.3.2         hms_0.5.3        digest_0.6.25   
## [34] stringi_1.4.6    grid_3.6.2       cli_2.0.2       
## [37] tools_3.6.2      magrittr_1.5     crayon_1.3.4    
## [40] pkgconfig_2.0.3  ellipsis_0.3.0   xml2_1.2.5      
## [43] reprex_0.3.0     lubridate_1.7.4  assertthat_0.2.1
## [46] rmarkdown_2.1    httr_1.4.1       rstudioapi_0.11 
## [49] R6_2.4.1         nlme_3.1-145     compiler_3.6.2

References

Gilmore, R. O., Cole, P. M., Verma, S., Aken, M. A. G., & Worthman, C. M. (2020). Advancing scientific integrity, transparency, and openness in child development research: Challenges and possible solutions. Child Development Perspectives, 14(1), 9–14. https://doi.org/10.1111/cdep.12360

Meyer, M. N. (2018). Practical tips for ethical data sharing. Advances in Methods and Practices in Psychological Science, 2515245917747656. https://doi.org/10.1177/2515245917747656