Survey 03

Modified

November 20, 2024

Purpose

This page documents the data processing steps for Survey 03 in PSYCH 490.012. The survey questions were adapted from those in Chopik, Bremner, Defever, & Keller (2018), which was discussed on Wednesday, November 20.

The page also serves as a learning opportunity for exploring how to generate a set of plots using functional programming techniques.

Our Survey

We show below three ways to link to the survey.

Note

These links are all equivalent; we show each one to illustrate how it can be done.

The long URL that Google Forms provides can be shortened using the Google interface. We use the shortened form to show the clickable link below.

Link: https://forms.gle/gGAVmWjCHRVDekMo6

QR Code

Code

survey_03_qr <- qrcode::qr_code("https://forms.gle/gGAVmWjCHRVDekMo6")
plot(survey_03_qr)

Embedded survey

Preparation

First, we load the external packages (groups of R commands) that we will be using.
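The rendered page does not show the chunk that loads the packages. As a rough sketch only, and assuming the package list can be inferred from the functions called elsewhere on this page (this is a guess, not the author's actual chunk), one way to confirm everything is installed before attaching anything is:

```r
# Packages inferred from the calls used on this page (an assumption).
needed_pkgs <- c("dplyr", "ggplot2", "readr", "stringr", "purrr", "tibble",
                 "knitr", "kableExtra", "googledrive", "qrcode", "assertthat")

# Hypothetical helper: return the packages that are not installed.
find_missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing_pkgs(needed_pkgs)
if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Only packages whose functions are called without a `pkg::` prefix
  # need to be attached.
  library(dplyr)
  library(ggplot2)
}
```

Functions written with a `pkg::` prefix, such as readr::read_csv(), work without library() as long as the package is installed.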

Gathering

Next, we download the data from the Google Sheet where it is collected. Dr. Gilmore has stored the email address for his Google account in a special environment file; the R command Sys.getenv("GMAIL_SURVEY") retrieves it.

Code

if (!dir.exists('csv')) {
  message("Creating missing `csv/`.")
  dir.create("csv")
}

if (params$update_data) {
  options(gargle_oauth_email = Sys.getenv("GMAIL_SURVEY"))
  googledrive::drive_auth()
  
  googledrive::drive_download(
    "PSYCH 490.012 Fall 2024 Survey 03 (Responses)",
    path = "csv/survey-03-crewell-et-al.csv",
    type = "csv",
    overwrite = TRUE
  )
  
  message("Data updated.")
} else {
  message("Using stored data.")
}
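The download chunk above assumes that GMAIL_SURVEY is set. Sys.getenv() returns an empty string for an unset variable, so a missing value would pass silently into the gargle option. A minimal sketch of a guard follows; the helper name get_survey_email() is hypothetical and not part of the original workflow.

```r
# Hypothetical helper: fetch the account email, failing early if it is unset.
get_survey_email <- function(var_name = "GMAIL_SURVEY") {
  email <- Sys.getenv(var_name, unset = NA_character_)
  if (is.na(email) || !nzchar(email)) {
    stop("Environment variable `", var_name, "` is not set.", call. = FALSE)
  }
  email
}

# Example with a placeholder address (made up for illustration)
Sys.setenv(GMAIL_SURVEY = "someone@example.com")
get_survey_email()
```

The result could then be passed to options(gargle_oauth_email = ...) in place of the bare Sys.getenv() call.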

The data file has been saved in comma-separated values (CSV) format in a directory called csv/.

Note

Because these data might contain sensitive or identifiable information, we only keep a local copy and do not share it publicly via GitHub. This is achieved by adding the name of the data directory to a special .gitignore file.
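As an illustration of the .gitignore step, here is a small base R sketch that adds an entry only if it is not already listed. The helper name ignore_path() is hypothetical; editing the .gitignore file by hand works equally well.

```r
# Hypothetical helper: add `entry` to a .gitignore file unless it is already
# listed. Returns TRUE if the entry was added, FALSE if it was already there.
ignore_path <- function(entry, gitignore = ".gitignore") {
  existing <- if (file.exists(gitignore)) {
    readLines(gitignore, warn = FALSE)
  } else {
    character(0)
  }
  if (entry %in% existing) {
    return(FALSE)
  }
  writeLines(c(existing, entry), gitignore)
  TRUE
}

# Try it on a temporary file rather than the project's real .gitignore
tmp_gitignore <- tempfile()
ignore_path("csv/", tmp_gitignore)  # adds the entry
ignore_path("csv/", tmp_gitignore)  # no change the second time
readLines(tmp_gitignore)
```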

Cleaning

Next we load the data file and clean it.

Code

survey_03 <-
  readr::read_csv("csv/survey-03-crewell-et-al.csv", show_col_types = FALSE)

# Google Forms puts the full question in the top row of the data file.
# We use the names() function to extract and print the original questions.
survey_03_qs <- names(survey_03)
survey_03_qs

 [1] "Timestamp"                                                                                                                                   
 [2] "The field of psychology has problems replicating results"                                                                                    
 [3] "Replication of research is only a problem in the field of psychology"                                                                        
 [4] "The incentive structure in psychological research can undermine the broader goals of science"                                                
 [5] "The results from studies with low statistical power are by definition incorrect"                                                             
 [6] "Researchers who perform replication studies are not qualified to conduct psychological research"                                             
 [7] "It is important for a researcher to report all measures and experimental conditions that were included in a study"                           
 [8] "For a researcher, how important is choosing a sample size before running a study?"                                                           
 [9] "How important is it to make data publicly available so that results can be verified by other researchers?"                                   
[10] "How important are decisions in data collection, analysis, and reporting in affecting how likely a researcher will find a significant effect?"
[11] "How important is it to report studies that “don’t work out?”"                                                                                
[12] "How important is it that results from a psychology study are counterintuitive (e.g., different from what you would expect)?"                 
[13] "Any comments?"

Clean/shorten names

For plotting and analyses, it’s usually easier to shorten the questions by creating a short name that reflects the underlying idea or construct. Here we assign a vector of short names with names(); the rename() function from the dplyr package would be an alternative.

Code

new_names <-
  c(
    "timestamp",
    "psych_problems_replicating",
    "replication_problem_psych_only",
    "incentives_undermine",
    "low_power_incorrect",
    "replicators_unqualified",
    "report_all_measures_important",
    "decide_n_before_important",
    "share_data_important",
    "collection_analysis_decisions_affect",
    "report_null_findings_important",
    "counterintuitive_results_important",
    "comments"
  )

# Swap out old (long) names for new (short) names
long_names <- names(survey_03)
names(survey_03) <- new_names
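Assigning names by position depends on the new names being in exactly the same order, and of the same number, as the columns. A minimal sketch of a guard is shown below, using a made-up two-column data frame; set_short_names() is a hypothetical helper, not something the page defines.

```r
# Hypothetical helper: replace column names, refusing to proceed if the
# number of new names does not match the number of columns or if any
# new name is repeated.
set_short_names <- function(df, short_names) {
  stopifnot(length(short_names) == ncol(df),
            !anyDuplicated(short_names))
  names(df) <- short_names
  df
}

# Made-up two-column example
toy <- data.frame(
  "Timestamp" = "2024-11-20 10:00:00",
  "Any comments?" = NA_character_,
  check.names = FALSE
)
toy_short <- set_short_names(toy, c("timestamp", "comments"))
names(toy_short)
```

The check catches a wrong count or a repeated name, but not a wrong order, so the vector of short names still needs to be compared against the printed questions by eye.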

Drop “test” data

Code

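# Replace missing comments with "none" so that str_detect() never returns NA,
# then drop any row whose comment contains "test".
survey_03_no_test <- survey_03 |>
  dplyr::mutate(comments = dplyr::case_match(comments,
                                             NA ~ "none",
                                             .default = comments)) |>
  dplyr::filter(!(stringr::str_detect(comments, "test")))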
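The missing comments are recoded first because stringr::str_detect() returns NA for a missing string, and dplyr::filter() drops rows where the condition is NA. Without the recoding, respondents who left no comment would be discarded along with the test entries. A minimal base R sketch of the same two steps on made-up data (the toy data frame and its values are invented for illustration):

```r
# Made-up responses: two without comments, one test entry, one real comment
toy <- data.frame(
  rating = c(4, 2, 5, 3),
  comments = c(NA, "this is a test", "Interesting survey", NA),
  stringsAsFactors = FALSE
)

# Step 1: recode missing comments so every row has a string to search
toy$comments[is.na(toy$comments)] <- "none"

# Step 2: keep only rows whose comment does not contain "test"
toy_no_test <- toy[!grepl("test", toy$comments, fixed = TRUE), ]
nrow(toy_no_test)  # 3 rows remain
```

The match is case-sensitive, so a comment such as "Test" would not be dropped by either version.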

Data dictionary

We’ll pause here to start building a data dictionary, a file that explains the origin, format, and usage of our dataset.

Code

# Make new data frame with long and short names for reference
survey_03_data_dictionary <-
  tibble::tibble(q_long = long_names, q_short = new_names,
                 q_type = c("none", "agree", "agree", "agree", "agree",
                            "agree", "important", "important", "important",
                            "important", "important", "important", "none"))

survey_03_data_dictionary |>
  knitr::kable(format = 'html') |>
  kableExtra::kable_classic()

q_long	q_short	q_type
Timestamp	timestamp	none
The field of psychology has problems replicating results	psych_problems_replicating	agree
Replication of research is only a problem in the field of psychology	replication_problem_psych_only	agree
The incentive structure in psychological research can undermine the broader goals of science	incentives_undermine	agree
The results from studies with low statistical power are by definition incorrect	low_power_incorrect	agree
Researchers who perform replication studies are not qualified to conduct psychological research	replicators_unqualified	agree
It is important for a researcher to report all measures and experimental conditions that were included in a study	report_all_measures_important	important
For a researcher, how important is choosing a sample size before running a study?	decide_n_before_important	important
How important is it to make data publicly available so that results can be verified by other researchers?	share_data_important	important
How important are decisions in data collection, analysis, and reporting in affecting how likely a researcher will find a significant effect?	collection_analysis_decisions_affect	important
How important is it to report studies that “don’t work out?”	report_null_findings_important	important
How important is it that results from a psychology study are counterintuitive (e.g., different from what you would expect)?	counterintuitive_results_important	important
Any comments?	comments	none

We’ll add other items to the data dictionary later.

Visualizations 1.0

Develop and test helper functions

I would like to retrieve the “long” form of the question from the data dictionary so that I can use it in my plots.

Code

# Retrieve the "long" question from the survey_03 data dictionary
retrieve_long_q <- function(this_q_short, data_dict = survey_03_data_dictionary) {
  # is.string() alone only returns TRUE/FALSE; assert_that() turns a FALSE
  # into an error
  assertthat::assert_that(assertthat::is.string(this_q_short))
  data_dict |>
    filter(q_short == this_q_short) |>
    select(q_long) |>
    as.character()
}

# Retrieve the question type ("agree", "important", or "none")
retrieve_q_type <- function(this_q_short, data_dict = survey_03_data_dictionary) {
  assertthat::assert_that(assertthat::is.string(this_q_short))
  data_dict |>
    filter(q_short == this_q_short) |>
    select(q_type) |>
    as.character()
}

retrieve_long_q("psych_problems_replicating")

[1] "The field of psychology has problems replicating results"

Code

retrieve_q_type("psych_problems_replicating")

[1] "agree"
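Both helpers assume that the short name exists in the dictionary. As a sketch of how a lookup could fail loudly for an unknown name, here is a base R version on a made-up two-row dictionary; lookup_field() and toy_dict are hypothetical names used only for illustration.

```r
# Made-up two-row dictionary with the same column names as the real one
toy_dict <- data.frame(
  q_short = c("share_data_important", "low_power_incorrect"),
  q_type = c("important", "agree"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up one field for one short name, with an
# informative error when the name is not in the dictionary.
lookup_field <- function(this_q_short, field, data_dict) {
  stopifnot(is.character(this_q_short), length(this_q_short) == 1)
  row <- match(this_q_short, data_dict$q_short)
  if (is.na(row)) {
    stop("No dictionary entry for '", this_q_short, "'.", call. = FALSE)
  }
  data_dict[[field]][row]
}

lookup_field("low_power_incorrect", "q_type", toy_dict)
```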

Then, I can create my own histogram function that can pull the specific data for a variable from the data frame. And I wrap this in a second function that retrieves the long question name.

Code

my_hist <- function(data, var, q_long = "test", q_type = "none") {
  
  # Pick an x-axis label that matches the question's response scale
  axis_label <- "Rating"
  if (q_type == "agree") {
    axis_label <- "Strongly disagree <--> Strongly agree"
  } else if (q_type == "important") {
    axis_label <- "Not at all important <--> Very important"
  }
  
  data |>
    ggplot() +
    aes(x = {{var}}) +
    geom_histogram(bins = 9) +
    xlim(c(.5,5.5)) +
    ylim(c(0, 10)) +
    ggtitle(q_long) +
    xlab(axis_label)
}

my_hist_q <- function(var, data) {
  this_q <- retrieve_long_q(var)
  this_type <- retrieve_q_type(var) 
  my_hist(data, .data[[var]], this_q, this_type)
}

my_hist_q(var = "replication_problem_psych_only", data = survey_03)

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).

True confessions

I’m showing the final products above. Creating and testing these took a little bit of time.

Before we plot the data, let’s remember what the scales look like:

Figure 1: Response options for some questions in Survey 03

Response options for some other questions in Survey 03

Now, we’re ready to print histograms for all of the data.

Code

my_vars <- names(survey_03)[2:(dim(survey_03)[2]-1)]

purrr::map(my_vars, my_hist_q, survey_03)

[[1]] to [[11]]: one histogram per question (plots not reproduced here). Each plot produced the same warning:

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
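The purrr::map() call above applies one function to each variable name and returns a list with one element per variable. The same pattern can be sketched in base R with lapply(), here producing counts instead of plots so that it runs without any packages. The ratings are made up, and count_ratings() is a hypothetical helper.

```r
# Made-up ratings on a 1-5 scale for two of the survey items
toy_ratings <- data.frame(
  share_data_important = c(5, 4, 5, 3, NA),
  low_power_incorrect = c(2, 1, 2, 3, 2)
)

# Hypothetical helper: count how often each rating from 1 to 5 was chosen.
# factor(levels = 1:5) keeps ratings nobody chose as zero counts.
count_ratings <- function(var, data) {
  table(factor(data[[var]], levels = 1:5))
}

toy_vars <- names(toy_ratings)
# setNames() labels each list element with its variable name
rating_counts <- lapply(setNames(toy_vars, toy_vars), count_ratings,
                        data = toy_ratings)
rating_counts$low_power_incorrect
```

As in the map() call, the variable name is the first argument of the helper and the data frame is passed along as an extra argument.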

To do

Some titles are too long to fit over the plots. This should be fixed in a future version.

As one approach, I need a function that splits a long string by inserting a line feed \n character at the break point(s).

Another approach would be to generate R Markdown code that includes the long question as a figure caption (fig.cap="My title").
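For the first approach, base R's strwrap() already finds word-boundary break points, so a small wrapper may be enough. This is a sketch under that assumption; wrap_title() is a hypothetical name and the width values are arbitrary.

```r
# Hypothetical helper: break a long title into lines shorter than `width`
# characters by inserting "\n" at word boundaries.
wrap_title <- function(title, width = 60) {
  paste(strwrap(title, width = width), collapse = "\n")
}

long_q <- "How important is it to make data publicly available so that results can be verified by other researchers?"
cat(wrap_title(long_q, width = 50))
```

The wrapped string could then be passed to ggtitle() in place of the raw question text. stringr::str_wrap() offers the same behavior in a single call.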

Visualizations 2.0

Here, we plot the response distributions again, this time as frequency polygons, using principles we learned from metaprogramming.

Define the functions.

Code

# my_freq_plot <- function(data = survey_03, var, q_type = "important") {
#   axis_label <- "Rating"
#   if (q_type == "agree") {
#     axis_label <- c("Strongly disagree <--> Strongly agree")
#   } else if (q_type == "important") {
#     axis_label <- c("Not at all important <--> Very important")
#   }
#   
#   data |>
#     ggplot() +
#     aes(.data[[var]]) +
#     geom_freqpoly(na.rm = TRUE,
#                   show.legend = FALSE,
#                   bins = 30) +
#     xlim(c(.5, 5.5)) +
#     xlab(axis_label)
# }

my_freq_plot <- function(var = "psych_problems_replicating", data = survey_03) {
  
  # Look up the response scale for this question in the data dictionary
  q_type <- retrieve_q_type(var)
  
  axis_label <- "Rating"
  if (q_type == "agree") {
    axis_label <- "Strongly disagree <--> Strongly agree"
  } else if (q_type == "important") {
    axis_label <- "Not at all important <--> Very important"
  }
  
  data |>
    ggplot() +
    aes(.data[[var]]) +
    geom_freqpoly(na.rm = TRUE,
                  show.legend = FALSE,
                  bins = 30) +
    xlim(c(.5, 5.5)) +
    xlab(axis_label)
}

# my_freq_plot_q <- function(var = "psych_problems_replicating", data = survey_03) {
#   #this_q <- retrieve_long_q(var)
#   this_type <- retrieve_q_type(var)
#   data %>%
#     my_freq_plot(.data[[var]], this_type)
# }

# return_plot <- function(data, var) {
#   knitr::knit_child(
#     text = c(
#       "### Histogram for: `{var}`",
#       "\n",
#       "```{r, echo = F}",
#       "print(my_freq_plot(var, data))",
#       "```"
#     ),
#     envir = environment(),
#     quiet = TRUE
#   )
# }

# return_section <- function(data, var) {
#   chunk_hdr <- knitr::knit_expand(text = c("### Responses for: `{this_var}`", "\n"),
#                                   this_var = var)
#   
#   # Build fig.cap from ground up
#   fig_name <- paste0("#fig-dist-", var)
#   fig_cap <- paste0("'Distribution of responses to ", var, "'")
#   fig_caption <- paste0("fig.cap = ", fig_cap)
#   
#   plot_chunk_hdr <- paste0("```{r ",
#                            fig_name,
#                            ", echo = FALSE, warning = FALSE, ",
#                            fig_caption,
#                            "}")
#   
#   plot_chunk <- c(plot_chunk_hdr, "print(my_hist_q(var, data))", "```")
#   
#   question_long <-
#     paste0("\nQ: '", retrieve_long_q(var), "'")
#   
#   knitr::knit_child(
#     text = c(chunk_hdr, plot_chunk, question_long),
#     envir = environment(),
#     quiet = TRUE
#   )
# }

return_section <- function(data, var) {
  chunk_hdr <- knitr::knit_expand(text = c("### Responses for: '{{this_var}}'", "\n"),
                                  this_var = var)
  
  # Build fig.cap from ground up
  fig_name <- paste0("#fig-dist-", var)
  fig_cap <- paste0("'Distribution of responses to ", var, "'")
  fig_caption <- paste0("fig.cap = ", fig_cap)
  
  plot_chunk_hdr <- paste0("```{r ",
                           fig_name,
                           ", echo = FALSE, warning = FALSE, ",
                           fig_caption,
                           "}")
  
  plot_chunk <- c(plot_chunk_hdr, "print(my_freq_plot(var, data))", "```")
  
  question_long <-
    paste0("\nQ: '", retrieve_long_q(var), "'")
  
  knitr::knit_child(
    text = c(chunk_hdr, plot_chunk, question_long),
    envir = environment(),
    quiet = TRUE
  )
}

Run using lapply().

Code

these_vars <- names(survey_03)[2:12]

res <- invisible(lapply(these_vars, return_section, data = survey_03))
cat(unlist(res), sep = "\n")

Responses for: ‘psych_problems_replicating’

Figure 2: Distribution of responses to psych_problems_replicating

Q: ‘The field of psychology has problems replicating results’

Responses for: ‘replication_problem_psych_only’

Figure 3: Distribution of responses to replication_problem_psych_only

Q: ‘Replication of research is only a problem in the field of psychology’

Responses for: ‘incentives_undermine’

Figure 4: Distribution of responses to incentives_undermine

Q: ‘The incentive structure in psychological research can undermine the broader goals of science’

Responses for: ‘low_power_incorrect’

Figure 5: Distribution of responses to low_power_incorrect

Q: ‘The results from studies with low statistical power are by definition incorrect’

Responses for: ‘replicators_unqualified’

Figure 6: Distribution of responses to replicators_unqualified

Q: ‘Researchers who perform replication studies are not qualified to conduct psychological research’

Responses for: ‘report_all_measures_important’

Figure 7: Distribution of responses to report_all_measures_important

Q: ‘It is important for a researcher to report all measures and experimental conditions that were included in a study’

Responses for: ‘decide_n_before_important’

Figure 8: Distribution of responses to decide_n_before_important

Q: ‘For a researcher, how important is choosing a sample size before running a study?’

Responses for: ‘share_data_important’

Figure 9: Distribution of responses to share_data_important

Q: ‘How important is it to make data publicly available so that results can be verified by other researchers?’

Responses for: ‘collection_analysis_decisions_affect’

Figure 10: Distribution of responses to collection_analysis_decisions_affect

Q: ‘How important are decisions in data collection, analysis, and reporting in affecting how likely a researcher will find a significant effect?’

Responses for: ‘report_null_findings_important’

Figure 11: Distribution of responses to report_null_findings_important

Q: ‘How important is it to report studies that “don’t work out?”’

Responses for: ‘counterintuitive_results_important’

Figure 12: Distribution of responses to counterintuitive_results_important

Q: ‘How important is it that results from a psychology study are counterintuitive (e.g., different from what you would expect)?’

Post hoc thoughts

The keys to getting this to work were as follows:

In return_section(), generate separate text strings for the header (chunk_hdr), plot chunk (plot_chunk), and long question. See also the sequence for building a suitable string for fig.cap.
Combine these separate pieces within knitr::knit_child() with the text= parameter.
In my_freq_plot(), use the aes(.data[[var]]) syntax to turn the string value for var into an unquoted variable in the dataset.
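The first point can be tried in isolation: building the chunk header is ordinary string pasting, so it can be inspected before any knitting happens. The sketch below repeats the paste0() steps from return_section() in a standalone function; build_chunk_hdr() is a hypothetical name introduced only for this illustration.

```r
# Hypothetical helper: build the opening line of a knitr chunk, following
# the same paste0() steps as return_section().
build_chunk_hdr <- function(var) {
  fence <- strrep("`", 3)  # three backticks open a code chunk
  fig_name <- paste0("#fig-dist-", var)
  fig_cap <- paste0("'Distribution of responses to ", var, "'")
  fig_caption <- paste0("fig.cap = ", fig_cap)
  paste0(fence, "{r ", fig_name, ", echo = FALSE, warning = FALSE, ",
         fig_caption, "}")
}

# Print the header text to check it before handing it to knitr::knit_child()
cat(build_chunk_hdr("share_data_important"))
```

Checking each string on its own like this is one way to "bite off smaller chunks" before combining the pieces in the text= argument.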

Bottom line: It’s always better to bite off smaller chunks.

References

Chopik, W. J., Bremner, R. H., Defever, A. M., & Keller, V. N. (2018). How (and whether) to teach undergraduates about the replication crisis in psychological science. Teaching of Psychology, 45(2), 158–163. https://doi.org/10.1177/0098628318762900