This document summarizes an analyis of the p-hacking exercise. In it, we gather data about what individual students did and try to make sense of it.

Quantitative analysis

It often saves typing to load a set of commands into memory. In R, groups of useful commands are called ‘packages’. We can load a set of useful packages into memory by issuing the following command:

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Note

If you are interested in a career related to data science, tidyverse is a very powerful set of tools you will want to know more about.

Then I download the Google Sheet to a directory/folder called csv/ using the file name p-hacking-fa23.csv.

googledrive::drive_download(file ="PSYCH 490.009 2023 Fall P-hacking", path ="csv/p-hacking-fa23.csv", type ='csv', overwrite =TRUE)

File downloaded:

• 'PSYCH 490.009 2023 Fall P-hacking'
<id: 1JI_Qih4wCzUrYTQYE3dpvVx2C7GdzfUZq0a3QhqQyeE>

Saved locally as:

• 'csv/p-hacking-fa23.csv'

Note

What does CSV mean?

Why are CSV files often used in data analysis?

One answer is that CSV files are inter-operable and largely reusable, two of the characteristics recommended for sharing data under the FAIR principles (Wilkinson et al., 2016).

Next, I read the CSV file using the read_csv() function.

How many different combinations of variable choices are there?

There are \(n=4\) measures of political control; \(n=4\) measures of economic performance; \(n=2\) ‘other’ factors; \(n=2\) prediction choices; and \(n=2\) political parties to focus on.

We can use the combinat package to help us figure this out.

And there is only one way to pick 4 among 4. Make sense?

If we add these up ‘4 + 6 + 4 + 1’ = 15 we get the number of different choices we can make (15) about how many combinations of political power measures are possible.

Since there are also 4 different choices of economic performance measures, we know that there are 15 ways to pick these. Now we can calculate how many different possible combinations of variables there are.

n_combos <-15*15*2*2*2

We multiply because each of the choices (political power, economic performance, party, better or worse is independent).

So, there are \(n=\) 1800 of variables we could have chosen. How does this impact the conclusions we can and should draw?

Combine with Spring 2023 data?

We did the same exercise in Spring 2023. Let’s combine our data with theirs.

`summarise()` has grouped output by 'party', 'prediction'. You can override
using the `.groups` argument.

party

prediction

econ_measures

n_preds

democrats

better

employment

18

democrats

better

gdp

16

democrats

better

inflation

10

democrats

better

stocks

9

republicans

worse

employment

7

republicans

worse

gdp

6

republicans

worse

inflation

6

republicans

better

stocks

5

democrats

worse

employment

4

democrats

worse

stocks

4

republicans

better

employment

4

republicans

better

inflation

4

republicans

better

gdp

2

democrats

worse

gdp

0

democrats

worse

inflation

0

republicans

worse

stocks

0

Figure 5: Respondents’ choices of economic measures in their analyses

References

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., … Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18