More slow R

2025-03-04

Rick Gilmore

Prelude

Announcements

Last time…

  • Communicating uncertainty and risk

Challenges

  • Communicating variability
  • Communicating statistical info/summaries
  • What are our goals?

Are the means different?

two_sets <- readr::read_csv("../include/csv/two_sets_1_sd.csv", show_col_types = FALSE)
t.test(value ~ sample_name, data = two_sets, var.equal = TRUE)

    Two Sample t-test

data:  value by sample_name
t = -8.3795, df = 198, p-value = 9.726e-15
alternative hypothesis: true difference in means between group x0 and group x1 is not equal to 0
95 percent confidence interval:
 -1.3760882 -0.8517859
sample estimates:
mean in group x0 mean in group x1 
      -0.1014284        1.0125087 

Your thoughts?

  • Group means differ; estimated difference (x0 - x1) in [-1.38, -0.85]
  • Reject null hypothesis of no difference, t(198) = -8.38, p < .0001.
  • Which figure best meets our goals?

Today

  • More Slow R

More Slow R

Storing things

  • Long-term (between work sessions) in files.
  • Short-term (during your work session) in volatile memory.
    • e.g., object names
 [1] "_feynmann.qmd"                                  
 [2] "_gilmore-photo-bio.qmd"                         
 [3] "_merton.qmd"                                    
 [4] "_metadata.yml"                                  
 [5] "_reddish-green.Rmd"                             
 [6] "img_gilmore_bio"                                
 [7] "wk01-2025-01-14-course-intro.html"              
 [8] "wk01-2025-01-14-course-intro.qmd"               
 [9] "wk01-2025-01-16-semiotics-data-viz.html"        
[10] "wk01-2025-01-16-semiotics-data-viz.qmd"         
[11] "wk02-2025-01-21-govt-biz.html"                  
[12] "wk02-2025-01-21-govt-biz.qmd"                   
[13] "wk02-2025-01-23-art-sports-journ.html"          
[14] "wk02-2025-01-23-art-sports-journ.qmd"           
[15] "wk03-2025-01-28-making-data.html"               
[16] "wk03-2025-01-28-making-data.qmd"                
[17] "wk03-2025-01-30-figure-types.html"              
[18] "wk03-2025-01-30-figure-types.qmd"               
[19] "wk04-2025-02-04-figure-components.html"         
[20] "wk04-2025-02-04-figure-components.qmd"          
[21] "wk05-2025-02-11-stim-to-sensation.html"         
[22] "wk05-2025-02-11-stim-to-sensation.qmd"          
[23] "wk05-2025-02-13-sensation-to-perception.html"   
[24] "wk05-2025-02-13-sensation-to-perception.qmd"    
[25] "wk06-2025-02-18-cognition-to-understanding.html"
[26] "wk06-2025-02-18-cognition-to-understanding.qmd" 
[27] "wk06-2025-02-20-designing-viz.html"             
[28] "wk06-2025-02-20-designing-viz.qmd"              
[29] "wk07-2025-02-25-intro-to-r.html"                
[30] "wk07-2025-02-25-intro-to-r.qmd"                 
[31] "wk07-2025-02-27-viz-uncertainty-risk.html"      
[32] "wk07-2025-02-27-viz-uncertainty-risk.qmd"       
[33] "wk08-2025-03-04-more-slow-r.qmd"                
[34] "wk08-2025-03-04-more-slow-r.rmarkdown"          
[35] "wk08-2025-03-06-critiquing-figs.qmd"            
[36] "wk09-2025-03-18-why-r.qmd"                      
[37] "wk10-2025-03-25-gathering-cleaning.qmd"         
[38] "wk10-2025-03-27-making-plots.qmd"               
[39] "wk11-2025-04-01-more-ggplot.qmd"                
[40] "wk11-2025-04-03-intro-to-python.qmd"            
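A minimal sketch of the two kinds of storage, assuming the listing above came from a call like `list.files()`. The object names are made up for illustration.

```r
# Short-term storage: objects live in memory for this session only.
favorite_language <- "R"   # hypothetical object
class_size <- 25           # hypothetical object

# ls() lists the names of the objects in the current environment.
ls()

# Long-term storage: files persist between work sessions.
# list.files() lists the files in a directory ("." is the current one).
list.files(".")
```

Restart R and the objects are gone, but the files are still there.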

Storing things

  • Directory/folder hierarchy
  • Like a tree with branches
[1] "/Users/rog1/rrr/psych-490-data-viz-2025-spring/src/slides"
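A short sketch of moving around the tree with base R. It assumes the path above was printed by `getwd()`. Your own path will differ.

```r
# Where am I? getwd() returns the current working directory.
here <- getwd()
here

# dirname() moves one branch up the tree;
# basename() gives just the last branch.
dirname(here)
basename(here)

# file.path() builds a path down the tree and inserts the separators.
# "include" and "csv" are the folder names used elsewhere in these slides.
csv_dir <- file.path("..", "include", "csv")
csv_dir
```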

Sequences and repetitions

 [1] -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 [1] 10  9  8  7  6  5  4  3  2  1
 [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32
[1] "I love R" "I love R" "I love R" "I love R"
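One way to produce output like the four lines above, assuming base R's `:`, `seq()`, and `rep()`. The variable names are made up for illustration.

```r
# The colon operator counts by ones, up or down.
count_up <- -6:15
count_down <- 10:1

# seq() lets us choose the step size.
evens <- seq(from = 2, to = 32, by = 2)

# rep() repeats a value a given number of times.
chant <- rep("I love R", times = 4)

count_up
count_down
evens
chant
```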

Data workflow

  • Collect
  • Gather
  • Clean
  • Visualize/plot
  • Analyze

Data workflow we can script

  • Collect
  • Gather
  • Clean
  • Visualize/plot
  • Analyze

Gather data

  • How to acquire/download
    • Download manually
    • Download programmatically (via code)
  • How to save
    • Text formats, e.g., comma-separated values (CSV), are best
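A small sketch of saving data as CSV, using base R and a temporary file. The toy data frame is made up for illustration.

```r
# Toy data, made up for illustration.
toy <- data.frame(
  flavor = c("Vanilla", "Chocolate"),
  scoops = c(2, 3)
)

# Write to a temporary location so nothing real is overwritten.
csv_path <- file.path(tempdir(), "toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

# A CSV is plain text: any editor (or readLines) can show it.
readLines(csv_path)

# Reading it back gives a data frame of the same shape.
toy_again <- read.csv(csv_path)
dim(toy_again)
```

Because the file is plain text, it stays readable without any special software.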

Download manually

  • Visit forms.google.com

Download manually

  • Pick a form

Download manually

  • Click on the more options (…) menu

Download manually

  • Go to default download location (varies by computer)
  • Downloads as a zip (compressed) file

Download manually

  • Open compressed file

Download manually

  • Move file to target location
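The last two manual steps (open the zip, move the file) can also be scripted. A hedged sketch using base R's `utils::unzip()` and `file.copy()`. The function name `unpack_download()` and the example file names are hypothetical.

```r
# Hypothetical helper: unzip a downloaded file and copy its
# contents to the folder where we want the data to live.
unpack_download <- function(zip_path, target_dir) {
  # Unzip into a scratch folder (created if needed).
  scratch <- tempfile("unzipped_")
  extracted <- utils::unzip(zip_path, exdir = scratch)

  # Create the target folder if it does not exist yet.
  dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)

  # Copy each extracted file into the target folder.
  file.copy(extracted, target_dir, overwrite = TRUE)

  # Return the new file locations.
  file.path(target_dir, basename(extracted))
}

# Example call (file names are placeholders, so it is commented out):
# unpack_download("~/Downloads/responses.zip", "include/csv")
```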

Download automatically

  • Two manual steps
    • Create Google Sheets file: Click on “Link to Sheets”

Download automatically

  • Log in to Google
  • Download specific URL
  • Save where I decide (include/csv) and with the name I choose (PSYCH-490.003-Exercise-3.csv)
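One way the three steps above might be scripted, assuming the `googledrive` package is installed. The slides do not say which package is actually used, so treat this as a sketch. The function name `download_responses()` and the URL placeholder are hypothetical.

```r
# Hypothetical helper: log in, download one Google Sheet as CSV,
# and save it under a folder and file name we choose.
download_responses <- function(sheet_url, csv_name, csv_dir = "include/csv") {
  # Log in to Google; this opens a browser window the first time.
  googledrive::drive_auth()

  # Make sure the target folder exists.
  dir.create(csv_dir, recursive = TRUE, showWarnings = FALSE)

  # Download the specific file and export it as CSV.
  googledrive::drive_download(
    file = googledrive::as_id(sheet_url),
    path = file.path(csv_dir, csv_name),
    type = "csv",
    overwrite = TRUE
  )
}

# Example call (needs a real sheet URL and a Google login):
# download_responses("<your-sheet-url>", "PSYCH-490.003-Exercise-3.csv")
```

Putting the folder and file name in the script means the file lands in the same place every time.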

Download automatically

[1] "assignments.csv"                   "NSFG_2022_2023_FemPregPUFData.csv"
[3] "PSYCH 490.003 Exercise 3.csv"      "PSYCH-490.003-Exercise-3.csv"     
[5] "two_sets_1_sd.csv"                

Import CSV

[1] 25  9
[1] "Timestamp"                                                                                
[2] "Favorite Icecream Flavor"                                                                 
[3] "Best Pet Type"                                                                            
[4] "How confident are you with your math skills? [How confident are you in your math skills?]"
[5] "How confident are you with your math skills? [How creative are you?]"                     
[6] "How many concerts have you gone to?"                                                      
[7] "How many credits are you taking this semester?"                                           
[8] "What is the date of your favorite holiday?"                                               
[9] "Comments"                                                                                 
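A sketch of the import step, assuming `readr::read_csv()` as in the t-test example earlier. To keep it self-contained, it reads a tiny made-up CSV from a temporary file rather than the class survey.

```r
# Write a tiny made-up CSV so the example runs anywhere.
csv_path <- file.path(tempdir(), "toy_survey.csv")
writeLines(
  c(
    "Favorite Icecream Flavor,Best Pet Type",
    "Vanilla,Dog",
    "Chocolate,Cat",
    "Vanilla,Cat"
  ),
  csv_path
)

# Import the file; show_col_types = FALSE hides the column-type message.
survey <- readr::read_csv(csv_path, show_col_types = FALSE)

dim(survey)    # number of rows, then number of columns
names(survey)  # column names, spaces and all
```

For the real survey, swap in the path to the downloaded file; `dim()` and `names()` then give output like the lines above.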

Clean then visualize or vice versa?

Favorite Icecream Flavor
          Chocolate Mint Chocolate Chip          Strawberry             Vanilla 
                  5                   9                   4                   7 

Under the hood

  • R prefers that variable names not have spaces
  • When names have spaces, we have to communicate that to R
  • So, we wrap the name in backticks: `Favorite Icecream Flavor`
  • The xtabs() function does cross-tabulations
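A minimal sketch of the backtick idea with `xtabs()`, using a made-up data frame in place of the class survey.

```r
# Toy data, made up for illustration. check.names = FALSE keeps
# the spaces in the column name instead of replacing them with dots.
toy <- data.frame(
  `Favorite Icecream Flavor` = c("Vanilla", "Chocolate", "Vanilla"),
  check.names = FALSE
)

# Backticks tell R that the three words are one variable name.
flavor_counts <- xtabs(formula = ~ `Favorite Icecream Flavor`, data = toy)
flavor_counts
```

Without the backticks, R would read the name as three separate words and stop with an error.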

Another cross-tabulation

                        Best Pet Type
Favorite Icecream Flavor Cat Dog
     Chocolate             2   3
     Mint Chocolate Chip   3   6
     Strawberry            2   2
     Vanilla               2   5
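A two-way table like the one above comes from adding a second variable to the formula with `+`. A sketch with made-up data:

```r
# Toy data, made up for illustration.
toy <- data.frame(
  `Favorite Icecream Flavor` = c("Vanilla", "Chocolate", "Vanilla", "Vanilla"),
  `Best Pet Type` = c("Dog", "Cat", "Dog", "Cat"),
  check.names = FALSE
)

# Rows come from the first variable, columns from the second.
flavor_by_pet <- xtabs(
  formula = ~ `Favorite Icecream Flavor` + `Best Pet Type`,
  data = toy
)
flavor_by_pet
```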

Clean then visualize

  • Rename variable names: Shorter, remove spaces
  • But capture the actual questions for later
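A sketch of both cleaning steps in base R: keep the original question text, then give the columns short names. The short names (`flavor`, `pet`) and the toy data are made up for illustration.

```r
# Toy data, made up for illustration.
survey <- data.frame(
  `Favorite Icecream Flavor` = c("Vanilla", "Chocolate"),
  `Best Pet Type` = c("Dog", "Cat"),
  check.names = FALSE
)

# 1. Capture the actual questions before changing anything.
question_text <- names(survey)

# 2. Rename: shorter, no spaces.
names(survey) <- c("flavor", "pet")

# 3. Keep a small "data dictionary" linking short names to questions.
data_dictionary <- data.frame(
  short_name = names(survey),
  question = question_text
)
data_dictionary
```

With short names, the backticks are no longer needed, e.g., `xtabs(~ flavor + pet, data = survey)`.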

Why script?

  • Much more reproducible and robust
  • Especially for complex sequences of tasks
  • Be kind to your future (forgetful) self

Work session

DataCamp status

DataCamp leaderboard

Next time

Critiquing figures

Resources
