Goals of this markdown

This markdown is designed to provide an introduction to data visualization in R. Primarily it will cover ggplot2; although a few advanced options are also covered. Questions about code can be directed to Alicia Vallorani (auv27@psu.edu).

Read in data

df <- read.csv("../data/ggplot2_tutorial_vallorani.csv", stringsAsFactors = FALSE) %>%
  mutate_at(vars(bi, sex, socbid_group, prosoc_group), as.factor) 

Examine data structure

These data are drawn from a cross-sectional study assessing attention to the social environment and socio-emotional behaviors. Children are 5-7 and complete a social dyad with a novel peer. Children are of differing temperament: one fearful (BI) and one non-fearful (BN). Mobile eyetracking and behavioral data are collected across the 5 free-play interaction. General expecations for analysis would be to see BI children engaging in fewer social behaviors than BN children.

Describe data structure

id: participant id did: dyad id sex: 1 = boy; 2 = girl bi: 0 = BN; 1 = BI biq: continuous measure of temperamental fearfulness proportion_peerbody: dwell time to peer body proportion_peerface: dwell time to peer face proportion_self: dwell time to self proportion_toys: dwell time to toys proportion_other: dwell time to other stimuli proportion_socconv: time in social conversation (discussion about topics other than play) proportion_playconv: time in play conversation (discussion about ongoing play) socialbid_group: 0 = few social bids; 1 = medium social bids; 2 = many social bids prosoc_group: 0 = few prosocial behaviors; 1 = medium prosocial behaviors; 2 = many prosocial behaviors

str(df)
## 'data.frame':    10 obs. of  15 variables:
##  $ id                 : int  7063 7078 7092 7067 7074 7083 7111 7089 7088 7116
##  $ did                : int  1 3 3 1 2 4 2 4 6 6
##  $ sex                : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 2 2
##  $ age                : num  6.9 5.78 6.18 6.96 6.54 ...
##  $ bi                 : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2
##  $ biq                : int  67 108 126 114 101 121 116 106 97 122
##  $ proportion_peerbody: num  0.0629 0.1314 0.0447 0.0572 0.1404 ...
##  $ proportion_peerface: num  0.0015 0.0829 0 0 0.3445 ...
##  $ proportion_self    : num  0.01798 0.00136 0.09463 0.02448 0.00154 ...
##  $ proportion_toys    : num  0.889 0.693 0.615 0.799 0.364 ...
##  $ proportion_other   : num  0.0285 0.0917 0.2459 0.1192 0.1499 ...
##  $ proportion_socconv : num  0 0.0113 0.0203 0 0.1032 ...
##  $ proportion_playconv: num  0.382 0.127 0.377 0.383 0.225 ...
##  $ socbid_group       : Factor w/ 3 levels "0","1","2": 1 1 1 3 3 1 2 3 2 2
##  $ prosoc_group       : Factor w/ 3 levels "0","1","2": 1 1 1 2 2 3 3 3 2 2

Examining univariate distributions - histograms

This section walks through making histograms for single and multiple variables.

# Looking at a histogram for a single variable
ggplot(df, aes(proportion_toys)) +
  geom_histogram(bins = 5) # you can change the bin value to best fit your data

# Looking at histograms for all variables ggplot option
ggplot(df %>% select(starts_with("proportion"), age) %>% # selecting non-binary variables
         gather(), aes(value)) + # grouping for visualization
    geom_histogram(bins = 5) + 
    facet_wrap(~key, scales = "free_x") # free_x allows for differing x-axes 
## Warning: Removed 4 rows containing non-finite values (stat_bin).

Examining zero-order relations - scatterplots

This section walks through how to make a simple scatterplot between two variables. Additionally, you can add a fit line and look at how scatterplots may vary across groups.

ggplot(df, aes(x=proportion_toys, y=proportion_socconv)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

  #stat_smooth() 
  #stat_smooth(method = "lm", color = "black", se = TRUE) 
  #facet_wrap(~bi) 
  #facet_wrap(~socbid_group)
  #face_grid(bi~sex)

Examining zero-order relations - scatterplot matrices

This section walks through examining multiple relations simultaneoulsly. These plots are created using GGally which couples with ggplot2.

ggpairs(df %>%
          select(starts_with("proportion")) %>%
                 na.omit(), progress=FALSE, lower = list(combo = wrap("facethist", bins=6)))

# Matrix including dichotomous variables
ggpairs(df %>% 
          dplyr::select(bi, socbid_group, proportion_peerbody, 
                 proportion_socconv) %>% # subsetting for visualization
          dplyr::rename(socbid = socbid_group, peerbody = proportion_peerbody, 
                 socconv = proportion_socconv) %>% # renaming variables for figure
          na.omit(), progress=FALSE, lower = list(combo = wrap("facethist", bins=6)))

        #mapping = aes(color = bi)) # coloring the plot by group

Examining group differences - bar graphs

This section walks through a basic bar graph. Once the base graph is made, you can make edits to the theme and axes to your preference. You can also include error bars

ggplot(df, aes(x = bi, y = proportion_peerbody)) +
  geom_bar(stat = "identity") 

  #theme_bw() 
  #theme(panel.grid.major.x = element_blank(),
        #panel.grid.minor.x = element_blank(),
        #panel.grid.major.y = element_blank(),
        #panel.grid.minor.y = element_blank(),
        #panel.spacing = unit(1.5, "lines")) 
  #labs(x = "Behavioral Inhibition Group",
       #y = "Proportion of Time Looking at Peer Body") 
  #scale_x_discrete(limits = c("1", "0"),
                   #labels = c("BI", "BN"))

## Adding error bars
df_sum <- summarySE(df, measurevar="proportion_peerbody", groupvars=c("bi")) # creating a summary of the variables of interest to extract error bars using Rmisc package
df_sum
##   bi N proportion_peerbody         sd         se         ci
## 1  0 5          0.12189033 0.04926121 0.02203028 0.06116587
## 2  1 5          0.06188318 0.03557169 0.01590814 0.04416808
ggplot(df_sum, aes(x=bi, y=proportion_peerbody)) + # we use the summary we created to plot
    geom_bar(position=position_dodge(), stat="identity") +
    geom_errorbar(aes(ymin=proportion_peerbody-se,
                      ymax=proportion_peerbody+se), 
                  width=.2, position=position_dodge(.9))

Examining interactions - bar graphs

This section walks through how to look at a interaction between dichotomous variables using a bar graph. After creating the base graph, you can make additional changes to asthetic elements such as changing colors, legends and bar direction.

ggplot(df, aes(x=bi, y=proportion_socconv, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") 
## Warning: Removed 2 rows containing missing values (geom_bar).

  #scale_fill_brewer(palette = "Set2") # pre-designed color palettes
  #labs(fill = "Sex") 
  #scale_fill_manual(labels = c("Boys", "Girls"), # Rename fills
                    #values = c("#ff8c00", "#5898d7")) # select your own color options
  #theme(legend.justification=c(-0.1,1), legend.position=c(.75,.95)) 
  #coord_flip()

Examining interactions - 3 variable interactions

This section provides an example of how to examine a three-way interaction where one variable is dichotomous.

ggplot(df, aes(x=proportion_toys, y=proportion_socconv, color = sex)) +
  geom_point() +
  stat_smooth(method = lm, se = FALSE) 
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

  #facet_wrap(~bi)

Examining interactions - continuous

This section provides some advanced options for examining continuous interactions using the jtools package.

# Run your model to feed into jtools functions
lm1 <- lm(proportion_socconv~biq*proportion_toys, df) 

# Creating a +/- 1SD plot
interact_plot(lm1, pred = biq, modx = proportion_toys,
              plot.points = TRUE,
              x.label = "Behavioral Inhibition",
              y.label = "Proportion of Time Engaging in Social Conversation",
              legend.main = "Prop Dwell Toys")

# Creating a regions of significance plot
johnson_neyman(lm1, pred = biq, modx = proportion_toys, alpha = 0.05)
## JOHNSON-NEYMAN INTERVAL 
## 
## When proportion_toys is INSIDE the interval [0.25, 0.80], the slope of
## biq is p < .05.
## 
## Note: The range of observed values of proportion_toys is [0.36, 0.89]

Publication quality graphs and exporting

This is an example of the work that goes into creating a plot worthy of publication. There are also a couple of ways to export your final plot depending on if you would like to make further edits in a program such as inkscape or illustrator.

df %>%
  dplyr::select(id, bi, proportion_peerbody, proportion_socconv) %>%
  gather(key = type, value = proportion, proportion_peerbody, proportion_socconv) -> social

social_plot <- ggplot(social, aes(type, proportion, fill = bi)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_x_discrete(limits = c("proportion_peerbody", 
                              "proportion_socconv"), 
                   labels = c("Dwell Peer", "Social Conversation")) +
  labs(x = "Type of Behavior",
       y = "Proportion of Time Engaged in Behavior",
       fill = "Temperament") +
  scale_fill_manual(labels = c("BN", "BI"), values = c("#ff8c00", "#5898d7")) +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.spacing = unit(1.5, "lines")) +
  theme(axis.text = element_text(size = 12)) +
  theme(axis.title = element_text(size = 14)) +
  theme(legend.text = element_text(size = 12)) +
  theme(legend.title = element_text(size = 14)) +
  theme(legend.justification=c(-0.1,1), legend.position=c(.72,.95))

ggsave("../figures/social_plot.png", plot = social_plot, 
       width = 6, height = 6, dpi = 300) #make a 6 x 6 inch PNG file with 300 DPI
## Warning: Removed 2 rows containing missing values (geom_bar).
# Vector graphic: for editable figures in Inkscape or Illustrator
svg("../figures/social_plot.svg")
plot(social_plot)
## Warning: Removed 2 rows containing missing values (geom_bar).
dev.off()
## quartz_off_screen 
##                 2
#And for fun, just print it here (for html display)
plot(social_plot)
## Warning: Removed 2 rows containing missing values (geom_bar).