This markdown provides an introduction to data visualization in R. It primarily covers ggplot2, although a few advanced options are also covered. Questions about the code can be directed to Alicia Vallorani (auv27@psu.edu).
library(tidyverse) # provides %>%, dplyr, tidyr, and ggplot2
library(GGally) # provides ggpairs()
df <- read.csv("../data/ggplot2_tutorial_vallorani.csv", stringsAsFactors = FALSE) %>%
  mutate_at(vars(bi, sex, socbid_group, prosoc_group), as.factor)
These data are drawn from a cross-sectional study assessing attention to the social environment and socio-emotional behaviors. Children are 5-7 years old and complete a social dyad with a novel peer. Each dyad pairs children of differing temperament: one fearful (BI) and one non-fearful (BN). Mobile eye-tracking and behavioral data are collected across the 5 free-play interaction. The general expectation for analysis is that BI children engage in fewer social behaviors than BN children.
- id: participant id
- did: dyad id
- sex: 0 = boy; 1 = girl
- bi: 0 = BN; 1 = BI
- biq: continuous measure of temperamental fearfulness
- proportion_peerbody: dwell time to peer body
- proportion_peerface: dwell time to peer face
- proportion_self: dwell time to self
- proportion_toys: dwell time to toys
- proportion_other: dwell time to other stimuli
- proportion_socconv: time in social conversation (discussion about topics other than play)
- proportion_playconv: time in play conversation (discussion about ongoing play)
- socbid_group: 0 = few social bids; 1 = medium social bids; 2 = many social bids
- prosoc_group: 0 = few prosocial behaviors; 1 = medium prosocial behaviors; 2 = many prosocial behaviors
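Because the grouping variables are stored as numeric codes, axes and legends show 0/1 unless labels are supplied. The plots in this tutorial relabel inside the scales; another option is to attach labels to the factor itself. A minimal sketch with base R's `factor()`, using a small made-up vector (`bi_codes` is hypothetical, not the study data):

```r
# Hypothetical codes standing in for the bi column (0 = BN, 1 = BI)
bi_codes <- c(0, 0, 1, 1, 0)

# Map each code to a readable label; labels are matched to levels by position
bi_labeled <- factor(bi_codes, levels = c(0, 1), labels = c("BN", "BI"))

levels(bi_labeled)
table(bi_labeled)
```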
str(df)
## 'data.frame': 10 obs. of 15 variables:
## $ id : int 7063 7078 7092 7067 7074 7083 7111 7089 7088 7116
## $ did : int 1 3 3 1 2 4 2 4 6 6
## $ sex : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 2 2
## $ age : num 6.9 5.78 6.18 6.96 6.54 ...
## $ bi : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2
## $ biq : int 67 108 126 114 101 121 116 106 97 122
## $ proportion_peerbody: num 0.0629 0.1314 0.0447 0.0572 0.1404 ...
## $ proportion_peerface: num 0.0015 0.0829 0 0 0.3445 ...
## $ proportion_self : num 0.01798 0.00136 0.09463 0.02448 0.00154 ...
## $ proportion_toys : num 0.889 0.693 0.615 0.799 0.364 ...
## $ proportion_other : num 0.0285 0.0917 0.2459 0.1192 0.1499 ...
## $ proportion_socconv : num 0 0.0113 0.0203 0 0.1032 ...
## $ proportion_playconv: num 0.382 0.127 0.377 0.383 0.225 ...
## $ socbid_group : Factor w/ 3 levels "0","1","2": 1 1 1 3 3 1 2 3 2 2
## $ prosoc_group : Factor w/ 3 levels "0","1","2": 1 1 1 2 2 3 3 3 2 2
This section walks through making histograms for single and multiple variables.
# Looking at a histogram for a single variable
ggplot(df, aes(proportion_toys)) +
geom_histogram(bins = 5) # you can change the bin value to best fit your data
# Looking at histograms for all continuous variables at once (ggplot2 option)
ggplot(df %>% select(starts_with("proportion"), age) %>% # selecting non-binary variables
gather(), aes(value)) + # grouping for visualization
geom_histogram(bins = 5) +
facet_wrap(~key, scales = "free_x") # free_x allows for differing x-axes
## Warning: Removed 4 rows containing non-finite values (stat_bin).
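`gather()` still works, but the tidyr documentation marks it as superseded by `pivot_longer()`. The sketch below shows the same wide-to-long reshape before faceting. It uses a small made-up data frame (`toy` and its values are hypothetical) and assumes tidyr 1.0 or later and ggplot2 are installed.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one column per measure
toy <- data.frame(
  proportion_toys = c(0.9, 0.7, 0.6, 0.8),
  proportion_self = c(0.02, 0.01, 0.09, 0.02),
  age = c(6.9, 5.8, 6.2, 7.0)
)

# Long format: one row per (variable, value) pair, mirroring gather()'s key/value columns
toy_long <- pivot_longer(toy, cols = everything(),
                         names_to = "key", values_to = "value")

# One histogram panel per original column
p <- ggplot(toy_long, aes(value)) +
  geom_histogram(bins = 5) +
  facet_wrap(~key, scales = "free_x")
```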
This section walks through how to make a simple scatterplot between two variables. Additionally, you can add a fit line and look at how scatterplots may vary across groups.
ggplot(df, aes(x=proportion_toys, y=proportion_socconv)) +
geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
# Optional layers: append any of these to the plot with +
#stat_smooth()
#stat_smooth(method = "lm", color = "black", se = TRUE)
#facet_wrap(~bi)
#facet_wrap(~socbid_group)
#facet_grid(bi~sex)
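The "Removed 2 rows" warnings come from missing values in the plotted columns; ggplot2 drops those rows before drawing. A quick way to see where values are missing, sketched with a made-up data frame (`toy` is hypothetical):

```r
# Hypothetical data with a few missing values
toy <- data.frame(
  proportion_toys = c(0.9, 0.7, NA, 0.8),
  proportion_socconv = c(0.00, NA, NA, 0.10)
)

# Number of missing values in each column
colSums(is.na(toy))

# Number of rows complete on both variables (the rows a scatterplot can draw)
sum(complete.cases(toy))
```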
This section walks through examining multiple relations simultaneously. These plots are created using GGally, which builds on ggplot2.
ggpairs(df %>%
select(starts_with("proportion")) %>%
na.omit(), progress=FALSE, lower = list(combo = wrap("facethist", bins=6)))
# Matrix including dichotomous variables
ggpairs(df %>%
dplyr::select(bi, socbid_group, proportion_peerbody,
proportion_socconv) %>% # subsetting for visualization
dplyr::rename(socbid = socbid_group, peerbody = proportion_peerbody,
socconv = proportion_socconv) %>% # renaming variables for figure
na.omit(), progress=FALSE, lower = list(combo = wrap("facethist", bins=6)))
# To color the plot by group, add this argument inside ggpairs(): mapping = aes(color = bi)
This section walks through a basic bar graph. Once the base graph is made, you can edit the theme and axes to your preference. You can also include error bars.
ggplot(df, aes(x = bi, y = proportion_peerbody)) +
geom_bar(stat = "identity")
#theme_bw()
#theme(panel.grid.major.x = element_blank(),
#panel.grid.minor.x = element_blank(),
#panel.grid.major.y = element_blank(),
#panel.grid.minor.y = element_blank(),
#panel.spacing = unit(1.5, "lines"))
#labs(x = "Behavioral Inhibition Group",
#y = "Proportion of Time Looking at Peer Body")
#scale_x_discrete(limits = c("1", "0"),
#labels = c("BI", "BN"))
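With `stat = "identity"` on the raw data, each child's value is stacked, so the bar height is the group sum rather than the group mean. One option, sketched here on made-up data (`toy` is hypothetical), is to let `stat_summary()` compute the mean for each group. The `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical data: two children per group
toy <- data.frame(
  bi = factor(c("0", "0", "1", "1")),
  proportion_peerbody = c(0.10, 0.30, 0.20, 0.40)
)

# Bar height is the mean of each group instead of the stacked sum
p_mean <- ggplot(toy, aes(x = bi, y = proportion_peerbody)) +
  stat_summary(fun = mean, geom = "bar")

# Inspect the bar heights ggplot2 computed
bar_heights <- layer_data(p_mean, 1)$y
bar_heights
```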
## Adding error bars
# Summarize the variable of interest by group with the Rmisc package to extract error bars
df_sum <- Rmisc::summarySE(df, measurevar = "proportion_peerbody", groupvars = c("bi"))
df_sum
## bi N proportion_peerbody sd se ci
## 1 0 5 0.12189033 0.04926121 0.02203028 0.06116587
## 2 1 5 0.06188318 0.03557169 0.01590814 0.04416808
ggplot(df_sum, aes(x=bi, y=proportion_peerbody)) + # we use the summary we created to plot
geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=proportion_peerbody-se,
ymax=proportion_peerbody+se),
width=.2, position=position_dodge(.9))
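The columns in `df_sum` follow the usual formulas: `se` is the standard deviation divided by the square root of N, and `ci` is the half-width of a t-based 95% confidence interval. A base R sketch on made-up values (`x` is hypothetical) that computes the same quantities by hand:

```r
x <- c(0.06, 0.13, 0.04, 0.14, 0.10)  # hypothetical proportions for one group

n  <- sum(!is.na(x))                  # group size
m  <- mean(x, na.rm = TRUE)           # group mean
s  <- sd(x, na.rm = TRUE)             # standard deviation
se <- s / sqrt(n)                     # standard error of the mean
ci <- se * qt(0.975, df = n - 1)      # half-width of the 95% confidence interval

c(N = n, mean = m, sd = s, se = se, ci = ci)
```

Error bars of ± `se`, as plotted above, are narrower than a 95% interval; swapping `se` for `ci` in `geom_errorbar()` shows the confidence interval instead.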
This section walks through how to look at an interaction between dichotomous variables using a bar graph. After creating the base graph, you can make additional changes to aesthetic elements such as colors, legends, and bar direction.
ggplot(df, aes(x=bi, y=proportion_socconv, fill = sex)) +
geom_bar(stat = "identity", position = "dodge")
## Warning: Removed 2 rows containing missing values (geom_bar).
#scale_fill_brewer(palette = "Set2") # pre-designed color palettes
#labs(fill = "Sex")
#scale_fill_manual(labels = c("Boys", "Girls"), # Rename fills
#values = c("#ff8c00", "#5898d7")) # select your own color options
#theme(legend.justification=c(-0.1,1), legend.position=c(.75,.95))
#coord_flip()
This section provides an example of how to examine a three-way interaction where one variable is dichotomous.
ggplot(df, aes(x=proportion_toys, y=proportion_socconv, color = sex)) +
geom_point() +
stat_smooth(method = lm, se = FALSE)
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
#facet_wrap(~bi)
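This section provides some advanced options for examining continuous interactions using the jtools package. In jtools 2.0 and later, interact_plot() and johnson_neyman() are provided by the companion interactions package, so load that package if the functions are not found.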
# Run your model to feed into jtools functions
lm1 <- lm(proportion_socconv~biq*proportion_toys, df)
# Creating a +/- 1SD plot
interact_plot(lm1, pred = biq, modx = proportion_toys,
plot.points = TRUE,
x.label = "Behavioral Inhibition",
y.label = "Proportion of Time Engaging in Social Conversation",
legend.main = "Prop Dwell Toys")
# Creating a regions of significance plot
johnson_neyman(lm1, pred = biq, modx = proportion_toys, alpha = 0.05)
## JOHNSON-NEYMAN INTERVAL
##
## When proportion_toys is INSIDE the interval [0.25, 0.80], the slope of
## biq is p < .05.
##
## Note: The range of observed values of proportion_toys is [0.36, 0.89]
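The ± 1 SD plot and the Johnson-Neyman interval both rest on the same quantity: in a model with a `biq * proportion_toys` interaction, the slope of `biq` at a given value of the moderator is the `biq` coefficient plus the interaction coefficient times that value. A base R sketch on simulated data (the data and seed are made up for illustration) that computes those simple slopes at the mean and ± 1 SD of the moderator:

```r
set.seed(1)

# Simulated stand-in for the study data
sim <- data.frame(
  biq = rnorm(50, mean = 100, sd = 15),
  proportion_toys = runif(50, min = 0.3, max = 0.9)
)
sim$proportion_socconv <- 0.2 - 0.001 * sim$biq * sim$proportion_toys +
  rnorm(50, mean = 0, sd = 0.02)

fit <- lm(proportion_socconv ~ biq * proportion_toys, data = sim)
b <- coef(fit)

# Moderator values at -1 SD, the mean, and +1 SD
mod_vals <- mean(sim$proportion_toys) + c(-1, 0, 1) * sd(sim$proportion_toys)

# Simple slope of biq at each moderator value
simple_slopes <- b[["biq"]] + b[["biq:proportion_toys"]] * mod_vals
setNames(simple_slopes, c("-1 SD", "Mean", "+1 SD"))
```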
This is an example of the work that goes into creating a plot worthy of publication. There are also a couple of ways to export your final plot, depending on whether you would like to make further edits in a program such as Inkscape or Illustrator.
social <- df %>%
  dplyr::select(id, bi, proportion_peerbody, proportion_socconv) %>%
  gather(key = type, value = proportion, proportion_peerbody, proportion_socconv)
social_plot <- ggplot(social, aes(type, proportion, fill = bi)) +
geom_bar(stat = "identity", position = position_dodge()) +
scale_x_discrete(limits = c("proportion_peerbody",
"proportion_socconv"),
labels = c("Dwell Peer", "Social Conversation")) +
labs(x = "Type of Behavior",
y = "Proportion of Time Engaged in Behavior",
fill = "Temperament") +
scale_fill_manual(labels = c("BN", "BI"), values = c("#ff8c00", "#5898d7")) +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.spacing = unit(1.5, "lines")) +
theme(axis.text = element_text(size = 12)) +
theme(axis.title = element_text(size = 14)) +
theme(legend.text = element_text(size = 12)) +
theme(legend.title = element_text(size = 14)) +
theme(legend.justification=c(-0.1,1), legend.position=c(.72,.95))
ggsave("../figures/social_plot.png", plot = social_plot,
width = 6, height = 6, dpi = 300) #make a 6 x 6 inch PNG file with 300 DPI
## Warning: Removed 2 rows containing missing values (geom_bar).
# Vector graphic: for editable figures in Inkscape or Illustrator
svg("../figures/social_plot.svg")
plot(social_plot)
## Warning: Removed 2 rows containing missing values (geom_bar).
dev.off()
## quartz_off_screen
## 2
#And for fun, just print it here (for html display)
plot(social_plot)
## Warning: Removed 2 rows containing missing values (geom_bar).
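`ggsave()` picks the graphics device from the file extension, so the same call used for the PNG can also write a vector file. A sketch that writes a PDF, which Inkscape and Illustrator can also open, to a temporary directory; the toy plot and file name are made up. Writing `.svg` through `ggsave()` additionally needs the svglite package, whereas the base `svg()` device used above does not.

```r
library(ggplot2)

# Hypothetical plot standing in for social_plot
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) +
  geom_point()

# Write a 6 x 6 inch vector PDF; the device is inferred from the .pdf extension
out <- file.path(tempdir(), "example_plot.pdf")
ggsave(out, plot = p, width = 6, height = 6)

file.exists(out)
```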